
An HPC cluster is a high-performance, parallel computing infrastructure consisting of three key components: compute, network, and storage. On a cluster you can take advantage of multiple cores by running several instances of a program at once or by using a parallelized version of the program. The lilac cluster is appropriate for all types of workloads, including genomic analysis, artificial intelligence, and machine learning, and is available to all users at MSKCC. The juno cluster is dedicated to processing genomic pipelines and analysis, and access to it is limited. You can find more information about the two clusters at http://hpc.mskcc.org/compute-accounts/

All access to the clusters is via SSH to the login nodes. From a login node you can view files and dispatch jobs to compute nodes on the private network. IBM Spectrum LSF is the job scheduler we use to manage these jobs. All nodes in a cluster mount a shared GPFS filesystem, and each node also has a local 1TB /scratch drive for temporary data. We also provide special data transfer servers with optimized network connections for moving large data sets to and from the clusters.


How do job schedulers work?

On an HPC cluster, the scheduler manages which jobs run where and when. On our clusters, you control your jobs using a job scheduling system called IBM Spectrum LSF, which allocates and manages compute resources for you. You can submit your jobs in one of two ways. For testing and small jobs you may want to run a job interactively, so that you can interact with the compute node(s) directly in real time. The other way, which is preferred for multiple jobs or long-running jobs, is to write your job commands in a script and submit that script to the scheduler. Our LSF documentation is at: (list of links)
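As a hedged sketch of the two submission styles (the core count, time limit, and script name below are illustrative, not site defaults):

    # Interactive job: open a shell on a compute node (1 core, 1-hour limit)
    bsub -Is -n 1 -W 1:00 /bin/bash

    # Batch job: submit a script and let the scheduler run it for you
    bsub < myjob.lsf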

Table with a list of  common bsub parameters and their defaults
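Until that table is filled in, here is a sketch of commonly used flags; these are standard LSF options, but any defaults are site-specific, so check bqueues and our documentation rather than the illustrative values shown here:

    # Common bsub options (values are examples only):
    #   -J <name>            job name
    #   -n <slots>           number of job slots (cores)
    #   -W <hh:mm>           wall-clock run limit
    #   -R "rusage[mem=N]"   memory reservation (units are site-configured)
    #   -q <queue>           target queue (list queues with bqueues)
    #   -o / -e <file>       stdout/stderr files (%J expands to the job ID)
    bsub -J myjob -n 4 -W 2:00 -R "rusage[mem=4]" -o out.%J -e err.%J ./my_program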

How do I find out what resources I need to request?

Running your first job
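A minimal batch script sketch to get started; the resource values and file names are assumptions, and the #BSUB lines are directives that bsub reads from the script:

    #!/bin/bash
    #BSUB -J first_job          # job name
    #BSUB -n 1                  # one core
    #BSUB -W 0:10               # ten-minute run limit
    #BSUB -o first_job.%J.out   # stdout file (%J = job ID)
    #BSUB -e first_job.%J.err   # stderr file

    echo "Running on $(hostname)"

Submit it with bsub < first_job.lsf; the input redirection is required for the #BSUB directives to be parsed.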

Getting information about your jobs
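These standard LSF commands are the usual starting points (12345 is a placeholder job ID):

    bjobs               # list your pending and running jobs
    bjobs -l 12345      # detailed view of one job
    bpeek 12345         # peek at the output of a running job
    bhist -l 12345      # accounting history for a job
    bkill 12345         # cancel a job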

We will be updating our LSF documentation soon.

Where can I find a basic Linux tutorial?

...

All access to the clusters is via SSH keys. We recommend that you use authentication forwarding. If you have trouble connecting, please read the SSH page at

http://mskcchpc.org/display/CLUS/Secure+Shell+SSH
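As a sketch of key-based login with agent forwarding (the key path is an assumption, and <login-node> is a placeholder for the hostname in your account documentation):

    # Load your private key into an agent, then connect with forwarding enabled
    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/id_ed25519
    ssh -A username@<login-node>    # -A forwards your agent to the login node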

...

Your home directory has a 100G quota. Please use your lab's data directory, /data/labname, on the high-performance GPFS filesystem for datasets and analysis. Each compute node has >1TB of local scratch disk. We also have /warm storage, which is lower performance and not computable (jobs should not read or write it directly); this storage is not backed up.

 

Information about our storage offerings is at http://hpc.mskcc.org/data-storage/

How do I transfer data to and from the cluster?

...

http://mskcchpc.org/display/CLUS/questions/3211561/how-can-i-transfer-data-from-or-to-a-windows-server

Each cluster has a special data transfer (xfer) server with a faster network connection optimized for moving data: lilac-xfer01.mskcc.org and juno-xfer02.mskcc.org. To use one, just SSH to it and start your data transfers from there.

Available software and specifying paths

Links to using miniconda etc
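As a sketch of the Miniconda workflow, assuming you have installed Miniconda under your home directory (the environment name and packages are examples):

    # Create an isolated environment for your tools, then activate it
    conda create -n myenv python=3.10 numpy
    conda activate myenv
    which python    # should now point into the myenv environment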

Data can be transferred from other Linux or macOS systems using rsync over SSH.
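For example (a sketch; the local path and destination directory are assumptions):

    # -a preserves permissions and timestamps, -v is verbose,
    # -P shows progress and resumes partial transfers
    rsync -avP /path/to/local/data/ username@lilac-xfer01.mskcc.org:/data/labname/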


Data can be transferred from Windows or Samba shares using smbclient.
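A sketch of fetching a file from a share with smbclient (the server, share, and file names are placeholders):

    # Connect to the share and fetch a file non-interactively
    smbclient //winserver/share -U username -c 'get results.xlsx'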



What software is available?

Available Software

Installing Software


How can I be a good citizen on the cluster?

Don't run compute jobs on the login nodes.

Do not use /tmp as scratch space; use the node's local /scratch drive instead, and clean up any scratch data you have generated when your job finishes (see the sketch after this list).

Use the data transfer servers for transferring data on and off the clusters.
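A sketch of scratch hygiene inside a job script; $LSB_JOBID is set by LSF for each job, while the directory layout, input path, and program name are assumptions:

    # Work in a per-job scratch directory and remove it when the job finishes
    SCRATCH="/scratch/$USER/$LSB_JOBID"
    mkdir -p "$SCRATCH"
    cp /data/labname/big_input.dat "$SCRATCH/"    # stage input onto local disk
    cd "$SCRATCH"
    my_program big_input.dat > results.out        # hypothetical program
    cp results.out /data/labname/                 # copy results back to shared storage
    rm -rf "$SCRATCH"                             # clean up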