LSF scheduler information specific to the luna cluster

By default all jobs are scheduled on the sol queue. Jobs are scheduled based on their resource requests, estimated runtime, and user priorities. You can also specify the test queue, at your own risk.

LSF host groups:

  • largeHG: t01-t02
  • internetHG: s01-s03
  • commonHG: s04-s33 & u01-u26
  • testHG: u34-u35

LSF host partitions:

  • largeHP: t01-t02
  • commonHP: s01-s33 & u01-u26
  • testHP: u34-u35

Long vs. Short Jobs

  • If a job specifies an estimated run time (-We) or a hard run time (-W) of 59 minutes or less, it is considered ‘short’.
  • To run a short job you must specify -We HOURS:MINUTES or -W HOURS:MINUTES
  • Short jobs can run on all hosts. They have a hard run limit of 2 times the requested run time.
  • All jobs that do not request a run time of 59 minutes or less are considered ‘long’.
  • Long jobs can run on 30% of commonHG hosts, and on 50% of largeHG hosts if they request enough memory to be eligible.
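
The short/long rule above can be sketched as a few shell helpers. This is only an illustration of the rule as written on this page, not the scheduler's actual logic; the function names to_minutes, job_class, and implied_hard_limit are hypothetical.

```shell
#!/bin/sh
# Convert a run-time value ("MINUTES" or "HOURS:MINUTES") to minutes.
# awk is used so that values such as "1:08" are read as decimal numbers.
to_minutes() {
    echo "$1" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 + 0 }'
}

# Classify a job from its requested run time (-We or -W).
# An empty argument means no run time was requested, which makes the job long.
job_class() {
    if [ -z "$1" ]; then
        echo long
    elif [ "$(to_minutes "$1")" -le 59 ]; then
        echo short
    else
        echo long
    fi
}

# Hard run limit in minutes for a short job: 2 times the requested run time.
implied_hard_limit() {
    echo $(( $(to_minutes "$1") * 2 ))
}

job_class 0:30            # prints "short"
job_class 1:00            # prints "long"
implied_hard_limit 0:30   # prints "60"
```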

Large memory hosts

  • largeHG hosts are for jobs that need a lot of memory.
  • Short jobs, and long jobs that request more than 376 GB of memory, can run on largeHG hosts.
  • There is no internet access on largeHG hosts.
  • Request the total amount of memory your job will use, not the fraction per job slot or thread.
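
As an illustration of requesting total memory rather than per-slot memory, the sketch below builds a bsub command line for a multi-slot job. The helper name total_mem_cmd, the slot count, the per-slot figure, and the job name myjob are all hypothetical; the command is only printed, not submitted.

```shell
#!/bin/sh
# Print a bsub command that requests the job's TOTAL memory.
# Arguments: number of slots, GB needed per slot, then the command to run.
total_mem_cmd() {
    slots=$1
    gb_per_slot=$2
    shift 2
    total=$(( slots * gb_per_slot ))   # total across all slots, in GB
    echo "bsub -n $slots -R \"rusage[mem=$total]\" -M $total $*"
}

total_mem_cmd 8 50 myjob
# prints: bsub -n 8 -R "rusage[mem=400]" -M 400 myjob
```

With 8 slots at 50 GB each, the request is 400 GB, which is above the 376 GB threshold for largeHG.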

Internet hosts

  • internetHG hosts are for jobs that need internet access.
  • Short jobs, as well as long jobs that request internet, can run on internetHG hosts.
  • To request an internet host use -R "select[internet]".

Service Level Agreement Guarantees

  • If you belong to an SLA group, you can specify it in your bsub command, as in bsub -sla Pipeline, to be guaranteed a certain amount of the cluster (so long as that portion isn’t in use by anyone else in the same SLA group).
  • All SLAs have a loan policy, which lets anyone’s short jobs use idle hosts.
  • Guaranteed resources are assigned by full hosts, not CPUs/slots. If a job with -sla Pipeline is using 1 CPU/slot on a host, the entire host is reserved for Pipeline until Pipeline reaches its SLA.
  • Current SLA breakdown on luna:
    • Pipeline gets 40% of commonHG, 50% internetHG, and 50% largeHG.
    • Haystack gets 10% of commonHG.
    • Short (short jobs auto attach to this) gets 20% of commonHG if there are no priority jobs.
    • The remaining 30% of commonHG and 50% of largeHG is unallocated and available for long jobs.

Current defaults for jobs:

  • Soft memory limits (-R "rusage[mem=GB]") and hard memory limits (-M GB) should both be requested.
    • If none are requested, the default for soft is 8 GB and for hard is 16 GB.
    • If hard is requested but soft is not, soft = hard.
  • -R "span[hosts=1]": Jobs that request multiple processors span a single host.
  • -R "rusage[iounits=1]": The maximum iounits per host is 10. IOUNITS are an arbitrary measure of the amount of reading/writing that the job incurs.
  • -We specifies the expected or “soft” run time; -W specifies the hard run time. Jobs are considered long if neither -We nor -W is specified. A job with a run time of less than 60 minutes is considered ‘short’ and can run on all nodes.
    • If soft runtime (-We hour:minute) is set and hard (-W hour:minute) is not, hard runtime = 2x soft runtime.
    • If hard runtime is set, soft runtime does not need to be.
  • Long jobs are restricted to 30% of commonHG, or 50% of largeHG if you request more than 376 GB of memory.
  • stdout normally goes to the file given with -o. To redirect output within the job itself, you must quote the command to execute inside the bsub command. For example: bsub -We 1 -J jobName -o output_file.txt "ls -al 1> redirect_file.txt"
  • bsub -w is the wait option, as in bsub -w "post_done($PREV_JOBNAME)"
  • Auto-emailing is turned off, but can be enabled in the bsub command.

Use post_done rather than done to hold jobs; with done, the dependent job may start too quickly. post_done only lets the job run if the $PREV_JOBNAME job completed with exit status 0 and also completed its post_done processes. If holding on multiple jobs with very similar names, -w "post_done($PREV_JOBNAME*)" should work, unless you have one.
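
A two-step chain using post_done might be put together as sketched below. The helper dependent_cmd and the job names align_sample1, align_*, and merge are hypothetical, and the helper only prints the bsub command so it can be inspected before use.

```shell
#!/bin/sh
# Print a bsub command that waits on a previously submitted, named job.
# Arguments: name of the job to wait on (may end in * to match several
# similarly named jobs), name for the new job, then the command to run.
dependent_cmd() {
    prev=$1
    name=$2
    shift 2
    echo "bsub -J $name -w \"post_done($prev)\" $*"
}

dependent_cmd align_sample1 merge ./merge.sh
# prints: bsub -J merge -w "post_done(align_sample1)" ./merge.sh

dependent_cmd 'align_*' merge ./merge.sh
# prints: bsub -J merge -w "post_done(align_*)" ./merge.sh
```

The first job would need to be submitted with a matching -J name for the dependency to resolve.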

Examples

  • Submit a basic job that sleeps for 30 seconds: bsub sleep 30
  • Submit a job named “jobname” with an estimated run time of 30 minutes, selecting hosts with internet: bsub -J jobname -We 0:30 -R "select[internet]" myjob
  • Submit a job only to hosts in host group commonHG, with 20 GB of memory requested: bsub -m commonHG -R "rusage[mem=20]" myjob

How to send an email at the end of a job:

First, run `export LSB_JOB_REPORT_MAIL=Y` in the terminal from which you are going to submit the job.
Then submit with bsub -u <emailaddress@site.com> -N
The -N option e-mails the job output (which people usually write to a file using -o) at the end of the job. This is what the e-mail will look like:

Job was submitted from host by user in cluster .
Job was executed on host(s) , in queue , as user in cluster .
was used as the home directory.
was used as the working directory.
Started at Tue May 24 11:14:37 2016
Results reported on Tue May 24 11:14:50 2016
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
sleep 13
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.07 sec.
Total Requested Memory : -
Delta Memory : -
Run time : 13 sec.
Turnaround time : 14 sec.
The output (if any) follows:

NOTE before using this: If LSB_JOB_REPORT_MAIL=Y is exported and you submit without -u or -N (and without -o or -oo), the message is delivered to /var/mail/username, and only on the host that you ran the job on. To change it back, export LSB_JOB_REPORT_MAIL=N after you are done. If users don’t, the /var/mail directories on the hosts are likely to fill up with junk.
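
One way to avoid forgetting the reset is to set the variable for a single submission only. The sketch below assumes that setting LSB_JOB_REPORT_MAIL in the environment of one bsub call has the same effect as exporting it first; that has not been verified on luna. The function name submit_with_mail and the BSUB override (used here for dry runs) are hypothetical.

```shell
#!/bin/sh
# Submit one job with an end-of-job e-mail, setting LSB_JOB_REPORT_MAIL=Y
# only for this bsub call so nothing has to be reset afterwards.
# Arguments: e-mail address, output file, then the command to run.
# BSUB is a hypothetical override for dry runs; it defaults to bsub.
submit_with_mail() {
    addr=$1
    outfile=$2
    shift 2
    LSB_JOB_REPORT_MAIL=Y "${BSUB:-bsub}" -u "$addr" -N -o "$outfile" "$@"
}

# Dry run that prints the arguments instead of submitting:
#   BSUB=echo submit_with_mail user@example.com out.txt sleep 13
```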


LSF Rules:

Memory Request Rules:
– Both soft memory limits (-R "rusage[mem=GB]") and hard memory limits (-M GB) should be requested.
– If none are requested the default for soft is 8 GB and for hard is 16 GB
– If hard is requested but soft is not: soft = hard
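
The memory rules above can be written out as a small helper that reports the limits a job ends up with. The function name effective_mem is hypothetical, and the case where only a soft limit is requested is not covered by these rules, so the sketch reports the hard limit as "unstated" there rather than guessing.

```shell
#!/bin/sh
# Print the effective soft and hard memory limits in GB.
# Arguments: soft request, hard request; pass "" for a limit not requested.
effective_mem() {
    soft=$1
    hard=$2
    if [ -z "$soft" ] && [ -z "$hard" ]; then
        soft=8      # default soft limit
        hard=16     # default hard limit
    elif [ -z "$soft" ]; then
        soft=$hard  # hard requested without soft: soft = hard
    fi
    # Soft requested without hard is not described by the rules above.
    echo "soft=$soft hard=${hard:-unstated}"
}

effective_mem "" ""   # prints "soft=8 hard=16"
effective_mem "" 32   # prints "soft=32 hard=32"
```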

Runtime Request Rules for short jobs:
– If soft runtime -We hour:minute is set and hard -W hour:minute is not, hard runtime = 2x soft runtime
– If hard runtime is set, soft runtime does not need to be.

Note:
– There is no hard runtime for long jobs. A job is considered long if there is no runtime specified.
– If the soft memory limit is below the large-host threshold (376 GB), the job is long (60 minutes or more, or no runtime specified), and it has not requested internet, the job will only be dispatched to “commonHG”.
– Queue test is exempt from these rules.
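
Taken together, the placement rules on this page can be sketched as a single decision function. This is a reading of the rules, not the scheduler configuration. It assumes that a job requesting internet is confined to internetHG, and that a long job above the memory threshold goes to largeHG only. The function name eligible_groups is hypothetical, and jobs in the test queue are not modelled.

```shell
#!/bin/sh
# Print the host group(s) a job may be dispatched to.
# Arguments: run time in minutes ("" if none given), soft memory in GB,
# and whether internet was requested (yes/no).
eligible_groups() {
    minutes=$1
    mem_gb=$2
    internet=$3
    if [ "$internet" = yes ]; then
        echo "internetHG"   # assumption: select[internet] confines the job
    elif [ -n "$minutes" ] && [ "$minutes" -le 59 ]; then
        echo "all hosts"    # short jobs can run anywhere
    elif [ "$mem_gb" -gt 376 ]; then
        echo "largeHG"      # long job above the memory threshold
    else
        echo "commonHG"     # every other long job
    fi
}

eligible_groups 30 8 no     # prints "all hosts"
eligible_groups "" 8 no     # prints "commonHG"
eligible_groups 120 400 no  # prints "largeHG"
```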