- UserGroup Presentations
Slides from June 22 2017 UG: LSF_UG_1.potx
The LILAC cluster uses LSF (Load Sharing Facility) 10.1 FP8 from IBM to schedule jobs. The default LSF queue, ‘cpuqueue’, includes a subset of the LILAC compute nodes and should be used for CPU jobs only. The ‘gpuqueue’ queue should be used for GPU jobs only.
lw-gpu: lw01-02 (8xGPUs GeForce RTX 2080 Ti)
Job resource control enforcement in LSF with cgroups
LSF 10.1 uses Linux control groups (cgroups) to limit the CPU cores, number of GPUs, and memory that a job can use. The goal is to isolate jobs from each other and prevent any one job from consuming all the resources on a machine. All LSF job processes are controlled by the Linux cgroup system. Jobs can only access the GPUs that have been assigned to them. If a job's processes on a host use more memory than the job requested, the job will be terminated by the Linux cgroup memory subsystem.
LSF cluster level resources configuration (Apr 3 2018)
GPUs are consumable resources per host, not per slot. A job can request N CPUs and M GPUs per host, where N>M, N=M, or N<M.
Memory is a consumable resource, specified in GB per slot (-n).
LSF supports a variety of job submission techniques. By accurately requesting the resources you need, you can have your jobs execute as quickly as possible on available nodes which can process them.
bsub -n 1 -R "fscratch" ...
More information on job submission and control
For more information on the commands to submit and manage jobs, please see the following page: Lilac LSF Commands
Simple Submission Script
There are default values for all batch parameters, but it is a good idea to always specify the number of threads, GPUs (if needed), memory per thread, and expected wall time for batch jobs. To minimize time spent waiting in the queue, specify the smallest wall time that will safely allow your jobs to complete.
Note that the memory requirement (-R rusage[mem=4]) is in GB (gigabytes) and is PER CORE (-n) rather than per job. A total of 576GB of memory will be allocated for this example job.
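The per-core memory rule can be illustrated with a minimal batch script sketch. Everything in it is an assumption made for illustration: the slot count of 144 is only inferred from the stated total (576 GB at 4 GB per core), and the job name, wall time, and output file names are hypothetical. The #BSUB lines are read by LSF when the script is submitted; to a plain shell they are comments, so the arithmetic at the end also runs outside the scheduler.

```shell
#!/bin/bash
# Hypothetical job name.
#BSUB -J example_job
# Number of slots (cores); inferred from 576 GB / 4 GB per core.
#BSUB -n 144
# Memory in GB PER SLOT, not per job.
#BSUB -R "rusage[mem=4]"
# Wall-time limit in hours:minutes (assumed value).
#BSUB -W 4:00
# Output and error files; %J expands to the LSF job ID.
#BSUB -o %J.stdout
#BSUB -e %J.stderr

slots=144
mem_per_slot_gb=4

# Total memory reserved for the job = slots x memory per slot.
total_mem_gb=$(( slots * mem_per_slot_gb ))

# LSB_JOBID is set by LSF inside a running job; fall back when run by hand.
echo "Job ${LSB_JOBID:-not-in-LSF}: ${slots} slots x ${mem_per_slot_gb} GB = ${total_mem_gb} GB"
```

Halving either the slot count or the per-slot value halves the total reservation, which is why an accurate per-core request matters more as the job gets wider.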
Submit a batch script with the bsub command:
bsub < myjob.lsf
Interactive batch jobs provide interactive access to compute resources, for example for debugging. You can start an interactive batch job with “bsub -Is”.
Here is an example command to create an interactive shell on a compute node:
bsub -n 2 -W 2:00 -q gpuqueue -gpu "num=1" -R "span[hosts=1]" -Is /bin/bash
GPUs can be configured in several compute modes. All GPUs on LILAC are configured in EXCLUSIVE_PROCESS compute mode by default, in which only one process at a time can use a given GPU.
For more information on GPU resources, terminology and fine grain GPU control, please see the Lilac GPU Primer.
Requesting Different CPU and GPU Configurations
Warning: Please use -R "span[ptile=number_of_slots_per_host]" to place the requested number of slots and the requested number of GPUs on the same host; otherwise LSF may distribute the job across many hosts.
bsub -q gpuqueue -n N -gpu "num=2" -R "span[ptile=2]"
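As a rough sketch of how these options combine, the helper below estimates the host and GPU layout for a given request. It assumes, as stated earlier on this page, that GPUs are requested per host and that span[ptile=...] fixes the number of slots per host. The function name `layout` is hypothetical and is not an LSF command; it only does the arithmetic.

```shell
# layout SLOTS PTILE GPUS_PER_HOST
# Prints the host and GPU layout implied by:
#   bsub -n SLOTS -gpu "num=GPUS_PER_HOST" -R "span[ptile=PTILE]"
# Hypothetical helper for illustration; it does not talk to LSF.
layout() (
  slots=$1
  ptile=$2
  gpus_per_host=$3
  # Hosts needed when each host holds at most PTILE slots (rounded up).
  hosts=$(( (slots + ptile - 1) / ptile ))
  # GPUs are consumed per host, so the total scales with the host count.
  total_gpus=$(( hosts * gpus_per_host ))
  echo "hosts=${hosts} slots_per_host=${ptile} gpus_per_host=${gpus_per_host} total_gpus=${total_gpus}"
)

# Example: -n 4 -gpu "num=2" -R "span[ptile=2]"
layout 4 2 2
```

With N=4 this gives two hosts with two slots and two GPUs each, four GPUs in total. Setting ptile equal to N keeps the whole job on a single host.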
LSF uses the blaunch framework (aka hydra) to allocate GPUs on execution nodes. The major MPI implementations integrate with blaunch.
bsub -q gpuqueue -I -n 4 -gpu "num=1" -R "span[ptile=2]" blaunch 'hostname; echo CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES; my_executable'
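Quoting a multi-command string for blaunch is easy to get wrong, so one option is to put the per-task commands in a small script and launch that instead. The sketch below is an assumption-laden illustration: the script name report_gpus.sh and the function name are hypothetical, and it only prints what the job sees rather than running a real workload.

```shell
#!/bin/bash
# report_gpus.sh (hypothetical name): print the host name and the GPUs
# visible to this task through CUDA_VISIBLE_DEVICES.
report_gpus() {
  # Fall back to a placeholder when no GPU list is set for this task.
  echo "$(hostname): CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
}

report_gpus
```

Assuming the script is executable and on a shared filesystem, it could replace the quoted command string in the blaunch example above, with the real executable called after report_gpus.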
Job options and cookbook
For the set of common bsub flags and more cookbook examples, please see: LSF bsub flags