The Open Saion partitions

Saion has a couple of partitions open to anybody with an OIST account.

test-gpu gives you access to a couple of GPUs for a limited time;
intel gives you access to CPU-only nodes.

These partitions are useful when you want to develop and test your code, experiment with GPU computations, or run short jobs with less waiting time.

They are not well suited for long-running computations. If you need several hours or days for your job, you may want to apply for access to a restricted GPU partition that will give you plenty of compute time and resources.

The Saion system is set up slightly differently from the main cluster. Please read the Saion introduction here for the best way to organise your computations.

“test-gpu”

The “test-gpu” partition is freely available. It gives you easy access to modern GPU nodes without applying for restricted resources.

This partition is best used for short computations, including interactive use and compiling and testing code. Your allocation is 18 CPU cores and 2 GPUs, for up to 8 hours.

This partition runs on hardware that is also used for specific high-priority tasks. If such a task is started and not enough free nodes are available, jobs running on the “test-gpu” partition will be stopped. Because your jobs can be interrupted at any time, the partition is not suitable for important or long-running jobs.
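Because a job on this partition can be stopped at any time, it helps to record progress as you go, so that a resubmitted job can continue where the previous one stopped. The following is a minimal sketch of that idea, not a Slurm feature: the progress file name, the step count and the "next_step" and "run_steps" helpers are all hypothetical.

```shell
#!/bin/bash
# Hypothetical progress file; keep it somewhere that survives the job.
progress_file="${PROGRESS_FILE:-progress.txt}"
total_steps=5

# Print the first step that still needs to run (1 if nothing is recorded yet).
next_step() {
    if [ -f "$progress_file" ]; then
        echo $(( $(cat "$progress_file") + 1 ))
    else
        echo 1
    fi
}

# Run the remaining steps, recording each one as soon as it completes.
run_steps() {
    step=$(next_step)
    while [ "$step" -le "$total_steps" ]; do
        echo "running step $step"        # replace with the real work
        echo "$step" > "$progress_file"
        step=$((step + 1))
    done
}

run_steps
```

If the job is stopped after step 3, submitting the same script again starts at step 4 instead of repeating the earlier work.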

“test-gpu” has 6 nodes with 4 NVIDIA P100 GPUs each. The layout and usage are the same as for the restricted Saion “gpu” partition. To install software on these nodes, you generally have to start a job on one of the nodes and build your software there. The login nodes don’t have the drivers or libraries needed to build GPU software.

To use a node, you ask for the “test-gpu” partition. To get an actual GPU, you also need to specify the number of GPUs with the “--gres” parameter:

$ srun -t 0-1 -c 8 -p test-gpu --mem=32G --gres=gpu:1 --pty bash -l

This gives you an interactive allocation with 32G memory, 8 cores and one GPU for 1 hour.
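For a non-interactive test run, the same request can be written as a batch script. This is a minimal sketch, assuming the standard Slurm "#SBATCH" directives that correspond to the "srun" options above; the job name and the "report_gpu" helper are illustrative and not part of the Saion setup.

```shell
#!/bin/bash
#SBATCH --partition=test-gpu
#SBATCH --time=0-1              # 0 days, 1 hour
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --gres=gpu:1            # without this you get no GPU
#SBATCH --job-name=gpu-check    # hypothetical name

# Show the GPU visible to the job, if the NVIDIA tools are installed.
report_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found; is this a GPU node?"
    fi
}

report_gpu
```

Save it under a name of your choice, for example "gpu-check.slurm", and submit it with "sbatch gpu-check.slurm".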

Note: At this time, the “test-gpu” partition is somewhat out of date and lacks recent software and a recent CUDA version. If you need a more up-to-date environment, please apply for the GPU partition here.

For more details on using this partition, please refer to the documentation for the regular “gpu” partition.

“intel”

“intel” is a set of Intel Xeon nodes, each with 40 cores and 512GB of memory. We ask you to use no more than 120G of memory per job.

The “intel” partition is gang scheduled. Your job will start immediately, but if too many other jobs are also running on the partition, the jobs will take turns running on the system. The time slice is 10 seconds, meaning your job runs for 10 seconds at a time, before it is suspended in favour of another job.
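To get a feel for what taking turns means for run time, here is a rough estimate. It assumes that every job sharing your cores gets an equal share of time slices and that suspending and resuming costs nothing; neither is guaranteed. The "estimate_wall_time" helper is hypothetical.

```shell
# Rough wall-clock estimate, in seconds, under gang scheduling.
#   $1: run time in seconds when the job has the cores to itself
#   $2: number of jobs taking turns on those cores, including yours
estimate_wall_time() {
    echo $(( $1 * $2 ))
}

estimate_wall_time 300 3    # prints 900
```

In other words, a 5-minute test that shares its cores with two other jobs could take around 15 minutes of wall-clock time.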

This system is good for shorter computations, and especially good when you want to quickly test a program. You might simply want to see if your program runs at all, or you want to run it for a few minutes to see that it does what it’s supposed to do. On the “intel” system you don’t need to wait for your turn; your jobs will usually start immediately.

The frequent interruptions mean that the “intel” partition is not suitable for interactive use. Also, if a node runs out of memory it cannot accept more jobs, and further submitted jobs will have to wait until memory is freed. This is the reason you are not allowed to use more than 120G per job on the “intel” partition.
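A quick test job on “intel” might look like the sketch below. It assumes the standard Slurm "#SBATCH" directives; the time, core count and memory values are examples only, chosen to stay well under the 120G limit, and "describe_job" is a hypothetical helper.

```shell
#!/bin/bash
#SBATCH --partition=intel
#SBATCH --time=10               # 10 minutes; example value
#SBATCH --cpus-per-task=4       # example value
#SBATCH --mem=16G               # must not exceed 120G on this partition

# Print where the job runs; falls back to defaults outside of Slurm.
describe_job() {
    echo "job ${SLURM_JOB_ID:-not-in-slurm} using ${SLURM_CPUS_PER_TASK:-1} cores on $(uname -n)"
}

describe_job
# Put the command you want to test here.
```

Requesting only the memory you need also leaves room for other jobs to start on the same node.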
