Use the module system

At OIST we install scientific software on the clusters as modules. Modules let us have several versions of the same software available. You can use the latest release or you can stay with a trusted older version, and you can switch between versions with a single command.

You can also create your own modules for yourself or for your unit. The details of doing that are available here.

Load Software

You use the module system to list the available modules and to load the software that you need. Use the 'module' command to interact with the module system. All commands are of the form ‘module <command>’. The longer commands can be abbreviated, usually to 2-3 characters.

In addition, there is a convenient short-form command, ‘ml’, that you can use instead of ‘module’. We use both interchangeably below.

First, let us see what modules are available. We do that with ‘module avail’ (short for “available”) or ‘ml av’:

$ module avail
---------------------- /apps/.metamodules81 ----------------------
   amd-modules             sango-legacy-modules
   bioinfo-ugrp-modules    user-modules
   intel-modules

-------------------- /apps/.modulefiles81 --------------------
   AIMAll/19.10                hdf5.icc/1.10.6
   BUSCO/3.0.2                 hmmer/3.1b2
   BUSCO/4.0.6                 igv/2.3.82
   BUSCO/4.1.2          (D)    isoseq3/3.4.0
   Gaussian/09RE01R2           java-jdk/1.8.0_20
   Gaussian/09RE01             java-jdk/11
   Gaussian/16RC01      (D)    java-jdk/14
   HTSeq/0.9.1                 java-jdk/17
   MaterialsStudio/2016        java-jdk/21         (D)
   MrBayes.mpi/3.2.3           jellyfish/2.2.7
...

‘avail’ or ‘av’ gives us a list of software modules on the system. As you can see, each package is listed by name and version, and many packages have multiple versions. Each package version is a separate module.
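The list is long, so it helps to filter it. The sketch below assumes the module system is Lmod (which the ‘ml’ and ‘spider’ commands suggest), that it accepts the ‘-t’ (terse) option, and that it prints listings to standard error, which is why the output is redirected before the pipe. ‘find_modules’ is a hypothetical helper name, not part of the module system.

```shell
# List available modules whose name contains a pattern (case-insensitive).
# Assumption: 'module -t avail' prints one module per line on standard error.
find_modules() {
    module -t avail 2>&1 | grep -i -- "$1"
}

# Only run where the module system exists (for example on the cluster).
if command -v module >/dev/null 2>&1; then
    find_modules busco
fi
```

If nothing shows up, check whether the module lives in one of the metamodule areas described further down.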

We also have “metamodules”, listed at the top. Metamodules are collections of modules that belong together in some way. We will talk more about these further down.

You load a module with the ‘load’ command:

$ module load julia

The short form is simply:

$ ml julia

Let’s see what version of Julia we loaded:

$ julia --version

julia version 1.9.4

The load command normally loads the default version of the software, which is usually the latest (the version marked with “(D)” in the list). If you want a specific version, give the module name and the version separated by a slash:

$ module load julia/1.3.1
$ julia --version

julia version 1.3.1
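In a script you may want to confirm that the version you asked for is the one actually on your path. This is a minimal sketch, assuming ‘julia --version’ prints a line of the form shown above; ‘check_julia_version’ is a hypothetical helper name.

```shell
# Fail early if the julia on PATH is not the version the script expects.
check_julia_version() {
    expected="$1"
    # 'julia --version' prints e.g. "julia version 1.3.1"; take the third word.
    actual=$(julia --version | awk '{print $3}')
    if [ "$actual" != "$expected" ]; then
        echo "expected julia $expected, found ${actual:-none}" >&2
        return 1
    fi
}

# Usage, after 'module load julia/1.3.1':
#   check_julia_version 1.3.1 || exit 1
```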

You can see what modules you have loaded with ‘list’ or ‘li’:

$ module li
# or short form:
$ ml

Currently Loaded Modules:
  1) julia/1.3.1

You can remove a module again with the ‘unload’ command:

$ module unload julia
$ module list
No modules loaded

With the short form we can list modules with just ‘ml’, and we can unload modules by adding a minus sign “-” in front:

# load julia, ruse and bamtools at once
$ ml julia ruse bamtools
$ ml
Currently Loaded Modules:
  1) julia/1.9.4   2) ruse/1.0   3) bamtools/2.5.1

# unload ruse and bamtools, and switch to Julia 1.3.1
$ ml -ruse -bamtools julia/1.3.1
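For comparison, here is the same operation written with the long-form commands. It is a sketch that wraps the two steps in a hypothetical function, ‘switch_to_julia_131’, so they can be reused in a script.

```shell
# Long-form equivalent of: ml -ruse -bamtools julia/1.3.1
switch_to_julia_131() {
    # 'unload' accepts several module names at once.
    module unload ruse bamtools
    # Request the specific Julia version by name/version.
    module load julia/1.3.1
}

# Only run where the module system exists (for example on the cluster).
if command -v module >/dev/null 2>&1; then
    switch_to_julia_131
fi
```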

Some modules depend on other modules, and will load them automatically. Take, for instance, the BUSCO application:

$ ml BUSCO
$ ml

Currently Loaded Modules:
  1) openmpi.gcc/4.0.3   4) hmmer/3.1b2      7) Prodigal/2.6.2
  2) python/3.7.3        5) bamtools/2.4.1   8) BUSCO/4.0.6
  3) ncbi-blast/2.7.1+   6) augustus/3.3

BUSCO depends on a number of other modules, some of which depend on others in turn. The module system loads all the modules we need for us. The module system keeps track of how a module was loaded, so when you unload BUSCO these dependencies will also be unloaded.

Often you want to clear out all modules. Instead of unloading modules one by one you can use the ‘purge’ command to unload all loaded modules at once:

$ module purge
$ module list
No modules loaded

This is also useful in scripts when you want to make sure that you’re starting from a clean slate. Begin the script with a ‘module purge’ and you won’t have any other modules interfering by accident.
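A script preamble along these lines is one way to do that. It is a sketch, not a required template: ‘load_clean’ is a hypothetical helper name, and the module versions are simply the ones used in the examples above.

```shell
#!/bin/bash
# Reset the module environment, then load exactly the pinned versions we need.
load_clean() {
    # Stop if the purge fails, so we never load on top of a stale environment.
    module purge || return 1
    module load "$@"
}

# Only run where the module system exists (for example on the cluster).
if command -v module >/dev/null 2>&1; then
    load_clean julia/1.3.1 bamtools/2.5.1
    # Record what was actually loaded, which helps when reading job logs later.
    module list
fi
```

Giving explicit versions instead of relying on the default keeps the script reproducible when a newer version becomes the default.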

The Metamodules

We have organised the modules on Deigo into separate areas: a common area where most software is installed, and several specialised areas that you enable by loading a metamodule.

Module area	Purpose
common	The default area. Most software is installed here.
intel-modules	Software that runs best or only on the Intel nodes.
amd-modules	Software that runs best or only on the AMD nodes.
bioinfo-ugrp-modules	Modules maintained by the Bioinformatics User Group.

To use, say, software from the Bioinformatics user group, you load the “bioinfo-ugrp-modules” metamodule:

$ module load bioinfo-ugrp-modules

If you then look at available modules:

$ module av
------------------------------- /apps/.bioinfo-ugrp-modulefiles81 --------------------------------
   DB/Dfam/3.6                               Other/canu/2.1.1
   DB/Dfam/3.8                        (D)    Other/compleasm/0.2.2
   DB/Dfam_RepeatMasker/3.6__4.1.3           Other/deepvariant/1.1.0
   DB/Pfam/34.0                              Other/deepvariant/1.6.0            (D)
   DB/Pfam/35.0                       (D)    Other/dovetail_tools/20210914
   DB/blastDB/ncbi/238                       Other/edirect/18.2
   DB/blastDB/ncbi/2021-11-28                Other/fasttree/2.1.11
   DB/blastDB/ncbi/2022-07-nr                Other/genescope/2021.03.26
...

-------------------------------------- /apps/.metamodules81 --------------------------------------
   amd-modules                 intel-modules           user-modules
   bioinfo-ugrp-modules (L)    sango-legacy-modules

-------------------------------------- /apps/.modulefiles81 --------------------------------------
   AIMAll/19.10                comsol/43a               matlab/MCR
   BUSCO/3.0.2                 comsol/43b               matlab/R2009b
   BUSCO/4.0.6                 comsol/44                matlab/R2011b
...

The Bioinformatics user-group modules are now listed first, followed by the metamodules for the different areas, and then the common modules. You could now load, say, “canu” or “deepvariant”.
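In a script, the metamodule has to be loaded before the modules it provides. The sketch below does the two steps in order; ‘load_from_area’ is a hypothetical helper name, and the module name is written out in full exactly as it appears in the listing above.

```shell
# Load a metamodule first, then one or more modules from that area.
load_from_area() {
    area="$1"
    shift
    # Only try the second load if the metamodule itself loaded cleanly.
    module load "$area" && module load "$@"
}

# Only run where the module system exists (for example on the cluster).
if command -v module >/dev/null 2>&1; then
    load_from_area bioinfo-ugrp-modules Other/canu/2.1.1
fi
```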

The Intel and AMD modules

The modules in “amd-modules” work on all systems. Modules in “intel-modules” run faster on the Intel nodes but will crash on AMD nodes. The Intel compiler in “intel-modules” is an exception: it works everywhere (and should perhaps have been installed in the common module area).

For more information, see the Deigo page.
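If a script may run on either kind of node, it can pick the metamodule at run time. This is a sketch under stated assumptions: it reads the ‘vendor_id’ field from /proc/cpuinfo, which exists on Linux x86 systems, and ‘cpu_vendor’ is a hypothetical helper name. It falls back to “amd-modules” because, as described above, those modules work on all systems.

```shell
# Print "intel", "amd" or "unknown" based on a cpuinfo-style file.
cpu_vendor() {
    file="${1:-/proc/cpuinfo}"
    # Lines look like "vendor_id<TAB>: GenuineIntel"; take the first match.
    id=$(awk -F': *' '/^vendor_id/ {print $2; exit}' "$file" 2>/dev/null)
    case "$id" in
        GenuineIntel) echo intel ;;
        AuthenticAMD) echo amd ;;
        *)            echo unknown ;;
    esac
}

# Only run where the module system exists (for example on the cluster).
if command -v module >/dev/null 2>&1; then
    case "$(cpu_vendor)" in
        intel) module load intel-modules ;;
        *)     module load amd-modules ;;
    esac
fi
```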

Finding Information

How do you find the module you want? You may know what you need, but not the name of the module. Or maybe you want to know more about how a module is installed.

The ‘spider’ subcommand will let you search for modules by name:

# 'spider' searches for any module matching the text:
$ ml spider trimmo
--------------------------------------------------------------------------
  Trimmomatic: Trimmomatic/0.33
--------------------------------------------------------------------------
    Description:
      A flexible trimmer for Illumina sequence data.
...

# 'spider' by itself lists all modules with a short description:
$ module spider
--------------------------------------------------------------------------
The following is a list of the modules and extensions currently available:
--------------------------------------------------------------------------
  BUSCO: BUSCO/3.0.2, BUSCO/4.0.6
    Assess genome assembly and annotation completeness with benchmarking
    universal single-copy orthologs.

  Gaussian: Gaussian/09RE01

  HTSeq: HTSeq/0.9.1
    High-throughput sequencing data analysis with Python.
...

The ‘whatis’ command will show you a brief description of a module:

$ ml whatis augustus
augustus/3.3.3      : Name: augustus
augustus/3.3.3      : Version: 3.3.3
augustus/3.3.3      : URL: http://bioinf.uni-greifswald.de/augustus/
augustus/3.3.3      : Category: bioinformatics
augustus/3.3.3      : Keywords: sequencing, analysis
augustus/3.3.3      : Description: Predict genes in eukaryotic genomic sequences.

‘module help’ will give you in-depth information on a single module:

$ ml help qiime2

---------------------- Module Specific Help for "qiime2/2019.1" ----------------------
Powerful, extensible, and decentralized microbiome analysis package with a
focus on data and analysis transparency. A complete redesign and rewrite of
QIIME 1.

QIIME 2 comes distributed as a container. We add a small script that lets you
run it as 'qiime' without having to deal with the container directly.

Some modules have only a brief description. Some have more information, including helpful tips for running them on the cluster. If you feel a module description could be improved, please let us know!

The ‘key’ subcommand searches the tags and the description of every module. This is useful when you don’t know the name of what you are looking for:

$ ml key numeric
----------------------------------------------------------------------------------

The following modules match your search criteria: "numeric"
----------------------------------------------------------------------------------

  OpenBLAS.gcc: OpenBLAS.gcc/0.3.9
    An optimized BLAS and lapack library.

  R: R/3.4.2
    A popular software environment for statistical computing and graphics.

  aocl.aocc: aocl.aocc/2.1
    A set of numerical libraries tuned specifically for the AMD EPYC processor
    family. This is the AOCC version.
...

Unit-specific modules

Your unit may have software installed in your own unit-specific area. If you want to use that software, you need to tell the module system where to find those module files. If they have been installed according to our instructions, you can add your software modules with the ‘module use’ command:

$ module use /apps/unit/[unit name]U/.modulefiles/

The module commands will now look in that directory as well for module files, and you will be able to use any software you have installed there. If you use this often, it might be a good idea to add this command to your .bashrc file so it gets run each time you log in.
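A guarded version for your .bashrc could look like the sketch below. The directory shown is a hypothetical placeholder (“ExampleU” is not a real unit), so replace it with your unit's actual path; ‘add_unit_modules’ is likewise a hypothetical helper name.

```shell
# Hypothetical unit directory; replace with your unit's actual path.
UNIT_MODULES="/apps/unit/ExampleU/.modulefiles"

# Register a module directory, but only if it exists and 'module' is available.
# The checks keep logins quiet on machines without the module system.
add_unit_modules() {
    if [ -d "$1" ] && command -v module >/dev/null 2>&1; then
        module use "$1"
    fi
}

add_unit_modules "$UNIT_MODULES"
```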

If you want to remove the unit-specific modules again, the ‘module unuse’ command will do that for you:

$ module unuse /apps/unit/[unit name]U/.modulefiles/

A Summary

Here’s a summary of our commands, with the ‘ml’ command, the equivalent ‘module’ command, and the effect:

command	full module command	effect
ml	module list	lists modules you have loaded
ml <module>	module load <module>	loads <module>
ml -<module>	module unload <module>	unloads module (note the minus sign for ml)
ml av	module av	lists available modules (also ‘avail’ and ‘available’)
ml purge	module purge	removes all loaded modules
ml spider <text>	module spider <text>	searches for “text” in the names of modules
ml whatis <module>	module whatis <module>	brief information about the module
ml help <module>	module help <module>	more in-depth information about the module
ml key <text>	module key <text>	search tags and descriptions for “text”
ml use <path>	module use <path>	use modules stored under “path”
ml unuse <path>	module unuse <path>	no longer use modules stored under “path”

The ‘ml’ command can take all the same subcommands as ‘module’. The one thing it can’t do is load a module that happens to have the same name as a module subcommand. If you have a module named “purge”, for instance, ‘ml purge’ would purge all loaded modules, not load the “purge” module. In such a case, use ‘module load purge’ instead.