Update to ICT's Linux Computation Servers – Service Break 12.10.2023

Updates to the Linux computation servers and the related infrastructure maintained by ICT Services will be made on Thursday, October 12, 2023 at 4 pm.

The updates affect students using the Linux computing servers, and the service break will last until approximately 10 pm. The updates affect all Kataja, ITEE, bcocore and Lehmus servers.

The update also involves changes to the ThinLinc software used on the ITEE servers and will enable more extensive use of the Lehmus computing environment.

Servers currently running Rocky Linux 8 will be upgraded to RHEL 8. The update enables wider application support, for example for Cadence.

Update of the ThinLinc Remote Desktop

The ThinLinc environment will be updated to the latest version, 14.5, which requires users to download the new client from the Software Center. After the update, the login address will change from the current itee01.oulu.fi to thinlinc.oulu.fi. The thinlinc.oulu.fi address is used to log in to the ITEE01 - ITEE04 servers, access to which is restricted as follows:

The ITEE01 - ITEE04 servers are primarily intended for interactive work via the ThinLinc remote desktop environment. Resource usage on these servers is limited: a single user may use at most 25% of the total server resources, and the administrator has the right to terminate the processes of users who interfere with the ThinLinc environment. An example of a disruptive activity is running a heavy Matlab simulation on these servers. In such situations, the user is advised to run the job on servers ITEE05 - ITEE12 or in the Lehmus computing environment, which guarantees the necessary resources for the user.

Use of Applications on Computation Servers

Several applications are installed on the computing servers, and new application installations can be requested via ict@oulu.fi. New applications are installed in the Lmod environment (https://lmod.readthedocs.io/en/latest/010_user.html), which allows easy management of environment variables as well as easy switching between different application versions.

The most important Lmod commands are:

$ module avail
Lists all available applications and their versions

$ module load application/version
Loads the application's execution environment, e.g. for Matlab: module load matlab/R2023b

$ module list
Lists the currently loaded modules

$ module purge
Unloads all loaded modules

$ module show application/version
Shows what changes the module file of the application in question makes

$ module help application/version
Prints a help text written by the administrator. Currently the scope of the help texts varies greatly, but in the future the intention is to use the help functionality for all application installations.
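A typical Lmod session might look like the following (the Matlab module name is illustrative; check module avail for the versions actually installed on the server):

$ module load matlab/R2023b
$ module list

Currently Loaded Modules:
  1) matlab/R2023b

$ module purge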

Use of the Lehmus Computation Server

The Lehmus computation server environment is the newest resource available to staff and students at the University of Oulu. The environment currently consists of six computation servers, each equipped with NVIDIA graphics cards. The environment can be accessed interactively via the https://lehmus.oulu.fi portal or via an SSH connection to the lehmus-login1.oulu.fi server. Running jobs on Lehmus differs from the previous Kataja and ITEE servers in that jobs are run through the SLURM batch environment.

Instructions on how to use SLURM can be found in the SLURM documentation (https://slurm.schedmd.com/quickstart.html).

Technical specifications of the computing environment:

SLURM partitions:

$ sinfo
PARTITION    AVAIL  TIMELIMIT   NODES  STATE  NODELIST
interactive  up     8:00:00     6      idle   lehmus-cn[1-6]
normal*      up     14-00:00:0  6      idle   lehmus-cn[1-6]
debug        up     infinite    2      idle   lehmus-testcn[1-2]

Interactive jobs launched from the portal (https://lehmus.oulu.fi) run in the interactive partition. Other jobs should be submitted to the normal partition, which is also the default partition for users.
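In addition to the portal, an interactive shell session can also be requested on the command line of lehmus-login1.oulu.fi with the standard SLURM srun command (the resource values below are examples only):

$ srun --partition=interactive --cpus-per-task=2 --mem=4G --time=1:00:00 --pty bash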

Starting an sbatch batch job:

Below is an example of how to start a simple batch job. The main variables in the file are --cpus-per-task, which allocates the number of CPU cores; --mem, which specifies the amount of main memory required; and --time, the maximum time the job may take:

$ cat job.sh
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00
#SBATCH --job-name="Example slurm job"
#SBATCH --partition=normal

# Load Anaconda python environment and activate personal ml environment
module load anaconda/2023.03
conda activate ml
# Start the python job
python3 my_simulation.py

The job can be submitted as follows:

$ sbatch job.sh

After the job starts, SLURM creates a slurm-<jobid>.out file in the directory from which the job was submitted. The same output is written to this file as would be written to the terminal window in interactive mode.
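The state of a submitted job can be followed with standard SLURM commands, for example:

$ squeue -u $USER            # list your pending and running jobs
$ scontrol show job <jobid>  # detailed information about a single job
$ scancel <jobid>            # cancel a job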

If you have problems running a job, please contact ict@oulu.fi by email with the following information: the job id, the job.sh file, the slurm-<jobid>.out file, and a description of what you were trying to do.

Using GPU cards

sbatch also supports requesting GPUs for a job, which can be done, for example, with the job file below.

$ cat job2.sh
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --job-name="Example slurm job"
#SBATCH --partition=normal

# Load Anaconda python environment and activate personal ml environment
module load anaconda/2023.03
conda activate ml
# Start the python job
python3 my_simulation.py

The following GPU options can be selected:

--gres=gpu:1       Selects any available GPU
--gres=gpu:v100:1  Selects an NVIDIA V100 16G GPU
--gres=gpu:a30:1   Selects an NVIDIA A30 24G GPU
--gres=gpu:t4:1    Selects an NVIDIA T4 GPU
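Whether the requested GPU is actually visible inside a job can be checked, for example, by running nvidia-smi through srun (this assumes the NVIDIA driver tools are installed on the compute nodes):

$ srun --partition=normal --gres=gpu:1 nvidia-smi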

Last updated: 10.10.2023