Bioinformatics Unit | HPC Documentation


HIBU Cluster – Quick Start Guide


1. Cluster Overview

HIBU is a high-performance computing (HPC) system consisting of five compute nodes organized into multiple partitions. Job scheduling is managed by the Slurm workload manager. A dedicated head (login) node allows users to prepare scripts, submit and cancel jobs, and transfer data. HIBU uses a NAS with a total capacity of up to 42 TB.

  • Cluster name: HIBU
  • Login hostname / IP: 139.91.75.141 (VPN required from outside the IMBB/FORTH network)
  • Default partition: long
  • Available partitions: long, short, smallhm
  • Filesystems:
    • HOME: /home/{{USER}}/ – small, for configs & scripts, not backed up
    • SCRATCH: /data/{{USER}}/ – fast workspace, not backed up (total shared capacity ~42 TB)
  • Containers: Singularity / Apptainer (coming soon)
  • Data transfer options: scp, rsync, FTP client, wget, curl
  • Support contact: hpc-support@imbb.forth.gr
  • Documentation page: TBD

1.1 Hardware summary

  • 3 nodes with:
    • 24 CPU cores per node
    • 2 threads per core
    • ~250 GB RAM per node
    • Hostnames: node1, node2, node3
  • 1 node with:
    • 20 CPU cores
    • 2 threads per core
    • ~126 GB RAM
    • Hostname: node4
  • 1 node with:
    • 16 CPU cores
    • 2 threads per core
    • ~256 GB RAM
    • Hostname: node5
Important
When you log in, you are on the head (login) node. This node is not for running analyses. Use it only to prepare jobs, data, and scripts. Running intensive workloads on the login node can disrupt or crash the gateway that all users rely on to access the cluster. Always submit analyses through Slurm to the compute nodes.

2. Access & First Login

2.1 If you do not have an account

  1. Request an account
    Send an email to the HPC support team including:
    • Your name and affiliation
    • Intended workloads (types of analyses)
    • Software and resource needs (CPU, RAM, storage)
  2. Set up VPN
    Configure and connect to the IMBB VPN to be able to access HIBU through SSH from outside the IMBB/FORTH network.
  3. Set up two-factor authentication
    Install Google Authenticator (or compatible TOTP app) on your smartphone and scan the QR code provided by the administrator upon account creation.

2.2 If you already have an account

2.2.1 Windows users

  1. Connect to the IMBB VPN (when you are outside the IMBB/FORTH network).
  2. Download and install PuTTY on your personal computer.
  3. Open PuTTY and fill in:
    • Host Name (or IP address): 139.91.75.141
    • Port: 22 (default SSH port, unless instructed otherwise)
    • Connection type: SSH
  4. Click Open. When prompted:
    • Enter your cluster username
    • Enter your password
    • Enter the current verification code from Google Authenticator on your smartphone

2.2.2 Linux / macOS users

  1. Connect to the IMBB VPN if you are outside the IMBB/FORTH network.
  2. Open a terminal (console application).
  3. Run:
    ssh {{USER}}@139.91.75.141  
  4. When prompted:
    • Enter your password
    • Enter the verification code from Google Authenticator on your smartphone
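Optionally, you can add a host entry to ~/.ssh/config on your local machine so that a short alias is enough to connect (the alias name hibu below is an arbitrary example):

```shell
# ~/.ssh/config on your local machine
Host hibu
    HostName 139.91.75.141
    Port 22
    User {{USER}}
```

After this, `ssh hibu` replaces the full command; you will still be prompted for your password and the verification code.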

3. Filesystems & Data Storage

  • HOME: /home/{{USER}}
    • For configuration files, scripts, and small data
    • Not backed up
  • SCRATCH: /data/{{USER}}
    • For input data, temporary files, and analysis outputs
    • High-performance storage shared on the cluster
    • Not backed up
Important
Do not run analyses that read input data from, or write output to, your $HOME directory. Use /data/{{USER}} instead.
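For example, a job's working directories can be set up on SCRATCH before submitting anything (the project name myproject is arbitrary):

```shell
# Set up input/output directories on SCRATCH, not in $HOME
mkdir -p /data/{{USER}}/myproject/{input,results,logs}
cd /data/{{USER}}/myproject
```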

3.1 Data transfers

To transfer data to or from HIBU, you can use:

  • scp or rsync from your local machine:
rsync -avP local/ {{USER}}@{{DATA_HOST}}:/data/{{USER}}/
# or
scp -r local/ {{USER}}@{{DATA_HOST}}:/data/{{USER}}/
  • wget or curl from inside the cluster to download directly to /data/{{USER}}:
cd /data/{{USER}}
wget <URL>
# or
curl -O <URL>
  • An FTP client, if configured by your local IT.

4. Slurm Job Scheduler

4.1 Introduction

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters. As a cluster workload manager, Slurm has three key functions:

  1. Allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
  2. Provides a framework for starting, executing, and monitoring work (usually parallel jobs) on the set of allocated nodes.
  3. Arbitrates contention for resources by managing a queue of pending work.

More documentation is available at the official Slurm website: https://slurm.schedmd.com/documentation.html

4.2 Core Slurm commands

Common user commands include:

  • sinfo – report status of partitions and nodes
  • squeue – list jobs in the queue (running and pending)
  • sbatch <script> – submit a batch script as a job
  • scancel <jobid> – cancel one of your jobs
  • sacct – get accounting information about past jobs

Examples:

# Cluster/partition status
sinfo

# Your jobs in the queue
squeue -u $USER

# Submit a job script
sbatch my_script.slurm

# Cancel a job
scancel <jobid>

4.3 Common resource arguments

  • Partition: --partition=long (or short, smallhm)
  • Node / task / CPU allocation:
    • --nodes=<N>
    • --ntasks=<T>
    • --cpus-per-task=<C>
  • Memory allocation:
    • --mem=8G (per node)
    • --mem-per-cpu=4G (per CPU)
  • Job arrays (many similar tasks):
    • --array=1-100%10 (100 tasks, at most 10 running at once)
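Combined, a typical request for a slice of one node might look like this in a job script header (the values are only illustrative; --time sets the wall-clock limit):

```shell
#SBATCH --partition=long        # or short / smallhm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4       # 4 CPU cores for one multithreaded task
#SBATCH --mem=8G                # 8 GB RAM for the whole job
#SBATCH --time=02:00:00         # 2-hour wall-clock limit
```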

5. Basic Job Types

5.1 Single-node jobs (serial or multicore)

You can request only a “slice” of a node, leaving the remaining resources for other users. Specify both the number of cores and the memory you need:

#!/bin/bash
#SBATCH --job-name="hello"
#SBATCH --partition=long
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1      # request 1 CPU core
#SBATCH --mem=2G               # request 2 GB of RAM
#SBATCH -o slurm-%j.out

python myscript.py \
    --input /data/{{USER}}/data/input.dat \
    --out   /data/{{USER}}/results/out.txt

If you really need the entire node, you can omit fine-grained resource flags (CPUs/memory) and request the whole node with options like --exclusive. However, avoid this unless absolutely necessary.

5.2 Job arrays (many similar tasks)

If your analysis can be split into independent chunks (e.g. different input files or parameter sets), you can use a job array:

#!/bin/bash
#SBATCH --job-name="jobarray"
#SBATCH --partition=long
#SBATCH --nodes=1              # or more
#SBATCH --cpus-per-task=1      # 1 core per array task
#SBATCH --mem=2G               # 2 GB RAM per array task
#SBATCH --array=1-10           # 10 tasks
#SBATCH -o logs/array_%A_%a.out

PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

python simulate.py $PARAMS \
    --seed ${SLURM_ARRAY_TASK_ID} \
    --out results/${SLURM_ARRAY_TASK_ID}.json

Here, each line of params.txt corresponds to one array task.
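To see how the indexing works outside Slurm, you can emulate one array task locally (the parameter values here are made up):

```shell
# Hypothetical params.txt: one parameter set per line, one line per array task
printf '%s\n' "--alpha 0.1" "--alpha 0.5" "--alpha 0.9" > params.txt

# Emulate what array task 2 would read (Slurm sets this variable in real jobs)
SLURM_ARRAY_TASK_ID=2
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
echo "$PARAMS"   # second line of params.txt
```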

5.3 Job dependencies (run job B after job A finishes successfully)

jid1=$(sbatch jobA.slurm | awk '{print $4}')
sbatch --dependency=afterok:${jid1} jobB.slurm
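The afterok condition means job B starts only if job A completes successfully. Slurm also supports other standard dependency types, such as afterany (start after A ends, regardless of outcome) and afternotok (start only if A fails); the script names below are hypothetical:

```shell
# Run cleanup regardless of how job A ends:
sbatch --dependency=afterany:${jid1} cleanup.slurm

# Run a diagnostic job only if job A fails:
sbatch --dependency=afternotok:${jid1} on_failure.slurm
```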

6. Requesting the Right Resources

  • Time limit: be realistic. Shorter jobs usually start sooner. Use past sacct records to calibrate:
    sacct -j <jobid> --format=JobID,JobName%20,Partition,State,Elapsed,Timelimit  
  • CPUs: for pure Python/R scripts that do not use multithreading, 1–2 CPUs are usually enough. Use:
    • OpenMP: --cpus-per-task
    • MPI: --ntasks (and --ntasks-per-node)
  • Memory: start modestly, then inspect memory usage with MaxRSS from sacct and adjust.
  • Nodes vs tasks:
    • Single-node shared memory job: -N 1 --cpus-per-task=X
    • Multi-node MPI job: -N N --ntasks-per-node=T
Tip
Avoid --exclusive unless you truly need the whole node; it reduces overall throughput and can lower your priority in the queue.
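For OpenMP-style multithreaded programs, it helps to tie the thread count to the allocation so a job never uses more cores than it requested. SLURM_CPUS_PER_TASK is set by Slurm inside the job; the fallback of 1 covers runs outside Slurm:

```shell
# Match OpenMP threads to the allocated CPUs (defaults to 1 outside a Slurm job)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
```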

7. Monitoring & Logs

  • Queue & nodes:
sinfo
squeue -u $USER
  • Inspect a job:
scontrol show job <jobid>  
  • History & usage:
sacct -j <jobid> \
    --format=JobID,JobName%20,Partition,State,Elapsed,Timelimit,AllocTRES%30,ReqMem,MaxRSS,ExitCode

Logs: By default, stdout and stderr go to slurm-%j.out. You can customize this with:

# In your job script
#SBATCH -o myjob_%j.out   # stdout
#SBATCH -e myjob_%j.err   # stderr

8. Best Practices & Etiquette

  • Develop small, run big: test with small inputs first; scale up only after validation.
  • Keep environments portable: script your environment setup (module load ..., conda activate ...) inside the job file.
  • Avoid HOME I/O: use /data/{{USER}} for read/write workloads.
  • Clean up SCRATCH: copy important results back to your own storage and remove unneeded files.
  • One process per CPU: do not oversubscribe CPUs unless you know what you are doing.
  • Version control: keep your job scripts in a Git repository; log software versions (e.g. --version output) to your job logs.
  • Cite the cluster: use {{CITATION_TEXT}} in publications that used HIBU resources.
  • Write heavy outputs to SCRATCH: avoid writing large files to $HOME.
  • Use checkpointing for long runs: so that jobs can resume after failures or timeouts.
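One common checkpointing pattern, assuming your program can save and resume from a checkpoint file, is to ask Slurm to send a warning signal shortly before the time limit and then requeue the job. The trap and the checkpoint-aware myscript.py below are illustrative, not part of HIBU's configuration:

```shell
#!/bin/bash
#SBATCH --job-name="resumable"
#SBATCH --partition=long
#SBATCH --time=24:00:00
#SBATCH --signal=B:USR1@300   # deliver SIGUSR1 to this script 300 s before timeout
#SBATCH --requeue             # allow Slurm to requeue the job
#SBATCH -o slurm-%j.out

# When the warning signal arrives, ask Slurm to requeue this job;
# on restart it resumes from the checkpoint file.
trap 'scontrol requeue $SLURM_JOB_ID' USR1

# myscript.py and its --checkpoint flag are hypothetical: the program is assumed
# to write a checkpoint periodically and resume from it when one exists.
python myscript.py --checkpoint /data/{{USER}}/checkpoint.pkl &
wait
```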

9. Troubleshooting

  • Job stuck in PD (pending):
    • Check the reason with squeue
    • Reduce requested resources (time, memory, nodes)
    • Try another partition if appropriate
    • Verify your account / QoS settings
  • Job exceeded memory:
    • Increase --mem or --mem-per-cpu
    • Inspect MaxRSS in sacct output
    • Investigate memory leaks or unnecessary data copies in your code

If you face issues running jobs, you can email hpc-support@imbb.forth.gr including:

  • JobID(s)
  • Job script(s)
  • Relevant slurm-<jobid>.out files
  • Paths to your input data and scripts
  • A minimal reproducer, if possible

10. Policy Highlights

Policy
  • Login/head node usage: only for short, lightweight tasks (editing files, submitting jobs, data transfers). No heavy CPU/GPU jobs on the login node.
  • Data retention: SCRATCH (/data) may be purged after {{SCRATCH_PURGE_DAYS}} days.
  • Backup: no systematic backup is performed due to limited resources. Users are fully responsible for keeping copies of their files.
  • Sensitive data: for HIPAA/GDPR/IRB or other regulated data, follow all applicable IMBB/FORTH regulations and policies before storing or processing such data on HIBU.
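To check which of your SCRATCH files would fall outside a purge window, you can use find with -mtime. The sketch below runs on a throwaway directory with a 30-day threshold; on HIBU you would point it at /data/$USER and substitute the actual purge window:

```shell
# Demonstrate on a temporary directory; on the cluster, point find at /data/$USER
DIR=$(mktemp -d)
touch -d '40 days ago' "$DIR/old.dat"   # simulate a stale file
touch "$DIR/new.dat"                    # and a recent one

find "$DIR" -type f -mtime +30 -print   # lists only old.dat
```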

11. Quick Reference (Cheat Sheet)

Task                    Command
Submit job              sbatch job.slurm
Live queue              squeue -u $USER
Job details             scontrol show job <jobid>
Cancel job              scancel <jobid>
Node/partition info     sinfo -o '%P %a %l %D %C %G %F'
Interactive allocation  salloc -p {{DEFAULT_PARTITION}} -t 02:00:00 --cpus-per-task=4 --mem=8G
Array job               #SBATCH --array=1-100%10
Account / QoS           #SBATCH -A {{SLURM_ACCOUNT}} / #SBATCH --qos={{DEFAULT_QOS}}
Efficiency              seff <jobid> (if available)

12. Templates (Copy & Adapt)

12.1 Minimal job script

#!/bin/bash
#SBATCH --job-name="hello"
#SBATCH --partition=long
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1        # 1 CPU core
#SBATCH --mem=2G                 # 2 GB RAM
#SBATCH -o slurm-%j.out

python myscript.py \
    --input /data/{{USER}}/data/input.dat \
    --out   /data/{{USER}}/results/out.txt

12.2 Conda environment in a job

If you have installed software in a Conda environment under your account:

#!/bin/bash
#SBATCH --job-name="py"
#SBATCH -A {{SLURM_ACCOUNT}}
#SBATCH --partition={{DEFAULT_PARTITION}}
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH -o slurm-%j.out

# Load Conda if needed, e.g.:
# source ~/miniconda3/etc/profile.d/conda.sh

conda activate ENVIRONMENT_NAME   # always reactivate the env inside the job

mafft --help                      # example tool installed in the environment

12.3 Apptainer / Singularity (coming soon)

#!/bin/bash
#SBATCH --job-name="containers"
#SBATCH --partition=short
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=32G
#SBATCH -o slurm-%j.out

# Run MAFFT directly from a Docker image via Apptainer:
apptainer exec docker://staphb/mafft:latest \
    mafft --help

# OR pull and then run with Singularity:
singularity pull docker://staphb/mafft:latest
singularity exec mafft_latest.sif mafft --help
