VEGA


VEGA CLUSTER DETAILS

Name of the HPC cluster

Vega

Fully Qualified Domain Name

hpc.iitgn.ac.in

IP Address of HPC cluster

10.0.137.10

Make

Fujitsu

Usable Storage

~25TB

Total CPU

192 cores Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

448 cores Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

80 cores Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

Total GPU

2 x NVIDIA Tesla K20Xm CUDA GPUs in two GPU nodes

1 x NVIDIA Tesla P100-PCIE-16GB in two new GPU nodes

Total Compute nodes

24

Total GPU nodes

4

Job Scheduler

SLURM 14.03

User level Quota

100 GB per user in the home directory

To check your Quota

quota -v

Usage Guidelines
  • Users should run and write their jobs from /scratch/username/foldername only; users should NOT run and write their jobs from /home/username (a short example of setting up a scratch working directory follows this list).
  • Users must understand that HPC is a central facility that is shared by all members of the institute. Users should therefore use an optimum number of processing cores by testing for scaleup.
  • A sample script is provided in the home directory of each user.
  • The quota for each user is 100 GB in the home directory. There is no per-user limit for the scratch folder.
  • An automatic email will be sent to the HPC user community once 75 % of the scratch is used. Another automatic email will be sent to the users once 85% of the scratch space is used. The deletion of the files by the administrator will commence within 24 hours of the second email.
  • It is strongly recommended that users back up their files and folders periodically, as ISTF does not have a mechanism to back up users’ data.
  • Files in the scratch directory will be deleted automatically 21 days after their last time stamp/update.
  • Users are strictly NOT ALLOWED to run any jobs on the Master Nodes of the HPC cluster.
  • A priority based queueing system is implemented so that all users get a fair share of available resources. The priority will be decided on multiple factors including job size, queue priority, past and present usage, time spent on queue etc.
  • Please note that there is an incentive to optimize your usage: well-optimized jobs receive higher priority.
  • It is strongly recommended that users request the scheduler to pick cores from the same node whenever possible. If cores are not available on the same node, users can request cores from other nodes.
  • There is no limit on the number of jobs per user. The maximum number of cores per user is set as 128. The maximum number of cores per job is currently set as 64.
  • For any issue or request pertaining to Vega, please send your email, with your working path, error logs, error screenshots and submit script, only to helpdesk.istf@iitgn.ac.in
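
For example, a typical way to set up a scratch workspace and to spot files approaching the 21-day deletion window is sketched below (the directory name myproject is only a placeholder):

# create a working directory under scratch (not under /home) and work from there
mkdir -p /scratch/$USER/myproject
cd /scratch/$USER/myproject

# list your scratch files that have not been modified for more than 21 days
# (these are candidates for automatic deletion as per the guidelines above)
find /scratch/$USER -type f -mtime +21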
Software

Below is the list of software installed in the cluster.

  • All software is installed under the /opt file system of the cluster.
  • The cluster runs CentOS 6.7 (64-bit) as its operating system.
  • Applications – the latest stable versions of the software listed below are installed on Vega.

  • Name of Executables – Gromacs (CPU or GPU, or patched with PLUMED): gmx_mpi; Lammps: lmp_mps; Gaussian: g09; NAMD (CPU or GPU, or patched with PLUMED): namd2; QE (CPU or GPU): pw.x; Uintah (CPU or GPU): sus; Charmm: charmm
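
As an illustration, the run line for one of these executables inside a submit script could look like the following sketch (the Gromacs input name md, the core count and the machinefile are placeholders; see the sample scripts further below for how the machinefile is generated):

# hypothetical Gromacs run line; -np should match the cores requested from SLURM
mpirun -np 16 -machinefile $MACHINEFILE gmx_mpi mdrun -deffnm md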

Compilers
Software Version Binary Path
Cuda 6.5 /usr/local/cuda/bin/nvcc (CUDA drivers under /root/cuda-drivers on the GPU nodes)


Parallel Scientific Libraries
Software Version Binary Path
Magma 1.6.0
Paralution 1.0 /opt/paralution/1.0/paralution-1.0.0/build/bin


High-Fidelity Multi-Physics Computational Engineering Software
Commercial
Software Version Binary Path
Gaussian 09 Revision E.01 /opt/Gaussian09/G09/tar/g09/g09
ANSYS 14.5 /opt/ANSYS/ansys_inc/v145
STAR-CCM+ 9.06 /opt/CD-adapco/STAR-CCM+9.06.009/star/bin/starccm+
STAR-CCM+ 10.04 /opt/CD-adapco/STAR-CCM+10.04.009/star/bin/starccm+
Open Source
Software Version Binary Path
Charmm 39 /opt/charmm/em64t_M; Download sample_charmm_script.txt (will be provided soon)
Quantum Espresso 5.4.0 /opt/QuantumEspresso/pkgs/espresso-5.4.0/bin/pw.x
NAMD 2.10 /opt/NAMD/NAMD_2.10b2_Linux-x86_64/namd2
OpenFOAM 2.2.2 /opt/openfoam1/OpenFOAM-2.2.2/bin
Palabos 1.4r1
DEAL.II 8.3.0 /opt/Deal2-8.3_14Aug2015/dealii-8.3.0/bin/mesh_converter
Sailfish
SU2 3.2.2
MACS /opt/umas_sw/macs/MACS-1.3.7.1/bin/macs
Samtools 1.3 /opt/umas_sw/samtools/samtools-1.3/samtools
bedtools 2.25.0 /opt/umas_sw/bedtools/bedtools2/bin/bedtools
cufflinks 2.2.1 /opt/umas_sw/cufflink/cufflinks-2.2.1.Linux_x86_64/cufflinks
seqmonk 0.33 /opt/umas_sw/seqmonk/seqmonk/SeqMonk/seqmonk
bedgraph /opt/umas_sw/bedgraph/bedGraphToBigWig
R 3.3.0 /opt/umas_sw/R-software/R/bin/R
fastqc 0.11.5 /opt/umas_sw/fastqc/FastQC/fastqc
Bismark 0.16.1 /opt/umas_sw/bismark/bismark_v0.16.1
FASTX-Toolkit 0.0.13 /opt/umas_sw/fastx/bin
Bowtie 2.2.9 /opt/umas_sw/bowtie/bowtie2-2.2.9
Meme 4.11.2 /opt/umas_sw/meme/meme_4.11.2/src
Homer 4.8.3 /opt/umas_sw/homer/bi
BWA 0.7.15 /opt/umas_sw/bwa-0.7.15/bwa-0.7.15/bwa


Queuing Systems & Scheduler
Queuing Systems

When a job is submitted, it is placed in a queue. Different queues are available for different purposes. The user must select the queue from the list below that is appropriate for their computational needs.

Queue Details
  • Debug Queue: This queue is available to all the HPC users to quickly run a small test job to check whether it converges successfully or not.
Name of Queue = debug
No of nodes = 24
Max number of cores per job = 64
Walltime = 60 minutes
  • Main Queue: This queue is available to all the HPC users to run multicore parallel jobs in old nodes.
Name of Queue = main
No of nodes = 10
Max number of cores per job = 64
Walltime = 48 hours
  • Main_new Queue: This queue is available to all the HPC users to run multicore parallel jobs in new nodes.
Name of Queue = main_new
No of nodes = 14
Max number of cores per job = 64
Walltime = 48 hours
  • GPU Queue: This queue is available to all the HPC users, and it is encouraged that jobs utilizing GPU cards use this queue.
Name of Queue = gpu
No of nodes = 2
Name of node = gpu1 & gpu2
Walltime = 72 hours
  • GPU new Queue: This queue is available to all the HPC users, and it is encouraged that jobs utilizing GPU cards use this queue.
Name of Queue = gpu_new
No of nodes = 2
Name of node = gpu3 & gpu4
Walltime = 72 hours

Node Configuration

Based on the queuing system given above, the node configurations can be summarized as follows:

Queue Name Max Wall time Max number of cores per job Priority
debug 60 minutes 1 1
main 48 hours 64 2
main_new 48 hours 64 2
gpu 72 hours 64 2
gpu_new 72 hours 64 2

Sample scripts to submit jobs to the various queues:

Debug queue:

#!/bin/bash
#SBATCH --job-name=
#SBATCH --ntasks=1
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=debug
#SBATCH -v

cd ~/
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
mpirun -batch -np 1 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~//input-file

Main queue:

#!/bin/bash
#SBATCH --job-name=
#SBATCH --ntasks=16
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=main
#SBATCH -v

cd ~/
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
mpirun -batch -np 16 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~//input-file

GPU queue:

#!/bin/bash
#SBATCH --job-name=
#SBATCH --ntasks=16
#SBATCH --gres=gpu:1
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=gpu
#SBATCH -v

cd ~/
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
mpirun -batch -np 16 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~//input-file


main_new queue:

#!/bin/bash
#SBATCH --job-name=
#SBATCH --ntasks-per-node=16
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=main_new
#SBATCH -v

cd ~/
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
mpirun -batch -np 16 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~//input-file

gpu_new queue:

#!/bin/bash
#SBATCH --job-name=
#SBATCH --ntasks=16
#SBATCH --gres=gpu:1
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=gpu_new
#SBATCH -v

cd ~/
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
mpirun -batch -np 16 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~//input-file

Useful Commands

• For submitting a job: sbatch submit_script.sh
• For checking queue status: squeue -l
• For checking node status: sinfo
• For cancelling the job: scancel <job-id>
• For checking whether the job is running in GPU: nvidia-smi
• For checking the generation of output at runtime: tail -f output.log
• For copying a file/folder from the cluster to your machine via scp: scp -r files/folders username@your-machine-IP-Address:
• For copying a file/folder from your machine to the cluster via scp: scp -P 2022 -r files/folders username@10.0.137.10:
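
A typical session combining these commands might look like the following sketch (the script name and the job-id 12345 are placeholders):

sbatch submit_script.sh      # submit the job; SLURM prints the assigned job-id
squeue -l                    # check the state of your job in the queue
tail -f myjob.12345.out      # follow the output file produced by the sample scripts above
scancel 12345                # cancel the job if something has gone wrong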

Useful Link

• User guide to SLURM: https://slurm.schedmd.com/pdfs/summary.pdf

How-To's

How to Obtain an Account in Vega: Please send an email to helpdesk.istf@iitgn.ac.in with a copy to your supervisor. Please also let us know the duration for which the account is required and the list of software you wish to run.

Name of the cluster: Vega
IP of the cluster: 10.0.137.10
Hostname of the cluster: hpc.iitgn.ac.in

Login from Linux:
To log in from Linux, simply open a terminal (available in the base OS of any Linux flavor) and connect to the cluster over ssh.
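
A minimal login command is shown below (a sketch assuming port 2022, the same port used for scp in the Useful Commands section, and your HPC username):

# connect to Vega from a Linux terminal
ssh -p 2022 username@10.0.137.10
# or, equivalently, using the hostname
ssh -p 2022 username@hpc.iitgn.ac.in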

Login from Windows:
If you use Windows, you can use PuTTY, which can be downloaded from the PuTTY website. Enter the cluster's hostname (hpc.iitgn.ac.in) or IP address (10.0.137.10) in PuTTY and open the connection.

When PuTTY warns about the server's host key on the first connection, click “Yes” to continue.

External Usage

Computational Resources for External Usage @HPCLab in IITGN

Access to our High Performance Computing (HPC) Facility is granted to external users (Academic/Research organizations and Industry only) through a Committee.

The proposal from the user should include:

  • Technical details of the specific facility needed and the duration of use
  • A brief scientific narration of the proposed work

Please send your detailed proposal to support.hpc@iitgn.ac.in

Based on the review outcome and feasibility considerations for our facility, we will allocate compute resources.

Obtaining the HPC Account

  • Once the proposal is reviewed, accepted and approved by the committee, the user may download the HPC application form, fill it in, ink-sign it, scan it and email it to us.
  • A unique group/user name will then be created for the external user and the associated user(s), and the user credentials will be sent in a reply email.

Usage Policy

Forms

Contact

  • Email: support.hpc@iitgn.ac.in
Funding

Funding Agencies


Recognized As

Publications

Coming Soon…!

Galleries

FAQ’s

How to Request for HPC account

Please send an email to helpdesk.istf@iitgn.ac.in with a copy to your supervisor. Also please do let us know the duration of the account required and the list of software which you wish to run.

What is my quota and how do I check it?

The quota for each user is 100 GB in the home directory. There is no per-user limit for the scratch folder.
To check your quota, run quota -v as your username. Your home directory is located at /home/<Supervisor_grp>/<username>.

How many Jobs can I run with how many cores?

There is no limit on the number of jobs per user. The maximum number of cores per user is set as 96. The maximum number of cores per job is currently set as 64.

How has the scheduling been implemented?

The SLURM scheduler will automatically find the required number of processing cores from nodes (even if a node is partially used). Please do not explicitly specify the node number/number of nodes in the script. A priority based queueing system is implemented so that all users get a fair share of available resources. The priority will be decided on multiple factors including job size, queue priority, past and present usage, time spent on queue etc.
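
In practice this means your submit script should only specify the number of tasks and the partition, as in the sketch below (the task count is illustrative), and leave node selection to SLURM:

#SBATCH --ntasks=16
#SBATCH --partition=main
# do NOT add directives such as --nodes=... or --nodelist=...;
# the scheduler will pick free cores from the available nodes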

Where do I run my Jobs?

Users should run and write their jobs from /scratch/username/foldername only; users should NOT run and write their jobs from /home/username.

User Data Backup

It is strongly recommended that users back up their files and folders periodically, as ISTF does not have a mechanism to back up users’ data. An automatic email will be sent to the HPC user community once 75% of the scratch is used. Another automatic email will be sent to the users once 85% of the scratch space is used. The deletion of files by the administrator will commence within 24 hours of the second email. Files in the scratch directory will be deleted automatically 21 days after their last time stamp/update.
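
For example, you can periodically pull your results from the cluster to your own machine with scp, following the commands in the Useful Commands section (a sketch; the path /scratch/username/myproject/results is a placeholder):

# run this on your own machine to copy a results folder from Vega
scp -P 2022 -r username@10.0.137.10:/scratch/username/myproject/results .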

Can I run jobs on the master node or on any other node in an interactive manner, without using a script and bypassing the scheduler?

Users are strictly NOT ALLOWED to run any jobs on the Master Nodes or any other node in an interactive manner. Users must run jobs only using the scripts through the job scheduler.

Whom should I contact for any issue?

For any issue or request pertaining to VEGA, please send your email, with your working path, error logs, error screenshots and submit script, only to helpdesk.istf@iitgn.ac.in