Connect to the cluster over SSH at paramananta.iitgn.ac.in (i.e., ssh username@paramananta.iitgn.ac.in).
Operating System | Instructions |
---|---|
Linux | Use ssh -X username@paramananta.iitgn.ac.in |
Windows | Please see the "X11 Forwarding in PuTTY" documentation |
Run man sbatch for the full list of job-submission options.
The following is an example of a script file’s content:
#!/bin/bash                        Tells the OS which shell to run this script in
#SBATCH --job-name=<my_job_name>   Assigns the name "my_job_name" to the job
#SBATCH -p <partition_name>        Run the "sinfo" command to see the list of available partitions
#SBATCH --nodes=1                  Run all processes on a single node
#SBATCH --ntasks=1                 Run a single task
#SBATCH --cpus-per-task=4          Number of CPU cores per task
#SBATCH --gres=gpu:1               Include a GPU for the task (optional, only for GPU jobs)
#SBATCH --time=0-00:05:00          Time limit, days-hrs:min:sec (optional)
#SBATCH --output=first_%j.out      Standard output file (%j expands to the job ID)
#SBATCH --error=first_%j.err       Standard error file
Sample job script:
#!/bin/bash
#SBATCH --job-name=test_job # Job name
#SBATCH --partition=small # Run "sinfo" command to check the available partitions
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --gres=gpu:1 # Include GPU for the task (optional, only for GPU jobs)
#SBATCH --output=first_%j.out # Standard output file
#SBATCH --error=first_%j.err # Standard error file
date;hostname;pwd
module load openmpi4
<Executable PATH> INPUT OUTPUT
sacct -j <jobid> --format=User,JobID,JobName,MaxRSS,Elapsed
gives you statistics on a completed job by job ID. Once your job has completed, you can get additional information that was not available during the run, such as run time and memory used. A sample output is shown below; see the man page (man sacct) for more details.
$ sacct -j 269667 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,MaxVMSize,NNodes,NCPUS,NodeList,AveDiskWrite
JobID JobName Partition State Elapsed MaxRSS MaxVMSize NNodes NCPUS NodeList AveDiskWrite
------------ ---------- ---------- ---------- ---------- ---------- --------- -------- ------ ------------- --------------
269667 SBBS_dist standard-+ COMPLETED 03:43:01 16 625 cn[285-300]
269667.batch batch COMPLETED 03:43:01 12564K 1530304K 1 40 cn285 3.43M
269667.0 pmi_proxy COMPLETED 03:42:59 17989840K 43527004K 16 16 cn[285-300] 62530.07M
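When post-processing sacct output in scripts, the Elapsed field (hh:mm:ss, as in the sample above) can be converted to seconds for comparison. A minimal sketch, assuming the value has no days component:

```shell
# Convert an sacct Elapsed value (hh:mm:ss) to seconds with awk.
# The sample value is taken from the output above; values that include
# a days component (d-hh:mm:ss) would need extra handling.
elapsed="03:43:01"
seconds=$(echo "$elapsed" | awk -F: '{ print $1*3600 + $2*60 + $3 }')
echo "$seconds"   # 13381
```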
Use sbatch to submit a batch request, squeue to check its status, and scancel to delete it:
scancel <jobid>
The codes identify the reason that a job is waiting for execution. A job may be waiting for more than one reason, in which case, only one of those reasons is displayed.
JOB REASON CODE | Explanation |
---|---|
AssociationJobLimit | The job’s association has reached its maximum job count. |
AssocGrpNodeLimit | The job requested more nodes than are allowed for the entire project/association/group. |
AssocMaxNodesPerJobLimit | The job requested more nodes per job than its association allows. |
AssocMaxJobsLimit | The job’s association has reached its limit on the number of concurrently running jobs. |
AssocMaxWallDurationPerJobLimit | The job requested a runtime greater than that allowed by the queue. |
AssociationResourceLimit | The job’s association has reached some resource limit. |
AssociationTimeLimit | The job’s association has reached its time limit. |
BadConstraints | The job’s constraints cannot be satisfied. |
BeginTime | The job’s earliest start time has not yet been reached. |
Cleaning | The job is being requeued and still cleaning up from its previous execution. |
Dependency | This job is waiting for a dependent job to be completed. |
FrontEndDown | No front-end node is available to execute this job. |
InactiveLimit | The job reached the system InactiveLimit. |
InvalidAccount | The job’s account is invalid. |
InvalidQOS | The job’s QoS is invalid. |
JobHeldAdmin | The job is held by a system administrator. |
JobHeldUser | The job is held by the user. |
JobLaunchFailure | The Job could not be launched. This may be due to a file system problem, an invalid program name, etc. |
Licenses | The job is waiting for a license. |
NodeDown | A node required by the job is down. |
NonZeroExitCode | The job was terminated with a non-zero exit code. |
PartitionConfig | The job requested more resources, or the wrong kind of resources, than the partition is configured to provide. |
PartitionDown | The partition required by this job is in a DOWN state. |
PartitionInactive | The partition required by this job is in an Inactive state and is not able to start jobs. |
PartitionNodeLimit | The number of nodes required by this job is outside of its current limits. It can also indicate that required nodes are DOWN or DRAINED. |
PartitionTimeLimit | The job’s time limit exceeds its partition’s current time limit. |
Priority | One or more higher-priority jobs exist for this partition or advanced reservation. |
Prolog | Its PrologSlurmctld program is still running. |
QOSJobLimit | The job’s QoS has reached its maximum job count. |
QOSResourceLimit | The job’s QoS has reached some resource limit. |
QOSTimeLimit | The job’s QoS has reached its time limit. |
QOSMaxCpuPerUserLimit | The job’s QoS has reached its per-user CPU core limit. |
ReqNodeNotAvail | Some node specifically required by the job is not currently available. For example, the Node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes that are DOWN, DRAINED, or not responding will be identified as part of the job’s “reason” field as “UnavailableNodes”. Such nodes will typically require the intervention of a system administrator to make them available. |
Reservation | The job is waiting for its advanced reservation to become available. |
Resources | The job is waiting for resources to become available. |
QOSResources | The resources requested by the job are limited by its QoS. |
SystemFailure | Failure of the Slurm system, a file system, the network, etc. |
TimeLimit | The job has exceeded its time limit. |
QOSUsageThreshold | The required QOS threshold has been breached. |
WaitingForScheduling | No reason has been set for this job yet; it is waiting for the scheduler to determine the appropriate reason. |
Check your quota on the home and scratch file systems:
lfs quota -hu $USER /home
lfs quota -hu $USER /scratch
To list the ten largest entries under the current directory:
du . | sort -n | tail -n 10
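As a quick illustration of the du pipeline (the demo directory names here are invented for the example), the largest directories sort to the bottom of the output:

```shell
# Build a throwaway tree with one big and one small directory,
# then run the pipeline; the top-level total prints last.
mkdir -p demo/big demo/small
head -c 1048576 /dev/zero > demo/big/file1     # ~1 MiB file
head -c 4096    /dev/zero > demo/small/file2   # 4 KiB file
du demo | sort -n | tail -n 10
rm -r demo                                     # clean up the example tree
```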
tar -czvf src.tar.gz src/
This archive can then be unpackaged using:
tar -xzvf src.tar.gz
where the resulting directory/file structure is identical to what it was initially.
To list files that have not been modified in the last 30 days, together with their sizes:
lfs find $HOME -mtime +30 -type f -print | xargs du -sh
lfs find $SCRATCH -mtime +30 -type f -print | xargs du -sh
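The tar pack/unpack round trip described above can be checked end to end; this sketch uses throwaway file names invented for the example:

```shell
# Pack a directory, unpack it elsewhere, and confirm the trees match.
mkdir -p src out
echo "hello" > src/a.txt
tar -czvf src.tar.gz src/         # pack
tar -xzvf src.tar.gz -C out/      # unpack into out/
diff -r src out/src && echo "identical"
rm -r src out src.tar.gz          # clean up the example files
```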
$ dos2unix myfile.txt
$ mac2unix myfile.txt
$ unix2dos myfile.txt
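If the dos2unix family of tools happens to be unavailable on a node, the same DOS-to-Unix conversion can be approximated with GNU sed (a portable sketch, not part of the cluster toolset; the file name is illustrative):

```shell
# Strip trailing carriage returns (CRLF -> LF) using only sed.
printf 'line1\r\nline2\r\n' > myfile.txt   # a sample DOS-format file
sed -i 's/\r$//' myfile.txt                # GNU sed in-place edit
od -c myfile.txt | head -n 2               # inspect: no \r remains
rm myfile.txt                              # clean up the example file
```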
ssh -X username@paramananta.iitgn.ac.in