When you log in to Apocrita you will be accessing a frontend. It is very important that you do not simply run your scripts here. These nodes are there to accommodate logins to the cluster, not workload. Instead, there is a scheduling program running on the cluster which takes the instructions for your job and distributes the total load over all the worker nodes. You can also use the utility servers interactively by SSHing to them from the frontend.
These modes will log you in directly to a machine, allowing you to test, develop and try different things out in a single session. This is useful if, for example, you are trying out a new tool and need to test different parameters, or if you are unsure about the syntax of the program.
These commands connect you to one of the worker nodes with the requested resources available for you to use, so you can run your scripts or jobs interactively on a command line. This can be useful if you want more freedom than a script allows, but should be avoided for longer jobs, as the frontends are rebooted fairly frequently, which may kill your job. These commands take arguments specifying what resources you need. For more details, have a look at the ITS Research website.
Here is an example of logging onto a machine using qlogin:
ssh -X email@example.com             # log in to Apocrita
qlogin -l h_rt=3600                  # request a session 3600s (1h) long
qlogin -l h_rt=10:0:0,h_vmem=4G      # request 10h and 4G memory
exit                                 # just type exit to log out
Do keep in mind that the frontends are rebooted reasonably frequently to be able to handle all the traffic on them. For this reason, it's best to keep interactive jobs fairly short. Use the utility servers or qsub, described below, for longer runs.
Once you have logged into a frontend of Apocrita, all you need to do to log in to one of these servers is to SSH to that particular name. These machines have all been bought by different groups of academics, and priority remains with these groups. This only means that you need to check carefully before you start to work here; if a machine is free, everyone is allowed to make use of the resource.
ssh -X frontend5
ssh -X frontend6
ssh -X sm11
These machines are all different; to find out how and why, head to the advanced section. It is VERY important that you do not overload these servers. It is up to the user to figure out what the load on the machine is and to determine whether there is room for their job. Use commands such as free to make sure.
If you do not know exactly what this means or how to do it properly, ask your colleagues or set out to find Adrian, the cluster guy.
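One way to check a utility server before starting work is sketched here. It assumes a Linux machine, and the rule of thumb it applies (the 1-minute load average plus one more process should not exceed the number of cores) is an assumption for illustration, not an official Apocrita policy. The helper name room_for_job is hypothetical.

```shell
#!/bin/sh
# Sketch only: the "load + 1 <= cores" rule is an assumed heuristic.

# room_for_job LOAD CORES prints "free" if LOAD + 1 <= CORES, else "busy".
room_for_job() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load + 1 <= cores) print "free"; else print "busy"
    }'
}

if [ -r /proc/loadavg ]; then
    load=$(cut -d ' ' -f 1 /proc/loadavg)    # 1-minute load average
    cores=$(getconf _NPROCESSORS_ONLN)       # number of online cores
    echo "load: $load, cores: $cores -> $(room_for_job "$load" "$cores")"
fi

# Memory is the other thing to look at before starting work.
if command -v free >/dev/null 2>&1; then
    free -h
fi
```

A "busy" result, or very little available memory in the free output, suggests waiting or picking another machine.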
This is the easiest way to run your job on the Apocrita cluster. Even so, it is not as simple as running your command. Remember, don't run jobs on the frontends. This method uses a command called qsub, which takes a script as input. Inside the script you define the resources your job needs.
You need to write the script with the instructions for your job. Below you will find the simplest version of such a script. Use this as a template and add options as needed. There are several more options you can add to the header of your script which, for example, allow more cores to be used.
#!/bin/sh
#$ -cwd              # Set the working directory for the job to the current directory
#$ -V                # Export your current environment variables to the job
#$ -l h_rt=24:0:0    # Request 24 hour runtime
#$ -l h_vmem=1G      # Request 1GB RAM

./code                                            # Your code goes here
/data/home/btw000/scripts/my_fave_script.sh       # It could be a separate script
bwa mem -t 10 -p inref.fa infa.fastq > out.sam    # Or it could be a program installed on Apocrita
echo "Hello" > world.txt                          # Or just bash code to do whatever
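As an illustration of the extra header options mentioned above, the following sketch requests several cores. It assumes the cluster offers a Grid Engine parallel environment named smp; that name is site-specific and is an assumption here, so check the ITS Research website for the correct one and for how memory requests are counted when more than one core is used.

```shell
#!/bin/sh
#$ -cwd                  # Run the job from the current directory
#$ -V                    # Export your current environment variables to the job
#$ -pe smp 4             # ASSUMPTION: parallel environment called "smp"; request 4 cores
#$ -l h_rt=1:0:0         # Request 1 hour runtime
#$ -l h_vmem=1G          # Request 1GB RAM

# Grid Engine sets NSLOTS to the number of cores granted to the job.
# Fall back to 1 so the script also runs outside the scheduler.
threads="${NSLOTS:-1}"
echo "Running with $threads core(s)"

# Keep the program's thread count in step with the request, for example:
# bwa mem -t "$threads" -p inref.fa infa.fastq > out.sam
```

Reading the thread count from NSLOTS rather than hard-coding it keeps the program from using more cores than the scheduler has set aside for the job.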
After you have written and saved the script, you submit it to the queue by passing it to qsub as its argument. Use qstat to show the status of all your jobs in the queue on Apocrita, and qdel to delete a job from the queue.
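A minimal sketch of the submit, check and delete cycle follows. The file name hello_job.sh and the 10 minute runtime are made up for this illustration, and the guard around qsub is only there so the snippet does nothing harmful on a machine without the scheduler.

```shell
#!/bin/sh
# Hypothetical file name, used only for this illustration.
job_script="hello_job.sh"

# Write a tiny job script using the same header options as the template.
cat > "$job_script" <<'EOF'
#!/bin/sh
#$ -cwd
#$ -l h_rt=0:10:0
#$ -l h_vmem=1G
echo "Hello" > world.txt
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"    # submit the script; the scheduler reports a job ID
    qstat                 # list your jobs and their current state
    # qdel <job-id>       # delete a job, using the ID reported by qsub or qstat
else
    echo "qsub not found: run this on an Apocrita frontend" >&2
fi
```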
To monitor a job running in an interactive session or on a utility server, use a program like top. It lists all processes currently running on the machine and allows you to sort them by various criteria such as CPU or memory usage. There is an arguably better version of the same type of tool called htop. You will need to load it with module load htop/1.0.3 before you can use it, though. Both top and htop have an argument -u taking a username so that you can inspect only your own processes if you want; htop -u btw000 will show only btw000's processes.
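If you want a one-off snapshot rather than the live view, for example to paste into an email when asking for help, top can also run non-interactively. This sketch assumes the Linux (procps) version of top, where -b selects batch mode and -n sets the number of iterations.

```shell
#!/bin/sh
# Print a single snapshot of your own processes instead of the live view.
me=$(id -un)    # your username

if command -v top >/dev/null 2>&1; then
    # -b: batch (non-interactive) mode, -n 1: one iteration, -u: filter by user
    top -b -n 1 -u "$me" | head -n 20
else
    echo "top not found on this machine" >&2
fi
```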