DCC Guide
DCC documentation home:
https://dcc.duke.edu/
DCC OnDemand:
https://dcc-ondemand-01.oit.duke.edu/pun/sys/dashboard/
Interactive sessions (JupyterLab)
Interactive job viewer (similar to the scicomp web app for ifarm)
Docs:
https://www.osc.edu/resources/online_portals/ondemand
Getting Started
To get access to the DCC server, you need a PI to add you to their group.
This gives you access to the DCC server as well as the group's directory and resources (GPU nodes).
In our case, ask Prof. Vossen to add you.
Once added, you can log in either through ssh or in the browser using DCC OnDemand
The ssh address is
dcc-login.oit.duke.edu
e.g. use the command below, substituting your netID for <netID>
ssh <netID>@dcc-login.oit.duke.edu
This will prompt you to provide your password and complete two-factor authentication
This works exactly as for any other duke login service (e.g. outlook, dukehub)
Optional
SSH key setup
If you want to skip the password and two-factor step when using ssh, you can set up an SSH key
See the "SSH Keys" section here for more info:
https://dcc.duke.edu/dcc/login/#a-word-on-ssh-clients
In short, you need to generate an SSH key, copy the public key, and add it to your SSH public keys under "advanced user options" at
https://idms-web-selfservice.oit.duke.edu/
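A minimal sketch of the key-generation step (key type and filename are your choice; ed25519 is a common modern default):
# on your local machine: generate a key pair (accept the default path or pick your own)
ssh-keygen -t ed25519
# print the public key, then paste it into the self-service page linked above
cat ~/.ssh/id_ed25519.pub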
Using the DCC
Filesystem
The DCC file system is explained here:
https://dcc.duke.edu/dcc/files/
but below is a brief summary.
There are several different storage areas: some have per-user quotas, and some are volatile (files are purged after a set time)
/work/<netid>
- 650 TB, unpartitioned disk shared across all users
- no backup
- files purged after 75 days
/cwork/<netid>
- 850 TB, like /work (unpartitioned) but newer and bigger
- no backup
- files purged after 90 days
/hpc/home/<netid>
- 25 GB, your home directory when you ssh into the server
- no backup
- partitioned (each user has their own 25 GB)
NOTE: if your home folder reaches capacity, you will run into many problems...
If you start having problems (e.g. you can't create a JupyterLab session), check your home directory usage (see the sketch after this list)
/hpc/group/vossenlab/<netid>
- 1 TB, our group directory - storage shared between all group users
- 7 day snapshot backup
- use for anything that you need to keep for more than 75 days
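Since a full home directory causes the problems noted above, here is a quick way to check usage (a sketch using standard GNU coreutils):
# per-directory usage in your home folder, largest last
du -h --max-depth=1 ~ 2>/dev/null | sort -h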
OnDemand
DCC OnDemand allows you to create JupyterLab, VS Code, and RStudio servers for interactive computing
To create a JupyterLab server
navigate to
https://dcc-ondemand-01.oit.duke.edu/pun/sys/dashboard/
Click on "Interactive Apps" and click the "Jupyter Lab" server under "Servers"
Don't adjust the environment setup unless you know what you are doing...
Options
Partition
common - the default; selects a DCC node that only has CPUs
gpu-common - DCC nodes with GPUs
scavenger - uses other groups' nodes when they aren't in use
scavenger-gpu - uses other groups' GPU nodes when they aren't in use
vossenlab-gpu - uses our GPU nodes
Walltime, CPUs, Memory, GPUs
Amount of each of these to request (a command-line equivalent is sketched below)
If the configuration you request is unavailable, the website will complain after you click Launch
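If you prefer a terminal over the web form, roughly the same request can be made with srun from a login shell. The values below are only examples, and --gres=gpu:1 assumes the cluster allocates GPUs via gres, as most Slurm setups do:
# interactive shell on a GPU node: 1 GPU, 2 CPUs, 8 GB RAM, 2 hours (example values)
srun -p vossenlab-gpu --account=vossenlab --gres=gpu:1 --cpus-per-task=2 --mem=8G -t 02:00:00 --pty bash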
To select a specific vossenlab-gpu node
Add
-w dcc-vossenlab-gpu-0*
to the "Any additional Slurm parameters" sectionE.g.:
-w dcc-vossenlab-gpu-04
This will ensure you get a specific gpu node
Can be helpful if you want to ensure no one else is using VRAM on your node
see more here:
https://dcc.duke.edu/OpenOnDemand/Jupyter/#using-jupyter-lab
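To check which vossenlab-gpu nodes exist and whether they are currently busy, sinfo should work from a login shell:
# list each node in the partition with its state (idle, mixed, alloc, ...)
sinfo -p vossenlab-gpu -N -l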
Viewing Slurm jobs
To view active Slurm jobs, click the "Jobs" drop-down button on the top ribbon and then "Active Jobs"
In the top right, you can choose to view either all jobs or only your jobs (first drop-down button)
You can search for your jobs in the "filter" search box
ID, user name, job name, account name, or node name will all filter the listing
Utilizing GPUs
GPUs can be utilized either in a JupyterLab server or via Slurm batch jobs
JupyterLab
See the OnDemand section above for how to request a JupyterLab server.
Ensure that Partition is set to vossenlab-gpu, gpu-common, or scavenger-gpu, and that the "GPUs" field is set to 1
Once connected to your server, the GPU should be available - you can verify with
torch.cuda.is_available()
for PyTorch, or
device_lib.list_local_devices()
for TensorFlow
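For example, from a terminal tab inside the JupyterLab session (assuming PyTorch is installed in the active environment):
# should print True when a GPU is visible to PyTorch
python3 -c "import torch; print(torch.cuda.is_available())"
# nvidia-smi shows the GPU model and current VRAM usage
nvidia-smi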
Slurm
Slurm batch jobs can request GPU nodes via the same partition scheme as JupyterLab sessions
When running a job, add the slurm directive
-p
followed by your desired partition (see the OnDemand section for more on partitions). E.g.:
#!/bin/bash
#SBATCH --chdir=/cwork/path/to/file
#SBATCH --job-name=my_job
#SBATCH --output=/cwork/path/to/slurm/output/%x.out
#SBATCH --error=/cwork/path/to/slurm/error/%x.err
#SBATCH -p vossenlab-gpu
#SBATCH --gres=gpu:1   # added here: most Slurm setups need an explicit GPU request to actually allocate one
#SBATCH --account=vossenlab
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
python3 main.py
This job will run on a vossenlab-gpu node, giving you access to a GPU within your Python script (which here lives in /cwork)
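To submit the script, save it (e.g. as my_job.sh, a filename chosen here just for illustration) and run:
sbatch my_job.sh
squeue will then show the job once it is queued (see below)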
Checking active jobs
Active jobs can be viewed on the OnDemand website (see above) or via squeue
When using squeue with the username flag set, use your netid as your username, e.g.
squeue -u <netid>
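A couple of related standard Slurm commands that may come in handy (replace <jobid> with the ID shown by squeue):
# detailed information about a single job
scontrol show job <jobid>
# cancel a job you no longer need
scancel <jobid>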