DCCGuide


DCC Resources

  1. DCC documentation home:

https://dcc.duke.edu/

  2. DCC OnDemand:

https://dcc-ondemand-01.oit.duke.edu/pun/sys/dashboard/

    1. Interactive sessions (JupyterLab)

    2. Interactive job viewer (similar to the scicomp web app for ifarm)

    3. Docs:

https://www.osc.edu/resources/online_portals/ondemand


Getting Started

To get access to the DCC server, you need a PI to add you to their group.

  1. This gives you access to the DCC as well as the group's directory and resources (e.g. GPU nodes).

  2. In our case, ask Prof. Vossen to add you.

Once added, you can log in either through ssh or in the browser using DCC OnDemand.

  1. The ssh address is dcc-login.oit.duke.edu

  2. For example, use the command below, replacing <netID> with your netID

  3. ssh <netID>@dcc-login.oit.duke.edu

  4. This will prompt you to provide your password and complete two-factor authentication

    1. This works the same as any other Duke login service (e.g. Outlook, DukeHub)
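
If you log in over ssh frequently, a host alias saves some typing. Below is a minimal sketch of an entry you could add to ~/.ssh/config, assuming the standard OpenSSH client; <netID> is a placeholder for your own netID.

  # ~/.ssh/config entry (sketch): lets you log in with just "ssh dcc"
  Host dcc
      HostName dcc-login.oit.duke.edu
      User <netID>

With this in place, "ssh dcc" is equivalent to "ssh <netID>@dcc-login.oit.duke.edu"; the password and two-factor prompts behave the same unless you also set up an ssh key (see below).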

Optional

SSH key setup

  1. If you want to skip the password and two-factor steps when using ssh, you can set up an ssh key

  2. See the "SSH Keys" section here for more info:

https://dcc.duke.edu/dcc/login/#a-word-on-ssh-clients

  3. In short, you need to generate an ssh key, copy the public key, and add it to your ssh public keys under "advanced user options" at

https://idms-web-selfservice.oit.duke.edu/
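
A minimal sketch of the key-generation step on your local machine is shown below, assuming an ed25519 key and the default file locations; you still paste the printed public key into the self-service page in your browser.

  # Generate an ssh key pair (press Enter to accept the default path; a passphrase is optional)
  ssh-keygen -t ed25519
  # Print the *public* key - copy this whole line into "advanced user options" on the self-service page
  cat ~/.ssh/id_ed25519.pub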

Using the DCC

Filesystem

The DCC file system is explained here:

https://dcc.duke.edu/dcc/files/

Below is a brief summary.

There are several different storage areas: some are partitioned (per-user quotas) and some are volatile (files are purged after a set time)

  1. /work/<netid> - 650 TB

    1. Unpartitioned disk shared across all users

    2. no backup

    3. files purged after 75 days

  2. /cwork/<netid> - 850 TB

    1. Like /work (unpartitioned), but newer and bigger

    2. no backup

    3. files purged after 90 days

  3. /hpc/home/<netid> - 25 GB

    1. Your personal home directory; this is where you land when you ssh into the server

    2. no backup

    3. partitioned (each user has their own 25 GB)

    4. NOTE: if your home folder reaches capacity, many things will start to fail

      1. If you start having problems (e.g. you can't create a JupyterLab session), check your home directory usage (a quick way to check is sketched after this list)

  4. /hpc/group/vossenlab/<netid> - 1 TB

    1. Our group directory - storage shared between all group users

    2. 7 day snapshot backup

    3. Use for anything that you need to keep for more than 75 days
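
As noted above, here is a quick way to check how full your home directory is, using standard Unix tools (a generic sketch; the DCC may also provide its own quota commands not shown here):

  # Total size of your home directory (compare against the 25 GB limit)
  du -sh ~
  # Per-item breakdown, including hidden files/directories, sorted with the largest last
  du -sh ~/* ~/.[!.]* 2>/dev/null | sort -h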

OnDemand

DCC OnDemand allows you to create JupyterLab, VS Code, and RStudio servers for interactive computing

To create a JupyterLab server:
  1. Navigate to

https://dcc-ondemand-01.oit.duke.edu/pun/sys/dashboard/

  2. Click on "Interactive Apps" and select "Jupyter Lab" under "Servers"

  3. Don't adjust the environment setup unless you know what you are doing...

  4. Options

    1. Partition

      1. common is the default - selects a DCC node that only has CPUs

      2. gpu-common - DCC nodes with GPUs

      3. scavenger - uses other groups' nodes when they aren't in use

      4. scavenger-gpu - uses other groups' GPU nodes when they aren't in use

      5. vossenlab-gpu - uses our GPU nodes

    2. Walltime, CPUs, Memory, GPUs

      1. Amount of each of these to request

      2. If the configuration you request is unavailable, the website will complain after you click launch

    3. To select a specific vossenlab-gpu node

      1. Add -w dcc-vossenlab-gpu-0* to the "Any additional Slurm parameters" section

        1. E.g.: -w dcc-vossenlab-gpu-04

      2. This will ensure you get a specific gpu node

        1. This can be helpful if you want to ensure no one else is using VRAM on your node (to check which nodes are currently free, see the sketch after this list)
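
Before pinning a node with -w, it can help to see which vossenlab GPU nodes exist and whether they are busy. Below is a sketch using standard Slurm commands from a login shell (the node name is just an example):

  # List vossenlab GPU nodes and their state (idle, mixed, allocated, ...)
  sinfo -p vossenlab-gpu -N -o "%N %T"
  # Show the jobs currently running on one specific node
  squeue -w dcc-vossenlab-gpu-04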

See more here:

https://dcc.duke.edu/OpenOnDemand/Jupyter/#using-jupyter-lab

Viewing slurm jobs

To view active slurm jobs, click the "Jobs" drop-down button on the top ribbon and then "Active Jobs"

  1. In the top right, you can choose to view either all jobs or only your jobs (first drop-down button)

  2. You can search for your jobs in the "filter" search box

    1. ID, user name, job name, account name, or node name will all filter the listing
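
The same kinds of filters are available from the command line with squeue; a brief sketch (the job name, account, and node values are placeholders/examples):

  # Only your jobs
  squeue -u <netid>
  # Filter by job name, account, or node
  squeue -n my_job
  squeue -A vossenlab
  squeue -w dcc-vossenlab-gpu-04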

Utilizing GPUs

GPUs can be used either in a JupyterLab server or via Slurm batch jobs

  1. Jupyterlab

    1. See the OnDemand section above for how to request a JupyterLab server

    2. Ensure that Partition is set to vossenlab-gpu, gpu-common, or scavenger-gpu, and that the "GPUs" field is set to 1

    3. Once connected to your server, the GPU should be available - you can verify with torch.cuda.is_available() for PyTorch or device_lib.list_local_devices() for TensorFlow (see also the sketch at the end of this section)

  2. Slurm

    1. Slurm batch jobs can request GPU nodes via the same partition scheme as JupyterLab servers

    2. When submitting a job, add the Slurm directive -p followed by your desired partition (see the OnDemand section for more on partitions)

    3. E.g.:

      1. #!/bin/bash
         #SBATCH --chdir=/cwork/path/to/file
         #SBATCH --job-name=my_job
         #SBATCH --output=/cwork/path/to/slurm/output/%x.out
         #SBATCH --error=/cwork/path/to/slurm/error/%x.err
         #SBATCH -p vossenlab-gpu
         #SBATCH --account=vossenlab
         #SBATCH --cpus-per-task=1
         #SBATCH --mem=8G
         python3 main.py
      2. This job will run on a vossenlab-gpu node, giving you access to a GPU within your Python script (in this case located in /cwork)

    4. Checking active jobs

      1. Active jobs can be viewed on the OnDemand website (see above) or via squeue

        1. When using squeue with the user flag (-u), use your netID as the username, e.g.

        2. squeue -u <netid>
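
Putting the pieces together, below is a minimal sketch of submitting and monitoring the example batch job above, and double-checking GPU visibility from inside a job or JupyterLab terminal (the script name my_job.sh is hypothetical, and the Python one-liner assumes PyTorch is installed in your environment):

  # Submit the batch script from the Slurm example above (filename is a placeholder)
  sbatch my_job.sh
  # Check your queued and running jobs
  squeue -u <netid>
  # From inside a GPU job or JupyterLab terminal, confirm the GPU is visible
  nvidia-smi
  python3 -c "import torch; print(torch.cuda.is_available())"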