Loading PyTorch on Hoffman2

In this article:
  • Running PyTorch on nodes on the newest version of the OS
  • Installing PyTorch on nodes on the newest version of the OS
  • Running PyTorch on nodes on the current production version of the OS


Running PyTorch on nodes on the newest version of the OS

Updated on Aug 16, 2021

To load the CPU-version of PyTorch (v1.9.0) on a CentOS-7 CPU node:


$ qrsh -l rh7
$ if command -v conda &> /dev/null; then conda deactivate; fi
$ module load anaconda3
$ . $CONDA_DIR/etc/profile.d/conda.sh
$ conda activate pytorch-1.9.0-cpu
(pytorch-1.9.0-cpu) $ python -c "import torch; print(torch.__version__)"
1.9.0
(pytorch-1.9.0-cpu) $ python -c "import torchvision; print(torchvision.__version__)" 
0.10.0
(pytorch-1.9.0-cpu) $ conda deactivate
$ exit
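The two `python -c` version checks above can also be collected into one short script that you run inside the activated environment. The sketch below is only an illustration, not part of the Hoffman2 setup: the function name `report_versions` is hypothetical, and a package that cannot be imported is reported as `None` instead of stopping the script.

```python
import importlib


def report_versions(packages):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in the active environment
            continue
        # Some modules do not define __version__; say so explicitly.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for pkg, ver in report_versions(["torch", "torchvision"]).items():
        print(f"{pkg}: {ver}")
```

Run inside the pytorch-1.9.0-cpu environment, it should report the same versions as the individual commands above.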

To load the GPU-version of PyTorch (v1.9.0) on a CentOS-7 GPU node:


$ qrsh -l rh7,gpu,RTX2080Ti
$ if command -v conda &> /dev/null; then conda deactivate; fi
$ module load anaconda3
$ . $CONDA_DIR/etc/profile.d/conda.sh
$ conda activate pytorch-1.9.0-gpu
(pytorch-1.9.0-gpu) $ python -c "import torch; print(torch.__version__)"
1.9.0
(pytorch-1.9.0-gpu) $ python -c "import torchvision; print(torchvision.__version__)" 
0.10.0
(pytorch-1.9.0-gpu) $ python -c "import torchtext; print(torchtext.__version__)"
0.10.0
(pytorch-1.9.0-gpu) $ python -c "import torch; print(torch.version.cuda)"
10.2
(pytorch-1.9.0-gpu) $ conda deactivate
$ exit
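`torch.version.cuda` only reports the CUDA version the package was built against; it does not tell you whether a GPU is actually visible to your session. A minimal sketch that also calls `torch.cuda.is_available()` and `torch.cuda.get_device_name()` is shown below. The helper name `cuda_report` is hypothetical, and the function returns placeholder values when PyTorch is not installed.

```python
def cuda_report():
    """Summarize what the active PyTorch build sees on this node."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None,
                "cuda_available": False, "device": None}
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        # CUDA version the package was built against (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

On a GPU node, `cuda_available` should be `True`; running the same GPU build on a CPU-only node still reports `cuda_build`, but `cuda_available` will be `False`.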



Installing PyTorch on nodes on the newest version of the OS

Updated on Apr 13, 2021

Several nodes on the Hoffman2 Cluster run a more recent version of the Linux operating system (CentOS 7.x). To access these nodes, add the "rh7" complex to your interactive or batch job request, for example:

qrsh/qsub -l rh7   # add any other needed resources (runtime, memory, cores, etc.)
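The `-l` option takes a single comma-separated list, so `rh7` is simply one more entry next to runtime, memory, or GPU requests. The Python sketch below only illustrates how the request strings used on this page are composed; `resource_list` is a hypothetical helper, not a Hoffman2 tool, and it does not check whether a resource name is valid.

```python
def resource_list(rh7=True, gpu_type=None, **resources):
    """Build the comma-separated value for `qrsh/qsub -l` (hypothetical helper)."""
    items = []
    if rh7:
        items.append("rh7")  # request a node running the newer OS (CentOS 7)
    if gpu_type is not None:
        items.extend(["gpu", gpu_type])  # e.g. "RTX2080Ti", as used on this page
    # Remaining keyword arguments become name=value pairs, e.g. h_data=5G.
    items.extend(f"{name}={value}" for name, value in resources.items())
    return ",".join(items)


if __name__ == "__main__":
    print("qrsh -l " + resource_list(h_data="5G", h_rt="2:00:00"))
```

The example call prints `qrsh -l rh7,h_data=5G,h_rt=2:00:00`, the same request used for the CPU installation below.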


Install/run PyTorch on CPU nodes


Start by requesting an interactive session on a node running the newer version of the OS (specify rh7), for example with 5 GB of memory (h_data=5G) and a runtime of 2 hours (h_rt=2:00:00):


qrsh -l rh7,h_data=5G,h_rt=2:00:00

When the prompt returns on the compute node, issue:

pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 \

Now start python3:


python3


To test the installation, issue at the Python prompt:


import torch
x = torch.rand(5, 3)
print(x)
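Printing `x` only shows that a tensor can be created. A slightly stronger check, sketched below, also runs a matrix product, confirms the random values fall in [0, 1), and reads the local build tag from `torch.__version__` (pip wheels installed as `torch==1.8.1+cpu` report a `+cpu` suffix). The function name is hypothetical, and the import is wrapped so the function returns `None` instead of failing when PyTorch is missing.

```python
def cpu_smoke_test():
    """Return (result shape, values in range, build tag), or None without PyTorch."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this Python environment
    x = torch.rand(5, 3)            # uniform samples in [0, 1)
    y = torch.mm(x, x.t())          # (5, 3) @ (3, 5) -> (5, 5)
    in_range = bool(((x >= 0) & (x < 1)).all())
    # "1.8.1+cpu" -> "cpu"; plain versions such as "1.9.0" -> None
    tag = str(torch.__version__).partition("+")[2] or None
    return tuple(y.shape), in_range, tag


if __name__ == "__main__":
    print(cpu_smoke_test())
```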


Install/run PyTorch on GPU nodes


Request an interactive session on a GPU node, for example with:

qrsh -l rh7,gpu,RTX2080Ti,h_rt=5:00:00,h_data=5g

When the prompt returns on the GPU node, issue:

module load cuda/10.2
pip3 install torch torchvision torchaudio --user

Start python3:

python3

To test the installation, issue at the Python prompt:

import torch
x = torch.rand(5, 3)
print(x)
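On a GPU node, `torch.rand(5, 3)` still creates the tensor in CPU memory, so the test above does not exercise the GPU. The minimal sketch below places the tensor on the GPU when `torch.cuda.is_available()` reports one and falls back to the CPU otherwise; the function name is hypothetical.

```python
def gpu_smoke_test():
    """Run a small matrix product on the GPU if one is usable, else on the CPU."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this Python environment
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.rand(5, 3, device=device)  # allocated directly on `device`
    y = torch.mm(x, x.t())               # computed on the same device
    return device.type, tuple(y.shape)


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this reports `cpu` on a node requested with `gpu`, the installed build or the loaded CUDA module is likely not matching the GPU driver.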

A note on installing Python packages on nodes running the newer version of the OS

To prevent conflicts between packages installed from nodes running the older version of the OS and from nodes running the newer version, consider moving your:

$HOME/.local

to:

$HOME/.local_rh6


and then, when using Python on nodes running the older version of the OS, remember to set:

# for bash/sh type shells:
export PYTHONUSERBASE=$HOME/.local_rh6

# for csh/tcsh type shells:
setenv PYTHONUSERBASE $HOME/.local_rh6
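`pip install --user` places packages under the user base directory, which is `$HOME/.local` on Linux by default and is overridden by `PYTHONUSERBASE`; that is why setting it keeps the two sets of packages apart. The standard-library sketch below starts a fresh interpreter with `PYTHONUSERBASE` set and returns the user base it reports through `site.getuserbase()`. The helper name is hypothetical.

```python
import os
import subprocess
import sys


def user_base_for(pythonuserbase):
    """Return the user base a fresh interpreter reports with PYTHONUSERBASE set."""
    env = dict(os.environ, PYTHONUSERBASE=pythonuserbase)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Mirrors the $HOME/.local_rh6 convention described above.
    print(user_base_for(os.path.expanduser("~/.local_rh6")))
```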


Running PyTorch on nodes on the current production version of the OS

Updated on Mar 18, 2021

Unfortunately, the prebuilt PyTorch packages in Anaconda's pytorch channel cannot run on the current production nodes of Hoffman2 because their system libraries are too old. Instead, you can use one of the ready-made conda environments below:

NOTE: The following examples only illustrate how to load the PyTorch environment with a minimal resource request (one core with 1 GB of memory for 2 hours). Your actual program may need much more than that. Review the Hoffman2 documentation on requesting computational resources for details on how to request what your job needs.


To load the CPU-version of PyTorch (v1.5.0):


$ qrsh
$ if command -v conda &> /dev/null; then conda deactivate; fi
$ module load python/anaconda3
$ . "/u/local/apps/anaconda3/etc/profile.d/conda.sh"
$ conda activate pytorch-1.5.0-cpu
(pytorch-1.5.0-cpu) $ python -c "import torch; print(torch.__version__)"
1.5.0
(pytorch-1.5.0-cpu) $ python -c "import torchvision; print(torchvision.__version__)" 
0.2.1
(pytorch-1.5.0-cpu) $ conda deactivate
$ exit


To load the GPU-version of PyTorch (v1.3.1):


$ qrsh -l gpu,P4
$ if command -v conda &> /dev/null; then conda deactivate; fi
$ module load python/anaconda3
$ . "/u/local/apps/anaconda3/etc/profile.d/conda.sh"
$ conda activate pytorch-1.3.1-gpu
(pytorch-1.3.1-gpu) $ python -c "import torch; print(torch.__version__)"
1.3.1
(pytorch-1.3.1-gpu) $ python -c "import torchvision; print(torchvision.__version__)" 
0.4.2
(pytorch-1.3.1-gpu) $ python -c "import torchtext; print(torchtext.__version__)"
0.6.0
(pytorch-1.3.1-gpu) $ python -c "import torch; print(torch.version.cuda)"
10.0.130
(pytorch-1.3.1-gpu) $ conda deactivate
$ exit


Running PyTorch program as a batch job


NOTE: It is recommended that you first make sure everything runs correctly in an interactive session before running it as a batch job.

You can wrap your PyTorch Python commands in a bash script, as in the simple example below:


#!/bin/bash

# Make sure no previously activated conda environment is still active
if command -v conda &> /dev/null; then conda deactivate; fi
# Make the module command available in the non-interactive batch shell
source /u/local/Modules/default/init/modules.sh
module load python/anaconda3
. "/u/local/apps/anaconda3/etc/profile.d/conda.sh"

# Activate ONE environment: the CPU version (commented out) or the GPU version
#conda activate pytorch-1.5.0-cpu
conda activate pytorch-1.3.1-gpu

# Replace the 4 lines below with your python commands.
python -c "import torch; print(torch.__version__)"
python -c "import torchvision; print(torchvision.__version__)"
python -c "import torchtext; print(torchtext.__version__)"
python -c "import torch; print(torch.version.cuda)"

conda deactivate


After saving the script as an executable file in the appropriate working directory, you can submit it as a batch job (for example, with qsub); see the Hoffman2 documentation on submitting batch jobs for details.
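If you submit the same kind of job repeatedly, the bash script above can be generated instead of edited by hand. The Python sketch below writes a script with the same module and conda lines and sets the owner's execute bit. The helper name `write_job_script` is hypothetical, and the `#$` lines assume the standard Grid Engine directive syntax (`-cwd`, `-j y`, `-l`); check them against the Hoffman2 batch-job documentation before relying on them.

```python
import os
import stat


def write_job_script(path, resources, python_commands, env="pytorch-1.3.1-gpu"):
    """Write a bash job script and mark it executable (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        "#$ -cwd",               # run in the directory the job was submitted from
        "#$ -j y",               # merge stderr into stdout
        f"#$ -l {resources}",    # e.g. "gpu,P4,h_rt=2:00:00" (assumed syntax)
        "",
        "if command -v conda &> /dev/null; then conda deactivate; fi",
        "source /u/local/Modules/default/init/modules.sh",
        "module load python/anaconda3",
        '. "/u/local/apps/anaconda3/etc/profile.d/conda.sh"',
        f"conda activate {env}",
        *python_commands,        # your own python commands, one per line
        "conda deactivate",
    ]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # Add the owner's execute bit while keeping the existing permissions.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path
```

For example, `write_job_script("pytorch_job.sh", "gpu,P4,h_rt=2:00:00", ["python my_script.py"])` would write such a script (`my_script.py` is a placeholder for your own program), which could then be submitted with `qsub pytorch_job.sh`.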
Creation date: 6/16/2020 4:32 PM (huqy)      Updated: 8/16/2021 10:18 AM (huqy)