Using AlphaFold on Hoffman2

Creation date: 12/20/2023 11:59 AM    Updated: 12/20/2023 12:40 PM
This document explains how to run AlphaFold on the Hoffman2 High Performance Computing (HPC) cluster using the AlphaFold container, a packaged image that bundles AlphaFold. The container runs on Hoffman2 through the Apptainer container runtime.

Downloading Data

Start a job on a compute node

You first need to start a job on a compute node to use AlphaFold and Apptainer:

qrsh -l h_data=10G,h_rt=12:00:00

This example requests 10 GB of memory and a 12-hour time limit. A non-GPU node is sufficient if you are only downloading data. You can also submit the work as a non-interactive job with a qsub job script.

Setting Up the Data Directory

Before initiating AlphaFold, it's essential to download the required datasets. Start by setting the directory for data download:

export DOWNLOAD_DIR=$SCRATCH/alphafoldtest/data
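
Because the databases are large, it can be worth checking free space on the target file system before starting the download. The following is a minimal sketch, not part of the AlphaFold tooling: has_free_space is a hypothetical helper, the 3000 GB threshold is only a placeholder, and df may not reflect per-user quotas on shared file systems.

```shell
# Succeeds if the file system holding DIR has at least REQUIRED_GB free.
# (has_free_space is a hypothetical helper, not an AlphaFold or Hoffman2 command.)
has_free_space() {
  dir=$1
  required_gb=$2
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
  available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  required_kb=$((required_gb * 1024 * 1024))
  [ "$available_kb" -ge "$required_kb" ]
}

# Use DOWNLOAD_DIR if it is set; otherwise fall back to the current directory.
target=${DOWNLOAD_DIR:-$PWD}
mkdir -p "$target"

# 3000 GB is a placeholder threshold; adjust it to the databases you plan to download.
if has_free_space "$target" 3000; then
  echo "Enough free space in $target"
else
  echo "Not enough free space in $target"
fi
```

If the check fails, choose a different location for DOWNLOAD_DIR before running the download script.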


Execute the data download script

Utilize the download scripts provided by AlphaFold for setting up databases. These scripts are located at /app/alphafold/scripts within the container. Refer to the AlphaFold GitHub repository for a detailed list of these scripts.

Run the following command to download all necessary data:

module load apptainer

apptainer exec "$H2_CONTAINER_LOC/h2-alphafold.sif" /app/alphafold/scripts/download_all_data.sh "$DOWNLOAD_DIR"


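Note: Ensure there is adequate storage space in the specified directory before starting the download. As a rough guide, the AlphaFold README has listed the full databases at about 556 GB to download and about 2.62 TB once unzipped; check the figures for the AlphaFold version in the container.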
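
The download can also run as a non-interactive job. The sketch below writes a job script that mirrors the qrsh resources and the download command shown above; it only creates the file and does not submit anything. The script name and log file name are hypothetical, and it assumes the module command is available inside batch jobs on your account.

```shell
# Write a batch script for the database download; nothing is submitted here.
# The file name download_alphafold_data.sh is a hypothetical choice.
cat > download_alphafold_data.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -o download_alphafold_data.$JOB_ID.log
#$ -l h_data=10G,h_rt=12:00:00

# Assumption: `module` is defined in batch jobs. If it is not, source your
# site's module initialization script before this line.
module load apptainer

export DOWNLOAD_DIR="$SCRATCH/alphafoldtest/data"
apptainer exec "$H2_CONTAINER_LOC/h2-alphafold.sif" \
  /app/alphafold/scripts/download_all_data.sh "$DOWNLOAD_DIR"
EOF

echo "Wrote download_alphafold_data.sh"
```

To submit the job, run qsub download_alphafold_data.sh from a login node.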

Running AlphaFold

Start a job on a compute node

You first need to start a job on a compute node to use AlphaFold and Apptainer:

qrsh -l h_data=10G,h_rt=12:00:00,gpu,V100

This example requests 10 GB of memory, a 12-hour time limit, and a V100 GPU compute node. You can also submit the work as a non-interactive job with a qsub job script.
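
Before launching a long run, it can help to confirm that the session actually sees a GPU. This is a minimal sketch that assumes nvidia-smi is on the PATH of GPU nodes, which this guide does not state; gpu_visible is a hypothetical helper name.

```shell
# Succeeds if nvidia-smi exists and can list at least one GPU.
# (gpu_visible is a hypothetical helper; nvidia-smi on PATH is an assumption.)
gpu_visible() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

if gpu_visible; then
  echo "GPU detected in this session"
else
  echo "No GPU visible in this session"
fi
```

If no GPU is reported, check that the job was requested with the gpu resource before running AlphaFold with GPU relaxation.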

Setting Environment Variables

Establish the required environment variables for the data and output directories, as well as the path to your FASTA file:

export DOWNLOAD_DIR=$SCRATCH/alphafoldtest/data
export OUTPUT_DIR=$SCRATCH/alphafoldtest/output
export FASTA_PATHS=test.fasta
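
Since FASTA_PATHS here is a relative path, it resolves against the directory you launch from. A quick sanity check of the input file can catch a wrong path or a missing header early. The sketch below uses a hypothetical helper, check_fasta, and a made-up example sequence; it only tests that the file is readable and that its first non-empty line starts with ">".

```shell
# Succeeds if the file is readable and its first non-empty line is a FASTA header.
# (check_fasta is a hypothetical helper, not part of AlphaFold.)
check_fasta() {
  fasta=$1
  if [ ! -r "$fasta" ]; then
    echo "cannot read: $fasta" >&2
    return 1
  fi
  # Take the first line that contains any non-whitespace field.
  first_line=$(awk 'NF { print; exit }' "$fasta")
  case $first_line in
    '>'*) return 0 ;;
    *) echo "no FASTA header at top of: $fasta" >&2; return 1 ;;
  esac
}

# Example with a small made-up sequence written to a temporary file.
example=$(mktemp)
printf '>example_protein\nMKTAYIAKQRQISFVKSHFSRQ\n' > "$example"
check_fasta "$example" && echo "FASTA looks valid"
rm -f "$example"
```

In practice you would call check_fasta "$FASTA_PATHS" from the directory you plan to run AlphaFold in.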


Executing AlphaFold

AlphaFold can be run using the run_alphafold.sh script, located at /app/run_alphafold.sh within the container. Execute AlphaFold with the following command, assuming the use of a GPU for relaxation steps:

module load apptainer

apptainer exec --nv "$H2_CONTAINER_LOC/h2-alphafold.sif" /app/run_alphafold.sh \
  --fasta_paths="$FASTA_PATHS" \
  --max_template_date=2022-01-01 \
  --model_preset=monomer \
  --db_preset=full_dbs \
  --data_dir="$DOWNLOAD_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --use_gpu_relax=TRUE


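Adjust max_template_date to your needs, and choose the model_preset and db_preset that suit your project. In upstream AlphaFold, model_preset accepts monomer, monomer_casp14, monomer_ptm, or multimer, and db_preset accepts full_dbs or reduced_dbs; the db_preset must match the databases you downloaded. Before launching, verify that FASTA_PATHS, DOWNLOAD_DIR, and OUTPUT_DIR are set correctly.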
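
The check on the three variables can be scripted. The following is a minimal sketch with a hypothetical helper, require_vars, that reports any named variable that is unset or empty; it does not check that the paths themselves exist.

```shell
# Report each named variable that is unset or empty.
# Returns 0 if all are set, 1 otherwise.
# (require_vars is a hypothetical helper, not part of AlphaFold.)
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars FASTA_PATHS DOWNLOAD_DIR OUTPUT_DIR \
  || echo "Set the missing variables before running AlphaFold."
```

Running this in the same shell session, just before the apptainer command, catches the common case where the variables were exported in a different session.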