Using Ollama on Hoffman2

Creation date: 12/19/2024 4:11 PM    Updated: 12/19/2024 4:11 PM

Ollama is a tool for running and managing Large Language Models (LLMs). On Hoffman2, you can run Ollama via an Apptainer container prepared by the system administrators. Ollama provides a command-line interface and can also serve a web-based interface (Open WebUI) for interacting with LLMs through a browser. The prepared container can run the Open WebUI interface on top of the Ollama service, all locally on Hoffman2. This is a great option for running LLMs locally on Hoffman2, without needing a commercial/enterprise API license that runs the models on a vendor's servers.


Getting Started

1. Allocate a GPU node

To use Ollama, you will need access to a GPU node. While Ollama can run on CPU nodes, GPU nodes will provide significantly better performance. The following command requests one A100 GPU:

qrsh -l h_data=10G,gpu,A100,cuda=1
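
Once the interactive session starts on the GPU node, you can optionally confirm that a GPU is visible before continuing (assuming the NVIDIA drivers are available on the node):

nvidia-smi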


2. Load the Apptainer module


module load apptainer

Apptainer is the software used to run containerized software. The Ollama (and Open WebUI) setup used here is packaged in a container that will be run with Apptainer.
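
If you want to verify that the module loaded correctly, you can check the Apptainer version (the exact version reported will depend on the Hoffman2 installation):

apptainer --version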

3. Start the Ollama container

The prepared container is located at $H2_CONTAINER_LOC/h2-ollama.sif. Starting the container as an “instance” will also start the Ollama server in the background.

apptainer instance run --nv $H2_CONTAINER_LOC/h2-ollama.sif myollama

Here, myollama is the name you assign to the instance. You can choose any name.

4. Check the running instance


apptainer instance list

This will show all running instances. In this case, the instance we started, 'myollama', should appear in the output.

5. View instance logs


apptainer instance list --logs

This will show you the locations of the stdout and stderr logs for debugging if needed. These logs report the ports on which the Ollama services are running, along with other information you may need.
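
For example, you could follow the stderr log to watch the Ollama service start and see which port it selected. The path below is only illustrative of where Apptainer typically writes instance logs; use the actual paths printed by 'apptainer instance list --logs':

tail -f ~/.apptainer/instances/logs/$(hostname)/$USER/myollama.err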

Using Ollama

Once the instance is running, you can execute Ollama commands inside the container using apptainer exec.
- Pull a model (e.g., llama3.2):

apptainer exec instance://myollama ollama pull llama3.2


- Pull a model from Hugging Face:

apptainer exec instance://myollama ollama pull hf.co/lmstudio-community/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF

- List downloaded models:

apptainer exec instance://myollama ollama list


- Start a chat session with a model:

apptainer exec instance://myollama ollama run llama3.2


- Run a single prompt (e.g., summarizing a file):
Suppose you have a file test.txt. You can have Ollama summarize its contents as follows:

apptainer exec instance://myollama ollama run llama3.2 "Summarize this file: $(cat test.txt)"


In general, you can run any ollama command by adding "apptainer exec instance://myollama" in front of it:

apptainer exec instance://myollama ollama [command]
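
If you find the long prefix tedious to type, one optional convenience (just a suggestion, assuming the instance is named myollama) is to define a shell alias for your current session:

alias ollama='apptainer exec instance://myollama ollama'
ollama list    # now equivalent to: apptainer exec instance://myollama ollama list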



Stopping the Ollama Instance

When you are done, stop the Ollama instance:


apptainer instance stop myollama



Model Storage

By default, models are stored in $HOME/ollama_models. Once pulled, they remain there even after you stop the Ollama instance. To change this directory, set the OLLAMA_MODELS environment variable before starting the container instance:


export OLLAMA_MODELS=$SCRATCH/ollama_models
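
The key point is that the variable must be set in the same shell session before the instance is started. For example (the directory name is just an example, and creating it ahead of time is optional):

export OLLAMA_MODELS=$SCRATCH/ollama_models
mkdir -p "$OLLAMA_MODELS"
apptainer instance run --nv $H2_CONTAINER_LOC/h2-ollama.sif myollama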



Configuring Ports

Ollama uses port 11434 by default. If 11434 is unavailable, the Ollama container will try subsequent ports until it finds an open one. To specify a port manually, set the ollama_port environment variable before starting the instance:



export ollama_port=11434




Using the Open Web UI Interface

To use Ollama with an Open Web UI, start the instance with the openwebui option:


apptainer instance run --nv $H2_CONTAINER_LOC/h2-ollama.sif myollama openwebui


By default, the Open Web UI listens on port 8081, but it will try other ports if 8081 is not available. Check the instance logs to see the exact port used.
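
For example, you can search the instance's stderr log (at the path shown by 'apptainer instance list --logs') for the port Open WebUI bound to; the log path and grep pattern below are only suggestions, and the exact log wording may differ:

grep -i port ~/.apptainer/instances/logs/$(hostname)/$USER/myollama.err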


SSH Port Forwarding

To access Open WebUI from your local machine, your local machine needs access to the port on the compute node that is running Open WebUI. The best way to do this is with SSH tunneling:

1. In a new terminal, set up SSH port forwarding to the compute node:


ssh -L PORT:node_name:PORT username@hoffman2.idre.ucla.edu


Replace 'PORT' with the port on which Open WebUI is running, and 'node_name' with the compute node running Open WebUI. You may want to run the command 'hostname' on the Hoffman2 compute node to get the node_name.

For example:


ssh -L 8081:gX:8081 username@hoffman2.idre.ucla.edu


2. Once this SSH connection is established, open a web browser on your local machine and go to:


http://localhost:PORT


where 'PORT' is the port number on which Open WebUI is running. The page that loads will be your running Open WebUI!

The first time you connect, you will need to create a username/email and password. These credentials will be stored in your $HOME/webui/data directory on Hoffman2. Once set, no additional accounts can be created by default.


Changing the Data Directory

By default, Open Web UI stores its data at $HOME/webui/data. You can change this by setting the DATA_DIR environment variable before starting the container:


export DATA_DIR=$SCRATCH/webui/data



Changing the Open Web UI port

Similarly, you can change the default Open WebUI port by setting the webui_port environment variable before starting the container:


export webui_port=8081


If the port is unavailable, the UI will try other ports automatically.

You will want to check the apptainer instance logs to see the exact port numbers used.


Creating a custom Ollama container

The Ollama container provided on Hoffman2 was built using Apptainer by the Hoffman2 staff. You can also create your own custom Ollama container if you require additional software or customization. The definition file used to create the Hoffman2 Ollama container is available on our HPC GitHub page.

- Download this definition file.

- Customize the definition file as needed to install additional packages or modify the environment.

- Build your new container:


apptainer build my-new-ollama.sif h2-ollama.def


This will create a new container named my-new-ollama.sif. You can then use this new container with the same commands described above, simply replacing $H2_CONTAINER_LOC/h2-ollama.sif with my-new-ollama.sif.
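
For example, starting an instance from the custom container would look like this (using the container and instance names from above):

apptainer instance run --nv my-new-ollama.sif myollama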


Summary (TL;DR)

Start Ollama:


apptainer instance run --nv $H2_CONTAINER_LOC/h2-ollama.sif myollama


Pull a Model:


apptainer exec instance://myollama ollama pull llama3.2


Run a Model:

apptainer exec instance://myollama ollama run llama3.2

Stop Ollama:


apptainer instance stop myollama


Start Ollama with Open Web UI:


apptainer instance run --nv $H2_CONTAINER_LOC/h2-ollama.sif myollama openwebui


Then, port forward to access the UI:


ssh -L PORT:node_name:PORT username@hoffman2.idre.ucla.edu


Open in browser:


http://localhost:8081



Set environment variables before starting the instance to customize behavior:


export DATA_DIR=$SCRATCH/webui/data

export OLLAMA_MODELS=$HOME/ollama_models

export webui_port=8081

export ollama_port=11434



For help with using Ollama on Hoffman2, please contact Hoffman2 support.