
Launch GPUs using CLI to run an inference model

This blog explains how to easily access NVIDIA H100 GPUs using SF Compute and run an inference model in a Cog container. We'll use a ResNet50 model for image classification as an example.

SF Compute gives anyone flexible access to GPUs from the CLI, while Cog lets you package machine-learning models in a production-ready container. Together, they let developers start serving inference on H100 GPUs quickly.

Let's first set up a node using SF Compute's CLI. This will deploy an 8xH100 instance as an SF Compute VM.

To get started, sign up at sfcompute.com. Once your account is active, install SF Compute's CLI by following these steps.

curl -fsSL https://sfcompute.com/cli/install | bash

Then, log in to the CLI.

sf login # this will open a browser window

Once authenticated, use the following command to generate SSH keys automatically; they will be used later when connecting to your SF Compute instances.

sf ssh --init

SF Compute provides flexible access to compute: you can reserve H100 nodes for a couple of hours or for much longer periods. In this blog, we will buy a node for two hours, which is more than enough time for our inference example.

# Buy 2 hours, starting ASAP, on an H100 with IB (H100i)
sf buy -t h100i -d "2hr"

# Buy 2 hours, starting 1 hour from now
sf buy -t h100i -d "2hr" -s "+1hr"

Once you have placed an order, we will work on getting your instance ready. If the requested capacity is available, we will fill your contract, and you can check its status with:

sf contracts ls

[Screenshot: contract status output]

Once the contract is active and the instance order is filled, list your instances with the command below; you will see output like this:
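
sf instances ls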

[Screenshot: instance status output]

Once your instance is up, you can SSH into it with the following command. The default username is "ubuntu".

sf ssh <your instance id>

At this point, you are logged into your node and ready to put your model to work.

Let's now look into how you can run Cog on your newly created node.

Step 1: Install Cog

First, let's install Cog:

sudo curl -o /usr/local/bin/cog -L \
  https://github.com/replicate/cog/releases/\
latest/download/cog_$(uname -s)_$(uname -m)
sudo chmod +x /usr/local/bin/cog
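
To confirm the installation worked, print Cog's version:

cog --version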

Step 2: Set Up Your Project

Create a new directory for your project:

mkdir cog_inference_project
cd cog_inference_project
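
If you prefer, cog init will scaffold a starter cog.yaml and predict.py in this directory that you can then adapt; in this post we write both files by hand.

cog init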

Step 3: Create the Cog Configuration File

Create a file named cog.yaml with the following content:

build:
  gpu: true
  system_packages:
    - "python3-pip"
    - "python3-dev"
  python_version: "3.10"
  python_packages:
    - "torch"
    - "torchvision"
    - "Pillow==9.5.0"
    - "requests==2.26.0"
  run:
    - pip install --upgrade pip

predict: "predict.py:Predictor"

This configuration pins Python 3.10 and lets Cog pick a compatible CUDA base image (12.x on current releases) based on the torch version it resolves, so it's flexible enough to work with other recent versions.
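
If you want fully reproducible builds, you can also pin the CUDA version and the package versions explicitly in cog.yaml. The versions below are only an example of a matching torch/torchvision pair; adjust them to whatever you have tested:

build:
  gpu: true
  cuda: "12.1"
  python_version: "3.10"
  python_packages:
    - "torch==2.3.1"
    - "torchvision==0.18.1"
    - "Pillow==9.5.0"
    - "requests==2.26.0"

predict: "predict.py:Predictor"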

Step 4: Create the Predictor Script

Create a file named predict.py with the following content:

from cog import BasePredictor, Input, Path
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import requests

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts: choose a device and load the pretrained model
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(f"Using device: {self.device}")
        # Newer torchvision releases prefer models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.model = models.resnet50(pretrained=True).to(self.device)
        self.model.eval()

        # Standard ImageNet preprocessing for ResNet50
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

        # Human-readable ImageNet class labels
        LABELS_URL = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
        self.labels = requests.get(LABELS_URL).json()

    def predict(self, image: Path = Input(description="Image to classify")) -> str:
        # Load the image and apply the preprocessing defined in setup()
        img = Image.open(image).convert('RGB')
        img_tensor = self.transform(img).unsqueeze(0).to(self.device)

        # Run the forward pass without tracking gradients
        with torch.no_grad():
            output = self.model(img_tensor)

        # Take the highest-scoring class and map it to its human-readable label
        _, predicted_idx = torch.max(output, 1)
        predicted_label = self.labels[predicted_idx.item()]

        return f"The image is classified as: {predicted_label}"

Step 5: Model Weights

The pretrained ResNet50 weights are downloaded automatically by torchvision the first time setup() runs (they are cached inside the container, typically under ~/.cache/torch), so there is nothing to fetch by hand for this example.
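
If you would rather keep the weights in your project directory, for example to avoid a network download inside the container at setup time, one option is to fetch torchvision's own ResNet50 checkpoint. This is a sketch, assuming the standard torchvision checkpoint URL (check that it matches your torchvision version):

curl -L -o resnet50.pth https://download.pytorch.org/models/resnet50-0676ba61.pth

In setup() you would then build the model with models.resnet50() and load the file with self.model.load_state_dict(torch.load("resnet50.pth", map_location=self.device)) instead of passing pretrained=True.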

Step 6: Download a Sample Image

Download a sample image to test your model:

wget https://upload.wikimedia.org/wikipedia/commons/thumb/3/\
3a/Cat03.jpg/1200px-Cat03.jpg -O cat.jpg

Step 7: Run Inference

Now you can run inference on your sample image:

cog predict -i image=@cat.jpg

You should see output indicating the classification of the image, for example: "The image is classified as: tiger cat".
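
Cog can also serve the model behind an HTTP API, which is handy once you want to send requests to the node instead of running one-off predictions. A minimal sketch, following Cog's documented build-and-run flow (the image name my-resnet50 is just an example; Path inputs can be passed as URLs):

cog build -t my-resnet50
docker run -d -p 5000:5000 --gpus all my-resnet50
curl http://localhost:5000/predictions -X POST \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg"}}'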

Troubleshooting

If you encounter any issues:

  1. Ensure Docker is running: sudo systemctl status docker
  2. If cog predict fails with Docker-related errors, you may need to upgrade to a newer Docker version.
  3. Check CUDA availability on the host: nvidia-smi (see the snippet below for checking from inside the container).
  4. Verify the Python version: python3 --version
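
If the host looks healthy but predictions still fail, it can help to confirm that the GPU is visible from inside the Cog environment itself. A quick check, assuming the cog.yaml above with gpu: true:

cog run nvidia-smi
cog run python3 -c "import torch; print(torch.cuda.is_available())"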

Conclusion

You've now successfully set up and run an inference model with Cog on a node with eight NVIDIA H100 GPUs, all from the CLI! This setup provides a flexible and reproducible environment for your machine learning projects. You can modify the predict.py script to use different models or adjust the preprocessing steps as needed for your specific use case; a small example of such a modification is shown below.
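
As one example, here is a sketch of a predict() that returns the top five classes with their probabilities instead of a single label. It reuses the same setup() as above, and the output formatting is just illustrative:

    def predict(self, image: Path = Input(description="Image to classify")) -> str:
        img = Image.open(image).convert('RGB')
        img_tensor = self.transform(img).unsqueeze(0).to(self.device)

        with torch.no_grad():
            output = self.model(img_tensor)

        # Convert logits to probabilities and keep the five most likely classes
        probs = torch.nn.functional.softmax(output[0], dim=0)
        top_probs, top_idxs = torch.topk(probs, 5)
        results = [f"{self.labels[i]}: {p:.2%}" for p, i in zip(top_probs.tolist(), top_idxs.tolist())]
        return ", ".join(results)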