Deploying a Custom MLflow Model to Amazon SageMaker
What it takes to serve a GPU-backed, MLflow-packaged BERT model from a custom container.
Recently, I embarked on what I initially thought would be a straightforward journey: deploying a custom BERT-style model, trained on Databricks and packaged by MLflow, to Amazon SageMaker. Little did I know that this endeavor would lead me down a rabbit hole of CUDA errors, SageMaker-specific arguments, and scattered documentation. This two-week journey of dissecting SageMaker internals and scouring the web for information has equipped me with valuable insights that I’m eager to share. Consider this post your guide if you ever find yourself in a similar predicament.
Imagine a BERT-style model trained for token classification (essentially Named Entity Recognition or NER), enhanced with post-processing steps to refine its output by eliminating false positives based on a predefined list of terms. This model was developed on Databricks, leveraging its seamless MLflow integration for experiment tracking and logging. To maintain consistency, we opted to package the model as an MLflow bundle with a custom class encapsulating all necessary post-processing steps, with the intention of deploying it to SageMaker directly via MLflow.
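To make the setup concrete, here is a minimal sketch of what such a wrapper can look like. The class name, artifact keys, and deny-list logic below are illustrative stand-ins, not the actual proprietary code:

import mlflow.pyfunc


class NerWithPostprocessing(mlflow.pyfunc.PythonModel):
    """Illustrative wrapper: run the NER pipeline, then drop any
    predicted entities that appear on a deny-list of known false positives."""

    def load_context(self, context):
        from transformers import pipeline

        # context.artifacts maps names to local paths, as declared at log time
        self.ner = pipeline("token-classification", model=context.artifacts["model_dir"])
        with open(context.artifacts["denylist"]) as f:
            self.denylist = {line.strip().lower() for line in f if line.strip()}

    def predict(self, context, model_input):
        docs = self.ner(list(model_input))
        return [
            {"entities": [e for e in doc if e["word"].lower() not in self.denylist]}
            for doc in docs
        ]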
Our initial approach was to use MLflow’s built-in method for creating a SageMaker-compatible image:
mlflow sagemaker build-and-push-container
This command is designed to generate a SageMaker-compatible image for seamless deployment. However, we quickly discovered a significant limitation: the resulting image is CPU-only, unsuitable for our Large Language Model (LLM) needs (Yes, BERT is an LLM). This realization necessitated a pivot to manually creating and pushing our own GPU-enabled image to Amazon Elastic Container Registry (ECR) for SageMaker deployment.
Creating a Dockerfile is typically straightforward, but SageMaker introduces its own set of requirements and nuances. Key questions emerged:

- What entrypoint, if any, should the container define?
- What arguments does SageMaker pass when it starts the container?
- Which ports and endpoints does SageMaker expect the server to expose?
- How and where are the model artifacts made available inside the container?

These uncertainties weren't immediately clear from the available documentation.
Drawing inspiration from a colleague's SageMaker deployment Dockerfile, which curiously lacked an entrypoint, I initially opted for nvidia/cuda as the base image and attempted to use MLflow for model serving:
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
# Install Python and necessary packages
RUN apt update && apt install -y python3 python3-pip
RUN pip install mlflow==2.15.1 transformers torch sagemaker
# Set the entrypoint
ENTRYPOINT ["mlflow", "models", "serve", "-m", "/opt/ml/model", "-h", "0.0.0.0", "-p", "8080"]
This approach, however, led to an unexpected error regarding an unknown serve
argument. After some investigation and experimentation, including removing the
entrypoint entirely, we encountered more cryptic errors, suggesting deeper
issues with our container configuration.
After hitting multiple dead ends, I decided to take a step back and rethink our
approach. The next day, I opted to drop the use of MLflow for serving and
instead create a custom API inside the Docker image using FastAPI. We would
still use MLflow to load the model file, as our post-processing logic was
defined there. This approach allowed me to test the image locally, and it worked
fine. However, once deployed to SageMaker, we encountered the same serve
error.
This persistent issue prompted a deeper investigation into how SageMaker actually starts these containers. The AWS documentation, unfortunately, didn’t provide a clear, consolidated answer. Ironically, it was AWS’s Q AI that provided the crucial information, outlining exactly how SageMaker starts containers and what the requirements are for a container to work properly.
The key revelation was that SageMaker passes the serve argument when starting the container:
docker run --volume /path/to/model:/opt/ml/model image_name serve
This insight was a game-changer. It explained why our previous attempts failed and provided a clear direction for our solution. To handle this, we needed to make our entrypoint a shell script that could correctly consume and handle passed arguments.
Through further research and experimentation (mostly bugging Q about it), we pieced together the following requirements for a SageMaker-compatible container:
1. API Endpoints: The container must expose two specific endpoints:
   - /invocations for handling prediction requests
   - /ping for health checks
2. Port Binding: The Docker image must be labeled to accept port binding, as SageMaker dynamically changes the deployment port.
3. Port Configuration: The server inside the container should use the port specified in the SAGEMAKER_BIND_TO_PORT environment variable.
4. Custom Inference Script: We need to inform SageMaker that we're using a custom inference script by specifying its name as an environment variable when creating the Model object:
from sagemaker.model import Model

model = Model(
    image_uri='224534533583.dkr.ecr.us-east-1.amazonaws.com/custom-image:tag',
    role='arn:aws:iam::224534533583:role/service-role/AmazonSageMaker-ExecutionRole-20211224T123456',
    model_data='s3://bucket/model.tar.gz',
    env={'SAGEMAKER_PROGRAM': 'inference.py'}  # Custom inference script
)
These requirements led us to create the following three files: entrypoint.sh, inference.py, and requirements.txt.
#!/bin/bash
# entrypoint.sh
# SageMaker starts the container with "serve" as its only argument;
# we don't need to branch on it, so this script simply ignores it.

# Use the port SageMaker sets via SAGEMAKER_BIND_TO_PORT, or default to 8080
PORT=${SAGEMAKER_BIND_TO_PORT:-8080}

# Start the API server
# (yes, exec is actually necessary here: it replaces the shell so uvicorn
# becomes PID 1 and receives SageMaker's signals directly)
exec uvicorn inference:app --host 0.0.0.0 --port $PORT --workers 4
# inference.py
import mlflow.pyfunc
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict, Any

app = FastAPI()

# Load the model from the default SageMaker model directory
model = mlflow.pyfunc.load_model("/opt/ml/model")


class PredictionRequest(BaseModel):
    text: List[str]


class PredictionResponse(BaseModel):
    predictions: List[Dict[str, Any]]


@app.get("/ping")
def ping():
    # SageMaker calls this endpoint for health checks
    return {"status": "ok"}


@app.post("/invocations", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        predictions = model.predict(request.text)
        return PredictionResponse(predictions=predictions)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    # Fallback for running the server outside the entrypoint script
    import uvicorn
    import os

    # SAGEMAKER_BIND_TO_PORT is a string; uvicorn expects an int
    port = int(os.environ.get("SAGEMAKER_BIND_TO_PORT", 8080))
    uvicorn.run(app, host='0.0.0.0', port=port)
# requirements.txt
mlflow==2.15.1
cloudpickle==2.2.1
fastapi
uvicorn
pydantic
transformers
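Before involving SageMaker at all, it's worth smoke-testing the container locally. Here's a sketch (not from the original setup) assuming the image is running with the model mounted and port 8080 published, and that the requests package is available on your machine:

# Hypothetical local smoke test. Start the container first, e.g.:
#   docker run -p 8080:8080 -v /path/to/model:/opt/ml/model custom-image:tag
import requests

# Health check: should return {"status": "ok"}
print(requests.get("http://localhost:8080/ping").json())

# Prediction request in the same shape SageMaker will forward to /invocations
resp = requests.post(
    "http://localhost:8080/invocations",
    json={"text": ["This is a sample text"]},
)
print(resp.status_code, resp.json())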
Even with these requirements in place, we faced two more significant hurdles:
1. CUDA Version Issues: The CUDA version of the Docker image needed to match the instance type's CUDA version, at least on the major version. This was particularly challenging because, as usual, the CUDA versions for SageMaker instance types aren't documented anywhere.
2. Python Version Compatibility: The Python version, along with some dependencies like MLflow and cloudpickle, needed to match the versions used during model packaging. This led to our decision to use pyenv in the Dockerfile to ensure we had the correct Python version.
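To find out which versions a given bundle expects, one option is to read the environment files MLflow writes next to the MLmodel file, and to check which CUDA version the installed torch build targets. A sketch, assuming the unpacked model lives at ./model and a recent MLflow version (which writes python_env.yaml):

# Read the Python version recorded at packaging time from the MLflow bundle
import yaml

with open("model/python_env.yaml") as f:  # path to the unpacked MLflow model
    env = yaml.safe_load(f)
print("Python at packaging time:", env["python"])  # e.g. '3.11.4'

# Check which CUDA version the installed torch build was compiled against
import torch

print("torch CUDA version:", torch.version.cuda)  # e.g. '11.8'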
These additional considerations resulted in our final, more complex Dockerfile:
FROM nvidia/cuda:11.4.3-runtime-ubuntu20.04
# sagemaker labels
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
# install system build dependencies (required by pyenv)
ENV HOME="/root"
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR ${HOME}
RUN apt update && apt install -y \
build-essential \
curl \
git \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
lzma \
liblzma-dev \
wget \
xz-utils \
tk-dev \
libffi-dev \
python3-dev \
gnupg
# cleanup to reduce image size
RUN apt clean && rm -rf /var/lib/apt/lists/*
# install pyenv
RUN git clone --depth=1 https://github.com/pyenv/pyenv.git .pyenv
ENV PYENV_ROOT="${HOME}/.pyenv"
ENV PATH="${PYENV_ROOT}/shims:${PYENV_ROOT}/bin:${PATH}"
# install correct python version
ENV PYTHON_VERSION=3.11
RUN pyenv install ${PYTHON_VERSION}
RUN pyenv global ${PYTHON_VERSION}
# install compatible pytorch version
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install necessary Python packages
COPY ./requirements.txt .
RUN pip install -r requirements.txt
# copy code
COPY . /app
WORKDIR /app
# Define the entry point for the container
RUN chmod +x entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
With the Dockerfile and associated files in place, we were ready to build and push the image to Amazon ECR. The final directory structure looked like this, with each file serving a specific purpose:
.
├── Dockerfile # Contains all the build instructions and environment setup
├── entrypoint.sh # Script that handles container startup and server initialization
├── inference.py # FastAPI application with SageMaker-required endpoints
└── requirements.txt # Python package dependencies needed for the model
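One step this layout doesn't cover is producing the model.tar.gz that model_data points to: SageMaker unpacks that archive into /opt/ml/model, so the MLflow files need to sit at the archive root. A sketch of how one might package and upload it (paths and bucket names are placeholders):

import tarfile

import boto3

# Tar the MLflow model directory so MLmodel, conda.yaml, etc. end up
# directly under /opt/ml/model when SageMaker unpacks the archive
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("/path/to/mlflow_model", arcname=".")  # arcname="." avoids nesting

# Upload to S3 where the Model object will look for it
boto3.client("s3").upload_file("model.tar.gz", "bucket_name", "path/to/model.tar.gz")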
To build and push our image to ECR, we used an EC2 instance with sufficient resources. Make sure you have Docker and the AWS CLI installed and configured on the instance. Here's the step-by-step build process:
# 1. Create the ECR repository (one-time setup)
aws ecr create-repository --repository-name custom-image --region us-east-1

# 2. Build the image, tagged with the full ECR repository URI
docker build -t 224534533583.dkr.ecr.us-east-1.amazonaws.com/custom-image:tag .

# 3. Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 224534533583.dkr.ecr.us-east-1.amazonaws.com

# 4. Push the image
docker push 224534533583.dkr.ecr.us-east-1.amazonaws.com/custom-image:tag
Make sure to replace the account ID (224534533583) and region with your own. The push might take a while depending on the image size and network speed (very long for CUDA-based images).
With the image successfully pushed to ECR, we were ready to deploy the model to SageMaker. We used the following Python code to create the Model object and deploy it:
import sagemaker
from sagemaker.model import Model

# Define your Docker image URI and S3 path to model artifacts
docker_image_uri = '224534533583.dkr.ecr.us-east-1.amazonaws.com/custom-image:tag'
model_artifacts = 's3://bucket_name/path/to/model.tar.gz'
role_arn = 'arn:aws:iam::224534533583:role/service-role/AmazonSageMaker-ExecutionRole-20211224T123456'

sagemaker_session = sagemaker.Session()

# Create a SageMaker model
model = Model(
    image_uri=docker_image_uri,
    model_data=model_artifacts,
    role=role_arn,
    sagemaker_session=sagemaker_session,
    env={'SAGEMAKER_PROGRAM': 'inference.py'}
)

# Deploy the model to an endpoint
model.deploy(
    initial_instance_count=1,
    instance_type='ml.p3.2xlarge'  # CUDA 11.4
)
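Since deploy() generates the endpoint name, you can also look it up programmatically instead of digging through the console. A small sketch, assuming default AWS credentials and region are configured:

import boto3

# List the most recently created endpoints and their statuses
sm = boto3.client("sagemaker")
response = sm.list_endpoints(SortBy="CreationTime", SortOrder="Descending")
for ep in response["Endpoints"]:
    print(ep["EndpointName"], "-", ep["EndpointStatus"])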
With the model deployed, we could finally test it by sending a sample request to the endpoint:
import boto3
import json

# Create a SageMaker runtime client
client = boto3.client('sagemaker-runtime')

# Define the endpoint name (can be found in the SageMaker console),
# generally in the form 'custom-image-endpoint-20211224123456'
endpoint_name = 'custom-image-endpoint'

# Define the sample request
request = {
    'text': ['This is a sample text']
}

# Send the request to the endpoint
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(request),
    ContentType='application/json'
)

# Parse the response
response_body = json.loads(response['Body'].read().decode())
print(json.dumps(response_body, indent=2))
This journey through the intricacies of custom SageMaker deployment has been both challenging and enlightening. It underscores the complexity of deploying sophisticated machine learning models in cloud environments and the importance of understanding the underlying infrastructure. Key takeaways include the critical value of proper documentation (if anyone from AWS is reading this) and of persistence in problem-solving. We've learned that successful deployment often requires a deep dive into system internals, a willingness to experiment, and the ability to piece together information from various sources.

This process has not only resulted in a working solution but has also equipped us with valuable knowledge for future deployments. Developing the model that was deployed is another story for another blog post, when I'm bored enough to recreate it with non-proprietary data (brainrot classifier?).