
Bare minimum bring your own model on SageMaker

Updated January 23, 2026


Let's say you receive a notebook from a co-worker with a model and are tasked with getting it up and running in a production setting. When running on AWS, you could use AWS SageMaker for this task. The SageMaker documentation might appear rather daunting at first, with a wall of text and little example code.

This blog post shows the bare minimum code required to train and deploy a (custom) model on AWS SageMaker. SageMaker also comes with a number of pre-built Docker images, so if your framework is supported it might be easier to use one of those.

Training and serving components

SageMaker provides a framework to both train and deploy (something they call "serve") your models. Everything is run in Docker containers so let's start with a bare minimum Dockerfile:

FROM continuumio/miniconda3:4.7.10

The images must hold two executables:

an executable named train
an executable named serve

SageMaker will run docker run yourimage train for training and docker run yourimage serve for serving. Therefore, both executables must be present in the image. Disregarding all the infrastructure, the SageMaker setup roughly works as follows:

As you can see in the picture, AWS S3 holds the training artifacts, and SageMaker loads them again when serving the model. SageMaker mounts a volume at /opt/ml on the container for persisting any artifacts, and expects a specific directory structure:

/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   └── resourceConfig.json
│   └── data
│       └── 
│           └── 
├── model
│   └──

Files such as hyperparameters.json are configurable and automatically mounted into the container, but they are out of scope for this blog post (read more here).
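Although hyperparameters are not covered further here, the following is a minimal sketch of how a train script might read them, assuming the file sits at the path shown in the directory tree above. The helper names load_hyperparameters and get_float are made up for illustration. SageMaker passes hyperparameter values as strings, so they need to be cast before use.

```python
import json
from pathlib import Path

# Location of the file inside a SageMaker training container (see the tree above).
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Return the hyperparameters as a dict, or {} if the file is absent."""
    path = Path(path)
    if not path.exists():
        # Keeps the script usable outside SageMaker, e.g. in local tests.
        return {}
    with path.open() as fh:
        return json.load(fh)


def get_float(params, name, default):
    """Look up a hyperparameter and cast it to float; values arrive as strings."""
    return float(params.get(name, default))
```

Falling back to an empty dict is a design choice for this sketch: it lets the same train script run locally without the SageMaker directory layout.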

Let's first create a (dummy) training job (this is our train executable):

#!/usr/bin/env python

def train():
    print("Do training...")
    with open("/opt/ml/model/model", "w") as fh:
        fh.write("...model parameters here...")

if __name__ == "__main__":
    train()

The training job (hypothetically) trains the model and writes the output to /opt/ml/model/model. The file name inside /opt/ml/model/ doesn't matter, but we'll call it model for now. After training, SageMaker archives the contents of /opt/ml/model in an archive named model.tar.gz on S3, which is loaded again when serving. Note that this script is executed "as is": without #!/usr/bin/env python at the top, the OS wouldn't know this is a Python script and wouldn't know what to do with it.
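To get a feel for what ends up in model.tar.gz, here is a rough local imitation of the archiving step using only the standard library. This is a sketch of the idea, not SageMaker's actual implementation, and the function names pack_model_dir and list_archive are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_model_dir(model_dir, archive_path):
    """Write the contents of model_dir into a gzipped tar archive."""
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            # arcname drops the directory prefix, so files sit at the archive root.
            tar.add(str(item), arcname=item.name)
    return archive_path


def list_archive(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Running pack_model_dir on a local copy of your model directory and then list_archive on the result is a quick way to check which files the serving container will find after extraction.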

Next, let's check out the serving code. SageMaker offers two variants for deployment: (1) hosting an HTTPS endpoint for single inferences and (2) batch transform for running inference on multiple items. Batch transform is out of scope for this blog post, but only small changes are required to get that up and running.

For HTTPS hosting, SageMaker requires a REST API on port 8080 with two endpoints:

/ping for health checks
/invocations for inference

#!/usr/bin/env python

from flask import Flask, Response

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    return Response(response="\n", status=200)

@app.route("/invocations", methods=["POST"])
def predict():
    return Response(response="predict here...", status=200)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Important to note here is the port number; SageMaker requires it to run on 8080.
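The dummy predict() above returns a fixed string. A real serve script would typically load the trained artifact once at startup and use it for every request. The following is a minimal sketch of that idea, assuming the artifact is the plain text file named model written by the train script. The predict_with_model function is a hypothetical stand-in for real inference.

```python
import os

# Directory where the trained artifacts are available in the serving container.
MODEL_DIR = "/opt/ml/model"


def load_model(model_dir=MODEL_DIR, filename="model"):
    """Read the artifact written by the train script and return its contents."""
    with open(os.path.join(model_dir, filename)) as fh:
        return fh.read()


def predict_with_model(model, payload):
    """Hypothetical stand-in for inference: report the model and payload size."""
    return f"model={model!r} received {len(payload)} bytes"
```

In the Flask app you could call load_model() once at module level and pass the result, together with the request body, to predict_with_model() inside the /invocations handler.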

The Dockerfile with these two scripts and the Flask API looks as follows:

FROM continuumio/miniconda3:4.7.10

RUN pip install flask
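COPY serve /usr/local/bin
COPY train /usr/local/bin
# Make sure both scripts are executable, whatever their permissions on the host
RUN chmod +x /usr/local/bin/serve /usr/local/bin/train
EXPOSE 8080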

The serve and train scripts are added to /usr/local/bin because this directory is on the PATH in the miniconda3 image. You can place them in any directory as long as it's on the PATH.
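Before pushing the image anywhere, it can be worth smoke-testing the two endpoints locally. The sketch below assumes the container was started with something like docker run -p 8080:8080 yourimage serve, where the -p port mapping is an addition for local testing, and uses only the standard library. The function name check_container is hypothetical.

```python
import urllib.request


def check_container(base_url="http://localhost:8080", payload=b"hello"):
    """Call /ping and /invocations and return (ping_status, prediction_bytes)."""
    # Health check: GET /ping should answer with HTTP 200.
    with urllib.request.urlopen(f"{base_url}/ping") as resp:
        ping_status = resp.status

    # Inference: send the payload to /invocations as a POST request.
    request = urllib.request.Request(
        f"{base_url}/invocations", data=payload, method="POST"
    )
    with urllib.request.urlopen(request) as resp:
        prediction = resp.read()

    return ping_status, prediction
```

Against the dummy serve script from this post, the call should return a status of 200 and whatever the /invocations handler responds with. A failing /ping shows up as an exception from urlopen.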

Bringing it together

SageMaker provides a (Python) framework to connect these dots. First, we must push the Docker image to ECR to make it available for SageMaker:

docker build -t helloworld .

account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
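# "aws ecr get-login" was removed in AWS CLI v2; get-login-password replaces it
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"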
aws ecr create-repository --repository-name helloworld

docker tag helloworld "${account}.dkr.ecr.${region}.amazonaws.com/helloworld:latest"
docker push "${account}.dkr.ecr.${region}.amazonaws.com/helloworld:latest"

Next, start a SageMaker notebook and run the following code for training the model:

import sagemaker

role = sagemaker.get_execution_role()
session = sagemaker.Session()

account = session.boto_session.client("sts").get_caller_identity()["Account"]
region = session.boto_session.region_name
image = f"{account}.dkr.ecr.{region}.amazonaws.com/helloworld:latest"

model = sagemaker.estimator.Estimator(
    image_name=image,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.c4.2xlarge",
    sagemaker_session=session,
)

model.fit()

The important thing here is the Estimator class, which is a wrapper class for training any supplied algorithm. Provide the correct arguments such as image_name and train_instance_type. (The argument names shown are those of version 1 of the SageMaker Python SDK; from version 2 onwards they are called image_uri, instance_count and instance_type.) SageMaker will spin up a machine of the given type and train your model on it. Note that the code above doesn't do much: spinning up a machine just to run print("Do training...") and save a dummy file is obviously unnecessary, but a beefy machine is often required for the real job.

The output of this code will look something like this:

2019-10-04 14:30:58 Starting - Starting the training job...
2019-10-04 14:31:00 Starting - Launching requested ML instances......
2019-10-04 14:32:00 Starting - Preparing the instances for training...
2019-10-04 14:32:41 Downloading - Downloading input data
2019-10-04 14:32:41 Training - Downloading the training image..
2019-10-04 14:33:08 Uploading - Uploading generated training model
2019-10-04 14:33:08 Completed - Training job completed
Training seconds: 35
Billable seconds: 35

The result of this is an archive model.tar.gz on AWS S3 holding the artifacts of the model, which could be a pickled model, a plain text file holding the model parameters, or whatever you find useful when loading the model in the serving stage. Let's inspect the code for serving:

predictor = model.deploy(1, 'ml.t2.medium', endpoint_name="helloworld")

Again, this spins up a machine of the given type and exposes the model on a given endpoint name, in this case "helloworld". We can call this endpoint from our notebook:

predictor.predict(data="\n")

b'predict here...'

This endpoint is now callable from our Python notebook. To make it available to the outside world, you could expose it with a Lambda and API Gateway, as explained in this blog post.
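Outside the notebook, for example inside a Lambda function, the endpoint can be called through the sagemaker-runtime client of boto3. The following is a minimal sketch, assuming boto3 is installed and the caller has permission to invoke the endpoint. The function name invoke, the optional client argument and the text/plain content type are choices made for this illustration.

```python
def invoke(endpoint_name, payload, client=None, content_type="text/plain"):
    """Send payload to a SageMaker endpoint and return the raw response bytes."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("sagemaker-runtime")

    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read it fully to get the prediction.
    return response["Body"].read()
```

Calling invoke("helloworld", b"some input") would send the bytes to the /invocations route of the container deployed above. Accepting the client as a parameter keeps the function easy to test without AWS credentials.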

Takeaways

Some things I learned while getting a custom model up and running on SageMaker:

The SageMaker developer documentation is quite lengthy and it's hard to find concrete examples. A useful resource was the SageMaker examples repository.
The train and serve scripts must be executable and must be run as Python code (hence the shebang line). Test them locally after building with docker run yourimage train or docker run yourimage serve.
Your model must be written to /opt/ml/model/....
Your API must be exposed on port 8080.

I hope this helps you get your custom model up and running quickly on SageMaker.
