Let’s say you receive a notebook with a model from a co-worker and are tasked with getting it up and running in a production setting. When running on AWS, you can use AWS SageMaker for this task. The SageMaker documentation can appear rather daunting at first: a wall of text and little example code.
This blog post shows the bare minimum code required to train and deploy a (custom) model on AWS SageMaker. SageMaker also comes with a number of pre-built Docker images; if your framework is supported, it might be easier to use one of those.
Training and serving components
SageMaker provides a framework to both train and deploy (something they call "serve") your models. Everything runs in Docker containers, so let’s start with a bare minimum Dockerfile:
FROM continuumio/miniconda3:4.7.10
The image must hold two executables:

- an executable named train
- an executable named serve

SageMaker will run docker run yourimage train and docker run yourimage serve for training and serving respectively, so both executables must be present in the image. Disregarding all the infrastructure, the SageMaker setup roughly works as follows:
As you can see in the picture, AWS S3 holds the training artifacts, which are loaded into a volume at /opt/ml when serving the model. SageMaker mounts this volume on the container for persisting any artifacts, and expects a specific directory structure:
/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   └── resourceConfig.json
│   └── data
│       └── <channel_name>
│           └── <input data>
└── model
    └── <model files>
Files like hyperparameters.json are configurable and automatically mounted into the container, but they are out of scope for this blog post (read more here).
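As a rough sketch (not needed for the dummy example that follows), the train script could pick up hyperparameters from that mounted config file like this; the learning-rate key is made up for illustration:

import json

# SageMaker writes the hyperparameters passed to the training job to this file
with open("/opt/ml/input/config/hyperparameters.json") as fh:
    hyperparameters = json.load(fh)

# SageMaker passes all values as strings, so cast them yourself
learning_rate = float(hyperparameters.get("learning-rate", "0.01"))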
Let’s first create a (dummy) training job (this is our train executable):
#!/usr/bin/env python

def train():
    print("Do training...")
    with open("/opt/ml/model/model", "w") as fh:
        fh.write("...model parameters here...")

if __name__ == "__main__":
    train()
The training job (hypothetically) trains the model and writes the output to /opt/ml/model/model. The file name inside /opt/ml/model/ doesn’t matter, but we’ll call it model for now. After training, SageMaker archives the contents of /opt/ml/model in an archive named model.tar.gz on S3, which is loaded again when serving. It’s important to note that this script is executed "as is": without #!/usr/bin/env python at the top, the OS wouldn’t know this is a Python script and wouldn’t know what to do with it.
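One thing that’s easy to forget: both files must carry the executable bit, otherwise docker run yourimage train fails with a permission error. A simple way to ensure this is to mark them as executable before building the image:

# mark both entry points as executable before building the image
chmod +x train serve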
Next, let’s check out the serving code. SageMaker offers two variants for deployment: (1) hosting an HTTPS endpoint for single inferences and (2) batch transform for running inference on multiple items at once. Batch transform is out of scope for this blog post, but only small changes are required to get it up and running.
For HTTPS hosting, SageMaker requires a REST API on port 8080 with two endpoints:

- /ping for health checks
- /invocations for inference
#!/usr/bin/env python
from flask import Flask, Response

app = Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    return Response(response="\n", status=200)


@app.route("/invocations", methods=["POST"])
def predict():
    return Response(response="predict here...", status=200)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Important to note here is the port number: SageMaker requires the API to run on port 8080.
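When serving, SageMaker downloads model.tar.gz from S3 and extracts it back into /opt/ml/model, so the predict handler can read whatever the training job wrote there. For our dummy model, loading it could look roughly like this (not part of the minimal example above):

MODEL_PATH = "/opt/ml/model/model"  # the file written by our train script

def load_model():
    # SageMaker extracts model.tar.gz into /opt/ml/model before running "serve"
    with open(MODEL_PATH) as fh:
        return fh.read()

model_parameters = load_model()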
The Dockerfile with these two scripts and the Flask API looks as follows:
FROM continuumio/miniconda3:4.7.10

RUN pip install flask

COPY serve /usr/local/bin
COPY train /usr/local/bin

EXPOSE 8080
The serve and train scripts are added to /usr/local/bin because this directory is on the PATH in the miniconda3 image. You can place them in any directory, as long as it’s on the PATH.
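Before pushing anything to AWS, it’s worth smoke-testing the image locally. A minimal check could look like this (the helloworld tag matches the one used in the next section, and a local directory is mounted as /opt/ml/model so the dummy training script has somewhere to write):

docker build -t helloworld .

# training: should print "Do training..." and write the dummy model file
docker run -v "$(pwd)/model:/opt/ml/model" helloworld train

# serving: expose port 8080 and hit the health check endpoint
docker run -p 8080:8080 helloworld serve
curl localhost:8080/ping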
Bringing it together
SageMaker provides a (Python) framework to connect these dots. First, we must push the Docker image to ECR to make it available for SageMaker:
docker build -t helloworld .

account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)

$(aws ecr get-login --region ${region} --no-include-email)

aws ecr create-repository --repository-name helloworld

docker tag helloworld "${account}.dkr.ecr.${region}.amazonaws.com/helloworld:latest"
docker push ${account}.dkr.ecr.${region}.amazonaws.com/helloworld:latest
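A side note: the aws ecr get-login command used above has been removed in AWS CLI v2. If you’re on v2, the equivalent login step looks roughly like this:

aws ecr get-login-password --region ${region} | \
    docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"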
Next, start a SageMaker notebook and run the following code for training the model:
import sagemaker

role = sagemaker.get_execution_role()
session = sagemaker.Session()

account = session.boto_session.client("sts").get_caller_identity()["Account"]
region = session.boto_session.region_name
image = f"{account}.dkr.ecr.{region}.amazonaws.com/helloworld:latest"

model = sagemaker.estimator.Estimator(
    image_name=image,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.c4.2xlarge",
    sagemaker_session=session,
)

model.fit()
The important thing here is the Estimator class, which is a wrapper class for training any supplied algorithm. Provide the correct arguments, such as image_name and train_instance_type, and SageMaker will spin up a machine of the given type and train your model on it. Note the code above doesn’t do much – having to spin up a machine just for running print("Do training...") and saving a dummy file is obviously unnecessary, but a beefy machine is often required for the real job.
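For a real job you would typically also pass hyperparameters and point fit() at training data on S3; SageMaker then makes these available under /opt/ml/input inside the container. A rough sketch (the bucket path, channel name and hyperparameter values are made up for illustration):

# hyperparameters end up in /opt/ml/input/config/hyperparameters.json (as strings)
model = sagemaker.estimator.Estimator(
    image_name=image,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.c4.2xlarge",
    sagemaker_session=session,
    hyperparameters={"learning-rate": "0.01"},
)

# each key in the dict becomes a channel directory under /opt/ml/input/data/
model.fit({"training": "s3://my-bucket/path/to/training-data"})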
The output of the training code above will look something like this:
2019-10-04 14:30:58 Starting - Starting the training job...
2019-10-04 14:31:00 Starting - Launching requested ML instances......
2019-10-04 14:32:00 Starting - Preparing the instances for training...
2019-10-04 14:32:41 Downloading - Downloading input data
2019-10-04 14:32:41 Training - Downloading the training image..
2019-10-04 14:33:08 Uploading - Uploading generated training model
2019-10-04 14:33:08 Completed - Training job completed
Training seconds: 35
Billable seconds: 35
The result of this is an archive model.tar.gz on AWS S3 holding the artifacts of the model, which could be a pickled model, a plain text file holding the model parameters, or whatever you find useful when loading the model in the serving stage. Let’s inspect the code for serving:
predictor = model.deploy(1, 'ml.t2.medium', endpoint_name="helloworld")
Again, this spins up a machine of the given type and exposes the model on a given endpoint name, in this case "helloworld". We can call this endpoint from our notebook:
predictor.predict(data="\n")

b'predict here...'
This endpoint is now callable from our Python notebook. To make it available to the outside world, you could put a Lambda and API Gateway in front of it, as explained in this blog post.
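If you want to call the endpoint from outside the SageMaker SDK (for example from that Lambda function), the sagemaker-runtime client in boto3 can invoke it directly. A rough sketch:

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="helloworld",
    ContentType="text/plain",  # whatever content type your serve code expects
    Body=b"\n",
)
print(response["Body"].read())  # b'predict here...'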
Takeaways
Some takeaways I learned getting a custom model up and running on SageMaker:
- The SageMaker developer documentation is quite lengthy and it’s hard to find concrete examples. A useful resource was the SageMaker examples repository.
- The train and serve scripts must be executable and are executed as Python code; test locally after building with docker run yourimage train or docker run yourimage serve.
- Your model must be written to /opt/ml/model/....
- Your API must be exposed on port 8080.
Hope this helps you get your custom model up and running quickly on SageMaker.