AWS Endpoints

Brian Wylie edited this page Oct 3, 2023 · 9 revisions

AWS SageMaker Endpoints

When deploying a machine learning model using AWS SageMaker, it's important to understand the architectural components involved. This will provide a more in-depth understanding of what happens under the hood when you make an inference request. Below is a breakdown of the different layers typically found in an AWS SageMaker real-time endpoint.

Components of a SageMaker Endpoint

Web Server

AWS provides a customized lightweight web server that gets deployed to the instance running your model/endpoint. It's pre-configured by AWS to accept and forward HTTP requests to the RESTful API layer. While you don't directly manage this web server, it plays a crucial role in the process.

RESTful API

The RESTful API is ready to handle requests forwarded from the web server. This layer handles any custom logic, as well as pre-processing and post-processing steps, before invoking the actual machine learning model.

Model Script

This layer contains your machine learning model code, which is invoked by the RESTful API. This code can involve additional logic and transformations before and after the model performs its prediction/inference. See the SageMaker Model Script section below.

Model

This is the underlying machine learning model that performs the actual inference. It can be built with a variety of frameworks, such as XGBoost, TensorFlow, or PyTorch, and is responsible for taking the processed input and returning an inference or prediction.

SageMaker Model Script

Note: The important context here is that we're using the scikit-learn framework for our model script/endpoint. The scikit-learn model scripts use XGBoost models internally, which gives us the best of both worlds. See the XGBoost Framework section below.

In a SageMaker environment using the scikit-learn framework (calling an XGBoost model), the model script serves as the main entry point for both training and inference tasks. It has distinct responsibilities and implements specific methods that SageMaker calls during the lifecycle of the model. Below is an overview of these responsibilities and methods:

Responsibilities

__main__ Block

  • Data Retrieval: Pull in training data from S3.
  • Data Preparation: Split data into training and validation sets.
  • Model Creation: Initialize the scikit-learn model.
  • Model Training: Train the model on the prepared data.
  • Model Saving: Save the trained model using joblib to a specified directory, typically accessible by SageMaker.

Required Methods

model_fn(model_dir)

  • Purpose: Deserializes and returns the fitted model.
  • Arguments:
    • model_dir: The directory where model files are stored.

input_fn(input_data, content_type)

  • Purpose: Preprocesses incoming inference requests.
  • Arguments:
    • input_data: The payload for the inference request.
    • content_type: MIME type of the incoming payload.

output_fn(output_df, accept_type)

  • Purpose: Post-processes the inference output.
  • Arguments:
    • output_df: DataFrame or other structure containing the inference results.
    • accept_type: The expected MIME type for the response payload.

predict_fn(df, model)

  • Purpose: Makes predictions using the deserialized model.
  • Arguments:
    • df: DataFrame or other data structure containing the input data.
    • model: The deserialized machine learning model.
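Putting the four methods together, a sketch of the inference side of the script might look like the following. The CSV-only content handling and the `model.joblib` file name are simplifying assumptions for illustration:

```python
import io
import os
import joblib
import pandas as pd

def model_fn(model_dir):
    """Deserialize and return the fitted model."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def input_fn(input_data, content_type):
    """Preprocess the incoming request payload into a DataFrame."""
    if content_type == "text/csv":
        return pd.read_csv(io.StringIO(input_data))
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(df, model):
    """Run inference and attach the predictions to the input frame."""
    df = df.copy()
    df["prediction"] = model.predict(df)
    return df

def output_fn(output_df, accept_type):
    """Serialize the results into the requested response format."""
    if accept_type == "text/csv":
        return output_df.to_csv(index=False)
    raise ValueError(f"Unsupported accept type: {accept_type}")
```

SageMaker calls these in order at inference time: `model_fn` once at container startup, then `input_fn` → `predict_fn` → `output_fn` for each request.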

This script provides a structured way to manage both the training and inference phases, making it easier to deploy and maintain models in a SageMaker environment.

Why not just use the XGBoost Framework for Model/Endpoint?

Short Answer: If you like pain, this is a good option.

Limitations of XGBoost Framework Endpoints in SageMaker

When deploying machine learning models in SageMaker, using a scikit-learn framework that internally calls XGBoost models can offer more flexibility compared to deploying with an XGBoost framework endpoint. Below are key limitations of using an XGBoost framework endpoint:

Input/Output Serialization

The XGBoost framework endpoint supports only 'bytes' for input and output. Making an endpoint request where you must format the HTTP payload as raw 'bytes' is often more complicated and error-prone than when the endpoint supports CSV/JSON serializers. The same goes for the outputs: processing the raw 'bytes' on the receiving end of the predictions can be slightly more complicated as well.
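To illustrate the difference, with a CSV-capable endpoint the payload preparation is a one-line `to_csv` call, while a bytes-only endpoint pushes the hand-assembly (and hand-parsing) onto you. The DataFrame contents and the response body below are illustrative, not a real endpoint exchange:

```python
import io
import pandas as pd

df = pd.DataFrame({"f1": [1.0, 2.0], "f2": [3.0, 4.0]})

# With a CSV serializer, building the request body is trivial:
csv_payload = df.to_csv(index=False, header=False)

# With a bytes-only endpoint you hand-assemble the raw payload and
# must get details like separators and header handling right yourself:
byte_payload = "\n".join(
    ",".join(str(v) for v in row) for row in df.to_numpy()
).encode("utf-8")

# Parsing a CSV response back into a usable structure is equally direct:
response_body = "0.87\n0.13\n"  # hypothetical prediction response
preds = pd.read_csv(io.StringIO(response_body), header=None)[0].tolist()
```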

Exact Column Match

  • The target column must be the first column.
  • Features must be in the exact same order as they were during training.
  • You cannot send a 'superset' of columns. Sending a superset can be handy: include ids/descriptions/metadata with the request, and when you get the predictions back, those columns are carried through into the output DataFrame, so you can show both the metadata and the predictions together.
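The superset-of-columns pattern that a scikit-learn script can support looks roughly like this. The column names, the split between metadata and features, and the trivial stand-in for `model.predict` are all illustrative:

```python
import pandas as pd

# Incoming request: a superset of columns, including ids and metadata
df = pd.DataFrame({
    "id": ["a1", "a2"],
    "description": ["first widget", "second widget"],
    "f1": [0.5, 1.0],
    "f2": [1.5, 2.5],
})

feature_cols = ["f1", "f2"]  # the columns the model was trained on

# Predict only on the feature columns, then attach the predictions so
# the metadata columns ride along in the output DataFrame.
df["prediction"] = df[feature_cols].sum(axis=1)  # stand-in for model.predict

# The caller gets metadata and predictions together:
result = df[["id", "description", "prediction"]]
```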

No Extensibility or Custom Logic

When you write a script for XGBoost framework models, you basically just have top-level code that loads data, trains the model, and saves the model. There is no flexible way to handle input/output or add custom logic.

Serverless vs Real-Time Endpoints in AWS SageMaker

AWS SageMaker offers two types of endpoints for deploying machine learning models—Serverless and Real-time. Each has unique advantages and disadvantages; here's a quick rundown of the pros and cons of serverless versus real-time endpoints:

Serverless Endpoints

Pros

  • Zero Maintenance: No need to worry about underlying infrastructure or scaling.
  • Cost-Efficiency: Only pay for the time your function is executing.
  • Simpler Configuration: Easier to set up compared to real-time endpoints.

Cons

  • Cold Starts: Initial latency can be higher due to the serverless architecture.
  • Limited Resources: Restrictions on CPU and memory.
  • Timeout Limits: Typically a maximum execution timeout, which may not be suitable for very long-running predictions.

Real-Time Endpoints

Pros

  • Low Latency: Real-time endpoints are optimized for low-latency predictions.
  • Resource Customization: More control over the type and amount of resources allocated.
  • Advanced Configurations: Allows for A/B testing, multi-model endpoints, etc.

Cons

  • Complex Management: Requires more configuration and manual scaling.
  • Cost: You're paying for the reserved instance regardless of utilization.
  • Maintenance Overhead: May require manual interventions for scaling, updates, etc.

Selecting between Serverless and Real-Time depends on the specific needs of your project. For quick, low-maintenance deployments, Serverless may be ideal. For resource-intensive or latency-sensitive applications, Real-Time could be more appropriate.
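The configuration difference shows up directly in the variant definitions passed to the boto3 `create_endpoint_config` call: a serverless variant specifies memory and a concurrency ceiling, while a real-time variant specifies an instance type and count. The model name, memory size, and instance type below are illustrative values:

```python
# Serverless production variant: no instances to manage,
# just a memory size and a maximum concurrency.
serverless_variant = {
    "ModelName": "my-model",      # illustrative model name
    "VariantName": "AllTraffic",
    "ServerlessConfig": {
        "MemorySizeInMB": 2048,
        "MaxConcurrency": 5,
    },
}

# Real-time production variant: you pick (and pay for) the instance,
# whether or not requests are arriving.
realtime_variant = {
    "ModelName": "my-model",
    "VariantName": "AllTraffic",
    "InstanceType": "ml.m5.large",
    "InitialInstanceCount": 1,
}

# Either variant would be passed as ProductionVariants=[...] to
# a boto3 SageMaker client's create_endpoint_config(...) call.
```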