
AWS Endpoints


AWS SageMaker Real-Time Endpoint Architecture

When deploying a machine learning model with AWS SageMaker, it's important to understand the architectural components involved; knowing them gives you a clearer picture of what happens under the hood when you make an inference request. Below is a breakdown of the layers typically found in an AWS SageMaker real-time endpoint.
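
To ground the discussion, here's what a client-side inference request looks like. This is a minimal sketch using boto3; the endpoint name and CSV payload are placeholders for whatever you deployed.

```python
import boto3

# Hypothetical endpoint name -- replace with your deployed endpoint.
ENDPOINT_NAME = "my-sklearn-xgb-endpoint"

runtime = boto3.client("sagemaker-runtime")

# Send a CSV payload; the layers described below handle everything else.
response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="text/csv",
    Body="0.5,1.2,3.4",
)
print(response["Body"].read().decode("utf-8"))
```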

Components of a SageMaker Endpoint

Web Server

AWS provides a customized lightweight web server that is deployed to the instance running your model/endpoint. It's pre-configured to accept HTTP requests and forward them to the RESTful API layer. While you don't manage this web server directly, it plays a crucial role in the request flow.

RESTful API

The RESTful API handles requests forwarded from the web server (in SageMaker containers, these typically arrive at the /invocations inference route and the /ping health check). This layer handles any custom logic, as well as pre-processing and post-processing steps, before invoking the actual machine learning model.

Model Script

This layer contains your machine learning model code, which is invoked by the RESTful API. This code may involve additional logic and transformations before and after the model makes its prediction/inference. See the SageMaker Model Script section below.

Model

This is the underlying machine learning model that performs the actual inference. It could be any of a variety of model types, such as XGBoost, TensorFlow, or PyTorch, and is responsible for taking the processed input and returning an inference or prediction.

SageMaker Model Script

Note: The important context here is that we're using the scikit-learn framework for our model/endpoint and we're calling XGBoost models internally. So this is really the best of both worlds. See the FAQ section below for why.

In a SageMaker environment using the scikit-learn framework (calling an XGBoost model internally), the model script serves as the main entry point for both training and inference tasks. It has distinct responsibilities and implements specific methods that SageMaker calls during the lifecycle of the model. Below is an overview of these responsibilities and methods:

Responsibilities

__main__ Block

  • Data Retrieval: Pull in training data from S3.
  • Data Preparation: Split data into training and validation sets.
  • Model Creation: Initialize the scikit-learn model.
  • Model Training: Train the model on the prepared data.
  • Model Saving: Save the trained model using joblib to a specified directory, typically accessible by SageMaker.
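
A minimal sketch of that flow, assuming a CSV training channel with a "target" column (the file name, column name, and classifier choice are all illustrative assumptions):

```python
import argparse
import os

import joblib
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker exposes these locations via environment variables.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    args = parser.parse_args()

    # Data Retrieval: SageMaker has already copied the S3 training channel locally.
    df = pd.read_csv(os.path.join(args.train, "train.csv"))

    # Data Preparation: split features/target and create train/validation sets.
    X = df.drop(columns=["target"])  # "target" column name is an assumption
    y = df["target"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    # Model Creation + Training: an XGBoost model behind the scikit-learn API.
    model = XGBClassifier()
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

    # Model Saving: serialize with joblib to the directory SageMaker archives.
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))
```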

Required Methods

model_fn(model_dir)

  • Purpose: Deserializes and returns the fitted model.
  • Arguments:
    • model_dir: The directory where model files are stored.
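
A minimal sketch, assuming the training job saved the model as model.joblib (the filename is a convention of this script, not a SageMaker requirement):

```python
import os

import joblib

def model_fn(model_dir):
    """Deserialize and return the fitted model saved by the training job."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))
```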

input_fn(input_data, content_type)

  • Purpose: Preprocesses incoming inference requests.
  • Arguments:
    • input_data: The payload for the inference request.
    • content_type: MIME type of the incoming payload.
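
For example, an input_fn that accepts both CSV and JSON payloads might look like this (the set of supported MIME types is a choice, not a requirement):

```python
from io import StringIO

import pandas as pd

def input_fn(input_data, content_type):
    """Parse the request payload into a DataFrame based on its MIME type."""
    if content_type == "text/csv":
        return pd.read_csv(StringIO(input_data))
    if content_type == "application/json":
        return pd.read_json(StringIO(input_data))
    raise ValueError(f"Unsupported content type: {content_type}")
```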

output_fn(output_df, accept_type)

  • Purpose: Post-processes the inference output.
  • Arguments:
    • output_df: DataFrame or other structure containing the inference results.
    • accept_type: The expected MIME type for the response payload.
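
A matching sketch for output_fn, mirroring the content types handled by input_fn above (again, the supported MIME types are an illustrative choice):

```python
def output_fn(output_df, accept_type):
    """Serialize the prediction results into the requested MIME type."""
    if accept_type == "text/csv":
        return output_df.to_csv(index=False)
    if accept_type == "application/json":
        return output_df.to_json(orient="records")
    raise ValueError(f"Unsupported accept type: {accept_type}")
```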

predict_fn(df, model)

  • Purpose: Makes predictions using the deserialized model.
  • Arguments:
    • df: DataFrame or other data structure containing the input data.
    • model: The deserialized machine learning model.
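
A minimal predict_fn sketch; returning the input DataFrame with a prediction column appended is one common convention, not the only option:

```python
def predict_fn(df, model):
    """Run inference with the deserialized model and return the results."""
    predictions = model.predict(df)
    result = df.copy()
    result["prediction"] = predictions
    return result
```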

This script provides a structured way to manage both the training and inference phases, making it easier to deploy and maintain models in a SageMaker environment.

FAQ

Why not just use an XGBoost Framework for Model/Endpoint?

Short Answer: If you like pain, this is a good option.

Limitations of XGBoost Framework Endpoints in SageMaker

When deploying machine learning models in SageMaker, using a scikit-learn framework that internally calls XGBoost models can offer more flexibility compared to deploying with an XGBoost framework endpoint. Below are key limitations of using an XGBoost framework endpoint:

Input Serialization

  • Limited Formats: The XGBoost framework endpoint typically accepts only serialized byte payloads (such as CSV or libSVM), making it less versatile for different types of input data.

Column Order Sensitivity

  • Rigid Ordering: Features must be in the exact same order as during training, limiting dynamic or varied input handling.

Handling of Target Column

  • Fixed Position: Often requires the target to be the first column, making it less adaptable to varied data structures.

Extensibility and Custom Logic

  • Limited Customization: Harder to add pre-processing and post-processing steps directly within the XGBoost model script.

By wrapping XGBoost models within a scikit-learn framework, you can overcome these limitations while still leveraging the performance benefits of XGBoost.
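
As one illustration of that flexibility, the wrapper's input handling can align incoming columns itself, removing the column-order sensitivity described above. This is a hypothetical helper; TRAINING_COLUMNS stands in for a feature list you'd save alongside the model at training time:

```python
# Hypothetical: the feature list saved alongside the model during training.
TRAINING_COLUMNS = ["feature_a", "feature_b", "feature_c"]

def align_columns(df):
    """Reorder (and subset) incoming columns to the training-time order."""
    missing = set(TRAINING_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required features: {sorted(missing)}")
    return df[TRAINING_COLUMNS]
```

With a helper like this called from input_fn, extra columns are dropped and column order no longer matters — exactly the kind of custom logic that's hard to add inside a plain XGBoost framework endpoint.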