AWS Endpoints
When deploying a machine learning model using AWS SageMaker, it's important to understand the architectural components involved. This will provide a more in-depth understanding of what happens under the hood when you make an inference request. Below is a breakdown of the different layers typically found in an AWS SageMaker real-time endpoint.
Web Server Layer
AWS provides a customized, lightweight web server that gets deployed to the instance running your model/endpoint. It's pre-configured by AWS to accept HTTP requests and forward them to the RESTful API layer. While you don't directly manage this web server, it plays a crucial role in the process.
RESTful API Layer
The RESTful API handles requests forwarded from the web server. This layer applies any custom logic, as well as pre-processing and post-processing steps, before invoking the actual machine learning model.
Model Script Layer
This layer contains your machine learning model code, which is invoked by the RESTful API. This code can involve additional logic and transformations before and after the model makes its prediction/inference. See the Model Script subsection below.
Machine Learning Model Layer
This is the underlying machine learning model that performs the actual inference. It can be built with a variety of frameworks such as XGBoost, TensorFlow, or PyTorch, and is responsible for taking in the processed input and returning an inference or prediction.
Note: The important context here is that we're using the scikit-learn framework for our model/endpoint and we're calling XGBoost models internally. So this is really the best of both worlds. See the XGBoost Framework section below.
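To see how these layers fit together from the caller's side, below is a rough sketch of an inference request using boto3's SageMaker runtime client. The endpoint name and CSV payload are hypothetical placeholders; adjust them to your own deployment.

```python
import boto3

# Hypothetical endpoint name and CSV payload -- adjust to your deployment.
ENDPOINT_NAME = "my-sklearn-xgboost-endpoint"
payload = "5.1,3.5,1.4,0.2\n6.2,2.9,4.3,1.3"

runtime = boto3.client("sagemaker-runtime")

# The request passes through the web server and RESTful API layers before the
# model script and underlying model produce the response.
response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="text/csv",     # MIME type the model script receives for pre-processing
    Accept="application/json",  # MIME type expected back from post-processing
    Body=payload,
)

print(response["Body"].read().decode("utf-8"))
```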
Model Script
In a SageMaker environment using the scikit-learn framework (calling XGBoost models internally), the model script serves as the main entry point for both training and inference tasks. It has distinct responsibilities and implements specific methods that SageMaker calls during the lifecycle of the model. Below is an overview of these responsibilities and methods, along with illustrative sketches of each part:
- Data Retrieval: Pull in training data from S3.
- Data Preparation: Split data into training and validation sets.
- Model Creation: Initialize the scikit-learn model.
- Model Training: Train the model on the prepared data.
- Model Saving: Save the trained model using joblib to a specified directory, typically accessible by SageMaker.
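A minimal sketch of this training flow is shown below, assuming a SageMaker scikit-learn entry point that trains an XGBoost classifier. The train.csv file name, the target column, and the hyperparameters are placeholders; SageMaker downloads the S3 training channel to the local path exposed via SM_CHANNEL_TRAIN before this script runs.

```python
import argparse
import os

import joblib
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # XGBoost model wrapped by the scikit-learn framework


def train():
    parser = argparse.ArgumentParser()
    # SageMaker exposes these paths/channels as environment variables.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    args = parser.parse_args()

    # Data retrieval: the S3 training channel has already been copied locally.
    df = pd.read_csv(os.path.join(args.train, "train.csv"))

    # Data preparation: split into training and validation sets
    # (assumes the target column is named "target").
    X = df.drop(columns=["target"])
    y = df["target"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    # Model creation and training.
    model = XGBClassifier(n_estimators=100, max_depth=4)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    # Model saving: persist with joblib to the directory SageMaker uploads to S3.
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))


if __name__ == "__main__":
    train()
```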
model_fn
- Purpose: Deserializes and returns the fitted model.
- Arguments:
  - model_dir: The directory where model files are stored.

input_fn
- Purpose: Preprocesses incoming inference requests.
- Arguments:
  - input_data: The payload for the inference request.
  - content_type: MIME type of the incoming payload.

output_fn
- Purpose: Post-processes the inference output.
- Arguments:
  - output_df: DataFrame or other structure containing the inference results.
  - accept_type: The expected MIME type for the response payload.

predict_fn
- Purpose: Makes predictions using the deserialized model.
- Arguments:
  - df: DataFrame or other data structure containing the input data.
  - model: The deserialized machine learning model.
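Putting these together, here is a minimal sketch of the inference portion of the model script. It assumes the model was saved as model.joblib during training and that CSV and JSON are the supported MIME types; the exact formats and any extra feature engineering are up to your implementation.

```python
import os
from io import StringIO

import joblib
import pandas as pd


def model_fn(model_dir):
    # Deserialize and return the fitted model saved by the training script.
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(input_data, content_type):
    # Preprocess the incoming request into a DataFrame the model understands.
    if isinstance(input_data, bytes):
        input_data = input_data.decode("utf-8")
    if content_type == "text/csv":
        return pd.read_csv(StringIO(input_data), header=None)
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(df, model):
    # Make predictions with the deserialized model; any additional
    # transformation before or after the call would live here.
    predictions = model.predict(df)
    return pd.DataFrame({"prediction": predictions})


def output_fn(output_df, accept_type):
    # Post-process the inference results into the requested response format.
    if accept_type == "text/csv":
        return output_df.to_csv(index=False)
    if accept_type == "application/json":
        return output_df.to_json(orient="records")
    raise ValueError(f"Unsupported accept type: {accept_type}")
```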
This script provides a structured way to manage both the training and inference phases, making it easier to deploy and maintain models in a SageMaker environment.
XGBoost Framework
Short Answer: If you like pain, this is a good option
When deploying machine learning models in SageMaker, using a scikit-learn framework that internally calls XGBoost models can offer more flexibility compared to deploying with an XGBoost framework endpoint. Below are key limitations of using an XGBoost framework endpoint:
- Limited Formats: XGBoost typically supports only 'bytes', making it less versatile for different types of input data.
- Rigid Ordering: Features must be in the exact same order as during training, limiting dynamic or varied input handling.
- Fixed Position: Often requires the target to be the first column, which reduces adaptability for varied data structures.
- Limited Customization: Harder to add pre-processing and post-processing steps directly within the XGBoost model script.
By wrapping XGBoost models within a scikit-learn framework, you can overcome these limitations while still leveraging the performance benefits of XGBoost.
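As a rough sketch of what this wrapping looks like at deployment time with the SageMaker Python SDK: the S3 path, entry-point file name, framework version, and instance type below are placeholders, and xgboost would need to be available in the container (for example via a requirements.txt bundled with the script).

```python
from sagemaker import get_execution_role
from sagemaker.sklearn.model import SKLearnModel

# Hypothetical S3 location of the model.tar.gz produced by the training job.
model_artifact = "s3://my-bucket/xgboost-in-sklearn/model.tar.gz"

sklearn_model = SKLearnModel(
    model_data=model_artifact,
    role=get_execution_role(),
    entry_point="model_script.py",  # implements model_fn / input_fn / predict_fn / output_fn
    framework_version="1.2-1",      # scikit-learn container version (placeholder)
    py_version="py3",
)

# Deploying creates the web server + RESTful API + model script stack described above.
predictor = sklearn_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)
```

The resulting endpoint still does its heavy lifting in XGBoost inside predict_fn, while the scikit-learn container gives you full control over input formats, feature ordering, and pre/post-processing.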