docs(quickstart): create notebook based on inferences quickstart instructions #4598

Closed
wants to merge 1 commit into from
292 changes: 292 additions & 0 deletions tutorials/quickstarts/inferences_quickstart.ipynb
@@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8a1f0d77-0261-469e-ac52-5fd1e82de4d6",
"metadata": {},
"source": [
"# Overview\n",
"\n",
"Observability for all model types (LLM, NLP, CV, Tabular)\n",
"\n",
"Phoenix Inferences lets you observe your model’s performance by visualizing all of its inferences in one interactive UMAP view.\n",
"\n",
"During exploratory data analysis (EDA), this visualization can help you understand model drift, find low-performing clusters, uncover retrieval issues, and export data for retraining or fine-tuning."
]
},
{
"cell_type": "markdown",
"id": "59a31a79-d75a-4217-8d5b-1061b94d6268",
"metadata": {},
"source": [
"# Quickstart\n",
"\n",
"The following Quickstart can be executed in a Jupyter notebook or Google Colab.\n",
"\n",
"We will begin by logging just a training set, then add a production set for comparison."
]
},
{
"cell_type": "markdown",
"id": "204a1126-8ef2-4f64-ba39-0502f6061c12",
"metadata": {},
"source": [
"## Step 1: Install and load dependencies\n",
"\n",
"Use `pip` or `conda` to install `arize-phoenix`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b7ba3d4",
"metadata": {},
"outputs": [],
"source": [
"!pip install arize-phoenix"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5c96c16",
"metadata": {},
"outputs": [],
"source": [
"import phoenix as px"
]
},
{
"cell_type": "markdown",
"id": "38aa109d-337e-47ac-81f0-8416741abfb6",
"metadata": {},
"source": [
"## Step 2: Prepare Model Data\n",
"\n",
"Phoenix visualizes data from a pandas dataframe, where each row encompasses all the information about a single inference (including feature values, prediction, metadata, etc.).\n",
"\n",
"For this Quickstart, we will show an example of visualizing the inferences from a computer vision model. See example notebooks for all model types [here](https://docs.arize.com/phoenix/notebooks).\n",
"\n",
"Let’s begin by working with the training set for this model."
]
},
{
"cell_type": "markdown",
"id": "5256d4bf-7ba6-41c8-9d43-ea4084fbb68e",
"metadata": {},
"source": [
"### Download the dataset and load it into a Pandas dataframe."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b78779d7",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"train_df = pd.read_parquet(\"http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet\")"
]
},
{
"cell_type": "markdown",
"id": "249ae962-918d-407b-a506-1b460e47afd7",
"metadata": {},
"source": [
"### Preview the dataframe (optional)\n",
"\n",
"Note that each row contains all the data specific to this CV model for each inference."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83a12c72-3a60-4399-b704-b6cdbcb57c33",
"metadata": {},
"outputs": [],
"source": [
"train_df.head()"
]
},
{
"cell_type": "markdown",
"id": "ddaf9f03-4fdf-425f-8f14-fb05d50131f1",
"metadata": {},
"source": [
"## Step 3: Define a Schema\n",
"\n",
"Before we can log these inferences, we need to define a Schema object to describe them.\n",
"\n",
"The Schema object informs Phoenix of the fields that the columns of the dataframe should map to.\n",
"\n",
"Here we define a Schema to describe our particular CV training set:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76b04e52",
"metadata": {},
"outputs": [],
"source": [
"train_schema = px.Schema(\n",
"    timestamp_column_name=\"prediction_ts\",\n",
"    prediction_label_column_name=\"predicted_action\",\n",
"    actual_label_column_name=\"actual_action\",\n",
"    embedding_feature_column_names={\n",
"        \"image_embedding\": px.EmbeddingColumnNames(\n",
"            vector_column_name=\"image_vector\",\n",
"            link_to_data_column_name=\"url\",\n",
"        ),\n",
"    },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7d3b229c-c34c-4ec5-bcfa-ae8fefc9aee2",
"metadata": {},
"source": [
"***Important:*** The fields used in a Schema will vary depending on the model type that you are working with.\n",
"\n",
"For examples of how a Schema is defined for other model types (NLP, tabular, LLM-based applications), see the example notebooks under [Embedding Analysis](https://docs.arize.com/phoenix/notebooks#embedding-analysis) and [Structured Data Analysis](https://docs.arize.com/phoenix/notebooks#structured-data-analysis)."
]
},
{
"cell_type": "markdown",
"id": "2542e3f4",
"metadata": {},
"source": [
"## Step 4: Wrap into Inferences Object\n",
"\n",
"Wrap your dataframe `train_df` and schema `train_schema` into a Phoenix `Inferences` object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec7cbaf4",
"metadata": {},
"outputs": [],
"source": [
"train_ds = px.Inferences(dataframe=train_df, schema=train_schema, name=\"training\")"
]
},
{
"cell_type": "markdown",
"id": "525803af",
"metadata": {},
"source": [
"## Step 5: Launch Phoenix!\n",
"\n",
"We are now ready to launch Phoenix with our Inferences!\n",
"\n",
"Here, we are passing `train_ds` as the `primary` inferences, as we are only visualizing one inference set (see Step 6 for adding additional inference sets)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c4e42bb",
"metadata": {},
"outputs": [],
"source": [
"session = px.launch_app(primary=train_ds)"
]
},
{
"cell_type": "markdown",
"id": "de0f2a9b-dc3c-432c-9cca-be18d3176a31",
"metadata": {},
"source": [
"Running this will fire up a Phoenix visualization. Follow the instructions in the output to view Phoenix in a browser, or in-line in your notebook.\n",
"\n",
"Optionally, try the exercises in the next section to familiarize yourself with Phoenix.\n",
"\n",
"**You are now ready to observe the training set of your model!**"
]
},
{
"cell_type": "markdown",
"id": "b8a0711a-54c9-4b13-8583-0e0d22d1295c",
"metadata": {},
"source": [
"# Optional actions and activities"
]
},
{
"cell_type": "markdown",
"id": "8de7a986-ac2c-49f8-9f4d-1f36d01faf8c",
"metadata": {},
"source": [
"## Exercises to familiarize yourself more with Phoenix:\n",
"\n",
"- [ ] Click on `image_embedding` under the Embeddings section to enter the UMAP projector view\n",
"- [ ] Select a point in time where the model accuracy is <0.78, and watch the embedding visualization below update to include only points from the selected timeframe\n",
"- [ ] Select the cluster with the lowest accuracy from the list of automatic clusters generated by Phoenix\n",
"    - Note that Phoenix automatically generates clusters on your data using a clustering algorithm called HDBSCAN (more information: [https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters](https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters))\n",
"- [ ] Change the colorization of your plot, e.g. select Color By ‘correctness’ and ‘dimension’\n",
"- [ ] Describe in words an insight you've gathered from this visualization\n",
"\n",
"*Discuss your answers in our [community](https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q)!*"
]
},
{
"cell_type": "markdown",
"id": "196b6e18-80ca-402f-a385-e798ed25d2f1",
"metadata": {},
"source": [
"## Step 6 (Optional): Add a production set for comparison"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6e52bc5",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"prod_df = pd.read_parquet(\"http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet\")\n",
"prod_schema = px.Schema(\n",
"    timestamp_column_name=\"prediction_ts\",\n",
"    prediction_label_column_name=\"predicted_action\",\n",
"    embedding_feature_column_names={\n",
"        \"image_embedding\": px.EmbeddingColumnNames(\n",
"            vector_column_name=\"image_vector\",\n",
"            link_to_data_column_name=\"url\",\n",
"        ),\n",
"    },\n",
")\n",
"prod_ds = px.Inferences(dataframe=prod_df, schema=prod_schema, name=\"production\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
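
The notebook maps dataframe columns to Schema fields by name, but it never checks that those columns exist before wrapping the dataframe. A minimal pre-flight sketch in plain Python is shown here. `missing_schema_columns` is a hypothetical helper, not part of Phoenix, and the column lists are only the names that the notebook's two schemas reference; `example_columns` is an illustrative stand-in for `train_df.columns`, not the real data.

```python
from typing import Iterable, List


def missing_schema_columns(
    df_columns: Iterable[str], schema_columns: Iterable[str]
) -> List[str]:
    """Return the schema column names that are absent from the dataframe, in order."""
    present = set(df_columns)
    return [name for name in schema_columns if name not in present]


# Column names referenced by train_schema in the notebook.
TRAIN_SCHEMA_COLUMNS = [
    "prediction_ts",
    "predicted_action",
    "actual_action",
    "image_vector",
    "url",
]

# prod_schema omits the actual label, so it references one column fewer.
PROD_SCHEMA_COLUMNS = [c for c in TRAIN_SCHEMA_COLUMNS if c != "actual_action"]

# Illustrative stand-in for a dataframe's column index (assumption, not real data).
example_columns = ["prediction_ts", "predicted_action", "image_vector", "url"]

print(missing_schema_columns(example_columns, TRAIN_SCHEMA_COLUMNS))  # ['actual_action']
print(missing_schema_columns(example_columns, PROD_SCHEMA_COLUMNS))   # []
```

In the notebook this could be called as `missing_schema_columns(train_df.columns, TRAIN_SCHEMA_COLUMNS)` before constructing `px.Inferences`; an empty list means every referenced column is present.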