diff --git a/tutorials/quickstarts/inferences_quickstart.ipynb b/tutorials/quickstarts/inferences_quickstart.ipynb
new file mode 100644
index 0000000000..44b577fc37
--- /dev/null
+++ b/tutorials/quickstarts/inferences_quickstart.ipynb
@@ -0,0 +1,342 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8a1f0d77-0261-469e-ac52-5fd1e82de4d6",
+   "metadata": {},
+   "source": [
+    "# Overview\n",
+    "\n",
+    "Observability for all model types (LLM, NLP, CV, Tabular)\n",
+    "\n",
+    "Phoenix Inferences allows you to observe the performance of your model by visualizing all of the model’s inferences in one interactive UMAP view.\n",
+    "\n",
+    "This powerful visualization can be leveraged during EDA to understand model drift, find low-performing clusters, uncover retrieval issues, and export data for retraining / fine-tuning."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59a31a79-d75a-4217-8d5b-1061b94d6268",
+   "metadata": {},
+   "source": [
+    "# Quickstart\n",
+    "\n",
+    "The following Quickstart can be executed in a Jupyter notebook or Google Colab.\n",
+    "\n",
+    "We will begin by logging just a training set, then add a production set for comparison."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "204a1126-8ef2-4f64-ba39-0502f6061c12",
+   "metadata": {},
+   "source": [
+    "## Step 1: Install and load dependencies\n",
+    "\n",
+    "Use `pip` or `conda` to install `arize-phoenix`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1b7ba3d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install arize-phoenix"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e5c96c16",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import phoenix as px"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38aa109d-337e-47ac-81f0-8416741abfb6",
+   "metadata": {},
+   "source": [
+    "## Step 2: Prepare Model Data\n",
+    "\n",
+    "Phoenix visualizes data from a pandas dataframe, where each row encompasses all the information about a single inference (feature values, prediction, metadata, etc.).\n",
+    "\n",
+    "For this Quickstart, we will show an example of visualizing the inferences from a computer vision model. See example notebooks for all model types [here](https://docs.arize.com/phoenix/notebooks).\n",
+    "\n",
+    "Let’s begin by working with the training set for this model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5256d4bf-7ba6-41c8-9d43-ea4084fbb68e",
+   "metadata": {},
+   "source": [
+    "### Download the dataset and load it into a pandas dataframe"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b78779d7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "train_df = pd.read_parquet(\"http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "249ae962-918d-407b-a506-1b460e47afd7",
+   "metadata": {},
+   "source": [
+    "### Preview the dataframe (optional)\n",
+    "\n",
+    "Note that each row contains all the data specific to this CV model for each inference."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "83a12c72-3a60-4399-b704-b6cdbcb57c33",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddaf9f03-4fdf-425f-8f14-fb05d50131f1",
+   "metadata": {},
+   "source": [
+    "## Step 3: Define a Schema\n",
+    "\n",
+    "Before we can log these inferences, we need to define a Schema object to describe them.\n",
+    "\n",
+    "The Schema object informs Phoenix of the fields that the columns of the dataframe should map to.\n",
+    "\n",
+    "Here we define a Schema to describe our particular CV training set:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "76b04e52",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_schema = px.Schema(\n",
+    "    timestamp_column_name=\"prediction_ts\",\n",
+    "    prediction_label_column_name=\"predicted_action\",\n",
+    "    actual_label_column_name=\"actual_action\",\n",
+    "    embedding_feature_column_names={\n",
+    "        \"image_embedding\": px.EmbeddingColumnNames(\n",
+    "            vector_column_name=\"image_vector\",\n",
+    "            link_to_data_column_name=\"url\",\n",
+    "        ),\n",
+    "    },\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d3b229c-c34c-4ec5-bcfa-ae8fefc9aee2",
+   "metadata": {},
+   "source": [
+    "***Important:*** The fields used in a Schema will vary depending on the model type that you are working with.\n",
+    "\n",
+    "For examples of how Schemas are defined for other model types (NLP, tabular, LLM-based applications), see the example notebooks under [Embedding Analysis](https://docs.arize.com/phoenix/notebooks#embedding-analysis) and [Structured Data Analysis](https://docs.arize.com/phoenix/notebooks#structured-data-analysis)."
+   ]
+  },
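+  {
+   "cell_type": "markdown",
+   "id": "3f2a9c01",
+   "metadata": {},
+   "source": [
+    "For instance, a Schema for a hypothetical tabular model might map plain feature columns instead of embeddings. This is only an illustrative sketch; the column names below are invented and should be adapted to your own dataframe:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9b8c7d02",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Illustrative sketch only: a Schema for a hypothetical tabular model.\n",
+    "# The column names below are invented and do not refer to a real dataset.\n",
+    "tabular_schema = px.Schema(\n",
+    "    timestamp_column_name=\"prediction_ts\",\n",
+    "    prediction_label_column_name=\"predicted_label\",\n",
+    "    actual_label_column_name=\"actual_label\",\n",
+    "    feature_column_names=[\n",
+    "        \"transaction_amount\",\n",
+    "        \"merchant_type\",\n",
+    "    ],\n",
+    ")"
+   ]
+  },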
+  {
+   "cell_type": "markdown",
+   "id": "2542e3f4",
+   "metadata": {},
+   "source": [
+    "## Step 4: Wrap into Inferences Object\n",
+    "\n",
+    "Wrap your dataframe `train_df` and schema `train_schema` into a Phoenix `Inferences` object:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ec7cbaf4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_ds = px.Inferences(dataframe=train_df, schema=train_schema, name=\"training\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "525803af",
+   "metadata": {},
+   "source": [
+    "## Step 5: Launch Phoenix!\n",
+    "\n",
+    "We are now ready to launch Phoenix with our Inferences!\n",
+    "\n",
+    "Here, we are passing `train_ds` as the `primary` inferences, as we are only visualizing one inference set (see Step 6 for adding additional inference sets)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6c4e42bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session = px.launch_app(primary=train_ds)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "de0f2a9b-dc3c-432c-9cca-be18d3176a31",
+   "metadata": {},
+   "source": [
+    "Running this will fire up a Phoenix visualization. Follow the instructions in the output to view Phoenix in a browser, or in-line in your notebook.\n",
+    "\n",
+    "**You are now ready to observe the training set of your model!**\n",
+    "\n",
+    "Optional: try the exercises under “Optional actions and activities” below to familiarize yourself more with Phoenix."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "196b6e18-80ca-402f-a385-e798ed25d2f1",
+   "metadata": {},
+   "source": [
+    "## Step 6: Add comparison data\n",
+    "\n",
+    "Next, let’s add a production set so we can compare it against the training set. Note that the production schema omits `actual_label_column_name`, since ground-truth labels are typically not yet available for production inferences."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e6e52bc5",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "prod_df = pd.read_parquet(\"http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet\")\n",
+    "prod_schema = px.Schema(\n",
+    "    timestamp_column_name=\"prediction_ts\",\n",
+    "    prediction_label_column_name=\"predicted_action\",\n",
+    "    embedding_feature_column_names={\n",
+    "        \"image_embedding\": px.EmbeddingColumnNames(\n",
+    "            vector_column_name=\"image_vector\",\n",
+    "            link_to_data_column_name=\"url\",\n",
+    "        ),\n",
+    "    },\n",
+    ")\n",
+    "prod_ds = px.Inferences(dataframe=prod_df, schema=prod_schema, name=\"production\")"
+   ]
+  },
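+  {
+   "cell_type": "markdown",
+   "id": "5e4f6a03",
+   "metadata": {},
+   "source": [
+    "With both inference sets prepared, you can relaunch Phoenix to compare them side by side. A minimal sketch: pass the production set as the `primary` inferences, measured against the training set as the `reference`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7c6d8e04",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Relaunch Phoenix with production as the primary inferences and\n",
+    "# training as the reference, so drift between the two sets is visible.\n",
+    "session = px.launch_app(primary=prod_ds, reference=train_ds)"
+   ]
+  },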
\n", + "Optional - try the following exercises to familiarize yourself more with Phoenix:\n", + "\n", + "\n", + "**You are now ready to observe the training set of your model!**" + ] + }, + { + "cell_type": "markdown", + "id": "b8a0711a-54c9-4b13-8583-0e0d22d1295c", + "metadata": {}, + "source": [ + "# Optional actions and activities" + ] + }, + { + "cell_type": "markdown", + "id": "8de7a986-ac2c-49f8-9f4d-1f36d01faf8c", + "metadata": {}, + "source": [ + "## Exercises to familiarize yourself more with Phoenix:\n", + "\n", + "- [ ] Click on `image_embedding` under the Embeddings section to enter the UMAP projector view\n", + "- [ ] Select a point where the model accuracy is <0.78, and see the embedding visualization below update to include only points from this selected timeframe\n", + "- [ ] Select the cluster with the lowest accuracy; from the list of automatic clusters generated by Phoenix\n", + " - Note that Phoenix automatically generates clusters for you on your data using a clustering algorithm called HDBSCAN (more information: [https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters](https://docs.arize.com/phoenix/concepts/embeddings-analysis#clusters)\n", + "- [ ] Change the colorization of your plot - e.g. select Color By ‘correctness’, and ‘dimension'\n", + "- [ ] Describe in words an insight you've gathered from this visualization\n", + "\n", + "*Discuss your answers in our [https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q](community)!*" + ] + }, + { + "cell_type": "markdown", + "id": "196b6e18-80ca-402f-a385-e798ed25d2f1", + "metadata": {}, + "source": [ + "## Export data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6e52bc5", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "prod_df = pd.read_parquet(\"http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet\")\n", + "prod_schema = px.Schema(\n", + " timestamp_column_name=\"prediction_ts\",\n", + " prediction_label_column_name=\"predicted_action\",\n", + " embedding_feature_column_names={\n", + " \"image_embedding\": px.EmbeddingColumnNames(\n", + " vector_column_name=\"image_vector\",\n", + " link_to_data_column_name=\"url\",\n", + " ),\n", + " },\n", + ")\n", + "prod_ds = px.Inferences(dataframe=prod_df, schema=prod_schema, name=\"production\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}