diff --git a/docs/sphinx/source/examples/feed_performance_cloud.ipynb b/docs/sphinx/source/examples/feed_performance_cloud.ipynb
new file mode 100644
index 00000000..85fc8e6e
--- /dev/null
+++ b/docs/sphinx/source/examples/feed_performance_cloud.ipynb
@@ -0,0 +1,3430 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "given-adoption",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "# Feeding to Vespa Cloud\n",
+ "\n",
+ "In our [previous notebook](https://pyvespa.readthedocs.io/en/latest/examples/feed_performance.html), we demonstrated one way of benchmarking feed performance to a local Vespa instance running in Docker.\n",
+ "In this notebook, we will look at the same methods and at how feeding to [Vespa Cloud](https://cloud.vespa.ai) affects the performance of each.\n",
+ "\n",
+ "The key difference between feeding to a local Vespa instance and a Vespa Cloud instance is the network latency.\n",
+ "Additionally, we will introduce embedding in Vespa at feed time, which is a realistic scenario for many use-cases.\n",
+ "\n",
+ "We will look at these 3 different methods:\n",
+ "\n",
+ "1. Using `feed_iterable()` - which uses threading to parallelize the feed operation. Best for CPU-bound operations.\n",
+ "2. Using `feed_async_iterable()` - which uses asyncio to parallelize the feed operation. Also uses `httpx` with HTTP/2-support. Performs best for IO-bound operations.\n",
+ "3. Using [Vespa CLI](https://docs.vespa.ai/en/vespa-cli).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8c967bd2",
+ "metadata": {},
+ "source": [
+ "Refer to troubleshooting\n",
+ "for any problem when running this guide.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8345b2fe",
+ "metadata": {},
+ "source": [
+ "Install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).\n",
+ "The `vespacli` Python package is just a thin wrapper, allowing for installation through PyPI.\n",
+ "\n",
+ "> Do NOT install if you already have the Vespa CLI installed.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5acb52d8",
+ "metadata": {},
+ "source": [
+ "[Install pyvespa](https://pyvespa.readthedocs.io/), and other dependencies.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "03f3d0f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip3 install vespacli pyvespa datasets \"plotly>=5.20\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db637322",
+ "metadata": {},
+ "source": [
+ "## Create an application package\n",
+ "\n",
+ "The [application package](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage)\n",
+ "has all the Vespa configuration files.\n",
+ "\n",
+ "For this demo, we will use a simple application package.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "bd5c2629",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.package import (\n",
+ " ApplicationPackage,\n",
+ " Field,\n",
+ " Schema,\n",
+ " Document,\n",
+ " FieldSet,\n",
+ " HNSW,\n",
+ ")\n",
+ "\n",
+ "# Define the application name (can NOT contain `_` or `-`)\n",
+ "\n",
+ "application = \"feedperformancecloud\"\n",
+ "\n",
+ "\n",
+ "package = ApplicationPackage(\n",
+ " name=application,\n",
+ " schema=[\n",
+ " Schema(\n",
+ " name=\"doc\",\n",
+ " document=Document(\n",
+ " fields=[\n",
+ " Field(name=\"id\", type=\"string\", indexing=[\"summary\"]),\n",
+ " Field(name=\"text\", type=\"string\", indexing=[\"index\", \"summary\"]),\n",
+ " Field(\n",
+ " name=\"embedding\",\n",
+ " type=\"tensor(x[1024])\",\n",
+ " # Note that we are NOT embedding with a vespa model here, but that is also possible.\n",
+ " indexing=[\"summary\", \"attribute\", \"index\"],\n",
+ " ann=HNSW(distance_metric=\"angular\"),\n",
+ " ),\n",
+ " ]\n",
+ " ),\n",
+ " fieldsets=[FieldSet(name=\"default\", fields=[\"text\"])],\n",
+ " )\n",
+ " ],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2c5e2943",
+ "metadata": {},
+ "source": [
+ "Note that the `ApplicationPackage` name cannot have `-` or `_`.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "careful-savage",
+ "metadata": {},
+ "source": [
+ "## Deploy the Vespa application\n",
+ "\n",
+ "Deploy `package` to Vespa Cloud,\n",
+ "without leaving the notebook, by creating an instance of\n",
+ "[VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud).\n",
+ "\n",
+ "This requires a Vespa Cloud tenant; replace the `tenant` value in the code below with your own tenant name.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f74324a",
+ "metadata": {},
+ "source": [
+ "Follow the instructions in the output of the cell below and add the control-plane key in the console at `https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys`\n",
+ "(replace TENANT_NAME with your tenant name).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "canadian-blood",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Setting application...\n",
+ "Running: vespa config set application vespa-team.feedperformancecloud\n",
+ "Setting target cloud...\n",
+ "Running: vespa config set target cloud\n",
+ "\n",
+ "Api-key found for control plane access. Using api-key.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from vespa.deployment import VespaCloud\n",
+ "from vespa.application import Vespa\n",
+ "import os\n",
+ "\n",
+ "\n",
+ "def read_secret():\n",
+ "    \"\"\"Read the API key from the environment variable. This is\n",
+ "    only used for CI/CD purposes.\"\"\"\n",
+ "    t = os.getenv(\"VESPA_TEAM_API_KEY\")\n",
+ "    if t:\n",
+ "        return t.replace(r\"\\n\", \"\\n\")\n",
+ "    else:\n",
+ "        return t\n",
+ "\n",
+ "\n",
+ "vespa_cloud = VespaCloud(\n",
+ " tenant=\"vespa-team\",\n",
+ " application=application,\n",
+ " key_content=read_secret()\n",
+ " if read_secret()\n",
+ " else None, # Can be removed for interactive control-plane login\n",
+ " application_package=package,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aaae2f91",
+ "metadata": {},
+ "source": [
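+ "`vespa_cloud` now holds a reference to a [VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud) instance. Calling `deploy()` on it deploys the application and returns a `Vespa` instance, which we store in `app`.\n"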
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "471c2da7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Deployment started in run 3 of dev-aws-us-east-1c for vespa-team.feedperformancecloud. This may take a few minutes the first time.\n",
+ "INFO [08:04:48] Deploying platform version 8.387.10 and application dev build 3 for dev-aws-us-east-1c of default ...\n",
+ "INFO [08:04:48] Using CA signed certificate version 1\n",
+ "INFO [08:04:49] Using 1 nodes in container cluster 'feedperformancecloud_container'\n",
+ "WARNING [08:04:50] Auto-overriding validation which would be disallowed in production: certificate-removal: Data plane certificate(s) from cluster 'feedperformancecloud_container' is removed (removed certificates: [CN=cloud.vespa.example]) This can cause client connection issues.. To allow this add certificate-removal to validation-overrides.xml, see https://docs.vespa.ai/en/reference/validation-overrides.html\n",
+ "INFO [08:04:50] Using 1 nodes in container cluster 'feedperformancecloud_container'\n",
+ "WARNING [08:04:53] Auto-overriding validation which would be disallowed in production: certificate-removal: Data plane certificate(s) from cluster 'feedperformancecloud_container' is removed (removed certificates: [CN=cloud.vespa.example]) This can cause client connection issues.. To allow this add certificate-removal to validation-overrides.xml, see https://docs.vespa.ai/en/reference/validation-overrides.html\n",
+ "INFO [08:04:55] Session 303878 for tenant 'vespa-team' prepared and activated.\n",
+ "INFO [08:04:55] ######## Details for all nodes ########\n",
+ "INFO [08:04:55] h95731a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
+ "INFO [08:04:55] --- platform vespa/cloud-tenant-rhel8:8.387.10\n",
+ "INFO [08:04:55] --- container on port 4080 has not started \n",
+ "INFO [08:04:55] --- metricsproxy-container on port 19092 has config generation 303870, wanted is 303878\n",
+ "INFO [08:04:55] h95729b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
+ "INFO [08:04:55] --- platform vespa/cloud-tenant-rhel8:8.387.10\n",
+ "INFO [08:04:55] --- storagenode on port 19102 has config generation 303870, wanted is 303878\n",
+ "INFO [08:04:55] --- searchnode on port 19107 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] --- distributor on port 19111 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] --- metricsproxy-container on port 19092 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] h93272g.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
+ "INFO [08:04:55] --- platform vespa/cloud-tenant-rhel8:8.387.10\n",
+ "INFO [08:04:55] --- logserver-container on port 4080 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] --- metricsproxy-container on port 19092 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] h93272h.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
+ "INFO [08:04:55] --- platform vespa/cloud-tenant-rhel8:8.387.10\n",
+ "INFO [08:04:55] --- container-clustercontroller on port 19050 has config generation 303878, wanted is 303878\n",
+ "INFO [08:04:55] --- metricsproxy-container on port 19092 has config generation 303878, wanted is 303878\n",
+ "INFO [08:05:03] Found endpoints:\n",
+ "INFO [08:05:03] - dev.aws-us-east-1c\n",
+ "INFO [08:05:03] |-- https://b48e8812.bc737822.z.vespa-app.cloud/ (cluster 'feedperformancecloud_container')\n",
+ "INFO [08:05:04] Deployment of new application complete!\n",
+ "Found mtls endpoint for feedperformancecloud_container\n",
+ "URL: https://b48e8812.bc737822.z.vespa-app.cloud/\n",
+ "Connecting to https://b48e8812.bc737822.z.vespa-app.cloud/\n",
+ "Using mtls_key_cert Authentication against endpoint https://b48e8812.bc737822.z.vespa-app.cloud//ApplicationStatus\n",
+ "Application is up!\n",
+ "Finished deployment.\n"
+ ]
+ }
+ ],
+ "source": [
+ "app: Vespa = vespa_cloud.deploy()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "570cfbd3",
+ "metadata": {},
+ "source": [
+ "Note that if you already have a Vespa Cloud instance running, the recommended way to initialize a `Vespa` instance is to create it directly, passing the `url` and `tenant` parameters to the `Vespa` constructor, along with either:\n",
+ "\n",
+ "1. Key/cert for dataplane authentication (generated as part of deployment, copied into the application package, in `/security/clients.pem`, and `~/.vespa/mytenant.myapplication/data-plane-public-cert.pem` and `~/.vespa/mytenant.myapplication/data-plane-private-key.pem`).\n",
+ "\n",
+ "```python\n",
+ "from vespa.application import Vespa\n",
+ "\n",
+ "app: Vespa = Vespa(\n",
+ " url=\"https://my-endpoint.z.vespa-app.cloud\",\n",
+ " tenant=\"my-tenant\",\n",
+ " key_file=\"path/to/private-key.pem\",\n",
+ " cert_file=\"path/to/certificate.pem\",\n",
+ ")\n",
+ "```\n",
+ "\n",
+ "2. Using a token (must be generated in [Vespa Cloud Console](https://console.vespa-cloud.com/) and defined in the application package, see https://cloud.vespa.ai/en/security/guide).\n",
+ "\n",
+ "```python\n",
+ "from vespa.application import Vespa\n",
+ "import os\n",
+ "\n",
+ "app: Vespa = Vespa(\n",
+ " url=\"https://my-endpoint.z.vespa-app.cloud\",\n",
+ " tenant=\"my-tenant\",\n",
+ " vespa_cloud_secret_token=os.getenv(\"VESPA_CLOUD_SECRET_TOKEN\"),\n",
+ ")\n",
+ "```\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "3bdbbb47",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using mtls_key_cert Authentication against endpoint https://b48e8812.bc737822.z.vespa-app.cloud//ApplicationStatus\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "app.get_application_status()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "sealed-mustang",
+ "metadata": {},
+ "source": [
+ "## Preparing the data\n",
+ "\n",
+ "In this example we use the [HF Datasets](https://huggingface.co/docs/datasets/index) library to stream the\n",
+ "[\"Cohere/wikipedia-2023-11-embed-multilingual-v3\"](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3) dataset and index it in our newly deployed Vespa instance.\n",
+ "\n",
+ "The dataset contains Wikipedia pages and their corresponding embeddings.\n",
+ "\n",
+ "> For this exploration we will use the `id`, `text` and `embedding` fields.\n",
+ "\n",
+ "The following uses the [stream](https://huggingface.co/docs/datasets/stream) option of datasets to stream the data without\n",
+ "downloading all the contents locally.\n",
+ "\n",
+ "The `map` functionality allows us to convert the\n",
+ "dataset fields into the expected feed format for `pyvespa` which expects a dict with the keys `id` and `fields`:\n",
+ "\n",
+ "`{ \"id\": \"vespa-document-id\", \"fields\": {\"vespa_field\": \"vespa-field-value\"}}`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "e9d3facd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/thomas/.pyenv/versions/3.9.19/envs/pyvespa-dev/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ }
+ ],
+ "source": [
+ "from datasets import load_dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e2b68592",
+ "metadata": {},
+ "source": [
+ "## Utility function to create dataset with different number of documents\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "60772727",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def get_dataset(n_docs: int = 1000):\n",
+ " dataset = load_dataset(\n",
+ " \"Cohere/wikipedia-2023-11-embed-multilingual-v3\",\n",
+ " \"simple\",\n",
+ " split=f\"train[:{n_docs}]\",\n",
+ " )\n",
+ " dataset = dataset.map(\n",
+ " lambda x: {\n",
+ " \"id\": x[\"_id\"] + \"-iter\",\n",
+ " \"fields\": {\"text\": x[\"text\"], \"embedding\": x[\"emb\"]},\n",
+ " }\n",
+ " ).select_columns([\"id\", \"fields\"])\n",
+ " return dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e3f0d0f",
+ "metadata": {},
+ "source": [
+ "### A dataclass to store the parameters and results of the different feeding methods\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "b6ab7b70",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dataclasses import dataclass\n",
+ "from typing import Callable, Optional, Iterable, Dict\n",
+ "\n",
+ "\n",
+ "@dataclass\n",
+ "class FeedParams:\n",
+ " name: str\n",
+ " num_docs: int\n",
+ " max_connections: int\n",
+ " function_name: str\n",
+ " max_workers: Optional[int] = None\n",
+ " max_queue_size: Optional[int] = None\n",
+ "\n",
+ "\n",
+ "@dataclass\n",
+ "class FeedResult(FeedParams):\n",
+ " feed_time: Optional[float] = None"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f865e5c7",
+ "metadata": {},
+ "source": [
+ "### A common callback function to notify if something goes wrong\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "ab4c02b9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.io import VespaResponse\n",
+ "\n",
+ "\n",
+ "def callback(response: VespaResponse, id: str):\n",
+ "    if not response.is_successful():\n",
+ "        print(\n",
+ "            f\"Failed to feed document {id} with status code {response.status_code}: Reason {response.get_json()}\"\n",
+ "        )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aa3e99e3",
+ "metadata": {},
+ "source": [
+ "### Defining our feeding functions\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "9b70bde7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "import asyncio\n",
+ "from vespa.application import Vespa"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "1fe15cec",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def feed_iterable(app: Vespa, params: FeedParams, data: Iterable[Dict]) -> FeedResult:\n",
+ " start = time.time()\n",
+ " app.feed_iterable(\n",
+ " data,\n",
+ " schema=\"doc\",\n",
+ " namespace=\"pyvespa-feed\",\n",
+ " operation_type=\"feed\",\n",
+ " max_queue_size=params.max_queue_size,\n",
+ " max_workers=params.max_workers,\n",
+ " max_connections=params.max_connections,\n",
+ " callback=callback,\n",
+ " )\n",
+ " end = time.time()\n",
+ " sync_feed_time = end - start\n",
+ " return FeedResult(\n",
+ " **params.__dict__,\n",
+ " feed_time=sync_feed_time,\n",
+ " )\n",
+ "\n",
+ "\n",
+ "def feed_async_iterable(\n",
+ " app: Vespa, params: FeedParams, data: Iterable[Dict]\n",
+ ") -> FeedResult:\n",
+ " start = time.time()\n",
+ " app.feed_async_iterable(\n",
+ " data,\n",
+ " schema=\"doc\",\n",
+ " namespace=\"pyvespa-feed\",\n",
+ " operation_type=\"feed\",\n",
+ " max_queue_size=params.max_queue_size,\n",
+ " max_workers=params.max_workers,\n",
+ " max_connections=params.max_connections,\n",
+ " callback=callback,\n",
+ " )\n",
+ " end = time.time()\n",
+ " async_feed_time = end - start\n",
+ " return FeedResult(\n",
+ " **params.__dict__,\n",
+ " feed_time=async_feed_time,\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43614eb0",
+ "metadata": {},
+ "source": [
+ "## Defining our hyperparameters\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "a22fe87e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Function: feed_async_iterable\n",
+ "{'num_docs': 1000, 'max_connections': 1, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "{'num_docs': 5000, 'max_connections': 1, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "{'num_docs': 10000, 'max_connections': 1, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "\n",
+ "\n",
+ "Function: feed_iterable\n",
+ "{'num_docs': 1000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "{'num_docs': 5000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "{'num_docs': 10000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from itertools import product\n",
+ "\n",
+ "# We will only run for up to 10 000 documents here, as this notebook is run as part of CI.\n",
+ "\n",
+ "num_docs = [\n",
+ " 1000,\n",
+ " 5_000,\n",
+ " 10_000,\n",
+ "]\n",
+ "params_by_function = {\n",
+ " \"feed_async_iterable\": {\n",
+ " \"num_docs\": num_docs,\n",
+ " \"max_connections\": [1],\n",
+ " \"max_workers\": [64],\n",
+ " \"max_queue_size\": [2500],\n",
+ " },\n",
+ " \"feed_iterable\": {\n",
+ " \"num_docs\": num_docs,\n",
+ " \"max_connections\": [64],\n",
+ " \"max_workers\": [64],\n",
+ " \"max_queue_size\": [2500],\n",
+ " },\n",
+ "}\n",
+ "\n",
+ "feed_params = []\n",
+ "# Create one FeedParams instance of each permutation\n",
+ "for func, parameters in params_by_function.items():\n",
+ "    print(f\"Function: {func}\")\n",
+ "    keys, values = zip(*parameters.items())\n",
+ "    for combination in product(*values):\n",
+ "        settings = dict(zip(keys, combination))\n",
+ "        print(settings)\n",
+ "        feed_params.append(\n",
+ "            FeedParams(\n",
+ "                name=f\"{settings['num_docs']}_{settings['max_connections']}_{settings.get('max_workers', 0)}_{func}\",\n",
+ "                function_name=func,\n",
+ "                **settings,\n",
+ "            )\n",
+ "        )\n",
+ "    print(\"\\n\") # Just to add space between different functions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "2b3f067c",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Total number of feed_params: 6\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f\"Total number of feed_params: {len(feed_params)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "15648d56",
+ "metadata": {},
+ "source": [
+ "Now, we will need a way to retrieve the callable function from the function name.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "22044170",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get reference to function from string name\n",
+ "def get_func_from_str(func_name: str) -> Callable:\n",
+ " return globals()[func_name]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79f3f550",
+ "metadata": {},
+ "source": [
+ "### Function to clean up after each feed\n",
+ "\n",
+ "For a fair comparison, we will delete the data before feeding it again.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "1da9d3f9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from typing import Iterable, Dict\n",
+ "from vespa.application import Vespa\n",
+ "\n",
+ "\n",
+ "def delete_data(app: Vespa, data: Iterable[Dict]):\n",
+ " app.feed_iterable(\n",
+ " iter=data,\n",
+ " schema=\"doc\",\n",
+ " namespace=\"pyvespa-feed\",\n",
+ " operation_type=\"delete\",\n",
+ " callback=callback,\n",
+ " max_workers=16,\n",
+ " max_connections=16,\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e081bf94",
+ "metadata": {},
+ "source": [
+ "## Main experiment loop\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87c8700c",
+ "metadata": {},
+ "source": [
+ "The call to `nest_asyncio.apply()` below lets asyncio code run inside Jupyter, which already runs its own event loop.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "aaa8f920",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import nest_asyncio\n",
+ "\n",
+ "nest_asyncio.apply()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "7a55e1c9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--------------------------------------------------\n",
+ "Starting feed with params:\n",
+ "FeedParams(name='1000_1_64_feed_async_iterable', num_docs=1000, max_connections=1, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using mtls_key_cert Authentication against endpoint https://b48e8812.bc737822.z.vespa-app.cloud//ApplicationStatus\n",
+ "9.478203773498535\n",
+ "Deleting data\n",
+ "--------------------------------------------------\n",
+ "Starting feed with params:\n",
+ "FeedParams(name='5000_1_64_feed_async_iterable', num_docs=5000, max_connections=1, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)\n",
+ "32.890751123428345\n",
+ "Deleting data\n",
+ "--------------------------------------------------\n",
+ "Starting feed with params:\n",
+ "FeedParams(name='10000_1_64_feed_async_iterable', num_docs=10000, max_connections=1, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)\n",
+ "77.85460019111633\n",
+ "Deleting data\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Exception in thread Thread-7:\n",
+ "Traceback (most recent call last):\n",
+ " File \"/Users/thomas/.pyenv/versions/3.9.19/lib/python3.9/threading.py\", line 980, in _bootstrap_inner\n",
+ " self.run()\n",
+ " File \"/Users/thomas/.pyenv/versions/3.9.19/envs/pyvespa-dev/lib/python3.9/site-packages/ipykernel/ipkernel.py\", line 766, in run_closure\n",
+ " _threading_Thread_run(self)\n",
+ " File \"/Users/thomas/.pyenv/versions/3.9.19/lib/python3.9/threading.py\", line 917, in run\n",
+ " self._target(*self._args, **self._kwargs)\n",
+ " File \"/Users/thomas/Repos/pyvespa/vespa/application.py\", line 480, in _consumer\n",
+ " future: Future = executor.submit(_submit, doc, sync_session)\n",
+ " File \"/Users/thomas/.pyenv/versions/3.9.19/lib/python3.9/concurrent/futures/thread.py\", line 167, in submit\n",
+ " raise RuntimeError('cannot schedule new futures after shutdown')\n",
+ "RuntimeError: cannot schedule new futures after shutdown\n"
+ ]
+ },
+ {
+ "ename": "KeyboardInterrupt",
+ "evalue": "",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[17], line 22\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mDeleting data\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 21\u001b[0m time\u001b[38;5;241m.\u001b[39msleep(\u001b[38;5;241m3\u001b[39m)\n\u001b[0;32m---> 22\u001b[0m \u001b[43mdelete_data\u001b[49m\u001b[43m(\u001b[49m\u001b[43mapp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[43m)\u001b[49m\n",
+ "Cell \u001b[0;32mIn[15], line 6\u001b[0m, in \u001b[0;36mdelete_data\u001b[0;34m(app, data)\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdelete_data\u001b[39m(app: Vespa, data: Iterable[Dict]):\n\u001b[0;32m----> 6\u001b[0m \u001b[43mapp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfeed_iterable\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 7\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43miter\u001b[39;49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 8\u001b[0m \u001b[43m \u001b[49m\u001b[43mschema\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdoc\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 9\u001b[0m \u001b[43m \u001b[49m\u001b[43mnamespace\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpyvespa-feed\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10\u001b[0m \u001b[43m \u001b[49m\u001b[43moperation_type\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdelete\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 11\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallback\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcallback\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 12\u001b[0m \u001b[43m \u001b[49m\u001b[43mmax_workers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m16\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 13\u001b[0m \u001b[43m \u001b[49m\u001b[43mmax_connections\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m16\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 14\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
+ "File \u001b[0;32m~/Repos/pyvespa/vespa/application.py:579\u001b[0m, in \u001b[0;36mVespa.feed_iterable\u001b[0;34m(self, iter, schema, namespace, callback, operation_type, max_queue_size, max_workers, max_connections, **kwargs)\u001b[0m\n\u001b[1;32m 577\u001b[0m consumer_thread\u001b[38;5;241m.\u001b[39mstart()\n\u001b[1;32m 578\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m doc \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28miter\u001b[39m:\n\u001b[0;32m--> 579\u001b[0m \u001b[43mqueue\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mput\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdoc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mblock\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[1;32m 580\u001b[0m queue\u001b[38;5;241m.\u001b[39mput(\u001b[38;5;28;01mNone\u001b[39;00m, block\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 581\u001b[0m queue\u001b[38;5;241m.\u001b[39mjoin()\n",
+ "File \u001b[0;32m~/.pyenv/versions/3.9.19/lib/python3.9/queue.py:140\u001b[0m, in \u001b[0;36mQueue.put\u001b[0;34m(self, item, block, timeout)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m timeout \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 139\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_qsize() \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmaxsize:\n\u001b[0;32m--> 140\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnot_full\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwait\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 141\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m timeout \u001b[38;5;241m<\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[1;32m 142\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtimeout\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m must be a non-negative number\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
+ "File \u001b[0;32m~/.pyenv/versions/3.9.19/lib/python3.9/threading.py:312\u001b[0m, in \u001b[0;36mCondition.wait\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m: \u001b[38;5;66;03m# restore state no matter what (e.g., KeyboardInterrupt)\u001b[39;00m\n\u001b[1;32m 311\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m timeout \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 312\u001b[0m \u001b[43mwaiter\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43macquire\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 313\u001b[0m gotit \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m 314\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
+ ]
+ }
+ ],
+ "source": [
+ "results = []\n",
+ "for params in feed_params:\n",
+ "    print(\"-\" * 50)\n",
+ "    print(\"Starting feed with params:\")\n",
+ "    print(params)\n",
+ "    data = get_dataset(params.num_docs)\n",
+ "    if \"xxx\" not in params.function_name:\n",
+ "        if \"feed_sync\" in params.function_name:\n",
+ "            print(\"Skipping feed_sync\")\n",
+ "            continue\n",
+ "        feed_result = get_func_from_str(params.function_name)(\n",
+ "            app=app, params=params, data=data\n",
+ "        )\n",
+ "    else:\n",
+ "        feed_result = asyncio.run(\n",
+ "            get_func_from_str(params.function_name)(app=app, params=params, data=data)\n",
+ "        )\n",
+ "    print(feed_result.feed_time)\n",
+ "    results.append(feed_result)\n",
+ "    print(\"Deleting data\")\n",
+ "    time.sleep(3)\n",
+ "    delete_data(app, data)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "e4b7f1a4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "