test: [RLOS2023] Add new e2e test framework for vw (VowpalWabbit#4644)
* intro notebook

* test: [RLOS_2023][WIP] updated test for regression weight  (VowpalWabbit#4600)

* test: add test for regression weight

* test: make test more reusable by using json to specify pytest

* test: minor fix on naming

* test: add an option to python json test

* test: [RLOS_2023]  test for contextual bandit (VowpalWabbit#4612)

* test: add basic cb test and configuration

* test: add shared context data generation

* add test for cb_explore_adf

* test: dynamically create pytest test case

* test: give fixed reward function signature

* test: [RLOS_2023] [WIP] Support + and * expression for grids (VowpalWabbit#4618)

* test: add basic cb test and configuration

* test: add shared context data generation

* add test for cb_explore_adf

* test: dynamically create pytest test case

* test: give fixed reward function signature

* test: support + and * expression for grids

* fix empty expression bugs

* test: [RLOS2023] [WIP] add more arguments for reg&cb tests (VowpalWabbit#4619)

* test: add more arguments for reg&cb tests

* test: fix minor bug in generate expression & add loss funcs to tests

* test: [RLOS2023] [WIP] add classification test (VowpalWabbit#4623)

* test: add more arguments for reg&cb tests

* test: fix minor bug in generate expression & add loss funcs to tests

* test: add test for classification

* test: organize test framework structure (VowpalWabbit#4624)

* test: [RLOS2023][WIP] add option for storing output and grid language redefinition (VowpalWabbit#4627)

* test: redesign grid lang

* test: add option for store output

* test: change list to dict for config vars

* test: [RLOS2023] add test for slate (VowpalWabbit#4629)

* test: add test for slate

* test: test cleanup and slate test update

* test: minor cleanup and change assert_loss function to equal instead of lower

* test: [RLOS2023] add test for cb with continuous action (VowpalWabbit#4630)

* test: add test for slate

* test: test cleanup and slate test update

* test: minor cleanup and change assert_loss function to equal instead of lower

* test: add test for cb with continuous action

* modify blocker testcase

* test: [RLOS2023] clean for e2e testing framework v2 (VowpalWabbit#4633)

* test: clean for e2e test v2

* test: change seed to the same value for all tests

* test: add datagen driver (VowpalWabbit#4638)

* python black

* python black 2

* minor demo cleanup

---------

Co-authored-by: Alexey Taymanov <ataymano@microsoft.com>
Co-authored-by: Alexey Taymanov <41013086+ataymano@users.noreply.github.com>
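
The squashed commits above describe a framework that generates pytest cases dynamically from JSON configurations and supports + and * expressions over option grids. As a rough sketch only (the file name, schema, and helper names below are illustrative assumptions, not the framework's actual API), JSON-driven test generation of that flavor might look like:

import itertools
import json

import pytest

# Hypothetical config file: a list of entries, each naming a test and a grid of vw options.
with open("test_configs.json") as f:
    CONFIGS = json.load(f)


def expand_grid(grid):
    # Cartesian product over option values, e.g. {"--learning_rate": [0.1, 1.0], "--loss_function": ["squared"]}.
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


CASES = [
    (cfg["name"], args)
    for cfg in CONFIGS
    for args in expand_grid(cfg.get("grid", {}))
]


@pytest.mark.parametrize("name,args", CASES)
def test_vw_configuration(name, args):
    # A real driver would build a vw command line from `args`, run it on generated
    # data, and assert on the resulting loss or predictions.
    assert isinstance(args, dict)

In such a scheme, "*" would correspond to the Cartesian product inside a grid, while "+" could be modeled by concatenating several grids; how the actual framework defines these expressions is specified in its own code, not here.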
3 people authored Oct 5, 2023
1 parent 1c5fedf commit ba83f62
Showing 24 changed files with 1,919 additions and 0 deletions.
328 changes: 328 additions & 0 deletions demo/cmd_getting_started/vw_intro.ipynb
@@ -0,0 +1,328 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Helpers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read(path):\n",
" with open(path) as f:\n",
" print(\"\".join(f.readlines()))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Regression"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's generate data of the following form: <br>\n",
"Every example has single namespace 'f' with single feature 'x' in it <br>\n",
"Target function is $$\\hat{y} = 2x + 1$$\n",
"And we are learning weights $w$, $b$ for $$y=wx+b$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"with open(\"regression1.txt\", \"w\") as f:\n",
" for i in range(1000):\n",
" x = np.random.rand()\n",
" y = 2 * x + 1\n",
" f.write(f\"{y} |f x:{x}\\n\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simplest execution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression1.txt"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output more artifacts\n",
"-p - predictions <br>\n",
"--invert_hash - model in readable format <br>\n",
"-f - model in binary format (consumable by vw)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression1.txt -p regression1.pred --invert_hash regression1.model.txt -f regression1.model.bin"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can look at weights and see the $w$ and $b$ got close to expected 2 and 1 values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"read(\"regression1.model.txt\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Do only predictions, no learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression1.txt -t"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression1.txt -t --learning_rate 10"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression1.txt -t -i regression1.model.bin"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interactions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's generate another dataset of the following form: <br>\n",
"Every example has single namespace 'f' with single feature 'x' in it <br>\n",
"Target function is $$\\hat{y} = x^2 + 1$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"with open(\"regression2.txt\", \"w\") as f:\n",
" for i in range(1000):\n",
" x = np.random.rand() * 4\n",
" y = x**2 + 1\n",
" f.write(f\"{y} |f x:{x}\\n\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see loss being far from zero if we stil try to learn $$y=wx+b$$ "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression2.txt --invert_hash regression2.model.txt"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"So let's try to learn $$y=w_1 x^2 + w_2 x + b$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw -d regression2.txt --invert_hash regression2.model.txt --interactions ff"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"read(\"regression2.model.txt\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Contextual bandits"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"env = {\"Tom\": {\"sports\": 0, \"politics\": 1}, \"Anna\": {\"sports\": 1, \"politics\": 0}}\n",
"\n",
"users = [\"Tom\", \"Anna\"]\n",
"content = [\"sports\", \"politics\"]\n",
"\n",
"with open(\"cb.txt\", \"w\") as f:\n",
" for i in range(1000):\n",
" user = users[np.random.randint(0, 2)]\n",
" chosen = np.random.randint(0, 2)\n",
" reward = env[user][content[chosen]]\n",
"\n",
" f.write(f\"shared |u {user}\\n\")\n",
" f.write(f\"0:{-reward}:0.5 |a {content[chosen]}\\n\")\n",
" f.write(f\"|a {content[(chosen + 1) % 2]}\\n\\n\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try to learn to predict reward in the following form: $$r = w_1 I(user\\ is\\ Tom) + w_2 I(user\\ is\\ Anna) + w_3 I(topic\\ is\\ sports) + w_4 I(topic\\ is\\ politics) + b$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw --cb_explore_adf -d cb.txt --invert_hash cb.model.txt"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that average reward is still around 0.5 which is the same as we would get answering randomly. This is expected since personalization is not captured in this form.\n",
"Let's add interaction between 'u' and 'a' namespaces and try to learn function of the following form:\n",
"$$\\begin{aligned}r = w_1 I(user\\ is\\ Tom) I(topic\\ is\\ sports) + w_2 I(user\\ is\\ Tom) I(topic\\ is\\ politics) +\\\\+ w_3 I(user\\ is\\ Anna) I(topic\\ is\\ sports) + w_4 I(user\\ is\\ Anna) I(topic\\ is\\ politics) +\\\\+ w_5 I(user\\ is\\ Tom) + w_6 I(user\\ is\\ Anna) +\\\\+ w_7 I(topic\\ is\\ sports) + w_8 I(topic\\ is\\ politics) + b\\end{aligned}$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vw --cb_explore_adf -d cb.txt --invert_hash cb.model.txt --interactions ua"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"read(\"cb.model.txt\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
