Merge pull request #18 from amysen/fix-rag-typos
Fix rag notebook typos
ebursztein authored Jul 18, 2024
2 parents d6640c2 + e416b1e commit 3340871
Showing 1 changed file with 10 additions and 10 deletions.
20 changes: 10 additions & 10 deletions notebooks/unisim-gemma-text_rag_demo.ipynb
@@ -10,7 +10,7 @@
"\n",
"To do this, we are going to combine [Ollama](https://github.com/ollama/ollama) as our local inference engine, [Gemma](https://ai.google.dev/gemma) as our local LLM, our newly released [RETSim](https://arxiv.org/abs/2311.17264) ultra-fast near-duplicate text embeddings, and [USearch](https://github.com/unum-cloud/usearch) for efficient indexing and retrieval. \n",
"\n",
"For those who want a more details write up you can read it [Wingardium Trivia-osa! On-Device Sorting Hatbot Powered by Gemma, Ollama, USearch, and RETSim](https://elie.net/blog/ai/wingardium-trivia-osa-on-device-sorting-hatbot-powered-by-gemma-ollama-usearch-and-retsim)"
"For those who want a more detailed write-up, you can read it at [Wingardium Trivia-osa! On-Device Sorting Hatbot Powered by Gemma, Ollama, USearch, and RETSim](https://elie.net/blog/ai/wingardium-trivia-osa-on-device-sorting-hatbot-powered-by-gemma-ollama-usearch-and-retsim)"
]
},
{
@@ -19,7 +19,7 @@
"source": [
"## Setup\n",
"\n",
"First things first, we are installing the package we need Ollama to run Gemma locally, and UniSim to index data with RETSim and retrieve it with USearch."
"First things first, we are installing the packages we need for Ollama to run Gemma locally, and UniSim to index data with RETSim and retrieve it with USearch."
]
},
{
@@ -97,7 +97,7 @@
}
],
"source": [
"# small wrapper function to make generation easier and check it all work\n",
"# small wrapper function to make generation easier and check it all works\n",
"# we use generate as we are going for a RAG style system so streaming is not useful\n",
"def generate(prompt: str) -> str:\n",
" res = ollama.generate(model=MODEL, prompt=prompt)\n",
@@ -132,9 +132,9 @@
],
"source": [
"# initializizing TextSim for near duplicate text similarity\n",
"VERBOSE = True # interactive demo so we want to see what happen\n",
"VERBOSE = True # interactive demo so we want to see what happens\n",
"txtsim = unisim.TextSim(verbose=True)\n",
"# check it works as intendeds\n",
"# check it works as intended\n",
"sim_value = txtsim.similarity(\"Gemma\", \"Gemmaa\")\n",
"if sim_value > 0.9:\n",
" print(f\"Similarity {sim_value} - TextSim works as intended\")\n",
@@ -254,10 +254,10 @@
"\n",
"The first step to build our RAG pipeline to help the LLM with additional context is to load the data, compute the embeddings and index them. We are simply indexing the characters name using RETSim embedding and will return the data associated with it during the retrieval process to help the model.\n",
"\n",
"The data used if from the Kaggle [Characters in Harry Potter Books dataset](https://www.kaggle.com/datasets/zez000/characters-in-harry-potter-books)\n",
"The data used is from the Kaggle [Characters in Harry Potter Books dataset](https://www.kaggle.com/datasets/zez000/characters-in-harry-potter-books)\n",
"\n",
"Each characters has its name and a few fields. Our game plan is to use \n",
"unisim/textsim to perform typo resilient name lookup and retrive the relevants \n",
"Each character has its name and a few fields. Our game plan is to use \n",
"unisim/textsim to perform typo-resilient name lookup and retrieve the relevant \n",
"fields to help Gemma answer about the characters\n",
"\n",
"\n",
@@ -321,7 +321,7 @@
}
],
"source": [
"# indexing data with text sim\n",
"# indexing data with TextSim\n",
"txtsim.reset_index() # clean up in case we run this cell multiple times\n",
"idx = txtsim.add(list(CHARACTERS_INFO.keys()))\n",
"txtsim.indexed_data[:10] # display what we added"
@@ -444,7 +444,7 @@
"metadata": {},
"source": [
"## RAG answers vs direct answer generation\n",
"Let see our RAG in action and compare it to the directly generated answers we got before.\n"
"Lets see our RAG in action and compare it to the directly generated answers we got before.\n"
]
},
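The comparison in this section boils down to one step: stuff the retrieved fields into the prompt before generating. A self-contained sketch follows; `fake_generate` is a hypothetical stand-in for the Ollama wrapper so the example runs offline, and the prompt template is an illustrative assumption rather than the notebook's exact wording.

```python
# Stand-in for the Ollama generate wrapper so this sketch is runnable
# offline (illustrative assumption, not the notebook's model call).
def fake_generate(prompt: str) -> str:
    return f"(model saw {len(prompt)} chars of prompt)"

def rag_answer(question: str, context: str, generate=fake_generate) -> str:
    # prepend the retrieved facts so the model answers from them instead
    # of relying on its parametric memory alone
    prompt = (
        "Use the following facts to answer.\n"
        f"Facts: {context}\n"
        f"Question: {question}\n"
    )
    return generate(prompt)

print(rag_answer("Which house is Harry in?", "Harry Potter: house Gryffindor"))
```

Direct generation is the same call with an empty `context`, which is what makes the side-by-side comparison in this section meaningful.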
{
