
New Tutorial: REST API #61

Merged
merged 17 commits into from
Jan 11, 2023

Conversation

Contributor

@bilgeyucel bilgeyucel commented Nov 4, 2022

In this tutorial, I tried to explain the whole process from setting up an environment to querying the pipeline using REST API.

This tutorial cannot be run in Colab, therefore, the structure is different.

I created a preview link if you want to see how it is displayed on Haystack website.

TODO:


Contributor

@agnieszka-m agnieszka-m left a comment

Here are my 2 cents. I tried to add a reason for most of my comments, but if anything is unclear, just ping me and I'll be happy to chat.
One thing that's not clear to me is why we needed to first set up the demo files and pipeline if we created a totally new pipeline after that.
Also, I think I'd maybe add more info about the pipeline that they were creating. For example, what params it's using and why (just one sentence like: this pipeline uses the default params that work well for this amount of files)?

index.toml Outdated

[[tutorial]]
title = "Haystack with REST API"
Contributor

I think here we're aiming at something like: Tutorial: Using Haystack's REST API."
So always "Tutorial" + the name of the task (starting with either Gerund or imperative -> please consult with Branden, I'm ok with either).

"cell_type": "markdown",
"metadata": {},
"source": [
"# Haystack with REST API\n",
Contributor

See my comment about tutorial titles above

"\n",
"- **Level**: Intermediate\n",
"- **Time to complete**: 30 minutes\n",
"- **Prerequisites**: N/A\n",
Contributor

If this is an intermediate tutorial, we surely need them to have some prerequisite knowledge - knowledge of (basic) NLP concepts (maybe there are some specific concepts they need to know for this tutorial)? Python knowledge (we're still debating whether this should be added as a prereq), basic knowledge of Haystack? You can also add a link to any beginner tutorials that they can complete before this one to make them understand the necessary concepts.

Contributor

Honestly I would suggest condensing this tutorial to 'using the REST API'. Lmk what you think here as I think this might be a bit controversial:
The usual problems we see are simply technical, in terms of just setting up, using, and interacting with Docker and the REST API. So I would argue understanding the pipeline structure and each component in detail isn't the aim here. Imo the things people should come away with are:

  • These .yml files are where I define my config
  • They are able to edit the file in some way, maybe with some examples (change the model, reader settings etc.) and use this as the pipeline that they query with the REST API
  • For details on each NLP-related component and param I would forward them to the relevant docs (e.g. the full set of options for a preprocessor, or reader, or retriever etc)

Adding this comment as a reply here because of the mention of 'NLP concepts' which I think could be added as a prerequisite. But mainly I would say the prerequisites are: basic understanding of the YAML syntax (link to resource), basic understanding of docker.

"- **Nodes Used**: `ElasticsearchDocumentStore`, `EmbeddingRetriever`\n",
"- **Goal**: After completing this tutorial, you will have learned how you can interact with Haystack through REST API.\n",
"\n",
"This tutorial teaches you how to create your production-ready document search `pipeline.yml` and interact with Haystack through REST API. \n",
Contributor

This and the next paragraph should go to a Description section (have a look at the recent tutorials that Branden updated).

"\n",
"This tutorial teaches you how to create your production-ready document search `pipeline.yml` and interact with Haystack through REST API. \n",
"\n",
"First, we are going to set up the environment to run the same question answering pipeline in [Explore the World Demo](https://haystack-demo.deepset.ai/), then create a new pipeline for the new document search system."
Contributor

Suggested change
"First, we are going to set up the environment to run the same question answering pipeline in [Explore the World Demo](https://haystack-demo.deepset.ai/), then create a new pipeline for the new document search system."
"First, we are going to set up the environment to run the same question answering pipeline as in the [Explore the World Demo](https://haystack-demo.deepset.ai/), then create a new pipeline for the new document search system."

Contributor

Now that I'm reading it the second time, I'm not sure I understand this. Do you mean they're going to use the same data as in the demo? or the same pipeline as in the demo? if the same pipeline, why?

"\n",
"You can use REST API to index your files to your document store. This requires an indexing pipeline. Add the indexing pipeline to `document-search.haystack-pipeline.yml`, then, you can use `/file-upload` endpoint to upload your files to Elasticsearch. \n",
"\n",
"Download the same demo files [here](https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/article_txt_countries_and_capitals.zip) and upload them using cURL. Check [this](https://docs.haystack.deepset.ai/docs/rest_api#indexing-documents-in-the-haystack-rest-api-document-store) documentation page for more detail about file indexing.\n",
Contributor

Suggested change
"Download the same demo files [here](https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/article_txt_countries_and_capitals.zip) and upload them using cURL. Check [this](https://docs.haystack.deepset.ai/docs/rest_api#indexing-documents-in-the-haystack-rest-api-document-store) documentation page for more detail about file indexing.\n",
"Download the same [demo files](https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/article_txt_countries_and_capitals.zip) you used in the first part of this tutorial and upload them using cURL.
To learn more about indexing, see [Indexing Documents](https://docs.haystack.deepset.ai/docs/rest_api#indexing-documents-in-the-haystack-rest-api-document-store).\n",

"Download the same demo files [here](https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/article_txt_countries_and_capitals.zip) and upload them using cURL. Check [this](https://docs.haystack.deepset.ai/docs/rest_api#indexing-documents-in-the-haystack-rest-api-document-store) documentation page for more detail about file indexing.\n",
"\n",
"<aside>\n",
"⚠️ If you want to index your files directly to Elasticsearch through script, be sure to provide the same indexing pipeline with your `document-search.haystack-pipeline.yml` file for consistency between indexed files.\n",
Contributor

Suggested change
"️ If you want to index your files directly to Elasticsearch through script, be sure to provide the same indexing pipeline with your `document-search.haystack-pipeline.yml` file for consistency between indexed files.\n",
"️To index your files directly to Elasticsearch through a script, use the same indexing pipeline with your `document-search.haystack-pipeline.yml` file to make sure the files indexed are consistent.\n",

Our voice: conversational

"cell_type": "markdown",
"metadata": {},
"source": [
"The indexing pipeline should be as follows:\n",
Contributor

Suggested change
"The indexing pipeline should be as follows:\n",
"Here's what the indexing pipeline should look like:\n",

"cell_type": "markdown",
"metadata": {},
"source": [
"When you restart the Haystack API, REST API is going to load the new pipeline and the pipeline will be ready to use. "
Contributor

Suggested change
"When you restart the Haystack API, REST API is going to load the new pipeline and the pipeline will be ready to use. "
"When you restart the Haystack API, REST API will load the new pipeline and the pipeline will be ready to use. "

},
"nbformat": 4,
"nbformat_minor": 2
}
Contributor

I'd add a paragraph summing up what they have learned/built/achieved in this tutorial. Maybe some links they may want to check, additional info, etc.

Contributor

@brandenchan brandenchan left a comment

Thanks for making a start on this! I think that you have covered many of the start-up steps for the REST API in good detail.

I've made a set of comments that mostly request structural / content changes that I think will simplify the tutorial or provide users with a little bit of conceptual overview. This tutorial covers a really tough topic because we are asking users to interact with a lot of components (e.g. Docker, cURL, YAML pipelines, REST API, Swagger) and I would push to make the experience still a bit more straightforward for the user.

More than happy to clarify anything that's unclear, or to have a chat to align!

"cell_type": "markdown",
"metadata": {},
"source": [
"### With Docker\n",
Contributor

I see that there is both the option for with and without Docker. In the first tutorials that I've reworked, I've opted not to give options in the tutorial. Let's teach them to do one thing. I think this will make it much more linear and easy for a user to follow.

This also raises the question of the relationship between this tutorial and the guide on the REST API. The way I see it is that the docs page should have small individual sections that explain each task / step that could arise during the creation of the REST API, but they are separate from any specific environment / dataset / pipeline. The tutorial should be a single set of steps that works with a specific environment / dataset / pipeline to create one possible outcome.

So what I'd push for here is to take out the "REST API with Docker" section from this tutorial and turn this specifically into a REST API without Docker tutorial. You can always have a comment saying "If you'd like to initialize with Docker instead see [Initialization with Docker]"

"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting Up the Environment\n",
Contributor

Just an overview on how I've been using heading levels so far:
Header level 1 reserved for title of notebook (e.g. # Build Your First Question Answering System)
Header level 2 is for sections of the tutorial (e.g. ## Preparing the Colab Environment)
Header level 3 is not used because it is not easily distinguishable from Header level 2 visually, and because it isn't shown in the right ToC

Within header level 2, you can have numbered lists
e.g. 1. Update/install Docker and Docker Compose, then launch Docker then 2. Clone Haystack repository

"- **Nodes Used**: `ElasticsearchDocumentStore`, `EmbeddingRetriever`\n",
"- **Goal**: After completing this tutorial, you will have learned how you can interact with Haystack through REST API.\n",
"\n",
"This tutorial teaches you how to create your production-ready document search `pipeline.yml` and interact with Haystack through REST API. \n",
Contributor

I think it might not be clear yet to a user what a pipeline.yml is. How about we just call it a search pipeline?

"source": [
"* **Launch Elasticsearch**\n",
"\n",
"Launching Elasticsearch takes some time, so, be sure to have a `healthy` Elasticsearch container before continue. You can check the health through `docker ps` command. \n",
Contributor

Those who haven't worked with ES before might not know what a healthy response from ES looks like. If it isn't too cluttered, I wonder if a screenshot might help here.
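To illustrate the point, the check itself is a plain `docker ps` call; the container name and uptime below are placeholders, the key thing to look for is the `(healthy)` status marker:

```shell
# List running containers and check the STATUS column for "(healthy)".
# Elasticsearch can take a minute or two to go from
# "(health: starting)" to "(healthy)".
docker ps --filter "name=elasticsearch" --format "table {{.Names}}\t{{.Status}}"
# Illustrative output once the container is ready:
#   NAMES           STATUS
#   elasticsearch   Up 2 minutes (healthy)
```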

"\n",
"Test whether everything is okay by going to Swagger documentation for the Haystack REST API on [`http://127.0.0.1:8000/docs`](http://127.0.0.1:8000/docs) and trying out `/initialized` endpoint or sending a cURL request as `curl -X GET http://127.0.0.1:8000/initialized`. \n",
"\n",
"If everything is alright, you can start asking questions! Wikipedia pages about countries and capital are already indexed to Elasticsearch by the docker image we provided in [`docker-compose.yml`](https://github.com/deepset-ai/haystack/blob/main/docker-compose.yml#L22). To ask questions, you can use `/query` endpoint again via Haystack REST API UI or a cURL request. "
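As an illustration of that last step, a cURL request against the `/query` endpoint could look roughly like this (a sketch based on the Haystack REST API; the question text is just an example):

```shell
# Send a question to the running Haystack REST API (assumes the
# server from this tutorial is listening on 127.0.0.1:8000).
curl -X POST "http://127.0.0.1:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the capital of France?"}'
```

The response is a JSON object containing the predicted answers together with the documents they were drawn from.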
Contributor

What are you referring to exactly when you say REST API UI? Do you mean sending a request via Swagger or using the actual Haystack UI? Either way I think users need more hand holding and explanation before we can tell them to do this

"source": [
"# Haystack with REST API\n",
"\n",
"- **Level**: Intermediate\n",
Contributor

To be honest, I think this might actually be an advanced topic. The user has to work with Docker, requests to REST API, pipeline ymls. They also have to make changes to lots of files which they probably don't fully understand the significance of.

"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing pipeline\n",
Contributor

If we are going to suggest performing indexing via the /file-upload endpoint, I think we need to give a bit more guidance as to what this actually entails. What does the curl command look like?
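For reference, a minimal upload request could look something like this (a sketch following the Haystack REST API docs; the filename is a placeholder):

```shell
# Upload one file to the REST API; it runs through the indexing
# pipeline and ends up in the document store.
curl -X POST "http://127.0.0.1:8000/file-upload" \
  -H "Accept: application/json" \
  -F files="@article.txt" \
  -F meta='null'
```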

"Download the same demo files [here](https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/article_txt_countries_and_capitals.zip) and upload them using cURL. Check [this](https://docs.haystack.deepset.ai/docs/rest_api#indexing-documents-in-the-haystack-rest-api-document-store) documentation page for more detail about file indexing.\n",
"\n",
"<aside>\n",
"⚠️ If you want to index your files directly to Elasticsearch through script, be sure to provide the same indexing pipeline with your `document-search.haystack-pipeline.yml` file for consistency between indexed files.\n",
Contributor

I think I get what this is saying, but it's a bit hard to understand. Do you mean that if someone is doing both index via REST API endpoint and index via script that calls indexing pipeline, that the pipeline in each case is the same so that the indexed files are consistent?

On a formatting level, I think this "aside" block is quite hard to distinguish from regular text on the rendered page and would opt against using it.

"cell_type": "markdown",
"metadata": {},
"source": [
"## Voilà! Make a new query!\n",
Contributor

Can we show users what the expected response should look like?

- **Time to complete**: 30 minutes
- **Prerequisites**: N/A
- **Nodes Used**: `ElasticsearchDocumentStore`, `EmbeddingRetriever`
- **Goal**: After completing this tutorial, you will have learned how you can interact with Haystack through REST API.
Contributor

'interact with a Haystack pipeline' ?


* Remove the demo setup process
* Pay attention to the language
* Add more description to all steps

PR: #61
Contributor Author

Hi @TuanaCelik, @brandenchan, and @agnieszka-m, I restructured the tutorial according to your feedback. The main changes I made:

  • Removed the first part where I explained the demo setup. Now I directly focus on creating a new application but with more guidance
  • Didn't mention Swagger UI at all. Swagger UI is nice to use but not a production-focused tool, so, I removed it completely
  • Added more explanation to all steps.
  • Followed the new structure of tutorials on Tutorial restructure draft #44

I think this version is cleaner and easier to follow. I tried to pay attention not to suggest options and tried to stick with one approach only.

Let me know what you think! 🙌

"pip install 'farm-haystack[all]'\n",
"pip install -e rest_api/\n",
"\n",
"brew install xpdf # required for PDFToTextConverter node"
Member

Sorry for the intrusion... by the way, great job!

I think that brew is installed by default only in macOS, not in Linux and Windows.

From Haystack workflows for tests, I see that you can install xpdf on Linux with the following command:
`wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin`

For Windows, you can install xpdf with `choco install xpdf-utils`, but you need to have Chocolatey.

Contributor

@brandenchan brandenchan left a comment

The structure of this Tutorial is much much clearer! Good work. I have just made a set of language changes.

## Overview

Haystack enables you to apply the latest NLP technology to your own data and create production-ready applications. Building an end-to-end NLP application requires the combination of multiple concepts. Here are those concepts:
* **DocumentStore** stores the data. You will use Elasticsearch for this tutorial.
Contributor

Suggested change
* **DocumentStore** stores the data. You will use Elasticsearch for this tutorial.
* **DocumentStore** stores the data. You will use the ElasticsearchDocumentStore for this tutorial.

Contributor Author

Here it confuses me to put it like that. I said "You will use Elasticsearch for this tutorial" because they will start an Elasticsearch instance and then, use ElasticsearchDocumentStore to connect to the instance. Would it be better like this? 👇

Suggested change
* **DocumentStore** stores the data. You will use Elasticsearch for this tutorial.
* **DocumentStore** stores the data. You will start an Elasticsearch instance and use `ElasticsearchDocumentStore` to connect to the instance.

Contributor

@TuanaCelik TuanaCelik left a comment

Amazing job @bilgeyucel such great effort <3
I have left some minor comments. And one extra one that you could consider, maybe with the help of someone in core engineering: include what OS these commands would run on, or some type of prereq (in terms of setup, in addition to the ones you've already provided).


## Create Pipeline YAML File

YAML files are widely used for configurations. Haystack enables defining pipelines as YAML files and `load_from_yaml` method loads pipelines from YAML file. In a YAML file, `components` section defines all pipeline nodes and `pipelines` section defines how these nodes are connected to each other to form a pipeline. Let's start with defining query and indexing pipelines.
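For instance, the skeleton of such a file looks roughly like this (node names and params here are illustrative, not necessarily the tutorial's exact file):

```yaml
components:    # all nodes the pipelines can use
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      top_k: 5

pipelines:     # how the nodes are wired together
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
```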
Contributor

load_from_yaml lets you load the YAML pipeline into your python script or pipeline object right? If I have this right could the following make sense?

YAML files are widely used for configurations. Haystack enables defining pipelines as YAML files and the load_from_yaml method will even allow you to load a pipeline defined in YAML into a Python object.

type: EmbeddingRetriever
params:
document_store: DocumentStore
top_k: 5
Contributor

This is up to you, but I think the section just under these yaml definitions is a great opportunity to hint to the learner how they can change settings. It might serve as a good mental map from the API reference they can see in docs to implementing them in a YAML file. top_k for example, but you could in a short paragraph maybe hint them at some other things they could set?

I'll leave this decision up to you and @brandenchan depending on the level of 'optionality' you decide to provide in this tutorial

Contributor Author

Good idea to mention API reference 💡 I don't want to offer any options such as saying that "add max_seq_len to Retriever", instead, encourage them to play with the pipeline later and refer to the API Reference for that. So, this sentence👇 might come after sharing the whole YAML file. WDYT? @TuanaCelik @brandenchan

Feel free to play with the pipeline setup later on. Add or remove some nodes, change the parameters or add new ones. Check out Haystack API Reference for more options on nodes and parameters.

* Add more explanation
* Make the language cleaner
Contributor

@agnieszka-m agnieszka-m left a comment

Just some language comments. Great job!

"source": [
"## Create Pipeline YAML File\n",
"\n",
"YAML files are widely used for configurations. Haystack enables defining pipelines as YAML files and the `load_from_yaml()` method loads pipelines from YAML file. In a YAML file, the `components` section defines all pipeline nodes and `pipelines` section defines how these nodes are connected to each other to form a pipeline. Let's start with defining query and indexing pipelines."
Contributor

Suggested change
"YAML files are widely used for configurations. Haystack enables defining pipelines as YAML files and the `load_from_yaml()` method loads pipelines from YAML file. In a YAML file, the `components` section defines all pipeline nodes and `pipelines` section defines how these nodes are connected to each other to form a pipeline. Let's start with defining query and indexing pipelines."
"YAML files are widely used for configurations. You can define Haystack pipelines as YAML files and then use the `load_from_yaml()` method to load pipelines from the YAML file. In a YAML file, the `components` section defines all pipeline nodes and the `pipelines` section defines how these nodes are connected to each other to form a pipeline. Let's start with defining query and indexing pipelines."

It's recommended not to say your product "enables" people to do something. Instead, you can say they can do something with the product.

Contributor Author

Hi @agnieszka-m, here, the user doesn't use the load_from_yaml() method, it is used by Haystack. I wanted to explain the process a little bit so that they can see the connection between a YAML file and python objects we use in Haystack. What do you think about this 👇 ?

YAML files are widely used for configurations. You can define Haystack pipelines as YAML files and then `load_from_yaml()` method loads the pipeline defined in YAML into a Python object. In a YAML file, the `components` section defines all pipeline nodes and the `pipelines` section defines how these nodes are connected to each other to form a pipeline. Let's start with defining query and indexing pipelines.

Contributor

@masci masci left a comment

I love this tutorial, this is very much needed!

I took a first pass and I stopped at the setup phase, let me know what you think!

@bilgeyucel bilgeyucel requested a review from masci December 9, 2022 11:22
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Clone Haystack repository.\n",
Contributor

I was wondering, do we need this at all? Since we let users create a pipeline from scratch, why not do the same for the compose file? I believe the narrative would flow better and the result would be much more rewarding for the user. Inexperienced Docker users could still just copy/paste the yaml code into a compose file if they want to skip this section.

Contributor Author

Hi @masci, I agree that they don't need to clone the whole repository only for the compose file, yet, I am not sure how to mitigate this issue. In my opinion, if we make them create the compose file from scratch in the tutorial, we need to explain what each line of code does. I didn't want to do that because:

  1. Explaining the compose file would make the tutorial longer and more complicated for inexperienced users
  2. The focus point of this tutorial is the pipeline YAML file and how to edit it, I don't want to shift toward Docker and put it in a spotlight

That's why I think for this tutorial it's more important to provide a compose file that works out of the box. Maybe I can give the Github link to the compose file and ask them to copy/paste it instead of making them clone the repo. wdyt?

Contributor

Good point, they can curl https://raw.githubusercontent.com/deepset-ai/haystack/main/docker-compose.yml 👍

"source": [
"1. Create a document search pipeline.\n",
"\n",
"Time to design a document search pipeline from scratch. This will be your query pipeline. Create a new file named `document-search.haystack-pipeline.yml` in the `/pipeline` folder under `/rest_api` in the Haystack code base. Then, create a `PIPELINE_YAML_PATH` variable in the `docker-compose.yml` with the new file name. `PIPELINE_YAML_PATH` variable will tell `rest_api` which YAML file to run. \n",
Contributor

Back to my previous point, if we skip the git clone haystack step the folder from which users run this tutorial might be much simpler, something like

tutorial
├── docker-compose.yml
└── document-search.haystack-pipeline.yml

Contributor Author

This is a great idea 💯

@bilgeyucel bilgeyucel requested a review from masci December 28, 2022 14:23
Contributor

@masci masci left a comment

LGTM! 👍

@bilgeyucel bilgeyucel requested a review from anakin87 January 10, 2023 11:38
Member

@anakin87 anakin87 left a comment

Looks very good to me!

* Add version info
* Fix comment spacing
@TuanaCelik TuanaCelik self-requested a review January 10, 2023 15:44
@bilgeyucel bilgeyucel merged commit e1d78c4 into main Jan 11, 2023
@bilgeyucel bilgeyucel deleted the tutorial_20 branch January 11, 2023 09:34