From de0f921da052c880e724b37f970d47248992d200 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 11 Oct 2022 17:37:51 +0200 Subject: [PATCH 01/48] Draft tutorial 1 restructure --- markdowns/1.md | 250 +++++----------- tutorials/01_Basic_QA_Pipeline.ipynb | 432 ++++++++++----------------- 2 files changed, 241 insertions(+), 441 deletions(-) diff --git a/markdowns/1.md b/markdowns/1.md index b78cc592..0320173b 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -7,39 +7,24 @@ date: "2020-09-03" id: "tutorial1md" ---> -# Build Your First QA System - - +# Build Your First Question Answering System [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb) -Question Answering can be used in a variety of use cases. A very common one: Using it to navigate through complex knowledge bases or long documents ("search setting"). - -A "knowledge base" could for example be your website, an internal wiki or a collection of financial reports. -In this tutorial we will work on a slightly different domain: "Game of Thrones". - -Let's see how we can use a bunch of Wikipedia articles to answer a variety of questions about the -marvellous seven kingdoms. - +Question answering allows you to quickly look into your Document collections to find answers to your questions. You can use it to search through complex knowledge bases, or long documents. -### Prepare environment +A knowledge base could, for example, be your website, an internal wiki or a collection of financial reports. In this tutorial we will work on a set of wiki pages about Game of Thrones. Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. 
-**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - +## Prepare Environment -You can double check whether the GPU runtime is enabled with the following command: - - -```bash -%%bash +Before running the code in this notebook, you should set up the Colab environment with the following steps: +- [Enable GPU Runtime in Colab]() +- [Check if GPU is Enabled]() +- [Set logging level to INFO]() -nvidia-smi -``` -To start, install the latest release of Haystack with `pip`: +To start, let's install the latest release of Haystack with `pip`: ```bash @@ -49,45 +34,14 @@ pip install --upgrade pip pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] ``` -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. -Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - ## Document Store -Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`. +A Haystack question answering system finds answers to questions within the documents stored in a `DocumentStore`. In this tutorial, we are initializing an `ElasticsearchDocumentStore`, but there are many other options available.
To learn which one is right for your use case, and how to initialize it, see [Choosing the Right Document Store](https://haystack.deepset.ai/components/document-store#choosing-the-right-document-store) and [Initialization](https://haystack.deepset.ai/components/document-store#initialisation). -**Here:** We recommended Elasticsearch as it comes preloaded with features like [full-text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html), [BM25 retrieval](https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25), and [vector storage for text embeddings](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dense-vector.html). -**Alternatives:** If you are unable to setup an Elasticsearch instance, then follow the [Tutorial 3](https://github.com/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb) for using SQL/InMemory document stores. +### Start an Elasticsearch Server -**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores. - -### Start an Elasticsearch server locally -You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source. - - -```python -# Recommended: Start Elasticsearch using Docker via the Haystack utility function -from haystack.utils import launch_es - -launch_es() -``` - -### Start an Elasticsearch server in Colab - -If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source. +The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. 
To download, extract and set the permission for the Elasticsearch installation image, run: ```bash @@ -98,6 +52,8 @@ tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz chown -R daemon:daemon elasticsearch-7.9.2 ``` +To start the server, run: + ```bash %%bash --bg @@ -105,9 +61,7 @@ chown -R daemon:daemon elasticsearch-7.9.2 sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch ``` -### Create the Document Store - -The `ElasticsearchDocumentStore` class will try to open a connection in the constructor, here we wait 30 seconds only to be sure Elasticsearch is ready before continuing: +Let's wait 30 seconds to make sure the server has fully started up. ```python @@ -115,7 +69,11 @@ import time time.sleep(30) ``` -Finally, we create the Document Store instance: +If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`]() utility function. + +### Create the DocumentStore + +When you initialize the `ElasticsearchDocumentStore`, it opens a connection with the Elasticsearch service. ```python @@ -124,127 +82,83 @@ from haystack.document_stores import ElasticsearchDocumentStore # Get the host where Elasticsearch is running, default to localhost host = os.environ.get("ELASTICSEARCH_HOST", "localhost") -document_store = ElasticsearchDocumentStore(host=host, username="", password="", index="document") -``` -## Preprocessing of documents +document_store = ElasticsearchDocumentStore( + host=host, + username="", + password="", + index="document" +) +``` -Haystack provides a customizable pipeline for: - - converting files into texts - - cleaning texts - - splitting texts - - writing them to a Document Store +## Preparing Documents -In this tutorial, we download Wikipedia articles about Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch. 
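The preprocessing steps listed above (convert files to text, clean, split, write) can be sketched in plain Python to show what happens to each file. This is a minimal sketch under stated assumptions, not Haystack's `convert_files_to_docs` or `clean_wiki_text`: the helpers `simple_clean`, `split_into_paragraph_docs`, and `convert_dir` are hypothetical, and the cleaning rule (dropping lines shorter than three characters) is an arbitrary example. The output follows the `{'content': ..., 'meta': {'name': ...}}` dictionary format described in the original code comments.

```python
from pathlib import Path
from typing import Dict, List


def simple_clean(text: str, min_line_length: int = 3) -> str:
    """Hypothetical cleaner: strip whitespace and drop lines shorter than min_line_length."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if len(line) >= min_line_length)


def split_into_paragraph_docs(name: str, text: str) -> List[Dict]:
    """Split raw text on blank lines and wrap each cleaned paragraph in a document dict."""
    docs = []
    for paragraph in text.split("\n\n"):
        cleaned = simple_clean(paragraph)
        if cleaned:  # skip paragraphs that are empty after cleaning
            docs.append({"content": cleaned, "meta": {"name": name}})
    return docs


def convert_dir(dir_path: str) -> List[Dict]:
    """Read every .txt file in dir_path and convert it into paragraph-level document dicts."""
    docs = []
    for path in sorted(Path(dir_path).glob("*.txt")):
        docs.extend(split_into_paragraph_docs(path.name, path.read_text(encoding="utf-8")))
    return docs


# Usage on an in-memory string instead of a downloaded file (hypothetical sample text):
sample_text = "Arya Stark\n\nArya is a daughter of Eddard Stark.\nok\n\n\n"
example_docs = split_into_paragraph_docs("example.txt", sample_text)
print(example_docs)
```

A list like `example_docs` is what the original comments describe as ready to be written to the document store with `document_store.write_documents()`.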
+Let's download 517 articles from the Game of Thrones Wikipedia. They can be found in `data/tutorial1` as a set of `.txt` files. ```python -from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http +from haystack.utils import fetch_archive_from_http - -# Let's first fetch some documents that we want to query -# Here: 517 Wikipedia articles for Game of Thrones doc_dir = "data/tutorial1" -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# Convert files to dicts -# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers) -# It must take a str as input, and return a str. -docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True) - -# We now have a list of dictionaries that we can write to our document store. -# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself. -# The default format here is: -# { -# 'content': "", -# 'meta': {'name': "", ...} -# } -# (Optionally: you can also add more key-value-pairs here, that will be indexed as fields in Elasticsearch and -# can be accessed later for filtering or shown in the responses of the Pipeline) - -# Let's have a look at the first 3 entries: -print(docs[:3]) -# Now, let's write the dicts containing documents to our DB. -document_store.write_documents(docs) +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) ``` -## Initialize Retriever, Reader & Pipeline - -### Retriever - -Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered. -They use some simple but fast algorithm. 
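The Retriever stage can be illustrated with a toy BM25 scorer. This is a simplified sketch for intuition only, not the implementation used by Elasticsearch or `BM25Retriever`; the tokenizer (whitespace split, punctuation stripping, lowercasing), the IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, and the defaults `k1=1.2`, `b=0.75` are assumptions. Scoring is a cheap pass over term counts, which is why this kind of retrieval can run over a whole collection at query time.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: split on whitespace, strip punctuation, lowercase."""
    tokens = (tok.strip(".,?!:;\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def bm25_top_k(query: str, corpus: List[str], top_k: int = 3,
               k1: float = 1.2, b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (document index, BM25 score) pairs for the top_k best-matching documents."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # a term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


corpus = [
    "Arya Stark is a daughter of Eddard Stark.",
    "Sansa Stark is the elder sister of Arya.",
    "The Dothraki are a nomadic people.",
]
print(bm25_top_k("Which Stark is the daughter of Eddard?", corpus, top_k=2))
```

Documents that share rare terms with the query (here `daughter` and `eddard`) score highest, because BM25 weights each matching term by its inverse document frequency.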
- -**Here:** We use Elasticsearch's default BM25 algorithm - -**Alternatives:** - -- Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters -- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging -- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT) -- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6) +The `.txt` files we just downloaded need to be converted into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) before they can be written into the DocumentStore. We will also apply the `clean_wiki_text` cleaning function to the text and split the Wikipedia documents by paragraph breaks. ```python -from haystack.nodes import BM25Retriever - -retriever = BM25Retriever(document_store=document_store) +from haystack.utils import clean_wiki_text, convert_files_to_docs +docs = convert_files_to_docs( + dir_path=doc_dir, + clean_func=clean_wiki_text, + split_paragraphs=True +) ``` +Now let's write these Documents into the DocumentStore. -```python -# Alternative: An in-memory TfidfRetriever based on Pandas dataframes for building quick-prototypes with SQLite document store. -# from haystack.nodes import TfidfRetriever -# retriever = TfidfRetriever(document_store=document_store) +```python +# Write the Document objects into the DocumentStore +document_store.write_documents(docs) ``` -### Reader +While the default code in this tutorial uses Game of Thrones data, you can also supply your own data.
-A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based -on powerful, but slower deep learning models. +## Retriever -Haystack currently supports Readers based on the frameworks FARM and Transformers. -With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models). +Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Since this happens at [query time](), they need to be fast. Here we are using the BM25 algorithm which is considered a [sparse retrieval method](https://haystack.deepset.ai/pipeline_nodes/retriever#deeper-dive-dense-vs-sparse). For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). -**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2) -**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package) +```python +from haystack.nodes import BM25Retriever -**Alternatives (Models):** e.g. "distilbert-base-uncased-distilled-squad" (fast) or "deepset/bert-large-uncased-whole-word-masking-squad2" (good accuracy) +retriever = BM25Retriever(document_store=document_store) +``` -**Hint:** You can adjust the model to return "no answer possible" with the no_ans_boost. Higher values mean the model prefers "no answer possible" +## Reader -#### FARMReader +A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Haystack Readers can load question answering models from [Hugging Face's model hub](https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads). 
Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). ```python from haystack.nodes import FARMReader -# Load a local model or any of the QA models on -# Hugging Face's model hub (https://huggingface.co/models) - reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) ``` -#### TransformersReader +### The Retriever-Reader Pipeline -Alternative: +The Retriever and Reader that we just initialized are [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack, and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time. +It makes sense to join a Retriever and Reader because they complement each other. While the Reader is very effective at picking out answers to questions, it is too slow to process large amounts of text at query time. Running the Retriever first means that only the most promising candidate Documents are passed to the Reader, which reduces its workload. This speed-up can come at a small cost in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization). -```python -from haystack.nodes import TransformersReader -# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1) -``` +In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines] that simplify the initialization of commonly used configurations.
Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline), which combines our Reader and Retriever. -### Pipeline - -With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline. -Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases. -To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions. -You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd). ```python @@ -253,57 +167,45 @@ from haystack.pipelines import ExtractiveQAPipeline pipe = ExtractiveQAPipeline(reader, retriever) ``` -## Voilà! Ask a question! +## Asking Questions and Getting Answers + +Haystack pipelines have a `run()` method that performs a query or, in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument lets you pass arguments to the individual nodes that perform the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top_k` parameters do. ```python -# You can configure how many candidates the Reader and Retriever shall return -# The higher top_k_retriever, the better (but also the slower) your answers. prediction = pipe.run( - query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}} + query="Who is the father of Arya Stark?", + params={ + "Retriever": {"top_k": 10}, + "Reader": {"top_k": 5} + } ) ``` +Here are some questions you could try out: +- Who is the father of Arya Stark? +- Who created the Dothraki vocabulary? +- Who is the sister of Sansa?
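The `params` dictionary gives each node its own `top_k`: the Retriever's value limits how many Documents are passed on, and the Reader's value limits how many answers are returned. The hypothetical two-stage sketch below mimics only that data flow. It uses word overlap instead of BM25 and a neural Reader, returns whole sentences rather than extracted answer spans, and its function names (`toy_retrieve`, `toy_read`, `toy_pipeline_run`) are illustrative, not Haystack APIs.

```python
from typing import Dict, List


def words(text: str) -> set:
    """Lowercased word set with surrounding punctuation removed."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    return len(words(query) & words(text))


def toy_retrieve(query: str, docs: List[str], top_k: int) -> List[str]:
    # Cheap first stage over the whole collection: keep only the top_k documents.
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:top_k]


def toy_read(query: str, docs: List[str], top_k: int) -> List[str]:
    # Finer second stage: score single sentences, but only inside retrieved documents.
    sentences = [s.strip() + "." for d in docs for s in d.split(".") if s.strip()]
    return sorted(sentences, key=lambda s: overlap(query, s), reverse=True)[:top_k]


def toy_pipeline_run(query: str, docs: List[str], params: Dict[str, Dict[str, int]]) -> Dict:
    """Route the query through the retriever, then the reader, each with its own top_k."""
    retrieved = toy_retrieve(query, docs, params["Retriever"]["top_k"])
    answers = toy_read(query, retrieved, params["Reader"]["top_k"])
    return {"query": query, "documents": retrieved, "answers": answers}


sample_docs = [
    "Arya Stark is a daughter of Eddard Stark. She trains with a sword.",
    "Sansa Stark is the elder sister of Arya. She was born in Winterfell.",
    "The Dothraki are a nomadic people. They ride horses.",
]
prediction = toy_pipeline_run(
    "Who is the sister of Sansa?",
    sample_docs,
    params={"Retriever": {"top_k": 2}, "Reader": {"top_k": 1}},
)
print(prediction["answers"])
```

Because the toy reader only ever sees `prediction["documents"]`, lowering the Retriever's `top_k` makes the second stage cheaper but can drop the document that holds the answer; this is the speed and accuracy tradeoff that the `top_k` values control.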
-```python -# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}}) -# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}}) -``` - -Now you can either print the object directly: +The answers returned by the pipeline can be printed out directly. ```python from pprint import pprint pprint(prediction) - -# Sample output: -# { -# 'answers': [ , -# , -# ... -# ] -# 'documents': [ , -# , -# ... -# ], -# 'no_ans_gap': 11.688868522644043, -# 'node_id': 'Reader', -# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}}, -# 'query': 'Who is the father of Arya Stark?', -# 'root_node': 'Query' -# } ``` -Or use a util to simplify the output: +We also provide a utility function that simplifies the output. ```python from haystack.utils import print_answers -# Change `minimum` to `medium` or `all` to raise the level of detail -print_answers(prediction, details="minimum") +print_answers( + prediction, + details="minimum" # Choose from `minimum`, `medium`, or `all` +) ``` ## About us diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index c88a5530..6234b4ed 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -4,62 +4,34 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Build Your First QA System\n", - "\n", - "\n", + "# Build Your First Question Answering System\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb)\n", "\n", - "Question Answering can be used in a variety of use cases. A very common one: Using it to navigate through complex knowledge bases or long documents (\"search setting\").\n", - "\n", - "A \"knowledge base\" could for example be your website, an internal wiki or a collection of financial reports.
\n", - "In this tutorial we will work on a slightly different domain: \"Game of Thrones\". \n", + "Question answering allows you to quickly look into your Document collections to find answers to your questions. You can use it to search through complex knowledge bases, or long documents.\n", "\n", - "Let's see how we can use a bunch of Wikipedia articles to answer a variety of questions about the \n", - "marvellous seven kingdoms.\n" + "A knowledge base could, for example, be your website, an internal wiki or a collection of financial reports. In this tutorial we will work on a set of wiki pages about Game of Thrones. Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n" ] }, { "cell_type": "markdown", - "metadata": { - "collapsed": false - }, "source": [ - "### Prepare environment\n", - "\n", - "#### Colab: Enable the GPU runtime\n", - "Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n", - "**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n", - "\n", - "\n", + "## Prepare Environment\n", "\n", - "You can double check whether the GPU runtime is enabled with the following command:" - ] - }, - { - "cell_type": "code", - "execution_count": null, + "Before running the code in this notebook, you should set up the Colab environment with the following steps:\n", + "- [Enable GPU Runtime in Colab]()\n", + "- [Check if GPU is Enabled]()\n", + "- [Set logging level to INFO]()\n" + ], "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - }, - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "nvidia-smi" - ] + "collapsed": false + } }, { "cell_type": "markdown", "metadata": {}, "source": [ - "To start, install the latest release of Haystack with `pip`:" + "To start, let's install the latest release of Haystack with `pip`:" ] }, { @@ -78,77 +50,22 @@ "pip install
git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]" ] }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Logging\n", - "\n", - "We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n", - "Example log message:\n", - "INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt\n", - "Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "import logging\n", - "\n", - "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", - "logging.getLogger(\"haystack\").setLevel(logging.INFO)" - ] - }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Document Store\n", "\n", - "Haystack finds answers to queries within the documents stored in a `DocumentStore`. 
The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`.\n", - "\n", - "**Here:** We recommended Elasticsearch as it comes preloaded with features like [full-text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html), [BM25 retrieval](https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25), and [vector storage for text embeddings](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dense-vector.html).\n", - "\n", - "**Alternatives:** If you are unable to setup an Elasticsearch instance, then follow the [Tutorial 3](https://github.com/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb) for using SQL/InMemory document stores.\n", - "\n", - "**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.\n", - "\n", - "### Start an Elasticsearch server locally\n", - "You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Recommended: Start Elasticsearch using Docker via the Haystack utility function\n", - "from haystack.utils import launch_es\n", - "\n", - "launch_es()" + "A Haystack question answering system finds answers to questions within the documents stored in a `DocumentStore`. In this tutorial, we are initializing an `ElasticsearchDocumentStore` but there are many other options available. 
To learn which one is right for your use case, and how to initialize it, see [Choosing the Right Document Store](https://haystack.deepset.ai/components/document-store#choosing-the-right-document-store) and [Initialization](https://haystack.deepset.ai/components/document-store#initialisation).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Start an Elasticsearch server in Colab\n", + "### Start an Elasticsearch Server\n", "\n", - "If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source." + "The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. To download, extract and set the permission for the Elasticsearch installation image, run:" ] }, { @@ -168,53 +85,75 @@ "chown -R daemon:daemon elasticsearch-7.9.2" ] }, + { + "cell_type": "markdown", + "source": [ + "To start the server, run:" + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "code", "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, "outputs": [], "source": [ "%%bash --bg\n", "\n", "sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch" - ] + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "### Create the Document Store\n", - "\n", - "The `ElasticsearchDocumentStore` class will try to open a connection in the constructor, here we wait 30 seconds only to be sure Elasticsearch is ready before continuing:" - ] + "Let's wait 30 seconds to make sure the server has fully started up." 
+ ], + "metadata": { + "collapsed": false + } }, { "cell_type": "code", "execution_count": null, - "metadata": {}, "outputs": [], "source": [ "import time\n", "time.sleep(30)" - ] + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "Finally, we create the Document Store instance:" - ] + "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`]() utility function." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Create the DocumentStore\n", + "\n", + "When you initialize the `ElasticsearchDocumentStore`, it opens a connection with the Elasticsearch service." + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { - "name": "#%%\n" + "is_executing": true } }, "outputs": [], @@ -224,89 +163,53 @@ "\n", "# Get the host where Elasticsearch is running, default to localhost\n", "host = os.environ.get(\"ELASTICSEARCH_HOST\", \"localhost\")\n", - "document_store = ElasticsearchDocumentStore(host=host, username=\"\", password=\"\", index=\"document\")" + "\n", + "document_store = ElasticsearchDocumentStore(\n", + " host=host,\n", + " username=\"\",\n", + " password=\"\",\n", + " index=\"document\"\n", + ")" ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ - "## Preprocessing of documents\n", + "## Preparing Documents\n", "\n", - "Haystack provides a customizable pipeline for:\n", - " - converting files into texts\n", - " - cleaning texts\n", - " - splitting texts\n", - " - writing them to a Document Store\n", - "\n", - "In this tutorial, we download Wikipedia articles about Game of Thrones, apply a basic cleaning function, and index them in 
Elasticsearch." + "Let's download 517 articles from the Game of Thrones Wikipedia. They can be found in `data/tutorial1` as a set of `.txt` files." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, "outputs": [], "source": [ - "from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http\n", - "\n", + "from haystack.utils import fetch_archive_from_http\n", "\n", - "# Let's first fetch some documents that we want to query\n", - "# Here: 517 Wikipedia articles for Game of Thrones\n", "doc_dir = \"data/tutorial1\"\n", - "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\"\n", - "fetch_archive_from_http(url=s3_url, output_dir=doc_dir)\n", - "\n", - "# Convert files to dicts\n", - "# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers)\n", - "# It must take a str as input, and return a str.\n", - "docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)\n", "\n", - "# We now have a list of dictionaries that we can write to our document store.\n", - "# If your texts come from a different source (e.g. 
a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself.\n", - "# The default format here is:\n", - "# {\n", - "# 'content': \"\",\n", - "# 'meta': {'name': \"\", ...}\n", - "# }\n", - "# (Optionally: you can also add more key-value-pairs here, that will be indexed as fields in Elasticsearch and\n", - "# can be accessed later for filtering or shown in the responses of the Pipeline)\n", - "\n", - "# Let's have a look at the first 3 entries:\n", - "print(docs[:3])\n", - "\n", - "# Now, let's write the dicts containing documents to our DB.\n", - "document_store.write_documents(docs)" - ] + "fetch_archive_from_http(\n", + " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", + " output_dir=doc_dir\n", + ")" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "is_executing": true + } + } }, { "cell_type": "markdown", - "metadata": {}, "source": [ - "## Initialize Retriever, Reader & Pipeline\n", - "\n", - "### Retriever\n", - "\n", - "Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered.\n", - "They use some simple but fast algorithm.\n", - "\n", - "**Here:** We use Elasticsearch's default BM25 algorithm\n", - "\n", - "**Alternatives:**\n", - "\n", - "- Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters\n", - "- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging\n", - "- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. 
created via Sentence-BERT)\n", - "- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6)" - ] + "The `.txt` files we just downloaded need to be converted into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) before they can be written into the DocumentStore. We will also apply the `clean_wiki_text` cleaning function to the text and split the Wikipedia documents by paragraph breaks." + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "code", @@ -314,98 +217,99 @@ "metadata": {}, "outputs": [], "source": [ - "from haystack.nodes import BM25Retriever\n", - "\n", - "retriever = BM25Retriever(document_store=document_store)" + "from haystack.utils import clean_wiki_text, convert_files_to_docs\n", + "docs = convert_files_to_docs(\n", + " dir_path=doc_dir,\n", + " clean_func=clean_wiki_text,\n", + " split_paragraphs=True\n", + ")" ] }, + { + "cell_type": "markdown", + "source": [ + "Now let's write these Documents into the DocumentStore." + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false, - "name": "#%%\n" - } - }, "outputs": [], "source": [ - "# Alternative: An in-memory TfidfRetriever based on Pandas dataframes for building quick-prototypes with SQLite document store.\n", - "\n", - "# from haystack.nodes import TfidfRetriever\n", - "# retriever = TfidfRetriever(document_store=document_store)" - ] + "# Write the Document objects into the DocumentStore\n", + "document_store.write_documents(docs)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "While the default code in this tutorial uses Game of Thrones data, you can also supply your own data.
So long as your data adheres to the [input format](https://haystack.deepset.ai/components/document-store#input-format) or is cast into a [Document object](https://haystack.deepset.ai/components/documents-answers-labels#document), it can be written into the DocumentStore." + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Reader\n", - "\n", - "A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based\n", - "on powerful, but slower deep learning models.\n", - "\n", - "Haystack currently supports Readers based on the frameworks FARM and Transformers.\n", - "With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models).\n", - "\n", - "**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2)\n", - "\n", - "**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package)\n", + "## Retriever\n", "\n", - "**Alternatives (Models):** e.g. \"distilbert-base-uncased-distilled-squad\" (fast) or \"deepset/bert-large-uncased-whole-word-masking-squad2\" (good accuracy)\n", - "\n", - "**Hint:** You can adjust the model to return \"no answer possible\" with the no_ans_boost. Higher values mean the model prefers \"no answer possible\"\n", - "\n", - "#### FARMReader" + "Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Since this happens at [query time](), they need to be fast. Here we are using the BM25 algorithm which is considered a [sparse retrieval method](https://haystack.deepset.ai/pipeline_nodes/retriever#deeper-dive-dense-vs-sparse). For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." 
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, + "metadata": {}, "outputs": [], "source": [ - "from haystack.nodes import FARMReader\n", - "\n", - "# Load a local model or any of the QA models on\n", - "# Hugging Face's model hub (https://huggingface.co/models)\n", + "from haystack.nodes import BM25Retriever\n", "\n", - "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" + "retriever = BM25Retriever(document_store=document_store)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### TransformersReader\n", + "## Reader\n", "\n", - "Alternative:" + "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Haystack Readers can load question answering models from [Hugging Face's model hub](https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads). Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." 
] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "pycharm": { + "is_executing": false + } + }, "outputs": [], "source": [ - "from haystack.nodes import TransformersReader\n", - "# reader = TransformersReader(model_name_or_path=\"distilbert-base-uncased-distilled-squad\", tokenizer=\"distilbert-base-uncased\", use_gpu=-1)" + "from haystack.nodes import FARMReader\n", + "\n", + "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Pipeline\n", + "### The Retriever-Reader Pipeline\n", + "\n", + "The Retriever and Reader that we just initialized are considered [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time.\n", + "\n", + "It makes sense to join a Retriever and Reader because they are a complementary pairing. While the Reader is very effective at picking out answers to questions, it is not fast enough to perform this on large amounts of text at query time. By performing retrieval first, only the most promising candidate Documents are passed to the Reader, thus reducing its workload. This improvement in speed can come with a small tradeoff in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization).\n", "\n", - "With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline.\n", - "Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.\n", - "To speed things up, Haystack also comes with a few predefined Pipelines. 
One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions.\n", - "You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd)." + "In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines] that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever.\n" ] }, { @@ -427,9 +331,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Voilà! Ask a question!" + "## Asking Questions and Getting Answers" ] }, + { + "cell_type": "markdown", + "source": [ + "Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing." 
+ ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "code", "execution_count": null, @@ -440,28 +353,32 @@ }, "outputs": [], "source": [ - "# You can configure how many candidates the Reader and Retriever shall return\n", - "# The higher top_k_retriever, the better (but also the slower) your answers.\n", "prediction = pipe.run(\n", - " query=\"Who is the father of Arya Stark?\", params={\"Retriever\": {\"top_k\": 10}, \"Reader\": {\"top_k\": 5}}\n", + " query=\"Who is the father of Arya Stark?\",\n", + " params={\n", + " \"Retriever\": {\"top_k\": 10},\n", + " \"Reader\": {\"top_k\": 5}\n", + " }\n", ")" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", "source": [ - "# prediction = pipe.run(query=\"Who created the Dothraki vocabulary?\", params={\"Reader\": {\"top_k\": 5}})\n", - "# prediction = pipe.run(query=\"Who is the sister of Sansa?\", params={\"Reader\": {\"top_k\": 5}})" - ] + "Here are some questions you could try out:\n", + "- Who is the father of Arya Stark?\n", + "- Who created the Dothraki vocabulary?\n", + "- Who is the sister of Sansa?" + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now you can either print the object directly:" + "The answers returned by the pipeline can be printed out directly." 
] }, { @@ -472,31 +389,14 @@ "source": [ "from pprint import pprint\n", "\n", - "pprint(prediction)\n", - "\n", - "# Sample output:\n", - "# {\n", - "# 'answers': [ ,\n", - "# ,\n", - "# ...\n", - "# ]\n", - "# 'documents': [ ,\n", - "# ,\n", - "# ...\n", - "# ],\n", - "# 'no_ans_gap': 11.688868522644043,\n", - "# 'node_id': 'Reader',\n", - "# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}},\n", - "# 'query': 'Who is the father of Arya Stark?',\n", - "# 'root_node': 'Query'\n", - "# }" + "pprint(prediction)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Or use a util to simplify the output:" + "We also provide a utility functions to simplify the output." ] }, { @@ -504,25 +404,23 @@ "execution_count": null, "metadata": { "pycharm": { - "is_executing": false, - "name": "#%%\n" + "is_executing": false } }, "outputs": [], "source": [ "from haystack.utils import print_answers\n", "\n", - "# Change `minimum` to `medium` or `all` to raise the level of detail\n", - "print_answers(prediction, details=\"minimum\")" + "print_answers(\n", + " prediction,\n", + " details=\"minimum\" ## Choose from `minimum`, `medium` and `all`\n", + ")" ] }, { "cell_type": "markdown", "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } + "collapsed": false }, "source": [ "## About us\n", From e6a5693e51d97a3322a25da9b97a762a9f014aa6 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 11 Oct 2022 17:40:14 +0200 Subject: [PATCH 02/48] Add title --- markdowns/1.md | 2 ++ tutorials/01_Basic_QA_Pipeline.ipynb | 2 ++ 2 files changed, 4 insertions(+) diff --git a/markdowns/1.md b/markdowns/1.md index 0320173b..71284bf4 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -24,6 +24,8 @@ Before running the code in this notebook, you should set up the Colab environmen - [Set logging level to INFO]() +## Haystack Installation + To start, let's install the latest release of Haystack with `pip`: diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb 
b/tutorials/01_Basic_QA_Pipeline.ipynb index 6234b4ed..851ed47c 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -31,6 +31,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "## Haystack Installation\n", + "\n", "To start, let's install the latest release of Haystack with `pip`:" ] }, From ac87643aec2be8a39441719289e8e1ec3fc9251f Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 11 Oct 2022 17:42:53 +0200 Subject: [PATCH 03/48] Add link --- markdowns/1.md | 2 +- tutorials/01_Basic_QA_Pipeline.ipynb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/markdowns/1.md b/markdowns/1.md index 71284bf4..b4e728c7 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -159,7 +159,7 @@ The Retriever and Reader that we just initialized are considered [nodes](https:/ It makes sense to join a Retriever and Reader because they are a complementary pairing. While the Reader is very effective at picking out answers to questions, it is not fast enough to perform this on large amounts of text at query time. By performing retrieval first, only the most promising candidate Documents are passed to the Reader, thus reducing its workload. This improvement in speed can come with a small tradeoff in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization). -In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines] that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever. +In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. 
However, there are also [Ready-Made Pipelines](https://haystack.deepset.ai/components/ready-made-pipelines) that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever. diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index 851ed47c..efc8adb2 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -311,7 +311,7 @@ "\n", "It makes sense to join a Retriever and Reader because they are a complementary pairing. While the Reader is very effective at picking out answers to questions, it is not fast enough to perform this on large amounts of text at query time. By performing retrieval first, only the most promising candidate Documents are passed to the Reader, thus reducing its workload. This improvement in speed can come with a small tradeoff in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization).\n", "\n", - "In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines] that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever.\n" + "In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines](https://haystack.deepset.ai/components/ready-made-pipelines) that simplify the initialization of commonly used configurations. 
Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever.\n" ] }, { From a2c3f36d7f7869f0198e5b3c7ea752d832cc28ba Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 11 Oct 2022 17:47:03 +0200 Subject: [PATCH 04/48] Fix titles --- markdowns/1.md | 3 ++- tutorials/01_Basic_QA_Pipeline.ipynb | 15 ++++----------- 2 files changed, 6 insertions(+), 12 deletions(-) diff --git a/markdowns/1.md b/markdowns/1.md index b4e728c7..314f59fc 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -153,7 +153,7 @@ from haystack.nodes import FARMReader reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) ``` -### The Retriever-Reader Pipeline +## The Retriever-Reader Pipeline The Retriever and Reader that we just initialized are considered [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time. @@ -174,6 +174,7 @@ pipe = ExtractiveQAPipeline(reader, retriever) Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing. 
+ ```python prediction = pipe.run( query="Who is the father of Arya Stark?", diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index efc8adb2..52e5017f 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -305,7 +305,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### The Retriever-Reader Pipeline\n", + "## The Retriever-Reader Pipeline\n", "\n", "The Retriever and Reader that we just initialized are considered [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time.\n", "\n", @@ -333,18 +333,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Asking Questions and Getting Answers" + "## Asking Questions and Getting Answers\n", + "\n", + "Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing.\n" ] }, - { - "cell_type": "markdown", - "source": [ - "Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing." 
- ], - "metadata": { - "collapsed": false - } - }, { "cell_type": "code", "execution_count": null, From 9d128a35b2adc1f17167ba12e4a1541e2b85af16 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 11 Oct 2022 17:54:49 +0200 Subject: [PATCH 05/48] Add final message --- markdowns/1.md | 2 ++ tutorials/01_Basic_QA_Pipeline.ipynb | 9 +++++++++ 2 files changed, 11 insertions(+) diff --git a/markdowns/1.md b/markdowns/1.md index 314f59fc..aa2b30de 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -211,6 +211,8 @@ print_answers( ) ``` +And there you have it! Congratulations on building your first machine learning based question answering system! + ## About us This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index 52e5017f..b5928616 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -412,6 +412,15 @@ ")" ] }, + { + "cell_type": "markdown", + "source": [ + "And there you have it! Congratulations on building your first machine learning based question answering system!" 
+ ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "metadata": { From acad49e614f4e1b39c3ef45a4894ec3dc37b28c1 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 13 Oct 2022 14:19:57 +0200 Subject: [PATCH 06/48] Fix some links --- tutorials/01_Basic_QA_Pipeline.ipynb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index b5928616..9841b302 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -19,9 +19,9 @@ "## Prepare Environment\n", "\n", "Before running the code in this notebook, you should set up the Colab environment with the following steps:\n", - "- [Enable GPU Runtime in GPU]()\n", - "- [Check if GPU is Enabled]()\n", - "- [Set logging level to INFO]()\n" + "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" ], "metadata": { "collapsed": false @@ -133,7 +133,7 @@ { "cell_type": "markdown", "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`]() utility function." + "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`](https://haystack.deepset.ai/reference/utils) utility function." 
], "metadata": { "collapsed": false @@ -251,7 +251,7 @@ { "cell_type": "markdown", "source": [ - "While the default code in this tutorial uses Game of Thrones data, you can also supply your own data. So long as your data adheres to the [input format](https://haystack.deepset.ai/components/document-store#input-format) or is cast into a [Document object](https://haystack.deepset.ai/components/documents-answers-labels#document), it can be written into the DocumentStore." + "While the default code in this tutorial uses Game of Thrones data, you can also supply your own. So long as your data adheres to the [input format](https://haystack.deepset.ai/components/document-store#input-format) or is cast into a [Document object](https://haystack.deepset.ai/components/documents-answers-labels#document), it can be written into the DocumentStore." ], "metadata": { "collapsed": false From 04470d7c43e1b76a748c6dfa401001dff276dd3e Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 13 Oct 2022 17:11:46 +0200 Subject: [PATCH 07/48] Incorporate reviewer feedback --- tutorials/01_Basic_QA_Pipeline.ipynb | 57 ++++++++++++---------------- 1 file changed, 25 insertions(+), 32 deletions(-) diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index 9841b302..897a7c93 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -6,19 +6,21 @@ "source": [ "# Build Your First Question Answering System\n", "\n", - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb)\n", + "- Level: Beginner\n", + "- Time to complete: 20 minutes\n", + "- Prerequisites: Prepare the Colab environment. 
See links below.\n", + "- Nodes Used: `ElasticsearchDocumentStore`, `BM25Retriever`\n", + "- Goal: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series. The Python code in this notebook can be used on any operating system.\n", "\n", - "Question answering allows you to quickly look into your Document collections to find answers to your questions. You can use it to search through complex knowledge bases, or long documents.\n", - "\n", - "A knowledge base could, for example, be your website, an internal wiki or a collection of financial reports. In this tutorial we will work on a set of wiki pages about Game of Thrones. Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n" + "This tutorial teaches you how to set up a question answering system that can search through complex knowledge bases, such as an internal wiki or a collection of financial reports. We will work on a set of Wikipedia pages about Game of Thrones. 
Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n" ] }, { "cell_type": "markdown", "source": [ - "## Prepare Environment\n", "\n", - "Before running the code in this notebook, you should set up the Colab environment with the following steps:\n", + "## Preparing the Colab Environment\n", + "\n", "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" @@ -31,7 +33,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Haystack Installation\n", + "## Install Haystack\n", "\n", "To start, let's install the latest release of Haystack with `pip`:" ] @@ -56,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Document Store\n", + "## Initialize DocumentStore\n", "\n", "A Haystack question answering system finds answers to questions within the documents stored in a `DocumentStore`. In this tutorial, we are initializing an `ElasticsearchDocumentStore` but there are many other options available. To learn which one is right for your use case, and how to initialize it, see [Choosing the Right Document Store](https://haystack.deepset.ai/components/document-store#choosing-the-right-document-store) and [Initialization](https://haystack.deepset.ai/components/document-store#initialisation).\n" ] @@ -65,9 +67,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Start an Elasticsearch Server\n", + "1. Download the Elasticsearch Image\n", "\n", - "The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. To download, extract and set the permission for the Elasticsearch installation image, run:" + "The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. 
Download, extract and set the permission for the Elasticsearch installation image:" ] }, { @@ -90,7 +92,7 @@ { "cell_type": "markdown", "source": [ - "To start the server, run:" + "2. Start the Elasticsearch Server" ], "metadata": { "collapsed": false @@ -109,15 +111,6 @@ "collapsed": false } }, - { - "cell_type": "markdown", - "source": [ - "Let's wait 30 seconds to make sure the server has fully started up." - ], - "metadata": { - "collapsed": false - } - }, { "cell_type": "code", "execution_count": null, @@ -133,7 +126,7 @@ { "cell_type": "markdown", "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`](https://haystack.deepset.ai/reference/utils) utility function." + "If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`](https://haystack.deepset.ai/reference/utils) utility function." ], "metadata": { "collapsed": false @@ -142,9 +135,9 @@ { "cell_type": "markdown", "source": [ - "### Create the DocumentStore\n", + "3. Create the DocumentStore\n", "\n", - "When you initialize the `ElasticsearchDocumentStore`, it opens a connection with the Elasticsearch service." + "When you initialize the `ElasticsearchDocumentStore` in Haystack, it opens a connection with the Elasticsearch service that we started in the previous step." ], "metadata": { "collapsed": false @@ -178,9 +171,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Preparing Documents\n", + "## Prepare Documents\n", "\n", - "Let's download 517 articles from the Game of Thrones Wikipedia. They can be found in `data/tutorial1` as a set of `.txt` files." + "1. Download 517 articles from the Game of Thrones Wikipedia. 
You can find them in `data/tutorial1` as a set of `.txt` files." ] }, { @@ -207,7 +200,7 @@ { "cell_type": "markdown", "source": [ - "The `.txt` files we just downloaded need to be converted into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) before they can be written into the DocumentStore. We will also apply the `clean_wiki_text` cleaning function to the text and split the Wikipedia documents by paragraph breaks." + "2. Convert the files you just downloaded into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text." ], "metadata": { "collapsed": false @@ -230,7 +223,7 @@ { "cell_type": "markdown", "source": [ - "Now let's write these Documents into the DocumentStore." + "3. Write these Documents into the DocumentStore." ], "metadata": { "collapsed": false @@ -261,7 +254,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Retriever\n", + "## Initialize the Retriever\n", "\n", "Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Since this happens at [query time](), they need to be fast. Here we are using the BM25 algorithm which is considered a [sparse retrieval method](https://haystack.deepset.ai/pipeline_nodes/retriever#deeper-dive-dense-vs-sparse). For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." ] @@ -333,9 +326,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Asking Questions and Getting Answers\n", + "## Ask a Question\n", "\n", - "Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. 
See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing.\n" + "1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://haystack.deepset.ai/components/pipelines#arguments). To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://haystack.deepset.ai/guides/optimization#choosing-the-right-top-k-values).\n" ] }, { @@ -373,7 +366,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The answers returned by the pipeline can be printed out directly." + "2. The answers returned by the pipeline can be printed out directly:" ] }, { @@ -391,7 +384,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We also provide a utility functions to simplify the output." + "3. Simplify the printed answers:" ] }, { From d1a0524b2921575d567d17b3929fa94ac575647f Mon Sep 17 00:00:00 2001 From: brandenchan Date: Fri, 14 Oct 2022 12:05:45 +0200 Subject: [PATCH 08/48] Incorporate reviewer feedback --- tutorials/01_Basic_QA_Pipeline.ipynb | 54 ++++++++++++---------------- 1 file changed, 23 insertions(+), 31 deletions(-) diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb index 897a7c93..6f0f81ed 100644 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ b/tutorials/01_Basic_QA_Pipeline.ipynb @@ -6,11 +6,11 @@ "source": [ "# Build Your First Question Answering System\n", "\n", - "- Level: Beginner\n", - "- Time to complete: 20 minutes\n", - "- Prerequisites: Prepare the Colab environment. 
See links below.\n", - "- Nodes Used: `ElasticsearchDocumentStore`, `BM25Retriever`\n", - "- Goal: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series. The Python code in this notebook can be used on any operating system.\n", + "- **Level**: Beginner\n", + "- **Time to complete**: 20 minutes\n", + "- **Prerequisites**: Prepare the Colab environment. See links below.\n", + "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`\n", + "- **Goal**: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series.\n", "\n", "This tutorial teaches you how to set up a question answering system that can search through complex knowledge bases, such as an internal wiki or a collection of financial reports. We will work on a set of Wikipedia pages about Game of Thrones. Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n" ] @@ -33,7 +33,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Install Haystack\n", + "## Installing Haystack\n", "\n", "To start, let's install the latest release of Haystack with `pip`:" ] @@ -58,18 +58,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Initialize DocumentStore\n", + "## Initializing the DocumentStore\n", "\n", - "A Haystack question answering system finds answers to questions within the documents stored in a `DocumentStore`. In this tutorial, we are initializing an `ElasticsearchDocumentStore` but there are many other options available. 
To learn which one is right for your use case, and how to initialize it, see [Choosing the Right Document Store](https://haystack.deepset.ai/components/document-store#choosing-the-right-document-store) and [Initialization](https://haystack.deepset.ai/components/document-store#initialisation).\n" + "A DocumentStore stores the documents that the question answering system uses to find answers to your questions. To learn more, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "1. Download the Elasticsearch Image\n", - "\n", - "The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. Download, extract and set the permission for the Elasticsearch installation image:" + "1. Download, extract and set the permission for the Elasticsearch image:" ] }, { @@ -92,7 +90,7 @@ { "cell_type": "markdown", "source": [ - "2. Start the Elasticsearch Server" + "2. Start the Elasticsearch Server:" ], "metadata": { "collapsed": false @@ -126,7 +124,7 @@ { "cell_type": "markdown", "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`](https://haystack.deepset.ai/reference/utils) utility function." + "If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://docs.haystack.deepset.ai/docs/document_store#initialisation), or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api) utility function." ], "metadata": { "collapsed": false @@ -135,9 +133,7 @@ { "cell_type": "markdown", "source": [ - "3. 
Create the DocumentStore\n", - "\n", - "When you initialize the `ElasticsearchDocumentStore` in Haystack, it opens a connection with the Elasticsearch service that we started in the previous step." + "3. Initialize the `ElasticsearchDocumentStore` object in Haystack. Note that this will only successfully run if the Elasticsearch Server is fully started up and ready." ], "metadata": { "collapsed": false @@ -171,7 +167,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Prepare Documents\n", + "## Preparing Documents\n", "\n", "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." ] @@ -200,7 +196,7 @@ { "cell_type": "markdown", "source": [ - "2. Convert the files you just downloaded into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text." + "2. Convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text." ], "metadata": { "collapsed": false @@ -244,7 +240,7 @@ { "cell_type": "markdown", "source": [ - "While the default code in this tutorial uses Game of Thrones data, you can also supply your own. So long as your data adheres to the [input format](https://haystack.deepset.ai/components/document-store#input-format) or is cast into a [Document object](https://haystack.deepset.ai/components/documents-answers-labels#document), it can be written into the DocumentStore." + "While the default code in this tutorial uses Game of Thrones data, you can also supply your own. 
So long as your data adheres to the [input format](https://docs.haystack.deepset.ai/docs/document_store#input-format) or is cast into a [Document object](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document), it can be written into the DocumentStore." ], "metadata": { "collapsed": false @@ -254,9 +250,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Initialize the Retriever\n", + "## Initializing the Retriever\n", "\n", - "Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Since this happens at [query time](), they need to be fast. Here we are using the BM25 algorithm which is considered a [sparse retrieval method](https://haystack.deepset.ai/pipeline_nodes/retriever#deeper-dive-dense-vs-sparse). For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "Initialize the `BM25Retriever`. For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever)." ] }, { @@ -274,9 +270,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Reader\n", + "## Initializing the Reader\n", "\n", - "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Haystack Readers can load question answering models from [Hugging Face's model hub](https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads). Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + "Initialize the `FARMReader` with the `deepset/roberta-base-squad2` model.
For more Reader options, see [Reader](https://docs.haystack.deepset.ai/docs/reader)." ] }, { @@ -298,13 +294,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## The Retriever-Reader Pipeline\n", - "\n", - "The Retriever and Reader that we just initialized are considered [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time.\n", - "\n", - "It makes sense to join a Retriever and Reader because they are a complementary pairing. While the Reader is very effective at picking out answers to questions, it is not fast enough to perform this on large amounts of text at query time. By performing retrieval first, only the most promising candidate Documents are passed to the Reader, thus reducing its workload. This improvement in speed can come with a small tradeoff in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization).\n", + "## Creating the Retriever-Reader Pipeline\n", "\n", - "In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines](https://haystack.deepset.ai/components/ready-made-pipelines) that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever.\n" + "The `ExtractiveQAPipeline` connects the Reader and Retriever. This makes the system fast because the Reader only processes the Documents that the Retriever has passed on." 
] }, { @@ -326,9 +318,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Ask a Question\n", + "## Asking a Question\n", "\n", - "1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://haystack.deepset.ai/components/pipelines#arguments). To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://haystack.deepset.ai/guides/optimization#choosing-the-right-top-k-values).\n" + "1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set the number of Documents the Retriever returns and the number of answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments).
 To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values).\n" ] }, { From b558a66ab50428576e29c325d083d8aef5a37773 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Fri, 14 Oct 2022 12:06:18 +0200 Subject: [PATCH 09/48] Regenerate markdown --- markdowns/1.md | 72 ++++++++++++++++++++++---------------------------- 1 file changed, 31 insertions(+), 41 deletions(-) diff --git a/markdowns/1.md b/markdowns/1.md index aa2b30de..9eee5148 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -9,22 +9,24 @@ id: "tutorial1md" # Build Your First Question Answering System -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb) +- **Level**: Beginner +- **Time to complete**: 20 minutes +- **Prerequisites**: Prepare the Colab environment. See links below. +- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever` +- **Goal**: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series. -Question answering allows you to quickly look into your Document collections to find answers to your questions. You can use it to search through complex knowledge bases, or long documents. +This tutorial teaches you how to set up a question answering system that can search through complex knowledge bases, such as an internal wiki or a collection of financial reports. We will work on a set of Wikipedia pages about Game of Thrones. Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! -A knowledge base could, for example, be your website, an internal wiki or a collection of financial reports. In this tutorial we will work on a set of wiki pages about Game of Thrones.
Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! -## Prepare Environment +## Preparing the Colab Environment -Before running the code in this notebook, you should set up the Colab environment with the following steps: -- [Enable GPU Runtime in GPU]() -- [Check if GPU is Enabled]() -- [Set logging level to INFO]() +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level) -## Haystack Installation +## Installing Haystack To start, let's install the latest release of Haystack with `pip`: @@ -36,14 +38,11 @@ pip install --upgrade pip pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] ``` -## Document Store +## Initializing the DocumentStore -A Haystack question answering system finds answers to questions within the documents stored in a `DocumentStore`. In this tutorial, we are initializing an `ElasticsearchDocumentStore` but there are many other options available. To learn which one is right for your use case, and how to initialize it, see [Choosing the Right Document Store](https://haystack.deepset.ai/components/document-store#choosing-the-right-document-store) and [Initialization](https://haystack.deepset.ai/components/document-store#initialisation). +A DocumentStore stores the documents that the question answering system uses to find answers to your questions. To learn more, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). - -### Start an Elasticsearch Server - -The `ElasticsearchDocumentStore` needs to attach to a running Elasticsearch server. To download, extract and set the permission for the Elasticsearch installation image, run: +1.
Download, extract and set the permission for the Elasticsearch image: ```bash @@ -54,7 +53,7 @@ tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz chown -R daemon:daemon elasticsearch-7.9.2 ``` -To start the server, run: +2. Start the Elasticsearch Server: ```bash @@ -63,19 +62,15 @@ To start the server, run: sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch ``` -Let's wait 30 seconds to make sure the server has fully started up. - ```python import time time.sleep(30) ``` -If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done [manually](https://haystack.deepset.ai/components/document-store#initialisation), or using our [`launch_es()`]() utility function. - -### Create the DocumentStore +If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://docs.haystack.deepset.ai/docs/document_store#initialisation), or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api) utility function. -When you initialize the `ElasticsearchDocumentStore`, it opens a connection with the Elasticsearch service. +3. Initialize the `ElasticsearchDocumentStore` object in Haystack. Note that this will only successfully run if the Elasticsearch Server is fully started up and ready. ```python @@ -95,7 +90,7 @@ document_store = ElasticsearchDocumentStore( ## Preparing Documents -Let's download 517 articles from the Game of Thrones Wikipedia. They can be found in `data/tutorial1` as a set of `.txt` files. +1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files. ```python @@ -109,7 +104,7 @@ fetch_archive_from_http( ) ``` -The `.txt` files we just downloaded need to be converted into Haystack [Document objects](https://haystack.deepset.ai/components/documents-answers-labels#document) before they can be written into the DocumentStore. 
We will also apply the `clean_wiki_text` cleaning function to the text and split the Wikipedia documents by paragraph breaks. +2. Convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text. ```python @@ -121,7 +116,7 @@ docs = convert_files_to_docs( ) ``` -Now let's write these Documents into the DocumentStore. +3. Write these Documents into the DocumentStore. ```python @@ -129,11 +124,11 @@ Now let's write these Documents into the DocumentStore. document_store.write_documents(docs) ``` -While the default code in this tutorial uses Game of Thrones data, you can also supply your own data. So long as your data adheres to the [input format](https://haystack.deepset.ai/components/document-store#input-format) or is cast into a [Document object](https://haystack.deepset.ai/components/documents-answers-labels#document), it can be written into the DocumentStore. +While the default code in this tutorial uses Game of Thrones data, you can also supply your own. So long as your data adheres to the [input format](https://docs.haystack.deepset.ai/docs/document_store#input-format) or is cast into a [Document object](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document), it can be written into the DocumentStore. -## Retriever +## Initializing the Retriever -Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Since this happens at [query time](), they need to be fast. Here we are using the BM25 algorithm which is considered a [sparse retrieval method](https://haystack.deepset.ai/pipeline_nodes/retriever#deeper-dive-dense-vs-sparse). For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). +Initialize the `BM25Retriever`. 
For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever). ```python @@ -142,9 +137,9 @@ from haystack.nodes import BM25Retriever retriever = BM25Retriever(document_store=document_store) ``` -## Reader +## Initializing the Reader -A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Haystack Readers can load question answering models from [Hugging Face's model hub](https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads). Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). +Initialize the `FARMReader` with the `deepset/roberta-base-squad2` model. For more Reader options, see [Reader](https://docs.haystack.deepset.ai/docs/reader). ```python @@ -153,14 +148,9 @@ from haystack.nodes import FARMReader reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) ``` -## The Retriever-Reader Pipeline - -The Retriever and Reader that we just initialized are considered [nodes](https://haystack.deepset.ai/pipeline_nodes/overview) in Haystack and they are connected using a [`pipeline`](https://haystack.deepset.ai/components/pipelines). Pipelines are customizable, giving you the power to define how data is routed through the nodes at both indexing and querying time. - -It makes sense to join a Retriever and Reader because they are a complementary pairing. While the Reader is very effective at picking out answers to questions, it is not fast enough to perform this on large amounts of text at query time.
By performing retrieval first, only the most promising candidate Documents are passed to the Reader, thus reducing its workload. This improvement in speed can come with a small tradeoff in accuracy. To learn how to optimize the performance of your question answering system, have a look at [Optimization](https://haystack.deepset.ai/guides/optimization). - -In Haystack, there is a `Pipeline` class that allows you to define your own pipeline configuration. However, there are also [Ready-Made Pipelines](https://haystack.deepset.ai/components/ready-made-pipelines) that simplify the initialization of commonly used configurations. Here, we are using the [`ExtractiveQAPipeline`](https://haystack.deepset.ai/components/ready-made-pipelines#extractiveqapipeline) that combines our Reader and Retriever. +## Creating the Retriever-Reader Pipeline +The `ExtractiveQAPipeline` connects the Reader and Retriever. This makes the system fast because the Reader only processes the Documents that the Retriever has passed on. ```python @@ -169,9 +159,9 @@ from haystack.pipelines import ExtractiveQAPipeline pipe = ExtractiveQAPipeline(reader, retriever) ``` -## Asking Questions and Getting Answers +## Asking a Question -Haystack pipelines have a `run()` method that performs a query, or in the case of an `ExtractiveQAPipeline`, answers a question. The `params` argument allows you to provide arguments to the nodes performing the query. See [Arguments](https://haystack.deepset.ai/components/pipelines#arguments) to learn how to populate this field, and [Choosing the Right top-k Values] to understand what the `top-k` parameters are doing. +1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set the number of Documents the Retriever returns and the number of answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments).
 To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). @@ -190,7 +180,7 @@ Here are some questions you could try out: - Who created the Dothraki vocabulary? - Who is the sister of Sansa? -The answers returned by the pipeline can be printed out directly. +2. The answers returned by the pipeline can be printed out directly: ```python @@ -199,7 +189,7 @@ from pprint import pprint pprint(prediction) ``` -We also provide a utility functions to simplify the output. +3. Simplify the printed answers: ```python From b99af0792db49d27f3bf1967a0ea1f320f84603a Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 25 Oct 2022 16:38:27 +0100 Subject: [PATCH 10/48] Create second tutorial --- new_tutorials/01_simplified_qa_pipeline.ipynb | 8257 ++++++++++++++++ new_tutorials/02_qa_pipeline.ipynb | 8467 +++++++++++++++++ 2 files changed, 16724 insertions(+) create mode 100644 new_tutorials/01_simplified_qa_pipeline.ipynb create mode 100644 new_tutorials/02_qa_pipeline.ipynb diff --git a/new_tutorials/01_simplified_qa_pipeline.ipynb b/new_tutorials/01_simplified_qa_pipeline.ipynb new file mode 100644 index 00000000..520054a4 --- /dev/null +++ b/new_tutorials/01_simplified_qa_pipeline.ipynb @@ -0,0 +1,8257 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "# Build Your First Question Answering System\n", + "\n", + "- **Level**: Beginner\n", + "- **Time to complete**: 20 minutes\n", + "- **Prerequisites**: Prepare the Colab environment (see links below).\n", + "- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader`\n", + "- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series.\n", + "\n", + "Learn how to set up a question answering
system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", + "\n", + "This tutorial introduces all the concepts you need to build such a question answering system, but simplifies certain setup steps, such as Document preparation, indexing, and pipeline initialization, so that you can get started quicker.\n", + "\n", + "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "\n", + "## Preparing the Colab Environment\n", + "\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Installing Haystack\n", + "\n", + "To start, let's install the latest release of Haystack with `pip`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: pip in /Users/deepset/anaconda3/lib/python3.8/site-packages (22.3)\n", + "Requirement already satisfied: farm-haystack[colab] in /Users/deepset/Code/haystack (1.6.1rc0)\n", + "Requirement already satisfied: torch<1.13,>1.9 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from
farm-haystack[colab]) (1.11.0)\n", + "Requirement already satisfied: requests in /Users/deepset/.local/lib/python3.8/site-packages (from farm-haystack[colab]) (2.28.1)\n", + "Requirement already satisfied: pydantic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.6.1)\n", + "Requirement already satisfied: transformers==4.20.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.20.1)\n", + "Requirement already satisfied: nltk in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.5)\n", + "Requirement already satisfied: pandas in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.5)\n", + "Requirement already satisfied: dill in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.3.2)\n", + "Requirement already satisfied: tqdm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.47.0)\n", + "Requirement already satisfied: networkx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.4)\n", + "Requirement already satisfied: mmh3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.0.0)\n", + "Requirement already satisfied: quantulum3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.10)\n", + "Requirement already satisfied: posthog in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.4.5)\n", + "Requirement already satisfied: azure-ai-formrecognizer==3.2.0b2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0b2)\n", + "Requirement already satisfied: azure-core<1.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.22.1)\n", + "Requirement already satisfied: huggingface-hub<0.8.0,>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
farm-haystack[colab]) (0.7.0)\n", + "Requirement already satisfied: more_itertools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (8.4.0)\n", + "Requirement already satisfied: python-docx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.8.10)\n", + "Requirement already satisfied: langdetect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.8)\n", + "Requirement already satisfied: tika in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.24)\n", + "Requirement already satisfied: sentence-transformers>=2.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.2.0)\n", + "Requirement already satisfied: scipy>=1.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.5.0)\n", + "Requirement already satisfied: scikit-learn>=1.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.2)\n", + "Requirement already satisfied: seqeval in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.0.12)\n", + "Requirement already satisfied: mlflow in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.0)\n", + "Requirement already satisfied: elasticsearch<7.11,>=7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (7.8.1)\n", + "Requirement already satisfied: elastic-apm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (5.8.1)\n", + "Requirement already satisfied: rapidfuzz<3,>=2.0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.5.0)\n", + "Requirement already satisfied: jsonschema in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0)\n", + "Requirement already satisfied: grpcio==1.43.0 in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.43.0)\n", + "Requirement already satisfied: msrest>=0.6.21 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.21)\n", + "Requirement already satisfied: six>=1.11.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.15.0)\n", + "Requirement already satisfied: azure-common~=1.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.1.28)\n", + "Requirement already satisfied: filelock in /Users/deepset/.local/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (3.8.0)\n", + "Requirement already satisfied: numpy>=1.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (1.18.5)\n", + "Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (0.12.1)\n", + "Requirement already satisfied: packaging>=20.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (21.3)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (5.3.1)\n", + "Requirement already satisfied: regex!=2019.12.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (2020.6.8)\n", + "Requirement already satisfied: urllib3>=1.21.1 in /Users/deepset/.local/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (1.26.11)\n", + "Requirement already satisfied: certifi in /Users/deepset/anaconda3/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (2020.6.20)\n", + 
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from huggingface-hub<0.8.0,>=0.5.0->farm-haystack[colab]) (4.1.1)\n", + "Requirement already satisfied: jarowinkler<2.0.0,>=1.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from rapidfuzz<3,>=2.0.15->farm-haystack[colab]) (1.2.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (3.3)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (2.1.0)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (2.1.0)\n", + "Requirement already satisfied: joblib>=0.11 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (0.16.0)\n", + "Requirement already satisfied: sentencepiece in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.1.91)\n", + "Requirement already satisfied: torchvision in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.12.0)\n", + "Requirement already satisfied: setuptools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (49.2.0.post20200714)\n", + "Requirement already satisfied: pyrsistent>=0.14.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (0.16.0)\n", + "Requirement already satisfied: attrs>=17.4.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (19.3.0)\n", + "Requirement already satisfied: python-dateutil in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (2.8.1)\n", 
+ "Requirement already satisfied: databricks-cli>=0.8.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.11.0)\n", + "Requirement already satisfied: sqlalchemy in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.3.18)\n", + "Requirement already satisfied: cloudpickle in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.5.0)\n", + "Requirement already satisfied: docker>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (4.3.0)\n", + "Requirement already satisfied: Flask in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.1.2)\n", + "Requirement already satisfied: entrypoints in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3)\n", + "Requirement already satisfied: protobuf>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.12.4)\n", + "Requirement already satisfied: querystring-parser in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.2.4)\n", + "Requirement already satisfied: gitpython>=2.1.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.1.7)\n", + "Requirement already satisfied: sqlparse in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3.1)\n", + "Requirement already satisfied: alembic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.4.2)\n", + "Requirement already satisfied: click>=7.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (7.1.2)\n", + "Requirement already satisfied: simplejson in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.17.2)\n", + "Requirement already satisfied: 
gunicorn in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (20.0.4)\n", + "Requirement already satisfied: decorator>=4.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from networkx->farm-haystack[colab]) (4.4.2)\n", + "Requirement already satisfied: pytz>=2017.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from pandas->farm-haystack[colab]) (2020.1)\n", + "Requirement already satisfied: backoff<2.0.0,>=1.10.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.11.1)\n", + "Requirement already satisfied: monotonic>=1.5 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.6)\n", + "Requirement already satisfied: lxml>=2.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from python-docx->farm-haystack[colab]) (4.5.2)\n", + "Requirement already satisfied: inflect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (5.4.0)\n", + "Requirement already satisfied: num2words in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (0.5.10)\n", + "Requirement already satisfied: Keras>=2.2.4 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from seqeval->farm-haystack[colab]) (2.4.3)\n", + "Requirement already satisfied: tabulate>=0.7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from databricks-cli>=0.8.0->mlflow->farm-haystack[colab]) (0.8.7)\n", + "Requirement already satisfied: websocket-client>=0.32.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from docker>=3.6.0->mlflow->farm-haystack[colab]) (0.57.0)\n", + "Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitpython>=2.1.0->mlflow->farm-haystack[colab]) (4.0.5)\n", + "Requirement already satisfied: h5py in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
Keras>=2.2.4->seqeval->farm-haystack[colab]) (2.10.0)\n", + "Requirement already satisfied: isodate>=0.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.1)\n", + "Requirement already satisfied: requests-oauthlib>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.3.1)\n", + "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->transformers==4.20.1->farm-haystack[colab]) (2.4.7)\n", + "Requirement already satisfied: python-editor>=0.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.0.4)\n", + "Requirement already satisfied: Mako in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.1.3)\n", + "Requirement already satisfied: itsdangerous>=0.24 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (1.1.0)\n", + "Requirement already satisfied: Werkzeug>=0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (0.16.1)\n", + "Requirement already satisfied: Jinja2>=2.10.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (2.11.2)\n", + "Requirement already satisfied: docopt>=0.6.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from num2words->quantulum3->farm-haystack[colab]) (0.6.2)\n", + "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from torchvision->sentence-transformers>=2.2.0->farm-haystack[colab]) (7.2.0)\n", + "Requirement already satisfied: smmap<4,>=3.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow->farm-haystack[colab]) 
(3.0.4)\n", + "Requirement already satisfied: MarkupSafe>=0.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Jinja2>=2.10.1->Flask->mlflow->farm-haystack[colab]) (1.1.1)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (3.2.0)\n" + ] + } + ], + "source": [ + "%%bash\n", + "\n", + "pip install --upgrade pip\n", + "pip install farm-haystack[colab]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the DocumentStore\n", + "\n", + "A DocumentStore stores the documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore` since it requires no external dependencies. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/deepset/anaconda3/lib/python3.8/site-packages/tqdm/std.py:668: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version\n", + " from pandas import Panel\n" + ] + } + ], + "source": [ + "from haystack.document_stores import InMemoryDocumentStore\n", + "\n", + "document_store = InMemoryDocumentStore()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Preparing Documents\n", + "\n", + "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from haystack.utils import fetch_archive_from_http\n", + "\n", + "doc_dir = \"data/tutorial1\"\n", + "\n", + "fetch_archive_from_http(\n", + " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", + " output_dir=doc_dir\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2. Use the `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2b6e6115f8d844cbbe0e48ba340f3a4d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b6a3f600560748dfb8fff380b5923359", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "992b78efdac04f0092c4a2d0ded53b97", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b3f18970fc554acb8893b2c77c7bd139", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e3dfdab6c8454eb193f0bfdce059c923", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f12af0e6935b4ed38e8d143c2734446a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8eb3665460a24d4e8e2d63b7bd2cb81c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f1459a48334444e1a8dd918fc57cdb02", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ba41ec0fc385436aa4f358739a4420a8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ab8f6f3475c74e708018b9017c2757c3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "976bf694055944dea7c74c0e071e2ee5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"0ad4629f763e496eb55a9da34f6a8466", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6bfd18eaf55a49d7ad70a33b8dd034ba", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a68a8754b8194a629c20b73b62d86c64", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7d35361c50294a618d746b6365261164", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "16f3c191057844f9acbfc67377893d64", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "01264e99f990441b9f4a2977e72cb4a4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "220aa15ca0584d9ca38c925f4e4c8653", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a043e684228e474d9ba93068f08a35e8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "297b5be864964140828e3c1c51f6c0a3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:haystack.nodes.preprocessor.preprocessor:One or more sentence found with word count higher than the split length.\n" + ] + }, + 
{ + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10856412f6134296a20af0e160276de3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "896b3da25fda4561ac78314ea3f38ee5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "454b53a5fa7941aa8c7c3a1f543ead5a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ac6e314b45194b71908e60212e17afac", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1ebcb622daf84b6f8a8cdd4ba6fba272", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a238138a92ea4b6d8477d53a4e7abd60", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "09d757bf8566447d87a0e70bd29563c4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e76a2022d2e341b898587d7e4dbd7ab5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7767598fb03c49d6b2e314ebf9aa8257", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "86bfd19f9b4f4e1198e792e980b3fa08", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6854db2033654d93b92dc6fffbfb1ae5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "23ee0f4dfcee4d8e954ea4e4f6024b07", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f0c5e8c95e244fa3a6d470ab17494fec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4bd3a2ca03f54826b6b2160d1c7afad3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "69c9d975e49c4eb889ed3330b4a841c4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7e4ae1f4a77846d0992b37056b0d46bd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b32d554072ec464e9c03c537b2b48d79", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8a4e8395334f4e1493db06d801613047", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "283a54a3ec5342dcbf3c5c18f8affe87", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a6737f32fc954140b9728beade5a02e1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b21ee5704e1f4c62a81f0ea9b534a140", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "aa10487a8d6f4c36abeeadb90c442b9e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "74cef7a45e1247b8883bb6662fe47fb6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "effc7fc6d6ef43c4b48df4a6ee02c000", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eaf68d18bf0c4760a4d84ad2b3428c52", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3c88c6c86c1e49038bcb27365e31b989", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "830389d3af704422a25773e2bcc464a5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "3cee68a49aea478698191da098de68e3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "00571c7037934d1cbe9ba9d597a00e8a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c3f2a7593a60491dbe349b3bb9ce7a62", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0909c2a892b541fd901efe65379c7738", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a5f3dba309cc4a4b95011b2e770020f9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5ac213e0be56489cb57fb127f6b4e60a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cdc162214efc4d48aa56c058187006e7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8ae726cf3bba40b7987b25a6f64dd301", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f52e33734bf04efcbaf72ce7555443bd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "ee4c60dd41cb4c52b900d1fcd3b4b447", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d939b476bcf84edbb1a8194399148fd3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "19990ed6799744abbd7bc3dbbb73de3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6a0d25714cbc4c3e99306beb41f0bbee", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9d8219143104fb699d1df18254db7d2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5cbfad03ad2b4b04a729a458e5b22abc", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bb69fbbf21064c3da123819aead636fb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9e81ab1cbba54d86ab6a744915cefc91", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6578451b85a749679adf0fb6c87fee1e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"4db2eefaedbb4f2cbcf4e452f6d3ccf2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "59e82112bd1947628cf7dda7b240e3a9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a379ac4739f046e7afa7308e6b9e5193", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "78e12c20df2048c8a998759fca3cbb18", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a92f6fbb90164cb7a3c740261dca85ec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5d53a88f055e4c0f9139f3769e44d3f2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5d7e9be7b74946bab67a9522d37a709f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d109314969194067a2bc646457fcbec6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f72c246c75224de3b0ea29873ac80cfb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "427dfea24d5848f88d7c0d30a2405fe1", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "def0e371ceea46818837d19d31320b7b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4d88e56968964103a9de9b6c493ee39a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b0d6b686960640c38849ea369bea97a1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fe0a4653015d431e858f542854ebaca0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ba657d4f52b24ba6a9be056160dbd950", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4734370e764b4ff883913681a09b15a5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9e8c3a3e4ba94331884e98699bbeecff", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "40e127de69874b81b2d4a8056651faa0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "84c4f7b6a0cb42bfbd474524bf552f6e", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "be166a6350e04cf998b205fc73ffbd80", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f8aaa4596f62416b9a259d4d2bf9b96b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a1c19009bbec4cdb912bfa75e7cc4b19", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c19c94d0831d46b9aed02126a9976f97", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "828720f91e78497b9729d44e6eb120ec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5fda23fb35d64aadb6e972ca0787c2ba", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4b750f8ec618499eb62a9102082f10b0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d6ad87d5724d41ac9e9b05d29935062d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cc5ed2f7d31a4fa9982582e51246cf36", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4e7df01f2bdf46feaef9e4554969f6da", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b965db4594b64ec5a2c47e484f01bd6a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a2b9d365c7b94bee85c2a8e81d647a41", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4054ae5a111d4c38a86d69341ba1ef17", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c06beaf6a5504d418e2c2a7f66d9c2fb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "75502402f7374f8da45db944afc55643", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f8863e05ac254d77916257d3c819140d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5dca4734c44044ce90232fcdd3d49685", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f1f69d47a667479fafe71419dfad3eff", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3cf0f1ba829745c8b4aa14a63b9b9d92", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "95880c283d7442758c77ba5874a6a6af", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bff1d5ee9d804134adf4fdfdc440a98f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "905fce61fb084a518bdc81e3f28f179d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "815156c1dfb7463ba5e16c97d477ac3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9f915e6a26bc4844a9711c29e9d08501", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ea5fc6f63117456cbd0902223a14ccc3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "045ec8738307493bb762d3a4230d61d2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1bc404e8dc7c4b83be47c436e7ac5feb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2769981bad92453a9b59eecfb74ceeef", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "74468687151a4b4baf58cee7e083c90a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8a152b5c6a2c404ca99773a1d2d3c793", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c21784c6573948e8b0eec85ef9aed335", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "41bcfc8eecd94d189a05c5aeac56600e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b2745bd644be4b63a807a221577a0c1d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9c6437ba67d2470e89dcd8f51eaff916", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "05153defcac54387a8ce869ab6d3bf81", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7e53acf53fe2422daeee1cd81f1d43a5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "576ee4fe01a24f10b305468b263a7d2a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e65f631219b24906bfac49f856ed92e7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f1e9ac7e1c764dcc8c8dd8beb756241c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bbb6e7ffd7f1434d90558b555b64de5e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "5cced9e83ea746dd9c5e078238709887", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "523b2f3f88e041b79d5c244595b35533", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "620476d242104125bf92962a4742f830", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cdc0bf772d7242f0af38e2db6243c9da", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "60bd5030aeda48b39bd0db1926ab046a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ff2930a01bdd46beb7e5a2f2bbf0ed1c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "12bcee0ed9124e9a95ba1f37cae43a2a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "435a2775da3141dabbcc91ca5af2081c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8fd6e9af6fe3415f952bbcd749d175ee", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"c08f98435c624bd1bf53867731dea82a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e7684994cab247919988f5a32ac5c60c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3aebc07bd6f74ac0926268004dfeea3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ac7af9e4ed844b10b0c0f6f3887faeae", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4c4169c3b9894edca34070ac7270241f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1fcda9244eda4565803163f402e0a0a0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ad7816b00c8342b68a5b73ec31cf14c3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c9a9ddb6170b4c8ba92a5afd5739080e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "122c4f3f72cd44b69a6bcd4abd4fb8ab", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cfeefb5b8bde40d391c45394acff0ff2", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cafc52829a084666bbd4a736a62aa495", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5fa31b37abe14fe69311f21341107eff", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3eb8548c46d54413866b0deed4c09712", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "475315f0cc394626a90ad9bad7a8bbea", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "32c3ee9fc45e4046baf8b61f8dcf0b05", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8cde52f095104854b48949e5b531cf39", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a33f4b72b437420f8f918a73470df6ea", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e76cb1bcc673424296583851df2a9ee4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0e5406d4ba724b74a3aa6c5b3564152d", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "857b183610ba427398e9907659fd17a3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d77f5a30dd3a4ffa976820562a666294", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9316ae16de14337b9b96ea332c26585", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c4b9b613c31d46b39ad8045290b57972", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "826e2dd310964b0782b1cfcb380bc505", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "efe3829f14a149a9a4cc330c08b64f8a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "83195f0da5534769a8c42a28b45819a8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5cfbe19c4dae47a2a63d82307fcc9284", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c0c43b7eaf79407fb0ac42a099e804ca", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9b0cfffecb674aa9baefd66ae9e12e8a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5e1342341c314641a027e3a57bc19fd1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1533fe7b307641a392c31bc1bcace052", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "14db526b0fb8424ab529381851af7cca", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d32089c88ffb404b8afb70a317e53da8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "43b69f0a15894befbee21ba9ed858344", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "71dbb964c9df483b853d6d91bc56bc17", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eed56a5c51ef4c82bd1333582a66ac04", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9fc7625bcba74de5bf175f36f4226464", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0bb1b6a130da4f898205d8880b546a6e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10896bb94564424db75cacf397e1e82a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "604ff91fda7645d78b2cc7372468fc05", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7e02d370e99441f78cb349aba79fd7cd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "1a7c59a2e60d41c9b3c9feb26c411026", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7b1c94e95ebb4e0a853c1d399bb4e968", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "36a14a01aa2448cc8ccabfc48eff5aec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0bbeb226d4cf40cfb042cf5e1db484e0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fc1d2b3093ad4e6ea453868a22b62571", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0f4063209e4348dbaa809afb36cb559e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "860e2dedd4b147b393648446c455f12e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9dec4f04e42c4f7a88b6f45e92d9ec22", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0d779476d8b547b59db79e4ea4fdf33c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "a92f7810c9cb4820875917a4b62a6a5d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "02236d20a9404be5bc1a3af08b351c03", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "11b1bd3ea3fc4f55bd643a1848a6431b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f53e29a3fc8a4fd4bf787db4456727b0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8ae21a7791dc41e6b9871c9d80c6b986", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b40ca8d7ca24453c97501baebfffc42b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9eb936ada95e4d0f9647bb3e40a07d89", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "97b2fbcd5841478cb0a8682a71600916", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bd6e3bd773b24b6e97c477d7ac522da2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "6eb7deff761146059845e699418973d5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "989f4959e86b4db28fbab80710fd7f4e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a1498f58321e45a0a701b3be91063435", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bc877055e5564bb5920e4d8ac0e93959", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c452ab0838043ac89609a3461d2a697", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "25594d1e7f214cd5b85fd08f528b3af9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c73160c631c4704a88e558ee126e087", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8f5bb0d78dc2466ca2bf95c5afdd57a9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6417220f23c6480facc7225f4a985991", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"f826270063824b76ae84ab6187e7db15", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6f9adf7ef2d1491880b208b5527f78ed", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "90ed90fcb1f84a30800d0d49c8288ce1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "884a9ffc47fb4b4bac3e4f5c7948c3df", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7b573f393c82491da2fb8b391ca6db71", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9784ae34d4754cb08cc9e2e381bd7899", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3663db3ad20c48b8827818eff09235fb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "07de5589c5b7465d8565d66953abc462", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1ee6ab81a92843719085e09910634778", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c3f64fc979b94a5a8c2991faf0762988", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7cb3f4ba8f91433fb98741aa2da305f2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7ac9623a6e0d4c47b838e5a238620838", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "118b88a9e0be42758e3060eccf01c4cf", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "45ad910aee594f68b044993a74061265", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "38d4d783a83942d6abeb87de83a0b9e8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "efd2bc4d56da46a1961dd2f0e42e1474", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f9addc6cbbce4fda84769115af9f1b04", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "877257b7a9a8450cb693e037ee48a677", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5572f58b8ed843b6a13c959000a8221c", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bef459086dc146708af4a63d20af706c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "562a868eb9bd434aa9ac5b6869e90b5e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "407c4f0fdcd144b282af537bd9cd8df0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9f4cbd2f9a72402a84a1deac7a420d15", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "37d0b08a8e91426d805f00470a1b95a1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2faf03e7ddf14e3f945799a39e28cecb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6d4eca8ebb0148b5bc6d168108c973fd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5b4ff602c2a146b9a89807070c5815c2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fab6f26bcf3044c2972e8c4544de33e0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d73d2e43efba4caa93a794c3d6f28d69", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ddad575ba24b4608b8ded0c1761987e1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "aa28cc110e144a18aeb207c3aa99e32b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "62852ebc55774c80bf22bb2702a137f4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "556870c67137403fba9237d2ca8b5cc3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c9edc88ea6aa4a22be276a7b63e15de8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "016be437e82c49c9a689c41fe63b9b49", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "102a6d310ba745a2b369f5bfa2e37f19", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fdb8e07ce5fd4c07b20150f914a32414", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d3476a3744424ab891305bbf2dc4aaa4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a4b065a1c5914fa496ff2a9c70bb2f1a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c285b1f410684e9b8d7881f12417908c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ec653078756e464ca7f82b8a41d80431", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "640f3f17956546a88f1bf924d08a7009", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6c6beef932f84670ae4887876de6f6b3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "11e59dd7e8d04487b3de6525c4c678da", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ce6c709c78634562b70093499b1a32b7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5513d3fd992f4b599431e1f19b2d3751", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cb270ada221f4b7c82da8ac482f80f1b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5540a17b0f08419b83b7742ca5734a40", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ccdcf08f5ffb4f048d584155c9208c46", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2a96a404739c48ea920aeab664f1dfb4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "e1f8921a334c41959ee7d20fab0022a9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "307ff91602964206886e973a672a9525", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bc903e9f1d464e4090269ca871b01588", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "79e9e049bb974693a00bc9c4bb6993dd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "96f1d42287254b46a66f09769e6244b5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9c45fa2ff6ba41bdbb922ff65ccf31b2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6901f44a746e4371b62edafd38839c8c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ba40172218f042ca832016bf4b05c85f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "db7e001bae74412d8b8f397b89c4013f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "4d188bca23d24bf69376b1bf3a720bb7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e09010af0f7743219f86cca5c47eb203", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d0b95ec9ef204158812cd9a708deabdd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8dda73fc405947f1b97e355257e0acd1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4d63098eae074310afa03c54a2f60786", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d017d045c4a24d18b123fe36c2a132b7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b3e9db2c617e4ceaacc4eac6a43aa259", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4d774881459a405eaa826234afc93c72", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f728d8d45659428288b0a814e6ad6e6d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"8ba0797e0ae642b2a7a91b60a549d76e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f483f71a536c4c7b90882ef33ef4ba02", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2d2f098ff9684140a0a8e6c8f3880781", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ad9134cb38d1439092a1575c6b6788ba", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d2bb2a37b70243ffb2991d0d421d973d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f4d67c570bbf4bb7ac4d9e0fb3097e0b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1f7cff9dbb7b47fd91286eba5ededfdc", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c7e1aab698364ac0996c3de4a4616c85", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6c99a4ba9d6c4c45a378cc2967623689", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b194fb0823b447d6870b0e51989ee0d0", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1c19e2ff585541c1b43239eef6c60f69", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4ced3e9057874463ac1441459af1fce0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eba97bfb81bf4182bf8190b2740d553b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d03848eb827746518b1ab993d9aec266", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "194381a561f6473c848fa7e85f6ad1e8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c2e82e90e41e4a788d5bfd75f85d3fae", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a1eb6e5fa87445a39af3b64db5aa6429", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "41a9d7f532924c3eab56ed0eccbaabd5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cfd69993799c43c6a8f2273be1845f44", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0a327239827f462389c03c72ea4ac5ad", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "91c709da9c8c4bf48264ee73dc4f4da4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "110f021d4b7546c7a67257bbb76da6c2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6694a2450e8b4da48187819da7c11c6a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3d63dd1b02ec411abf24764cdf6f8c81", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5f1e523dcf2e4c27a1b99ae6ab882640", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "45da6d2e38f84b259ad760267029f10e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6dc70490c21441a0967527b28b5786a3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c108984b1d9b48238255cbc6c35e990b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f106336973bd4f55af4c1e5457e686e1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "import os\n", + "# from haystack.pipeline import TextIndexingPipeline\n", + "from text_indexing_pipeline import TextIndexingPipeline\n", + "\n", + "files_to_index = [doc_dir + \"/\" + f for f in os.listdir(doc_dir)]\n", + "indexing_pipeline = TextIndexingPipeline(document_store)\n", + "indexing_pipeline.run_batch(files_to_index)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way.\n", + "\n", + "As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the Retriever\n", + "\n", + "Retrievers sift through all the Documents and return only those that they think might contain the answer to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.nodes import TFIDFRetriever\n", + "\n", + "retriever = TFIDFRetriever(document_store=document_store)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the Reader\n", + "\n", + "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.nodes import FARMReader\n", + "\n", + "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Creating the Retriever-Reader Pipeline\n", + "\n", + "The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.pipelines import ExtractiveQAPipeline\n", + "\n", + "pipe = ExtractiveQAPipeline(reader, retriever)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Asking a Question\n", + "\n", + "1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set how many Documents the Retriever returns and how many answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "prediction = pipe.run(\n", + " query=\"Who is the father of Arya Stark?\",\n", + " params={\n", + " \"Retriever\": {\"top_k\": 10},\n", + " \"Reader\": {\"top_k\": 5}\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Here are some questions you could try out:\n", + "- Who is the father of Arya Stark?\n", + "- Who created the Dothraki vocabulary?\n", + "- Who is the sister of Sansa?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2.
The answers returned by the pipeline can be printed out directly:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "pprint(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "3. Simplify the printed answers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.utils import print_answers\n", + "\n", + "print_answers(\n", + " prediction,\n", + " details=\"minimum\" # Choose from `minimum`, `medium` and `all`\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "And there you have it! Congratulations on building your first machine-learning-based question answering system!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## About us\n", + "\n", + "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.\n", + "\n", + "We bring NLP to the industry via open source! \n", + "Our focus: Industry-specific language models & large-scale QA systems.
\n", + " \n", + "Some of our other work: \n", + "- [German BERT](https://deepset.ai/german-bert)\n", + "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", + "\n", + "Get in touch:\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "\n", + "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/new_tutorials/02_qa_pipeline.ipynb b/new_tutorials/02_qa_pipeline.ipynb new file mode 100644 index 00000000..a37b9f4c --- /dev/null +++ b/new_tutorials/02_qa_pipeline.ipynb @@ -0,0 +1,8467 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "# The Building Blocks of a Scalable Question Answering System\n", + "\n", + "- **Level**: Beginner\n", + "- **Time to complete**: 20 minutes\n", + "- **Prerequisites**: Prepare the Colab environment (see links below).\n", + "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader`\n", + "- **Goal**: 
After completing this tutorial, you will have learned about the Reader, the Retriever, and the `ElasticsearchDocumentStore`. You will index files with an indexing pipeline and combine the Reader and Retriever in a querying pipeline. At the end, you will have built a question answering pipeline that can answer questions about the Game of Thrones series.\n", + "\n", + "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", + "\n", + "This tutorial will introduce you to all the concepts needed to build such a question answering system. It will also use Haystack components such as indexing pipelines, querying pipelines and DocumentStores backed by external database services.\n", + "\n", + "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "\n", + "## Preparing the Colab Environment\n", + "\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Installing Haystack\n", + "\n", + "To start, let's install the latest release of Haystack with `pip`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { +
"name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: pip in /Users/deepset/anaconda3/lib/python3.8/site-packages (22.3)\n", + "Requirement already satisfied: farm-haystack[colab] in /Users/deepset/Code/haystack (1.6.1rc0)\n", + "Requirement already satisfied: torch<1.13,>1.9 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.11.0)\n", + "Requirement already satisfied: requests in /Users/deepset/.local/lib/python3.8/site-packages (from farm-haystack[colab]) (2.28.1)\n", + "Requirement already satisfied: pydantic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.6.1)\n", + "Requirement already satisfied: transformers==4.20.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.20.1)\n", + "Requirement already satisfied: nltk in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.5)\n", + "Requirement already satisfied: pandas in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.5)\n", + "Requirement already satisfied: dill in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.3.2)\n", + "Requirement already satisfied: tqdm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.47.0)\n", + "Requirement already satisfied: networkx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.4)\n", + "Requirement already satisfied: mmh3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.0.0)\n", + "Requirement already satisfied: quantulum3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.10)\n", + "Requirement already satisfied: posthog in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.4.5)\n", + "Requirement already satisfied: azure-ai-formrecognizer==3.2.0b2 
in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0b2)\n", + "Requirement already satisfied: azure-core<1.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.22.1)\n", + "Requirement already satisfied: huggingface-hub<0.8.0,>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.0)\n", + "Requirement already satisfied: more_itertools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (8.4.0)\n", + "Requirement already satisfied: python-docx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.8.10)\n", + "Requirement already satisfied: langdetect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.8)\n", + "Requirement already satisfied: tika in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.24)\n", + "Requirement already satisfied: sentence-transformers>=2.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.2.0)\n", + "Requirement already satisfied: scipy>=1.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.5.0)\n", + "Requirement already satisfied: scikit-learn>=1.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.2)\n", + "Requirement already satisfied: seqeval in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.0.12)\n", + "Requirement already satisfied: mlflow in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.0)\n", + "Requirement already satisfied: elasticsearch<7.11,>=7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (7.8.1)\n", + "Requirement already satisfied: elastic-apm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (5.8.1)\n", + 
"Requirement already satisfied: rapidfuzz<3,>=2.0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.5.0)\n", + "Requirement already satisfied: jsonschema in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0)\n", + "Requirement already satisfied: grpcio==1.43.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.43.0)\n", + "Requirement already satisfied: msrest>=0.6.21 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.21)\n", + "Requirement already satisfied: six>=1.11.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.15.0)\n", + "Requirement already satisfied: azure-common~=1.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.1.28)\n", + "Requirement already satisfied: filelock in /Users/deepset/.local/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (3.8.0)\n", + "Requirement already satisfied: numpy>=1.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (1.18.5)\n", + "Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (0.12.1)\n", + "Requirement already satisfied: packaging>=20.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (21.3)\n", + "Requirement already satisfied: pyyaml>=5.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (5.3.1)\n", + "Requirement already satisfied: regex!=2019.12.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (2020.6.8)\n", + 
"Requirement already satisfied: urllib3>=1.21.1 in /Users/deepset/.local/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (1.26.11)\n", + "Requirement already satisfied: certifi in /Users/deepset/anaconda3/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (2020.6.20)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from huggingface-hub<0.8.0,>=0.5.0->farm-haystack[colab]) (4.1.1)\n", + "Requirement already satisfied: jarowinkler<2.0.0,>=1.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from rapidfuzz<3,>=2.0.15->farm-haystack[colab]) (1.2.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (3.3)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (2.1.0)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (2.1.0)\n", + "Requirement already satisfied: joblib>=0.11 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (0.16.0)\n", + "Requirement already satisfied: sentencepiece in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.1.91)\n", + "Requirement already satisfied: torchvision in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.12.0)\n", + "Requirement already satisfied: setuptools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (49.2.0.post20200714)\n", + "Requirement already satisfied: pyrsistent>=0.14.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
jsonschema->farm-haystack[colab]) (0.16.0)\n", + "Requirement already satisfied: attrs>=17.4.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (19.3.0)\n", + "Requirement already satisfied: python-dateutil in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (2.8.1)\n", + "Requirement already satisfied: databricks-cli>=0.8.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.11.0)\n", + "Requirement already satisfied: sqlalchemy in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.3.18)\n", + "Requirement already satisfied: cloudpickle in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.5.0)\n", + "Requirement already satisfied: docker>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (4.3.0)\n", + "Requirement already satisfied: Flask in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.1.2)\n", + "Requirement already satisfied: entrypoints in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3)\n", + "Requirement already satisfied: protobuf>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.12.4)\n", + "Requirement already satisfied: querystring-parser in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.2.4)\n", + "Requirement already satisfied: gitpython>=2.1.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.1.7)\n", + "Requirement already satisfied: sqlparse in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3.1)\n", + "Requirement already satisfied: alembic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
mlflow->farm-haystack[colab]) (1.4.2)\n", + "Requirement already satisfied: click>=7.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (7.1.2)\n", + "Requirement already satisfied: simplejson in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.17.2)\n", + "Requirement already satisfied: gunicorn in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (20.0.4)\n", + "Requirement already satisfied: decorator>=4.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from networkx->farm-haystack[colab]) (4.4.2)\n", + "Requirement already satisfied: pytz>=2017.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from pandas->farm-haystack[colab]) (2020.1)\n", + "Requirement already satisfied: backoff<2.0.0,>=1.10.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.11.1)\n", + "Requirement already satisfied: monotonic>=1.5 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.6)\n", + "Requirement already satisfied: lxml>=2.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from python-docx->farm-haystack[colab]) (4.5.2)\n", + "Requirement already satisfied: inflect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (5.4.0)\n", + "Requirement already satisfied: num2words in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (0.5.10)\n", + "Requirement already satisfied: Keras>=2.2.4 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from seqeval->farm-haystack[colab]) (2.4.3)\n", + "Requirement already satisfied: tabulate>=0.7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from databricks-cli>=0.8.0->mlflow->farm-haystack[colab]) (0.8.7)\n", + "Requirement already satisfied: websocket-client>=0.32.0 in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from docker>=3.6.0->mlflow->farm-haystack[colab]) (0.57.0)\n", + "Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitpython>=2.1.0->mlflow->farm-haystack[colab]) (4.0.5)\n", + "Requirement already satisfied: h5py in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Keras>=2.2.4->seqeval->farm-haystack[colab]) (2.10.0)\n", + "Requirement already satisfied: isodate>=0.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.1)\n", + "Requirement already satisfied: requests-oauthlib>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.3.1)\n", + "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->transformers==4.20.1->farm-haystack[colab]) (2.4.7)\n", + "Requirement already satisfied: python-editor>=0.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.0.4)\n", + "Requirement already satisfied: Mako in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.1.3)\n", + "Requirement already satisfied: itsdangerous>=0.24 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (1.1.0)\n", + "Requirement already satisfied: Werkzeug>=0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (0.16.1)\n", + "Requirement already satisfied: Jinja2>=2.10.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (2.11.2)\n", + "Requirement already satisfied: docopt>=0.6.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
num2words->quantulum3->farm-haystack[colab]) (0.6.2)\n", + "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from torchvision->sentence-transformers>=2.2.0->farm-haystack[colab]) (7.2.0)\n", + "Requirement already satisfied: smmap<4,>=3.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow->farm-haystack[colab]) (3.0.4)\n", + "Requirement already satisfied: MarkupSafe>=0.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Jinja2>=2.10.1->Flask->mlflow->farm-haystack[colab]) (1.1.1)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (3.2.0)\n" + ] + } + ], + "source": [ + "%%bash\n", + "\n", + "pip install --upgrade pip\n", + "pip install farm-haystack[colab]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the ElasticsearchDocumentStore\n", + "\n", + "A DocumentStore stores the documents that the question answering system uses to find answers to your questions. Here we are using the `ElasticsearchDocumentStore`, which connects to a running Elasticsearch service, a fast and scalable text-focused storage option. This service runs independently of Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "1. Download, extract and set the permissions for the Elasticsearch installation archive."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q\n", + "tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz\n", + "chown -R daemon:daemon elasticsearch-7.9.2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2. Start the server." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "%%bash --bg\n", + "\n", + "sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done manually, or using our `launch_es()` utility function." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "3. Wait 30s to ensure that the server has fully started up." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "import time\n", + "time.sleep(30)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "4. Initialize the `ElasticsearchDocumentStore`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/deepset/anaconda3/lib/python3.8/site-packages/tqdm/std.py:668: FutureWarning: The Panel class is removed from pandas. 
Accessing it from the top-level namespace will also be removed in the next version\n", + " from pandas import Panel\n" + ] + } + ], + "source": [ + "import os\n", + "from haystack.document_stores import ElasticsearchDocumentStore\n", + "\n", + "# Get the host where Elasticsearch is running, default to localhost\n", + "host = os.environ.get(\"ELASTICSEARCH_HOST\", \"localhost\")\n", + "\n", + "document_store = ElasticsearchDocumentStore(\n", + " host=host,\n", + " username=\"\",\n", + " password=\"\",\n", + " index=\"document\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Indexing Documents with a Pipeline\n", + "\n", + "You can write Documents into your DocumentStore using an indexing pipeline. Pipelines are composed of nodes that perform different kinds of processing. For example, here we will be using the `TextConverter` which turns `.txt` files into Haystack `Document` objects, as well as the `PreProcessor` which can clean and split the text within a `Document`. \n", + "\n", + "Once all components are combined, the indexing pipeline will ingest `.txt` filepaths, preprocess them and write them into the DocumentStore.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from haystack.utils import fetch_archive_from_http\n", + "\n", + "doc_dir = \"data/tutorial1\"\n", + "\n", + "fetch_archive_from_http(\n", + " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", + " output_dir=doc_dir\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2. Initialize the pipeline, TextConverter and PreProcessor." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4e7df01f2bdf46feaef9e4554969f6da", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b965db4594b64ec5a2c47e484f01bd6a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a2b9d365c7b94bee85c2a8e81d647a41", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4054ae5a111d4c38a86d69341ba1ef17", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c06beaf6a5504d418e2c2a7f66d9c2fb", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "75502402f7374f8da45db944afc55643", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f8863e05ac254d77916257d3c819140d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5dca4734c44044ce90232fcdd3d49685", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f1f69d47a667479fafe71419dfad3eff", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3cf0f1ba829745c8b4aa14a63b9b9d92", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "95880c283d7442758c77ba5874a6a6af", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bff1d5ee9d804134adf4fdfdc440a98f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "905fce61fb084a518bdc81e3f28f179d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "815156c1dfb7463ba5e16c97d477ac3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9f915e6a26bc4844a9711c29e9d08501", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ea5fc6f63117456cbd0902223a14ccc3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "045ec8738307493bb762d3a4230d61d2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1bc404e8dc7c4b83be47c436e7ac5feb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2769981bad92453a9b59eecfb74ceeef", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "74468687151a4b4baf58cee7e083c90a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8a152b5c6a2c404ca99773a1d2d3c793", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c21784c6573948e8b0eec85ef9aed335", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "41bcfc8eecd94d189a05c5aeac56600e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b2745bd644be4b63a807a221577a0c1d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9c6437ba67d2470e89dcd8f51eaff916", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "05153defcac54387a8ce869ab6d3bf81", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7e53acf53fe2422daeee1cd81f1d43a5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "576ee4fe01a24f10b305468b263a7d2a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e65f631219b24906bfac49f856ed92e7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f1e9ac7e1c764dcc8c8dd8beb756241c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bbb6e7ffd7f1434d90558b555b64de5e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5cced9e83ea746dd9c5e078238709887", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "523b2f3f88e041b79d5c244595b35533", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "620476d242104125bf92962a4742f830", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cdc0bf772d7242f0af38e2db6243c9da", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "60bd5030aeda48b39bd0db1926ab046a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "ff2930a01bdd46beb7e5a2f2bbf0ed1c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "12bcee0ed9124e9a95ba1f37cae43a2a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "435a2775da3141dabbcc91ca5af2081c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8fd6e9af6fe3415f952bbcd749d175ee", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c08f98435c624bd1bf53867731dea82a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e7684994cab247919988f5a32ac5c60c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3aebc07bd6f74ac0926268004dfeea3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ac7af9e4ed844b10b0c0f6f3887faeae", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4c4169c3b9894edca34070ac7270241f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "1fcda9244eda4565803163f402e0a0a0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ad7816b00c8342b68a5b73ec31cf14c3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c9a9ddb6170b4c8ba92a5afd5739080e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "122c4f3f72cd44b69a6bcd4abd4fb8ab", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cfeefb5b8bde40d391c45394acff0ff2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cafc52829a084666bbd4a736a62aa495", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5fa31b37abe14fe69311f21341107eff", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3eb8548c46d54413866b0deed4c09712", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "475315f0cc394626a90ad9bad7a8bbea", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"32c3ee9fc45e4046baf8b61f8dcf0b05", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8cde52f095104854b48949e5b531cf39", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a33f4b72b437420f8f918a73470df6ea", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e76cb1bcc673424296583851df2a9ee4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0e5406d4ba724b74a3aa6c5b3564152d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "857b183610ba427398e9907659fd17a3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d77f5a30dd3a4ffa976820562a666294", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9316ae16de14337b9b96ea332c26585", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c4b9b613c31d46b39ad8045290b57972", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "826e2dd310964b0782b1cfcb380bc505", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "efe3829f14a149a9a4cc330c08b64f8a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "83195f0da5534769a8c42a28b45819a8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5cfbe19c4dae47a2a63d82307fcc9284", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c0c43b7eaf79407fb0ac42a099e804ca", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9b0cfffecb674aa9baefd66ae9e12e8a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5e1342341c314641a027e3a57bc19fd1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1533fe7b307641a392c31bc1bcace052", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "14db526b0fb8424ab529381851af7cca", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d32089c88ffb404b8afb70a317e53da8", + "version_major": 2, + "version_minor": 0 + }, + 
"text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "43b69f0a15894befbee21ba9ed858344", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "71dbb964c9df483b853d6d91bc56bc17", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "eed56a5c51ef4c82bd1333582a66ac04", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9fc7625bcba74de5bf175f36f4226464", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0bb1b6a130da4f898205d8880b546a6e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10896bb94564424db75cacf397e1e82a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "604ff91fda7645d78b2cc7372468fc05", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7e02d370e99441f78cb349aba79fd7cd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1a7c59a2e60d41c9b3c9feb26c411026", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7b1c94e95ebb4e0a853c1d399bb4e968", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "36a14a01aa2448cc8ccabfc48eff5aec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0bbeb226d4cf40cfb042cf5e1db484e0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fc1d2b3093ad4e6ea453868a22b62571", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0f4063209e4348dbaa809afb36cb559e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "860e2dedd4b147b393648446c455f12e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9dec4f04e42c4f7a88b6f45e92d9ec22", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0d779476d8b547b59db79e4ea4fdf33c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a92f7810c9cb4820875917a4b62a6a5d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "02236d20a9404be5bc1a3af08b351c03", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "11b1bd3ea3fc4f55bd643a1848a6431b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f53e29a3fc8a4fd4bf787db4456727b0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8ae21a7791dc41e6b9871c9d80c6b986", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "b40ca8d7ca24453c97501baebfffc42b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9eb936ada95e4d0f9647bb3e40a07d89", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "97b2fbcd5841478cb0a8682a71600916", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bd6e3bd773b24b6e97c477d7ac522da2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6eb7deff761146059845e699418973d5", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "989f4959e86b4db28fbab80710fd7f4e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a1498f58321e45a0a701b3be91063435", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bc877055e5564bb5920e4d8ac0e93959", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c452ab0838043ac89609a3461d2a697", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "25594d1e7f214cd5b85fd08f528b3af9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c73160c631c4704a88e558ee126e087", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8f5bb0d78dc2466ca2bf95c5afdd57a9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6417220f23c6480facc7225f4a985991", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f826270063824b76ae84ab6187e7db15", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6f9adf7ef2d1491880b208b5527f78ed", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "90ed90fcb1f84a30800d0d49c8288ce1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "884a9ffc47fb4b4bac3e4f5c7948c3df", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7b573f393c82491da2fb8b391ca6db71", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "9784ae34d4754cb08cc9e2e381bd7899", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3663db3ad20c48b8827818eff09235fb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "07de5589c5b7465d8565d66953abc462", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1ee6ab81a92843719085e09910634778", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c3f64fc979b94a5a8c2991faf0762988", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7cb3f4ba8f91433fb98741aa2da305f2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7ac9623a6e0d4c47b838e5a238620838", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "118b88a9e0be42758e3060eccf01c4cf", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "45ad910aee594f68b044993a74061265", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"38d4d783a83942d6abeb87de83a0b9e8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "efd2bc4d56da46a1961dd2f0e42e1474", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f9addc6cbbce4fda84769115af9f1b04", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "877257b7a9a8450cb693e037ee48a677", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5572f58b8ed843b6a13c959000a8221c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bef459086dc146708af4a63d20af706c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "562a868eb9bd434aa9ac5b6869e90b5e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "407c4f0fdcd144b282af537bd9cd8df0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9f4cbd2f9a72402a84a1deac7a420d15", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "37d0b08a8e91426d805f00470a1b95a1", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2faf03e7ddf14e3f945799a39e28cecb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6d4eca8ebb0148b5bc6d168108c973fd", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5b4ff602c2a146b9a89807070c5815c2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "fab6f26bcf3044c2972e8c4544de33e0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "data": {
"application/vnd.jupyter.widget-view+json": { + "model_id": "5ebf88e02a324d50b23beadd0937a893", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bf801dda4d50433796322a378ce46d16", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9b46a132ef5f41ba991b46e5f5be4a5c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a3946263b7d84f5fb01ff1cbffa52401", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0ee8c5ad2e8a41e7a077be6717ed1e63", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7bb79be0932c4deea6a29cf2ec22ddbb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d9f22271c6584acca85b5f2b9c3b221c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "64dcc2b0ac7541b2bdd08624cf2f3c26", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3f4a3d8f08c34161b4f9a0229058df40", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + 
"application/vnd.jupyter.widget-view+json": { + "model_id": "5d67c29a9fbb46a18781d2ddcf14acec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6b1d75e78e9f4423b3dee2908410fcf9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2fc1e2d39b414b8aad28c040c79320c9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ede5db7dba4d413f9d698bf7bb26751f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "57d5acf4718a4bfab85e920b9bc4e793", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "05786ddf80fb4b0fae51239f0904b708", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "476b27b6911f4411aa8aa46c43efc69f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "a4969352fc8848b8b879da35b4c5c352", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "470014e1c35d49bfbb9c9767bd47b7e6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + 
"model_id": "16f31c4e5f95486eb133401b4af1d99a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8bf70dc246464169977bb4a1cbe93919", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "12f4affc63ad41cf88dcd8bd2f8b1d5b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b5ae2d460c10421a8b0a55e796924364", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "553580e0b36040b2bae947a13489ddd6", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + 
"metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b0c42b8b54b4496c95f7603ada33d179", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7986968b3bd2443f90a94cd3c176e84f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7ef20f48d5194d6189ff60e4f9e71b15", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bfeee8455a194a7992ae0897e15b83f7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": 
"14ecae5d228b42a5b20df0463d66a641", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b1d341e2521649699103a4d06f384f25", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "900b0de941b54459a997e4b186d5e1d9", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e07846ca662548b28e7b5d3ddec23b08", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d65114f95d8c4da19be3867a2017d91e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": 
{}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0a327239827f462389c03c72ea4ac5ad", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "91c709da9c8c4bf48264ee73dc4f4da4", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "110f021d4b7546c7a67257bbb76da6c2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6694a2450e8b4da48187819da7c11c6a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3d63dd1b02ec411abf24764cdf6f8c81", + 
"version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5f1e523dcf2e4c27a1b99ae6ab882640", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "45da6d2e38f84b259ad760267029f10e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6dc70490c21441a0967527b28b5786a3", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c108984b1d9b48238255cbc6c35e990b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f106336973bd4f55af4c1e5457e686e1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "from haystack import Pipeline\n", + "from haystack.nodes import TextConverter, PreProcessor\n", + "\n", + "indexing_pipeline = Pipeline()\n", + "text_converter = TextConverter()\n", + "preprocessor = PreProcessor(\n", + " clean_whitespace=True,\n", + " clean_header_footer=True,\n", + " clean_empty_lines=True,\n", + " split_by=\"word\",\n", + " split_length=200,\n", + " split_overlap=20,\n", + " split_respect_sentence_boundary=True,\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "To learn more about the parameters of the `PreProcessor`, see [Usage](https://docs.haystack.deepset.ai/docs/preprocessor#usage). To understand why document splitting is important for your question answering system's performance, see [Document Length](https://docs.haystack.deepset.ai/docs/optimization#document-length)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2. Populate the indexing pipeline with nodes. For each node, pass the names of its preceding nodes in the `inputs` argument. Note that in an indexing pipeline, the input to the first node is \"File\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "indexing_pipeline.add_node(component=text_converter, name=\"TextConverter\", inputs=[\"File\"])\n", + "indexing_pipeline.add_node(component=preprocessor, name=\"PreProcessor\", inputs=[\"TextConverter\"])\n", + "indexing_pipeline.add_node(component=document_store, name=\"DocumentStore\", inputs=[\"PreProcessor\"])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "3. Write the text data into the DocumentStore by running the indexing pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "files_to_index = [doc_dir + \"/\" + f for f in os.listdir(doc_dir)]\n", + "indexing_pipeline.run_batch(file_paths=files_to_index)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way.\n", + "\n", + "As an alternative, you can convert your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the Retriever\n", + "\n", + "Retrievers sift through all the Documents and return only those that they think might contain the answer to the question. Here we are using the BM25 algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.nodes import BM25Retriever\n", + "\n", + "retriever = BM25Retriever(document_store=document_store)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Initializing the Reader\n", + "\n", + "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.nodes import FARMReader\n", + "\n", + "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Creating the Retriever-Reader Pipeline\n", + "\n", + "You can combine the Reader and Retriever in a querying pipeline using the `Pipeline` class. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "1. Initialize the `Pipeline` object and add the Retriever and Reader as nodes. For each node, pass the names of its preceding nodes in the `inputs` argument. Note that in a querying pipeline, the input to the first node is \"Query\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "\n", + "querying_pipeline = Pipeline()\n", + "querying_pipeline.add_node(component=retriever, name=\"Retriever\", inputs=[\"Query\"])\n", + "querying_pipeline.add_node(component=reader, name=\"Reader\", inputs=[\"Retriever\"])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Asking a Question\n", + "\n", + "1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set how many documents the Retriever returns and how many answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "prediction = querying_pipeline.run(\n", + " query=\"Who is the father of Arya Stark?\",\n", + " params={\n", + " \"Retriever\": {\"top_k\": 10},\n", + " \"Reader\": {\"top_k\": 5}\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Here are some questions you could try out:\n", + "- Who is the father of Arya Stark?\n", + "- Who created the Dothraki vocabulary?\n", + "- Who is the sister of Sansa?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "2.
The answers returned by the pipeline can be printed out directly:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "\n", + "pprint(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "3. Simplify the printed answers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "from haystack.utils import print_answers\n", + "\n", + "print_answers(\n", + " prediction,\n", + " details=\"minimum\" ## Choose from `minimum`, `medium` and `all`\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "And there you have it! Congratulations on building a scalable machine learning based question answering system!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## About us\n", + "\n", + "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n", + "\n", + "We bring NLP to the industry via open source! \n", + "Our focus: Industry specific language models & large scale QA systems. 
\n", + " \n", + "Some of our other work: \n", + "- [German BERT](https://deepset.ai/german-bert)\n", + "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", + "\n", + "Get in touch:\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "\n", + "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file From 464ba3ce4c6cae6730c5d06729526baa423c9261 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 1 Nov 2022 12:27:55 +0100 Subject: [PATCH 11/48] Clear output --- new_tutorials/01_simplified_qa_pipeline.ipynb | 8012 +--------------- new_tutorials/02_qa_pipeline.ipynb | 8121 +---------------- 2 files changed, 102 insertions(+), 16031 deletions(-) diff --git a/new_tutorials/01_simplified_qa_pipeline.ipynb b/new_tutorials/01_simplified_qa_pipeline.ipynb index 520054a4..6e6657df 100644 --- a/new_tutorials/01_simplified_qa_pipeline.ipynb +++ 
b/new_tutorials/01_simplified_qa_pipeline.ipynb @@ -2,11 +2,7 @@ "cells": [ { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "# Build Your First Question Answering System\n", "\n", @@ -18,7 +14,7 @@ "\n", "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", "\n", - "This tutorial will introduce you to all the concepts needed to build such a question answering system, but simplify certain set up steps, such as Document preparation and indexing as well as pipeline initialization are simplified so that you can get started quicker.\n", + "This tutorial will introduce you to all the concepts needed to build such a question answering system. 
However, certain setup steps, such as Document preparation and indexing as well as pipeline initialization, are simplified so that you can get started quicker.\n", "\n", "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n", "\n" @@ -26,11 +22,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "\n", "## Preparing the Colab Environment\n", @@ -42,11 +34,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Installing Haystack\n", "\n", @@ -55,113 +43,9 @@ }, { "cell_type": "code", - "execution_count": 1, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: pip in /Users/deepset/anaconda3/lib/python3.8/site-packages (22.3)\n", - "Requirement already satisfied: farm-haystack[colab] in /Users/deepset/Code/haystack (1.6.1rc0)\n", - "Requirement already satisfied: torch<1.13,>1.9 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.11.0)\n", - "Requirement already satisfied: requests in /Users/deepset/.local/lib/python3.8/site-packages (from farm-haystack[colab]) (2.28.1)\n", - "Requirement already satisfied: pydantic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.6.1)\n", - "Requirement already satisfied: transformers==4.20.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.20.1)\n", - "Requirement already satisfied: nltk in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.5)\n", - "Requirement already satisfied: pandas in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.5)\n", - "Requirement already satisfied: dill in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.3.2)\n", - "Requirement already satisfied: tqdm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.47.0)\n", - "Requirement already satisfied: networkx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.4)\n", - "Requirement already satisfied: mmh3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.0.0)\n", - "Requirement already satisfied: quantulum3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.10)\n", - "Requirement already satisfied: posthog in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.4.5)\n", - "Requirement already satisfied: azure-ai-formrecognizer==3.2.0b2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0b2)\n", - "Requirement already satisfied: azure-core<1.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.22.1)\n", - "Requirement already satisfied: huggingface-hub<0.8.0,>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.0)\n", - "Requirement already satisfied: more_itertools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (8.4.0)\n", - "Requirement already satisfied: python-docx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.8.10)\n", - "Requirement already satisfied: langdetect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.8)\n", - "Requirement already satisfied: tika in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.24)\n", - "Requirement already satisfied: sentence-transformers>=2.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.2.0)\n", - "Requirement already 
satisfied: scipy>=1.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.5.0)\n", - "Requirement already satisfied: scikit-learn>=1.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.2)\n", - "Requirement already satisfied: seqeval in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.0.12)\n", - "Requirement already satisfied: mlflow in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.0)\n", - "Requirement already satisfied: elasticsearch<7.11,>=7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (7.8.1)\n", - "Requirement already satisfied: elastic-apm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (5.8.1)\n", - "Requirement already satisfied: rapidfuzz<3,>=2.0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.5.0)\n", - "Requirement already satisfied: jsonschema in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0)\n", - "Requirement already satisfied: grpcio==1.43.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.43.0)\n", - "Requirement already satisfied: msrest>=0.6.21 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.21)\n", - "Requirement already satisfied: six>=1.11.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.15.0)\n", - "Requirement already satisfied: azure-common~=1.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.1.28)\n", - "Requirement already satisfied: filelock in /Users/deepset/.local/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (3.8.0)\n", - "Requirement 
already satisfied: numpy>=1.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (1.18.5)\n", - "Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (0.12.1)\n", - "Requirement already satisfied: packaging>=20.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (21.3)\n", - "Requirement already satisfied: pyyaml>=5.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (5.3.1)\n", - "Requirement already satisfied: regex!=2019.12.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (2020.6.8)\n", - "Requirement already satisfied: urllib3>=1.21.1 in /Users/deepset/.local/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (1.26.11)\n", - "Requirement already satisfied: certifi in /Users/deepset/anaconda3/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (2020.6.20)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from huggingface-hub<0.8.0,>=0.5.0->farm-haystack[colab]) (4.1.1)\n", - "Requirement already satisfied: jarowinkler<2.0.0,>=1.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from rapidfuzz<3,>=2.0.15->farm-haystack[colab]) (1.2.1)\n", - "Requirement already satisfied: idna<4,>=2.5 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (3.3)\n", - "Requirement already satisfied: charset-normalizer<3,>=2 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (2.1.0)\n", - "Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
scikit-learn>=1.0.0->farm-haystack[colab]) (2.1.0)\n", - "Requirement already satisfied: joblib>=0.11 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (0.16.0)\n", - "Requirement already satisfied: sentencepiece in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.1.91)\n", - "Requirement already satisfied: torchvision in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.12.0)\n", - "Requirement already satisfied: setuptools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (49.2.0.post20200714)\n", - "Requirement already satisfied: pyrsistent>=0.14.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (0.16.0)\n", - "Requirement already satisfied: attrs>=17.4.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (19.3.0)\n", - "Requirement already satisfied: python-dateutil in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (2.8.1)\n", - "Requirement already satisfied: databricks-cli>=0.8.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.11.0)\n", - "Requirement already satisfied: sqlalchemy in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.3.18)\n", - "Requirement already satisfied: cloudpickle in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.5.0)\n", - "Requirement already satisfied: docker>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (4.3.0)\n", - "Requirement already satisfied: Flask in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.1.2)\n", - "Requirement already satisfied: 
entrypoints in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3)\n", - "Requirement already satisfied: protobuf>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.12.4)\n", - "Requirement already satisfied: querystring-parser in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.2.4)\n", - "Requirement already satisfied: gitpython>=2.1.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.1.7)\n", - "Requirement already satisfied: sqlparse in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3.1)\n", - "Requirement already satisfied: alembic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.4.2)\n", - "Requirement already satisfied: click>=7.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (7.1.2)\n", - "Requirement already satisfied: simplejson in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.17.2)\n", - "Requirement already satisfied: gunicorn in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (20.0.4)\n", - "Requirement already satisfied: decorator>=4.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from networkx->farm-haystack[colab]) (4.4.2)\n", - "Requirement already satisfied: pytz>=2017.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from pandas->farm-haystack[colab]) (2020.1)\n", - "Requirement already satisfied: backoff<2.0.0,>=1.10.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.11.1)\n", - "Requirement already satisfied: monotonic>=1.5 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.6)\n", - "Requirement already satisfied: lxml>=2.3.2 in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from python-docx->farm-haystack[colab]) (4.5.2)\n", - "Requirement already satisfied: inflect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (5.4.0)\n", - "Requirement already satisfied: num2words in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (0.5.10)\n", - "Requirement already satisfied: Keras>=2.2.4 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from seqeval->farm-haystack[colab]) (2.4.3)\n", - "Requirement already satisfied: tabulate>=0.7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from databricks-cli>=0.8.0->mlflow->farm-haystack[colab]) (0.8.7)\n", - "Requirement already satisfied: websocket-client>=0.32.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from docker>=3.6.0->mlflow->farm-haystack[colab]) (0.57.0)\n", - "Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitpython>=2.1.0->mlflow->farm-haystack[colab]) (4.0.5)\n", - "Requirement already satisfied: h5py in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Keras>=2.2.4->seqeval->farm-haystack[colab]) (2.10.0)\n", - "Requirement already satisfied: isodate>=0.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.1)\n", - "Requirement already satisfied: requests-oauthlib>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.3.1)\n", - "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->transformers==4.20.1->farm-haystack[colab]) (2.4.7)\n", - "Requirement already satisfied: python-editor>=0.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) 
(1.0.4)\n", - "Requirement already satisfied: Mako in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.1.3)\n", - "Requirement already satisfied: itsdangerous>=0.24 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (1.1.0)\n", - "Requirement already satisfied: Werkzeug>=0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (0.16.1)\n", - "Requirement already satisfied: Jinja2>=2.10.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (2.11.2)\n", - "Requirement already satisfied: docopt>=0.6.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from num2words->quantulum3->farm-haystack[colab]) (0.6.2)\n", - "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from torchvision->sentence-transformers>=2.2.0->farm-haystack[colab]) (7.2.0)\n", - "Requirement already satisfied: smmap<4,>=3.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow->farm-haystack[colab]) (3.0.4)\n", - "Requirement already satisfied: MarkupSafe>=0.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Jinja2>=2.10.1->Flask->mlflow->farm-haystack[colab]) (1.1.1)\n", - "Requirement already satisfied: oauthlib>=3.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (3.2.0)\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "%%bash\n", "\n", @@ -171,35 +55,18 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the DocumentStore\n", "\n", - "A DocumentStore stores the documents that the question answering system uses to find answers to your 
questions. Here we are using the `InMemoryDocumentStore` since it requires no external dependencies. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore`, which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { "cell_type": "code", - "execution_count": 2, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/deepset/anaconda3/lib/python3.8/site-packages/tqdm/std.py:668: FutureWarning: The Panel class is removed from pandas.
Accessing it from the top-level namespace will also be removed in the next version\n", - " from pandas import Panel\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from haystack.document_stores import InMemoryDocumentStore\n", "\n", @@ -208,11 +75,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Preparing Documents\n", "\n", @@ -221,24 +84,9 @@ }, { "cell_type": "code", - "execution_count": 3, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "data": { - "text/plain": [ - "False" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from haystack.utils import fetch_archive_from_http\n", "\n", @@ -252,7718 +100,16 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. Use the `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore." 
] }, { "cell_type": "code", - "execution_count": 4, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2b6e6115f8d844cbbe0e48ba340f3a4d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b6a3f600560748dfb8fff380b5923359", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "992b78efdac04f0092c4a2d0ded53b97", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b3f18970fc554acb8893b2c77c7bd139", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e3dfdab6c8454eb193f0bfdce059c923", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f12af0e6935b4ed38e8d143c2734446a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8eb3665460a24d4e8e2d63b7bd2cb81c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1459a48334444e1a8dd918fc57cdb02", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba41ec0fc385436aa4f358739a4420a8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ab8f6f3475c74e708018b9017c2757c3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "976bf694055944dea7c74c0e071e2ee5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0ad4629f763e496eb55a9da34f6a8466", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6bfd18eaf55a49d7ad70a33b8dd034ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a68a8754b8194a629c20b73b62d86c64", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7d35361c50294a618d746b6365261164", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "16f3c191057844f9acbfc67377893d64", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "01264e99f990441b9f4a2977e72cb4a4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "220aa15ca0584d9ca38c925f4e4c8653", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a043e684228e474d9ba93068f08a35e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "297b5be864964140828e3c1c51f6c0a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:haystack.nodes.preprocessor.preprocessor:One or more sentence found with word count higher than the split length.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "10856412f6134296a20af0e160276de3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "896b3da25fda4561ac78314ea3f38ee5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - 
"data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "454b53a5fa7941aa8c7c3a1f543ead5a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ac6e314b45194b71908e60212e17afac", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1ebcb622daf84b6f8a8cdd4ba6fba272", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a238138a92ea4b6d8477d53a4e7abd60", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "09d757bf8566447d87a0e70bd29563c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting 
files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e76a2022d2e341b898587d7e4dbd7ab5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7767598fb03c49d6b2e314ebf9aa8257", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "86bfd19f9b4f4e1198e792e980b3fa08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6854db2033654d93b92dc6fffbfb1ae5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "23ee0f4dfcee4d8e954ea4e4f6024b07", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f0c5e8c95e244fa3a6d470ab17494fec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4bd3a2ca03f54826b6b2160d1c7afad3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "69c9d975e49c4eb889ed3330b4a841c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e4ae1f4a77846d0992b37056b0d46bd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b32d554072ec464e9c03c537b2b48d79", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8a4e8395334f4e1493db06d801613047", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "283a54a3ec5342dcbf3c5c18f8affe87", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a6737f32fc954140b9728beade5a02e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "b21ee5704e1f4c62a81f0ea9b534a140", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aa10487a8d6f4c36abeeadb90c442b9e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "74cef7a45e1247b8883bb6662fe47fb6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "effc7fc6d6ef43c4b48df4a6ee02c000", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eaf68d18bf0c4760a4d84ad2b3428c52", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3c88c6c86c1e49038bcb27365e31b989", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "830389d3af704422a25773e2bcc464a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3cee68a49aea478698191da098de68e3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "00571c7037934d1cbe9ba9d597a00e8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"c3f2a7593a60491dbe349b3bb9ce7a62", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0909c2a892b541fd901efe65379c7738", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a5f3dba309cc4a4b95011b2e770020f9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5ac213e0be56489cb57fb127f6b4e60a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cdc162214efc4d48aa56c058187006e7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ae726cf3bba40b7987b25a6f64dd301", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f52e33734bf04efcbaf72ce7555443bd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ee4c60dd41cb4c52b900d1fcd3b4b447", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d939b476bcf84edbb1a8194399148fd3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "19990ed6799744abbd7bc3dbbb73de3c", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6a0d25714cbc4c3e99306beb41f0bbee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9d8219143104fb699d1df18254db7d2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cbfad03ad2b4b04a729a458e5b22abc", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bb69fbbf21064c3da123819aead636fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9e81ab1cbba54d86ab6a744915cefc91", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6578451b85a749679adf0fb6c87fee1e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4db2eefaedbb4f2cbcf4e452f6d3ccf2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "59e82112bd1947628cf7dda7b240e3a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a379ac4739f046e7afa7308e6b9e5193", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "78e12c20df2048c8a998759fca3cbb18", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a92f6fbb90164cb7a3c740261dca85ec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d53a88f055e4c0f9139f3769e44d3f2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d7e9be7b74946bab67a9522d37a709f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d109314969194067a2bc646457fcbec6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f72c246c75224de3b0ea29873ac80cfb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "427dfea24d5848f88d7c0d30a2405fe1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "def0e371ceea46818837d19d31320b7b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d88e56968964103a9de9b6c493ee39a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b0d6b686960640c38849ea369bea97a1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fe0a4653015d431e858f542854ebaca0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba657d4f52b24ba6a9be056160dbd950", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4734370e764b4ff883913681a09b15a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9e8c3a3e4ba94331884e98699bbeecff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "40e127de69874b81b2d4a8056651faa0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "84c4f7b6a0cb42bfbd474524bf552f6e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "be166a6350e04cf998b205fc73ffbd80", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f8aaa4596f62416b9a259d4d2bf9b96b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1c19009bbec4cdb912bfa75e7cc4b19", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c19c94d0831d46b9aed02126a9976f97", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "828720f91e78497b9729d44e6eb120ec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5fda23fb35d64aadb6e972ca0787c2ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "4b750f8ec618499eb62a9102082f10b0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d6ad87d5724d41ac9e9b05d29935062d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cc5ed2f7d31a4fa9982582e51246cf36", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4e7df01f2bdf46feaef9e4554969f6da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b965db4594b64ec5a2c47e484f01bd6a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a2b9d365c7b94bee85c2a8e81d647a41", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4054ae5a111d4c38a86d69341ba1ef17", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c06beaf6a5504d418e2c2a7f66d9c2fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "75502402f7374f8da45db944afc55643", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "f8863e05ac254d77916257d3c819140d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5dca4734c44044ce90232fcdd3d49685", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1f69d47a667479fafe71419dfad3eff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3cf0f1ba829745c8b4aa14a63b9b9d92", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "95880c283d7442758c77ba5874a6a6af", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bff1d5ee9d804134adf4fdfdc440a98f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "905fce61fb084a518bdc81e3f28f179d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "815156c1dfb7463ba5e16c97d477ac3c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9f915e6a26bc4844a9711c29e9d08501", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "ea5fc6f63117456cbd0902223a14ccc3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "045ec8738307493bb762d3a4230d61d2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1bc404e8dc7c4b83be47c436e7ac5feb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2769981bad92453a9b59eecfb74ceeef", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "74468687151a4b4baf58cee7e083c90a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8a152b5c6a2c404ca99773a1d2d3c793", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c21784c6573948e8b0eec85ef9aed335", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "41bcfc8eecd94d189a05c5aeac56600e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b2745bd644be4b63a807a221577a0c1d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"9c6437ba67d2470e89dcd8f51eaff916", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "05153defcac54387a8ce869ab6d3bf81", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e53acf53fe2422daeee1cd81f1d43a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "576ee4fe01a24f10b305468b263a7d2a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e65f631219b24906bfac49f856ed92e7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1e9ac7e1c764dcc8c8dd8beb756241c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bbb6e7ffd7f1434d90558b555b64de5e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cced9e83ea746dd9c5e078238709887", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "523b2f3f88e041b79d5c244595b35533", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "620476d242104125bf92962a4742f830", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cdc0bf772d7242f0af38e2db6243c9da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "60bd5030aeda48b39bd0db1926ab046a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ff2930a01bdd46beb7e5a2f2bbf0ed1c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "12bcee0ed9124e9a95ba1f37cae43a2a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "435a2775da3141dabbcc91ca5af2081c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8fd6e9af6fe3415f952bbcd749d175ee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c08f98435c624bd1bf53867731dea82a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e7684994cab247919988f5a32ac5c60c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3aebc07bd6f74ac0926268004dfeea3c", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ac7af9e4ed844b10b0c0f6f3887faeae", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4c4169c3b9894edca34070ac7270241f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1fcda9244eda4565803163f402e0a0a0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad7816b00c8342b68a5b73ec31cf14c3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c9a9ddb6170b4c8ba92a5afd5739080e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "122c4f3f72cd44b69a6bcd4abd4fb8ab", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cfeefb5b8bde40d391c45394acff0ff2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cafc52829a084666bbd4a736a62aa495", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5fa31b37abe14fe69311f21341107eff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3eb8548c46d54413866b0deed4c09712", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "475315f0cc394626a90ad9bad7a8bbea", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "32c3ee9fc45e4046baf8b61f8dcf0b05", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8cde52f095104854b48949e5b531cf39", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a33f4b72b437420f8f918a73470df6ea", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e76cb1bcc673424296583851df2a9ee4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0e5406d4ba724b74a3aa6c5b3564152d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "857b183610ba427398e9907659fd17a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d77f5a30dd3a4ffa976820562a666294", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9316ae16de14337b9b96ea332c26585", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c4b9b613c31d46b39ad8045290b57972", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "826e2dd310964b0782b1cfcb380bc505", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "efe3829f14a149a9a4cc330c08b64f8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "83195f0da5534769a8c42a28b45819a8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cfbe19c4dae47a2a63d82307fcc9284", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c0c43b7eaf79407fb0ac42a099e804ca", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9b0cfffecb674aa9baefd66ae9e12e8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5e1342341c314641a027e3a57bc19fd1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1533fe7b307641a392c31bc1bcace052", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "14db526b0fb8424ab529381851af7cca", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d32089c88ffb404b8afb70a317e53da8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "43b69f0a15894befbee21ba9ed858344", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "71dbb964c9df483b853d6d91bc56bc17", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eed56a5c51ef4c82bd1333582a66ac04", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9fc7625bcba74de5bf175f36f4226464", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0bb1b6a130da4f898205d8880b546a6e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "10896bb94564424db75cacf397e1e82a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "604ff91fda7645d78b2cc7372468fc05", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e02d370e99441f78cb349aba79fd7cd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1a7c59a2e60d41c9b3c9feb26c411026", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7b1c94e95ebb4e0a853c1d399bb4e968", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "36a14a01aa2448cc8ccabfc48eff5aec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0bbeb226d4cf40cfb042cf5e1db484e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fc1d2b3093ad4e6ea453868a22b62571", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0f4063209e4348dbaa809afb36cb559e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "860e2dedd4b147b393648446c455f12e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9dec4f04e42c4f7a88b6f45e92d9ec22", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0d779476d8b547b59db79e4ea4fdf33c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a92f7810c9cb4820875917a4b62a6a5d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "02236d20a9404be5bc1a3af08b351c03", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"11b1bd3ea3fc4f55bd643a1848a6431b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f53e29a3fc8a4fd4bf787db4456727b0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ae21a7791dc41e6b9871c9d80c6b986", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b40ca8d7ca24453c97501baebfffc42b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9eb936ada95e4d0f9647bb3e40a07d89", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "97b2fbcd5841478cb0a8682a71600916", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bd6e3bd773b24b6e97c477d7ac522da2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6eb7deff761146059845e699418973d5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "989f4959e86b4db28fbab80710fd7f4e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1498f58321e45a0a701b3be91063435", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bc877055e5564bb5920e4d8ac0e93959", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5c452ab0838043ac89609a3461d2a697", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "25594d1e7f214cd5b85fd08f528b3af9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5c73160c631c4704a88e558ee126e087", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8f5bb0d78dc2466ca2bf95c5afdd57a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6417220f23c6480facc7225f4a985991", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f826270063824b76ae84ab6187e7db15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6f9adf7ef2d1491880b208b5527f78ed", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "90ed90fcb1f84a30800d0d49c8288ce1", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "884a9ffc47fb4b4bac3e4f5c7948c3df", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7b573f393c82491da2fb8b391ca6db71", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9784ae34d4754cb08cc9e2e381bd7899", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3663db3ad20c48b8827818eff09235fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "07de5589c5b7465d8565d66953abc462", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1ee6ab81a92843719085e09910634778", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c3f64fc979b94a5a8c2991faf0762988", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7cb3f4ba8f91433fb98741aa2da305f2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7ac9623a6e0d4c47b838e5a238620838", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "118b88a9e0be42758e3060eccf01c4cf", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "45ad910aee594f68b044993a74061265", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "38d4d783a83942d6abeb87de83a0b9e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "efd2bc4d56da46a1961dd2f0e42e1474", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f9addc6cbbce4fda84769115af9f1b04", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "877257b7a9a8450cb693e037ee48a677", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5572f58b8ed843b6a13c959000a8221c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bef459086dc146708af4a63d20af706c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "562a868eb9bd434aa9ac5b6869e90b5e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "407c4f0fdcd144b282af537bd9cd8df0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9f4cbd2f9a72402a84a1deac7a420d15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "37d0b08a8e91426d805f00470a1b95a1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2faf03e7ddf14e3f945799a39e28cecb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "6d4eca8ebb0148b5bc6d168108c973fd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5b4ff602c2a146b9a89807070c5815c2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fab6f26bcf3044c2972e8c4544de33e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d73d2e43efba4caa93a794c3d6f28d69", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ddad575ba24b4608b8ded0c1761987e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aa28cc110e144a18aeb207c3aa99e32b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "62852ebc55774c80bf22bb2702a137f4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "556870c67137403fba9237d2ca8b5cc3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c9edc88ea6aa4a22be276a7b63e15de8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "016be437e82c49c9a689c41fe63b9b49", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "102a6d310ba745a2b369f5bfa2e37f19", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fdb8e07ce5fd4c07b20150f914a32414", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d3476a3744424ab891305bbf2dc4aaa4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a4b065a1c5914fa496ff2a9c70bb2f1a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c285b1f410684e9b8d7881f12417908c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ec653078756e464ca7f82b8a41d80431", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "640f3f17956546a88f1bf924d08a7009", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6c6beef932f84670ae4887876de6f6b3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "11e59dd7e8d04487b3de6525c4c678da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ce6c709c78634562b70093499b1a32b7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5513d3fd992f4b599431e1f19b2d3751", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cb270ada221f4b7c82da8ac482f80f1b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5540a17b0f08419b83b7742ca5734a40", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ccdcf08f5ffb4f048d584155c9208c46", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2a96a404739c48ea920aeab664f1dfb4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e1f8921a334c41959ee7d20fab0022a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "307ff91602964206886e973a672a9525", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"bc903e9f1d464e4090269ca871b01588", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "79e9e049bb974693a00bc9c4bb6993dd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "96f1d42287254b46a66f09769e6244b5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9c45fa2ff6ba41bdbb922ff65ccf31b2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6901f44a746e4371b62edafd38839c8c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba40172218f042ca832016bf4b05c85f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "db7e001bae74412d8b8f397b89c4013f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d188bca23d24bf69376b1bf3a720bb7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e09010af0f7743219f86cca5c47eb203", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d0b95ec9ef204158812cd9a708deabdd", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8dda73fc405947f1b97e355257e0acd1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d63098eae074310afa03c54a2f60786", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d017d045c4a24d18b123fe36c2a132b7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b3e9db2c617e4ceaacc4eac6a43aa259", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d774881459a405eaa826234afc93c72", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f728d8d45659428288b0a814e6ad6e6d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ba0797e0ae642b2a7a91b60a549d76e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f483f71a536c4c7b90882ef33ef4ba02", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2d2f098ff9684140a0a8e6c8f3880781", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad9134cb38d1439092a1575c6b6788ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d2bb2a37b70243ffb2991d0d421d973d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f4d67c570bbf4bb7ac4d9e0fb3097e0b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1f7cff9dbb7b47fd91286eba5ededfdc", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c7e1aab698364ac0996c3de4a4616c85", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6c99a4ba9d6c4c45a378cc2967623689", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b194fb0823b447d6870b0e51989ee0d0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1c19e2ff585541c1b43239eef6c60f69", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4ced3e9057874463ac1441459af1fce0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eba97bfb81bf4182bf8190b2740d553b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d03848eb827746518b1ab993d9aec266", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "194381a561f6473c848fa7e85f6ad1e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c2e82e90e41e4a788d5bfd75f85d3fae", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1eb6e5fa87445a39af3b64db5aa6429", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "41a9d7f532924c3eab56ed0eccbaabd5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cfd69993799c43c6a8f2273be1845f44", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a63ab0236263409bb7746b5829585d14", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6e3e5b0ee62a46beb6002fcbde48fec5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ff2db7d763444c1f910ba43b3286e819", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5585717fa99843da88413e61d4443531", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3e02c58d21bd4c5486a1c510dbfff2ee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eb5605b26c1e43c291d7e5d5bc26d3d3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "59e87bcb2964422595246d42e245676a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a326ca6fb16b4824939c615db63b733f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "46fd9391490e4df3bc8437aa2baac07b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3068f59ca00c4672b382d6bd06e8a0c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f4dac192c1df4bd3a180690e7c3a50a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1fe9caa7f10a42639226ac10185a3e98", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "60851936ffa84085a0a11e6b1f600b62", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8fa9753c9f3a4c88bc826c6f186083e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d198e81c9dc4db988e985c3aa8df7f9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "c6c14540645a48ef97f5da3b33e9378b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6d5e60e36d174930a7a47770fb5d8534", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "91ceaf03b3294e9e8d9b5ff384f0514b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4ead1ac1d9d54b3e8e307adbec97c397", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad2fcca881b2455990b891c9519ebca1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d2898f46eec54150af3b59db4cf99807", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "299cd1e70c7e46ffb41e42e445e12126", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9487ddc4851b4f4a8e92218fdb5799a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "11895923be6e4b5cbc44cc0fa40a2599", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "726e8a3380e3455483c9527d59a46da9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d7a93c2f2b3f4361b6bf0eec5a054aa8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a253cae7f6194d2ca202177f70137031", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3868383dedfc43ba9591abc0cee848ae", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8f82bf14ce3d40e2beb106bc597dcbaf", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9eb7cbc188234f53bfbf6f7c07ce7567", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5e8308b29308407c8c2a8f3eeec15d5c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9b64f5b8a239425fadd393a1cd061e3d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c49d4ac1232441adb0b9ddd8939c549c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"46a200b12dc247d4899691c6ec9bdfce", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "49cbfcdb4be64ca6a1da0f13e88adfc7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7bcd972a904b4cbf9883fa4300c7261a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5ebf88e02a324d50b23beadd0937a893", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bf801dda4d50433796322a378ce46d16", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9b46a132ef5f41ba991b46e5f5be4a5c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a3946263b7d84f5fb01ff1cbffa52401", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0ee8c5ad2e8a41e7a077be6717ed1e63", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7bb79be0932c4deea6a29cf2ec22ddbb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9f22271c6584acca85b5f2b9c3b221c", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "64dcc2b0ac7541b2bdd08624cf2f3c26", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3f4a3d8f08c34161b4f9a0229058df40", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d67c29a9fbb46a18781d2ddcf14acec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6b1d75e78e9f4423b3dee2908410fcf9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2fc1e2d39b414b8aad28c040c79320c9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ede5db7dba4d413f9d698bf7bb26751f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "57d5acf4718a4bfab85e920b9bc4e793", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "05786ddf80fb4b0fae51239f0904b708", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "476b27b6911f4411aa8aa46c43efc69f", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a4969352fc8848b8b879da35b4c5c352", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "470014e1c35d49bfbb9c9767bd47b7e6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "16f31c4e5f95486eb133401b4af1d99a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8bf70dc246464169977bb4a1cbe93919", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "12f4affc63ad41cf88dcd8bd2f8b1d5b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b5ae2d460c10421a8b0a55e796924364", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "553580e0b36040b2bae947a13489ddd6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b0c42b8b54b4496c95f7603ada33d179", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7986968b3bd2443f90a94cd3c176e84f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7ef20f48d5194d6189ff60e4f9e71b15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bfeee8455a194a7992ae0897e15b83f7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "14ecae5d228b42a5b20df0463d66a641", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b1d341e2521649699103a4d06f384f25", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "900b0de941b54459a997e4b186d5e1d9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e07846ca662548b28e7b5d3ddec23b08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d65114f95d8c4da19be3867a2017d91e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0a327239827f462389c03c72ea4ac5ad", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "91c709da9c8c4bf48264ee73dc4f4da4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "110f021d4b7546c7a67257bbb76da6c2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6694a2450e8b4da48187819da7c11c6a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3d63dd1b02ec411abf24764cdf6f8c81", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5f1e523dcf2e4c27a1b99ae6ab882640", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "45da6d2e38f84b259ad760267029f10e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6dc70490c21441a0967527b28b5786a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c108984b1d9b48238255cbc6c35e990b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f106336973bd4f55af4c1e5457e686e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import os\n", "# from haystack.pipeline import TextIndexingPipeline\n", @@ -7971,17 +117,13 @@ "\n", "files_to_index = [doc_dir + \"/\" + f for f in os.listdir(doc_dir)]\n", 
"indexing_pipeline = TextIndexingPipeline(document_store)\n", - "indexing_pipeline.run_batch(files_to_index)\n", + "indexing_pipeline.run_batch(file_paths=files_to_index)\n", "\n" ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way.\n", "\n", @@ -7990,25 +132,17 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the Retriever\n", "\n", - "Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "Retrievers sift through all the Documents and return only those that they consider relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)."
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.nodes import TFIDFRetriever\n", @@ -8018,11 +152,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the Reader\n", "\n", @@ -8032,11 +162,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.nodes import FARMReader\n", @@ -8046,11 +172,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Creating the Retriever-Reader Pipeline\n", "\n", @@ -8060,11 +182,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.pipelines import ExtractiveQAPipeline\n", @@ -8074,11 +192,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Asking a Question\n", "\n", @@ -8088,11 +202,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "prediction = pipe.run(\n", @@ -8106,11 +216,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "Here are some questions you could try out:\n", "- Who is the father of Arya Stark?\n", @@ -8120,11 +226,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. 
The answers returned by the pipeline can be printed out directly:" ] @@ -8132,11 +234,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", @@ -8146,11 +244,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "3. Simplify the printed answers:" ] @@ -8158,11 +252,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.utils import print_answers\n", @@ -8175,22 +265,14 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "And there you have it! Congratulations on building your first machine learning based question answering system!" ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## About us\n", "\n", @@ -8208,24 +290,6 @@ "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { diff --git a/new_tutorials/02_qa_pipeline.ipynb b/new_tutorials/02_qa_pipeline.ipynb index a37b9f4c..8c6ee285 100644 --- a/new_tutorials/02_qa_pipeline.ipynb +++ b/new_tutorials/02_qa_pipeline.ipynb @@ -2,11 +2,7 @@ "cells": [ { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "# The Building Blocks of a Scalable Question Answering System\n", "\n", @@ -26,11 +22,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": 
[ "\n", "## Preparing the Colab Environment\n", @@ -42,11 +34,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Installing Haystack\n", "\n", @@ -55,113 +43,9 @@ }, { "cell_type": "code", - "execution_count": 1, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: pip in /Users/deepset/anaconda3/lib/python3.8/site-packages (22.3)\n", - "Requirement already satisfied: farm-haystack[colab] in /Users/deepset/Code/haystack (1.6.1rc0)\n", - "Requirement already satisfied: torch<1.13,>1.9 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.11.0)\n", - "Requirement already satisfied: requests in /Users/deepset/.local/lib/python3.8/site-packages (from farm-haystack[colab]) (2.28.1)\n", - "Requirement already satisfied: pydantic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.6.1)\n", - "Requirement already satisfied: transformers==4.20.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.20.1)\n", - "Requirement already satisfied: nltk in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.5)\n", - "Requirement already satisfied: pandas in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.5)\n", - "Requirement already satisfied: dill in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.3.2)\n", - "Requirement already satisfied: tqdm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (4.47.0)\n", - "Requirement already satisfied: networkx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.4)\n", - "Requirement already satisfied: mmh3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
farm-haystack[colab]) (3.0.0)\n", - "Requirement already satisfied: quantulum3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.10)\n", - "Requirement already satisfied: posthog in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.4.5)\n", - "Requirement already satisfied: azure-ai-formrecognizer==3.2.0b2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0b2)\n", - "Requirement already satisfied: azure-core<1.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.22.1)\n", - "Requirement already satisfied: huggingface-hub<0.8.0,>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.7.0)\n", - "Requirement already satisfied: more_itertools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (8.4.0)\n", - "Requirement already satisfied: python-docx in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.8.10)\n", - "Requirement already satisfied: langdetect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.8)\n", - "Requirement already satisfied: tika in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.24)\n", - "Requirement already satisfied: sentence-transformers>=2.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.2.0)\n", - "Requirement already satisfied: scipy>=1.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.5.0)\n", - "Requirement already satisfied: scikit-learn>=1.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.2)\n", - "Requirement already satisfied: seqeval in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (0.0.12)\n", - "Requirement already satisfied: mlflow in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.0.0)\n", - "Requirement already satisfied: elasticsearch<7.11,>=7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (7.8.1)\n", - "Requirement already satisfied: elastic-apm in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (5.8.1)\n", - "Requirement already satisfied: rapidfuzz<3,>=2.0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (2.5.0)\n", - "Requirement already satisfied: jsonschema in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (3.2.0)\n", - "Requirement already satisfied: grpcio==1.43.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from farm-haystack[colab]) (1.43.0)\n", - "Requirement already satisfied: msrest>=0.6.21 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.21)\n", - "Requirement already satisfied: six>=1.11.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.15.0)\n", - "Requirement already satisfied: azure-common~=1.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.1.28)\n", - "Requirement already satisfied: filelock in /Users/deepset/.local/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (3.8.0)\n", - "Requirement already satisfied: numpy>=1.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (1.18.5)\n", - "Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (0.12.1)\n", - "Requirement already satisfied: packaging>=20.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
transformers==4.20.1->farm-haystack[colab]) (21.3)\n", - "Requirement already satisfied: pyyaml>=5.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (5.3.1)\n", - "Requirement already satisfied: regex!=2019.12.17 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from transformers==4.20.1->farm-haystack[colab]) (2020.6.8)\n", - "Requirement already satisfied: urllib3>=1.21.1 in /Users/deepset/.local/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (1.26.11)\n", - "Requirement already satisfied: certifi in /Users/deepset/anaconda3/lib/python3.8/site-packages (from elasticsearch<7.11,>=7.7->farm-haystack[colab]) (2020.6.20)\n", - "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from huggingface-hub<0.8.0,>=0.5.0->farm-haystack[colab]) (4.1.1)\n", - "Requirement already satisfied: jarowinkler<2.0.0,>=1.2.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from rapidfuzz<3,>=2.0.15->farm-haystack[colab]) (1.2.1)\n", - "Requirement already satisfied: idna<4,>=2.5 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (3.3)\n", - "Requirement already satisfied: charset-normalizer<3,>=2 in /Users/deepset/.local/lib/python3.8/site-packages (from requests->farm-haystack[colab]) (2.1.0)\n", - "Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (2.1.0)\n", - "Requirement already satisfied: joblib>=0.11 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from scikit-learn>=1.0.0->farm-haystack[colab]) (0.16.0)\n", - "Requirement already satisfied: sentencepiece in /Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.1.91)\n", - "Requirement already satisfied: torchvision in 
/Users/deepset/anaconda3/lib/python3.8/site-packages (from sentence-transformers>=2.2.0->farm-haystack[colab]) (0.12.0)\n", - "Requirement already satisfied: setuptools in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (49.2.0.post20200714)\n", - "Requirement already satisfied: pyrsistent>=0.14.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (0.16.0)\n", - "Requirement already satisfied: attrs>=17.4.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from jsonschema->farm-haystack[colab]) (19.3.0)\n", - "Requirement already satisfied: python-dateutil in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (2.8.1)\n", - "Requirement already satisfied: databricks-cli>=0.8.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.11.0)\n", - "Requirement already satisfied: sqlalchemy in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.3.18)\n", - "Requirement already satisfied: cloudpickle in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.5.0)\n", - "Requirement already satisfied: docker>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (4.3.0)\n", - "Requirement already satisfied: Flask in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.1.2)\n", - "Requirement already satisfied: entrypoints in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3)\n", - "Requirement already satisfied: protobuf>=3.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.12.4)\n", - "Requirement already satisfied: querystring-parser in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.2.4)\n", - "Requirement 
already satisfied: gitpython>=2.1.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.1.7)\n", - "Requirement already satisfied: sqlparse in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (0.3.1)\n", - "Requirement already satisfied: alembic in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (1.4.2)\n", - "Requirement already satisfied: click>=7.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (7.1.2)\n", - "Requirement already satisfied: simplejson in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (3.17.2)\n", - "Requirement already satisfied: gunicorn in /Users/deepset/anaconda3/lib/python3.8/site-packages (from mlflow->farm-haystack[colab]) (20.0.4)\n", - "Requirement already satisfied: decorator>=4.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from networkx->farm-haystack[colab]) (4.4.2)\n", - "Requirement already satisfied: pytz>=2017.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from pandas->farm-haystack[colab]) (2020.1)\n", - "Requirement already satisfied: backoff<2.0.0,>=1.10.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.11.1)\n", - "Requirement already satisfied: monotonic>=1.5 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from posthog->farm-haystack[colab]) (1.6)\n", - "Requirement already satisfied: lxml>=2.3.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from python-docx->farm-haystack[colab]) (4.5.2)\n", - "Requirement already satisfied: inflect in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (5.4.0)\n", - "Requirement already satisfied: num2words in /Users/deepset/anaconda3/lib/python3.8/site-packages (from quantulum3->farm-haystack[colab]) (0.5.10)\n", - "Requirement already satisfied: 
Keras>=2.2.4 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from seqeval->farm-haystack[colab]) (2.4.3)\n", - "Requirement already satisfied: tabulate>=0.7.7 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from databricks-cli>=0.8.0->mlflow->farm-haystack[colab]) (0.8.7)\n", - "Requirement already satisfied: websocket-client>=0.32.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from docker>=3.6.0->mlflow->farm-haystack[colab]) (0.57.0)\n", - "Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitpython>=2.1.0->mlflow->farm-haystack[colab]) (4.0.5)\n", - "Requirement already satisfied: h5py in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Keras>=2.2.4->seqeval->farm-haystack[colab]) (2.10.0)\n", - "Requirement already satisfied: isodate>=0.6.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (0.6.1)\n", - "Requirement already satisfied: requests-oauthlib>=0.5.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (1.3.1)\n", - "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->transformers==4.20.1->farm-haystack[colab]) (2.4.7)\n", - "Requirement already satisfied: python-editor>=0.3 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.0.4)\n", - "Requirement already satisfied: Mako in /Users/deepset/anaconda3/lib/python3.8/site-packages (from alembic->mlflow->farm-haystack[colab]) (1.1.3)\n", - "Requirement already satisfied: itsdangerous>=0.24 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (1.1.0)\n", - "Requirement already satisfied: Werkzeug>=0.15 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from 
Flask->mlflow->farm-haystack[colab]) (0.16.1)\n", - "Requirement already satisfied: Jinja2>=2.10.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Flask->mlflow->farm-haystack[colab]) (2.11.2)\n", - "Requirement already satisfied: docopt>=0.6.2 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from num2words->quantulum3->farm-haystack[colab]) (0.6.2)\n", - "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from torchvision->sentence-transformers>=2.2.0->farm-haystack[colab]) (7.2.0)\n", - "Requirement already satisfied: smmap<4,>=3.0.1 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow->farm-haystack[colab]) (3.0.4)\n", - "Requirement already satisfied: MarkupSafe>=0.23 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from Jinja2>=2.10.1->Flask->mlflow->farm-haystack[colab]) (1.1.1)\n", - "Requirement already satisfied: oauthlib>=3.0.0 in /Users/deepset/anaconda3/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-ai-formrecognizer==3.2.0b2->farm-haystack[colab]) (3.2.0)\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "%%bash\n", "\n", @@ -171,24 +55,16 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the ElasticsearchDocumentStore\n", "\n", - "A DocumentStore stores the documents that the question answering system uses to find answers to your questions. Here we are using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs indepedently from Haystack and persists even after the Haystack program has finished running. 
To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `ElasticsearchDocumentStore`, which connects to a running Elasticsearch service, a fast and scalable text-focused storage option. This service runs independently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "1. Download, extract and set the permissions for the Elasticsearch installation image." ] @@ -196,11 +72,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "%%bash\n", @@ -212,23 +84,15 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. Start the server." ] }, { "cell_type": "code", - "execution_count": 2, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "execution_count": null, + "metadata": {}, "outputs": [], "source": [ "%%bash --bg\n", @@ -238,34 +102,22 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done manually, or using our `launch_es()` utility function." + "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker.
This can be done manually, or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function." ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "3. Wait 30s to ensure that the server has fully started up." ] }, { "cell_type": "code", - "execution_count": 3, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "execution_count": null, + "metadata": {}, "outputs": [], "source": [ "import time\n", @@ -274,33 +126,16 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ - "4. Initialize the `ElasticsearchDocumentStore`.\n" + "4. Initialize the [`ElasticsearchDocumentStore`](https://docs.haystack.deepset.ai/reference/document-store-api#module-elasticsearch).\n" ] }, { "cell_type": "code", - "execution_count": 2, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/deepset/anaconda3/lib/python3.8/site-packages/tqdm/std.py:668: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version\n", - " from pandas import Panel\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import os\n", "from haystack.document_stores import ElasticsearchDocumentStore\n", @@ -318,11 +153,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Indexing Documents with a Pipeline\n", "\n", @@ -333,35 +164,16 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." 
] }, { "cell_type": "code", - "execution_count": 3, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "data": { - "text/plain": [ - "False" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from haystack.utils import fetch_archive_from_http\n", "\n", @@ -375,7718 +187,16 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. Initialize the pipeline, TextConverter and PreProcessor." ] }, { "cell_type": "code", - "execution_count": 4, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2b6e6115f8d844cbbe0e48ba340f3a4d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b6a3f600560748dfb8fff380b5923359", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "992b78efdac04f0092c4a2d0ded53b97", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b3f18970fc554acb8893b2c77c7bd139", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e3dfdab6c8454eb193f0bfdce059c923", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f12af0e6935b4ed38e8d143c2734446a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8eb3665460a24d4e8e2d63b7bd2cb81c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1459a48334444e1a8dd918fc57cdb02", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba41ec0fc385436aa4f358739a4420a8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ab8f6f3475c74e708018b9017c2757c3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "976bf694055944dea7c74c0e071e2ee5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0ad4629f763e496eb55a9da34f6a8466", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6bfd18eaf55a49d7ad70a33b8dd034ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a68a8754b8194a629c20b73b62d86c64", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7d35361c50294a618d746b6365261164", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "16f3c191057844f9acbfc67377893d64", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "01264e99f990441b9f4a2977e72cb4a4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "220aa15ca0584d9ca38c925f4e4c8653", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a043e684228e474d9ba93068f08a35e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "297b5be864964140828e3c1c51f6c0a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:haystack.nodes.preprocessor.preprocessor:One or more sentence found with word count higher than the split length.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "10856412f6134296a20af0e160276de3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, 
style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "896b3da25fda4561ac78314ea3f38ee5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "454b53a5fa7941aa8c7c3a1f543ead5a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ac6e314b45194b71908e60212e17afac", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1ebcb622daf84b6f8a8cdd4ba6fba272", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "a238138a92ea4b6d8477d53a4e7abd60", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "09d757bf8566447d87a0e70bd29563c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e76a2022d2e341b898587d7e4dbd7ab5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7767598fb03c49d6b2e314ebf9aa8257", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "86bfd19f9b4f4e1198e792e980b3fa08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6854db2033654d93b92dc6fffbfb1ae5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "23ee0f4dfcee4d8e954ea4e4f6024b07", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f0c5e8c95e244fa3a6d470ab17494fec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4bd3a2ca03f54826b6b2160d1c7afad3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"69c9d975e49c4eb889ed3330b4a841c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e4ae1f4a77846d0992b37056b0d46bd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b32d554072ec464e9c03c537b2b48d79", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8a4e8395334f4e1493db06d801613047", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "283a54a3ec5342dcbf3c5c18f8affe87", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a6737f32fc954140b9728beade5a02e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b21ee5704e1f4c62a81f0ea9b534a140", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aa10487a8d6f4c36abeeadb90c442b9e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "74cef7a45e1247b8883bb6662fe47fb6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "effc7fc6d6ef43c4b48df4a6ee02c000", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eaf68d18bf0c4760a4d84ad2b3428c52", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3c88c6c86c1e49038bcb27365e31b989", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "830389d3af704422a25773e2bcc464a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3cee68a49aea478698191da098de68e3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "00571c7037934d1cbe9ba9d597a00e8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c3f2a7593a60491dbe349b3bb9ce7a62", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0909c2a892b541fd901efe65379c7738", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a5f3dba309cc4a4b95011b2e770020f9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5ac213e0be56489cb57fb127f6b4e60a", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cdc162214efc4d48aa56c058187006e7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ae726cf3bba40b7987b25a6f64dd301", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f52e33734bf04efcbaf72ce7555443bd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ee4c60dd41cb4c52b900d1fcd3b4b447", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d939b476bcf84edbb1a8194399148fd3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "19990ed6799744abbd7bc3dbbb73de3c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6a0d25714cbc4c3e99306beb41f0bbee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9d8219143104fb699d1df18254db7d2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cbfad03ad2b4b04a729a458e5b22abc", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bb69fbbf21064c3da123819aead636fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9e81ab1cbba54d86ab6a744915cefc91", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6578451b85a749679adf0fb6c87fee1e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4db2eefaedbb4f2cbcf4e452f6d3ccf2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "59e82112bd1947628cf7dda7b240e3a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a379ac4739f046e7afa7308e6b9e5193", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "78e12c20df2048c8a998759fca3cbb18", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a92f6fbb90164cb7a3c740261dca85ec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d53a88f055e4c0f9139f3769e44d3f2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d7e9be7b74946bab67a9522d37a709f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d109314969194067a2bc646457fcbec6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f72c246c75224de3b0ea29873ac80cfb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "427dfea24d5848f88d7c0d30a2405fe1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "def0e371ceea46818837d19d31320b7b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d88e56968964103a9de9b6c493ee39a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b0d6b686960640c38849ea369bea97a1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fe0a4653015d431e858f542854ebaca0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba657d4f52b24ba6a9be056160dbd950", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4734370e764b4ff883913681a09b15a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9e8c3a3e4ba94331884e98699bbeecff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "40e127de69874b81b2d4a8056651faa0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "84c4f7b6a0cb42bfbd474524bf552f6e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "be166a6350e04cf998b205fc73ffbd80", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f8aaa4596f62416b9a259d4d2bf9b96b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1c19009bbec4cdb912bfa75e7cc4b19", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c19c94d0831d46b9aed02126a9976f97", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "828720f91e78497b9729d44e6eb120ec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5fda23fb35d64aadb6e972ca0787c2ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4b750f8ec618499eb62a9102082f10b0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d6ad87d5724d41ac9e9b05d29935062d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cc5ed2f7d31a4fa9982582e51246cf36", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"4e7df01f2bdf46feaef9e4554969f6da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b965db4594b64ec5a2c47e484f01bd6a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a2b9d365c7b94bee85c2a8e81d647a41", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4054ae5a111d4c38a86d69341ba1ef17", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c06beaf6a5504d418e2c2a7f66d9c2fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "75502402f7374f8da45db944afc55643", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f8863e05ac254d77916257d3c819140d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5dca4734c44044ce90232fcdd3d49685", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1f69d47a667479fafe71419dfad3eff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3cf0f1ba829745c8b4aa14a63b9b9d92", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "95880c283d7442758c77ba5874a6a6af", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bff1d5ee9d804134adf4fdfdc440a98f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "905fce61fb084a518bdc81e3f28f179d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "815156c1dfb7463ba5e16c97d477ac3c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9f915e6a26bc4844a9711c29e9d08501", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ea5fc6f63117456cbd0902223a14ccc3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "045ec8738307493bb762d3a4230d61d2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1bc404e8dc7c4b83be47c436e7ac5feb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2769981bad92453a9b59eecfb74ceeef", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "74468687151a4b4baf58cee7e083c90a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8a152b5c6a2c404ca99773a1d2d3c793", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c21784c6573948e8b0eec85ef9aed335", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "41bcfc8eecd94d189a05c5aeac56600e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b2745bd644be4b63a807a221577a0c1d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9c6437ba67d2470e89dcd8f51eaff916", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "05153defcac54387a8ce869ab6d3bf81", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e53acf53fe2422daeee1cd81f1d43a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "576ee4fe01a24f10b305468b263a7d2a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e65f631219b24906bfac49f856ed92e7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f1e9ac7e1c764dcc8c8dd8beb756241c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bbb6e7ffd7f1434d90558b555b64de5e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cced9e83ea746dd9c5e078238709887", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "523b2f3f88e041b79d5c244595b35533", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "620476d242104125bf92962a4742f830", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cdc0bf772d7242f0af38e2db6243c9da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "60bd5030aeda48b39bd0db1926ab046a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ff2930a01bdd46beb7e5a2f2bbf0ed1c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "12bcee0ed9124e9a95ba1f37cae43a2a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "435a2775da3141dabbcc91ca5af2081c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8fd6e9af6fe3415f952bbcd749d175ee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c08f98435c624bd1bf53867731dea82a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "e7684994cab247919988f5a32ac5c60c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3aebc07bd6f74ac0926268004dfeea3c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ac7af9e4ed844b10b0c0f6f3887faeae", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4c4169c3b9894edca34070ac7270241f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1fcda9244eda4565803163f402e0a0a0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad7816b00c8342b68a5b73ec31cf14c3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c9a9ddb6170b4c8ba92a5afd5739080e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "122c4f3f72cd44b69a6bcd4abd4fb8ab", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cfeefb5b8bde40d391c45394acff0ff2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "cafc52829a084666bbd4a736a62aa495", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5fa31b37abe14fe69311f21341107eff", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3eb8548c46d54413866b0deed4c09712", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "475315f0cc394626a90ad9bad7a8bbea", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "32c3ee9fc45e4046baf8b61f8dcf0b05", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8cde52f095104854b48949e5b531cf39", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a33f4b72b437420f8f918a73470df6ea", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e76cb1bcc673424296583851df2a9ee4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0e5406d4ba724b74a3aa6c5b3564152d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "857b183610ba427398e9907659fd17a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d77f5a30dd3a4ffa976820562a666294", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9316ae16de14337b9b96ea332c26585", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c4b9b613c31d46b39ad8045290b57972", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "826e2dd310964b0782b1cfcb380bc505", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "efe3829f14a149a9a4cc330c08b64f8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "83195f0da5534769a8c42a28b45819a8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5cfbe19c4dae47a2a63d82307fcc9284", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c0c43b7eaf79407fb0ac42a099e804ca", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"9b0cfffecb674aa9baefd66ae9e12e8a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5e1342341c314641a027e3a57bc19fd1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1533fe7b307641a392c31bc1bcace052", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "14db526b0fb8424ab529381851af7cca", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d32089c88ffb404b8afb70a317e53da8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "43b69f0a15894befbee21ba9ed858344", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "71dbb964c9df483b853d6d91bc56bc17", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eed56a5c51ef4c82bd1333582a66ac04", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9fc7625bcba74de5bf175f36f4226464", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0bb1b6a130da4f898205d8880b546a6e", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "10896bb94564424db75cacf397e1e82a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "604ff91fda7645d78b2cc7372468fc05", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7e02d370e99441f78cb349aba79fd7cd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1a7c59a2e60d41c9b3c9feb26c411026", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7b1c94e95ebb4e0a853c1d399bb4e968", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "36a14a01aa2448cc8ccabfc48eff5aec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0bbeb226d4cf40cfb042cf5e1db484e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fc1d2b3093ad4e6ea453868a22b62571", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0f4063209e4348dbaa809afb36cb559e", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "860e2dedd4b147b393648446c455f12e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9dec4f04e42c4f7a88b6f45e92d9ec22", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0d779476d8b547b59db79e4ea4fdf33c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a92f7810c9cb4820875917a4b62a6a5d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "02236d20a9404be5bc1a3af08b351c03", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "11b1bd3ea3fc4f55bd643a1848a6431b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f53e29a3fc8a4fd4bf787db4456727b0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ae21a7791dc41e6b9871c9d80c6b986", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b40ca8d7ca24453c97501baebfffc42b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9eb936ada95e4d0f9647bb3e40a07d89", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "97b2fbcd5841478cb0a8682a71600916", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bd6e3bd773b24b6e97c477d7ac522da2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6eb7deff761146059845e699418973d5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "989f4959e86b4db28fbab80710fd7f4e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1498f58321e45a0a701b3be91063435", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bc877055e5564bb5920e4d8ac0e93959", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5c452ab0838043ac89609a3461d2a697", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "25594d1e7f214cd5b85fd08f528b3af9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5c73160c631c4704a88e558ee126e087", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8f5bb0d78dc2466ca2bf95c5afdd57a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6417220f23c6480facc7225f4a985991", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f826270063824b76ae84ab6187e7db15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "6f9adf7ef2d1491880b208b5527f78ed", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "90ed90fcb1f84a30800d0d49c8288ce1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "884a9ffc47fb4b4bac3e4f5c7948c3df", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7b573f393c82491da2fb8b391ca6db71", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9784ae34d4754cb08cc9e2e381bd7899", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3663db3ad20c48b8827818eff09235fb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "07de5589c5b7465d8565d66953abc462", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1ee6ab81a92843719085e09910634778", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c3f64fc979b94a5a8c2991faf0762988", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "7cb3f4ba8f91433fb98741aa2da305f2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7ac9623a6e0d4c47b838e5a238620838", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "118b88a9e0be42758e3060eccf01c4cf", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "45ad910aee594f68b044993a74061265", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "38d4d783a83942d6abeb87de83a0b9e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "efd2bc4d56da46a1961dd2f0e42e1474", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f9addc6cbbce4fda84769115af9f1b04", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "877257b7a9a8450cb693e037ee48a677", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5572f58b8ed843b6a13c959000a8221c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "bef459086dc146708af4a63d20af706c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "562a868eb9bd434aa9ac5b6869e90b5e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "407c4f0fdcd144b282af537bd9cd8df0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9f4cbd2f9a72402a84a1deac7a420d15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "37d0b08a8e91426d805f00470a1b95a1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2faf03e7ddf14e3f945799a39e28cecb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6d4eca8ebb0148b5bc6d168108c973fd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5b4ff602c2a146b9a89807070c5815c2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fab6f26bcf3044c2972e8c4544de33e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"d73d2e43efba4caa93a794c3d6f28d69", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ddad575ba24b4608b8ded0c1761987e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aa28cc110e144a18aeb207c3aa99e32b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "62852ebc55774c80bf22bb2702a137f4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "556870c67137403fba9237d2ca8b5cc3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c9edc88ea6aa4a22be276a7b63e15de8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "016be437e82c49c9a689c41fe63b9b49", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "102a6d310ba745a2b369f5bfa2e37f19", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "fdb8e07ce5fd4c07b20150f914a32414", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d3476a3744424ab891305bbf2dc4aaa4", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a4b065a1c5914fa496ff2a9c70bb2f1a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c285b1f410684e9b8d7881f12417908c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ec653078756e464ca7f82b8a41d80431", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "640f3f17956546a88f1bf924d08a7009", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6c6beef932f84670ae4887876de6f6b3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "11e59dd7e8d04487b3de6525c4c678da", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ce6c709c78634562b70093499b1a32b7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5513d3fd992f4b599431e1f19b2d3751", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cb270ada221f4b7c82da8ac482f80f1b", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5540a17b0f08419b83b7742ca5734a40", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ccdcf08f5ffb4f048d584155c9208c46", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2a96a404739c48ea920aeab664f1dfb4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e1f8921a334c41959ee7d20fab0022a9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "307ff91602964206886e973a672a9525", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bc903e9f1d464e4090269ca871b01588", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "79e9e049bb974693a00bc9c4bb6993dd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "96f1d42287254b46a66f09769e6244b5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9c45fa2ff6ba41bdbb922ff65ccf31b2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6901f44a746e4371b62edafd38839c8c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ba40172218f042ca832016bf4b05c85f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "db7e001bae74412d8b8f397b89c4013f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d188bca23d24bf69376b1bf3a720bb7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e09010af0f7743219f86cca5c47eb203", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d0b95ec9ef204158812cd9a708deabdd", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8dda73fc405947f1b97e355257e0acd1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d63098eae074310afa03c54a2f60786", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d017d045c4a24d18b123fe36c2a132b7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b3e9db2c617e4ceaacc4eac6a43aa259", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d774881459a405eaa826234afc93c72", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f728d8d45659428288b0a814e6ad6e6d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8ba0797e0ae642b2a7a91b60a549d76e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "f483f71a536c4c7b90882ef33ef4ba02", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2d2f098ff9684140a0a8e6c8f3880781", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad9134cb38d1439092a1575c6b6788ba", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d2bb2a37b70243ffb2991d0d421d973d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f4d67c570bbf4bb7ac4d9e0fb3097e0b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1f7cff9dbb7b47fd91286eba5ededfdc", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c7e1aab698364ac0996c3de4a4616c85", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6c99a4ba9d6c4c45a378cc2967623689", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b194fb0823b447d6870b0e51989ee0d0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "1c19e2ff585541c1b43239eef6c60f69", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4ced3e9057874463ac1441459af1fce0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eba97bfb81bf4182bf8190b2740d553b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d03848eb827746518b1ab993d9aec266", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "194381a561f6473c848fa7e85f6ad1e8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c2e82e90e41e4a788d5bfd75f85d3fae", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a1eb6e5fa87445a39af3b64db5aa6429", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "41a9d7f532924c3eab56ed0eccbaabd5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "cfd69993799c43c6a8f2273be1845f44", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "a63ab0236263409bb7746b5829585d14", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6e3e5b0ee62a46beb6002fcbde48fec5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ff2db7d763444c1f910ba43b3286e819", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5585717fa99843da88413e61d4443531", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3e02c58d21bd4c5486a1c510dbfff2ee", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "eb5605b26c1e43c291d7e5d5bc26d3d3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "59e87bcb2964422595246d42e245676a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a326ca6fb16b4824939c615db63b733f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "46fd9391490e4df3bc8437aa2baac07b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"3068f59ca00c4672b382d6bd06e8a0c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f4dac192c1df4bd3a180690e7c3a50a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1fe9caa7f10a42639226ac10185a3e98", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "60851936ffa84085a0a11e6b1f600b62", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8fa9753c9f3a4c88bc826c6f186083e0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": 
{}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4d198e81c9dc4db988e985c3aa8df7f9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c6c14540645a48ef97f5da3b33e9378b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6d5e60e36d174930a7a47770fb5d8534", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "91ceaf03b3294e9e8d9b5ff384f0514b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4ead1ac1d9d54b3e8e307adbec97c397", - 
"version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ad2fcca881b2455990b891c9519ebca1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d2898f46eec54150af3b59db4cf99807", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "299cd1e70c7e46ffb41e42e445e12126", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9487ddc4851b4f4a8e92218fdb5799a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, 
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "11895923be6e4b5cbc44cc0fa40a2599", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "726e8a3380e3455483c9527d59a46da9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d7a93c2f2b3f4361b6bf0eec5a054aa8", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a253cae7f6194d2ca202177f70137031", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3868383dedfc43ba9591abc0cee848ae", - "version_major": 2, - "version_minor": 0 - }, - 
"text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8f82bf14ce3d40e2beb106bc597dcbaf", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9eb7cbc188234f53bfbf6f7c07ce7567", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5e8308b29308407c8c2a8f3eeec15d5c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9b64f5b8a239425fadd393a1cd061e3d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c49d4ac1232441adb0b9ddd8939c549c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "46a200b12dc247d4899691c6ec9bdfce", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "49cbfcdb4be64ca6a1da0f13e88adfc7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7bcd972a904b4cbf9883fa4300c7261a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5ebf88e02a324d50b23beadd0937a893", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - 
"HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bf801dda4d50433796322a378ce46d16", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9b46a132ef5f41ba991b46e5f5be4a5c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a3946263b7d84f5fb01ff1cbffa52401", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0ee8c5ad2e8a41e7a077be6717ed1e63", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7bb79be0932c4deea6a29cf2ec22ddbb", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d9f22271c6584acca85b5f2b9c3b221c", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "64dcc2b0ac7541b2bdd08624cf2f3c26", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3f4a3d8f08c34161b4f9a0229058df40", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5d67c29a9fbb46a18781d2ddcf14acec", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, 
description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6b1d75e78e9f4423b3dee2908410fcf9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "2fc1e2d39b414b8aad28c040c79320c9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "ede5db7dba4d413f9d698bf7bb26751f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "57d5acf4718a4bfab85e920b9bc4e793", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "05786ddf80fb4b0fae51239f0904b708", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "476b27b6911f4411aa8aa46c43efc69f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a4969352fc8848b8b879da35b4c5c352", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "470014e1c35d49bfbb9c9767bd47b7e6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "16f31c4e5f95486eb133401b4af1d99a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', 
max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "8bf70dc246464169977bb4a1cbe93919", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "12f4affc63ad41cf88dcd8bd2f8b1d5b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b5ae2d460c10421a8b0a55e796924364", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "553580e0b36040b2bae947a13489ddd6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - 
"application/vnd.jupyter.widget-view+json": { - "model_id": "b0c42b8b54b4496c95f7603ada33d179", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7986968b3bd2443f90a94cd3c176e84f", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "7ef20f48d5194d6189ff60e4f9e71b15", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bfeee8455a194a7992ae0897e15b83f7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "14ecae5d228b42a5b20df0463d66a641", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, 
style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "b1d341e2521649699103a4d06f384f25", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "900b0de941b54459a997e4b186d5e1d9", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e07846ca662548b28e7b5d3ddec23b08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d65114f95d8c4da19be3867a2017d91e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - 
"model_id": "0a327239827f462389c03c72ea4ac5ad", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "91c709da9c8c4bf48264ee73dc4f4da4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "110f021d4b7546c7a67257bbb76da6c2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6694a2450e8b4da48187819da7c11c6a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "3d63dd1b02ec411abf24764cdf6f8c81", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "5f1e523dcf2e4c27a1b99ae6ab882640", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "45da6d2e38f84b259ad760267029f10e", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "6dc70490c21441a0967527b28b5786a3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c108984b1d9b48238255cbc6c35e990b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Converting files', max=1.0, style=ProgressStyle(descripti…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"f106336973bd4f55af4c1e5457e686e1", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='Preprocessing', max=1.0, style=ProgressStyle(description_…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from haystack import Pipeline\n", "from haystack.nodes import TextConverter, PreProcessor\n", @@ -8106,22 +216,14 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "To learn more about the parameters of the `PreProcessor`, see [Usage](https://docs.haystack.deepset.ai/docs/preprocessor#usage). To understand why document splitting is important for your question answering system's performance, see [Document Length](https://docs.haystack.deepset.ai/docs/optimization#document-length)." ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. Populate the indexing pipeline with nodes. You should provide the `name` or `name`s of preceding nodes as the `input` argument. Note that in an indexing pipeline, the input to the first node is \"File\"." ] @@ -8129,11 +231,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "import os\n", @@ -8145,11 +243,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "3. Write the text data into the DocumentStore by running the indexing pipeline." 
] @@ -8157,51 +251,35 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "files_to_index = [doc_dir + \"/\" + f for f in os.listdir(doc_dir)]\n", - "indexing_pipeline.run_batch(files_to_index)" + "indexing_pipeline.run_batch(file_paths=files_to_index)" ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way.\n", "\n", - "As an alternative, you can cast you text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`." + "As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using [`DocumentStore.write_documents()`](https://docs.haystack.deepset.ai/reference/document-store-api#basedocumentstorewrite_documents)." ] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the Retriever\n", "\n", - "Retrievers sift through all the Documents and return only those that it thinks might contain the answer to the question. Here we are using the BM25 algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "Retrievers sift through all the Documents and return only those that they think might be relevant to the question. Here we are using the BM25 algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)."
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.nodes import BM25Retriever\n", @@ -8211,11 +289,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Initializing the Reader\n", "\n", @@ -8225,11 +299,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.nodes import FARMReader\n", @@ -8239,11 +309,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Creating the Retriever-Reader Pipeline\n", "\n", @@ -8252,11 +318,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "1. Initialize the `Pipeline` object and add the Retriever and Reader as nodes. You should provide the `name` or `name`s of preceding nodes as the input argument. Note that in a querying pipeline, the input to the first node is \"Query\"." 
] @@ -8264,11 +326,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack import Pipeline\n", @@ -8280,11 +338,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## Asking a Question\n", "\n", @@ -8294,11 +348,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "prediction = querying_pipeline.run(\n", @@ -8312,11 +362,14 @@ }, { "cell_type": "markdown", + "source": [], "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "collapsed": false + } + }, + { + "cell_type": "markdown", + "metadata": {}, "source": [ "Here are some questions you could try out:\n", "- Who is the father of Arya Stark?\n", @@ -8326,11 +379,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "2. The answers returned by the pipeline can be printed out directly:" ] @@ -8338,11 +387,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", @@ -8352,11 +397,7 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "3. Simplify the printed answers:" ] @@ -8364,11 +405,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, + "metadata": {}, "outputs": [], "source": [ "from haystack.utils import print_answers\n", @@ -8381,22 +418,14 @@ }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "And there you have it! Congratulations on building a scalable machine learning based question answering system!" 
] }, { "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, + "metadata": {}, "source": [ "## About us\n", "\n", @@ -8414,28 +443,6 @@ "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [] } ], "metadata": { @@ -8464,4 +471,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} \ No newline at end of file +} From 1ea723e74674be87a14f72e268ca22f93b94cf0c Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 2 Nov 2022 15:47:54 +0100 Subject: [PATCH 12/48] Create finetuning tutorial --- new_tutorials/03_finetune_a_reader.ipynb | 246 +++++++++++++++++++++++ 1 file changed, 246 insertions(+) create mode 100644 new_tutorials/03_finetune_a_reader.ipynb diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb new file mode 100644 index 00000000..b50f685a --- /dev/null +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -0,0 +1,246 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Reader Fine-Tuning\n", + "\n", + "- **Level**: Intermediate\n", + "- **Time to complete**: 20 minutes\n", + "- **Prerequisites**: Prepare the Colab environment (see links below).\n", + "- **Nodes Used**: `FARMReader`\n", + "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning.\n", + "\n", + "Fine-tuning can improve your Reader's performance on question answering, especially if you are working with very specific domains. While many of the existing public models trained on SQuAD or other public question answering datasets should be enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. 
While this varies for each domain and dataset, we have seen cases where ~2000 examples increased performance by 5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## Preparing the Colab Environment\n", + "\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Installing Haystack\n", + "\n", + "To start, let's install the latest release of Haystack with `pip`:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "pip install --upgrade pip\n", + "pip install farm-haystack[colab]" + ] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "## Creating Training Data\n", + "\n", + "To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. Each sample in this data should contain a question, a text answer, and the document in which this answer can be found.\n", + "\n", + "You can start generating your own training data using one of the two tools that we offer:\n", + "\n", + "1. **Annotation Tool**: You can use the deepset [annotation tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users.
The labels can be exported in the SQuAD format that is compatible with fine-tuning in Haystack.\n", + "\n", + "2. **Feedback Mechanism**: In a production system, you can collect user feedback on model predictions via Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback).\n" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "## Fine-tuning the Reader\n", + "\n", + "1. Initialize the Reader, supplying the name of the base model that you wish to improve." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from haystack.nodes import FARMReader\n", + "\n", + "reader = FARMReader(model_name_or_path=\"distilbert-base-uncased-distilled-squad\", use_gpu=True)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this case, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model that was trained on SQuAD. To learn more about which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "2. Provide the SQuAD format training data to the `FARMReader.train()` method."
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "data_dir = \"data/squad20\"\n", + "reader.train(\n", + " data_dir=data_dir,\n", + " train_filename=\"dev-v2.0.json\",\n", + " use_gpu=True,\n", + " n_epochs=1,\n", + " save_dir=\"my_model\"\n", + ")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "With the default parameters above, we are starting with a base model that was trained on the SQuAD training dataset and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, you should replace `train_filename` with your domain specific dataset.\n", + "\n", + "If you are looking to perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Saving and Loading\n", + "\n", + "The model is automatically saved at the end of fine-tuning in the `save_dir` that you specified.\n", + "However, you can also manually save the Reader again by running:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "reader.save(directory=\"my_model\")" + ] + }, + { + "cell_type": "markdown", + "source": [ + "To load a saved model, run:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "new_reader = FARMReader(model_name_or_path=\"my_model\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## About us\n", + "\n", + "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made 
with love by [deepset](https://deepset.ai/) in Berlin, Germany\n", + "\n", + "We bring NLP to the industry via open source! \n", + "Our focus: Industry specific language models & large scale QA systems. \n", + " \n", + "Some of our other work: \n", + "- [German BERT](https://deepset.ai/german-bert)\n", + "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", + "\n", + "Get in touch:\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "\n", + "By the way: [we're hiring!](https://www.deepset.ai/jobs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + } + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.8.9 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.9" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 3dd310d9dca00fe5cbd3b5f0c4f879a36d4061cc Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 2 Nov 2022 17:45:48 +0100 Subject: [PATCH 13/48] Create distillation tutorial --- .../04_distilling_a_reader_model.ipynb | 278 ++++++++++++++++++ 1 file changed, 278 insertions(+) create mode 100644 new_tutorials/04_distilling_a_reader_model.ipynb diff --git a/new_tutorials/04_distilling_a_reader_model.ipynb b/new_tutorials/04_distilling_a_reader_model.ipynb new file mode 100644 index 00000000..0a76394c --- /dev/null +++ 
b/new_tutorials/04_distilling_a_reader_model.ipynb @@ -0,0 +1,278 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distilling a Reader Model\n", + "\n", + "- **Level**: Advanced\n", + "- **Time to complete**: 20 minutes\n", + "- **Prerequisites**: Prepare the Colab environment (see links below).\n", + "- **Nodes Used**: `FARMReader`\n", + "- **Goal**: Distil the question answering capabilities of a larger Reader model into a smaller Reader model.\n", + "\n", + "Model distillation is the process of teaching a smaller model to imitate the behavior of a larger, better-trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research, so if you would like to learn more about it, we recommend looking at [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## Preparing the Colab Environment\n", + "\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Installing Haystack\n", + "\n", + "To start, let's install the latest release of Haystack with `pip`:" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "pip install --upgrade pip\n", + "pip install farm-haystack[colab]" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Augmenting Training Data\n", + "\n", + "Having more training data is useful at all levels of model training. When performing intermediate layer distillation, additional data is beneficial, even if it is synthetically generated. Here we will be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. Similarity of meaning is determined by the words' vector representations in a GloVe word embedding model." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "1.
Download the `augment_squad.py` script" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "!wget https://raw.githubusercontent.com/deepset-ai/haystack/main/haystack/utils/augment_squad.py" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "2. Download a small slice of the SQuAD question answering dataset as well as a set of GloVe vectors." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "doc_dir = \"data/tutorial2\"\n", + "\n", + "glove_url = \"https://nlp.stanford.edu/data/glove.6B.zip\"\n", + "fetch_archive_from_http(url=glove_url, output_dir=doc_dir)\n", + "\n", + "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip\"\n", + "fetch_archive_from_http(url=s3_url, output_dir=doc_dir)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "Note that we have chosen a smaller set of vectors and a smaller dataset so that this tutorial will run in a reasonable amount of time. You will want to pick larger versions of both for real use cases." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "3. Run the `augment_squad.py` script to create an augmented dataset." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "!python augment_squad.py --squad_path squad_small.json --output_path augmented_dataset.json --multiplication_factor 2 --glove_path glove.6B.300d.txt" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "The multiplication factor determines how many augmented samples we are generating. Setting it to 2 makes it much quicker to run.
In real use cases, you will want to set this to something like 20." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Distilling a Reader Model\n", + "\n", + "Distillation in Haystack is done in two distinct phases:\n", + "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it has a positive impact on the result of model training.\n", + "- Prediction layer distillation optimizes the model for the specific task. This must be performed using the non-augmented data.\n" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "1. Initialize the teacher and student models." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Loading a fine-tuned model as teacher e.g. \"deepset/bert-base-uncased-squad2\"\n", + "teacher = FARMReader(model_name_or_path=\"my_model\", use_gpu=True)\n", + "\n", + "# You can use any pre-trained language model as student, as long as it uses the same tokenizer as the teacher model.\n", + "# The number of layers in the teacher model also needs to be a multiple of the number of layers in the student.\n", + "student = FARMReader(model_name_or_path=\"huawei-noah/TinyBERT_General_6L_768D\", use_gpu=True)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "2. Perform intermediate layer distillation and prediction layer distillation."
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "student.distil_intermediate_layers_from(teacher, data_dir=\".\", train_filename=\"augmented_dataset.json\", use_gpu=True)\n", + "student.distil_prediction_layer_from(teacher, data_dir=\"data/squad20\", train_filename=\"dev-v2.0.json\", use_gpu=True)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "3. Save the student model." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "student.save(directory=\"my_distilled_model\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "## About us\n", + "\n", + "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n", + "\n", + "We bring NLP to the industry via open source! \n", + "Our focus: Industry specific language models & large scale QA systems. 
\n", + " \n", + "Some of our other work: \n", + "- [German BERT](https://deepset.ai/german-bert)\n", + "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", + "\n", + "Get in touch:\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "\n", + "By the way: [we're hiring!](https://www.deepset.ai/jobs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.8.9 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.9" + }, + "vscode": { + "interpreter": { + "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 8e4cc8ba9453df3a657362f74265c5999e646031 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 2 Nov 2022 17:46:30 +0100 Subject: [PATCH 14/48] Rename tutorial --- ...ling_a_reader_model.ipynb => 04_distilling_a_reader.ipynb} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename new_tutorials/{04_distilling_a_reader_model.ipynb => 04_distilling_a_reader.ipynb} (99%) diff --git a/new_tutorials/04_distilling_a_reader_model.ipynb b/new_tutorials/04_distilling_a_reader.ipynb similarity index 99% rename from new_tutorials/04_distilling_a_reader_model.ipynb rename to new_tutorials/04_distilling_a_reader.ipynb index 0a76394c..3bf7954b 100644 --- a/new_tutorials/04_distilling_a_reader_model.ipynb +++ b/new_tutorials/04_distilling_a_reader.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distilling a Reader Model\n", + "# Distilling a Reader\n", "\n", "- 
**Level**: Advanced\n", "- **Time to complete**: 20 minutes\n", @@ -149,7 +149,7 @@ { "cell_type": "markdown", "source": [ - "## Distilling a Reader Model\n", + "## Distilling a Reader\n", "\n", "Distillation in Haystack is done in two distinct phases:\n", "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it has a positive impact on the result of model training.\n", From 080b5880120bce5222bbc6dff8c90e7dc8eb205b Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 12:38:38 +0100 Subject: [PATCH 15/48] Test and refine distillation tutorial --- new_tutorials/03_finetune_a_reader.ipynb | 2 +- new_tutorials/04_distilling_a_reader.ipynb | 129 ++++++++++++++++++--- 2 files changed, 113 insertions(+), 18 deletions(-) diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index b50f685a..cfd5ffd4 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -7,7 +7,7 @@ "# Reader Fine-Tuning\n", "\n", "- **Level**: Intermediate\n", - "- **Time to complete**: 20 minutes\n", + "- **Time to complete**: 30 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning.\n", diff --git a/new_tutorials/04_distilling_a_reader.ipynb b/new_tutorials/04_distilling_a_reader.ipynb index 3bf7954b..9e5efcf0 100644 --- a/new_tutorials/04_distilling_a_reader.ipynb +++ b/new_tutorials/04_distilling_a_reader.ipynb @@ -7,7 +7,7 @@ "# Distilling a Reader\n", "\n", "- **Level**: Advanced\n", - "- **Time to complete**: 20 minutes\n", + "- **Time to complete**: 45 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", "- **Goal**: Distil the
question answering capabilities of a larger Reader model into a smaller Reader model.\n", @@ -51,6 +51,20 @@ "pip install farm-haystack[colab]" ] }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", + "logging.getLogger(\"haystack\").setLevel(logging.INFO)" + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "source": [ @@ -65,7 +79,7 @@ { "cell_type": "markdown", "source": [ - "1. Download the `augment_squad.py` script" + "1. Download the `augment_squad.py` script." ], "metadata": { "collapsed": false @@ -85,7 +99,7 @@ { "cell_type": "markdown", "source": [ - "2. Download a small slice of the SQuAD question answering dataset as well as a set of GloVe vectors." + "2. Download a small slice of the SQuAD question answering dataset." ], "metadata": { "collapsed": false @@ -96,13 +110,36 @@ "execution_count": null, "outputs": [], "source": [ - "doc_dir = \"data/tutorial2\"\n", + "from haystack.utils import fetch_archive_from_http\n", "\n", - "glove_url = \"https://nlp.stanford.edu/data/glove.6B.zip\"\n", - "fetch_archive_from_http(url=glove_url, output_dir=doc_dir)\n", + "doc_dir = \"data/distilling_a_reader\"\n", + "squad_dir = doc_dir + \"/squad\"\n", "\n", "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip\"\n", - "fetch_archive_from_http(url=s3_url, output_dir=doc_dir)" + "fetch_archive_from_http(url=s3_url, output_dir=squad_dir)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "3. Download a set of GloVe vectors."
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "glove_dir = doc_dir + \"/glove\"\n", + "\n", + "glove_url = \"https://nlp.stanford.edu/data/glove.6B.zip\"\n", + "fetch_archive_from_http(url=glove_url, output_dir=glove_dir)" ], "metadata": { "collapsed": false @@ -120,7 +157,7 @@ { "cell_type": "markdown", "source": [ - "3. Run the `augment_squad.py` script to create an augmented dataset." + "4. Run the `augment_squad.py` script to create an augmented dataset." ], "metadata": { "collapsed": false @@ -131,7 +168,11 @@ "execution_count": null, "outputs": [], "source": [ - "!python augment_squad.py --squad_path squad_small.json --output_path augmented_dataset.json --multiplication_factor 2 --glove_path glove.6B.300d.txt" + "!python augment_squad.py \\\n", + " --squad_path data/distilling_a_reader/squad/squad_small.json \\\n", + " --glove_path data/distilling_a_reader/glove/glove.6B.300d.txt \\\n", + " --output_path augmented_dataset.json \\\n", + " --multiplication_factor 2" ], "metadata": { "collapsed": false @@ -162,7 +203,7 @@ { "cell_type": "markdown", "source": [ - "1. Initialize the teacher and student models." + "1. Initialize the teacher model." ], "metadata": { "collapsed": false @@ -173,11 +214,37 @@ "execution_count": null, "outputs": [], "source": [ - "# Loading a fine-tuned model as teacher e.g. 
\"deepset/bert-base-uncased-squad2\"\n", - "teacher = FARMReader(model_name_or_path=\"my_model\", use_gpu=True)\n", + "from haystack.nodes import FARMReader\n", "\n", - "# You can use any pre-trained language model as student, as long as it uses the same tokenizer as the teacher model.\n", - "# The number of layers in the teacher model also needs to be a multiple of the number of layers in the student.\n", + "teacher = FARMReader(model_name_or_path=\"deepset/bert-base-uncased-squad2\", use_gpu=True)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "Here we are using [`deepset/bert-base-uncased-squad2`](https://huggingface.co/deepset/bert-base-uncased-squad2), a base-sized BERT model trained on SQuAD." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "2. Initialize the student model." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ "student = FARMReader(model_name_or_path=\"huawei-noah/TinyBERT_General_6L_768D\", use_gpu=True)" ], "metadata": { @@ -187,7 +254,36 @@ { "cell_type": "markdown", "source": [ - "2. Perform intermediate layer distillation and prediction layer distillation." + "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, as long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of the number of layers in the student." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "3. Perform intermediate layer distillation."
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "student.distil_intermediate_layers_from(teacher, data_dir=\".\", train_filename=\"augmented_dataset.json\", use_gpu=True)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "4. Perform prediction layer distillation." ], "metadata": { "collapsed": false @@ -198,7 +294,6 @@ "execution_count": null, "outputs": [], "source": [ - "student.distil_intermediate_layers_from(teacher, data_dir=\".\", train_filename=\"augmented_dataset.json\", use_gpu=True)\n", "student.distil_prediction_layer_from(teacher, data_dir=\"data/squad20\", train_filename=\"dev-v2.0.json\", use_gpu=True)" ], "metadata": { @@ -208,7 +303,7 @@ { "cell_type": "markdown", "source": [ - "3. Save the student model." + "5. Save the student model." ], "metadata": { "collapsed": false From 8b8d792a3f52605bde8cc2d2df5d82f62d3393e8 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 15:04:35 +0100 Subject: [PATCH 16/48] Fix first tutorial --- ...our_first_question_answering_system.ipynb} | 36 ++++++++++++++----- 1 file changed, 28 insertions(+), 8 deletions(-) rename new_tutorials/{01_simplified_qa_pipeline.ipynb => 01_build_your_first_question_answering_system.ipynb} (91%) diff --git a/new_tutorials/01_simplified_qa_pipeline.ipynb b/new_tutorials/01_build_your_first_question_answering_system.ipynb similarity index 91% rename from new_tutorials/01_simplified_qa_pipeline.ipynb rename to new_tutorials/01_build_your_first_question_answering_system.ipynb index 6e6657df..1a8a145e 100644 --- a/new_tutorials/01_simplified_qa_pipeline.ipynb +++ b/new_tutorials/01_build_your_first_question_answering_system.ipynb @@ -27,9 +27,9 @@ "\n", "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is 
Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" ] }, { @@ -90,7 +90,7 @@ "source": [ "from haystack.utils import fetch_archive_from_http\n", "\n", - "doc_dir = \"data/tutorial1\"\n", + "doc_dir = \"data/build_your_first_question_answering_system\"\n", "\n", "fetch_archive_from_http(\n", " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", @@ -112,8 +112,8 @@ "outputs": [], "source": [ "import os\n", - "# from haystack.pipeline import TextIndexingPipeline\n", - "from text_indexing_pipeline import TextIndexingPipeline\n", + "from haystack.pipelines.standard_pipelines import TextIndexingPipeline\n", + "# from text_indexing_pipeline import TextIndexingPipeline\n", "\n", "files_to_index = [doc_dir + \"/\" + f for f in os.listdir(doc_dir)]\n", "indexing_pipeline = TextIndexingPipeline(document_store)\n", @@ -145,9 +145,9 @@ "metadata": {}, "outputs": [], "source": [ - "from haystack.nodes import TFIDFRetriever\n", + "from haystack.nodes import TfidfRetriever\n", "\n", - "retriever = TFIDFRetriever(document_store=document_store)" + "retriever = TfidfRetriever(document_store=document_store)" ] }, { @@ -270,6 +270,17 @@ "And there you have it! Congratulations on building your first machine learning based question answering system!"
] }, + { + "cell_type": "markdown", + "source": [ + "# Next Steps\n", + "\n", + "Check out [Build a Scalable Question Answering System](https://haystack.deepset.ai/tutorials/02_build_a_scalable_question_answering_system) to learn how to make a more advanced question answering system that uses an Elasticsearch backed DocumentStore and makes more use of the flexibility that pipelines offer." + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "metadata": {}, @@ -290,6 +301,15 @@ "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + } } ], "metadata": { From fbf252ea309d8726cf7df0b1fb65ddc85f2f1002 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 15:44:10 +0100 Subject: [PATCH 17/48] Run and test tutorial 2 --- ..._scalable_question_answering_system.ipynb} | 42 ++++++++++++++----- 1 file changed, 31 insertions(+), 11 deletions(-) rename new_tutorials/{02_qa_pipeline.ipynb => 02_build_a_scalable_question_answering_system.ipynb} (92%) diff --git a/new_tutorials/02_qa_pipeline.ipynb b/new_tutorials/02_build_a_scalable_question_answering_system.ipynb similarity index 92% rename from new_tutorials/02_qa_pipeline.ipynb rename to new_tutorials/02_build_a_scalable_question_answering_system.ipynb index 8c6ee285..fcda819e 100644 --- a/new_tutorials/02_qa_pipeline.ipynb +++ b/new_tutorials/02_build_a_scalable_question_answering_system.ipynb @@ -4,13 +4,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# The Building Blocks of a Scalable Question Answering System\n", + "# Build a Scalable Question Answering System\n", "\n", "- **Level**: Beginner\n", - "- **Time to complete**: 20 minutes\n", + "- **Time to complete**: 30 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, 
`FARMReader`\n", - "- **Goal**: After completing this tutorial, you will have learned about the Reader, Retriever and `ElasticsearchDocumentStore`. You will index files with and indexing pipeline and combine the Reader and Retriever in a querying pipeline. At the end, you will have built a question answering pipeline that can answer questions about the Game of Thrones series.\n", + "- **Goal**: After completing this tutorial, you will have learned about the Reader, Retriever and `ElasticsearchDocumentStore`. You will index files with and indexing pipeline and combine the Reader and Retriever in a querying pipeline.\n", "\n", "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", "\n", @@ -27,9 +27,9 @@ "\n", "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" ] }, { @@ -59,7 +59,7 @@ "source": [ "## Initializing the ElasticsearchDocumentStore\n", "\n", - "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. 
Here we are using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs indepedently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs indepedently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { @@ -166,7 +166,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." + "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_a_scalable_question_answering_system` as a set of `.txt` files." ] }, { @@ -177,7 +177,7 @@ "source": [ "from haystack.utils import fetch_archive_from_http\n", "\n", - "doc_dir = \"data/tutorial1\"\n", + "doc_dir = \"data/build_a_scalable_question_answering_system\"\n", "\n", "fetch_archive_from_http(\n", " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", @@ -381,7 +381,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "2. The answers returned by the pipeline can be printed out directly:" + "2. The answers returned by the pipeline can be printed out directly." 
] }, { @@ -399,7 +399,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "3. Simplify the printed answers:" + "3. Simplify the printed answers." ] }, { @@ -423,6 +423,17 @@ "And there you have it! Congratulations on building a scalable machine learning based question answering system!" ] }, + { + "cell_type": "markdown", + "source": [ + "# Next Steps\n", + "\n", + "To learn how to improve the performance of the Reader, see [Fine-Tune a Reader](https://haystack.deepset.ai/tutorials/03_fine_tune_a_reader)." + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "metadata": {}, @@ -443,6 +454,15 @@ "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + } } ], "metadata": { From 2411b4ee6b49984261e3c397a69472c35ec9fb81 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 16:00:47 +0100 Subject: [PATCH 18/48] Run and test tutorial 3 --- new_tutorials/03_finetune_a_reader.ipynb | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index cfd5ffd4..0d9f4840 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Reader Fine-Tuning\n", + "# Fine-Tune a Reader\n", "\n", "- **Level**: Intermediate\n", "- **Time to complete**: 30 minutes\n", @@ -23,9 +23,9 @@ "source": [ "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + "- 
[Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" ] }, { @@ -206,15 +206,6 @@ "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [], - "metadata": { - "collapsed": false - } } ], "metadata": { From a8bfce2f4a0d725423baf60e3e5da18c2b131a60 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 16:11:04 +0100 Subject: [PATCH 19/48] Run and test tutorial 3 --- new_tutorials/03_finetune_a_reader.ipynb | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index 0d9f4840..c8803af6 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -184,6 +184,17 @@ "new_reader = FARMReader(model_name_or_path=\"my_model\")" ] }, + { + "cell_type": "markdown", + "source": [ + "# Next Steps\n", + "\n", + "Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance. To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader)." 
+ ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "metadata": { From b939038de0f1878d6a6035f463c395d067835423 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Thu, 3 Nov 2022 16:43:06 +0100 Subject: [PATCH 20/48] Run and test tutorial 4 --- new_tutorials/03_finetune_a_reader.ipynb | 4 +- new_tutorials/04_distilling_a_reader.ipynb | 50 ++++++++++++---------- 2 files changed, 30 insertions(+), 24 deletions(-) diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index c8803af6..377008d0 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -189,7 +189,9 @@ "source": [ "# Next Steps\n", "\n", - "Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance. To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader)." + "Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance. To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader).\n", + "\n", + "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader)." 
], "metadata": { "collapsed": false diff --git a/new_tutorials/04_distilling_a_reader.ipynb b/new_tutorials/04_distilling_a_reader.ipynb index 9e5efcf0..2fc4153a 100644 --- a/new_tutorials/04_distilling_a_reader.ipynb +++ b/new_tutorials/04_distilling_a_reader.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distilling a Reader\n", + "# Distil a Reader\n", "\n", "- **Level**: Advanced\n", "- **Time to complete**: 45 minutes\n", @@ -23,9 +23,9 @@ "source": [ "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" + "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" ] }, { @@ -51,20 +51,6 @@ "pip install farm-haystack[colab]" ] }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "import logging\n", - "\n", - "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", - "logging.getLogger(\"haystack\").setLevel(logging.INFO)" - ], - "metadata": { - "collapsed": false - } - }, { "cell_type": "markdown", "source": [ @@ -112,7 +98,7 @@ "source": [ "from haystack.utils import fetch_archive_from_http\n", "\n", - "doc_dir = \"data/distilling_a_reader\"\n", + "doc_dir = \"data/distil_a_reader\"\n", "squad_dir = doc_dir + \"/squad\"\n", "\n", "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip\"\n", @@ -148,7 +134,7 @@ { "cell_type": "markdown", 
"source": [ - "Note that we have chosen a smaller set vectors and a smaller dataset so that this tutorial will run in a reasonable amount of time. You will want to pick larger versions of both for real use cases." + "Note that we have chosen a smaller set of vectors and a smaller dataset so that this tutorial will run in a reasonable amount of time. You will want to pick larger versions of both for real use cases." ], "metadata": { "collapsed": false @@ -169,8 +155,8 @@ "outputs": [], "source": [ "!python augment_squad.py \\\n", - " --squad_path data/distilling_a_reader/squad/squad_small.json \\\n", - " --glove_path data/distilling_a_reader/glove/glove.6B.300d.txt\n", + " --squad_path data/distil_a_reader/squad/squad_small.json \\\n", + " --glove_path data/distil_a_reader/glove/glove.6B.300d.txt \\\n", " --output_path augmented_dataset.json \\\n", " --multiplication_factor 2" ], @@ -254,7 +240,7 @@ { "cell_type": "markdown", "source": [ - "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of number of lyaers in the student." + "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of number of layers in the student." ], "metadata": { "collapsed": false @@ -320,6 +306,24 @@ "collapsed": false } }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "# Next Steps\n", + "\n", + "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader)." 
+ ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "metadata": { From f3d24c61a39e60682c00ff9c20793f183463f847 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Fri, 4 Nov 2022 11:08:51 +0100 Subject: [PATCH 21/48] Rename tutorial 4 --- .../{04_distilling_a_reader.ipynb => 04_distill_a_reader.ipynb} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename new_tutorials/{04_distilling_a_reader.ipynb => 04_distill_a_reader.ipynb} (99%) diff --git a/new_tutorials/04_distilling_a_reader.ipynb b/new_tutorials/04_distill_a_reader.ipynb similarity index 99% rename from new_tutorials/04_distilling_a_reader.ipynb rename to new_tutorials/04_distill_a_reader.ipynb index 2fc4153a..8ed8d9d0 100644 --- a/new_tutorials/04_distilling_a_reader.ipynb +++ b/new_tutorials/04_distill_a_reader.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distil a Reader\n", + "# Distill a Reader\n", "\n", "- **Level**: Advanced\n", "- **Time to complete**: 45 minutes\n", From f0f57a432e307e7ee0d2e6535f8b62af67d3643f Mon Sep 17 00:00:00 2001 From: Branden Chan <33759007+brandenchan@users.noreply.github.com> Date: Fri, 4 Nov 2022 11:22:51 +0100 Subject: [PATCH 22/48] Oxford comma Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> --- markdowns/1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/markdowns/1.md b/markdowns/1.md index 9eee5148..90c86d0d 100644 --- a/markdowns/1.md +++ b/markdowns/1.md @@ -42,7 +42,7 @@ pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[col A DocumentStore stores the documents that the question answering system uses to find answers to your questions. To learn more, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). -1. Download, extract and set the permission for the Elasticsearch image: +1. 
Download, extract, and set the permission for the Elasticsearch image: ```bash From 0daaaa41607dacc24f1f6ae321a98e364ac2432b Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 8 Nov 2022 12:45:44 +0100 Subject: [PATCH 23/48] Incorporate reviewer feedback for tutorial 2 --- ...your_first_question_answering_system.ipynb | 18 +++++--- ...a_scalable_question_answering_system.ipynb | 44 +++++++++++-------- new_tutorials/03_finetune_a_reader.ipynb | 15 +++++-- new_tutorials/04_distill_a_reader.ipynb | 15 +++++-- 4 files changed, 63 insertions(+), 29 deletions(-) diff --git a/new_tutorials/01_build_your_first_question_answering_system.ipynb b/new_tutorials/01_build_your_first_question_answering_system.ipynb index 1a8a145e..98ad69b1 100644 --- a/new_tutorials/01_build_your_first_question_answering_system.ipynb +++ b/new_tutorials/01_build_your_first_question_answering_system.ipynb @@ -7,18 +7,26 @@ "# Build Your First Question Answering System\n", "\n", "- **Level**: Beginner\n", - "- **Time to complete**: 20 minutes\n", + "- **Time to complete**: 15 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader`\n", - "- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series.\n", + "- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series.\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Overview\n", "\n", "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to 
search through internal wikis or a collection of financial reports, for example.\n", "\n", "This tutorial will introduce you to all the concepts needed to build such a question answering system. However, certain setup steps, such as Document preparation and indexing as well as pipeline initialization, are simplified so that you can get started quicker.\n", "\n", - "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n", - "\n" - ] + "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!" + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", diff --git a/new_tutorials/02_build_a_scalable_question_answering_system.ipynb b/new_tutorials/02_build_a_scalable_question_answering_system.ipynb index fcda819e..a18fe633 100644 --- a/new_tutorials/02_build_a_scalable_question_answering_system.ipynb +++ b/new_tutorials/02_build_a_scalable_question_answering_system.ipynb @@ -7,18 +7,26 @@ "# Build a Scalable Question Answering System\n", "\n", "- **Level**: Beginner\n", - "- **Time to complete**: 30 minutes\n", + "- **Time to complete**: 20 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader`\n", - "- **Goal**: After completing this tutorial, you will have learned about the Reader, Retriever and `ElasticsearchDocumentStore`. You will index files with and indexing pipeline and combine the Reader and Retriever in a querying pipeline.\n", + "- **Goal**: After completing this tutorial, you'll have built a scalable search system that runs on text files and can answer questions about Game of Thrones. 
You'll then be able to expand this system for your needs.\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Overview\n", "\n", "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", "\n", - "This tutorial will introduce you to all the concepts needed to build such a question answering system. It will also use Haystack components such as indexing pipelines, querying pipelines and DocumentStores backed by external database services.\n", + "This tutorial introduces you to all the concepts needed to build such a question answering system. It also uses Haystack components, such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services.\n", "\n", - "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n", - "\n" - ] + "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!" + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", @@ -59,14 +67,14 @@ "source": [ "## Initializing the ElasticsearchDocumentStore\n", "\n", - "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs indepedently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." 
+ "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here, we're using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs independently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "1. Download, extract and set the permissions for the Elasticsearch installation image." + "1. Download, extract, and set the permissions for the Elasticsearch installation image." ] }, { @@ -104,7 +112,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch via Docker. This can be done manually, or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function." + "If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this manually, or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function." ] }, { @@ -157,9 +165,9 @@ "source": [ "## Indexing Documents with a Pipeline\n", "\n", - "You can write Documents into your DocumentStore using an indexing pipeline. Pipelines are composed of nodes that perform different kinds of processing. For example, here we will be using the `TextConverter` which turns `.txt` files into Haystack `Document` objects, as well as the `PreProcessor` which can clean and split the text within a `Document`. \n", + "The indexing pipeline turns your files into Document objects and writes them to the DocumentStore. 
Our indexing pipeline will have two nodes: `TextConverter` which turns `.txt` files into Haystack `Document` objects and `PreProcessor` which cleans and splits the text within a `Document`.\n", "\n", - "Once all components are combined, the indexing pipeline will ingest `.txt` filepaths, preprocess them and write them into the DocumentStore.\n" + "Once these nodes are combined into a pipeline, the pipeline will ingest `.txt` file paths, preprocess them, and write them into the DocumentStore.\n" ] }, { @@ -189,7 +197,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "2. Initialize the pipeline, TextConverter and PreProcessor." + "2. Initialize the pipeline, TextConverter, and PreProcessor." ] }, { @@ -225,7 +233,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "2. Populate the indexing pipeline with nodes. You should provide the `name` or `name`s of preceding nodes as the `input` argument. Note that in an indexing pipeline, the input to the first node is \"File\"." + "2. Add the nodes into an indexing pipeline. You should provide the `name` or `name`s of preceding nodes as the `input` argument. Note that in an indexing pipeline, the input to the first node is \"File\"." ] }, { @@ -245,7 +253,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "3. Write the text data into the DocumentStore by running the indexing pipeline." + "3. Run the indexing pipeline to write the text data into the DocumentStore." ] }, { @@ -273,7 +281,7 @@ "source": [ "## Initializing the Retriever\n", "\n", - "Retrievers sift through all the Documents and return only those that it thinks might be relevant to the question. Here we are using the BM25 algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "Retrievers sift through all the Documents and return only those that are relevant to the question. Here we are using the BM25Retriever. 
For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." ] }, { @@ -293,7 +301,7 @@ "source": [ "## Initializing the Reader\n", "\n", - "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." ] }, { @@ -381,7 +389,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "2. The answers returned by the pipeline can be printed out directly." + "2. You can directly print out the answers returned by the pipeline:" ] }, { @@ -399,7 +407,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "3. Simplify the printed answers." + "3. 
Simplify the printed answers:" ] }, { diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index 377008d0..60ed545b 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -7,13 +7,22 @@ "# Fine-Tune a Reader\n", "\n", "- **Level**: Intermediate\n", - "- **Time to complete**: 30 minutes\n", + "- **Time to complete**: 20 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", - "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning.\n", + "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning." + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Overview\n", "\n", "Fine-tuning can improve your Reader's performance on question answering, especially if you are working with very specific domains. While many of the existing public models trained on SQuAD or other public question answering datasets should be enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we have had cases where ~2000 examples could increase performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." 
- ] + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", diff --git a/new_tutorials/04_distill_a_reader.ipynb b/new_tutorials/04_distill_a_reader.ipynb index 8ed8d9d0..5b8c91f0 100644 --- a/new_tutorials/04_distill_a_reader.ipynb +++ b/new_tutorials/04_distill_a_reader.ipynb @@ -7,13 +7,22 @@ "# Distill a Reader\n", "\n", "- **Level**: Advanced\n", - "- **Time to complete**: 45 minutes\n", + "- **Time to complete**: 30 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", - "- **Goal**: Distil the question answering capabilities of a larger Reader model into a smaller Reader model.\n", + "- **Goal**: Distil the question answering capabilities of a larger Reader model into a smaller Reader model.\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Overview\n", "\n", "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if you would like to learn more about it, we recommend looking at [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." 
- ] + ], + "metadata": { + "collapsed": false + } }, { "cell_type": "markdown", From 4a8334b5610127a658da5e2e641b6b990ba2ad29 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 8 Nov 2022 15:01:37 +0100 Subject: [PATCH 24/48] Incorporate reviewer feedback for tutorial 3 --- new_tutorials/03_finetune_a_reader.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/new_tutorials/03_finetune_a_reader.ipynb index 60ed545b..f094f3d3 100644 --- a/new_tutorials/03_finetune_a_reader.ipynb +++ b/new_tutorials/03_finetune_a_reader.ipynb @@ -18,7 +18,7 @@ "source": [ "## Overview\n", "\n", - "Fine-tuning can improve your Reader's performance on question answering, especially if you are working with very specific domains. While many of the existing public models trained on SQuAD or other public question answering datasets should be enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we have had cases where ~2000 examples could increase performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." + "Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." 
], "metadata": { "collapsed": false @@ -70,9 +70,9 @@ "\n", "You can start generating your own training data using one of the two tools that we offer:\n", "\n", - "1. **Annotation Tool**: You can use the deepset [annotation tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. The labels can be exported in the SQuAD format that is compatible with fine-tuning in Haystack.\n", + "1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack.\n", "\n", - "2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions via Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback).\n" + "2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions with Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback).\n" ], "metadata": { "collapsed": false @@ -84,7 +84,7 @@ "\n", "## Fine-tuning the Reader\n", "\n", - "1. Initialize the Reader, supplying the name of the base model that you wish to improve." + "1. Initialize the Reader, supplying the name of the base model you wish to improve." 
], "metadata": { "collapsed": false @@ -106,7 +106,7 @@ { "cell_type": "markdown", "source": [ - "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this case, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squadbase), a base sized DistilBERT model that was trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model that was trained on SQuAD. To learn more about which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." ], "metadata": { "collapsed": false @@ -142,9 +142,9 @@ { "cell_type": "markdown", "source": [ - "With the default parameters above, we are starting with a base model that was trained on the SQuAD training dataset and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, you should replace `train_filename` with your domain specific dataset.\n", + "With the default parameters above, we are starting with a base model trained on the SQuAD training dataset, and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, replace `train_filename` with your domain-specific dataset.\n", "\n", - "If you are looking to perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments."
+ "To perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments." ], "metadata": { "collapsed": false From e3db63670ef9e19d73b3a27c545198742d0e8e0d Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 8 Nov 2022 15:11:22 +0100 Subject: [PATCH 25/48] Incorporate reviewer feedback for tutorial 4 --- new_tutorials/04_distill_a_reader.ipynb | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/new_tutorials/04_distill_a_reader.ipynb b/new_tutorials/04_distill_a_reader.ipynb index 5b8c91f0..15c0614d 100644 --- a/new_tutorials/04_distill_a_reader.ipynb +++ b/new_tutorials/04_distill_a_reader.ipynb @@ -10,7 +10,7 @@ "- **Time to complete**: 30 minutes\n", "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", - "- **Goal**: Distil the question answering capabilities of a larger Reader model into a smaller Reader model.\n" + "- **Goal**: Distil the question answering capabilities of the larger BERT base Reader model into a smaller TinyBERT Reader model.\n" ] }, { @@ -18,7 +18,7 @@ "source": [ "## Overview\n", "\n", - "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if you would like to learn more about it, we recommend looking at [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." + "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. 
By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research, so if you want to learn more about it, see [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." ], "metadata": { "collapsed": false @@ -65,7 +65,7 @@ "source": [ "## Augmenting Training Data\n", "\n", - "Having more training data is useful at all levels of model training. When performing intermediate layer distillation, additional data is beneficial, even if it is synthetically generated. Here we will be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artifical copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GLoVe word embedding model." + "Having more human-annotated training data is useful at all levels of model training. However, intermediate layer distillation can benefit even from synthetically generated data, since it is a less exact type of training. In this tutorial, we'll be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GloVe word embedding model." ], "metadata": { "collapsed": false @@ -143,7 +143,7 @@ { "cell_type": "markdown", "source": [ - "Note that we have chosen a smaller set of vectors and a smaller dataset so that this tutorial will run in a reasonable amount of time.
You will want to pick larger versions of both for real use cases." + "This tutorial uses a smaller set of vectors and a smaller dataset to make it faster. For real use cases, pick larger versions of both." ], "metadata": { "collapsed": false @@ -176,7 +176,7 @@ { "cell_type": "markdown", "source": [ - "The multiplication factor determines how many augmented samples we are generating. Setting it to 2 makes it much quicker to run. In real use cases, you will want to set this to something like 20." + "The multiplication factor determines how many augmented samples we're generating. Setting it to 2 makes it much quicker to run. In real use cases, set this to something like 20." ], "metadata": { "collapsed": false @@ -187,9 +187,9 @@ "source": [ "## Distilling a Reader\n", "\n", - "Distillation in Haystack is done in two distinct phases:\n", - "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it has positive impact on the result of model training.\n", - "- Prediction layer distillation optimize the model for the specific task. This must be performed using the non-augmented data.\n" + "Distillation in Haystack is done in two phases:\n", + "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it will improve the performance of the model after training.\n", + "- Prediction layer distillation optimizes the model for the specific task. This must be performed using the non-augmented data.\n" ], "metadata": { "collapsed": false @@ -249,7 +249,7 @@ { "cell_type": "markdown", "source": [ - "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. 
Also, the number of layers in the teacher model must be a multiple of number of layers in the student." + "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of the number of layers in the student." ], "metadata": { "collapsed": false From 405ed89deeb17cc1fbcaab51196aa6972a69f0f0 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Mon, 14 Nov 2022 16:36:09 +0100 Subject: [PATCH 26/48] Move new tutorials into folder --- README.md | 9 +- tutorials/01_Basic_QA_Pipeline.ipynb | 460 ------------------ ...your_first_question_answering_system.ipynb | 0 .../02_Finetune_a_model_on_your_data.ipynb | 323 ------------ ...a_scalable_question_answering_system.ipynb | 0 ...ic_QA_Pipeline_without_Elasticsearch.ipynb | 448 ----------------- .../03_finetune_a_reader.ipynb | 0 .../21_distill_a_reader.ipynb | 0 8 files changed, 6 insertions(+), 1234 deletions(-) delete mode 100644 tutorials/01_Basic_QA_Pipeline.ipynb rename {new_tutorials => tutorials}/01_build_your_first_question_answering_system.ipynb (100%) delete mode 100644 tutorials/02_Finetune_a_model_on_your_data.ipynb rename {new_tutorials => tutorials}/02_build_a_scalable_question_answering_system.ipynb (100%) delete mode 100644 tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb rename {new_tutorials => tutorials}/03_finetune_a_reader.ipynb (100%) rename new_tutorials/04_distill_a_reader.ipynb => tutorials/21_distill_a_reader.ipynb (100%) diff --git a/README.md b/README.md index 37e1b2d8..a555a4a5 100644 --- a/README.md +++ b/README.md @@ -16,9 +16,9 @@ To contribute to the tutorials please check out our [Contributing Guidelines](./ ## Tutorials -1. [Basic QA Pipeline](./tutorials/01_Basic_QA_Pipeline.ipynb) -2. [Fine Tune a Model on Your Data](./tutorials/02_Finetune_a_model_on_your_data.ipynb) -3. 
[Basic QA Pipeline Without Elasticsearch](./tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb) +1. [Build Your First Question Answering System](./tutorials/01_build_your_first_question_answering_system.ipynb) +2. [Build a Scalable Question Answering System](./tutorials/02_build_a_scalable_question_answering_system.ipynb) +3. [Fine-Tune a Reader](./tutorials/03_finetune_a_reader.ipynb) 4. [FAQ Style QA](./tutorials/04_FAQ_style_QA.ipynb) 5. [Evaluation](./tutorials/05_Evaluation.ipynb) 6. [Better Retrieval via Embedding Retrieval](./tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb) @@ -34,3 +34,6 @@ To contribute to the tutorials please check out our [Contributing Guidelines](./ 16. [Document Classifier at Index Time](./tutorials/16_Document_Classifier_at_Index_Time.ipynb) 17. [Audio](./tutorials/17_Audio.ipynb) 18. [Generative Pseudo Labeling](./tutorials/18_GPL.ipynb) +19. x +20. x +21. [Distill a Reader](./tutorials/21_distill_a_reader.ipynb) diff --git a/tutorials/01_Basic_QA_Pipeline.ipynb b/tutorials/01_Basic_QA_Pipeline.ipynb deleted file mode 100644 index 6f0f81ed..00000000 --- a/tutorials/01_Basic_QA_Pipeline.ipynb +++ /dev/null @@ -1,460 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Build Your First Question Answering System\n", - "\n", - "- **Level**: Beginner\n", - "- **Time to complete**: 20 minutes\n", - "- **Prerequisites**: Prepare the Colab environment. See links below.\n", - "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`\n", - "- **Goal**: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series.\n", - "\n", - "This tutorial teaches you how to set up a question answering system that can search through complex knowledge bases, such as an internal wiki or a collection of financial reports. We will work on a set of Wikipedia pages about Game of Thrones.
Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!\n" - ] - }, - { - "cell_type": "markdown", - "source": [ - "\n", - "## Preparing the Colab Environment\n", - "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level)\n" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Installing Haystack\n", - "\n", - "To start, let's install the latest release of Haystack with `pip`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "pip install --upgrade pip\n", - "pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initializing the DocumentStore\n", - "\n", - "A DocumentStore stores the documents that the question answering system uses to find answers to your questions. To learn more, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "1. 
Download, extract and set the permission for the Elasticsearch image:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q\n", - "tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz\n", - "chown -R daemon:daemon elasticsearch-7.9.2" - ] - }, - { - "cell_type": "markdown", - "source": [ - "2. Start the Elasticsearch Server:" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "%%bash --bg\n", - "\n", - "sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "import time\n", - "time.sleep(30)" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://docs.haystack.deepset.ai/docs/document_store#initialisation), or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api) utility function." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "3. Initialize the `ElasticsearchDocumentStore` object in Haystack. Note that this will only successfully run if the Elasticsearch Server is fully started up and ready." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": true - } - }, - "outputs": [], - "source": [ - "import os\n", - "from haystack.document_stores import ElasticsearchDocumentStore\n", - "\n", - "# Get the host where Elasticsearch is running, default to localhost\n", - "host = os.environ.get(\"ELASTICSEARCH_HOST\", \"localhost\")\n", - "\n", - "document_store = ElasticsearchDocumentStore(\n", - " host=host,\n", - " username=\"\",\n", - " password=\"\",\n", - " index=\"document\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Preparing Documents\n", - "\n", - "1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "from haystack.utils import fetch_archive_from_http\n", - "\n", - "doc_dir = \"data/tutorial1\"\n", - "\n", - "fetch_archive_from_http(\n", - " url=\"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip\",\n", - " output_dir=doc_dir\n", - ")" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "is_executing": true - } - } - }, - { - "cell_type": "markdown", - "source": [ - "2. Convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from haystack.utils import clean_wiki_text, convert_files_to_docs\n", - "docs = convert_files_to_docs(\n", - " dir_path=doc_dir,\n", - " clean_func=clean_wiki_text,\n", - " split_paragraphs=True\n", - ")" - ] - }, - { - "cell_type": "markdown", - "source": [ - "3. Write these Documents into the DocumentStore." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "# Now, let's write the dicts containing documents to our DB.\n", - "document_store.write_documents(docs)" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "While the default code in this tutorial uses Game of Thrones data, you can also supply your own. So long as your data adheres to the [input format](https://docs.haystack.deepset.ai/docs/document_store#input-format) or is cast into a [Document object](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document), it can be written into the DocumentStore." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initializing the Retriever\n", - "\n", - "Initialize the `BM25Retriever`. For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from haystack.nodes import BM25Retriever\n", - "\n", - "retriever = BM25Retriever(document_store=document_store)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initializing the Reader\n", - "\n", - "Initialize the `FARMReader` with the `deepset/robert-base-squad2` model. For more Reader options, see [Reader](https://docs.haystack.deepset.ai/docs/reader)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "from haystack.nodes import FARMReader\n", - "\n", - "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating the Retriever-Reader Pipeline\n", - "\n", - "The `ExtractiveQAPipeline` connects the Reader and Retriever. This makes the system fast because the Reader only processes the Documents that the Retriever has passed on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "from haystack.pipelines import ExtractiveQAPipeline\n", - "\n", - "pipe = ExtractiveQAPipeline(reader, retriever)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Asking a Question\n", - "\n", - "1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). 
To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values).\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "prediction = pipe.run(\n", - " query=\"Who is the father of Arya Stark?\",\n", - " params={\n", - " \"Retriever\": {\"top_k\": 10},\n", - " \"Reader\": {\"top_k\": 5}\n", - " }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here are some questions you could try out:\n", - "- Who is the father of Arya Stark?\n", - "- Who created the Dothraki vocabulary?\n", - "- Who is the sister of Sansa?" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "2. The answers returned by the pipeline can be printed out directly:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from pprint import pprint\n", - "\n", - "pprint(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "3. Simplify the printed answers:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "from haystack.utils import print_answers\n", - "\n", - "print_answers(\n", - " prediction,\n", - " details=\"minimum\" ## Choose from `minimum`, `medium` and `all`\n", - ")" - ] - }, - { - "cell_type": "markdown", - "source": [ - "And there you have it! Congratulations on building your first machine learning based question answering system!" 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "## About us\n", - "\n", - "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n", - "\n", - "We bring NLP to the industry via open source! \n", - "Our focus: Industry specific language models & large scale QA systems. \n", - " \n", - "Some of our other work: \n", - "- [German BERT](https://deepset.ai/german-bert)\n", - "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", - "- [FARM](https://github.com/deepset-ai/FARM)\n", - "\n", - "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", - "\n", - "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.8.9 64-bit", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.9" - }, - "vscode": { - "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/new_tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb similarity index 100% rename from new_tutorials/01_build_your_first_question_answering_system.ipynb rename to tutorials/01_build_your_first_question_answering_system.ipynb diff --git a/tutorials/02_Finetune_a_model_on_your_data.ipynb 
b/tutorials/02_Finetune_a_model_on_your_data.ipynb deleted file mode 100644 index e44ba950..00000000 --- a/tutorials/02_Finetune_a_model_on_your_data.ipynb +++ /dev/null @@ -1,323 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Fine-tuning a Model on Your Own Data\n", - "\n", - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb)\n", - "\n", - "For many use cases it is sufficient to just use one of the existing public models that were trained on SQuAD or other public QA datasets (e.g. Natural Questions).\n", - "However, if you have domain-specific questions, fine-tuning your model on custom examples will very likely boost your performance.\n", - "While this varies by domain, we saw that ~ 2000 examples can easily increase performance by +5-20%.\n", - "\n", - "This tutorial shows you how to fine-tune a pretrained model on your own dataset." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Prepare environment\n", - "\n", - "#### Colab: Enable the GPU runtime\n", - "Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n", - "**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n", - "\n", - "" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# Make sure you have a GPU running\n", - "!nvidia-smi" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# Install the latest release of Haystack in your own environment\n", - "#! 
pip install farm-haystack\n", - "\n", - "# Install the latest main of Haystack\n", - "!pip install --upgrade pip\n", - "!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Logging\n", - "\n", - "We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n", - "Example log message:\n", - "INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt\n", - "Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "import logging\n", - "\n", - "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", - "logging.getLogger(\"haystack\").setLevel(logging.INFO)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "from haystack.nodes import FARMReader\n", - "from haystack.utils import fetch_archive_from_http" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Create Training Data\n", - "\n", - "There are two ways to generate training data\n", - "\n", - "1. **Annotation**: You can use the [annotation tool](https://haystack.deepset.ai/guides/annotation) to label your data, i.e. highlighting answers to your questions in a document. The tool supports structuring your workflow with organizations, projects, and users. 
The labels can be exported in SQuAD format that is compatible for training with Haystack.\n", - "\n", - "![Snapshot of the annotation tool](https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/annotation_tool.png)\n", - "\n", - "2. **Feedback**: For production systems, you can collect training data from direct user feedback via Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api). This includes a customizable user feedback API for providing feedback on the answer returned by the API. The API provides a feedback export endpoint to obtain the feedback data for fine-tuning your model further.\n", - "\n", - "\n", - "## Fine-tune your model\n", - "\n", - "Once you have collected training data, you can fine-tune your base models.\n", - "We initialize a reader as a base model and fine-tune it on our own custom dataset (should be in SQuAD-like format).\n", - "We recommend using a base model that was trained on SQuAD or a similar QA dataset before to benefit from Transfer Learning effects.\n", - "\n", - "**Recommendation**: Run training on a GPU.\n", - "If you are using Colab: Enable this in the menu \"Runtime\" > \"Change Runtime type\" > Select \"GPU\" in dropdown.\n", - "Then change the `use_gpu` arguments below to `True`" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "04/28/2020 14:39:27 - INFO - farm.utils - device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None\n", - "04/28/2020 14:39:27 - INFO - farm.infer - Could not find `distilbert-base-uncased-distilled-squad` locally. Try to download from model hub ...\n", - "04/28/2020 14:39:29 - WARNING - farm.modeling.language_model - Could not automatically detect from language model name what language it is. \n", - "\t We guess it's an *ENGLISH* model ... 
\n", - "\t If not: Init the language model by supplying the 'language' param.\n", - "04/28/2020 14:39:31 - WARNING - farm.modeling.prediction_head - Some unused parameters are passed to the QuestionAnsweringHead. Might not be a problem. Params: {\"loss_ignore_index\": -1}\n", - "04/28/2020 14:39:33 - INFO - farm.utils - device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None\n", - "04/28/2020 14:39:33 - INFO - farm.utils - device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None\n", - "Preprocessing Dataset data/squad20/dev-v2.0.json: 100%|██████████| 1204/1204 [00:02<00:00, 402.13 Dicts/s]\n", - "Train epoch 0/1 (Cur. train loss: 0.0000): 0%| | 0/1213 [00:00 Change Runtime type -> Hardware accelerator -> GPU**\n", - "\n", - "\n", - "\n", - "You can double check whether the GPU runtime is enabled with the following command:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - }, - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "nvidia-smi" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To start, install the latest release of Haystack with `pip`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "%%bash\n", - "\n", - "pip install --upgrade pip\n", - "pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Logging\n", - "\n", - "We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n", - "Example log message:\n", - "INFO - haystack.utils.preprocessing - Converting 
data/tutorial1/218_Olenna_Tyrell.txt\n", - "Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "import logging\n", - "\n", - "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", - "logging.getLogger(\"haystack\").setLevel(logging.INFO)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Document Store\n" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "# In-Memory Document Store\n", - "from haystack.document_stores import InMemoryDocumentStore\n", - "\n", - "document_store = InMemoryDocumentStore()" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# Alternatively, uncomment the following to use the SQLite Document Store:\n", - "\n", - "# from haystack.document_stores import SQLDocumentStore\n", - "# document_store = SQLDocumentStore(url=\"sqlite:///qa.db\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Preprocessing of documents\n", - "\n", - "Haystack provides a customizable pipeline for:\n", - " - converting files into texts\n", - " - cleaning texts\n", - " - splitting texts\n", - " - writing them to a Document Store\n", - "\n", - "In this tutorial, we download Wikipedia articles on Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http\n", - "\n", - "\n", - "# Let's first get some documents that we want to query\n", - "# Here: 517 Wikipedia articles for Game of Thrones\n", - "doc_dir = \"data/tutorial3\"\n", - "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt3.zip\"\n", - "fetch_archive_from_http(url=s3_url, output_dir=doc_dir)\n", - "\n", - "# convert files to dicts containing documents that can be indexed to our datastore\n", - "# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers)\n", - "# It must take a str as input, and return a str.\n", - "docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)\n", - "\n", - "# We now have a list of dictionaries that we can write to our document store.\n", - "# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself.\n", - "# The default format here is: {\"name\": \"\", \"content\": \"\"}\n", - "\n", - "# Let's have a look at the first 3 entries:\n", - "print(docs[:3])\n", - "\n", - "# Now, let's write the docs to our DB.\n", - "document_store.write_documents(docs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Retriever, Reader & Pipeline\n", - "\n", - "### Retriever\n", - "\n", - "Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered. \n", - "\n", - "With InMemoryDocumentStore or SQLDocumentStore, you can use the TfidfRetriever. For more retrievers, please refer to the tutorial-1." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false, - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# An in-memory TfidfRetriever based on Pandas dataframes\n", - "from haystack.nodes import TfidfRetriever\n", - "\n", - "retriever = TfidfRetriever(document_store=document_store)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Reader\n", - "\n", - "A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based\n", - "on powerful, but slower deep learning models.\n", - "\n", - "Haystack currently supports Readers based on the frameworks FARM and Transformers.\n", - "With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models).\n", - "\n", - "**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2)\n", - "\n", - "**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package)\n", - "\n", - "**Alternatives (Models):** e.g. \"distilbert-base-uncased-distilled-squad\" (fast) or \"deepset/bert-large-uncased-whole-word-masking-squad2\" (good accuracy)\n", - "\n", - "**Hint:** You can adjust the model to return \"no answer possible\" with the no_ans_boost. 
Higher values mean the model prefers \"no answer possible\"\n", - "\n", - "#### FARMReader" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "from haystack.nodes import FARMReader\n", - "\n", - "\n", - "# Load a local model or any of the QA models on\n", - "# Hugging Face's model hub (https://huggingface.co/models)\n", - "reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### TransformersReader\n", - "\n", - "Alternatively, we can use a Transformers reader:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# from haystack.nodes import FARMReader, TransformersReader\n", - "# reader = TransformersReader(model_name_or_path=\"distilbert-base-uncased-distilled-squad\", tokenizer=\"distilbert-base-uncased\", use_gpu=-1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pipeline\n", - "\n", - "With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline.\n", - "Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.\n", - "To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions.\n", - "You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd)." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "from haystack.pipelines import ExtractiveQAPipeline\n", - "\n", - "pipe = ExtractiveQAPipeline(reader, retriever)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Voilà! 
Ask a question!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false - } - }, - "outputs": [], - "source": [ - "# You can configure how many candidates the reader and retriever shall return\n", - "# The higher top_k for retriever, the better (but also the slower) your answers.\n", - "prediction = pipe.run(\n", - " query=\"Who is the father of Arya Stark?\", params={\"Retriever\": {\"top_k\": 10}, \"Reader\": {\"top_k\": 5}}\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# You can try asking more questions:\n", - "\n", - "# prediction = pipe.run(query=\"Who created the Dothraki vocabulary?\", params={\"Reader\": {\"top_k\": 5}})\n", - "# prediction = pipe.run(query=\"Who is the sister of Sansa?\", params={\"Reader\": {\"top_k\": 5}})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Now you can either print the object directly...\n", - "from pprint import pprint\n", - "\n", - "pprint(prediction)\n", - "\n", - "# Sample output:\n", - "# {\n", - "# 'answers': [ ,\n", - "# ,\n", - "# ...\n", - "# ]\n", - "# 'documents': [ ,\n", - "# ,\n", - "# ...\n", - "# ],\n", - "# 'no_ans_gap': 11.688868522644043,\n", - "# 'node_id': 'Reader',\n", - "# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}},\n", - "# 'query': 'Who is the father of Arya Stark?',\n", - "# 'root_node': 'Query'\n", - "# }" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "is_executing": false, - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "# ...or use a util to simplify the output\n", - "from haystack.utils import print_answers\n", - "\n", - "\n", - "# Change `minimum` to `medium` or `all` to control the level of detail\n", - "print_answers(prediction, details=\"minimum\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { 
- "collapsed": false - }, - "source": [ - "## About us\n", - "\n", - "This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany\n", - "\n", - "We bring NLP to the industry via open source! \n", - "Our focus: Industry specific language models & large scale QA systems. \n", - " \n", - "Some of our other work: \n", - "- [German BERT](https://deepset.ai/german-bert)\n", - "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", - "- [FARM](https://github.com/deepset-ai/FARM)\n", - "\n", - "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", - "\n", - "By the way: [we're hiring!](https://www.deepset.ai/jobs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.8.9 64-bit", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.9" - }, - "vscode": { - "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/new_tutorials/03_finetune_a_reader.ipynb b/tutorials/03_finetune_a_reader.ipynb similarity index 100% rename from new_tutorials/03_finetune_a_reader.ipynb rename to tutorials/03_finetune_a_reader.ipynb diff --git a/new_tutorials/04_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb similarity index 100% rename from new_tutorials/04_distill_a_reader.ipynb rename to tutorials/21_distill_a_reader.ipynb From fba580674fc7ead8268a608d4fc042b5f783cee7 Mon Sep 17 00:00:00 2001 From: 
brandenchan Date: Mon, 14 Nov 2022 16:59:40 +0100 Subject: [PATCH 27/48] Update index --- index.toml | 38 +++++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/index.toml b/index.toml index e827ca9f..f0795799 100644 --- a/index.toml +++ b/index.toml @@ -4,28 +4,28 @@ toc = true colab = "https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/" [[tutorial]] -title = "Build Your First QA System" +title = "Build Your First Question Answering System" description = "Get Started by creating a Retriever Reader pipeline." level = "beginner" weight = 10 -notebook = "01_Basic_QA_Pipeline.ipynb" -aliases = ["first-qa-system"] +notebook = "01_build_your_first_question_answering_system.ipynb" +aliases = ["first-qa-system", "01_Basic_QA_Pipeline.ipynb"] [[tutorial]] -title = "Fine-Tuning a Model on Your Own Data" +title = "Build a Scalable Question Answering System" +description = "Create a scalable Retriever Reader pipeline with an Elasticsearch Document Store." +level = "beginner" +weight = 15 +notebook = "02_build_a_scalable_question_answering_system.ipynb" +aliases = ["without-elasticsearch", "03_Basic_QA_Pipeline_without_Elasticsearch.ipynb", "scalable-qa-system"] + +[[tutorial]] +title = "Fine-Tune a Reader" description = "Improve the performance of your Reader by performing fine-tuning." level = "intermediate" weight = 50 -notebook = "02_Finetune_a_model_on_your_data.ipynb" -aliases = ["fine-tuning-a-model"] - -[[tutorial]] -title = "Build a QA System Without Elasticsearch" -description = "Create a Retriever Reader pipeline that requires no external database dependencies." 
-level = "beginner" -weight = 15 -notebook = "03_Basic_QA_Pipeline_without_Elasticsearch.ipynb" -aliases = ["without-elasticsearch"] +notebook = "03_finetune_a_reader.ipynb" +aliases = ["fine-tuning-a-model", "02_Finetune_a_model_on_your_data.ipynb", "fine-tune-reader"] [[tutorial]] title = "Utilizing Existing FAQs for Question Answering" @@ -153,4 +153,12 @@ description = "Use a MultiModalRetriever to build a cross-modal search pipeline. level = "intermediate" weight = 95 notebook = "19_Text_to_Image_search_pipeline_with_MultiModal_Retriever.ipynb" -aliases = ["multimodal"] \ No newline at end of file +aliases = ["multimodal"] + +[[tutorial]] +title = "Distill a Reader" +description = "Transfer a Reader's question answering ability to a smaller more efficient model." +level = "intermediate" +weight = 115 +notebook = "21_distill_a_reader.ipynb" +aliases = ["distill-reader"] \ No newline at end of file From 0ed9fd7175b1e5f377e7c11a2a8d3b53fd605e56 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Mon, 14 Nov 2022 17:04:36 +0100 Subject: [PATCH 28/48] Remove prereqs --- .../01_build_your_first_question_answering_system.ipynb | 1 - .../02_build_a_scalable_question_answering_system.ipynb | 1 - tutorials/03_finetune_a_reader.ipynb | 1 - tutorials/21_distill_a_reader.ipynb | 5 +++-- 4 files changed, 3 insertions(+), 5 deletions(-) diff --git a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index 98ad69b1..cc5e6f5d 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -8,7 +8,6 @@ "\n", "- **Level**: Beginner\n", "- **Time to complete**: 15 minutes\n", - "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader`\n", "- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and 
built a question answering pipeline that can answer questions about the Game of Thrones series.\n" ] diff --git a/tutorials/02_build_a_scalable_question_answering_system.ipynb b/tutorials/02_build_a_scalable_question_answering_system.ipynb index a18fe633..9e4da266 100644 --- a/tutorials/02_build_a_scalable_question_answering_system.ipynb +++ b/tutorials/02_build_a_scalable_question_answering_system.ipynb @@ -8,7 +8,6 @@ "\n", "- **Level**: Beginner\n", "- **Time to complete**: 20 minutes\n", - "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader`\n", "- **Goal**: After completing this tutorial, you'll have built a scalable search system that runs on text files and can answer questions about Game of Thrones. You'll then be able to expand this system for your needs.\n" ] diff --git a/tutorials/03_finetune_a_reader.ipynb b/tutorials/03_finetune_a_reader.ipynb index f094f3d3..e038a0e2 100644 --- a/tutorials/03_finetune_a_reader.ipynb +++ b/tutorials/03_finetune_a_reader.ipynb @@ -8,7 +8,6 @@ "\n", "- **Level**: Intermediate\n", "- **Time to complete**: 20 minutes\n", - "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning." 
] diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index 15c0614d..48e51515 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -8,7 +8,6 @@ "\n", "- **Level**: Advanced\n", "- **Time to complete**: 30 minutes\n", - "- **Prerequisites**: Prepare the Colab environment (see links below).\n", "- **Nodes Used**: `FARMReader`\n", "- **Goal**: Distil the question answering capabilities of the larger BERT base Reader model into a smaller TinyBERT Reader model.\n" ] @@ -32,9 +31,11 @@ "source": [ "## Preparing the Colab Environment\n", "\n", + "
\n", "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n", + "
\n" ] }, { From 4c711fa6e5225cd22de5a03e8c975ae57eb34a48 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Mon, 14 Nov 2022 17:07:36 +0100 Subject: [PATCH 29/48] Regenerate markdowns --- markdowns/01_Basic_QA_Pipeline.md | 222 -------------- ...ld_your_first_question_answering_system.md | 198 +++++++++++++ markdowns/02_Finetune_a_model_on_your_data.md | 175 ----------- ...ld_a_scalable_question_answering_system.md | 280 ++++++++++++++++++ ...Basic_QA_Pipeline_without_Elasticsearch.md | 258 ---------------- markdowns/03_finetune_a_reader.md | 126 ++++++++ markdowns/21_distill_a_reader.md | 165 +++++++++++ 7 files changed, 769 insertions(+), 655 deletions(-) delete mode 100644 markdowns/01_Basic_QA_Pipeline.md create mode 100644 markdowns/01_build_your_first_question_answering_system.md delete mode 100644 markdowns/02_Finetune_a_model_on_your_data.md create mode 100644 markdowns/02_build_a_scalable_question_answering_system.md delete mode 100644 markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md create mode 100644 markdowns/03_finetune_a_reader.md create mode 100644 markdowns/21_distill_a_reader.md diff --git a/markdowns/01_Basic_QA_Pipeline.md b/markdowns/01_Basic_QA_Pipeline.md deleted file mode 100644 index 90c86d0d..00000000 --- a/markdowns/01_Basic_QA_Pipeline.md +++ /dev/null @@ -1,222 +0,0 @@ - - -# Build Your First Question Answering System - -- **Level**: Beginner -- **Time to complete**: 20 minutes -- **Prerequisites**: Prepare the Colab environment. See links below. -- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever` -- **Goal**: After completing this tutorial, you will have built a question answering pipeline that can answer questions about the Game of Thrones series. - -This tutorial teaches you how to set up a question answering system that can search through complex knowledge bases, such as an internal wiki or a collection of financial reports. We will work on a set of Wikipedia pages about Game of Thrones. 
Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! - - - -## Preparing the Colab Environment - -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/v5.2-unstable/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/v5.2-unstable/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/v5.2-unstable/docs/set-the-logging-level) - - -## Installing Haystack - -To start, let's install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Initializing the DocumentStore - -A DocumentStore stores the documents that the question answering system uses to find answers to your questions. To learn more, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). - -1. Download, extract, and set the permission for the Elasticsearch image: - - -```bash -%%bash - -wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q -tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz -chown -R daemon:daemon elasticsearch-7.9.2 -``` - -2. Start the Elasticsearch Server: - - -```bash -%%bash --bg - -sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -``` - - -```python -import time -time.sleep(30) -``` - -If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this [manually](https://docs.haystack.deepset.ai/docs/document_store#initialisation), or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api) utility function. - -3. Initialize the `ElasticsearchDocumentStore` object in Haystack. Note that this will only successfully run if the Elasticsearch Server is fully started up and ready. 
- - -```python -import os -from haystack.document_stores import ElasticsearchDocumentStore - -# Get the host where Elasticsearch is running, default to localhost -host = os.environ.get("ELASTICSEARCH_HOST", "localhost") - -document_store = ElasticsearchDocumentStore( - host=host, - username="", - password="", - index="document" -) -``` - -## Preparing Documents - -1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files. - - -```python -from haystack.utils import fetch_archive_from_http - -doc_dir = "data/tutorial1" - -fetch_archive_from_http( - url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", - output_dir=doc_dir -) -``` - -2. Convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) to write them into the DocumentStore. Apply the `clean_wiki_text` cleaning function to the text. - - -```python -from haystack.utils import clean_wiki_text, convert_files_to_docs -docs = convert_files_to_docs( - dir_path=doc_dir, - clean_func=clean_wiki_text, - split_paragraphs=True -) -``` - -3. Write these Documents into the DocumentStore. - - -```python -# Now, let's write the dicts containing documents to our DB. -document_store.write_documents(docs) -``` - -While the default code in this tutorial uses Game of Thrones data, you can also supply your own. So long as your data adheres to the [input format](https://docs.haystack.deepset.ai/docs/document_store#input-format) or is cast into a [Document object](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document), it can be written into the DocumentStore. - -## Initializing the Retriever - -Initialize the `BM25Retriever`. 
For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever) - - -```python -from haystack.nodes import BM25Retriever - -retriever = BM25Retriever(document_store=document_store) -``` - -## Initializing the Reader - -Initialize the `FARMReader` with the `deepset/robert-base-squad2` model. For more Reader options, see [Reader](https://docs.haystack.deepset.ai/docs/reader). - - -```python -from haystack.nodes import FARMReader - -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -## Creating the Retriever-Reader Pipeline - -The `ExtractiveQAPipeline` connects the Reader and Retriever. This makes the system fast because the Reader only processes the Documents that the Retriever has passed on. - - -```python -from haystack.pipelines import ExtractiveQAPipeline - -pipe = ExtractiveQAPipeline(reader, retriever) -``` - -## Asking a Question - -1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). - - - -```python -prediction = pipe.run( - query="Who is the father of Arya Stark?", - params={ - "Retriever": {"top_k": 10}, - "Reader": {"top_k": 5} - } -) -``` - -Here are some questions you could try out: -- Who is the father of Arya Stark? -- Who created the Dothraki vocabulary? -- Who is the sister of Sansa? - -2. The answers returned by the pipeline can be printed out directly: - - -```python -from pprint import pprint - -pprint(prediction) -``` - -3. 
Simplify the printed answers: - - -```python -from haystack.utils import print_answers - -print_answers( - prediction, - details="minimum" ## Choose from `minimum`, `medium` and `all` -) -``` - -And there you have it! Congratulations on building your first machine learning based question answering system! - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. - -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) - diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md new file mode 100644 index 00000000..eaf0e0ff --- /dev/null +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -0,0 +1,198 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_build_your_first_question_answering_system.ipynb +toc: True +title: "Build Your First Question Answering System" +last_updated: 2022-11-14 +level: "beginner" +weight: 10 +description: Get Started by creating a Retriever Reader pipeline. 
+category: "QA" +aliases: ['/tutorials/first-qa-system', '/tutorials/01_Basic_QA_Pipeline.ipynb'] +--- + + +# Build Your First Question Answering System + +- **Level**: Beginner +- **Time to complete**: 15 minutes +- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader` +- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series. + + +## Overview + +Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as "Who is the father of Arya Stark?" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example. + +This tutorial will introduce you to all the concepts needed to build such a question answering system. However, certain setup steps, such as Document preparation and indexing as well as pipeline initialization, are simplified so that you can get started more quickly. + +Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! + + +## Preparing the Colab Environment + +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + +## Initializing the DocumentStore + +A DocumentStore stores the Documents that the question answering system uses to find answers to your questions.
Here we are using the `InMemoryDocumentStore`, which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). + + +```python +from haystack.document_stores import InMemoryDocumentStore + +document_store = InMemoryDocumentStore() +``` + +## Preparing Documents + +1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_your_first_question_answering_system` as a set of `.txt` files. + + +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/build_your_first_question_answering_system" + +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) +``` + +2. Use the `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore. + + +```python +import os +from haystack.pipelines.standard_pipelines import TextIndexingPipeline +# from text_indexing_pipeline import TextIndexingPipeline + +files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] +indexing_pipeline = TextIndexingPipeline(document_store) +indexing_pipeline.run_batch(file_paths=files_to_index) + + +``` + +While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. + +As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`.
+ +## Initializing the Retriever + +Retrievers sift through all the Documents and return only those that they think might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). + + +```python +from haystack.nodes import TfidfRetriever + +retriever = TfidfRetriever(document_store=document_store) +``` + +## Initializing the Reader + +A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) +``` + +## Creating the Retriever-Reader Pipeline + +The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. + + +```python +from haystack.pipelines import ExtractiveQAPipeline + +pipe = ExtractiveQAPipeline(reader, retriever) +``` + +## Asking a Question + +1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top_k` parameter. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments).
To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). + + + +```python +prediction = pipe.run( + query="Who is the father of Arya Stark?", + params={ + "Retriever": {"top_k": 10}, + "Reader": {"top_k": 5} + } +) +``` + +Here are some questions you could try out: +- Who is the father of Arya Stark? +- Who created the Dothraki vocabulary? +- Who is the sister of Sansa? + +2. The answers returned by the pipeline can be printed out directly: + + +```python +from pprint import pprint + +pprint(prediction) +``` + +3. Simplify the printed answers: + + +```python +from haystack.utils import print_answers + +print_answers( + prediction, + details="minimum" ## Choose from `minimum`, `medium` and `all` +) +``` + +And there you have it! Congratulations on building your first machine learning based question answering system! + +# Next Steps + +Check out [Build a Scalable Question Answering System](https://haystack.deepset.ai/tutorials/02_build_a_scalable_question_answering_system) to learn how to make a more advanced question answering system that uses an Elasticsearch backed DocumentStore and makes more use of the flexibility that pipelines offer. + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. 
+ +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) + + + +```python + +``` diff --git a/markdowns/02_Finetune_a_model_on_your_data.md b/markdowns/02_Finetune_a_model_on_your_data.md deleted file mode 100644 index a9e06ec4..00000000 --- a/markdowns/02_Finetune_a_model_on_your_data.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb -toc: True -title: "Fine-Tuning a Model on Your Own Data" -last_updated: 2022-10-12 -level: "intermediate" -weight: 50 -description: Improve the performance of your Reader by performing fine-tuning. -category: "QA" -aliases: ['/tutorials/fine-tuning-a-model'] ---- - - -# Fine-tuning a Model on Your Own Data - -For many use cases it is sufficient to just use one of the existing public models that were trained on SQuAD or other public QA datasets (e.g. Natural Questions). -However, if you have domain-specific questions, fine-tuning your model on custom examples will very likely boost your performance. -While this varies by domain, we saw that ~ 2000 examples can easily increase performance by +5-20%. - -This tutorial shows you how to fine-tune a pretrained model on your own dataset. - -### Prepare environment - -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. 
-**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - - - - -```python -# Make sure you have a GPU running -!nvidia-smi -``` - - -```python -# Install the latest release of Haystack in your own environment -#! pip install farm-haystack - -# Install the latest main of Haystack -!pip install --upgrade pip -!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. -Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - - -```python -from haystack.nodes import FARMReader -from haystack.utils import fetch_archive_from_http -``` - - -## Create Training Data - -There are two ways to generate training data - -1. **Annotation**: You can use the [annotation tool](https://haystack.deepset.ai/guides/annotation) to label your data, i.e. highlighting answers to your questions in a document. The tool supports structuring your workflow with organizations, projects, and users. The labels can be exported in SQuAD format that is compatible for training with Haystack. - -![Snapshot of the annotation tool](https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/annotation_tool.png) - -2. **Feedback**: For production systems, you can collect training data from direct user feedback via Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api). This includes a customizable user feedback API for providing feedback on the answer returned by the API. 
The API provides a feedback export endpoint to obtain the feedback data for fine-tuning your model further. - - -## Fine-tune your model - -Once you have collected training data, you can fine-tune your base models. -We initialize a reader as a base model and fine-tune it on our own custom dataset (should be in SQuAD-like format). -We recommend using a base model that was trained on SQuAD or a similar QA dataset before to benefit from Transfer Learning effects. - -**Recommendation**: Run training on a GPU. -If you are using Colab: Enable this in the menu "Runtime" > "Change Runtime type" > Select "GPU" in dropdown. -Then change the `use_gpu` arguments below to `True` - - -```python -reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True) -data_dir = "data/squad20" -# data_dir = "PATH/TO_YOUR/TRAIN_DATA" -reader.train(data_dir=data_dir, train_filename="dev-v2.0.json", use_gpu=True, n_epochs=1, save_dir="my_model") -``` - - -```python -# Saving the model happens automatically at the end of training into the `save_dir` you specified -# However, you could also save a reader manually again via: -reader.save(directory="my_model") -``` - - -```python -# If you want to load it at a later point, just do: -new_reader = FARMReader(model_name_or_path="my_model") -``` - -## Distill your model -In this case, we have used "distilbert-base-uncased" as our base model. This model was trained using a process called distillation. In this process, a bigger model is trained first and is used to train a smaller model which increases its accuracy. This is why "distilbert-base-uncased" can achieve quite competitive performance while being very small. - -Sometimes, however, you can't use an already distilled model and have to distil it yourself. For this case, haystack has implemented [distillation features](https://haystack.deepset.ai/guides/model-distillation). 
- -### Augmenting your training data -To get the most out of model distillation, we recommend increasing the size of your training data by using data augmentation. You can do this by running the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py): - - -```python -# Downloading script -!wget https://raw.githubusercontent.com/deepset-ai/haystack/main/haystack/utils/augment_squad.py - -doc_dir = "data/tutorial2" - -# Downloading smaller glove vector file (only for demonstration purposes) -glove_url = "https://nlp.stanford.edu/data/glove.6B.zip" -fetch_archive_from_http(url=glove_url, output_dir=doc_dir) - -# Downloading very small dataset to make tutorial faster (please use a bigger dataset for real use cases) -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# Just replace the path with your dataset and adjust the output (also please remove glove path to use bigger glove vector file) -!python augment_squad.py --squad_path squad_small.json --output_path augmented_dataset.json --multiplication_factor 2 --glove_path glove.6B.300d.txt -``` - -In this case, we use a multiplication factor of 2 to keep this example lightweight. Usually you would use a factor like 20 depending on the size of your training data. Augmenting this small dataset with a multiplication factor of 2, should take about 5 to 10 minutes to run on one V100 GPU. - -### Running distillation -Distillation in haystack is done in two steps: First, you run intermediate layer distillation on the augmented dataset to ensure the two models behave similarly. After that, you run the prediction layer distillation on the non-augmented dataset to optimize the model for your specific task. - -If you want, you can leave out the intermediate layer distillation step and only run the prediction layer distillation. 
This way you also do not need to perform data augmentation. However, this will make the model significantly less accurate. - - -```python -# Loading a fine-tuned model as teacher e.g. "deepset/​bert-​base-​uncased-​squad2" -teacher = FARMReader(model_name_or_path="my_model", use_gpu=True) - -# You can use any pre-trained language model as teacher that uses the same tokenizer as the teacher model. -# The number of the layers in the teacher model also needs to be a multiple of the number of the layers in the student. -student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_6L_768D", use_gpu=True) - -student.distil_intermediate_layers_from(teacher, data_dir=".", train_filename="augmented_dataset.json", use_gpu=True) -student.distil_prediction_layer_from(teacher, data_dir="data/squad20", train_filename="dev-v2.0.json", use_gpu=True) - -student.save(directory="my_distilled_model") -``` - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. 
- -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/02_build_a_scalable_question_answering_system.md b/markdowns/02_build_a_scalable_question_answering_system.md new file mode 100644 index 00000000..7c08a04a --- /dev/null +++ b/markdowns/02_build_a_scalable_question_answering_system.md @@ -0,0 +1,280 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_build_a_scalable_question_answering_system.ipynb +toc: True +title: "Build a Scalable Question Answering System" +last_updated: 2022-11-14 +level: "beginner" +weight: 15 +description: Create a scalable Retriever Reader pipeline with an Elasticsearch Document Store. +category: "QA" +aliases: ['/tutorials/without-elasticsearch', '/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb', '/tutorials/scalable-qa-system'] +--- + + +# Build a Scalable Question Answering System + +- **Level**: Beginner +- **Time to complete**: 20 minutes +- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader` +- **Goal**: After completing this tutorial, you'll have built a scalable search system that runs on text files and can answer questions about Game of Thrones. You'll then be able to expand this system for your needs. + + +## Overview + +Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as "Who is the father of Arya Stark?" 
In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example. + +This tutorial introduces you to all the concepts needed to build such a question answering system. It also uses Haystack components such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services. + +Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! + + +## Preparing the Colab Environment + +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + +## Initializing the ElasticsearchDocumentStore + +A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here, we're using the `ElasticsearchDocumentStore`, which connects to a running Elasticsearch service, a fast and scalable text-focused storage option. This service runs independently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). + +1. Download, extract, and set the permissions for the Elasticsearch installation archive. + + +```bash +%%bash + +wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q +tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz +chown -R daemon:daemon elasticsearch-7.9.2 +``` + +2.
Start the server. + + +```bash +%%bash --bg + +sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch +``` + +If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this manually or with our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function. + +3. Wait 30 seconds to make sure that the server has fully started. + + +```python +import time +time.sleep(30) +``` + +4. Initialize the [`ElasticsearchDocumentStore`](https://docs.haystack.deepset.ai/reference/document-store-api#module-elasticsearch). + + + +```python +import os +from haystack.document_stores import ElasticsearchDocumentStore + +# Get the host where Elasticsearch is running, default to localhost +host = os.environ.get("ELASTICSEARCH_HOST", "localhost") + +document_store = ElasticsearchDocumentStore( + host=host, + username="", + password="", + index="document" +) +``` + +## Indexing Documents with a Pipeline + +The indexing pipeline turns your files into Document objects and writes them to the DocumentStore. Our indexing pipeline will have two nodes: `TextConverter`, which turns `.txt` files into Haystack `Document` objects, and `PreProcessor`, which cleans and splits the text within a `Document`. + +Once these nodes are combined into a pipeline, the pipeline will ingest `.txt` file paths, preprocess them, and write them into the DocumentStore. + + +1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_a_scalable_question_answering_system` as a set of `.txt` files. + + +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/build_a_scalable_question_answering_system" + +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) +``` + +2. Initialize the pipeline, `TextConverter`, and `PreProcessor`.
+ + +```python +from haystack import Pipeline +from haystack.nodes import TextConverter, PreProcessor + +indexing_pipeline = Pipeline() +text_converter = TextConverter() +preprocessor = PreProcessor( + clean_whitespace=True, + clean_header_footer=True, + clean_empty_lines=True, + split_by="word", + split_length=200, + split_overlap=20, + split_respect_sentence_boundary=True, +) + +``` + +To learn more about the parameters of the `PreProcessor`, see [Usage](https://docs.haystack.deepset.ai/docs/preprocessor#usage). To understand why document splitting is important for your question answering system's performance, see [Document Length](https://docs.haystack.deepset.ai/docs/optimization#document-length). + +3. Add the nodes to an indexing pipeline. Provide the name or names of the preceding nodes as the `inputs` argument. Note that in an indexing pipeline, the input to the first node is "File". + + +```python +import os + +indexing_pipeline.add_node(component=text_converter, name="TextConverter", inputs=["File"]) +indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"]) +indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"]) + +``` + +4. Run the indexing pipeline to write the text data into the DocumentStore. + + +```python +files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] +indexing_pipeline.run_batch(file_paths=files_to_index) +``` + +While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. + +As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using [`DocumentStore.write_documents()`](https://docs.haystack.deepset.ai/reference/document-store-api#basedocumentstorewrite_documents).
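The `PreProcessor` settings in the indexing pipeline above (`split_by="word"`, `split_length=200`, `split_overlap=20`) cut each text into overlapping windows of words. The plain-Python function below is a simplified approximation of that idea, not Haystack's implementation. It splits on whitespace only and ignores `split_respect_sentence_boundary` as well as the cleaning options. `split_words` is a hypothetical name.

```python
from typing import List


def split_words(text: str, split_length: int = 200, split_overlap: int = 20) -> List[str]:
    """Split text into chunks of at most split_length words, where consecutive
    chunks share split_overlap words.

    Simplified illustration only: whitespace tokenization, no sentence-boundary
    handling, so it is not the exact algorithm used by Haystack's PreProcessor.
    """
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):  # this window reached the end
            break
    return chunks


# A 450-word text split into 200-word windows that overlap by 20 words
text = " ".join(f"w{i}" for i in range(450))
chunks = split_words(text)
print(len(chunks))  # 3
```

The last 20 words of each chunk reappear at the start of the next one. A short passage that straddles a chunk boundary therefore still appears intact in the following chunk, provided it is no longer than the overlap.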
+ +## Initializing the Retriever + +Retrievers sift through all the Documents and return only those that are most likely to be relevant to the question. Here we are using the `BM25Retriever`, which ranks Documents with the BM25 algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). + + +```python +from haystack.nodes import BM25Retriever + +retriever = BM25Retriever(document_store=document_store) +``` + +## Initializing the Reader + +A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) +``` + +## Creating the Retriever-Reader Pipeline + +You can combine the Reader and Retriever in a querying pipeline using the `Pipeline` class. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. + +1. Initialize the `Pipeline` object and add the Retriever and Reader as nodes. Provide the name or names of the preceding nodes as the `inputs` argument. Note that in a querying pipeline, the input to the first node is "Query". + + +```python +from haystack import Pipeline + +querying_pipeline = Pipeline() +querying_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) +querying_pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"]) + +``` + +## Asking a Question + +1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question.
Additionally, you can use the `top_k` parameter to set how many Documents the Retriever returns and how many answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). + + + ```python +prediction = querying_pipeline.run( + query="Who is the father of Arya Stark?", + params={ + "Retriever": {"top_k": 10}, + "Reader": {"top_k": 5} + } +) +``` + + + +Here are some questions you could try out: +- Who is the father of Arya Stark? +- Who created the Dothraki vocabulary? +- Who is the sister of Sansa? + +2. You can directly print out the answers returned by the pipeline: + + +```python +from pprint import pprint + +pprint(prediction) +``` + +3. Simplify the printed answers: + + +```python +from haystack.utils import print_answers + +print_answers( + prediction, + details="minimum" ## Choose from `minimum`, `medium` and `all` +) +``` + +And there you have it! Congratulations on building a scalable machine-learning-based question answering system! + +# Next Steps + +To learn how to improve the performance of the Reader, see [Fine-Tune a Reader](https://haystack.deepset.ai/tutorials/03_fine_tune_a_reader). + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems.
+ +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) + + + +```python + +``` diff --git a/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md b/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md deleted file mode 100644 index cb4f2922..00000000 --- a/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md +++ /dev/null @@ -1,258 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb -toc: True -title: "Build a QA System Without Elasticsearch" -last_updated: 2022-10-26 -level: "beginner" -weight: 15 -description: Create a Retriever Reader pipeline that requires no external database dependencies. -category: "QA" -aliases: ['/tutorials/without-elasticsearch'] ---- - - -# Build a QA System Without Elasticsearch - -Haystack provides alternatives to Elasticsearch for developing quick prototypes. - -You can use an `InMemoryDocumentStore` or a `SQLDocumentStore`(with SQLite) as the document store. - -If you are interested in more feature-rich Elasticsearch, then please refer to the Tutorial 1. - -### Prepare environment - -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. 
-**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - - - -You can double check whether the GPU runtime is enabled with the following command: - - -```bash -%%bash - -nvidia-smi -``` - -To start, install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. -Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - -## Document Store - - - -```python -# In-Memory Document Store -from haystack.document_stores import InMemoryDocumentStore - -document_store = InMemoryDocumentStore() -``` - - -```python -# Alternatively, uncomment the following to use the SQLite Document Store: - -# from haystack.document_stores import SQLDocumentStore -# document_store = SQLDocumentStore(url="sqlite:///qa.db") -``` - -## Preprocessing of documents - -Haystack provides a customizable pipeline for: - - converting files into texts - - cleaning texts - - splitting texts - - writing them to a Document Store - -In this tutorial, we download Wikipedia articles on Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch. 
- - -```python -from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http - - -# Let's first get some documents that we want to query -# Here: 517 Wikipedia articles for Game of Thrones -doc_dir = "data/tutorial3" -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt3.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# convert files to dicts containing documents that can be indexed to our datastore -# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers) -# It must take a str as input, and return a str. -docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True) - -# We now have a list of dictionaries that we can write to our document store. -# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself. -# The default format here is: {"name": "", "content": ""} - -# Let's have a look at the first 3 entries: -print(docs[:3]) - -# Now, let's write the docs to our DB. -document_store.write_documents(docs) -``` - -## Initialize Retriever, Reader & Pipeline - -### Retriever - -Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered. - -With InMemoryDocumentStore or SQLDocumentStore, you can use the TfidfRetriever. For more retrievers, please refer to the tutorial-1. - - -```python -# An in-memory TfidfRetriever based on Pandas dataframes -from haystack.nodes import TfidfRetriever - -retriever = TfidfRetriever(document_store=document_store) -``` - -### Reader - -A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based -on powerful, but slower deep learning models. - -Haystack currently supports Readers based on the frameworks FARM and Transformers. 
-With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models). - -**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2) - -**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package) - -**Alternatives (Models):** e.g. "distilbert-base-uncased-distilled-squad" (fast) or "deepset/bert-large-uncased-whole-word-masking-squad2" (good accuracy) - -**Hint:** You can adjust the model to return "no answer possible" with the no_ans_boost. Higher values mean the model prefers "no answer possible" - -#### FARMReader - - -```python -from haystack.nodes import FARMReader - - -# Load a local model or any of the QA models on -# Hugging Face's model hub (https://huggingface.co/models) -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -#### TransformersReader - -Alternatively, we can use a Transformers reader: - - -```python -# from haystack.nodes import FARMReader, TransformersReader -# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1) -``` - -### Pipeline - -With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline. -Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases. -To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions. -You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelines). - - -```python -from haystack.pipelines import ExtractiveQAPipeline - -pipe = ExtractiveQAPipeline(reader, retriever) -``` - -## Voilà! Ask a question! 
- - -```python -# You can configure how many candidates the reader and retriever shall return -# The higher top_k for retriever, the better (but also the slower) your answers. -prediction = pipe.run( - query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}} -) -``` - - -```python -# You can try asking more questions: - -# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}}) -# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}}) -``` - - -```python -# Now you can either print the object directly... -from pprint import pprint - -pprint(prediction) - -# Sample output: -# { -# 'answers': [ , -# , -# ... -# ] -# 'documents': [ , -# , -# ... -# ], -# 'no_ans_gap': 11.688868522644043, -# 'node_id': 'Reader', -# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}}, -# 'query': 'Who is the father of Arya Stark?', -# 'root_node': 'Query' -# } -``` - - -```python -# ...or use a util to simplify the output -from haystack.utils import print_answers - - -# Change `minimum` to `medium` or `all` to control the level of detail -print_answers(prediction, details="minimum") -``` - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. 
- -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/03_finetune_a_reader.md b/markdowns/03_finetune_a_reader.md new file mode 100644 index 00000000..e381a8fa --- /dev/null +++ b/markdowns/03_finetune_a_reader.md @@ -0,0 +1,126 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_finetune_a_reader.ipynb +toc: True +title: "Fine-Tune a Reader" +last_updated: 2022-11-14 +level: "intermediate" +weight: 50 +description: Improve the performance of your Reader by performing fine-tuning. +category: "QA" +aliases: ['/tutorials/fine-tuning-a-model', '/tutorials/02_Finetune_a_model_on_your_data.ipynb', '/tutorials/fine-tune-reader'] +--- + + +# Fine-Tune a Reader + +- **Level**: Intermediate +- **Time to complete**: 20 minutes +- **Nodes Used**: `FARMReader` +- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning. + +## Overview + +Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. 
After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset. + +## Preparing the Colab Environment + +- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + + +## Creating Training Data + +To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. One sample from this data should contain a question, a text answer, and the document in which this answer can be found. + +You can start generating your own training data using one of the two tools that we offer: + +1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack. + +2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions with Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback). + + + +## Fine-tuning the Reader + +1. Initialize the Reader, supplying the name of the base model you wish to improve. 
+ + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True) +``` + +We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model that was trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + +2. Provide the SQuAD format training data to the `Reader.train()` method. + + +```python +data_dir = "data/squad20" +reader.train( + data_dir=data_dir, + train_filename="dev-v2.0.json", + use_gpu=True, + n_epochs=1, + save_dir="my_model" +) +``` + +With the default parameters above, we are starting with a base model trained on the SQuAD training dataset and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, replace `train_filename` with your domain-specific dataset. + +To perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments. + +## Saving and Loading + +The model is automatically saved at the end of fine-tuning in the `save_dir` that you specified. +However, you can also manually save the Reader again by running: + + +```python +reader.save(directory="my_model") +``` + +To load a saved model, run: + + +```python +new_reader = FARMReader(model_name_or_path="my_model") +``` + +# Next Steps + +Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance.
To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader). + +To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader). + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. + +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/21_distill_a_reader.md b/markdowns/21_distill_a_reader.md new file mode 100644 index 00000000..3ee211ce --- /dev/null +++ b/markdowns/21_distill_a_reader.md @@ -0,0 +1,165 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/21_distill_a_reader.ipynb +toc: True +title: "Distill a Reader" +last_updated: 2022-11-14 +level: "intermediate" +weight: 115 +description: Transfer a Reader's question answering ability to a smaller more efficient model. +category: "QA" +aliases: ['/tutorials/distill-reader'] +--- + + +# Distill a Reader + +- **Level**: Advanced +- **Time to complete**: 30 minutes +- **Nodes Used**: `FARMReader` +- **Goal**: Distil the question answering capabilities of the larger BERT base Reader model into a smaller TinyBERT Reader model. 
+ + +## Overview + +Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if want to learn more about it, see [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation). + +## Preparing the Colab Environment + +
+- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) +
+ + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + +## Augmenting Training Data + +Having more human-annotated training data is useful at all levels of model training. However, intermediate layer distillation can benefit even from synthetically generated data, since it is a less exact type of training. In this tutorial, we'll be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GloVe word embedding model. + +1. Download the `augment_squad.py` script. + + +```python +!wget https://raw.githubusercontent.com/deepset-ai/haystack/main/haystack/utils/augment_squad.py +``` + +2. Download a small slice of the SQuAD question answering dataset. + + +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/distil_a_reader" +squad_dir = doc_dir + "/squad" + +s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip" +fetch_archive_from_http(url=s3_url, output_dir=squad_dir) +``` + + 3. Download a set of GloVe vectors. + + +```python +glove_dir = doc_dir + "/glove" + +glove_url = "https://nlp.stanford.edu/data/glove.6B.zip" +fetch_archive_from_http(url=glove_url, output_dir=glove_dir) +``` + +This tutorial uses a smaller set of vectors and a smaller dataset to make it faster. For real use cases, pick larger versions of both. + +4. Run the `augment_squad.py` script to create an augmented dataset.
+ + +```python +!python augment_squad.py \ + --squad_path data/distil_a_reader/squad/squad_small.json \ + --glove_path data/distil_a_reader/glove/glove.6B.300d.txt \ + --output_path augmented_dataset.json \ + --multiplication_factor 2 +``` + +The multiplication factor determines how many augmented samples we're generating. Setting it to 2 makes it much quicker to run. In real use cases, set this to something like 20. + +## Distilling a Reader + +Distillation in Haystack is done in two phases: +- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it will improve the performance of the model after training. +- Prediction layer distillation optimizes the model for the specific task. This must be performed using the non-augmented data. + + +1. Initialize the teacher model. + + +```python +from haystack.nodes import FARMReader + +teacher = FARMReader(model_name_or_path="deepset/bert-base-uncased-squad2", use_gpu=True) +``` + +Here we are using [`deepset/bert-base-uncased-squad2`](https://huggingface.co/deepset/bert-base-uncased-squad2), a base-sized BERT model trained on SQuAD. + +2. Initialize the student model. + + +```python +student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_6L_768D", use_gpu=True) +``` + +Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of the number of layers in the student. + +3. Perform intermediate layer distillation. + + +```python +student.distil_intermediate_layers_from(teacher, data_dir=".", train_filename="augmented_dataset.json", use_gpu=True) +``` + +4. Perform prediction layer distillation.
+ + +```python +student.distil_prediction_layer_from(teacher, data_dir="data/squad20", train_filename="dev-v2.0.json", use_gpu=True) +``` + +5. Save the student model. + + +```python +student.save(directory="my_distilled_model") +``` + + + +# Next Steps + +To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader). + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. + +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) From 0eb9de3d36b9fef38185d32da3474f9860a1e2f2 Mon Sep 17 00:00:00 2001 From: Branden Chan <33759007+brandenchan@users.noreply.github.com> Date: Wed, 16 Nov 2022 13:51:05 +0100 Subject: [PATCH 30/48] Update index.toml Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> --- index.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.toml b/index.toml index f0795799..d1c287e7 100644 --- a/index.toml +++ b/index.toml @@ -13,7 +13,7 @@ aliases = ["first-qa-system", "01_Basic_QA_Pipeline.ipynb"] [[tutorial]] title = "Build a Scalable Question Answering System" -description = "Create a scalable Retriever Reader pipeline with an Elasticsearch Document Store." +description = "Create a scalable Retriever-Reader pipeline with an Elasticsearch DocumentStore." 
level = "beginner" weight = 15 notebook = "02_build_a_scalable_question_answering_system.ipynb" From a7318c52045ad94feb90069a20bc609617b4e416 Mon Sep 17 00:00:00 2001 From: Branden Chan <33759007+brandenchan@users.noreply.github.com> Date: Wed, 16 Nov 2022 13:51:29 +0100 Subject: [PATCH 31/48] Update index.toml Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> --- index.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.toml b/index.toml index d1c287e7..ca5c08d2 100644 --- a/index.toml +++ b/index.toml @@ -157,7 +157,7 @@ aliases = ["multimodal"] [[tutorial]] title = "Distill a Reader" -description = "Transfer a Reader's question answering ability to a smaller more efficient model." +description = "Transfer a Reader's question answering ability to a smaller, more efficient model." level = "intermediate" weight = 115 notebook = "21_distill_a_reader.ipynb" From 7e7b4f38eda68f0676358055b4d50d03fd43a22d Mon Sep 17 00:00:00 2001 From: Branden Chan <33759007+brandenchan@users.noreply.github.com> Date: Wed, 16 Nov 2022 14:22:37 +0100 Subject: [PATCH 32/48] Update tutorials/03_finetune_a_reader.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> --- tutorials/03_finetune_a_reader.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/03_finetune_a_reader.ipynb b/tutorials/03_finetune_a_reader.ipynb index e038a0e2..1e8cf859 100644 --- a/tutorials/03_finetune_a_reader.ipynb +++ b/tutorials/03_finetune_a_reader.ipynb @@ -17,7 +17,7 @@ "source": [ "## Overview\n", "\n", - "Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. 
While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." + "Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." ], "metadata": { "collapsed": false From 7c437c43cf0b6b06bd13e66de4afe38085322472 Mon Sep 17 00:00:00 2001 From: Branden Chan <33759007+brandenchan@users.noreply.github.com> Date: Wed, 16 Nov 2022 14:23:14 +0100 Subject: [PATCH 33/48] Update tutorials/03_finetune_a_reader.ipynb Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> --- tutorials/03_finetune_a_reader.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/03_finetune_a_reader.ipynb b/tutorials/03_finetune_a_reader.ipynb index 1e8cf859..6dd61754 100644 --- a/tutorials/03_finetune_a_reader.ipynb +++ b/tutorials/03_finetune_a_reader.ipynb @@ -65,7 +65,7 @@ "\n", "## Creating Training Data\n", "\n", - "To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. One sample from this data should contain a question, a text answer, and the document in which this answer can be found.\n", + "To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. 
One sample from this data should contain a question, a text answer, and the document containing the answer.\n", "\n", "You can start generating your own training data using one of the two tools that we offer:\n", "\n", From f04a193d224c2c1b365a8d6bedabadd5df216c95 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 16 Nov 2022 14:24:37 +0100 Subject: [PATCH 34/48] Incorporate Reviewer feedback --- .../01_build_your_first_question_answering_system.ipynb | 8 ++++---- .../02_build_a_scalable_question_answering_system.ipynb | 2 +- tutorials/03_finetune_a_reader.ipynb | 4 ++-- tutorials/21_distill_a_reader.ipynb | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index cc5e6f5d..598ae774 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Build Your First Question Answering System\n", + "# Tutorial: Build Your First Question Answering System\n", "\n", "- **Level**: Beginner\n", "- **Time to complete**: 15 minutes\n", @@ -17,11 +17,11 @@ "source": [ "## Overview\n", "\n", - "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", + "Let's learn how to build a question answering system using Haystack's DocumentStore, Retriever, and Reader. 
Given a question like \"Who is the father of Arya Stark?\", this program will search through a knowledge base and look for a fitting answer.\n", "\n", - "This tutorial will introduce you to all the concepts needed to build such a question answering system. However, certain setup steps, such as Document preparation and indexing as well as pipeline initialization, are simplified so that you can get started quicker.\n", + "While the documents we are using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis, or a collection of financial reports and still receive answers to questions on these topics.\n", "\n", - "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!" + "To help you get started quicker, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs." 
], "metadata": { "collapsed": false diff --git a/tutorials/02_build_a_scalable_question_answering_system.ipynb b/tutorials/02_build_a_scalable_question_answering_system.ipynb index 9e4da266..3f231735 100644 --- a/tutorials/02_build_a_scalable_question_answering_system.ipynb +++ b/tutorials/02_build_a_scalable_question_answering_system.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Build a Scalable Question Answering System\n", + "# Tutorial: Build a Scalable Question Answering System\n", "\n", "- **Level**: Beginner\n", "- **Time to complete**: 20 minutes\n", diff --git a/tutorials/03_finetune_a_reader.ipynb b/tutorials/03_finetune_a_reader.ipynb index 6dd61754..cf71d783 100644 --- a/tutorials/03_finetune_a_reader.ipynb +++ b/tutorials/03_finetune_a_reader.ipynb @@ -4,12 +4,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Fine-Tune a Reader\n", + "# Tutorial: Fine-Tune a Reader to Improve its Performance\n", "\n", "- **Level**: Intermediate\n", "- **Time to complete**: 20 minutes\n", "- **Nodes Used**: `FARMReader`\n", - "- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning." + "- **Goal**: Learn how to improve the performance of a DistilBERT Reader model by performing further training on the SQuAD dataset." 
] }, { diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index 48e51515..da6caed3 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distill a Reader\n", + "# Tutorial: Distill a Reader\n", "\n", "- **Level**: Advanced\n", "- **Time to complete**: 30 minutes\n", From 627ec52dc6f49776b8187edf83fd54bc96982214 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 16 Nov 2022 14:25:38 +0100 Subject: [PATCH 35/48] Regenerate markdown --- .../01_build_your_first_question_answering_system.md | 10 +++++----- .../02_build_a_scalable_question_answering_system.md | 6 +++--- markdowns/03_finetune_a_reader.md | 10 +++++----- markdowns/21_distill_a_reader.md | 6 +++--- 4 files changed, 16 insertions(+), 16 deletions(-) diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md index eaf0e0ff..59dd166a 100644 --- a/markdowns/01_build_your_first_question_answering_system.md +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -3,7 +3,7 @@ layout: tutorial colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_build_your_first_question_answering_system.ipynb toc: True title: "Build Your First Question Answering System" -last_updated: 2022-11-14 +last_updated: 2022-11-16 level: "beginner" weight: 10 description: Get Started by creating a Retriever Reader pipeline. 
@@ -12,7 +12,7 @@ aliases: ['/tutorials/first-qa-system', '/tutorials/01_Basic_QA_Pipeline.ipynb'] --- -# Build Your First Question Answering System +# Tutorial: Build Your First Question Answering System - **Level**: Beginner - **Time to complete**: 15 minutes @@ -22,11 +22,11 @@ aliases: ['/tutorials/first-qa-system', '/tutorials/01_Basic_QA_Pipeline.ipynb'] ## Overview -Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as "Who is the father of Arya Stark?" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example. +Let's learn how to build a question answering system using Haystack's DocumentStore, Retriever, and Reader. Given a question like "Who is the father of Arya Stark?", this program will search through a knowledge base and look for a fitting answer. -This tutorial will introduce you to all the concepts needed to build such a question answering system. However, certain setup steps, such as Document preparation and indexing as well as pipeline initialization, are simplified so that you can get started quicker. +While the documents we are using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis, or a collection of financial reports and still receive answers to questions on these topics. -Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! +To help you get started quicker, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs. 
## Preparing the Colab Environment diff --git a/markdowns/02_build_a_scalable_question_answering_system.md b/markdowns/02_build_a_scalable_question_answering_system.md index 7c08a04a..b7cb4efe 100644 --- a/markdowns/02_build_a_scalable_question_answering_system.md +++ b/markdowns/02_build_a_scalable_question_answering_system.md @@ -3,16 +3,16 @@ layout: tutorial colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_build_a_scalable_question_answering_system.ipynb toc: True title: "Build a Scalable Question Answering System" -last_updated: 2022-11-14 +last_updated: 2022-11-16 level: "beginner" weight: 15 -description: Create a scalable Retriever Reader pipeline with an Elasticsearch Document Store. +description: Create a scalable Retriever-Reader pipeline with an Elasticsearch DocumentStore. category: "QA" aliases: ['/tutorials/without-elasticsearch', '/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb', '/tutorials/scalable-qa-system'] --- -# Build a Scalable Question Answering System +# Tutorial: Build a Scalable Question Answering System - **Level**: Beginner - **Time to complete**: 20 minutes diff --git a/markdowns/03_finetune_a_reader.md b/markdowns/03_finetune_a_reader.md index e381a8fa..4c7c8174 100644 --- a/markdowns/03_finetune_a_reader.md +++ b/markdowns/03_finetune_a_reader.md @@ -3,7 +3,7 @@ layout: tutorial colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_finetune_a_reader.ipynb toc: True title: "Fine-Tune a Reader" -last_updated: 2022-11-14 +last_updated: 2022-11-16 level: "intermediate" weight: 50 description: Improve the performance of your Reader by performing fine-tuning. 
@@ -12,16 +12,16 @@ aliases: ['/tutorials/fine-tuning-a-model', '/tutorials/02_Finetune_a_model_on_y --- -# Fine-Tune a Reader +# Tutorial: Fine-Tune a Reader to Improve its Performance - **Level**: Intermediate - **Time to complete**: 20 minutes - **Nodes Used**: `FARMReader` -- **Goal**: Learn how to improve the performance of a Reader model by performing fine-tuning. +- **Goal**: Learn how to improve the performance of a DistilBERT Reader model by performing further training on the SQuAD dataset. ## Overview -Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset. +Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset. ## Preparing the Colab Environment @@ -45,7 +45,7 @@ pip install farm-haystack[colab] ## Creating Training Data -To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. 
One sample from this data should contain a question, a text answer, and the document in which this answer can be found. +To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. One sample from this data should contain a question, a text answer, and the document containing the answer. You can start generating your own training data using one of the two tools that we offer: diff --git a/markdowns/21_distill_a_reader.md b/markdowns/21_distill_a_reader.md index 3ee211ce..77d7b810 100644 --- a/markdowns/21_distill_a_reader.md +++ b/markdowns/21_distill_a_reader.md @@ -3,16 +3,16 @@ layout: tutorial colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/21_distill_a_reader.ipynb toc: True title: "Distill a Reader" -last_updated: 2022-11-14 +last_updated: 2022-11-16 level: "intermediate" weight: 115 -description: Transfer a Reader's question answering ability to a smaller more efficient model. +description: Transfer a Reader's question answering ability to a smaller, more efficient model. 
category: "QA" aliases: ['/tutorials/distill-reader'] --- -# Distill a Reader +# Tutorial: Distill a Reader - **Level**: Advanced - **Time to complete**: 30 minutes From a52d1bc63599fb587e8d218d2ec804c59d11edef Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 16 Nov 2022 17:33:52 +0100 Subject: [PATCH 36/48] Edit colab env setup sections --- .../01_build_your_first_question_answering_system.md | 6 ++---- .../02_build_a_scalable_question_answering_system.md | 6 ++---- markdowns/03_finetune_a_reader.md | 6 ++---- markdowns/21_distill_a_reader.md | 8 ++------ 4 files changed, 8 insertions(+), 18 deletions(-) diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md index 59dd166a..ac55ce4b 100644 --- a/markdowns/01_build_your_first_question_answering_system.md +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -31,10 +31,8 @@ To help you get started quicker, we have simplified certain steps in this tutori ## Preparing the Colab Environment -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - +- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) ## Installing Haystack diff --git a/markdowns/02_build_a_scalable_question_answering_system.md b/markdowns/02_build_a_scalable_question_answering_system.md index b7cb4efe..8ca77049 100644 --- a/markdowns/02_build_a_scalable_question_answering_system.md +++ b/markdowns/02_build_a_scalable_question_answering_system.md @@ -31,10 +31,8 @@ Let's learn how to build a question answering system and discover more about the 
## Preparing the Colab Environment -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - +- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) ## Installing Haystack diff --git a/markdowns/03_finetune_a_reader.md b/markdowns/03_finetune_a_reader.md index 4c7c8174..54c3509f 100644 --- a/markdowns/03_finetune_a_reader.md +++ b/markdowns/03_finetune_a_reader.md @@ -25,10 +25,8 @@ Fine-tuning can improve your Reader's performance on question answering, especia ## Preparing the Colab Environment -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - +- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) ## Installing Haystack diff --git a/markdowns/21_distill_a_reader.md b/markdowns/21_distill_a_reader.md index 77d7b810..e1d9f393 100644 --- a/markdowns/21_distill_a_reader.md +++ b/markdowns/21_distill_a_reader.md @@ -26,12 +26,8 @@ Model distillation is the process of teaching a smaller model to imitate the per ## Preparing the Colab Environment -
-- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) -
- +- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) ## Installing Haystack From 2dc469a28d5a2bdca6d3a14849b6cd5c8a27b786 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 16 Nov 2022 17:34:29 +0100 Subject: [PATCH 37/48] Regenerate MD files --- .../01_build_your_first_question_answering_system.md | 6 ++++-- .../02_build_a_scalable_question_answering_system.md | 6 ++++-- markdowns/03_finetune_a_reader.md | 6 ++++-- markdowns/21_distill_a_reader.md | 8 ++++++-- 4 files changed, 18 insertions(+), 8 deletions(-) diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md index ac55ce4b..59dd166a 100644 --- a/markdowns/01_build_your_first_question_answering_system.md +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -31,8 +31,10 @@ To help you get started quicker, we have simplified certain steps in this tutori ## Preparing the Colab Environment -- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) +- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + ## Installing Haystack diff --git a/markdowns/02_build_a_scalable_question_answering_system.md b/markdowns/02_build_a_scalable_question_answering_system.md index 8ca77049..b7cb4efe 100644 --- a/markdowns/02_build_a_scalable_question_answering_system.md +++ b/markdowns/02_build_a_scalable_question_answering_system.md 
@@ -31,8 +31,10 @@ Let's learn how to build a question answering system and discover more about the ## Preparing the Colab Environment -- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) +- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + ## Installing Haystack diff --git a/markdowns/03_finetune_a_reader.md b/markdowns/03_finetune_a_reader.md index 54c3509f..4c7c8174 100644 --- a/markdowns/03_finetune_a_reader.md +++ b/markdowns/03_finetune_a_reader.md @@ -25,8 +25,10 @@ Fine-tuning can improve your Reader's performance on question answering, especia ## Preparing the Colab Environment -- [Enabling the GPU in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) +- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + ## Installing Haystack diff --git a/markdowns/21_distill_a_reader.md b/markdowns/21_distill_a_reader.md index e1d9f393..77d7b810 100644 --- a/markdowns/21_distill_a_reader.md +++ b/markdowns/21_distill_a_reader.md @@ -26,8 +26,12 @@ Model distillation is the process of teaching a smaller model to imitate the per ## Preparing the Colab Environment -- [Enabling the GPU in 
Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/faq#why-is-haystack-not-logging-everything-to-the-console) +
+- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) +
+ ## Installing Haystack From 11b1d33adab3f2d29be0c9fb9b278327703bb373 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 22 Nov 2022 16:20:19 +0100 Subject: [PATCH 38/48] Incorporate reviewer feedback --- markdowns/01_build_your_first_question_answering_system.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md index 59dd166a..864ca791 100644 --- a/markdowns/01_build_your_first_question_answering_system.md +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -80,8 +80,7 @@ fetch_archive_from_http( ```python import os -from haystack.pipelines.standard_pipelines import TextIndexingPipeline -# from text_indexing_pipeline import TextIndexingPipeline +from haystack.pipelines import TextIndexingPipeline files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] indexing_pipeline = TextIndexingPipeline(document_store) @@ -100,9 +99,9 @@ Retrievers sift through all the Documents and return only those that it thinks m ```python -from haystack.nodes import TfidfRetriever +from haystack.nodes import BM25Retriever -retriever = TfidfRetriever(document_store=document_store) +retriever = BM25Retriever(document_store=document_store) ``` ## Initializing the Reader From f4c77f6639e1b2eabd428051362fe7d217d7f72d Mon Sep 17 00:00:00 2001 From: brandenchan Date: Wed, 23 Nov 2022 09:21:27 +0100 Subject: [PATCH 39/48] Set use_bm25 argument --- markdowns/01_build_your_first_question_answering_system.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md index 864ca791..b7e4d60f 100644 --- a/markdowns/01_build_your_first_question_answering_system.md +++ b/markdowns/01_build_your_first_question_answering_system.md @@ -56,7 +56,7 @@ A DocumentStore stores the Documents that the question 
answering system uses to ```python from haystack.document_stores import InMemoryDocumentStore -document_store = InMemoryDocumentStore() +document_store = InMemoryDocumentStore(use_bm25=True) ``` ## Preparing Documents From 1843731874cfe83d9868aade0326548181eed77d Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 29 Nov 2022 16:33:41 +0100 Subject: [PATCH 40/48] Update naming --- index.toml | 16 +++++++++------- ...a_reader.ipynb => 02_finetune_a_reader.ipynb} | 0 ...d_a_scalable_question_answering_system.ipynb} | 0 3 files changed, 9 insertions(+), 7 deletions(-) rename tutorials/{03_finetune_a_reader.ipynb => 02_finetune_a_reader.ipynb} (100%) rename tutorials/{02_build_a_scalable_question_answering_system.ipynb => 03_build_a_scalable_question_answering_system.ipynb} (100%) diff --git a/index.toml b/index.toml index 8b14598d..1ff2e955 100644 --- a/index.toml +++ b/index.toml @@ -8,25 +8,27 @@ title = "Build Your First Question Answering System" description = "Get Started by creating a Retriever Reader pipeline." level = "beginner" weight = 10 -notebook = "01_Basic_QA_Pipeline.ipynb" +notebook = "01_build_your_first_question_answering_system.ipynb" aliases = ["first-qa-system"] -slug = "01_Basic_QA_Pipeline" +slug = "01_basic_qa_pipeline" [[tutorial]] title = "Fine-Tune a Reader" description = "Improve the performance of your Reader by performing fine-tuning." level = "intermediate" weight = 50 -notebook = "02_Finetune_a_model_on_your_data.ipynb" -aliases = ["fine-tuning-a-model"] +notebook = "02_finetune_a_reader.ipynb" +aliases = ["fine-tuning-a-model", "02_finetune_a_model_on_your_data", "fine-tune-a-model"] +slug = "02_finetune_a_model_on_your_data" [[tutorial]] -title = "Build a QA System Without Elasticsearch" -description = "Create a Retriever Reader pipeline that requires no external database dependencies." 
+title = "Build a Scalable Question Answering System" +description = "Create a scalable Retriever Reader pipeline that uses an ElasticsearchDocumentStore." level = "beginner" weight = 15 -notebook = "03_Basic_QA_Pipeline_without_Elasticsearch.ipynb" +notebook = "03_build_a_scalable_question_answering_system.ipynb.ipynb" aliases = ["without-elasticsearch"] +slug = "03_scalable_qa_pipeline" [[tutorial]] title = "Utilizing Existing FAQs for Question Answering" diff --git a/tutorials/03_finetune_a_reader.ipynb b/tutorials/02_finetune_a_reader.ipynb similarity index 100% rename from tutorials/03_finetune_a_reader.ipynb rename to tutorials/02_finetune_a_reader.ipynb diff --git a/tutorials/02_build_a_scalable_question_answering_system.ipynb b/tutorials/03_build_a_scalable_question_answering_system.ipynb similarity index 100% rename from tutorials/02_build_a_scalable_question_answering_system.ipynb rename to tutorials/03_build_a_scalable_question_answering_system.ipynb From b785f561e41181f370044e1b15b51c60784be599 Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 29 Nov 2022 16:38:04 +0100 Subject: [PATCH 41/48] Delete old md and regenerate new md --- markdowns/01_Basic_QA_Pipeline.md | 328 ------------------ ...ld_your_first_question_answering_system.md | 197 ----------- markdowns/02_Finetune_a_model_on_your_data.md | 175 ---------- ...ld_a_scalable_question_answering_system.md | 280 --------------- ...Basic_QA_Pipeline_without_Elasticsearch.md | 258 -------------- markdowns/03_finetune_a_reader.md | 126 ------- 6 files changed, 1364 deletions(-) delete mode 100644 markdowns/01_Basic_QA_Pipeline.md delete mode 100644 markdowns/01_build_your_first_question_answering_system.md delete mode 100644 markdowns/02_Finetune_a_model_on_your_data.md delete mode 100644 markdowns/02_build_a_scalable_question_answering_system.md delete mode 100644 markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md delete mode 100644 markdowns/03_finetune_a_reader.md diff --git 
a/markdowns/01_Basic_QA_Pipeline.md b/markdowns/01_Basic_QA_Pipeline.md deleted file mode 100644 index 03e5bd75..00000000 --- a/markdowns/01_Basic_QA_Pipeline.md +++ /dev/null @@ -1,328 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb -toc: True -title: "Build Your First QA System" -last_updated: 2022-11-24 -level: "beginner" -weight: 10 -description: Get Started by creating a Retriever Reader pipeline. -category: "QA" -aliases: ['/tutorials/first-qa-system'] -download: "/downloads/01_Basic_QA_Pipeline.ipynb" ---- - - - - - -Question Answering can be used in a variety of use cases. A very common one: Using it to navigate through complex knowledge bases or long documents ("search setting"). - -A "knowledge base" could for example be your website, an internal wiki or a collection of financial reports. -In this tutorial we will work on a slightly different domain: "Game of Thrones". - -Let's see how we can use a bunch of Wikipedia articles to answer a variety of questions about the -marvellous seven kingdoms. - - -### Prepare environment - -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. -**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - - - -You can double check whether the GPU runtime is enabled with the following command: - - -```bash -%%bash - -nvidia-smi -``` - -To start, install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. 
-Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - -## Document Store - -Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`. - -**Here:** We recommended Elasticsearch as it comes preloaded with features like [full-text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html), [BM25 retrieval](https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25), and [vector storage for text embeddings](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dense-vector.html). - -**Alternatives:** If you are unable to setup an Elasticsearch instance, then follow the [Tutorial 3](https://github.com/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb) for using SQL/InMemory document stores. - -**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores. - -### Start an Elasticsearch server locally -You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source. 
- - -```python -# Recommended: Start Elasticsearch using Docker via the Haystack utility function -from haystack.utils import launch_es - -launch_es() -``` - -### Start an Elasticsearch server in Colab - -If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source. - - -```bash -%%bash - -wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q -tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz -chown -R daemon:daemon elasticsearch-7.9.2 -``` - - -```bash -%%bash --bg - -sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -``` - -### Create the Document Store - -The `ElasticsearchDocumentStore` class will try to open a connection in the constructor, here we wait 30 seconds only to be sure Elasticsearch is ready before continuing: - - -```python -import time -time.sleep(30) -``` - -Finally, we create the Document Store instance: - - -```python -import os -from haystack.document_stores import ElasticsearchDocumentStore - -# Get the host where Elasticsearch is running, default to localhost -host = os.environ.get("ELASTICSEARCH_HOST", "localhost") -document_store = ElasticsearchDocumentStore(host=host, username="", password="", index="document") -``` - -## Preprocessing of documents - -Haystack provides a customizable pipeline for: - - converting files into texts - - cleaning texts - - splitting texts - - writing them to a Document Store - -In this tutorial, we download Wikipedia articles about Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch. 
- - -```python -from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http - - -# Let's first fetch some documents that we want to query -# Here: 517 Wikipedia articles for Game of Thrones -doc_dir = "data/tutorial1" -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# Convert files to dicts -# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers) -# It must take a str as input, and return a str. -docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True) - -# We now have a list of dictionaries that we can write to our document store. -# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself. -# The default format here is: -# { -# 'content': "", -# 'meta': {'name': "", ...} -# } -# (Optionally: you can also add more key-value-pairs here, that will be indexed as fields in Elasticsearch and -# can be accessed later for filtering or shown in the responses of the Pipeline) - -# Let's have a look at the first 3 entries: -print(docs[:3]) - -# Now, let's write the dicts containing documents to our DB. -document_store.write_documents(docs) -``` - -## Initialize Retriever, Reader & Pipeline - -### Retriever - -Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered. -They use some simple but fast algorithm. - -**Here:** We use Elasticsearch's default BM25 algorithm - -**Alternatives:** - -- Customize the `BM25Retriever`with custom queries (e.g. boosting) and filters -- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging -- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. 
created via Sentence-BERT) -- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6) - - -```python -from haystack.nodes import BM25Retriever - -retriever = BM25Retriever(document_store=document_store) -``` - - -```python -# Alternative: An in-memory TfidfRetriever based on Pandas dataframes for building quick-prototypes with SQLite document store. - -# from haystack.nodes import TfidfRetriever -# retriever = TfidfRetriever(document_store=document_store) -``` - -### Reader - -A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based -on powerful, but slower deep learning models. - -Haystack currently supports Readers based on the frameworks FARM and Transformers. -With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models). - -**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2) - -**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package) - -**Alternatives (Models):** e.g. "distilbert-base-uncased-distilled-squad" (fast) or "deepset/bert-large-uncased-whole-word-masking-squad2" (good accuracy) - -**Hint:** You can adjust the model to return "no answer possible" with the no_ans_boost. 
Higher values mean the model prefers "no answer possible" - -#### FARMReader - - -```python -from haystack.nodes import FARMReader - -# Load a local model or any of the QA models on -# Hugging Face's model hub (https://huggingface.co/models) - -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -#### TransformersReader - -Alternative: - - -```python -from haystack.nodes import TransformersReader -# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1) -``` - -### Pipeline - -With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline. -Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases. -To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions. -You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelines). - - -```python -from haystack.pipelines import ExtractiveQAPipeline - -pipe = ExtractiveQAPipeline(reader, retriever) -``` - -## Voilà! Ask a question! - - -```python -# You can configure how many candidates the Reader and Retriever shall return -# The higher top_k_retriever, the better (but also the slower) your answers. -prediction = pipe.run( - query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}} -) -``` - - -```python -# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}}) -# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}}) -``` - -Now you can either print the object directly: - - -```python -from pprint import pprint - -pprint(prediction) - -# Sample output: -# { -# 'answers': [ , -# , -# ... -# ] -# 'documents': [ , -# , -# ... 
-# ], -# 'no_ans_gap': 11.688868522644043, -# 'node_id': 'Reader', -# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}}, -# 'query': 'Who is the father of Arya Stark?', -# 'root_node': 'Query' -# } -``` - -Or use a util to simplify the output: - - -```python -from haystack.utils import print_answers - -# Change `minimum` to `medium` or `all` to raise the level of detail -print_answers(prediction, details="minimum") -``` - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. - -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) - diff --git a/markdowns/01_build_your_first_question_answering_system.md b/markdowns/01_build_your_first_question_answering_system.md deleted file mode 100644 index b7e4d60f..00000000 --- a/markdowns/01_build_your_first_question_answering_system.md +++ /dev/null @@ -1,197 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_build_your_first_question_answering_system.ipynb -toc: True -title: "Build Your First Question Answering System" -last_updated: 2022-11-16 -level: "beginner" -weight: 10 -description: Get Started by creating a Retriever Reader pipeline. 
-category: "QA" -aliases: ['/tutorials/first-qa-system', '/tutorials/01_Basic_QA_Pipeline.ipynb'] ---- - - -# Tutorial: Build Your First Question Answering System - -- **Level**: Beginner -- **Time to complete**: 15 minutes -- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader` -- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series. - - -## Overview - -Let's learn how to build a question answering system using Haystack's DocumentStore, Retriever, and Reader. Given a question like "Who is the father of Arya Stark?", this program will search through a knowledge base and look for a fitting answer. - -While the documents we are using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis, or a collection of financial reports and still receive answers to questions on these topics. - -To help you get started quicker, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs. 
- - -## Preparing the Colab Environment - -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - - -## Installing Haystack - -To start, let's install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install farm-haystack[colab] -``` - -## Initializing the DocumentStore - -A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore` which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale up so well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). - - -```python -from haystack.document_stores import InMemoryDocumentStore - -document_store = InMemoryDocumentStore(use_bm25=True) -``` - -## Preparing Documents - -1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/tutorial1` as a set of `.txt` files. - - -```python -from haystack.utils import fetch_archive_from_http - -doc_dir = "data/build_your_first_question_answering_system" - -fetch_archive_from_http( - url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", - output_dir=doc_dir -) -``` - -2. Use the `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore. 
- - -```python -import os -from haystack.pipelines import TextIndexingPipeline - -files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] -indexing_pipeline = TextIndexingPipeline(document_store) -indexing_pipeline.run_batch(file_paths=files_to_index) - - -``` - -While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. - -As an alternative, you can cast you text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`. - -## Initializing the Retriever - -Retrievers sift through all the Documents and return only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). - - -```python -from haystack.nodes import BM25Retriever - -retriever = BM25Retriever(document_store=document_store) -``` - -## Initializing the Reader - -A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). - - -```python -from haystack.nodes import FARMReader - -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -## Creating the Retriever-Reader Pipeline - -The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. 
- - -```python -from haystack.pipelines import ExtractiveQAPipeline - -pipe = ExtractiveQAPipeline(reader, retriever) -``` - -## Asking a Question - -1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). - - - -```python -prediction = pipe.run( - query="Who is the father of Arya Stark?", - params={ - "Retriever": {"top_k": 10}, - "Reader": {"top_k": 5} - } -) -``` - -Here are some questions you could try out: -- Who is the father of Arya Stark? -- Who created the Dothraki vocabulary? -- Who is the sister of Sansa? - -2. The answers returned by the pipeline can be printed out directly: - - -```python -from pprint import pprint - -pprint(prediction) -``` - -3. Simplify the printed answers: - - -```python -from haystack.utils import print_answers - -print_answers( - prediction, - details="minimum" ## Choose from `minimum`, `medium` and `all` -) -``` - -And there you have it! Congratulations on building your first machine learning based question answering system! - -# Next Steps - -Check out [Build a Scalable Question Answering System](https://haystack.deepset.ai/tutorials/02_build_a_scalable_question_answering_system) to learn how to make a more advanced question answering system that uses an Elasticsearch backed DocumentStore and makes more use of the flexibility that pipelines offer. - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! 
-Our focus: Industry specific language models & large scale QA systems. - -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) - - - -```python - -``` diff --git a/markdowns/02_Finetune_a_model_on_your_data.md b/markdowns/02_Finetune_a_model_on_your_data.md deleted file mode 100644 index a4572607..00000000 --- a/markdowns/02_Finetune_a_model_on_your_data.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb -toc: True -title: "Fine-Tuning a Model on Your Own Data" -last_updated: 2022-11-24 -level: "intermediate" -weight: 50 -description: Improve the performance of your Reader by performing fine-tuning. -category: "QA" -aliases: ['/tutorials/fine-tuning-a-model'] -download: "/downloads/02_Finetune_a_model_on_your_data.ipynb" ---- - - - -For many use cases it is sufficient to just use one of the existing public models that were trained on SQuAD or other public QA datasets (e.g. Natural Questions). -However, if you have domain-specific questions, fine-tuning your model on custom examples will very likely boost your performance. -While this varies by domain, we saw that ~ 2000 examples can easily increase performance by +5-20%. - -This tutorial shows you how to fine-tune a pretrained model on your own dataset. - -### Prepare environment - -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. 
-**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - - - - -```python -# Make sure you have a GPU running -!nvidia-smi -``` - - -```python -# Install the latest release of Haystack in your own environment -#! pip install farm-haystack - -# Install the latest main of Haystack -!pip install --upgrade pip -!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. -Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - - -```python -from haystack.nodes import FARMReader -from haystack.utils import fetch_archive_from_http -``` - - -## Create Training Data - -There are two ways to generate training data - -1. **Annotation**: You can use the [annotation tool](https://haystack.deepset.ai/guides/annotation) to label your data, i.e. highlighting answers to your questions in a document. The tool supports structuring your workflow with organizations, projects, and users. The labels can be exported in SQuAD format that is compatible for training with Haystack. - -![Snapshot of the annotation tool](https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/annotation_tool.png) - -2. **Feedback**: For production systems, you can collect training data from direct user feedback via Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api). This includes a customizable user feedback API for providing feedback on the answer returned by the API. 
The API provides a feedback export endpoint to obtain the feedback data for fine-tuning your model further. - - -## Fine-tune your model - -Once you have collected training data, you can fine-tune your base models. -We initialize a reader as a base model and fine-tune it on our own custom dataset (should be in SQuAD-like format). -We recommend using a base model that was trained on SQuAD or a similar QA dataset before to benefit from Transfer Learning effects. - -**Recommendation**: Run training on a GPU. -If you are using Colab: Enable this in the menu "Runtime" > "Change Runtime type" > Select "GPU" in dropdown. -Then change the `use_gpu` arguments below to `True` - - -```python -reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True) -data_dir = "data/squad20" -# data_dir = "PATH/TO_YOUR/TRAIN_DATA" -reader.train(data_dir=data_dir, train_filename="dev-v2.0.json", use_gpu=True, n_epochs=1, save_dir="my_model") -``` - - -```python -# Saving the model happens automatically at the end of training into the `save_dir` you specified -# However, you could also save a reader manually again via: -reader.save(directory="my_model") -``` - - -```python -# If you want to load it at a later point, just do: -new_reader = FARMReader(model_name_or_path="my_model") -``` - -## Distill your model -In this case, we have used "distilbert-base-uncased" as our base model. This model was trained using a process called distillation. In this process, a bigger model is trained first and is used to train a smaller model which increases its accuracy. This is why "distilbert-base-uncased" can achieve quite competitive performance while being very small. - -Sometimes, however, you can't use an already distilled model and have to distil it yourself. For this case, haystack has implemented [distillation features](https://haystack.deepset.ai/guides/model-distillation). 
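The distillation idea described above has a smaller student learn from a larger teacher's outputs. The standard way to express this is a temperature-softened KL divergence between the two models' output distributions. The pure-Python sketch below assumes that textbook formulation. The names `softmax` and `distillation_loss` and the default `temperature=2.0` are illustrative choices. It is not taken from Haystack, and whether `distil_prediction_layer_from` uses exactly this loss is an assumption.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper illustrating generic knowledge distillation,
    not Haystack's internal implementation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The loss is zero when the student reproduces the teacher's logits and grows as their answer distributions diverge.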
- -### Augmenting your training data -To get the most out of model distillation, we recommend increasing the size of your training data by using data augmentation. You can do this by running the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py): - - -```python -# Downloading script -!wget https://raw.githubusercontent.com/deepset-ai/haystack/main/haystack/utils/augment_squad.py - -doc_dir = "data/tutorial2" - -# Downloading smaller glove vector file (only for demonstration purposes) -glove_url = "https://nlp.stanford.edu/data/glove.6B.zip" -fetch_archive_from_http(url=glove_url, output_dir=doc_dir) - -# Downloading very small dataset to make tutorial faster (please use a bigger dataset for real use cases) -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# Just replace the path with your dataset and adjust the output (also please remove glove path to use bigger glove vector file) -!python augment_squad.py --squad_path squad_small.json --output_path augmented_dataset.json --multiplication_factor 2 --glove_path glove.6B.300d.txt -``` - -In this case, we use a multiplication factor of 2 to keep this example lightweight. Usually you would use a factor like 20 depending on the size of your training data. Augmenting this small dataset with a multiplication factor of 2, should take about 5 to 10 minutes to run on one V100 GPU. - -### Running distillation -Distillation in haystack is done in two steps: First, you run intermediate layer distillation on the augmented dataset to ensure the two models behave similarly. After that, you run the prediction layer distillation on the non-augmented dataset to optimize the model for your specific task. - -If you want, you can leave out the intermediate layer distillation step and only run the prediction layer distillation. 
This way you also do not need to perform data augmentation. However, this will make the model significantly less accurate. - - -```python -# Loading a fine-tuned model as teacher e.g. "deepset/​bert-​base-​uncased-​squad2" -teacher = FARMReader(model_name_or_path="my_model", use_gpu=True) - -# You can use any pre-trained language model as teacher that uses the same tokenizer as the teacher model. -# The number of the layers in the teacher model also needs to be a multiple of the number of the layers in the student. -student = FARMReader(model_name_or_path="huawei-noah/TinyBERT_General_6L_768D", use_gpu=True) - -student.distil_intermediate_layers_from(teacher, data_dir=".", train_filename="augmented_dataset.json", use_gpu=True) -student.distil_prediction_layer_from(teacher, data_dir="data/squad20", train_filename="dev-v2.0.json", use_gpu=True) - -student.save(directory="my_distilled_model") -``` - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. 
- -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/02_build_a_scalable_question_answering_system.md b/markdowns/02_build_a_scalable_question_answering_system.md deleted file mode 100644 index b7cb4efe..00000000 --- a/markdowns/02_build_a_scalable_question_answering_system.md +++ /dev/null @@ -1,280 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_build_a_scalable_question_answering_system.ipynb -toc: True -title: "Build a Scalable Question Answering System" -last_updated: 2022-11-16 -level: "beginner" -weight: 15 -description: Create a scalable Retriever-Reader pipeline with an Elasticsearch DocumentStore. -category: "QA" -aliases: ['/tutorials/without-elasticsearch', '/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb', '/tutorials/scalable-qa-system'] ---- - - -# Tutorial: Build a Scalable Question Answering System - -- **Level**: Beginner -- **Time to complete**: 20 minutes -- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader` -- **Goal**: After completing this tutorial, you'll have built a scalable search system that runs on text files and can answer questions about Game of Thrones. You'll then be able to expand this system for your needs. - - -## Overview - -Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as "Who is the father of Arya Stark?" 
In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example. - -This tutorial introduces you to all the concepts needed to build such a question answering system. It also uses Haystack components, such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services. - -Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! - - -## Preparing the Colab Environment - -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - - -## Installing Haystack - -To start, let's install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install farm-haystack[colab] -``` - -## Initializing the ElasticsearchDocumentStore - -A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here, we're using the `ElasticsearchDocumentStore` which connects to a running Elasticsearch service which is a fast and scalable text focused storage option. This service runs independently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). - -1. Download, extract, and set the permissions for the Elasticsearch installation image. - - -```bash -%%bash - -wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q -tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz -chown -R daemon:daemon elasticsearch-7.9.2 -``` - -2. 
Start the server. - - -```bash -%%bash --bg - -sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -``` - -If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this manually, or using our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function. - -3. Wait 30s to ensure that the server has fully started up. - - -```python -import time -time.sleep(30) -``` - -4. Initialize the [`ElasticsearchDocumentStore`](https://docs.haystack.deepset.ai/reference/document-store-api#module-elasticsearch). - - - -```python -import os -from haystack.document_stores import ElasticsearchDocumentStore - -# Get the host where Elasticsearch is running, default to localhost -host = os.environ.get("ELASTICSEARCH_HOST", "localhost") - -document_store = ElasticsearchDocumentStore( - host=host, - username="", - password="", - index="document" -) -``` - -## Indexing Documents with a Pipeline - -The indexing pipeline turns your files into Document objects and writes them to the DocumentStore. Our indexing pipeline will have two nodes: `TextConverter` which turns `.txt` files into Haystack `Document` objects and `PreProcessor` which cleans and splits the text within a `Document`. - -Once these nodes are combined into a pipeline, the pipeline will ingest `.txt` file paths, preprocess them, and write them into the DocumentStore. - - -1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_a_scalable_question_answering_system` as a set of `.txt` files. - - -```python -from haystack.utils import fetch_archive_from_http - -doc_dir = "data/build_a_scalable_question_answering_system" - -fetch_archive_from_http( - url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", - output_dir=doc_dir -) -``` - -2. Initialize the pipeline, TextConverter, and PreProcessor. 
- - -```python -from haystack import Pipeline -from haystack.nodes import TextConverter, PreProcessor - -indexing_pipeline = Pipeline() -text_converter = TextConverter() -preprocessor = PreProcessor( - clean_whitespace=True, - clean_header_footer=True, - clean_empty_lines=True, - split_by="word", - split_length=200, - split_overlap=20, - split_respect_sentence_boundary=True, -) - -``` - -To learn more about the parameters of the `PreProcessor`, see [Usage](https://docs.haystack.deepset.ai/docs/preprocessor#usage). To understand why document splitting is important for your question answering system's performance, see [Document Length](https://docs.haystack.deepset.ai/docs/optimization#document-length). - -2. Add the nodes into an indexing pipeline. You should provide the `name` or `name`s of preceding nodes as the `input` argument. Note that in an indexing pipeline, the input to the first node is "File". - - -```python -import os - -indexing_pipeline.add_node(component=text_converter, name="TextConverter", inputs=["File"]) -indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"]) -indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"]) - -``` - -3. Run the indexing pipeline to write the text data into the DocumentStore. - - -```python -files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] -indexing_pipeline.run_batch(file_paths=files_to_index) -``` - -While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. - -As an alternative, you can cast you text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using [`DocumentStore.write_documents()`](https://docs.haystack.deepset.ai/reference/document-store-api#basedocumentstorewrite_documents). 
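To show what the `PreProcessor` settings above (`split_by="word"`, `split_length=200`, `split_overlap=20`) do to a long text, here is a minimal pure-Python sketch of overlapping word windows. The function name `split_by_word` is hypothetical. The sketch assumes plain whitespace tokenization and ignores `split_respect_sentence_boundary`, so Haystack's real splits will differ at sentence edges.

```python
from typing import List


def split_by_word(text: str, split_length: int = 200, split_overlap: int = 20) -> List[str]:
    """Split text into windows of at most `split_length` words.

    Consecutive windows share `split_overlap` words so that an answer
    spanning a boundary still appears whole in at least one chunk.
    Hypothetical illustration, not Haystack's PreProcessor code.
    """
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(500))
print([len(c.split()) for c in split_by_word(text)])
```

With 500 words, the windows start at words 0, 180 and 360. The last 20 words of each chunk repeat at the start of the next chunk.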
- -## Initializing the Retriever - -Retrievers sift through all the Documents and return only those that are relevant to the question. Here we are using the BM25Retriever. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). - - -```python -from haystack.nodes import BM25Retriever - -retriever = BM25Retriever(document_store=document_store) -``` - -## Initializing the Reader - -A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). - - -```python -from haystack.nodes import FARMReader - -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -## Creating the Retriever-Reader Pipeline - -You can combine the Reader and Retriever in a querying pipeline using the `Pipeline` class. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. - -1. Initialize the `Pipeline` object and add the Retriever and Reader as nodes. You should provide the `name` or `name`s of preceding nodes as the input argument. Note that in a querying pipeline, the input to the first node is "Query". - - -```python -from haystack import Pipeline - -querying_pipeline = Pipeline() -querying_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) -querying_pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"]) - -``` - -## Asking a Question - -1. Use the pipeline `run()` method to ask a question. The query argument is where you type your question. 
Additionally, you can set the number of documents you want the Reader and Retriever to return using the `top-k` parameter. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top-k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). - - - -```python -prediction = querying_pipeline.run( - query="Who is the father of Arya Stark?", - params={ - "Retriever": {"top_k": 10}, - "Reader": {"top_k": 5} - } -) -``` - - - -Here are some questions you could try out: -- Who is the father of Arya Stark? -- Who created the Dothraki vocabulary? -- Who is the sister of Sansa? - -2. You can directly print out the answers returned by the pipeline: - - -```python -from pprint import pprint - -pprint(prediction) -``` - -3. Simplify the printed answers: - - -```python -from haystack.utils import print_answers - -print_answers( - prediction, - details="minimum" ## Choose from `minimum`, `medium` and `all` -) -``` - -And there you have it! Congratulations on building a scalable machine learning based question answering system! - -# Next Steps - -To learn how to improve the performance of the Reader, see [Fine-Tune a Reader](https://haystack.deepset.ai/tutorials/03_fine_tune_a_reader). - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. 
- -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) - - - -```python - -``` diff --git a/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md b/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md deleted file mode 100644 index fb7978b5..00000000 --- a/markdowns/03_Basic_QA_Pipeline_without_Elasticsearch.md +++ /dev/null @@ -1,258 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb -toc: True -title: "Build a QA System Without Elasticsearch" -last_updated: 2022-11-24 -level: "beginner" -weight: 15 -description: Create a Retriever Reader pipeline that requires no external database dependencies. -category: "QA" -aliases: ['/tutorials/without-elasticsearch'] -download: "/downloads/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb" ---- - - - -Haystack provides alternatives to Elasticsearch for developing quick prototypes. - -You can use an `InMemoryDocumentStore` or a `SQLDocumentStore`(with SQLite) as the document store. - -If you are interested in more feature-rich Elasticsearch, then please refer to the Tutorial 1. - -### Prepare environment - -#### Colab: Enable the GPU runtime -Make sure you enable the GPU runtime to experience decent speed in this tutorial. 
-**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** - - - -You can double check whether the GPU runtime is enabled with the following command: - - -```bash -%%bash - -nvidia-smi -``` - -To start, install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] -``` - -## Logging - -We configure how logging messages should be displayed and which log level should be used before importing Haystack. -Example log message: -INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt -Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily: - - -```python -import logging - -logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING) -logging.getLogger("haystack").setLevel(logging.INFO) -``` - -## Document Store - - - -```python -# In-Memory Document Store -from haystack.document_stores import InMemoryDocumentStore - -document_store = InMemoryDocumentStore() -``` - - -```python -# Alternatively, uncomment the following to use the SQLite Document Store: - -# from haystack.document_stores import SQLDocumentStore -# document_store = SQLDocumentStore(url="sqlite:///qa.db") -``` - -## Preprocessing of documents - -Haystack provides a customizable pipeline for: - - converting files into texts - - cleaning texts - - splitting texts - - writing them to a Document Store - -In this tutorial, we download Wikipedia articles on Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch. 
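The cleaning step mentioned above is a plain function that takes a string and returns a string. The sketch below is a hypothetical example of such a function. The name `clean_text` and its rules, which drop wiki-style `== Heading ==` lines, strip trailing whitespace and collapse runs of blank lines, are assumptions for illustration. It does not reproduce Haystack's `clean_wiki_text`.

```python
import re

# Matches wiki-style section titles such as "== History ==" (assumed format).
HEADING = re.compile(r"^\s*=+[^=]+=+\s*$")


def clean_text(text: str) -> str:
    """Hypothetical str -> str cleaner: drop headings, tidy whitespace."""
    kept = [line.rstrip() for line in text.splitlines() if not HEADING.match(line)]
    cleaned = "\n".join(kept)
    # Collapse three or more consecutive newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


print(repr(clean_text("Intro  \n\n== History ==\n\n\nText")))
```

Because the function only maps a string to a string, you can swap in your own rules, such as removing footers, without changing the rest of the preprocessing.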
- - -```python -from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http - - -# Let's first get some documents that we want to query -# Here: 517 Wikipedia articles for Game of Thrones -doc_dir = "data/tutorial3" -s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt3.zip" -fetch_archive_from_http(url=s3_url, output_dir=doc_dir) - -# convert files to dicts containing documents that can be indexed to our datastore -# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers) -# It must take a str as input, and return a str. -docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True) - -# We now have a list of dictionaries that we can write to our document store. -# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself. -# The default format here is: {"name": "", "content": ""} - -# Let's have a look at the first 3 entries: -print(docs[:3]) - -# Now, let's write the docs to our DB. -document_store.write_documents(docs) -``` - -## Initialize Retriever, Reader & Pipeline - -### Retriever - -Retrievers help narrowing down the scope for the Reader to smaller units of text where a given question could be answered. - -With InMemoryDocumentStore or SQLDocumentStore, you can use the TfidfRetriever. For more retrievers, please refer to the tutorial-1. - - -```python -# An in-memory TfidfRetriever based on Pandas dataframes -from haystack.nodes import TfidfRetriever - -retriever = TfidfRetriever(document_store=document_store) -``` - -### Reader - -A Reader scans the texts returned by retrievers in detail and extracts the k best answers. They are based -on powerful, but slower deep learning models. - -Haystack currently supports Readers based on the frameworks FARM and Transformers. 
-With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models). - -**Here:** a medium sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2) - -**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package) - -**Alternatives (Models):** e.g. "distilbert-base-uncased-distilled-squad" (fast) or "deepset/bert-large-uncased-whole-word-masking-squad2" (good accuracy) - -**Hint:** You can adjust the model to return "no answer possible" with the no_ans_boost. Higher values mean the model prefers "no answer possible" - -#### FARMReader - - -```python -from haystack.nodes import FARMReader - - -# Load a local model or any of the QA models on -# Hugging Face's model hub (https://huggingface.co/models) -reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) -``` - -#### TransformersReader - -Alternatively, we can use a Transformers reader: - - -```python -# from haystack.nodes import FARMReader, TransformersReader -# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1) -``` - -### Pipeline - -With a Haystack `Pipeline` you can stick together your building blocks to a search pipeline. -Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases. -To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions. -You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelines). - - -```python -from haystack.pipelines import ExtractiveQAPipeline - -pipe = ExtractiveQAPipeline(reader, retriever) -``` - -## Voilà! Ask a question! 
- - -```python -# You can configure how many candidates the reader and retriever shall return -# The higher top_k for retriever, the better (but also the slower) your answers. -prediction = pipe.run( - query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}} -) -``` - - -```python -# You can try asking more questions: - -# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}}) -# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}}) -``` - - -```python -# Now you can either print the object directly... -from pprint import pprint - -pprint(prediction) - -# Sample output: -# { -# 'answers': [ , -# , -# ... -# ] -# 'documents': [ , -# , -# ... -# ], -# 'no_ans_gap': 11.688868522644043, -# 'node_id': 'Reader', -# 'params': {'Reader': {'top_k': 5}, 'Retriever': {'top_k': 5}}, -# 'query': 'Who is the father of Arya Stark?', -# 'root_node': 'Query' -# } -``` - - -```python -# ...or use a util to simplify the output -from haystack.utils import print_answers - - -# Change `minimum` to `medium` or `all` to control the level of detail -print_answers(prediction, details="minimum") -``` - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. 
- -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) -- [FARM](https://github.com/deepset-ai/FARM) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/03_finetune_a_reader.md b/markdowns/03_finetune_a_reader.md deleted file mode 100644 index 4c7c8174..00000000 --- a/markdowns/03_finetune_a_reader.md +++ /dev/null @@ -1,126 +0,0 @@ ---- -layout: tutorial -colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_finetune_a_reader.ipynb -toc: True -title: "Fine-Tune a Reader" -last_updated: 2022-11-16 -level: "intermediate" -weight: 50 -description: Improve the performance of your Reader by performing fine-tuning. -category: "QA" -aliases: ['/tutorials/fine-tuning-a-model', '/tutorials/02_Finetune_a_model_on_your_data.ipynb', '/tutorials/fine-tune-reader'] ---- - - -# Tutorial: Fine-Tune a Reader to Improve its Performance - -- **Level**: Intermediate -- **Time to complete**: 20 minutes -- **Nodes Used**: `FARMReader` -- **Goal**: Learn how to improve the performance of a DistilBERT Reader model by performing further training on the SQuAD dataset. - -## Overview - -Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. 
After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset. - -## Preparing the Colab Environment - -- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) -- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) -- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) - - -## Installing Haystack - -To start, let's install the latest release of Haystack with `pip`: - - -```bash -%%bash - -pip install --upgrade pip -pip install farm-haystack[colab] -``` - - -## Creating Training Data - -To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. One sample from this data should contain a question, a text answer, and the document containing the answer. - -You can start generating your own training data using one of the two tools that we offer: - -1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack. - -2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions with Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback). - - - -## Fine-tuning the Reader - -1. Initialize the Reader, supplying the name of the base model you wish to improve. 
- - -```python -from haystack.nodes import FARMReader - -reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True) -``` - -We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squadbase), a base-sized DistilBERT model that was trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). - -2. Provide the SQuAD format training data to the `Reader.train()` method. - - -```python -data_dir = "data/squad20" -reader.train( - data_dir=data_dir, - train_filename="dev-v2.0.json", - use_gpu=True, - n_epochs=1, - save_dir="my_model" -) -``` - -With the default parameters above, we are starting with a base model trained on the SQuAD training dataset and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, replace `train_filename` with your domain-specific dataset. - -To perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments. - -## Saving and Loading - -The model is automatically saved at the end of fine-tuning in the `save_dir` that you specified. -However, you can also manually save the Reader again by running: - - -```python -reader.save(directory="my_model") -``` - -To load a saved model, run: - - -```python -new_reader = FARMReader(model_name_or_path="my_model") -``` - -# Next Steps - -Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance. 
To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader). - -To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader). - -## About us - -This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany - -We bring NLP to the industry via open source! -Our focus: Industry specific language models & large scale QA systems. - -Some of our other work: -- [German BERT](https://deepset.ai/german-bert) -- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) - -Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) - -By the way: [we're hiring!](https://www.deepset.ai/jobs) From 10d332957d113d46aa6227e12120ac8792a8ba1a Mon Sep 17 00:00:00 2001 From: brandenchan Date: Tue, 29 Nov 2022 16:38:18 +0100 Subject: [PATCH 42/48] Update index.toml and readme --- README.md | 2 +- index.toml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e532212d..dfb500ab 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ To contribute to the tutorials please check out our [Contributing Guidelines](./ ## Tutorials | Name | Colab | Source Code | |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------| -| Build Your First Question Answering System | 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb) | [01_Basic_QA_Pipeline.ipynb](./tutorials/01_Basic_QA_Pipeline.ipynb) | +| Build Your First Question Answering System | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_build_your_first_question_answering_system.ipynb) | [01_build_your_first_question_answering_system.ipynb](./tutorials/01_build_your_first_question_answering_system.ipynb) | | Fine-Tune a Reader | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_finetune_a_reader.ipynb) | [02_Finetune_a_model_on_your_data.ipynb](./tutorials/02_finetune_a_reader.ipynb) | | Build a Scalable Question Answering System | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_build_a_scalable_question_answering_system.ipynb) | [03_build_a_scalable_question_answering_system.ipynb](./tutorials/03_build_a_scalable_question_answering_system.ipynb) | | FAQ Style QA | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/04_FAQ_style_QA.ipynb) | [04_FAQ_style_QA.ipynb](./tutorials/04_FAQ_style_QA.ipynb) | diff --git a/index.toml b/index.toml index 1ff2e955..94735685 100644 --- a/index.toml +++ b/index.toml @@ -26,7 +26,7 @@ title = "Build a Scalable Question Answering System" description = "Create a scalable Retriever Reader pipeline that uses an ElasticsearchDocumentStore." 
level = "beginner" weight = 15 -notebook = "03_build_a_scalable_question_answering_system.ipynb.ipynb" +notebook = "03_build_a_scalable_question_answering_system.ipynb" aliases = ["without-elasticsearch"] slug = "03_scalable_qa_pipeline" From 1296fc84df7bd790d883c4121ca3b1f9dd77aa2e Mon Sep 17 00:00:00 2001 From: Tuana Celik Date: Wed, 30 Nov 2022 16:29:52 +0000 Subject: [PATCH 43/48] minor changes for new tutorial structure --- index.toml | 12 +- markdowns/01_Basic_QA_Pipeline.md | 193 ++++++++++++ markdowns/02_Finetune_a_model_on_your_data.md | 126 ++++++++ markdowns/03_Scalable_QA_Pipeline.md | 275 ++++++++++++++++++ markdowns/21_distill_a_reader.md | 6 +- ...your_first_question_answering_system.ipynb | 27 +- tutorials/02_finetune_a_reader.ipynb | 98 +++---- ...a_scalable_question_answering_system.ipynb | 31 +- tutorials/21_distill_a_reader.ipynb | 222 +++++++------- 9 files changed, 783 insertions(+), 207 deletions(-) create mode 100644 markdowns/01_Basic_QA_Pipeline.md create mode 100644 markdowns/02_Finetune_a_model_on_your_data.md create mode 100644 markdowns/03_Scalable_QA_Pipeline.md diff --git a/index.toml b/index.toml index 94735685..fb66759a 100644 --- a/index.toml +++ b/index.toml @@ -9,8 +9,8 @@ description = "Get Started by creating a Retriever Reader pipeline." level = "beginner" weight = 10 notebook = "01_build_your_first_question_answering_system.ipynb" -aliases = ["first-qa-system"] -slug = "01_basic_qa_pipeline" +aliases = ["first-qa-system", "without-elasticsearch", "03_Basic_QA_Pipeline_without_Elasticsearch"] +slug = "01_Basic_QA_Pipeline" [[tutorial]] title = "Fine-Tune a Reader" @@ -18,8 +18,8 @@ description = "Improve the performance of your Reader by performing fine-tuning. 
level = "intermediate" weight = 50 notebook = "02_finetune_a_reader.ipynb" -aliases = ["fine-tuning-a-model", "02_finetune_a_model_on_your_data", "fine-tune-a-model"] -slug = "02_finetune_a_model_on_your_data" +aliases = ["fine-tuning-a-model"] +slug = "02_Finetune_a_model_on_your_data" [[tutorial]] title = "Build a Scalable Question Answering System" @@ -27,8 +27,8 @@ description = "Create a scalable Retriever Reader pipeline that uses an Elastics level = "beginner" weight = 15 notebook = "03_build_a_scalable_question_answering_system.ipynb" -aliases = ["without-elasticsearch"] -slug = "03_scalable_qa_pipeline" +aliases = [] +slug = "03_Scalable_QA_Pipeline" [[tutorial]] title = "Utilizing Existing FAQs for Question Answering" diff --git a/markdowns/01_Basic_QA_Pipeline.md b/markdowns/01_Basic_QA_Pipeline.md new file mode 100644 index 00000000..6518f38d --- /dev/null +++ b/markdowns/01_Basic_QA_Pipeline.md @@ -0,0 +1,193 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_build_your_first_question_answering_system.ipynb +toc: True +title: "Build Your First Question Answering System" +last_updated: 2022-11-30 +level: "beginner" +weight: 10 +description: Get Started by creating a Retriever Reader pipeline. +category: "QA" +aliases: ['/tutorials/first-qa-system', '/tutorials/without-elasticsearch', '/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch'] +download: "/downloads/01_build_your_first_question_answering_system.ipynb" +--- + + + +- **Level**: Beginner +- **Time to complete**: 15 minutes +- **Nodes Used**: `InMemoryDocumentStore`, `BM25Retriever`, `FARMReader` +- **Goal**: After completing this tutorial, you will have learned about the Reader and Retriever, and built a question answering pipeline that can answer questions about the Game of Thrones series. + + +## Overview + +Let's learn how to build a question answering system using Haystack's DocumentStore, Retriever, and Reader. 
Given a question like "Who is the father of Arya Stark?", this program will search through a knowledge base and look for a fitting answer. + +While the documents we are using in this tutorial are all about Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis or a collection of financial reports and still receive answers to questions on these topics. + +To help you get started more quickly, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs. + + +## Preparing the Colab Environment + +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + +## Initializing the DocumentStore + +A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore`, which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store).
+ + +```python +from haystack.document_stores import InMemoryDocumentStore + +document_store = InMemoryDocumentStore() +``` + +## Preparing Documents + +1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_your_first_question_answering_system` as a set of `.txt` files. + + +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/build_your_first_question_answering_system" + +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) +``` + +2. Use the `TextIndexingPipeline` to convert the files you just downloaded into Haystack [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore. + + +```python +import os +from haystack.pipelines.standard_pipelines import TextIndexingPipeline +# from text_indexing_pipeline import TextIndexingPipeline + +files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] +indexing_pipeline = TextIndexingPipeline(document_store) +indexing_pipeline.run_batch(file_paths=files_to_index) + + +``` + +While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. + +As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using `DocumentStore.write_documents()`. + +## Initializing the Retriever + +Retrievers sift through all the Documents and return only those that might be relevant to the question. Here we are using the `TfidfRetriever`, which is based on the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever).
+ + +```python +from haystack.nodes import TfidfRetriever + +retriever = TfidfRetriever(document_store=document_store) +``` + +## Initializing the Reader + +A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) +``` + +## Creating the Retriever-Reader Pipeline + +The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. + + +```python +from haystack.pipelines import ExtractiveQAPipeline + +pipe = ExtractiveQAPipeline(reader, retriever) +``` + +## Asking a Question + +1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set how many Documents the Retriever returns and how many answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). + + + +```python +prediction = pipe.run( + query="Who is the father of Arya Stark?", + params={ + "Retriever": {"top_k": 10}, + "Reader": {"top_k": 5} + } +) +``` + +Here are some questions you could try out: +- Who is the father of Arya Stark?
+- Who created the Dothraki vocabulary? +- Who is the sister of Sansa? + +2. The answers returned by the pipeline can be printed out directly: + + +```python +from pprint import pprint + +pprint(prediction) +``` + +3. Simplify the printed answers: + + +```python +from haystack.utils import print_answers + +print_answers( + prediction, + details="minimum" ## Choose from `minimum`, `medium` and `all` +) +``` + +And there you have it! Congratulations on building your first machine learning based question answering system! + +# Next Steps + +Check out [Build a Scalable Question Answering System](https://haystack.deepset.ai/tutorials/02_build_a_scalable_question_answering_system) to learn how to make a more advanced question answering system that uses an Elasticsearch backed DocumentStore and makes more use of the flexibility that pipelines offer. + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. 
+ +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) + diff --git a/markdowns/02_Finetune_a_model_on_your_data.md b/markdowns/02_Finetune_a_model_on_your_data.md new file mode 100644 index 00000000..688fbcc2 --- /dev/null +++ b/markdowns/02_Finetune_a_model_on_your_data.md @@ -0,0 +1,126 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_finetune_a_reader.ipynb +toc: True +title: "Fine-Tune a Reader" +last_updated: 2022-11-30 +level: "intermediate" +weight: 50 +description: Improve the performance of your Reader by performing fine-tuning. +category: "QA" +aliases: ['/tutorials/fine-tuning-a-model'] +download: "/downloads/02_finetune_a_reader.ipynb" +--- + + + +- **Level**: Intermediate +- **Time to complete**: 20 minutes +- **Nodes Used**: `FARMReader` +- **Goal**: Learn how to improve the performance of a DistilBERT Reader model by performing further training on the SQuAD dataset. + +## Overview + +Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset. 
+ +## Preparing the Colab Environment + +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + + +## Creating Training Data + +To start fine-tuning your Reader model, you need question answering data in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format. Each sample in this data should contain a question, a text answer, and the document containing the answer. + +You can start generating your own training data using either of the two tools that we offer: + +1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack. + +2. **Feedback Mechanism**: In a production system, you can collect users' feedback on model predictions with Haystack's [REST API](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback). + + + +## Fine-tuning the Reader + +1. Initialize the Reader, supplying the name of the base model you wish to improve.
+ + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True) +``` + +We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model that was trained on SQuAD. To learn more about which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models). + +2. Provide the SQuAD-format training data to the `Reader.train()` method. + + +```python +data_dir = "data/squad20" +reader.train( + data_dir=data_dir, + train_filename="dev-v2.0.json", + use_gpu=True, + n_epochs=1, + save_dir="my_model" +) +``` + +With the parameters above, we start with a base model trained on the SQuAD training set and fine-tune it further on the SQuAD development set. To fine-tune the model for your domain, point `data_dir` and `train_filename` to your domain-specific dataset. + +To perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments. + +## Saving and Loading + +The model is automatically saved at the end of fine-tuning in the `save_dir` that you specified. +However, you can also manually save the Reader again by running: + + +```python +reader.save(directory="my_model") +``` + +To load a saved model, run: + + +```python +new_reader = FARMReader(model_name_or_path="my_model") +``` + +# Next Steps + +Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance.
To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader). + +To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader). + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. + +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/markdowns/03_Scalable_QA_Pipeline.md b/markdowns/03_Scalable_QA_Pipeline.md new file mode 100644 index 00000000..b01fae85 --- /dev/null +++ b/markdowns/03_Scalable_QA_Pipeline.md @@ -0,0 +1,275 @@ +--- +layout: tutorial +colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_build_a_scalable_question_answering_system.ipynb +toc: True +title: "Build a Scalable Question Answering System" +last_updated: 2022-11-30 +level: "beginner" +weight: 15 +description: Create a scalable Retriever Reader pipeline that uses an ElasticsearchDocumentStore. 
+category: "QA" +aliases: [] +download: "/downloads/03_build_a_scalable_question_answering_system.ipynb" +--- + + + +- **Level**: Beginner +- **Time to complete**: 20 minutes +- **Nodes Used**: `ElasticsearchDocumentStore`, `BM25Retriever`, `FARMReader` +- **Goal**: After completing this tutorial, you'll have built a scalable search system that runs on text files and can answer questions about Game of Thrones. You'll then be able to expand this system for your needs. + + +## Overview + +Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as "Who is the father of Arya Stark?" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example. + +This tutorial introduces you to all the concepts needed to build such a question answering system. It also uses Haystack components, such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services. + +Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms! + + +## Preparing the Colab Environment + +- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab) +- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled) +- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level) + + +## Installing Haystack + +To start, let's install the latest release of Haystack with `pip`: + + +```bash +%%bash + +pip install --upgrade pip +pip install farm-haystack[colab] +``` + +## Initializing the ElasticsearchDocumentStore + +A DocumentStore stores the Documents that the question answering system uses to find answers to your questions.
Here, we're using the `ElasticsearchDocumentStore`, which connects to a running Elasticsearch service, a fast and scalable text-focused storage option. This service runs independently from Haystack and persists even after the Haystack program has finished running. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). + +1. Download, extract, and set the permissions for the Elasticsearch installation archive. + + +```bash +%%bash + +wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q +tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz +chown -R daemon:daemon elasticsearch-7.9.2 +``` + +2. Start the server. + + +```bash +%%bash --bg + +sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch +``` + +If you are working in an environment where Docker is available, you can also start Elasticsearch using Docker. You can do this manually or with our [`launch_es()`](https://docs.haystack.deepset.ai/reference/utils-api#module-doc_store) utility function. + +3. Wait 30 seconds to give the server time to start up fully. + + +```python +import time +time.sleep(30) +``` + +4. Initialize the [`ElasticsearchDocumentStore`](https://docs.haystack.deepset.ai/reference/document-store-api#module-elasticsearch). + + + +```python +import os +from haystack.document_stores import ElasticsearchDocumentStore + +# Get the host where Elasticsearch is running, default to localhost +host = os.environ.get("ELASTICSEARCH_HOST", "localhost") + +document_store = ElasticsearchDocumentStore( + host=host, + username="", + password="", + index="document" +) +``` + +## Indexing Documents with a Pipeline + +The indexing pipeline turns your files into Document objects and writes them to the DocumentStore.
Our indexing pipeline will have two processing nodes: `TextConverter`, which turns `.txt` files into Haystack `Document` objects, and `PreProcessor`, which cleans and splits the text within a `Document`. + +Once these nodes are combined into a pipeline, the pipeline will ingest `.txt` file paths, preprocess them, and write them into the DocumentStore. + + +1. Download 517 articles from the Game of Thrones Wikipedia. You can find them in `data/build_a_scalable_question_answering_system` as a set of `.txt` files. + + +```python +from haystack.utils import fetch_archive_from_http + +doc_dir = "data/build_a_scalable_question_answering_system" + +fetch_archive_from_http( + url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip", + output_dir=doc_dir +) +``` + +2. Initialize the pipeline, TextConverter, and PreProcessor. + + +```python +from haystack import Pipeline +from haystack.nodes import TextConverter, PreProcessor + +indexing_pipeline = Pipeline() +text_converter = TextConverter() +preprocessor = PreProcessor( + clean_whitespace=True, + clean_header_footer=True, + clean_empty_lines=True, + split_by="word", + split_length=200, + split_overlap=20, + split_respect_sentence_boundary=True, +) + +``` + +To learn more about the parameters of the `PreProcessor`, see [Usage](https://docs.haystack.deepset.ai/docs/preprocessor#usage). To understand why document splitting is important for your question answering system's performance, see [Document Length](https://docs.haystack.deepset.ai/docs/optimization#document-length). + +3. Add the nodes to an indexing pipeline. Provide the `name` of each preceding node in the `inputs` argument. Note that in an indexing pipeline, the input to the first node is "File".
+ + +```python +import os + +indexing_pipeline.add_node(component=text_converter, name="TextConverter", inputs=["File"]) +indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"]) +indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"]) + +``` + +4. Run the indexing pipeline to write the text data into the DocumentStore. + + +```python +files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)] +indexing_pipeline.run_batch(file_paths=files_to_index) +``` + +While the default code in this tutorial uses Game of Thrones data, you can also supply your own `.txt` files and index them in the same way. + +As an alternative, you can cast your text data into [Document objects](https://docs.haystack.deepset.ai/docs/documents_answers_labels#document) and write them into the DocumentStore using [`DocumentStore.write_documents()`](https://docs.haystack.deepset.ai/reference/document-store-api#basedocumentstorewrite_documents). + +## Initializing the Retriever + +Retrievers sift through all the Documents and return only those that are likely to be relevant to the question. Here we are using the `BM25Retriever`. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). + + +```python +from haystack.nodes import BM25Retriever + +retriever = BM25Retriever(document_store=document_store) +``` + +## Initializing the Reader + +A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models).
+ + +```python +from haystack.nodes import FARMReader + +reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True) +``` + +## Creating the Retriever-Reader Pipeline + +You can combine the Reader and Retriever in a querying pipeline using the `Pipeline` class. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on. + +1. Initialize the `Pipeline` object and add the Retriever and Reader as nodes. Provide the `name` of each preceding node in the `inputs` argument. Note that in a querying pipeline, the input to the first node is "Query". + + +```python +from haystack import Pipeline + +querying_pipeline = Pipeline() +querying_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) +querying_pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"]) + +``` + +## Asking a Question + +1. Use the pipeline `run()` method to ask a question. The `query` argument is where you type your question. Additionally, you can use the `top_k` parameter to set how many Documents the Retriever returns and how many answers the Reader returns. To learn more about setting arguments, see [Arguments](https://docs.haystack.deepset.ai/docs/pipelines#arguments). To understand the importance of the `top_k` parameter, see [Choosing the Right top-k Values](https://docs.haystack.deepset.ai/docs/optimization#choosing-the-right-top-k-values). + + + +```python +prediction = querying_pipeline.run( + query="Who is the father of Arya Stark?", + params={ + "Retriever": {"top_k": 10}, + "Reader": {"top_k": 5} + } +) +``` + + + +Here are some questions you could try out: +- Who is the father of Arya Stark? +- Who created the Dothraki vocabulary? +- Who is the sister of Sansa? + +2. You can directly print out the answers returned by the pipeline: + + +```python +from pprint import pprint + +pprint(prediction) +``` + +3.
Simplify the printed answers: + + +```python +from haystack.utils import print_answers + +print_answers( + prediction, + details="minimum" ## Choose from `minimum`, `medium` and `all` +) +``` + +And there you have it! Congratulations on building a scalable machine learning based question answering system! + +# Next Steps + +To learn how to improve the performance of the Reader, see [Fine-Tune a Reader](https://haystack.deepset.ai/tutorials/03_fine_tune_a_reader). + +## About us + +This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany + +We bring NLP to the industry via open source! +Our focus: Industry specific language models & large scale QA systems. + +Some of our other work: +- [German BERT](https://deepset.ai/german-bert) +- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) + +Get in touch: +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) + +By the way: [we're hiring!](https://www.deepset.ai/jobs) + diff --git a/markdowns/21_distill_a_reader.md b/markdowns/21_distill_a_reader.md index 77d7b810..97ed7fe9 100644 --- a/markdowns/21_distill_a_reader.md +++ b/markdowns/21_distill_a_reader.md @@ -3,16 +3,16 @@ layout: tutorial colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/21_distill_a_reader.ipynb toc: True title: "Distill a Reader" -last_updated: 2022-11-16 +last_updated: 2022-11-30 level: "intermediate" weight: 115 description: Transfer a Reader's question answering ability to a smaller, more efficient model. 
category: "QA" aliases: ['/tutorials/distill-reader'] +download: "/downloads/21_distill_a_reader.ipynb" --- -# Tutorial: Distill a Reader - **Level**: Advanced - **Time to complete**: 30 minutes @@ -160,6 +160,6 @@ Some of our other work: - [GermanQuAD and GermanDPR](https://deepset.ai/germanquad) Get in touch: -[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) +[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) By the way: [we're hiring!](https://www.deepset.ai/jobs) diff --git a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index 598ae774..dd2d9799 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -14,6 +14,9 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Overview\n", "\n", @@ -22,10 +25,7 @@ "While the documents we are using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis, or a collection of financial reports and still receive answers to questions on these topics.\n", "\n", "To help you get started quicker, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -279,14 +279,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Next Steps\n", "\n", "Check out [Build a Scalable Question Answering System](https://haystack.deepset.ai/tutorials/02_build_a_scalable_question_answering_system) to learn how to make a more advanced question answering system that uses an Elasticsearch backed DocumentStore and makes more use of the flexibility that pipelines offer." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -304,19 +304,10 @@ "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", "\n", "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [], - "metadata": { - "collapsed": false - } } ], "metadata": { diff --git a/tutorials/02_finetune_a_reader.ipynb b/tutorials/02_finetune_a_reader.ipynb index cf71d783..1d696b6d 100644 --- a/tutorials/02_finetune_a_reader.ipynb +++ b/tutorials/02_finetune_a_reader.ipynb @@ -14,14 +14,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Overview\n", "\n", "Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. 
While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -38,14 +38,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Installing Haystack\n", "\n", "To start, let's install the latest release of Haystack with `pip`:" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -61,6 +61,9 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "\n", "## Creating Training Data\n", @@ -72,57 +75,57 @@ "1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack.\n", "\n", "2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions with Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback).\n" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "\n", "## Fine-tuning the Reader\n", "\n", "1. Initialize the Reader, supplying the name of the base model you wish to improve." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "from haystack.nodes import FARMReader\n", "\n", "reader = FARMReader(model_name_or_path=\"distilbert-base-uncased-distilled-squad\", use_gpu=True)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squadbase), a base-sized DistilBERT model that was trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model that was trained on SQuAD. To learn more about which model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + ] }, { "cell_type": "markdown", - "source": [ - "2. Provide the SQuAD format training data to the `Reader.train()` method." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "2. Provide the SQuAD-format training data to the `Reader.train()` method."
+ ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "data_dir = \"data/squad20\"\n", @@ -133,33 +136,30 @@ " n_epochs=1,\n", " save_dir=\"my_model\"\n", ")" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "With the default parameters above, we are starting with a base model trained on the SQuAD training dataset and we are further fine-tuning it on the SQuAD development dataset. To fine-tune the model for your domain, replace `train_filename` with your domain-specific dataset.\n", "\n", "To perform evaluation over the course of fine-tuning, see [FARMReader.train() API](https://docs.haystack.deepset.ai/reference/reader-api#farmreadertrain) for the relevant arguments." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Saving and Loading\n", "\n", "The model is automatically saved at the end of fine-tuning in the `save_dir` that you specified.\n", "However, you can also manually save the Reader again by running:" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -174,12 +174,12 @@ }, { "cell_type": "markdown", - "source": [ - "To load a saved model, run:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "To load a saved model, run:" + ] }, { "cell_type": "code", @@ -194,16 +194,16 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Next Steps\n", "\n", "Now that you have a model with improved performance, why not transfer its question answering capabilities into a smaller, faster model? Starting with this new model, you can use model distillation to create a more efficient model with only a slight tradeoff in performance. 
To learn more, see [Distil a Reader](https://haystack.deepset.ai/tutorials/04_distil_a_reader).\n", "\n", "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -223,7 +223,7 @@ "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", "\n", "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)" ] diff --git a/tutorials/03_build_a_scalable_question_answering_system.ipynb b/tutorials/03_build_a_scalable_question_answering_system.ipynb index 3f231735..232017b5 100644 --- a/tutorials/03_build_a_scalable_question_answering_system.ipynb +++ b/tutorials/03_build_a_scalable_question_answering_system.ipynb @@ -14,6 +14,9 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Overview\n", "\n", @@ -22,10 +25,7 @@ "This tutorial introduces you to all the concepts needed to build such a question answering system. It also uses Haystack components, such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services.\n", "\n", "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!" 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -369,10 +369,10 @@ }, { "cell_type": "markdown", - "source": [], "metadata": { "collapsed": false - } + }, + "source": [] }, { "cell_type": "markdown", @@ -432,14 +432,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Next Steps\n", "\n", "To learn how to improve the performance of the Reader, see [Fine-Tune a Reader](https://haystack.deepset.ai/tutorials/03_fine_tune_a_reader)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -457,19 +457,10 @@ "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", "\n", "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [], - "metadata": { - "collapsed": false - } } ], "metadata": { diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index da6caed3..d1103bc0 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -14,14 +14,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Overview\n", "\n", "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. 
By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if want to learn more about it, see [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -40,14 +40,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Installing Haystack\n", "\n", "To start, let's install the latest release of Haystack with `pip`:" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -63,47 +63,50 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Augmenting Training Data\n", "\n", "Having more human annotated training data is useful at all levels of model training. However, intermediate layer distillation can benefit even from synthetically generated data, since it is a less exact type of training. In this tutorial, we'll be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GLoVe word embedding model." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "1. Download the `augment_squad.py` script." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "1. Download the `augment_squad.py` script." 
+ ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!wget https://raw.githubusercontent.com/deepset-ai/haystack/main/haystack/utils/augment_squad.py" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "2. Download a small slice of the SQuAD question answering database." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "2. Download a small slice of the SQuAD question answering database." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "from haystack.utils import fetch_archive_from_http\n", @@ -113,55 +116,55 @@ "\n", "s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/squad_small.json.zip\"\n", "fetch_archive_from_http(url=s3_url, output_dir=squad_dir)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - " 3. Download a set of GLoVe vectors." - ], "metadata": { "collapsed": false - } + }, + "source": [ + " 3. Download a set of GLoVe vectors." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "glove_dir = doc_dir + \"/glove\"\n", "\n", "glove_url = \"https://nlp.stanford.edu/data/glove.6B.zip\"\n", "fetch_archive_from_http(url=glove_url, output_dir=glove_dir)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "This tutorial uses a smaller set of vectors and a smaller dataset to make it faster. For real use cases, pick larger versions of both." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "This tutorial uses a smaller set of vectors and a smaller dataset to make it faster. For real use cases, pick larger versions of both." + ] }, { "cell_type": "markdown", - "source": [ - "4. Run the `augment_squad.py` script to create an augmented dataset." 
- ], "metadata": { "collapsed": false - } + }, + "source": [ + "4. Run the `augment_squad.py` script to create an augmented dataset." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python augment_squad.py \\\n", @@ -169,170 +172,167 @@ " --glove_path data/distil_a_reader/glove/glove.6B.300d.txt \\\n", " --output_path augmented_dataset.json \\\n", " --multiplication_factor 2" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "The multiplication factor determines how many augmented samples we're generating. Setting it to 2 makes it much quicker to run. In real use cases, set this to something like 20." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "The multiplication factor determines how many augmented samples we're generating. Setting it to 2 makes it much quicker to run. In real use cases, set this to something like 20." + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Distilling a Reader\n", "\n", "Distillation in Haystack is done in two phases:\n", "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it will improve the performance of the model after training.\n", "- Prediction layer distillation optimizes the model for the specific task. This must be performed using the non-augmented data.\n" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "1. Initialize the teacher model." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "1. Initialize the teacher model." 
+ ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "from haystack.nodes import FARMReader\n", "\n", "teacher = FARMReader(model_name_or_path=\"deepset/bert-base-uncased-squad2\", use_gpu=True)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Here we are using [`deepset/bert-base-uncased-squad2`](https://huggingface.co/deepset/bert-base-uncased-squad2), a base sized BERT model trained on SQuAD." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Here we are using [`deepset/bert-base-uncased-squad2`](https://huggingface.co/deepset/bert-base-uncased-squad2), a base sized BERT model trained on SQuAD." + ] }, { "cell_type": "markdown", - "source": [ - "2. Initialize the student model." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "2. Initialize the student model." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "student = FARMReader(model_name_or_path=\"huawei-noah/TinyBERT_General_6L_768D\", use_gpu=True)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of the number of layers in the student." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Here we are using a TinyBERT model that is smaller than the teacher model. You can pick any other student model, so long as it uses the same tokenizer as the teacher model. Also, the number of layers in the teacher model must be a multiple of the number of layers in the student." + ] }, { "cell_type": "markdown", - "source": [ - "3. Perform intermediate layer distillation." 
- ], "metadata": { "collapsed": false - } + }, + "source": [ + "3. Perform intermediate layer distillation." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "student.distil_intermediate_layers_from(teacher, data_dir=\".\", train_filename=\"augmented_dataset.json\", use_gpu=True)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "4. Perform prediction layer distillation." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "4. Perform prediction layer distillation." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "student.distil_prediction_layer_from(teacher, data_dir=\"data/squad20\", train_filename=\"dev-v2.0.json\", use_gpu=True)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "5. Save the student model." - ], "metadata": { "collapsed": false - } + }, + "source": [ + "5. Save the student model." + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "student.save(directory=\"my_distilled_model\")" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "collapsed": false - } + }, + "source": [] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Next Steps\n", "\n", "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader)." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -352,7 +352,7 @@ "- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)\n", "\n", "Get in touch:\n", - "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", + "[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n", "\n", "By the way: [we're hiring!](https://www.deepset.ai/jobs)" ] From f0f90c8999a0c666e6d70d7dbc49639542eeb28a Mon Sep 17 00:00:00 2001 From: agnieszka-m Date: Mon, 5 Dec 2022 19:37:58 +0100 Subject: [PATCH 44/48] add bm25 and lg updates --- ...your_first_question_answering_system.ipynb | 23 ++++++++++++------- tutorials/21_distill_a_reader.ipynb | 14 +++++------ 2 files changed, 22 insertions(+), 15 deletions(-) diff --git a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index dd2d9799..4a8d9c2a 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -22,9 +22,9 @@ "\n", "Let's learn how to build a question answering system using Haystack's DocumentStore, Retriever, and Reader. Given a question like \"Who is the father of Arya Stark?\", this program will search through a knowledge base and look for a fitting answer.\n", "\n", - "While the documents we are using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. 
For example, you could add your company's internal wikis, or a collection of financial reports and still receive answers to questions on these topics.\n", + "While the documents we're using in this tutorial are all to do with Game of Thrones, the question answering system can work in many domains if you provide the documents. For example, you could add your company's internal wikis, or a collection of financial reports, and still receive answers to questions on these topics.\n", "\n", - "To help you get started quicker, we have simplified certain steps in this tutorial. For example, Document preparation and pipeline initialization are handled by ready-made classes that replace lines of initialization code. But don't worry! This doesn't affect how well the question answering system performs." + "To help you get started quicker, we have simplified certain steps in this tutorial. For example, we use ready-made classes to handle Document preparation and pipeline initialization. These classes replace initialization code that you would otherwise have to write yourself. But don't worry! This doesn't affect how well the question answering system performs." ] }, { @@ -66,7 +66,7 @@ "source": [ "## Initializing the DocumentStore\n", "\n", - "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore` which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale up so well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." + "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions.
Here we are using the `InMemoryDocumentStore`, which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. But it does not scale well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, { @@ -143,7 +143,7 @@ "source": [ "## Initializing the Retriever\n", "\n", - "Retrievers sift through all the Documents and return only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "A Retriever sifts through all the Documents and returns only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." ] }, { @@ -183,7 +183,7 @@ "source": [ "## Creating the Retriever-Reader Pipeline\n", "\n", - "The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of the two speeds up processing because the Reader only processes the Documents that the Retriever has passed on." + "The `ExtractiveQAPipeline` connects the Reader and Retriever. The combination of these two nodes speeds up processing because the Reader only processes the Documents that the Retriever has passed on." ] }, { @@ -197,6 +197,13 @@ "pipe = ExtractiveQAPipeline(reader, retriever)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pipelines are customizable. If you want to learn more, see [Pipelines](https://docs.haystack.deepset.ai/docs/pipelines)."
+ ] + }, { "cell_type": "markdown", "metadata": {}, @@ -312,7 +319,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3.9.13 64-bit (microsoft store)", "language": "python", "name": "python3" }, @@ -326,11 +333,11 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.3" + "version": "3.9.13" }, "vscode": { "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + "hash": "9075e6086e4e65b56cd3eb170a15e0fca54180da9a114ef73f891ab1378b8e41" } } }, diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index d1103bc0..a3626ee6 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -20,7 +20,7 @@ "source": [ "## Overview\n", "\n", - "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if want to learn more about it, see [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." + "Model distillation is the process of teaching a smaller model to imitate the performance of a larger, better trained model. By distilling one model into another, you end up with a more computationally efficient version of the original with only a slight trade-off in accuracy. In this tutorial, you will learn how to perform one form of model distillation on Reader models in Haystack. Model distillation is a complex topic and an active area of research so if you want to learn more about it, see [Model Distillation](https://docs.haystack.deepset.ai/docs/model_distillation)." 
] }, { @@ -32,9 +32,9 @@ "## Preparing the Colab Environment\n", "\n", "
\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n", + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab).\n", + "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled).\n", + "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level).\n", "
\n" ] }, @@ -69,7 +69,7 @@ "source": [ "## Augmenting Training Data\n", "\n", - "Having more human annotated training data is useful at all levels of model training. However, intermediate layer distillation can benefit even from synthetically generated data, since it is a less exact type of training. In this tutorial, we'll be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GLoVe word embedding model." + "Having more human-annotated training data is useful at all levels of model training. But intermediate layer distillation can benefit even from synthetically generated data as it's a less exact type of training. In this tutorial, we'll be using the [`augment_squad.py` script](https://github.com/deepset-ai/haystack/blob/main/haystack/utils/augment_squad.py) to augment our dataset. It creates artificial copies of question answering samples by replacing randomly chosen words with words of similar meaning. This meaning similarity is determined by their vector representations in a GloVe word embedding model." ] }, { @@ -192,8 +192,8 @@ "## Distilling a Reader\n", "\n", "Distillation in Haystack is done in two phases:\n", - "- Intermediate layer distillation ensures that the teacher and student models behave similarly. This can be performed using the augmented data. While intermediate layer distillation is optional, it will improve the performance of the model after training.\n", - "- Prediction layer distillation optimizes the model for the specific task. This must be performed using the non-augmented data.\n" + "- Intermediate layer distillation: This is an optional phase, but we recommend doing it because it improves the performance of the model after training.
Its goal is to ensure the teacher and student models behave similarly. You can use the augmented data in this phase. \n", + "- Prediction layer distillation: This phase optimizes the model for the specific task. You must use non-augmented data in this phase.\n" ] }, { From 5776b81f731d6fa4ca955f11c592c4093763662c Mon Sep 17 00:00:00 2001 From: agnieszka-m Date: Tue, 6 Dec 2022 10:35:08 +0100 Subject: [PATCH 45/48] update with bm25 --- markdowns/01_Basic_QA_Pipeline.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/markdowns/01_Basic_QA_Pipeline.md b/markdowns/01_Basic_QA_Pipeline.md index 6518f38d..6ae9952e 100644 --- a/markdowns/01_Basic_QA_Pipeline.md +++ b/markdowns/01_Basic_QA_Pipeline.md @@ -52,11 +52,13 @@ pip install farm-haystack[colab] A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. Here we are using the `InMemoryDocumentStore` which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. However, it does not scale up so well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store). 
+Let's initialize the DocumentStore and enable it to work with the BM25Retriever: + ```python from haystack.document_stores import InMemoryDocumentStore -document_store = InMemoryDocumentStore() +document_store = InMemoryDocumentStore(use_bm25=True) ``` ## Preparing Documents From fc737de598ebdbee73f6be9214d7e25bd6ba2a10 Mon Sep 17 00:00:00 2001 From: agnieszka-m Date: Tue, 6 Dec 2022 10:40:42 +0100 Subject: [PATCH 46/48] Update links and retriever --- markdowns/01_Basic_QA_Pipeline.md | 2 +- ...ild_your_first_question_answering_system.ipynb | 15 +++++++++++---- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/markdowns/01_Basic_QA_Pipeline.md b/markdowns/01_Basic_QA_Pipeline.md index 6ae9952e..022fdbc2 100644 --- a/markdowns/01_Basic_QA_Pipeline.md +++ b/markdowns/01_Basic_QA_Pipeline.md @@ -98,7 +98,7 @@ As an alternative, you can cast you text data into [Document objects](https://do ## Initializing the Retriever -Retrievers sift through all the Documents and return only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). +A Retriever sifts through all the Documents and returns only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever). ```python diff --git a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index 4a8d9c2a..f962f544 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -69,6 +69,13 @@ "A DocumentStore stores the Documents that the question answering system uses to find answers to your questions. 
Here we are using the `InMemoryDocumentStore` which is the simplest DocumentStore to get started with. It requires no external dependencies and is a good option for smaller projects and debugging. But it does not scale up so well to larger Document collections. To learn more about the DocumentStore and the different types of external databases that we support, see [DocumentStore](https://docs.haystack.deepset.ai/docs/document_store)." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's initialize the DocumentStore and enable it to work with the BM25Retriever:" + ] + }, { "cell_type": "code", "execution_count": null, @@ -77,7 +84,7 @@ "source": [ "from haystack.document_stores import InMemoryDocumentStore\n", "\n", - "document_store = InMemoryDocumentStore()" + "document_store = InMemoryDocumentStore(use_bm25=True)" ] }, { @@ -143,7 +150,7 @@ "source": [ "## Initializing the Retriever\n", "\n", - "A Retriever sifts through all the Documents and returns only those that it thinks might be relevant to the question. Here we are using the TF-IDF algorithm. For more Retriever options, see [Retriever](https://haystack.deepset.ai/pipeline_nodes/retriever)." + "A Retriever sifts through all the Documents and returns only those that it thinks might be relevant to the question. In this tutorial, we're using the BM25Retriever, which is the recommended default. For more Retriever options, see [Retriever](https://docs.haystack.deepset.ai/docs/retriever)." ] }, { @@ -154,7 +161,7 @@ "source": [ - "from haystack.nodes import TfidfRetriever\n", + "from haystack.nodes import BM25Retriever\n", "\n", - "retriever = TfidfRetriever(document_store=document_store)" + "retriever = BM25Retriever(document_store=document_store)" ] }, { @@ -163,7 +170,7 @@ "source": [ "## Initializing the Reader\n", "\n", - "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. 
Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + "A Reader scans the texts returned by Retrievers in detail and extracts the top answer candidates. Readers are based on powerful deep learning models but are much slower than Retrievers at processing the same amount of text. Here we are using a base-sized RoBERTa question answering model called [`deepset/roberta-base-squad2`](https://huggingface.co/deepset/roberta-base-squad2). To find out what model works best for your use case, see [Models](https://docs.haystack.deepset.ai/docs/reader#models)." ] }, { From 0c7bee8f466e79e214edd366cc6e3ce5d4b9b419 Mon Sep 17 00:00:00 2001 From: agnieszka-m Date: Tue, 6 Dec 2022 13:55:21 +0100 Subject: [PATCH 47/48] Update links --- tutorials/02_finetune_a_reader.ipynb | 14 ++++++++------ ...uild_a_scalable_question_answering_system.ipynb | 4 ++-- tutorials/21_distill_a_reader.ipynb | 8 ++++---- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/tutorials/02_finetune_a_reader.ipynb b/tutorials/02_finetune_a_reader.ipynb index 1d696b6d..cda93ea2 100644 --- a/tutorials/02_finetune_a_reader.ipynb +++ b/tutorials/02_finetune_a_reader.ipynb @@ -20,7 +20,9 @@ "source": [ "## Overview\n", "\n", - "Fine-tuning can improve your Reader's performance on question answering, especially if you're working with very specific domains. While many of the existing public models trained on public question answering datasets are enough for most use cases, fine-tuning can help your model understand the phrases and terms specific to your field. 
While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. After completing this tutorial, you will have all the tools needed to fine-tune a pretrained model on your own dataset." + "Fine-tuning can help your model understand the phrases and terms specific to your field and improve your Reader's performance on question answering. If you're working with very specific domains, you may find the existing public models trained on public question answering datasets are not enough for your use case. That's when fine-tuning can help. While this varies for each domain and dataset, we've had cases where ~2000 examples increased performance by as much as +5-20%. \n", + "\n", + "After completing this tutorial, you will have learned how to fine-tune a pretrained model on your own dataset." ] }, { @@ -72,7 +74,7 @@ "\n", "You can start generating your own training data using one of the two tools that we offer:\n", "\n", - "1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://haystack.deepset.ai/guides/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack.\n", + "1. **Annotation Tool**: You can use the deepset [Annotation Tool](https://docs.haystack.deepset.ai/docs/annotation) to write questions and highlight answers in a document. The tool supports structuring your workflow with organizations, projects, and users. You can then export the question-answer pairs in the SQuAD format that is compatible with fine-tuning in Haystack.\n", "\n", "2. **Feedback Mechanism**: In a production system, you can collect users' feedback to model predictions with Haystack's [REST API interface](https://github.com/deepset-ai/haystack#rest-api) and use this as training data. 
To learn how to interact with the user feedback endpoints, see [User Feedback](https://docs.haystack.deepset.ai/docs/domain_adaptation#user-feedback).\n" ] @@ -108,7 +110,7 @@ "collapsed": false }, "source": [ - "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we are using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squadbase), a base-sized DistilBERT model that was trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://haystack.deepset.ai/pipeline_nodes/reader#models)." + "We recommend using a model that was trained on SQuAD or a similar question answering dataset to benefit from transfer learning effects. In this tutorial, we're using [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad), a base-sized DistilBERT model trained on SQuAD. To learn more about what model works best for your use case, see [Models](https://docs.haystack.deepset.ai/docs/reader#models)."
] }, { @@ -231,7 +233,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.8.9 64-bit", + "display_name": "Python 3.9.13 64-bit (microsoft store)", "language": "python", "name": "python3" }, @@ -245,11 +247,11 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.9" + "version": "3.9.13" }, "vscode": { "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + "hash": "9075e6086e4e65b56cd3eb170a15e0fca54180da9a114ef73f891ab1378b8e41" } } }, diff --git a/tutorials/03_build_a_scalable_question_answering_system.ipynb b/tutorials/03_build_a_scalable_question_answering_system.ipynb index 232017b5..e64b3884 100644 --- a/tutorials/03_build_a_scalable_question_answering_system.ipynb +++ b/tutorials/03_build_a_scalable_question_answering_system.ipynb @@ -20,11 +20,11 @@ "source": [ "## Overview\n", "\n", - "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we will work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", + "Learn how to set up a question answering system that can search through complex knowledge bases and highlight answers to questions such as \"Who is the father of Arya Stark?\" In this tutorial, we'll work on a set of Wikipedia pages about Game of Thrones, but you can adapt it to search through internal wikis or a collection of financial reports, for example.\n", "\n", "This tutorial introduces you to all the concepts needed to build such a question answering system. 
It also uses Haystack components, such as indexing pipelines, querying pipelines, and DocumentStores backed by external database services.\n", "\n", - "Let's learn how to build a question answering system and discover more about the marvellous seven kingdoms!" + "Let's learn how to build a question answering system and discover more about the marvelous seven kingdoms!" ] }, { diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index a3626ee6..c6e59e90 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -331,7 +331,7 @@ "source": [ "# Next Steps\n", "\n", - "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluate_a_reader)." + "To learn how to measure the performance of these Reader models, see [Evaluate a Reader model](https://haystack.deepset.ai/tutorials/05_evaluation)." ] }, { @@ -360,7 +360,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.8.9 64-bit", + "display_name": "Python 3.9.13 64-bit (microsoft store)", "language": "python", "name": "python3" }, @@ -374,11 +374,11 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.9" + "version": "3.9.13" }, "vscode": { "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + "hash": "9075e6086e4e65b56cd3eb170a15e0fca54180da9a114ef73f891ab1378b8e41" } } }, From d31cc90a1f081789a59cefb9ed91cebefa1abc0a Mon Sep 17 00:00:00 2001 From: agnieszka-m Date: Wed, 7 Dec 2022 14:18:05 +0100 Subject: [PATCH 48/48] update the gpu links --- ...your_first_question_answering_system.ipynb | 22 ++++++++++++-- tutorials/02_finetune_a_reader.ipynb | 23 +++++++++++++-- ...a_scalable_question_answering_system.ipynb | 29 +++++++++++++++---- tutorials/21_distill_a_reader.ipynb | 26 +++++++++++++---- 4 files changed, 83 insertions(+), 17 deletions(-) diff --git 
a/tutorials/01_build_your_first_question_answering_system.ipynb b/tutorials/01_build_your_first_question_answering_system.ipynb index f962f544..92ad32cf 100644 --- a/tutorials/01_build_your_first_question_answering_system.ipynb +++ b/tutorials/01_build_your_first_question_answering_system.ipynb @@ -34,9 +34,7 @@ "\n", "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" + "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab)\n" ] }, { @@ -60,6 +58,24 @@ "pip install farm-haystack[colab]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the logging level to INFO:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", + "logging.getLogger(\"haystack\").setLevel(logging.INFO)" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/tutorials/02_finetune_a_reader.ipynb b/tutorials/02_finetune_a_reader.ipynb index cda93ea2..21b6daf3 100644 --- a/tutorials/02_finetune_a_reader.ipynb +++ b/tutorials/02_finetune_a_reader.ipynb @@ -33,9 +33,7 @@ "source": [ "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" + "[Enable GPU Runtime in
Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab).\n" ] }, { @@ -61,6 +59,25 @@ "pip install farm-haystack[colab]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the logging level to INFO:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", + "logging.getLogger(\"haystack\").setLevel(logging.INFO)" + ] + }, { "cell_type": "markdown", "metadata": { diff --git a/tutorials/03_build_a_scalable_question_answering_system.ipynb b/tutorials/03_build_a_scalable_question_answering_system.ipynb index e64b3884..e76f9360 100644 --- a/tutorials/03_build_a_scalable_question_answering_system.ipynb +++ b/tutorials/03_build_a_scalable_question_answering_system.ipynb @@ -34,9 +34,7 @@ "\n", "## Preparing the Colab Environment\n", "\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab)\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled)\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level)\n" + "[Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab)\n" ] }, { @@ -60,6 +58,25 @@ "pip install farm-haystack[colab]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the logging level to INFO:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", + "logging.getLogger(\"haystack\").setLevel(logging.INFO)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -465,7 +482,7 @@ ], "metadata": { "kernelspec": { - 
"display_name": "Python 3", + "display_name": "Python 3.9.13 64-bit (microsoft store)", "language": "python", "name": "python3" }, @@ -479,11 +496,11 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.3" + "version": "3.9.13" }, "vscode": { "interpreter": { - "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" + "hash": "9075e6086e4e65b56cd3eb170a15e0fca54180da9a114ef73f891ab1378b8e41" } } }, diff --git a/tutorials/21_distill_a_reader.ipynb b/tutorials/21_distill_a_reader.ipynb index c6e59e90..01625748 100644 --- a/tutorials/21_distill_a_reader.ipynb +++ b/tutorials/21_distill_a_reader.ipynb @@ -31,11 +31,8 @@ "source": [ "## Preparing the Colab Environment\n", "\n", - "
\n", - "- [Enable GPU Runtime in GPU](https://docs.haystack.deepset.ai/docs/enable-gpu-runtime-in-colab).\n", - "- [Check if GPU is Enabled](https://docs.haystack.deepset.ai/docs/check-if-gpu-is-enabled).\n", - "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/set-the-logging-level).\n", - "
\n" + "\n", + "[Enable GPU acceleration](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab).\n" ] }, { @@ -61,6 +58,25 @@ "pip install farm-haystack[colab]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, set the log level to INFO:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n", + "logging.getLogger(\"haystack\").setLevel(logging.INFO)" + ] + }, { "cell_type": "markdown", "metadata": {