diff --git a/README.md b/README.md
index 86d7421..01f066e 100644
--- a/README.md
+++ b/README.md
@@ -10,14 +10,18 @@

         Key Features •
-        Installation •
-        Components •
-        Examples •
-        How To Use •
-        Benchmarks
+        Installation •
+        Getting Started •
+        Examples

-fast**RAG** is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fast**RAG** aims to provide researchers and developers with a comprehensive tool-set for exploring and advancing the field of retrieval augmented generation.
+fast**RAG** is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive tool-set for exploring and advancing the field of retrieval augmented generation.
+
+## Updates
+
+- **May 2023**: [RAG with LLM and dynamic prompt synthesis example](examples/rag-prompt-hf.ipynb).
+- **April 2023**: Qdrant `DocumentStore` support.
 
 ## :tophat: Key Features
@@ -26,57 +30,133 @@ fast**RAG** is a research framework designed to facilitate the building of retri
 - **Intel Optimizations** (**TBA**): Leverage the latest optimizations developed by Intel for running pipelines with maximum hardware utilization, reduced latency, and increased throughput, using frameworks such as [Intel extensions for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) and [Intel extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
 - **Customizable**: Built using [Haystack](https://github.com/deepset-ai/haystack) and HuggingFace. All of fastRAG's components are 100% Haystack compatible.
+
+| | |
+| - | - |
+| [Components](fastrag/) | fastRAG components |
+| [Models](models.md) | Models overview |
+| [Configs](config/) | Example and predefined configurations |
+| [Example notebooks](examples/) | Example jupyter notebooks |
+| [Demos](demo/README.md) | Example UIs for demos |
+| [Benchmarks](benchmarks/README.md) | Misc. benchmarks of fastRAG components |
+| [Scripts](scripts/) | Scripts for creating indexes and fine-tuning models |
+
+## :books: Components
+
+For a brief overview of the various models, please refer to the [Models Overview](models.md) section.
+
+Unique components in fastRAG:
+
+- [**PLAID**](https://arxiv.org/abs/2205.09707): An incredibly efficient engine designed for retrieving information through late interaction.
+- [**ColBERT**](https://arxiv.org/abs/2112.01488): A Retriever (used in conjunction with PLAID) and re-ranker (employed with dense embeddings) that employs late interaction to determine relevancy scores.
+- [**Fusion-in-Decoder (FiD)**](https://arxiv.org/abs/2007.01282): A generative reader tailored for multi-document retrieval augmentation tasks.
+- [**Stable Diffusion Generator**](https://arxiv.org/pdf/2112.10752.pdf): A text-to-image generator that can be seamlessly integrated into any pipeline output.
+- [Retrieval-Oriented **Knowledge Graph Construction**](https://arxiv.org/abs/2010.01057): A pipeline component responsible for extracting named entities and creating a graph encompassing all entities specified in the retrieved documents, including the relationships between related pairs of entities.
+
 ## :round_pushpin: Installation
 
 Preliminary requirements:
 
-- Python 3.8+
-- PyTorch
+- Python version 3.8 or higher
+- PyTorch library
 
-In a new virtual environment, run:
+To set up the software, perform the following steps in a fresh virtual environment:
 
 ```bash
 pip install .
 ```
 
-There are various dependencies, based on usage:
+There are several dependencies to consider, depending on your specific usage:
 
 ```bash
 # Additional engines/components
-pip install .[faiss-cpu]           # CPU-based Faiss
-pip install .[faiss-gpu]           # GPU-based Faiss
-pip install .[qdrant]              # Qdrant support
-pip install libs/colbert           # ColBERT/PLAID indexing engine
-pip install .[image-generation]    # Stable diffusion library
-pip install .[knowledge_graph]     # spacy and KG libraries
-
-# REST API + UI
+pip install .[elastic]             # Support for ElasticSearch store
+pip install .[qdrant]              # Support for Qdrant store
+pip install libs/colbert           # Indexing engine for ColBERT/PLAID
+pip install .[faiss-cpu]           # CPU-based Faiss library
+pip install .[faiss-gpu]           # GPU-based Faiss library
+pip install .[image-generation]    # Stable diffusion library for image generation
+pip install .[knowledge_graph]     # Libraries for working with spacy and KG
+
+# User interface (for demos)
 pip install .[ui]
 
 # Benchmarking
 pip install .[benchmark]
 
-# Dev tools
+# Development tools
 pip install .[dev]
 ```
 
-## :books: Components
 
-For a short overview of the different models see [Models Overview](models.md).
+## :rocket: Getting Started
 
-Unique components in fastRAG:
+fastRAG leverages Haystack's pipelining abstraction. We recommend constructing a flow by incorporating components provided by fastRAG and Haystack, tailored to the specific task you aim to tackle. There are various approaches to achieving this using fastRAG.
+
+#### Defining Pipelines in Your Code
+
+To define a pipeline in your Python code, you can initialize all the components with the desired configuration directly in your code. This allows you to have full control over the pipeline structure and parameters. For concrete examples and detailed implementation guidance, please refer to the example [notebooks](examples/) provided by our team.
+
+#### Defining Pipelines Using YAML
+
+Another approach to defining pipelines is by writing a YAML file following Haystack's format. This method allows for a more declarative and modular pipeline configuration. You can find detailed information on how to define pipelines using a YAML file in the [Haystack documentation](https://docs.haystack.deepset.ai/docs/pipelines#yaml-file-definitions). The documentation provides guidance on the structure of the YAML file, available components, their parameters, and how to combine them to create a custom pipeline.
+
+We have provided miscellaneous pipeline configurations in the config directory.
+
+#### Serving a Pipeline via REST API
+
+To serve a fastRAG pipeline through a REST API, you can follow these steps:
+
+1. Execute the following command in your terminal:
+
+```bash
+python -m fastrag.rest_api.application --config=pipeline.yaml
+```
+
+2. If needed, you can explore additional options using the `-h` flag.
+
+3. The REST API service includes support for Swagger. You can access a user-friendly UI to observe and interact with the API endpoints by visiting `http://localhost:8000/docs` in your web browser.
 
-- [**PLAID**](https://arxiv.org/abs/2205.09707) - An extremely efficient engine for late interaction retrieval.
-- [**ColBERT**](https://arxiv.org/abs/2112.01488) - A Retriever (used with PLAID) and re-ranker (used with dense embeddings) utilizing late interaction for relevancy scoring.
-- [**Fusion-in-Decoder (FiD)**](https://arxiv.org/abs/2007.01282) - A generative reader for multi-document retrieval augmented tasks.
-- [**Stable Diffusion Generator**](https://arxiv.org/pdf/2112.10752.pdf) - A text-to-image generator. Pluggable to any pipeline output.
-- [Retrieval-Oriented **Knowledge Graph Construction**](https://arxiv.org/abs/2010.01057) - A pipeline component for extracting named-entities and creating a graph of all the entities specified in the retrieved documents, with the relations between each pair of related entities.
+The available endpoints for the REST API service are as follows:
 
-Addition components:
+- `status`: This endpoint can be used to perform a sanity check.
+- `version`: This endpoint provides the project version, as defined in `__init__.py`.
+- `query`: Use this endpoint to run a query through the pipeline and retrieve the results.
 
-- [Retrieval Augmented Summarization with **T5** family models (such as LongT5, FLAN-T5)](https://arxiv.org/abs/2112.07916) - An encoder-decoder model based on T5 with support for long input, supporting summarization/translation prompts.
+By leveraging the REST API service, you can integrate fastRAG pipelines into your applications and easily interact with them using HTTP requests.
 
-## :rocket: Example Use Cases
+
+#### Generating Pipeline Configurations
+
+generate using a script
+
+The pipeline in fastRAG is constructed using the Haystack pipeline API and is dynamically generated based on the user's selection of components. To generate a Haystack pipeline that can be executed as a standalone REST server service (refer to [REST API](#rest-api)), you can utilize the [Pipeline Generation](scripts/generate_pipeline.py) script.
+
+Below is an example that demonstrates how to use the script to generate a pipeline with a ColBERT retriever, an SBERT reranker, and an FiD reader:
+
+```bash
+python generate_pipeline.py --path "retriever,reranker,reader" \
+    --store config/store/plaid-wiki.yaml \
+    --retriever config/retriever/colbert-v2.yaml \
+    --reranker config/reranker/sbert.yaml \
+    --reader config/reader/FiD.yaml \
+    --file pipeline.yaml
+```
+
+In the above command, you specify the desired components using the `--path` option, followed by providing the corresponding configuration YAML files for each component (e.g., `--store`, `--retriever`, `--reranker`, `--reader`). Finally, you can specify the output file for the generated pipeline configuration using the `--file` option (in this example, it is set to `pipeline.yaml`).
+
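+The generated configuration can also be loaded and executed directly from Python. The following is a minimal, illustrative sketch using fastRAG's `load_pipeline` helper (defined in `fastrag/__init__.py`); the component name `"Retriever"` and the example query are placeholders that depend on the YAML you generated:
+
+```python
+from fastrag import load_pipeline
+
+# Load the pipeline that generate_pipeline.py wrote to pipeline.yaml.
+pipeline = load_pipeline("pipeline.yaml")
+
+# "Retriever" is a placeholder component name; use the names defined in your YAML.
+result = pipeline.run(
+    query="Who wrote On the Origin of Species?",
+    params={"Retriever": {"top_k": 10}},
+)
+print(result["answers"][0].answer)
+```
+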
+
+#### Index Creation
+
+For detailed instructions on creating various types of indexes, please refer to the [Indexing Scripts](scripts/indexing/) directory. It contains valuable information and resources to guide you through the process of creating different types of indexes.
+
+#### Customizing Models
+
+To cater to different use cases, we provide a variety of training scripts that allow you to fine-tune models of your choice. For detailed examples, model descriptions, and more information, please refer to the [Models Overview](models.md) page. It will provide you with valuable insights into different models and their applications.
+
+## :dart: Example Use Cases
 
 ### Efficient Open Domain Question-Answering
@@ -100,18 +180,44 @@ flowchart LR
 :notebook: [Efficient and fast ODQA with PLAID, ColBERT and FiD](examples/plaid_colbert_pipeline.ipynb)
 
+### Retrieval Augmented Generation with an LLM
+
+To enhance generations using a Large Language Model (LLM) with retrieval augmentation, you can follow these steps:
+
+1. Define a retrieval flow: This involves creating a store that holds the relevant information and one or more retrievers/rankers to retrieve the most relevant documents or passages.
+
+2. Define a prompt template: Design a template that includes a suitable context or instruction, along with placeholders for the query and information retrieved by the pipeline. These placeholders will be filled in dynamically during generation.
+
+3. Request token generation from the LLM: Utilize the prompt template and pass it to the LLM, allowing it to generate tokens based on the provided context, query, and retrieved information.
+
+*Most Huggingface decoder LLMs are supported*.
+
+See a complete example in our [RAG with LLMs](examples/rag-prompt-hf.ipynb) :notebook: notebook.
+
+```mermaid
+flowchart LR
+    id1[(Index)] <-->id2(.. Retrieval pipeline ..) --> id3(Prompt Template) --> id4(LLM)
+    style id1 fill:#E1D5E7,stroke:#9673A6
+    style id2 fill:#DAE8FC,stroke:#6C8EBF
+    style id3 fill:#F3CECC,stroke:#B25450
+    style id4 fill:#D5E8D4,stroke:#82B366
+```
+
+
 ### ChatGPT Open Domain Reranking and QA
 
 Use ChatGPT API to both rerank the documents for any query, and provide an answer to the query using the chosen documents.
 
+:notebook: [GPT as both Reranker and Reader](examples/gpt_as_both_reranker_and_reader.ipynb)
+
 ```mermaid
 flowchart LR
-    id2(.. Retrieval pipeline ..) --> id4(ChatGPT)
+    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id4(ChatGPT)
+    style id1 fill:#E1D5E7,stroke:#9673A6
     style id2 fill:#DAE8FC,stroke:#6C8EBF
     style id4 fill:#D5E8D4,stroke:#82B366
 ```
 
-:notebook: [GPT as both Reranker and Reader](examples/gpt_as_both_reranker_and_reader.ipynb)
 
 ### Open Domain Summarization
@@ -122,6 +228,8 @@ Summarize topics given free-text input and a corpus of knowledge.
 **Generation** Using `"summarize: "` prompt, all documents concatenated and _FLAN-T5_ generative model
 
+:notebook: [Open Domain Summarization](examples/od_summarization_pipeline.ipynb)
+
 ```mermaid
 flowchart LR
     id1[(Elastic)] <--> id2(BM25) --> id3(SentenceTransformer) -- summarize--> id4(FLAN-T5)
@@ -130,12 +238,12 @@ flowchart LR
     style id4 fill:#D5E8D4,stroke:#82B366
 ```
 
-:notebook: [Open Domain Summarization](examples/od_summarization_pipeline.ipynb)
-
 ### Retrieval-Oriented Knowledge Graph Construction
 
 Use with any retrieval pipeline to extract Named Entities (NER) and generate relation-maps using Relation Classification Model (RC).
 
+:notebook: [Knowledge Graph Construction](examples/knowledge_graph_construction.ipynb)
+
 ```mermaid
 flowchart LR
     id2(.. Retrieval pipeline ..) --> id4(NER) --> id5(RC)
@@ -144,12 +252,12 @@ flowchart LR
     style id5 fill:#F3CECC,stroke:#B25450
 ```
 
-:notebook: [Knowledge Graph Construction](examples/knowledge_graph_construction.ipynb)
-
 ### Retrieval-Oriented Answer Image Generation
 
 Use with any retrieval pipeline to generate a dynamic image from the answer to the query, using a diffusion model.
 
+:notebook: [Answer Image Generation](examples/answer_image_generation.ipynb)
+
 ```mermaid
 flowchart LR
     id2(.. Retrieval pipeline ..) --> id4(FiD) --> id5(Diffusion)
@@ -158,91 +266,6 @@ flowchart LR
     style id5 fill:#F3CECC,stroke:#B25450
 ```
 
-:notebook: [Answer Image Generation](examples/answer_image_generation.ipynb)
-
-
-
-## :running: How to Use
-
-fastRAG has a modular architecture that enables the user to build retrieval-augmented pipelines with different components. The components are python classes that take a set of parameters.
-We provide multiple examples of sets of parameters used to build common pipelines; the parameters are organized in YAML files in folders such as `store`, `retriever` and `reader`, all under the [Configuration](config) folder.
-
-### Pipeline Configuration Generation
-
-The pipeline is built using Haystack pipeline API and is built dynamically according to the components the user is
-interested in. Use the [Pipeline Generation](scripts/generate_pipeline.py) script to generate a Haystack pipeline which can be run by the stand-alone REST server as a service, see [REST API](#rest-api).
-
-Here is an example of using the script to generate a pipeline with a ColBERT retriever, an SBERT reranker and an FiD reader:
-
-```bash
-python generate_pipeline.py --path "retriever,reranker,reader" \
-    --store config/store/plaid-wiki.yaml \
-    --retriever config/retriever/colbert-v2.yaml \
-    --reranker config/reranker/sbert.yaml \
-    --reader config/reader/FiD.yaml \
-    --file pipeline.yaml
-```
-
-> :warning: PLAID Requirements :warning:
->
-> If GPU is needed it should be of type RTX 3090 or newer and PyTorch should be installed with CUDA support using:
->
->```bash
->pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
->```
-
-### Running Pipelines
-
-Pipelines can be run inline (code, service, notebook) once initialized properly. For a concrete example see this [notebook](examples/simple_oqda_pipeline.ipynb).
-
-#### Standalone UI Demos
-
-See [Demo](demo/) for a script creating stand alone demos for several workflows; the script creates a REST service and a UI service, ready to be used. Continue reading for more details on these services.
-
-#### Serve a pipeline via a REST service
-
-One can start a REST server with a defined pipeline YAML and send queries for processing or benchmarking. A pipeline is
-generated according to [Pipeline Generation](#pipeline-configuration-generation) step; see [Usage](#usage).
-
-Run the following:
-
-```bash
-python -m fastrag.rest_api.application --config=pipeline.yaml
-```
-
-This will start a `uvicorn` server and build a pipeline as defined in the YAML file.
-
-There is support for Swagger. One can observe and interact with endpoints in a simple UI by vising `http://localhost:8000/docs` (might need to forward ports locally, if working on a cluster).
-
-These are the following endpoint:
-
-- `status`: sanity.
-- `version`: project version, as defined in `__init__`.
-- `query`: a general query, used for debugging.
-
-#### Run a demo UI
-
-Define the endpoint address according to where the web server is; e.g. `localhost` if you start the web server on the
-same machine; and run the following:
-
-```bash
-API_ENDPOINT=http://localhost:8000 \
-    python -m streamlit run fastrag/ui/webapp.py
-```
-
-### Creating Indexes
-
-See [Indexing Scripts](scripts/indexing/) for information about how to create different types of indexes.
-
-### Pre-training/Fine-tuning Models
-
-We offer an array of training scripts, to finetune models of your choice for various usecases. See [Models Overview](models.md) for examples, model descriptions, and more.
-
-## :chart_with_upwards_trend: Benchmarks
-
-Benchmarks scripts and results can be found here: [Benchmarks](benchmarks/).
-
 ## License
 
 The code is licensed under the [Apache 2.0 License](LICENSE).
diff --git a/config/rag_generation_with_dynamic_prompt.yaml b/config/rag_generation_with_dynamic_prompt.yaml
new file mode 100644
index 0000000..54d6fc0
--- /dev/null
+++ b/config/rag_generation_with_dynamic_prompt.yaml
@@ -0,0 +1,50 @@
+components:
+- name: Store
+  params:
+    host:
+    index:
+    port: 80
+    search_fields: ["title", "content"]
+  type: ElasticsearchDocumentStore
+- name: Retriever
+  params:
+    document_store: Store
+    top_k: 100
+  type: BM25Retriever
+- name: Reranker
+  params:
+    batch_size: 32
+    model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
+    top_k: 5
+    use_gpu: true
+  type: SentenceTransformersRanker
+- name: AParser
+  type: AnswerParser
+- name: LFQA
+  params:
+    name: lfqa
+    prompt_text: "Answer the question using the provided context. Your answer should be in your own words and be no longer than 50 words. \n\n Context: {join(documents)} \n\n Question: {query} \n\n Answer:"
+    output_parser: AParser
+  type: PromptTemplate
+- name: Prompter
+  params:
+    model_name_or_path: MBZUAI/LaMini-Flan-T5-783M
+    use_gpu: true
+    model_kwargs:
+      model_max_length: 2048
+      torch_dtype: torch.bfloat16
+    default_prompt_template: LFQA
+  type: PromptNode
+pipelines:
+- name: query
+  nodes:
+  - inputs:
+    - Query
+    name: Retriever
+  - inputs:
+    - Retriever
+    name: Reranker
+  - inputs:
+    - Reranker
+    name: Prompter
+version: 1.17.0
diff --git a/demo/README.md b/demo/README.md
index 75e2b8d..f5a6f2c 100644
--- a/demo/README.md
+++ b/demo/README.md
@@ -1,26 +1,50 @@
 # Running Demos
 
-To run a demo, use its config name; for example:
+To execute a demo, use its configuration name. For instance:
 
 ```sh
 python run_demo.py -t QA1
 ```
 
-The server and UI are are created as subprocesses that run in the background. Use the PIDs to kill them.
+The server and UI will be spawned as subprocesses that run in the background. You can use the PIDs (Process IDs) to terminate them when needed.
 
-Use the `--help` flag for a list of available configurations.
+To obtain a list of available configurations, utilize the `--help` flag.
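+For example, a hypothetical cleanup after a demo run could look like this (the actual PIDs are printed once the services are up):
+
+```sh
+# Stop the REST service and the UI by PID (placeholders shown) ...
+kill <SERVER_PID> <UI_PID>
+
+# ... or by process name, matching how run_demo.py looks the processes up.
+pkill -f fastrag.rest_api.application
+pkill -f "streamlit run"
+```
+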
 ## Available Demos
 
-| Name    | Comment                                                                              | Config Name |
-|:--------|:-------------------------------------------------------------------------------------|:-----------:|
-| Q&A     | Abstractive Q&A demo using BM25, SBERT reranker and an FiD.                            | `QA1`       |
-| Q&A     | Abstractive Q&A demo using ColBERT v2 (w/ PLAID index) retriever and an FiD reader.    | `QA2`       |
-| Summary | Summarization using BM25, SBERT reranker and long-T5 reader                            | `SUM`       |
-| Image   | Abstractive Q&A demo, with an image generation model for the answer.                   | `QADIFF`    |
+| Name          | Description                                                                      | Config Name |
+|:--------------|:---------------------------------------------------------------------------------|:-----------:|
+| Q&A           | Abstractive Q&A demo utilizing BM25, SBERT reranker, and FiD model.                | `QA1`       |
+| Q&A           | Abstractive Q&A demo using ColBERT v2 (with PLAID index) retriever and FiD reader. | `QA2`       |
+| Summarization | Summarization demo employing BM25, SBERT reranker, and long-T5 reader.             | `SUMR`      |
+| Image         | Abstractive Q&A demo with an image generation model for the answer.                | `QADIFF`    |
+| LLM           | Retrieval augmented generation with generative LLM model.                          | `LLM`       |
 
-ColBERT demo with a wikipedia index takes about 15 minutes to load up. Also, see remark about GPU usage in the [README](../README.md#plaid-requirements).
+Please note that the ColBERT demo with a Wikipedia index may take around 15 minutes to load. Also, make sure to review the [README](../models.md#plaid-requirements) for information regarding GPU usage requirements.
 
-## Demo Screenshot
+### Additional Options
+
+If you already have a fastRAG pipeline service running locally and wish to utilize it with one of the provided UI interfaces, you can add the `--only-ui` flag to the demo script:
+
+```sh
+python run_demo.py -t LLM --only-ui
+```
+
+In case your pipeline service is running on a non-local machine or a different port other than 8000, you can use the `--endpoint` argument to specify the URL:
+
+```sh
+python run_demo.py -t LLM --endpoint http://hostname:80
+```
+
+To manually run a UI with the `API_ENDPOINT` directed to a fastRAG service, you can execute the following command:
+
+```bash
+API_ENDPOINT=http://localhost:8000 \
+    python -m streamlit run fastrag/ui/webapp.py
+```
+
+Make sure to replace `http://localhost:8000` with the appropriate URL of your fastRAG service.
+
+## Screenshot
 
 ![alt text](../assets/qa_demo.png)
diff --git a/demo/run_demo.py b/demo/run_demo.py
index 452e7ab..743230e 100644
--- a/demo/run_demo.py
+++ b/demo/run_demo.py
@@ -8,6 +8,7 @@
     "QA2": "qa_plaid.yaml",
     "QADIFF": "qa_diffusion_pipeline.yaml",
     "SUMR": "summarization_pipeline.yaml",
+    "LLM": "rag_generation_with_dynamic_prompt.yaml",
 }
 
 SCREENS = {
@@ -15,6 +16,7 @@
     "QA2": "webapp",
     "QADIFF": "webapp",
     "SUMR": "webapp_summarization",
+    "LLM": "prompt_llm",
 }
 
@@ -40,27 +42,37 @@ def get_pid(cmd):
         choices=list(TASKS.keys()),
         help=f"The abbreviated name for the task configuraion.
\n {TASKS} \n", ) + parser.add_argument( + "-e", "--endpoint", default="http://localhost:8000", help="pipeline service endpoint" + ) + parser.add_argument( + "--only-ui", + action="store_true", + help="launch only the UI interface (without launching a service)", + ) args = parser.parse_args() path = os.getcwd() - # Create REST server - cmd = f"python -m fastrag.rest_api.application --config={path}/config/TASKCONFIGURATION" - cmd = cmd.replace("TASKCONFIGURATION", TASKS[args.task_config]) - run_service(cmd) + s_pid = "NA" + if not args.only_ui: + # Create REST server + cmd = f"python -m fastrag.rest_api.application --config={path}/config/TASKCONFIGURATION" + cmd = cmd.replace("TASKCONFIGURATION", TASKS[args.task_config]) + print("Launching fastRAG pipeline service...") + run_service(cmd) + time.sleep(10) + s_pid = get_pid("fastrag.rest_api.application") # Create UI - os.environ["API_ENDPOINT"] = "http://localhost:8000" + os.environ["API_ENDPOINT"] = f"{args.endpoint}" cmd = f"python -m streamlit run {path}/fastrag/ui/SCREEN.py" cmd = cmd.replace("SCREEN", SCREENS[args.task_config]) + print("Launching UI...") + time.sleep(3) run_service(cmd) - - # Sleep and wait for initialization, pids - print("Creating services...") - time.sleep(10) - s_pid = get_pid("fastrag.rest_api.application") u_pid = get_pid("streamlit run") print("\n") - print(f"Server on localhost:8000/docs PID={s_pid}") + print(f"Server on {args.endpoint}/docs PID={s_pid}") print(f"UI on localhost:8501 PID={u_pid}") diff --git a/examples/rag-prompt-hf.ipynb b/examples/rag-prompt-hf.ipynb new file mode 100644 index 0000000..7a611c1 --- /dev/null +++ b/examples/rag-prompt-hf.ipynb @@ -0,0 +1,164 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7af1bfad", + "metadata": {}, + "source": [ + "# Retrieval Augmented Generation with LLMs" + ] + }, + { + "cell_type": "markdown", + "id": "be7a0c4a", + "metadata": {}, + "source": [ + "Define an information source to retrieve from" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "359f06de", + "metadata": {}, + "outputs": [], + "source": [ + "from haystack.schema import Document\n", + "from haystack.document_stores import InMemoryDocumentStore\n", + "\n", + "document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)\n", + "\n", + "# 4 example documents to index\n", + "examples = [\n", + " \"Lionel Andrés Messi[note 1] (Spanish pronunciation: [ljoˈnel anˈdɾes ˈmesi] (listen); born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for Ligue 1 club Paris Saint-Germain and captains the Argentina national team. Widely regarded as one of the greatest players of all time, Messi has won a record seven Ballon d'Or awards,[note 2] a record six European Golden Shoes, and in 2020 was named to the Ballon d'Or Dream Team. Until leaving the club in 2021, he had spent his entire professional career with Barcelona, where he won a club-record 35 trophies, including 10 La Liga titles, seven Copa del Rey titles and four UEFA Champions Leagues. With his country, he won the 2021 Copa América and the 2022 FIFA World Cup. A prolific goalscorer and creative playmaker, Messi holds the records for most goals in La Liga (474), most hat-tricks in La Liga (36) and the UEFA Champions League (8), and most assists in La Liga (192) and the Copa América (17). He has also the most international goals by a South American male (98). 
Messi has scored over 795 senior career goals for club and country, and has the most goals by a player for a single club (672).\",\n", + " \"Born and raised in central Argentina, Messi relocated to Spain at the age of 13 to join Barcelona, for whom he made his competitive debut aged 17 in October 2004. He established himself as an integral player for the club within the next three years, and in his first uninterrupted season in 2008–09 he helped Barcelona achieve the first treble in Spanish football; that year, aged 22, Messi won his first Ballon d'Or. Three successful seasons followed, with Messi winning four consecutive Ballons d'Or, making him the first player to win the award four times. During the 2011–12 season, he set the La Liga and European records for most goals scored in a single season, while establishing himself as Barcelona's all-time top scorer. The following two seasons, Messi finished second for the Ballon d'Or behind Cristiano Ronaldo (his perceived career rival), before regaining his best form during the 2014–15 campaign, becoming the all-time top scorer in La Liga and leading Barcelona to a historic second treble, after which he was awarded a fifth Ballon d'Or in 2015. Messi assumed captaincy of Barcelona in 2018, and in 2019 he won a record sixth Ballon d'Or. Out of contract, he signed for Paris Saint-Germain in August 2021.\",\n", + " \"An Argentine international, Messi holds the national record for appearances and is also the country's all-time leading goalscorer. At youth level, he won the 2005 FIFA World Youth Championship, finishing the tournament with both the Golden Ball and Golden Shoe, and an Olympic gold medal at the 2008 Summer Olympics. His style of play as a diminutive, left-footed dribbler drew comparisons with his compatriot Diego Maradona, who described Messi as his successor. After his senior debut in August 2005, Messi became the youngest Argentine to play and score in a FIFA World Cup in 2006, and reached the final of the 2007 Copa América, where he was named young player of the tournament. As the squad's captain from August 2011, he led Argentina to three consecutive finals: the 2014 FIFA World Cup, for which he won the Golden Ball, and the 2015 and 2016 Copa América, winning the Golden Ball in the 2015 edition. After announcing his international retirement in 2016, he reversed his decision and led his country to qualification for the 2018 FIFA World Cup, a third-place finish at the 2019 Copa América, and victory in the 2021 Copa América, while winning the Golden Ball and Golden Boot for the latter. This achievement would see him receive a record seventh Ballon d'Or in 2021. In 2022, he captained his country to win the 2022 FIFA World Cup, for which he won the Golden Ball for a record second time, and broke the record for most appearances in World Cup tournaments with 26 matches played.\",\n", + " \"Messi has endorsed sportswear company Adidas since 2006. According to France Football, he was the world's highest-paid footballer for five years out of six between 2009 and 2014, and was ranked the world's highest-paid athlete by Forbes in 2019 and 2022. Messi was among Time's 100 most influential people in the world in 2011 and 2012. In February 2020, he was awarded the Laureus World Sportsman of the Year, thus becoming the first footballer and the first team sport athlete to win the award. 
Later that year, Messi became the second footballer and second team-sport athlete to surpass $1 billion in career earnings.\",\n", + " \n", + "]\n", + "\n", + "documents = []\n", + "for i, d in enumerate(examples):\n", + " documents.append(Document(content=d, id=i))\n", + "\n", + "document_store.write_documents(documents)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "7e653e5f", + "metadata": {}, + "source": [ + "Define the prompt template. `{query}` will be replaced with the user's query and `{documents}` with the retrieved documents fetched from the index.\n", + "\n", + "We define a `PromptModel` that automatically uses a Huggingface model interface given by `model_name_or_path`.\n", + "\n", + "Use `{query}` for injecting the original query text into the prompt and `{documents}` to inject the documents fetched by the retriever (can be used with smaller manipulation functions such as `join()` to concatenate the documents)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6443e7a3", + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from haystack.nodes import PromptNode, PromptTemplate\n", + "from haystack.nodes import BM25Retriever, SentenceTransformersRanker\n", + "\n", + "retriever = BM25Retriever(document_store=document_store, top_k=100)\n", + "reranker = SentenceTransformersRanker(model_name_or_path=\"cross-encoder/ms-marco-MiniLM-L-12-v2\", top_k=1)\n", + "\n", + "\n", + "lfqa_prompt = PromptTemplate(name=\"lfqa\",\n", + " prompt_text=\"Answer the question using the provided context. Your answer should be in your own words and be no longer than 50 words. \\n\\n Context: {join(documents)} \\n\\n Question: {query} \\n\\n Answer:\",\n", + " output_parser={\"type\": \"AnswerParser\"}) \n", + "prompt = PromptNode(model_name_or_path=\"MBZUAI/LaMini-Flan-T5-783M\", default_prompt_template=lfqa_prompt,\n", + " model_kwargs={\"model_max_length\": 2048, \"torch_dtype\": torch.bfloat16},)" + ] + }, + { + "cell_type": "markdown", + "id": "04408d03", + "metadata": {}, + "source": [ + "Defining the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "4652b226", + "metadata": {}, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "p = Pipeline()\n", + "p.add_node(component=retriever, name=\"Retriever\", inputs=[\"Query\"])\n", + "p.add_node(component=reranker, name=\"Reranker\", inputs=[\"Retriever\"])\n", + "p.add_node(component=prompt, name=\"prompt_node\", inputs=[\"Reranker\"])" + ] + }, + { + "cell_type": "markdown", + "id": "0f842709", + "metadata": {}, + "source": [ + "Run a query through the pipeline and print the generated answer" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3dd989ac", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Messi has won a club-record 35 trophies, including 10 La Liga titles, seven Copa del Rey titles, and four UEFA Champions Leagues. 
He has also won the 2021 Copa América and the 2022 FIFA World Cup.'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = p.run(\"What trophies does Messi has?\", debug=True)\n", + "a['answers'][0].answer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65963ad0-ac72-4073-ad8d-cf3d459ea5d5", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/fastrag/__init__.py b/fastrag/__init__.py index c3087c3..943fa38 100644 --- a/fastrag/__init__.py +++ b/fastrag/__init__.py @@ -4,7 +4,7 @@ from fastrag import image_generators, kg_creators, rankers, readers, retrievers, stores from fastrag.utils import add_timing_to_pipeline -__version__ = "1.1.0" +__version__ = "1.2.0" def load_pipeline(config_path: str) -> Pipeline: diff --git a/fastrag/rest_api/controller/qa.py b/fastrag/rest_api/controller/qa.py index c111ae9..a08dd49 100644 --- a/fastrag/rest_api/controller/qa.py +++ b/fastrag/rest_api/controller/qa.py @@ -16,17 +16,14 @@ import collections import json -import logging import time from typing import Any, Dict -import haystack from fastapi import APIRouter, FastAPI -from haystack import Pipeline -from haystack.schema import Document +from haystack.nodes import PromptNode, PromptTemplate +from haystack.schema import Answer, Document from pydantic import BaseConfig -from ..config import LOG_LEVEL from ..schema import QueryRequest, QueryResponse BaseConfig.arbitrary_types_allowed = True @@ -65,6 +62,37 @@ def _process_request(pipeline, request) -> Dict[str, Any]: if isinstance(params[key], collections.Mapping) and "filters" in params[key].keys(): params[key]["filters"] = _format_filters(params[key]["filters"]) + if "generation_kwargs" in params: + for n in pipeline.components.values(): + if isinstance(n, PromptNode): + params.update( + { + n.name: { + "invocation_context": {"generation_kwargs": params["generation_kwargs"]} + } + } + ) + del params["generation_kwargs"] + + if "input_prompt" in params: + new_prompt = PromptTemplate(**params["input_prompt"]) + + for n in pipeline.components.values(): + if isinstance(n, PromptNode): + template_names = n.get_prompt_template_names() + if new_prompt.name in template_names: + del n.prompt_templates[new_prompt.name] + n.add_prompt_template(new_prompt) + n.set_default_prompt_template(new_prompt) + del params["input_prompt"] + + pipeline_components_list = list(pipeline.components.keys()) + for p in list(params.keys()): + if "filters" == str(p): + continue + if str(p) not in pipeline_components_list: + del params[p] + result = pipeline.run(query=request.query, params=params, debug=request.debug) # Ensure answers and documents exist, even if they're empty lists @@ -72,6 +100,8 @@ def _process_request(pipeline, request) -> Dict[str, Any]: result["documents"] = [] if "answers" not in result: result["answers"] = [] + if "results" in result: + result["answers"] = [Answer(res, "generative") for res in result["results"]] logger.info( json.dumps( diff --git a/fastrag/rest_api/schema.py b/fastrag/rest_api/schema.py index 554274a..f254981 100644 
--- a/fastrag/rest_api/schema.py +++ b/fastrag/rest_api/schema.py @@ -79,9 +79,10 @@ class CreateLabelSerialized(RequestBaseModel): class QueryResponse(BaseModel): query: str - answers: List[Answer] = [] + answers: Optional[List] = [] documents: List[Document] = [] images: Optional[Dict] = None relations: Optional[List] = None debug: Optional[Dict] = Field(None, alias="_debug") timings: Optional[Dict] = None + results: Optional[List] = None diff --git a/fastrag/rest_api/utils.py b/fastrag/rest_api/utils.py index 4eb898c..781861e 100644 --- a/fastrag/rest_api/utils.py +++ b/fastrag/rest_api/utils.py @@ -19,11 +19,12 @@ from fastapi import APIRouter, FastAPI, HTTPException from fastapi.openapi.utils import get_openapi from fastapi.routing import APIRoute -from haystack import __version__ as haystack_version from starlette.middleware.cors import CORSMiddleware from starlette.requests import Request from starlette.responses import JSONResponse +from fastrag import __version__ as fastrag_version + logger = logging.getLogger(__name__) app = None @@ -44,9 +45,9 @@ def get_app() -> FastAPI: from .config import ROOT_PATH app = FastAPI( - title="Haystack REST API", + title="fasrRAG REST API", debug=True, - version=haystack_version, + version=fastrag_version, root_path=ROOT_PATH, ) @@ -79,19 +80,3 @@ def get_app() -> FastAPI: route.operation_id = route.name return app - - -def get_openapi_specs() -> dict: - """ - Used to autogenerate OpenAPI specs file to use in the documentation. - - See `docs/_src/api/openapi/generate_openapi_specs.py` - """ - app = get_app() - return get_openapi( - title=app.title, - version=app.version, - openapi_version=app.openapi_version, - description=app.description, - routes=app.routes, - ) diff --git a/fastrag/ui/prompt_llm.py b/fastrag/ui/prompt_llm.py new file mode 100644 index 0000000..9a135f0 --- /dev/null +++ b/fastrag/ui/prompt_llm.py @@ -0,0 +1,297 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# +# This file is based on https://github.com/deepset-ai/haystack +# See: https://github.com/deepset-ai/haystack/blob/main/ui/webapp.py +import logging +import os +from json import JSONDecodeError + +import streamlit as st +from annotated_text import annotated_text +from markdown import markdown + +from fastrag.ui.utils import API_ENDPOINT, display_runtime_plot, haystack_is_ready, query + +DEFAULT_PROMPT = """Answer the question using the provided context. Your answer should be in your own words and be no longer than 50 words. 
\n\n Context: {join(documents)} \n\n Question: {query} \n\n Answer:""" + + +def set_state_if_absent(key, value): + if key not in st.session_state: + st.session_state[key] = value + + +def clean_markdown(string): + return string.replace("$", "\$").replace(":", "\:") + + +def main(): + st.set_page_config( + page_title="fastRAG Demo", + layout="wide", + page_icon="", + ) + + with open("fastrag/ui/style.css", "r") as f: + st.markdown(f"", unsafe_allow_html=True) + + # Persistent state + set_state_if_absent("question", None) + set_state_if_absent("answer", None) + set_state_if_absent("results", None) + set_state_if_absent("raw_json", None) + set_state_if_absent("random_question_requested", False) + set_state_if_absent("images", None) + set_state_if_absent("relations", None) + + # Small callback to reset the interface in case the text of the question changes + def reset_results(*args): + st.session_state.answer = None + st.session_state.results = None + st.session_state.raw_json = None + st.session_state.images = None + + # Title + st.markdown("## Retrieve and Generate with a LLM 📚") + st.sidebar.markdown( + "
   
", + unsafe_allow_html=True, + ) + # Sidebar + st.sidebar.title("Options") + top_k_retriever = st.sidebar.number_input( + "Documents to retrieve from the index", + value=5, + min_value=1, + max_value=100, + step=1, + # on_change=reset_results, + ) + top_k_reranker = st.sidebar.number_input( + "Documents document to re-rank", + value=5, + min_value=1, + max_value=50, + step=1, + # on_change=reset_results, + ) + min_new_tokens = st.sidebar.number_input( + "Min new tokens", + value=20, + min_value=1, + max_value=100, + step=1, + # on_change=reset_results + ) + + max_new_tokens = st.sidebar.number_input( + "Max new tokens", + value=50, + min_value=1, + max_value=500, + step=1, + # on_change=reset_results + ) + + decode_mode = st.sidebar.selectbox( + "Decoding mode", + options=["Beam", "Greedy"], + index=1, + # on_change=reset_results + ) + + temperature = st.sidebar.slider( + "temperature", + min_value=0.0, + max_value=1.0, + value=0.7, + step=0.05, + # on_change=reset_results + ) + + top_p = st.sidebar.slider( + "top_p", + min_value=0.0, + max_value=1.0, + value=0.95, + step=0.05, + # on_change=reset_results + ) + + beams = st.sidebar.number_input( + "Number of beams", + value=4, + min_value=1, + max_value=4, + step=1, + # on_change=reset_results + ) + + early_stopping = st.sidebar.checkbox( + "Early stopping", + value=True, + # on_change=reset_results + ) + + st.sidebar.write("---") + + show_runtime = st.sidebar.checkbox("Show components runtime") + debug = st.sidebar.checkbox("Show debug info") + + st.sidebar.markdown( + f""" + + """, + unsafe_allow_html=True, + ) + + # Search bar + examples = "" + + with st.expander("Customize Prompt"): + prompt_template = st.text_area( + label="Prompt template", max_chars=500, value=DEFAULT_PROMPT, height=150 + ) + + question = st.text_input(label="Query", max_chars=1000, value=examples) + + # Run button + run_pressed = st.button("Run") + + run_query = ( + run_pressed or question != st.session_state.question + ) and not st.session_state.random_question_requested + + # Check the connection + with st.spinner("⌛️    fastRAG demo is starting..."): + if not haystack_is_ready(): + st.error("🚫    Connection Error. Is the fastRAG pipeline service running?") + st.error(f"Using endpoint: {API_ENDPOINT}") + run_query = False + reset_results() + + # Get results for query + if run_query and question: + reset_results() + st.session_state.question = question + + pipeline_params_dict = { + "input_prompt": {"prompt_text": prompt_template, "name": "fastrag-prompt"}, + "generation_kwargs": { + "min_new_tokens": int(min_new_tokens), + "max_new_tokens": int(max_new_tokens), + "temperature": temperature, + "top_p": top_p, + "num_beams": beams, + "early_stopping": early_stopping, + }, + } + if "Greedy" in decode_mode: + pipeline_params_dict["generation_kwargs"]["num_beams"] = 1 + pipeline_params_dict["generation_kwargs"]["do_sample"] = False + elif "Beam" in decode_mode: + pipeline_params_dict["generation_kwargs"]["do_sample"] = True + pipeline_params_dict["generation_kwargs"]["min_length"] = None + pipeline_params_dict["generation_kwargs"]["max_length"] = None + + with st.spinner("Searching through documents and generating answers ... 
\n "): + try: + st.session_state.clear() + ( + st.session_state.results, + st.session_state.raw_json, + st.session_state.images, + st.session_state.relations, + ) = query( + question, + top_k_retriever=top_k_retriever, + top_k_reranker=top_k_reranker, + pipeline_params_dict=pipeline_params_dict, + debug=True, + ) + except JSONDecodeError as je: + st.error( + "👓    An error occurred reading the results. Is the document store working?" + ) + return + except Exception as e: + logging.exception(e) + if "The server is busy processing requests" in str(e) or "503" in str(e): + st.error("🧑‍🌾    All our workers are busy! Try again later.") + else: + st.error("🐞    An error occurred during the request.") + st.error(f"{e}") + return + + if show_runtime and st.session_state.results and "timings" in st.session_state.raw_json: + display_runtime_plot(st.session_state.raw_json) + + if st.session_state.raw_json is not None: + retrieved_docs = st.session_state.raw_json["_debug"]["Reranker"]["output"]["documents"] + + if st.session_state.images or st.session_state.results: + st.write("### Response") + + if st.session_state.images: + image_markdown = "" + for image_data_index, image_data in enumerate(st.session_state.images): + image_content = image_data["image_content"] + image_text = image_data["text"] + image_markdown += f""" + + """ + + if len(image_markdown) > 0: + st.markdown(image_markdown, unsafe_allow_html=True) + + if st.session_state.results: + for count, result in enumerate(st.session_state.results): + if result["answer"]: + answer = result["answer"] + annotated_text((answer, "Answer")) + st.write("___") + st.write("#### Supporting documents") + for doc_i, doc in enumerate(retrieved_docs): + st.write( + f"**{doc['meta'].get('title')}:** {clean_markdown(doc.get('content'))}" + ) + else: + st.info( + "🤔    fastRAG is unsure whether any of the documents contain an answer to your question. Try to reformulate it!" + ) + if debug: + st.write("___") + st.subheader("REST API JSON response") + st.write(st.session_state.raw_json) + + +main() diff --git a/fastrag/ui/utils.py b/fastrag/ui/utils.py index 3ee9eee..9af4ea1 100644 --- a/fastrag/ui/utils.py +++ b/fastrag/ui/utils.py @@ -56,6 +56,7 @@ def query( diff_steps=None, full_pipeline=True, pipeline_params_dict=None, + debug=False, ) -> Tuple[List[Dict[str, Any]], Dict[str, str]]: """ Send a query to the REST API and parse the answer. @@ -79,7 +80,7 @@ def query( if pipeline_params_dict: update_params(params, pipeline_params_dict) - req = {"query": query, "params": params} + req = {"query": query, "params": params, "debug": debug} else: # reader only url = f"{API_ENDPOINT}/{READER_REQUEST}" params = { @@ -102,7 +103,6 @@ def query( # Format response results = [] - answers = response["answers"] for answer in answers: if type(answer) == list: @@ -111,15 +111,8 @@ def query( if answer.get("answer", None): results.append( { - # "context": "..." + answer["context"] + "...", "answer": answer.get("answer", None), - # "source": answer["meta"]["name"], - # "relevance": round(answer["score"] * 100, 2), - "document": answer["meta"]["content"], - # "document": [ - # doc for doc in response["documents"] if doc["id"] == answer["document_id"] - # ][0], - # "offset_start_in_doc": answer["offsets_in_document"][0]["start"], + "document": response["documents"], "_raw": answer, } ) diff --git a/models.md b/models.md index 6b2c26c..527120d 100644 --- a/models.md +++ b/models.md @@ -42,6 +42,14 @@ Hub](https://huggingface.co/Intel/ColBERT-NQ) for more details. 
of all the documents. Index can be created by the user given a collection and a checkpoint, or can be specified via a path. +> :warning: PLAID Requirements :warning: +> +> If GPU is needed it should be of type RTX 3090 or newer and PyTorch should be installed with CUDA support using: +> +>```bash +>pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 +>``` + ## Fusion-In-Decoder diff --git a/setup.cfg b/setup.cfg index 8e57f6f..d5a899a 100644 --- a/setup.cfg +++ b/setup.cfg @@ -2,15 +2,18 @@ packages = find: install_requires = - farm-haystack==1.13.2 + farm-haystack @ git+https://github.com/deepset-ai/haystack.git + transformers>=4.28.1 + datasets + evaluate pandas tqdm numba openpyxl - datasets protobuf==3.20.2 ujson - evaluate + fastapi + uvicorn [options.extras_require] dev = @@ -24,14 +27,12 @@ benchmark = kilt @ git+https://github.com/facebookresearch/KILT.git ui = - gunicorn - uvicorn - fastapi streamlit st-annotated-text - pyvis matplotlib - networkx + +elastic = + farm-haystack[elasticsearch]==1.16.1 faiss-gpu = faiss-gpu @@ -47,6 +48,8 @@ image-generation = knowledge_graph = spacy + pyvis + networkx
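As a usage note for the REST-API changes above: a hypothetical end-to-end check of the new `input_prompt` and `generation_kwargs` request parameters (handled in `fastrag/rest_api/controller/qa.py`) against a served pipeline is sketched below. The `query` endpoint name comes from the README; the URL, component name, prompt, and parameter values are assumptions for illustration only.

```python
import requests

# Hypothetical request to a pipeline served via fastrag.rest_api.application.
# "Retriever" matches the component name used in the example YAML config above;
# adjust it, the endpoint URL, and the prompt to your own deployment.
response = requests.post(
    "http://localhost:8000/query",
    json={
        "query": "What trophies has Messi won?",
        "params": {
            "Retriever": {"top_k": 10},
            "input_prompt": {
                "name": "custom-prompt",
                "prompt_text": "Answer the question using the provided context. "
                "\n\n Context: {join(documents)} \n\n Question: {query} \n\n Answer:",
            },
            "generation_kwargs": {"max_new_tokens": 50},
        },
        "debug": True,
    },
)
print(response.json()["answers"])
```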