From dbd4f9e8095dd349125a3d1efdfcb8633b5cfc03 Mon Sep 17 00:00:00 2001
From: anakin87
Date: Thu, 22 Jun 2023 23:57:27 +0200
Subject: [PATCH] add presentation/slides

---
 README.md                            | 11 ++++++++++-
 presentation/fact_checking_rocks.pdf | Bin 0 -> 132 bytes
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 presentation/fact_checking_rocks.pdf

diff --git a/README.md b/README.md
index f64b82b..9a00069 100644
--- a/README.md
+++ b/README.md
@@ -19,9 +19,11 @@ license: apache-2.0
 - [Fact Checking 🎸 Rocks!   ](#fact-checking--rocks---)
   - [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
     - [Idea](#idea)
+    - [Presentation](#presentation)
     - [System description](#system-description)
       - [Indexing pipeline](#indexing-pipeline)
       - [Search pipeline](#search-pipeline)
+      - [Explain using a LLM](#explain-using-a-llm)
     - [Limits and possible improvements](#limits-and-possible-improvements)
     - [Repository structure](#repository-structure)
     - [Installation](#installation)
@@ -34,10 +36,14 @@ In a nutshell, the flow is as follows:
 * the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
 * the entailment scores are aggregated to produce a summary score.
 
+### Presentation
+
+- [🍿 Video presentation @ Berlin Buzzwords 2023](https://www.youtube.com/watch?v=4L8Iw9CZNbU)
+- [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)
+
 ### System description 🪄
 
 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework for building search systems. The main components of our system are an indexing pipeline and a search pipeline.
-
 #### Indexing pipeline
 * [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): Crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [python wrapper](https://github.com/goldsmith/Wikipedia)
 * [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
@@ -56,6 +62,9 @@ In a nutshell, the flow is as follows:
 * aggregate the text entailment scores: compute the weighted average of them, where the weight is the relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral about, or disproves the user statement.**
 * *empirical consideration: if in the first N passages (N<K) there is strong evidence of entailment/contradiction (partial aggregate score > 0.5), it is better not to consider the (K-N) less relevant documents.*
 
+#### Explain using a LLM
+* if there is entailment or contradiction, prompt `google/flan-t5-large`, asking why the relevant textual passages entail/contradict the user statement.
+
 ### Limits and possible improvements ✨
 As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
 * there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
diff --git a/presentation/fact_checking_rocks.pdf b/presentation/fact_checking_rocks.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..ae081b2f753a7fb00ea71f096eddca6dfe4bdf79
GIT binary patch
literal 132
zcmWN?!4bkB5CFh`s-OV_9B>Ee1~?FAR5F5jSiSCNukxOKyhK~;oQIV6zHW~?_y6sa
z_B@_)o+ZmmZ=?i?y+O4^&Mxh}ILJCS<3R3Q?)xz^9Nfkg2p~h+hV2I%CMRD2

literal 0
HcmV?d00001
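For illustration only, a minimal sketch of the aggregation step the patched README describes (a weighted average of per-passage entailment scores, weighted by retrieval relevance). The function name, the score keys, and the 0.5 threshold are assumptions for this sketch, not code taken from the repository:

```python
from typing import Dict, List


def aggregate_entailment_scores(passages: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-passage NLI scores into summary scores.

    Each passage dict is assumed to carry a retrieval `relevance` score and
    NLI probabilities for `entailment`, `contradiction` and `neutral`.
    """
    total_relevance = sum(p["relevance"] for p in passages)
    if total_relevance == 0:
        raise ValueError("no relevant passages to aggregate")

    # Weighted average: passages the retriever deems more relevant
    # contribute more to the final verdict.
    return {
        label: sum(p[label] * p["relevance"] for p in passages) / total_relevance
        for label in ("entailment", "contradiction", "neutral")
    }


if __name__ == "__main__":
    passages = [
        {"relevance": 0.9, "entailment": 0.8, "contradiction": 0.1, "neutral": 0.1},
        {"relevance": 0.4, "entailment": 0.3, "contradiction": 0.2, "neutral": 0.5},
    ]
    scores = aggregate_entailment_scores(passages)
    # If one aggregate probability clearly dominates (e.g. > 0.5), the knowledge
    # base confirms or disproves the statement; otherwise it stays neutral.
    print(scores)
```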