add presentation/slides
anakin87 committed Jun 22, 2023
1 parent 28c4c1d commit dbd4f9e
Showing 2 changed files with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion README.md
@@ -19,9 +19,11 @@ license: apache-2.0
- [Fact Checking 🎸 Rocks!   ](#fact-checking--rocks---)
- [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
- [Idea](#idea)
- [Presentation](#presentation)
- [System description](#system-description)
- [Indexing pipeline](#indexing-pipeline)
- [Search pipeline](#search-pipeline)
- [Explain using a LLM](#explain-using-a-llm)
- [Limits and possible improvements](#limits-and-possible-improvements)
- [Repository structure](#repository-structure)
- [Installation](#installation)
@@ -34,10 +36,14 @@ In a nutshell, the flow is as follows:
* the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
* the entailment scores are aggregated to produce a summary score.

### Presentation

- [🍿 Video presentation @ Berlin Buzzwords 2023](https://www.youtube.com/watch?v=4L8Iw9CZNbU)
- [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)

### System description
🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open-source NLP framework for building search systems. The main components of our system are an indexing pipeline and a search pipeline.


#### Indexing pipeline
* [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): Crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [Python wrapper](https://github.com/goldsmith/Wikipedia)
* [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
@@ -56,6 +62,9 @@ In a nutshell, the flow is as follows:
* aggregate the text entailment scores: compute their weighted average, where each weight is the passage's relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral towards, or disproves the user statement.**
* *empirical consideration: if the first N passages (N<K) already give strong evidence of entailment/contradiction (partial aggregate score > 0.5), it is better not to consider the remaining (K-N) less relevant documents.*
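The aggregation step and the early-stopping heuristic can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `aggregate_entailment`, the input layout (a list of `(relevance, probabilities)` pairs sorted by decreasing relevance) and the label names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed label names for the NLI model output (hypothetical).
LABELS = ("entailment", "neutral", "contradiction")


def aggregate_entailment(
    passages: List[Tuple[float, Dict[str, float]]],
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], int]:
    """Relevance-weighted average of per-passage entailment scores.

    `passages` holds (relevance_score, {label: probability}) pairs,
    sorted from most to least relevant. Returns the aggregate scores
    and the number of passages actually used.
    """
    total_weight = 0.0
    weighted = {label: 0.0 for label in LABELS}
    aggregate = {label: 0.0 for label in LABELS}
    used = 0

    for relevance, probs in passages:
        total_weight += relevance
        for label in LABELS:
            weighted[label] += relevance * probs[label]
        used += 1

        if total_weight == 0:
            continue  # nothing to normalize by yet

        # Partial aggregate over the passages seen so far.
        aggregate = {label: weighted[label] / total_weight for label in LABELS}

        # Early stop: strong evidence in the most relevant passages,
        # so the remaining, less relevant ones are skipped.
        if max(aggregate["entailment"], aggregate["contradiction"]) > threshold:
            break

    return aggregate, used


if __name__ == "__main__":
    example = [
        (0.8, {"entailment": 0.2, "neutral": 0.7, "contradiction": 0.1}),
        (0.2, {"entailment": 0.4, "neutral": 0.5, "contradiction": 0.1}),
    ]
    print(aggregate_entailment(example))
```

With K retrieved passages, the function returns `used` = N whenever the partial aggregate crosses the threshold before all K are consumed.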

#### Explain using a LLM
* if there is entailment or contradiction, prompt `google/flan-t5-large`, asking why the relevant textual passages entail/contradict the user statement.
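A possible shape for the explanation prompt is sketched below. The prompt wording and the helper name `build_explanation_prompt` are assumptions for illustration; the README only states that `google/flan-t5-large` is asked why the passages entail or contradict the statement.

```python
from typing import List, Optional


def build_explanation_prompt(
    statement: str, passages: List[str], label: str
) -> Optional[str]:
    """Build a prompt asking an LLM to explain the verdict.

    Returns None for a neutral verdict, since an explanation is only
    requested when there is entailment or contradiction.
    The prompt template is hypothetical, not the project's own.
    """
    if label not in ("entailment", "contradiction"):
        return None

    verb = "entail" if label == "entailment" else "contradict"
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        f"Passages:\n{context}\n\n"
        f"Statement: {statement}\n\n"
        f"Explain why the passages {verb} the statement."
    )


if __name__ == "__main__":
    prompt = build_explanation_prompt(
        "Elvis Presley was born in Mississippi.",
        ["Elvis Aaron Presley was born in Tupelo, Mississippi."],
        "entailment",
    )
    # The resulting string would then be sent to google/flan-t5-large,
    # for instance through a text-to-text generation interface.
    print(prompt)
```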

### Limits and possible improvements
✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
* there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
Binary file added presentation/fact_checking_rocks.pdf
Binary file not shown.
