add presentation/slides
- README.md +10 -1
- presentation/fact_checking_rocks.pdf +3 -0
README.md
CHANGED
@@ -19,9 +19,11 @@ license: apache-2.0
|
|
19 |
- [Fact Checking 🎸 Rocks!](#fact-checking--rocks---)
|
20 |
- [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
|
21 |
- [Idea](#idea)
|
|
|
22 |
- [System description](#system-description)
|
23 |
- [Indexing pipeline](#indexing-pipeline)
|
24 |
- [Search pipeline](#search-pipeline)
|
|
|
25 |
- [Limits and possible improvements](#limits-and-possible-improvements)
|
26 |
- [Repository structure](#repository-structure)
|
27 |
- [Installation](#installation)
|
@@ -34,10 +36,14 @@ In a nutshell, the flow is as follows:
|
|
34 |
* the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
|
35 |
* the entailment scores are aggregated to produce a summary score.
|
36 |
|
|
|
|
|
|
|
|
|
|
|
37 |
### System description
|
38 |
🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open-source NLP framework for building search systems. The main components of our system are an indexing pipeline and a search pipeline.
|
39 |
|
40 |
-
|
41 |
#### Indexing pipeline
|
42 |
* [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [Python wrapper](https://github.com/goldsmith/Wikipedia)
|
43 |
* [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
|
@@ -56,6 +62,9 @@ In a nutshell, the flow is as follows:
|
|
56 |
* aggregate the text entailment scores: compute their weighted average, using each passage's relevance score as its weight. **Now it is possible to tell whether the knowledge base confirms, is neutral toward, or disproves the user statement.**
|
57 |
* *empirical consideration: if the first N passages (N<K) already show strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider the remaining (K-N) less relevant documents.*
|
58 |
|
|
|
|
|
|
|
59 |
### Limits and possible improvements
|
60 |
✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
|
61 |
* there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
|
|
|
19 |
- [Fact Checking 🎸 Rocks!](#fact-checking--rocks---)
|
20 |
- [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
|
21 |
- [Idea](#idea)
|
22 |
+
- [Presentation](#presentation)
|
23 |
- [System description](#system-description)
|
24 |
- [Indexing pipeline](#indexing-pipeline)
|
25 |
- [Search pipeline](#search-pipeline)
|
26 |
+
- [Explain using a LLM](#explain-using-a-llm)
|
27 |
- [Limits and possible improvements](#limits-and-possible-improvements)
|
28 |
- [Repository structure](#repository-structure)
|
29 |
- [Installation](#installation)
|
|
|
36 |
* the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
|
37 |
* the entailment scores are aggregated to produce a summary score.
|
38 |
|
39 |
+
### Presentation
|
40 |
+
|
41 |
+
- [🍿 Video presentation @ Berlin Buzzwords 2023](https://www.youtube.com/watch?v=4L8Iw9CZNbU)
|
42 |
+
- [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)
|
43 |
+
|
44 |
### System description
|
45 |
🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open-source NLP framework for building search systems. The main components of our system are an indexing pipeline and a search pipeline.
|
46 |
|
|
|
47 |
#### Indexing pipeline
|
48 |
* [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [Python wrapper](https://github.com/goldsmith/Wikipedia)
|
49 |
* [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
|
|
|
62 |
* aggregate the text entailment scores: compute their weighted average, using each passage's relevance score as its weight. **Now it is possible to tell whether the knowledge base confirms, is neutral toward, or disproves the user statement.**
|
63 |
* *empirical consideration: if the first N passages (N<K) already show strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider the remaining (K-N) less relevant documents.*
|
64 |
|
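The aggregation step and the empirical early-stop rule described in the search pipeline can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `aggregate_entailment`, the `LABELS` tuple, the input shape (a list of `(relevance, scores)` pairs sorted by decreasing relevance) and the reading of "partial aggregate scores > 0.5" as "the running entailment or contradiction average exceeds the threshold" are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical label names; the real NLI model may use different ones.
LABELS = ("entailment", "neutral", "contradiction")


def aggregate_entailment(
    passages: List[Tuple[float, Dict[str, float]]],
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], int]:
    """Relevance-weighted average of per-passage entailment scores.

    `passages` is assumed to be sorted by decreasing relevance.
    Returns the aggregate scores and the number of passages actually used.
    """
    weighted = {label: 0.0 for label in LABELS}
    aggregate = dict(weighted)
    total_weight = 0.0
    used = 0

    for relevance, scores in passages:
        total_weight += relevance
        for label in LABELS:
            weighted[label] += relevance * scores[label]
        used += 1

        if total_weight > 0:
            aggregate = {label: weighted[label] / total_weight for label in LABELS}

        # Early stop (assumed interpretation): strong partial evidence means
        # the remaining, less relevant passages are skipped.
        if max(aggregate["entailment"], aggregate["contradiction"]) > threshold:
            break

    return aggregate, used


if __name__ == "__main__":
    retrieved = [
        (0.9, {"entailment": 0.8, "neutral": 0.1, "contradiction": 0.1}),
        (0.5, {"entailment": 0.1, "neutral": 0.2, "contradiction": 0.7}),
    ]
    print(aggregate_entailment(retrieved))
```

Stopping early keeps weakly relevant passages from diluting a clear signal found in the top results, which is the motivation the README gives for the rule.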
65 |
+
#### Explain using a LLM
|
66 |
+
* if there is entailment or contradiction, prompt `google/flan-t5-large`, asking why the relevant textual passages entail/contradict the user statement.
|
67 |
+
|
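The explanation step amounts to building a prompt from the statement, the relevant passages and the aggregate verdict. The sketch below shows only that prompt construction. The function name `build_explanation_prompt` and the prompt wording are hypothetical; the README does not state the prompt the project actually sends to `google/flan-t5-large`.

```python
from typing import List, Optional


def build_explanation_prompt(
    statement: str, passages: List[str], label: str
) -> Optional[str]:
    """Build an explanation prompt, or return None for a neutral verdict.

    The README only asks for an explanation when there is entailment or
    contradiction, so any other label yields no prompt.
    """
    if label not in ("entailment", "contradiction"):
        return None

    verb = "entail" if label == "entailment" else "contradict"
    context = "\n".join(f"- {passage}" for passage in passages)

    # Hypothetical wording: the real prompt may differ.
    return (
        f"Passages:\n{context}\n\n"
        f"Statement: {statement}\n\n"
        f"Explain why the passages {verb} the statement."
    )


if __name__ == "__main__":
    print(
        build_explanation_prompt(
            "Elvis Presley was born in 1935.",
            ["Elvis Aaron Presley (January 8, 1935 - August 16, 1977) was an American singer."],
            "entailment",
        )
    )
```

The resulting string could then be passed to the model, for example through a Hugging Face `text2text-generation` pipeline; that call is left out here so the sketch runs without downloading model weights.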
68 |
### Limits and possible improvements
|
69 |
β¨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
|
70 |
* there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
|
presentation/fact_checking_rocks.pdf
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:de44af8827f3f36648726176d51b09a009528b9168dd0cdef9c4a687ad62247f
|
3 |
+
size 2737149
|