---
title: "AI uses for fact-checking in Brazil"
author: "Adriano Belisario"
format:
  revealjs:
    theme: sky
    incremental: true
    slide-number: true
    chalkboard:
      buttons: false
    preview-links: auto
    css: styles.css
---
## Overview
- About me
- Language models for fact-checking firearm violence reports
- Claim-matching and cross-platform misinformation
- Considerations on AI uses for fact-checking
::: footer
Link for this presentation: [belisards.github.io/slides-bert-fact-checking/](https://belisards.github.io/slides-bert-fact-checking/)
:::
## Hi!
Data journalist from Brazil, experienced in human rights investigations.
Since 2013, I have collaborated on projects with several Brazilian and international organizations, such as Witness, UNHCR, Forensic Architecture, Open Knowledge, and Bellingcat.
::: {.fragment .fade-in}
Here, I will present two papers based on my research at the Oxford Internet Institute.
:::
## Fogo Cruzado
[**Instituto Fogo Cruzado**](https://fogocruzado.org.br/) (Crossfire Institute) has monitored and collected data on firearm violence events in Rio de Janeiro since 2017. It currently operates in four cities.
Fogo Cruzado's human rights analysts follow online and on-the-ground sources seven days a week.
::: footer
Fogo Cruzado's website: [https://fogocruzado.org.br/](https://fogocruzado.org.br/)
:::
## Fact-checking gun violence
Human rights analysts follow multiple online sources simultaneously.
::: {.fragment .fade-in}
If they identify a gun violence report on Twitter, they ask the users who posted it follow-up questions.
:::
::: {.fragment .fade-in}
Then they fact-check every report and, if it is confirmed, record an event in their database and issue a real-time alert.
:::
## The needle in the haystack
To keep the message flow manageable, analysts used restrictive search filters (e.g., geographic rather than linguistic filters).
However, fewer than half of tweets are geotagged, so many reports may have been missed.
## The needle in the haystack
Even so, finding a gun violence report among ordinary text is difficult. Analysts must balance the need to verify cases against keeping up with an unstoppable flow of new posts.
##
![](img/tweetdeck.jpeg)
## Goals
- Train an open-source language model on analysts' interactions to detect gun violence reports written in colloquial Brazilian Portuguese.
- Improve the signal-to-noise ratio for analysts monitoring firearm violence.
- Design an experiment to assess the model's performance in a real-world setting.
## Methods
I fine-tuned a Transformer neural network (the T in ChatGPT) using an open-source base model for Portuguese (BERTimbau) and ~24k tweets.
In addition to standard evaluation on holdout datasets, I integrated the model into a prototype that provides real-time predictions and evaluated it under real-world conditions in an experiment.
::: footer
[https://huggingface.co/neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
:::
## Example
![](img/examples.png)
## The prototype
1. Fetch new tweets using keywords and/or location and language filters.
2. The model classifies each message as either a gun violence report or ordinary text.
3. The text message and all metadata available - including the predictions - are uploaded to Airtable.
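
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: `Tweet`, `keyword_classifier`, and `run_pipeline` are hypothetical names, the keyword rule merely stands in for the fine-tuned model, and the `upload` callback stands in for the Airtable upload (no real Twitter or Airtable API is called).

```python
from dataclasses import dataclass, asdict
from typing import Callable, Iterable


@dataclass
class Tweet:
    # Hypothetical record; the real prototype stores all available metadata.
    tweet_id: str
    text: str
    lang: str


def keyword_classifier(text: str) -> int:
    """Stand-in for the fine-tuned model: 1 = gun violence report, 0 = ordinary text."""
    keywords = ("tiro", "tiroteio", "disparo")  # illustrative keywords only
    return int(any(k in text.lower() for k in keywords))


def run_pipeline(
    tweets: Iterable[Tweet],
    classify: Callable[[str], int],
    upload: Callable[[list], None],
    lang: str = "pt",
) -> list:
    rows = []
    for tweet in tweets:
        # Step 1: keep only tweets that pass the language filter.
        if tweet.lang != lang:
            continue
        # Step 2: attach the binary prediction to the tweet's metadata.
        row = asdict(tweet)
        row["prediction"] = classify(tweet.text)
        rows.append(row)
    # Step 3: hand text, metadata, and predictions to the storage backend.
    upload(rows)
    return rows


# Made-up sample messages, for illustration only.
sample = [
    Tweet("1", "Muito tiro aqui na rua agora", "pt"),
    Tweet("2", "Bom dia, Rio!", "pt"),
    Tweet("3", "Shots fired downtown", "en"),
]
uploaded = []  # in-memory list standing in for the Airtable table
rows = run_pipeline(sample, keyword_classifier, uploaded.extend)
```

Passing the classifier and the uploader as callbacks keeps the loop independent of any particular model or storage service, so either can be swapped without touching the pipeline.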
## {background-image="img/airtable.png"}
## The experiment
I conducted an experiment to evaluate the model's performance in the wild: for one month, analysts in Rio de Janeiro used the AI-based classification to assist their online search, whereas other regions continued to rely on fully manual review.
The experiment was evaluated through surveys, interviews, and the number of Twitter interactions by region (difference-in-differences analysis controlling for external factors).
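
As a rough illustration of the difference-in-differences idea, the sketch below compares the change in the treated region with the change in a control region. The numbers are made up, `diff_in_diff` is a hypothetical helper, and this basic version does not include the controls for external factors used in the actual analysis.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change approximates what would have happened
    # to the treated group without the intervention.
    return treated_change - control_change


# Made-up weekly interaction counts, for illustration only.
estimate = diff_in_diff(
    treated_pre=[38, 42],    # treated region, before the AI tool
    treated_post=[53, 57],   # treated region, during the experiment
    control_pre=[29, 31],    # other regions, same weeks as treated_pre
    control_post=[34, 36],   # other regions, same weeks as treated_post
)
```

With these invented numbers, the treated region rises by 15 interactions and the control regions by 5, so the estimate attributes 10 additional interactions per week to the intervention.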
## Classification metrics
![](img/results.png)
## Results
- The model performed well on all classification metrics (scores of 80-90%) when tested on holdout datasets.
- Analysts approved the prototype and adopted it as part of their standard workflow to monitor new cases.
- The experiment showed that using AI increased analysts' interactions with users reporting armed violence.
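
For readers unfamiliar with classification metrics, here is a minimal sketch of how common ones are computed for a binary classifier on a holdout set. The labels and predictions are invented, `binary_metrics` is a hypothetical helper, and the choice of metrics here is an assumption rather than a restatement of the ones reported in the paper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # reports correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ordinary text flagged as report
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # reports that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ordinary text correctly ignored

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented holdout labels (1 = gun violence report) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this toy example, the classifier finds three of the four reports and raises one false alarm, so all four metrics come out at 0.75.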
## Cross-platform misinformation
Analyzing Misinformation Claims During the 2022 Brazilian General Election on WhatsApp, Twitter, and Kwai: [arxiv.org/abs/2401.02395](https://arxiv.org/abs/2401.02395)
::: {.fragment .fade-in}
We partnered with Brazil's election authority and three fact-checking organizations running WhatsApp tiplines.
We compared the content submitted by users with posts from Twitter and Kwai.
:::
## Results
We observed little overlap between the users of different fact-checking tiplines and a high correlation between the number of users and the amount of unique content, suggesting that WhatsApp tiplines are far from reaching a saturation point.
::: {.fragment .fade-in}
Similarly, we found little overlap in content across platforms, which points to the need for further cross-platform research on misinformation dynamics.
:::
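
One simple way to quantify the overlap between two sources is the Jaccard index over normalized texts, sketched below. This is an assumption made for illustration: exact matching after normalization is much cruder than claim-matching, the helper names are hypothetical, and the example claims are invented.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    without_punct = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", without_punct).strip()


def jaccard_overlap(source_a, source_b) -> float:
    """Share of unique normalized items that appear in both sources."""
    set_a = {normalize(t) for t in source_a}
    set_b = {normalize(t) for t in source_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Invented placeholder claims, for illustration only.
tipline = [
    "Claim about voting machines!",
    "Claim about a postponed election",
    "Claim about a new tax",
]
twitter = [
    "claim about   voting machines",
    "Claim about a candidate's health",
]
overlap = jaccard_overlap(tipline, twitter)
```

Here only one of four unique claims appears in both sources, giving an overlap of 0.25. A value near zero would correspond to the "little overlap" pattern described above.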
## Final considerations
- Language gap in AI
- Cross-platform misinformation
- Multimodal disinformation
## References
Into the crossfire: evaluating the use of a language model to crowdsource gun violence reports: [arxiv.org/abs/2401.12989](https://arxiv.org/abs/2401.12989)
Access this presentation online: [belisards.github.io/slides-bert-fact-checking/](https://belisards.github.io/slides-bert-fact-checking/)
Analyzing Misinformation Claims During the 2022 Brazilian General Election on WhatsApp, Twitter, and Kwai: [arxiv.org/abs/2401.02395](https://arxiv.org/abs/2401.02395)
## Contact me
I'm glad to hear from you!
Email: adrianobf (at) gmail (dot) com
Twitter: [@belisards](https://twitter.com/belisards)
LinkedIn: [Adriano Belisario](https://www.linkedin.com/in/adriano-belisario-76758632/)