An Extensive Paper List (and Various Resources) on NLP for Social Good

This is a reading list of papers on NLP for Social Good.

Contributor: Zhijing Jin. If you want to contribute to this reading list, feel free to open a pull request, or open an issue to tell me that you'd like to become a contributor to this GitHub repo.

Contents (Actively Updating)

(Hyperlinks work in Chrome, Firefox, etc., and in recent versions of Safari.)

Meta-Info
- Events and News
- Overview Papers
  - Proactive NLP, Patching NLP's intrinsic problems
Q1: Can we use NLP to save lives?
Q2: Can we use NLP to improve lives?
Q3: Can we use NLP to help the common future of all humans?
Q4: Can we use NLP to help make all people equal?
- 4.1 NLP for all languages
- 4.2 NLP for gender/demographical equality
  - NLP to detect bias, NLP to detect bias specifically on social media
Q5: Are there concerns over the practice of NLP? Can we mitigate this?
More reading (for Systematic learning)
- Courses
Engagement from Non-Academic Areas
- Non-Profit Movements
Resources of (general) AI for social good
Acknowledgements

Meta-Info

Events and News

[Workshop] Annual Workshop on NLP for Positive Impact (EMNLP 2022, ACL 2021) [Website] Organized by Anjalie Field, Shrimai Prabhumoye, Maarten Sap, Zhijing Jin, Jieyu Zhao, Chris Brockett.
"NLP for Social Good Initiative" runs many events related to NLP4SG, and it also has a set of resources & networks related to NLP4SG. (2021 - now) [Website]
This website includes a visualization of all NLP4SG papers, the Twitter account @nlp4sg, and many other resources.
AAAI has the Special Track on AI for Social Impact [Call for Papers by Sept 8, 2021]
Every year, there is an AI for Social Good Workshop by Google to pair researchers with NGOs [Workshop]
VIS 2021 has a workshop on Visualization for Social Good [Website] [2021 Video]
Ethics in NLP - ACL Wiki. [Website]
ACL 2021 has the new theme track "NLP for Social Good."

Overview Papers

(2021 ACL Findings) How Good Is NLP? A Sober Look at NLP Tasks through the Lens of Social Impact. Zhijing Jin, Geeticka Chauhan, Brian Tse, Mrinmaya Sachan, Rada Mihalcea. [pdf]
(2020 ACL) Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?. Kobi Leins, Jey Han Lau, Timothy Baldwin. [pdf]

Videos

(2021 -- now) Videos by senior researchers sharing their views on NLP for Social Good. [YouTube]

Overview of proactive NLP to help social good

Applying NLP to tasks that promote social good, e.g., NLP for poverty, NLP for education, NLP for healthcare, etc.

(1997, Language in Society; Linguistics should make contributions in return to society) Unequal partnership: Sociolinguistics and the African American speech community. John Russell Rickford. [pdf]
(1980 Daedalus; High-level discussion of the function of technology) Do artifacts have politics?. Langdon Winner. [pdf]

[Summary] Provocative claim that technology can be accurately judged by (1) its contributions to efficiency, (2) its environmental side effects, and (3) its political qualities (i.e., the ways it embodies specific forms of power and authority). E.g., nuclear power leads to authoritarianism. CORE: technology matters through the social and economic system in which it is embedded.

Overview of methods to patch intrinsic problems with NLP research (side effects)

Defending against or fixing issues that lie within (or closely co-occur with, or are caused by) NLP technologies, e.g., data privacy, mitigating bias in NLP models, green NLP, etc.

(2016 ACL) The Social Impact of Natural Language Processing. Dirk Hovy and Shannon L. Spruit. [pdf]

[Summary] Introduced issues including (1) exclusion, (2) overgeneralization, and (3) exposure (topic overexposure inducing bias through the availability heuristic, and underexposure). Also introduced the concept of "dual use" of NLP models.
(2017, NLP Ethics workshop; Ethical issues with shared tasks) Ethical Considerations in NLP Shared Tasks. Carla Parra Escartín, Wessel Reijers, Teresa Lynn, Joss Moorkens, Andy Way, Chao-Hong Liu. [pdf]
(2017 NLP Ethics workshop) Gender as a variable in natural-language processing: Ethical considerations. Brian N. Larson.

[Summary] Four guidelines: (1) formulate research questions making explicit theories of what “gender” is; (2) avoid modeling gender unless very relevant; (3) make explicit methods for assigning gender categories; and (4) respect the difficulties of gender classification.
(2018 EMNLP; Women researchers' glass ceiling) The glass ceiling in NLP. Natalie Schluter. [pdf]
(2020 AACL-IJCNLP SRW) Societal impacts of NLP: How and when to integrate them into your research (and how to make time for that). Emily M. Bender. [slides]

Q1: Can we use NLP to save lives?

1.1 NLP for healthcare (with EHRs)

Reviews

(2020 AMIA) A Review of Challenges and Opportunities in Machine Learning for Health. Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew Beam, Irene Chen, Rajesh Ranganath. [pdf]

[Summary] E.g., Understanding causality is key
(2020 arXiv) Ethical Machine Learning in Health Care. Irene Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, Marzyeh Ghassemi. [pdf]

NLP on clinical notes

(2021 PSB Oral, Intimate Partner Violence prediction) Intimate Partner Violence and Injury Prediction From Radiology Reports. Irene Y. Chen, Emily Alsentzer, Hyesun Park, Richard Thomas, Babina Gosangi, Rahul Gujrathi, Bharti Khurana. [pdf]
(2020, Machine Learning for Healthcare) Fast, Structured Clinical Documentation via Contextual Autocomplete. Divya Gopinath, Monica Agrawal, Luke Murray, Steven Horng, David Karger, David Sontag. [pdf]
(2020 AAHPM; NER for heart disease patients) An Artificial Intelligence Algorithm to Identify Documented Symptoms in Patients with Heart Failure who Received Cardiac Resynchronization Therapy. Richard Leiter, Enrico Santus, Zhijing Jin, Katherine Lee, Miryam Yusufov, Isabel Chien, Ashwin Ramaswamy, Edward Moseley, Yujie Qian, Deborah Schrag, Charlotta Lindvall. [abstract] [paper]
(2020 MLH) UPSTAGE: Unsupervised Context Augmentation for Utterance Classification in Patient-Provider Communication. Do June Min, Veronica Perez-Rosas, Shihchen Kuo, William H. Herman, Rada Mihalcea. [pdf]
(2020 IAAI) A System for Medical Information Extraction and Verification from Unstructured Text. Damir Juric, Giorgos Stoilos, Andre Melo, Jonathan Moore, Mohammad Khodadadi. [pdf]
(2020 Front. Cell Dev. Biol.) Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Nadeesha Perera, Matthias Dehmer, Frank Emmert-Streib. [pdf]
(2020 Yearb Med Inform) Medical Information Extraction in the Age of Deep Learning. Udo Hahn, Michel Oleynik. [pdf]
(2020) Information Extraction from Unstructured Electronic Health Records and Integration into a Data Warehouse. Georg Fette, Maximilian Ertl, Anja Wörner, Peter Kluegl, Stefan Störk, Frank Puppe. [pdf]
(2019 arXiv) Clinical Concept Extraction: A Methodology Review. Sunyang Fu, David Chen, Huan He, Sijia Liu, Sungrim Moon, Kevin J Peterson, Feichen Shen, Liwei Wang, Yanshan Wang, Andrew Wen, Yiqing Zhao, Sunghwan Sohn, Hongfang Liu. [pdf]
(2019 Journal of the American Medical Informatics Association) Detecting conversation topics in primary care office visits from transcripts of patient-provider interactions. Jihyun Park, Dimitrios Kotzias, Patty Kuo, Robert L Logan IV, Kritzia Merced, Sameer Singh, Michael Tanana, Efi Karra Taniskidou, Jennifer Elston Lafata, David C Atkins, Ming Tai-Seale, Zac E Imel, and Padhraic Smyth. [pdf]
(2018) Clinical information extraction applications: A literature review. Yanshan Wang, Liwei Wang, Majid Rastegar-Mojarad, Sungrim Moon, Feichen Shen, Naveed Afzal, Sijia Liu, Yuqun Zeng, Saeed Mehrabi, Sunghwan Sohn, Hongfang Liu. [pdf]
(2018 JAMIA) Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes. Yuan Luo, Yu Cheng, Özlem Uzuner, Peter Szolovits, Justin Starren. [pdf]
(2018 Artificial Intelligence in Health) Identification of Serious Illness Conversations in Unstructured Clinical Notes Using Deep Neural Networks. Isabel Chien, Alvin Shi, Alex Chan, Charlotta Lindvall. [pdf]
(2017 JAMIA) De-identification of patient notes with recurrent neural networks. Franck Dernoncourt, Ji Young Lee, Ozlem Uzuner, Peter Szolovits. [pdf]
(2015 Journal of Information Technology Research) Information Extraction in the Medical Domain. Ghalem Belalem, Fatiha Barigou, Aicha Ghoulam. [pdf]
(2013 Diabetes Care) Rationale and design of the glycemia reduction approaches in diabetes: A comparative effectiveness study (GRADE). David M. Nathan, John B. Buse, Steven E. Kahn, Heidi Krause-Steinrauf, Mary E. Larkin, Myrlene Staten, Deborah Wexler, John M. Lachin, and the GRADE research group. [pdf]

NLP to facilitate biomedical research

(2020 arXiv) CORD-19: The COVID-19 Open Research Dataset. Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Doug Burdick, Darrin Eide, Kathryn Funk, Yannis Katsis, Rodney Kinney, Yunyao Li, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex Wade, Kuansan Wang, Nancy Xin Ru Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, Sebastian Kohlmeier. [pdf]
(2018 LREC) BioRead: A New Dataset for Biomedical Reading Comprehension. Dimitris Pappas, Ion Androutsopoulos, Haris Papageorgiou. [pdf]

NLP to help reduce bias in Healthcare

(2019, AMA Journal of Ethics) Can AI Help Reduce Disparities in General Medical and Mental Health Care?. Irene Y. Chen, Peter Szolovits, and Marzyeh Ghassemi. [pdf]

[Summary] There is bias w.r.t. gender, insurance type, etc.

1.2 NLP for disaster response

Resources and Datasets

(2021 ICWSM) CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing. Firoj Alam, Hassan Sajjad, Muhammad Imran and Ferda Ofli. [pdf]
(2021 ICWSM) HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks. Firoj Alam, Umair Qazi, Muhammad Imran and Ferda Ofli. [pdf]
(2018 ICWSM) CrisisMMD: Multimodal Twitter Datasets from Seven Natural Disasters. Firoj Alam, Ferda Ofli and Muhammad Imran. [pdf]

Models

(2020 IEEE) The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic. Viet Duong, Phu Pham, Tongyu Yang, Yu Wang, Jiebo Luo. [pdf]
(2018 ACL) Domain Adaptation with Adversarial Training and Graph Embeddings. Firoj Alam, Shafiq Joty, Muhammad Imran [pdf]
(2019 ACM) Keyphrase Extraction from Disaster-related Tweets. Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea. [pdf]

1.3 NLP to detect armed conflicts

Resources and Datasets

The Armed Conflict Location & Event Data Project. [link]
(2016 ACL) IBC-C: A Dataset for Armed Conflict Event Analysis. Andrej Žukov-Gregorič, Zhiyuan Luo, Bartal Veyhe. [pdf]

Models

(2019 CSS Book Chapter) Text as Data for Conflict Research: A Literature Survey. Seraphine F. Maerz, Cornelius Puschmann. [pdf]
(Talk@Google) Towards an NLP Pipeline for Conflict Narrative Detection. Stephen Anning, George Konstantinidis, Craig Webber. [link]
(2019 ACL Workshop) One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts. Andrey Kutuzov, Erik Velldal, Lilja Øvrelid. [pdf]
(2017 ACL Workshop) Tracing armed conflicts with diachronic word embedding models. Andrey Kutuzov, Erik Velldal, Lilja Øvrelid. [pdf]
(2012, International Interactions) Event Data on Armed Conflict and Security: New Perspectives, Old Challenges, and Some Solutions. Sven Chojnacki, Christian Ickler, Michael Spies, John Wiesel. [pdf]

Q2: Can we use NLP to improve lives?

2.1 NLP for Education

(2023 EdArXiv) ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education. Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, Gjergji Kasneci. [pdf]
(2021 EACL Workshop) 16th Workshop on Innovative Use of NLP for Building Educational Applications [website]

Improving textbooks

(2010 ACM Symposium) Enriching Textbooks Through Data Mining. Rakesh Agrawal, Sreenivas Gollapudi, Krishnaram Kenthapadi, Nitish Srivastava, Raja Velu. [pdf]

Automatic grading

(2016 EMNLP) A Neural Approach to Automated Essay Scoring. Kaveh Taghipour, Hwee Tou Ng. [pdf]
(2018 ICCL) Automated Scoring: Beyond Natural Language Processing. Nitin Madnani, Aoife Cahill. [pdf]

Plagiarism detection

(2010 IPC) Using Natural Language Processing for Automatic Detection of Plagiarism. Miranda Chong, Lucia Specia, Ruslan Mitkov. [pdf] [poster]

Educational Question Answering

(2015 AIED) Educational Question Answering Motivated by Question-Specific Concept Maps. Thushari Atapattu, Katrina Falkner, Nickolas Falkner. [pdf]
(2016 IEEE) Question answering system on education acts using NLP techniques. Sweta P Lende and MM Raghuwanshi. [paper]

Reading/writing assistants

(e.g. writing assistants for Microsoft Word or Google Docs)

(2019 EMNLP) Modeling the Relationship between User Comments and Edits in Document Revision. Xuchao Zhang, Dheeraj Rajagopal, Michael Gamon, Sujay Kumar Jauhar, ChangTien Lu. [pdf]
(2020 CSCW) Characterizing Stage-Aware Writing Assistance in Collaborative Document Authoring. Bahareh Sarrafzadeh, Sujay Kumar Jauhar, Michael Gamon, Edward Lank, Ryen White. [pdf]

First, second (and subsequent) language learning

(2012, The encyclopedia of applied linguistics) Natural Language Processing and Language Learning. Detmar Meurers. [pdf]

Educational data mining from student data logs

(2010 ICEE) Data Mining and Student e-Learning Profiles. Mingming Zhou. [pdf]

Multimodal student-computer interaction

(2020 ICBL) A Multimodal Human-Computer Interaction System and Its Application in Smart Learning Environments. Jiyou Jia, Yunfan He, Huixiao Le. [paper]

Potential new directions to pursue

NLP for career path counseling
Tools for learners with disabilities
NLP for compiling articles found through Google search into youth-friendly teaching materials
Student personalization and engagement: assessment of learners’ language and cognitive skill levels and systems that detect and adapt to learners’ cognitive or emotional states

2.2 NLP for mental health

Psychotherapy and counseling

(2020 ACL) What Makes a Good Counselor? Learning to Distinguish between High-quality and Low-quality Counseling Conversations. Verónica Pérez-Rosas, Xinyi Wu, Kenneth Resnicow, Rada Mihalcea. [pdf]
(2020 LREC) Inferring Social Media Users’ Mental Health Status from Multimodal Information. Zhentao Xu, Veronica Pérez-Rosas, Rada Mihalcea. [pdf]
(2020 EMNLP Workshop) Quantifying the Effects of COVID-19 on Mental Health Support Forums. Laura Biester, Katie Matton, Janarthanan Rajendran, Emily Mower Provost, Rada Mihalcea. [pdf]
(2020 arXiv) Expressive Interviewing: A Conversational System for Coping with COVID-19. Charles Welch, Allison Lahnala, Verónica Pérez-Rosas, Siqi Shen, Sarah Seraj, Larry An, Kenneth Resnicow, James Pennebaker, Rada Mihalcea. [pdf]
(2017 ACL) Understanding and Predicting Empathic Behavior in Counseling Therapy. Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An. [pdf]
(2017 EACL) Predicting Counselor Behaviors in Motivational Interviewing Encounters. Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, Delwyn Catley. [pdf]
(2016 TACL) Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Tim Althoff, Kevin Clark, Jure Leskovec. [pdf] [slides@Stanford] [video]
(2016 ACL Workshop) Building a Motivational Interviewing Dataset. Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An. [pdf]
(2014 Psychol Addict Behav) Sustain talk predicts poorer outcomes among mandated college student drinkers receiving a brief motivational intervention. Timothy R Apodaca, Brian Borsari, Kristina M Jackson, Molly Magill, Richard Longabaugh, Nadine R Mastroleo, Nancy P Barnett. [pdf]

NLP for happiness

(2019 ACII) Happiness Entailment: Automating Suggestions for Well-Being. Sara Evensen, Yoshihiko Suhara, Alon Halevy, Vivian Li, Wang-Chiew Tan, Saran Mumick. [pdf]
(2018 LREC) HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments. Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu. [pdf]
(2019, AffCon@AAAI) Ingredients for Happiness: Modeling constructs via semi-supervised content driven inductive transfer. Bakhtiyar Syed, Vijayasaradhi Indurthi, Kulin Shah, Manish Gupta, Vasudeva Varma. [pdf]

Mental health on social media (e.g., hate speech, hope speech, counter speech)

(2020 COLING Workshop) HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion. Bharathi Raja Chakravarthi. [pdf]

[Summary] "Hope speech" is text that is encouraging, positive and supportive, as opposed to hate speech.

Fermi at SemEval-2019 Task 5: Using Sentence Embeddings to identify Hate Speech against Immigrants and Women on Twitter. Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Shrivastava, Nikhil Chakravartula, Manish Gupta, Vasudeva Varma. [pdf]
Fermi at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media using Sentence Embeddings. Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Shrivastava, Manish Gupta, Vasudeva Varma. [pdf]
(2019 ACL) CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech. Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroğlu, and Marco Guerini. [pdf]
(2021 ACL Findings) Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech. Yi-Ling Chung, Serra Sinem Tekiroğlu, and Marco Guerini. [pdf]
(2021 Journal of Online Social Networks and Media) Empowering NGOs in Countering Online Hate Messages. Yi-Ling Chung, Serra Sinem Tekiroğlu, Sara Tonelli, and Marco Guerini [pdf]

Mental health through other text forms

(2020 SocInfo) Women worry about family, men about the economy: Gender differences in emotional responses to COVID-19. Isabelle van der Vegt, Bennett Kleinberg. [pdf]

Workshops and Resources

(Every year) CLPsych: Computational Linguistics and Clinical Psychology Workshop [website]
Research Fellowship Program (3-6 months) [FAQ]

2.3 NLP for Political Decision-Making

(2017 ACL Workshop) 200K+ Crowdsourced Political Arguments for a New Chilean Constitution. Constanza Fierro, Claudio Fuentes, Jorge Pérez, Mauricio Quezada. [pdf]
(2021 EMNLP Findings) Mining the Cause of Political Decision-Making from Social Media: A Case Study of COVID-19 Policies across the US States. Zhijing Jin, Zeyu Peng, Tejas Vaidhya, Bernhard Schoelkopf, Rada Mihalcea. [pdf]
(2022 Book Chapter) Natural Language Processing for Policymaking. Zhijing Jin, Rada Mihalcea. [pdf]
(2020 EMNLP Findings) Learning to Classify Events from Human Needs Category Descriptions. Haibo Ding, Zhe Feng. [pdf]

Q3: Can we use NLP to help the common future of all humans?

3.1 NLP for climate change

(Climate Change AI movement; see Table 1's column titled "NLP" for how NLP can help) Tackling Climate Change with Machine Learning. David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio. [pdf]
(2021 NAACL) Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism. Shraey Bhatia, Jey Han Lau, Timothy Baldwin. [pdf]
(2021 SSRN) Cheap Talk and Cherry-Picking: What ClimateBert has to say on Corporate Climate Risk Disclosures. Julia Anna Bingler, Mathias Kraus, Markus Leippold. [pdf]
(2021 SSRN) Ask BERT: How Regulatory Disclosure of Transition and Physical Climate Risks affects the CDS Term Structure. Julian F Kölbel, Markus Leippold, Jordy Rillaerts, Qian Wang. [pdf]
(2021 ACL Workshop) The Climate Change Debate and Natural Language Processing. Manfred Stede, Ronny Patz. [pdf]
(2020 EMNLP Findings) Detecting Stance in Media on Global Warming. Yiwei Luo, Dallas Card, Dan Jurafsky. [pdf]
(2020 NeurIPS Workshop) CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, Markus Leippold. [pdf]
(2020 NeurIPS Workshop) ClimaText: A Dataset for Climate Change Topic Detection. Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, Markus Leippold. [pdf]
(2020 arXiv) You are right. I am ALARMED -- But by Climate Change Counter Movement. Shraey Bhatia, Jey Han Lau, Timothy Baldwin. [pdf]
(2017 EMNLP Workshop) Comparing Attitudes to Climate Change in the Media using sentiment analysis based on Latent Dirichlet Allocation. Ye Jiang, Xingyi Song, Jackie Harrison, Shaun Quegan, Diana Maynard. [pdf]
(2020 ICWSM Workshop) Learning Twitter User Sentiments on Climate Change with Limited Labeled Data. Allison Koenecke, Jordi Feliu-Fabà. [pdf]
(2022 EMNLP Workshop) A Dataset of Sustainable Diet Arguments on Twitter. Marcus Astrup Hansen, Daniel Hershcovich. [pdf]

3.2 NLP for Human Rights

Detecting Human Rights Violations

Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases. Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis. [pdf]

[Please don't use NLP in this way] NLP for Privacy Invasion/Surveillance

Background

(Inmate medical surveillance; Intercept 2020) Prisons launch "absurd" attempt to detect coronavirus in inmate phone calls. Akela Lacy, Alice Speri, Jordan Smith, Sam Biddle. [Article]

[Summary] The system automatically downloads, analyzes, and transcribes inmate calls to identify sick inmates, help allocate personnel in understaffed prisons, and even prevent “COVID-19 related murder.”
(Teenagers want privacy; Symposium discussion 2011) Social Privacy in Networked Publics: Teens’ Attitudes, Practices, and Strategies. Danah Boyd, Alice Marwick. [pdf]

Surveillance by companies

(Book 2019) The age of surveillance capitalism. Shoshana Zuboff. [Amazon] [pdf]

3.3 Fight against the manipulation of thoughts

Studying the existing trend

(2019 NAACL) Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, Dan Jurafsky. [pdf]

Studying Media Manipulation for Political Reasons

Keywords: Framing, persuasion, manipulation.

(EMNLP 2018) Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies. Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, and Yulia Tsvetkov. [pdf]
(EMNLP 2019) Fine-Grained Analysis of Propaganda in News Articles. Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov. [pdf]
(2019 CoNLL) Predicting the Role of Political Trolls in Social Media. Atanas Atanasov, Gianmarco De Francisci Morales, Preslav Nakov. [pdf]
(2020 Harvard Review) Cross-platform disinformation campaigns: Lessons learned and next steps. Tom Wilson, Kate Starbird. [pdf]
(2018 ACL) Classification of Moral Foundations in Microblog Political Discourse. Kristen Johnson and Dan Goldwasser. [pdf]
(arXiv 2020) Automatically Characterizing Targeted Information Operations Through Biases Present in Discourse on Twitter. Autumn Toney, Akshat Pandey, Wei Guo, David Broniatowski, Aylin Caliskan. [pdf]
(Internet Policy Review 2019) Technology, autonomy, and manipulation. Daniel Susser, Beate Roessler, Helen Nissenbaum. [pdf]
(2020 Personality and Social Psychology Bulletin) Historical Change in the Moral Foundations of Political Persuasion. Nicholas Buttrick, Robert Moulder and Shigehiro Oishi. [pdf]
(2019) Measuring Proximity Between Newspapers and Political Parties: The Sentiment Political Compass. Fabian Falck, Julian Marstaller, Niklas Stoehr, Sören Maucher, Jeana Ren, Andreas Thalhammer, Achim Rettinger, Rudi Studer. [pdf]
(2019 arXiv) Red Bots Do It Better: Comparative Analysis of Social Bot Partisan Behavior. Luca Luceri, Ashok Deb, Adam Badawy, Emilio Ferrara. [pdf]
(CSCW 2018) Acting the Part: Examining Information Operations within #BlackLivesMatter Discourse. Ahmer Arif, Leo G. Stewart, Kate Starbird. [pdf]
(CSCW 2018) Assembling Strategic Narratives: Information Operations as Collaborative Work within an Online Community. Tom Wilson, Kaitlyn Zhou, and Kate Starbird. [pdf]
(2020) Potemkin Pages and Personas: Assessing GRU Online Operations, 2014-2019. Renee DiResta, Shelby Grossman. [pdf]
(2020 Book) Blame it on Iran, Qatar, and Turkey: An analysis of a Twitter and Facebook operation linked to Egypt, the UAE, and Saudi Arabia. Shelby Grossman, Khadija H., Renée DiResta, Tara Kheradpir, and Carly Miller. [pdf]
(2018 Book) Network propaganda: Manipulation, disinformation, and radicalization in American politics. Benkler, Yochai, Robert Faris, and Hal Roberts.
(1928) Propaganda. Edward Bernays.
(2019 NAACL) Issue Framing in Online Discussion Fora. Mareike Hartmann, Tallulah Jansen, Isabelle Augenstein, Anders Søgaard. [pdf]
(2019 EMNLP) Modeling Frames in Argumentation. Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, Benno Stein. [pdf]
(2020 AAAI) Automatically Neutralizing Subjective Bias in Text. Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, Diyi Yang. [pdf]
(2020 WebSci) A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017. Haewoon Kwak, Jisun An, Yong-Yeol Ahn [pdf]
(2020 arXiv) FrameAxis: Characterizing Framing Bias and Intensity with Word Embedding. Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn. [pdf]
(2017 EMNLP) Connotation Frames of Power and Agency in Modern Films. Maarten Sap, Marcella Cindy Prasettio, Ari Holtzman, Hannah Rashkin, Yejin Choi. [pdf]
(2016 EMNLP) Analyzing Framing through the Casts of Characters in the News. Dallas Card, Justin Gross, Amber Boydstun, Noah A. Smith. [pdf]
(2015 ACL) The Media Frames Corpus: Annotations of Frames Across Issues. Dallas Card, Amber E. Boydstun, Justin H. Gross, Philip Resnik, Noah A. Smith. [pdf]
(2013 ACL) Linguistic Models for Analyzing and Detecting Biased Language. Marta Recasens, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky. [pdf]
(WWW 2019) Who falls for online political manipulation? Adam Badawy, Kristina Lerman, and Emilio Ferrara. [pdf]

Fake news and misinformation

(2021 arXiv) Misinfo Belief Frames: A Case Study on Covid & Climate News. Saadia Gabriel, Skyler Hallinan, Maarten Sap, Pemi Nguyen, Franziska Roesner, Eunsol Choi, Yejin Choi. [pdf]
(2020 Nature) The online competition between pro- and anti-vaccination views. Neil F. Johnson, Nicolas Velásquez, Nicholas Johnson Restrepo, Rhys Leahy, Nicholas Gabriel, Sara El Oud, Minzhang Zheng, Pedro Manrique, Stefan Wuchty, Yonatan Lupu. [pdf]
(2021 ICWSM) Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms. Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov. [pdf]
(2021 arXiv) A Survey on Multimodal Disinformation Detection. Firoj Alam, Stefano Cresci, Tanmoy Chakraborty, Fabrizio Silvestri, Dimiter Dimitrov, Giovanni Da San Martino, Shaden Shaar, Hamed Firooz, Preslav Nakov. [pdf]
(2021 IJCAI) Automated Fact-Checking for Assisting Human Fact-Checkers. Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, Giovanni Da San Martino. [pdf]
(2020 LREC) r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. Kai Nakamura, Sharon Levy, William Yang Wang. [pdf]
(2020 arXiv) You are right. I am ALARMED -- But by Climate Change Counter Movement. Shraey Bhatia, Jey Han Lau, Timothy Baldwin. [pdf]
(2020 European Urology Focus) Fake News: Spread of Misinformation about Urological Conditions on Social Media. Stacy Loeb, Jacob Taylor, James F. Borina, Rada Mihalcea, Veronica Perez-Rosas, Nataliya Byrne, Austin L. Chiang, Aisha Langford [pdf]
(2020 arXiv) Coronavirus on social media: Analyzing misinformation in Twitter conversations. Sharma, Karishma, Sungyong Seo, Chuizheng Meng, Sirisha Rambhatla, Aastha Dua, and Yan Liu. [pdf]
(2020, Reuters Institute) Types, Sources, and Claims of COVID-19 Misinformation. J. Scott Brennen, Felix M. Simon, Philip N. Howard, and Rasmus Kleis Nielsen. [pdf]
(2020 Wired article) The Professors Who Call ‘Bullshit’ on Covid-19 Misinformation. Jevin West, Carl Bergstrom. [website]
(2019 ACM) Combating Fake News: A Survey on Identification and Mitigation Techniques. Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, Yan Liu. [pdf]
(2018 COLING) Automatic Detection of Fake News. Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, Rada Mihalcea. [pdf]
(2018 Science) The spread of true and false news online. Vosoughi, Soroush, Deb Roy, and Sinan Aral. [pdf]
(2018 ACM) A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. Xinyi Zhou, Reza Zafarani. [pdf]
(2018 AJPH) Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate. David A Broniatowski, Amelia M Jamison, SiHua Qi, Lulwah AlKulaib, Tao Chen, Adrian Benton, Sandra C Quinn, Mark Dredze. [pdf]
(2017 ACL) “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. William Yang Wang. [pdf]
(2017 EMNLP) Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, Yejin Choi. [pdf]
(2017, Journal of economic perspectives) Social Media and Fake News in the 2016 Election. Hunt Allcott, Matthew Gentzkow. [pdf]
(2014 ICWSM) Rumor Cascades. Adrien Friggeri, Lada A. Adamic, Dean Eckles, Justin Cheng. [pdf]
(2014 iConference) Rumors, False Flags, and Digital Vigilantes: Misinformation on Twitter after the 2013 Boston Marathon Bombing. Kate Starbird, Jim Maddock, Mania Orand, Peg Achterman, Robert M. Mason. [pdf]
(1986, Raritan Quarterly Review) On Bullshit. Harry Frankfurt. [pdf]

Fact-checking

(2021 arXiv) Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News. Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Rada Mihalcea. [pdf]
(2020 ACL) That is a Known Lie: Detecting Previously Fact-Checked Claims. Shaden Shaar, Giovanni Da San Martino, Nikolay Babulkov, Preslav Nakov. [pdf]
(2020 CL) The Limitations of Stylometry for Detecting Machine-Generated Fake News. Tal Schuster, Roei Schuster, Darsh J Shah, Regina Barzilay. [pdf]
(2020 EMNLP) Fact or Fiction: Verifying Scientific Claims. David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi. [pdf]
(2020 ICWSM) A Benchmark Dataset of Check-worthy Factual Claims. Fatma Arslan, Naeemul Hassan, Chengkai Li, Mark Tremayne. [pdf]
(2020 ACL) Generating Fact Checking Explanations. Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. [pdf]
(2019 EMNLP) MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. Augenstein, Isabelle, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob Grue Simonsen. [pdf]
(2019 NeurIPS) Defending Against Neural Fake News. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. [pdf]

[Summary] Fake news generated by a large neural network, Grover, can appear more trustworthy than news written by humans. Since the most accurate discriminator against Grover in the trial is Grover itself, the authors decided to release the model to researchers to facilitate misinformation detection, despite the potential damage it could cause as a fake news generator.
(2019 EMNLP) Evaluating adversarial attacks against multiple fact verification systems. James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal. [pdf]
(2019 EMNLP) Towards Debiasing Fact Verification Models. Tal Schuster, Darsh Shah, Yun Jie Serene Yeo, Daniel Roberto Filizzola Ortiz, Enrico Santus, Regina Barzilay. [pdf]
(2018 NAACL) FEVER: a large-scale dataset for Fact Extraction and VERification. James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal. [pdf]

Q4: Can we use NLP to help make all people equal?

4.1 NLP for all languages

[Main focus] Spreading the benefit of NLP to low-resource languages; equal gender ratio.

Motivation

(2021 arXiv) Systematic Inequalities in Language Technology Performance across the World's Languages. Damián Blasi, Antonios Anastasopoulos, Graham Neubig. [pdf]
(2020 ACL) The State and Fate of Linguistic Diversity and Inclusion in the NLP World. Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury. [pdf]

[Summary] Only a small fraction of the world's 7000+ languages are represented in NLP.
(2021 ACL Demo) ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback. Shiyue Zhang, Benjamin Frey, Mohit Bansal. [pdf]
(2010 EAMT) Haitian Creole: How to Build and Ship an MT Engine from Scratch in 4 days, 17 hours, & 30 minutes. William D. Lewis. [pdf]

4.2 NLP for gender/demographical equality

NLP to detect bias

(2021 ICWSM) Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence. Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury. [pdf]
(2020 ACL) When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People? Kenneth Joseph, Jonathan H. Morgan. [pdf]
(2020 NBER Manuscript) Stereotypes in High-Stakes Decisions: Evidence from U.S. Circuit Courts. Elliott Ash, Daniel L. Chen, Arianna Ornaghi. [pdf]
(2019 ACL) Entity-Centric Contextual Affective Analysis. Anjalie Field, Yulia Tsvetkov. [pdf]
(2019 ACL) Unsupervised Discovery of Gendered Language through Latent-Variable Modeling. Alexander Miserlis Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, Ryan Cotterell. [pdf]
(2019 EMNLP) Automatically Inferring Gender Associations from Language. Serina Chang, Kathy McKeown. [pdf]
(2019 ICWSM) Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories. Anjalie Field, Gayatri Bhat, Yulia Tsvetkov. [pdf]
(2019 BRiMS) Relating Linguistic Gender Bias, Gender Values, and Gender Gaps: An International Analysis. Scott Friedman, Sonja Schmer-Galunder, Jeffrey Rye, Robert Goldman, and Anthony Chen. [pdf]
(2018 PNAS) Word embeddings quantify 100 years of gender and ethnic stereotypes. Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou. [pdf]
(2017 PNAS) Language from police body camera footage shows racial disparities in officer respect. Rob Voigt, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. [pdf]
(2016 IJCAI Workshop) Tie-breaker: Using language models to quantify gender bias in sports journalism. Liye Fu, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. [pdf]
(2016 ICWSM) Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community. Ethan Fast, Tina Vachovsky, Michael S. Bernstein. [pdf]

NLP to detect bias specifically on social media

(2020 ACL) Social Bias Frames: Reasoning about Social and Power Implications of Language. Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi. [pdf]
(2020 arXiv) Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis. Caleb Ziems, Bing He, Sandeep Soni, Srijan Kumar. [pdf]
(2020 EMNLP Workshop) Detecting East Asian Prejudice on Social Media. Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale. [pdf]
(2017 CSCW) Girls rule, boys drool: Extracting semantic and affective stereotypes from Twitter. Kenneth Joseph, Wei Wei, and Kathleen M. Carley. [pdf]

Q5: Are there concerns over the practice of NLP? Can we mitigate this?

AI Incident Database. Partnership on AI. https://incidentdatabase.ai/

5.1 Prevent future scandals of conversational bots - How dialog systems should handle verbal abuse

(2016, International Journal of Communication) Talking to Bots: Symbiotic Agency and the Case of Tay. Gina Neff, Peter Nagy. [pdf]
(2020 CHI) Empathy Is All You Need: How a Conversational Agent Should Respond to Verbal Abuse. Hyojin Chin, Lebogang Wame Molefi, Mun Yong Yi. [pdf]
(2019 UNESCO Report) I'd Blush if I Could: Closing Gender Divides in Digital Skills Through Education [pdf]
(2018 CHI) Let's Talk About Race: Identity, Chatbots, and AI. Ari Schlesinger, Kenton P. O'Hara, Alex S. Taylor. [pdf]
(2018 ACL workshop) #MeToo: How Conversational Systems Respond to Sexual Harassment. Amanda Cercas Curry, Verena Rieser. [pdf]
(2016 EMNLP) How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau. [pdf]

5.2 Who protects my privacy? - Privacy Issues with Dataset Collection

Motivation: Public social media data can invade user privacy.

The Common Rule: The Federal Policy for the Protection of Human Subjects. [45 CFR part 46]
(Human Rights: On Nuremberg Code; New England Journal of Medicine 1997) Fifty years later: the significance of the Nuremberg Code. Evelyne Shuster. [paper]
ACM Code of Ethics.

[Summary] Papers should 1.2 Avoid harm; 1.4 Be fair and take action not to discriminate; 1.6 Respect privacy; 2.6 Perform work only in areas of competence; and 3.1 Ensure that the public good is the central concern during all professional computing work.

Promoting data ethics norms for the community

(TACL 2018) Data statements for NLP: Toward mitigating system bias and enabling better science. Emily Bender and Batya Friedman. [pdf]
(arXiv 2020) Datasheets for Datasets. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford. [pdf]
(CSCW 2016) Beyond the Belmont Principles: Ethical Challenges, Practices, and Beliefs in the Online Data Research Community. Jessica Vitak, Katie Shilton, Zahra Ashktorab. [pdf]

[Summary] A survey of 200+ researchers' current practices.
(npj Digital Medicine 2018) Don't quote me: reverse identification of research participants in social media studies. John W Ayers, Theodore L Caputi, Camille Nebeker, Mark Dredze. [pdf]

[Summary] Do not quote users. 72% of articles quoted tweets, and the tweeter could be identified 84% of the time.

User surveys about their data being used

(Social Media + Society 2018) "Participant" Perception of Twitter research ethics. Casey Fiesler and Nicholas Proferes. [pdf]

[Summary] (1) Few users knew their public tweets could be used by researchers; (2) The majority thought their consent was important.
(Sociology 2017) Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation. Matthew L Williams, Pete Burnap, Luke Sloan. [pdf]

Building models that preserve privacy

Writer Profiling Without the Writer’s Text. David Jurgens, Yulia Tsvetkov, Dan Jurafsky. [pdf]

[Summary] Linguistic cues are predictive of user gender, age, religion, diet, and personality traits.
(Dialog; AIES 2017) Ethical Challenges in Data-Driven Dialogue Systems. Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau. [pdf]
Privacy-preserving Neural Representations of Text. Maximin Coavoux, Shashi Narayan, Shay B. Cohen. [pdf]

[Summary] Builds neural representations that trade off privacy against utility.
(USENIX Symposium 2019) The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song. [pdf] [4-gram experiment by Prof Yoav Goldberg]

Improving data quality for ML models

(ACM FAT* 2020) Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From? Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, Jenny Huang. [pdf]

[Summary] Inspecting papers to see whether the data collection process is reasonable.
(SIGdial 2007) Comparing Spoken Dialog Corpora Collected with Recruited Subjects versus Real Users. Hua Ai, Antoine Raux, Dan Bohus, Maxine Eskenazi, Diane Litman. [pdf]

[Summary] Recruited subjects talk more and faster, while real users ask for more help and more frequently interrupt the system.

5.3 Can we prevent our model from being a sexist/racist/etc?

Gender/Demographical bias in models and data - Background

(Science 2017) Semantics derived automatically from language corpora necessarily contain human biases. Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. [pdf]

[Summary] Bias is not unique to machines: psychology finds that humans also learn to be biased from language input. Reason: Language contains recoverable and accurate imprints of our historic biases. Effect: These biases vary. Some are morally neutral (e.g., toward insects or flowers), some are problematic (e.g., toward race or gender), and some are simply veridical, reflecting the status quo distribution of gender with respect to careers or first names.
(Book 1999) Sorting Things Out. Geoffrey C. Bowker, Susan Leigh Star. [intro]

Survey/Overview

(2020 arXiv) Language (Technology) is Power: A Critical Survey of "Bias" in NLP. Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach. [pdf]

[Summary] Surveyed 147 papers. NLP researchers should articulate (1) what they mean by "bias", i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, and (2) the normative reasoning behind these judgments.
(2019 ACL) Mitigating Gender Bias in Natural Language Processing: Literature Review. Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang. [pdf]
(2018 Book) Algorithms of Oppression. Safiya Noble. [Amazon]
(2017 NIPS Keynote) The trouble with bias. Kate Crawford.

Gender vs. Embeddings

(Gender in word embeddings; NIPS 2016) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai.
(Gender in word embeddings; NAACL 2019) Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. Hila Gonen, Yoav Goldberg. [pdf]

[Summary] Existing methods mostly hide the bias rather than remove it.
(Gender in sentence embeddings; NAACL 2019) On Measuring Social Biases in Sentence Encoders. Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger. [pdf]
(Gender in sentence embeddings; NeurIPS 2019) Assessing Social and Intersectional Biases in Contextualized Word Representations. Yi Chern Tan, L. Elisa Celis. [pdf]
(Gender in word embeddings; ACL 2019 Workshop) Measuring bias in contextualized word representations. / Quantifying Social Biases in Contextual Word Representations. Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. [pdf]
(Gender in coreference resolution; NAACL 2018) Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. [pdf]
(Gender in coreference resolution; NAACL 2018) Gender Bias in Coreference Resolution. Rachel Rudinger, Jason Naradowsky, Brian Leonard, Benjamin Van Durme.
(Gender and race in sentiment analysis; *SEM 2018) Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems. Svetlana Kiritchenko, Saif Mohammad. [pdf]
(Gender in POS tagging and parsing; ACL 2019) Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing. Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea. [pdf]
(Gender in relation extraction; ACL 2020) Towards Understanding Gender Bias in Relation Extraction. Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang. [pdf] [video] [data]
(Gender in MT; Neural Computing and Applications 2019) Assessing Gender Bias in Machine Translation – A Case Study with Google Translate. Marcelo O. R. Prates, Pedro H. C. Avelar, Luis Lamb. [pdf]
(Gender in MT; ACL 2020) Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem. Danielle Saunders, Bill Byrne. [pdf]
(Gender in MT; EMNLP Findings 2020) Automatically Identifying Gender Issues in Machine Translation using Perturbations. Hila Gonen, Kellie Webster. [pdf]
(Gender in coreference resolution; ACL 2020) Toward Gender-Inclusive Coreference Resolution. Yang Trista Cao, Hal Daumé III. [pdf]
(Gender in NER; ACM 2019) Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition. Ninareh Mehrabi, Thamme Gowda, Fred Morstatter, Nanyun Peng, Aram Galstyan. [pdf]
(TACL 2018) Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns. Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge [pdf]
(EMNLP 2017) Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. [pdf]
(ACL 2020) Mitigating Gender Bias Amplification in Distribution by Posterior Regularization. Shengyu Jia, Tao Meng, Jieyu Zhao, Kai-Wei Chang. [pdf]
(Bias in dialog; AIES 2017) Ethical Challenges in Data-Driven Dialogue Systems. Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau. [pdf]

Demographics vs. NLP Model Performance

(African American English; EMNLP 2016) Demographic Dialectal Variation in Social Media: A Case Study of African-American English. Su Lin Blodgett, Lisa Green, Brendan O'Connor. [pdf]
(African American English; PNAS 2020) Racial Disparity in Automated Speech Recognition. Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John Rickford, Dan Jurafsky, and Sharad Goel.
(2020 ACL) Social Biases in NLP Models as Barriers for Persons with Disabilities. Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, Stephen Denuyl. [pdf]
(ACL 2017 Workshop) Social Bias in Elicited Natural Language Inferences. Rachel Rudinger, Chandler May, Benjamin Van Durme. [pdf]
(Interspeech 2017) Effects of talker dialect, gender and race on accuracy of Bing speech and YouTube automatic captions. Rachael Tatman, Conner Kasten. [pdf]
(Annual Review of Political Science 2016) Race as a Bundle of Sticks: Designs that Estimate Effects of Seemingly Immutable Characteristics. Maya Sen, Omar Wasow. [pdf]
(KDD 2017 Workshop) Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English. Su Lin Blodgett, Brendan O'Connor. [pdf]

Algorithmic fairness

(FAccT 2021) Re-imagining Algorithmic Fairness in India and Beyond. Nithya Sambasivan, Erin Arnesen, Ben Hutchinson, Tulsee Doshi, Vinodkumar Prabhakaran. [pdf] [criticism]

[Summary] Current research on AI ethics is US-centric and not easily transferable to other countries.

Caveats of Large Language Models

(FAccT 2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell. [pdf] [criticism]
[Summary] Warns about problems with large LMs, e.g., their climate impact and bias.

5.4 Can we save energy when training NLP models? - GreenNLP

(ACL 2019) Energy and policy considerations for deep learning in NLP. Emma Strubell, Ananya Ganesh, Andrew McCallum. [pdf] [video]
(CACM 2020) Green AI. Roy Schwartz, Jesse Dodge, Noah A Smith, Oren Etzioni. [pdf] [video]
(arXiv 2020) Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning. Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau [pdf]
(EMNLP 2022) Towards Climate Awareness in NLP Research. Daniel Hershcovich, Nicolas Webersinke, Mathias Kraus, Julia Anna Bingler, Markus Leippold. [pdf]

5.5 Be alert of recommendation systems

(2019 SSRN) Recommender systems and their ethical challenges. Silvia Milano, Mariarosaria Taddeo, and Luciano Floridi. [pdf]

5.6 Equip AI with the same morals as humans

(2020 EMNLP) Social Chemistry 101: Learning to Reason about Social and Moral Norms. Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi. [pdf]
(2019 arXiv) BERT has a Moral Compass: Improvements of ethical and moral values of machines. Patrick Schramowski, Cigdem Turan, Sophie Jentzsch, Constantin Rothkopf, Kristian Kersting. [pdf]

Engagement from Non-Academic Areas

Non-Profit Movements

(EA Movement) 80,000 Hours -> Career advice on planning your working life (about 80,000 hours) rationally to maximize social good.

Start-Ups

Wadhwani AI https://www.wadhwaniai.org/

Resources of (general) AI for social good

Introduction to Key Concepts in AI and Machine Learning for Good. James Weis, Geeticka Chauhan. [slides]

GovAI@Oxford and CHAI by Stuart Russell@Berkeley promote AI that is compatible with humans

(Call for AI-Human Cooperation) Open Problems in Cooperative AI. Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel. [pdf]

Nick Bostrom: prioritize prevention of existential risks

(2013) Existential risk prevention as global priority. Nick Bostrom

Other materials

(2019) 24.131: Ethics of Technology [reading list]
(Discussion of community engagement, ICLR 2020) Participatory Problem Formulation for Fairer Machine Learning Through Community Based System Dynamics. Donald Martin Jr., Vinodkumar Prabhakaran, Jill Kuhlberg, Andrew Smart, William S. Isaac. [pdf]

Acknowledgements

Much credit goes to the reading list of Stanford CS384.
