Auditing (Safety and Accountability)

Datasets Used by Cited Publications

  • ScienceDirect [6]: A bibliographic database that hosts over 18 million publications from more than 4,000 journals and more than 30,000 e-books from the publisher Elsevier. Launched in 1997, ScienceDirect covers engineering, medical research, the social sciences, and the humanities.  Used by: [21]

  • World Bank [9]: A publicly available collection of datasets that facilitate the analysis of global development. Researchers can use this data to compare countries under different developmental aspects, including agricultural progress, poverty, population dynamics, and economic growth.  Used by: [24]

  • World Economic Forum (WEF) [22]: The WEF is an international non-governmental organization based in Switzerland that publishes economic reports such as the Global Competitiveness Report. The reports are available online, and some of the data are easily accessible through websites like Knoema.  Used by: [10]

  • OECD.Stat [14]: This webpage includes data and metadata for OECD countries and selected non-member economies. The online platform allows researchers to traverse the collected data through given data themes or via search-engine queries.  Used by: [10]

  • Global Brand Database [23]: An online database hosted by the World Intellectual Property Organization (WIPO) that contains information about trademark applications (e.g., the owner of the trademark, its status, or the designation country). It currently contains almost 53 million records from 73 data sources.  Used by: [10]

  • PubMed [2]: A widely known, free-to-access search engine for biomedical and life science literature, developed and maintained by the National Center for Biotechnology Information (NCBI). Researchers can find more than 34 million citations and abstracts of articles. PubMed does not host the articles themselves but frequently links to the full text.  Used by: [8]

  • ProQuest Central [15]: A database containing dissertations and theses in a multitude of disciplines. It currently contains more than 5 million graduate works.  Used by: [8]

  • Cochrane Central Register of Controlled Trials (CENTRAL) [5]: A database of reports for randomized and quasi-randomized controlled trials collected from different online databases. Although it does not contain full-text articles, the CENTRAL includes bibliographic details and often an abstract of the report.  Used by: [8]

  • PsycINFO [1]: A database developed and hosted by the American Psychological Association that contains abstracts for more than five million articles in the field of psychology.  Used by: [8]

  • Lending Club [7]: A dataset that contains information about all accepted and rejected peer-to-peer loan applications of LendingClub. Currently, the data are only available through the referenced Kaggle entry, as the company no longer provides peer-to-peer loan services (see footnote 1).  Used by: [19]

  • Taiwanese Credit Data [20,25]: A real-world dataset containing payment data collected in October 2005 from a Taiwanese bank. The commonly used pre-processed version [20] (see footnote 2) contains data from 30,000 individuals described through 16 features (e.g., marital status, age, or payment history).  Used by: [19]

Interesting Causal Tools

  • CausalImpact [3]: This R package allows users to conduct causal impact assessment for planned interventions on serial data given a response time series and an assortment of control time series. For this purpose, CausalImpact enables the construction of a Bayesian structural time-series model that can be used to predict the resulting counterfactual of an intervention.

  • Causal Inference 360 [18]: A Python package developed by IBM to infer causal effects from given data. Causal Inference 360 includes multiple estimation methods, a medical dataset, and multiple simulation sets. The provided methods can be used with any complex ML model through a scikit-learn-inspired API.

  • gCastle [26]: An end-to-end causal structure learning toolbox equipped with 19 techniques for causal discovery. It also assists users in generating data and evaluating learned structures. A firm understanding of the causal structure is crucial for safety-related research.

  • Benchpress [16]: A benchmark for causal structure learning allowing users to compare their causal discovery methods with over 40 variations of state-of-the-art algorithms. The plethora of available techniques in this single tool could facilitate research into safety and accountability of ML systems through causality.

  • CauseEffectPairs [13]: A collection of more than 100 databases, each annotated with a two-variable cause-effect relationship (e.g., access to drinking water affects infant mortality). Given a database, models need to distinguish between the cause and effect variables.
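The counterfactual reasoning behind a tool such as CausalImpact can be illustrated with a much simpler stand-in. The sketch below is not the CausalImpact API and does not fit a Bayesian structural time-series model. As an assumption made purely for illustration, it fits an ordinary least-squares regression of the response on the control series over the pre-intervention period, predicts what the response would have looked like without the intervention, and reports the gap. The function name `estimate_intervention_effect` and its parameters are hypothetical.

```python
import numpy as np


def estimate_intervention_effect(response, controls, intervention_index):
    """Estimate the effect of an intervention on a time series.

    Simplified stand-in (assumption): ordinary least squares replaces the
    Bayesian structural time-series model used by CausalImpact.

    response: 1-D sequence, the series affected by the intervention.
    controls: 1-D or 2-D array-like, series assumed unaffected by it.
    intervention_index: first time step of the post-intervention period.
    """
    response = np.asarray(response, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if controls.ndim == 1:
        controls = controls[:, None]

    # Design matrix: intercept column plus one column per control series.
    design = np.column_stack([np.ones(len(response)), controls])
    pre = slice(0, intervention_index)
    post = slice(intervention_index, None)

    # Learn the response/control relationship on pre-intervention data only.
    coef, *_ = np.linalg.lstsq(design[pre], response[pre], rcond=None)

    # Counterfactual: the response predicted for the post period had the
    # pre-intervention relationship continued to hold.
    counterfactual = design[post] @ coef
    pointwise_effect = response[post] - counterfactual
    return counterfactual, pointwise_effect, float(pointwise_effect.mean())
```

For example, calling the function with a response series that tracks a single control series until time step 15 and is shifted upward afterwards should return an average effect close to the size of that shift. Unlike the Bayesian model, this sketch yields no uncertainty intervals, so it cannot say whether an estimated effect is distinguishable from noise.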

Prominent Non-Causal Tools

  • Government of Canada’s AIA tool [4]: The Algorithmic Impact Assessment (AIA) tool is a questionnaire developed in the wake of Canada’s Directive on Automated Decision Making (see footnote 3). Employees of the Canadian Government who wish to use automated decision-making systems in their projects must first assess the impact of such systems with this tool. Based on the answers to roughly 80 questions covering different aspects of the project, the AIA outputs two scores: one indicating the risks that automation would bring and one quantifying the quality of the risk management.

  • Aequitas [17]: An open-source auditing tool designed to assess the bias of algorithmic decision-making systems. It provides utilities for evaluating bias in decision-making outcomes and also lets users assess bias in the actions taken directly.

  • Error Analysis (Responsible AI) [12]: As part of the Responsible AI toolbox, Error Analysis is a model assessment tool capable of identifying subsets of data in which the model performs poorly (e.g., black citizens being more frequently misclassified as potential re-offenders). It also enables users to diagnose the root cause of such poor performance.

  • ML-Doctor [11]: A codebase initially used to compare and evaluate different inference attacks (membership inference, model stealing, model inversion, and attribute inference). Thanks to its modular structure, it can also serve as a risk assessment tool for analyzing a model's susceptibility to state-of-the-art privacy attacks.
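The kind of subgroup check that bias-auditing and error-analysis tools automate can be sketched in a few lines. The example below does not use the Aequitas or Responsible AI Toolbox APIs. It is a minimal illustration, assuming binary labels (1 is the positive class) and a single group attribute, of how per-group false positive rates and their ratio to a reference group can be computed. The function names `false_positive_rates` and `disparity_ratios` are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Return the false positive rate of each group.

    Only true negatives (label 0) count towards the rate; groups without
    any true negatives are omitted from the result.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, prediction, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if prediction == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


def disparity_ratios(rates, reference_group):
    """Divide each group's rate by the rate of the reference group."""
    reference = rates[reference_group]
    if reference == 0:
        raise ValueError("reference group has a rate of zero")
    return {group: rate / reference for group, rate in rates.items()}
```

A ratio well above 1 for some group means its members are wrongly flagged more often than members of the reference group, which is the pattern described in the re-offender example above. Which ratio counts as unacceptable is a policy decision that this sketch leaves open.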

References

[1] American Psychological Association. 2022. PsycINFO. Retrieved from https://www.apa.org/pubs/databases/psycinfo/index

[2] National Center for Biotechnology Information. 2022. PubMed. Retrieved from https://pubmed.ncbi.nlm.nih.gov/

[3] Kay H Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L Scott. 2015. Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics (2015), 247–274.

[4] Government of Canada. Algorithmic Impact Assessment tool. Retrieved from https://open.canada.ca/aia-eia-js/?lang=en

[5] Cochrane. 2022. Cochrane library. Retrieved from https://www.cochranelibrary.com/central

[6] Elsevier. 2023. ScienceDirect. Retrieved from https://www.sciencedirect.com

[7] Nathan George. 2018. All lending club loan data. Retrieved from https://www.kaggle.com/datasets/wordsforthewise/lending-club

[8] Brigid M Gillespie, Joseph Gillespie, Rhonda J Boorman, Karin Granqvist, Johan Stranne, and Annette Erichsen-Andersson. 2021. The impact of robotic-assisted surgery on team performance: A systematic mixed studies review. Human Factors 63, 8 (2021), 1352–1379.

[9] The World Bank Group. 2022. World development indicators. Retrieved from https://data.worldbank.org/indicator

[10] Muhammad Haseeb, Leonardus WW Mihardjo, Abid Rashid Gill, Kittisak Jermsittiparsert, and others. 2019. Economic impact of artificial intelligence: New look for the macroeconomic assessment in Asia-Pacific region. International Journal of Computational Intelligence Systems 12, 2 (2019), 1295.

[11] Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, and Yang Zhang. 2022. ML-Doctor: Holistic risk assessment of inference attacks against machine learning models. In 31st USENIX Security Symposium (USENIX Security 22), USENIX Association, Boston, MA, 4525–4542. Retrieved from https://www.usenix.org/conference/usenixsecurity22/presentation/liu-yugeng

[12] Microsoft. 2022. Responsible AI Toolbox.

[13] Joris M Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. Distinguishing cause from effect using observational data: Methods and benchmarks. The Journal of Machine Learning Research 17, 1 (2016), 1103–1204.

[14] Organisation for Economic Co-operation and Development. 2023. OECD statistics. Retrieved from https://stats.oecd.org/

[15] ProQuest. 2022. ProQuest. Retrieved from https://www.proquest.com/

[16] Felix L. Rios, Giusi Moffa, and Jack Kuipers. 2021. Benchpress: A scalable and platform-independent workflow for benchmarking structure learning algorithms for graphical models. Retrieved from http://arxiv.org/abs/2107.03863

[17] Pedro Saleiro, Benedict Kuester, Abby Stevens, Ari Anisfeld, Loren Hinkson, Jesse London, and Rayid Ghani. 2018. Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577 (2018).

[18] Yishai Shimoni, Ehud Karavani, Sivan Ravid, Peter Bak, Tan Hung Ng, Sharon Hensley Alford, Denise Meade, and Yaara Goldschmidt. 2019. An evaluation toolkit to guide model selection and cohort definition in causal inference. arXiv preprint arXiv:1906.00442 (2019).

[19] Stratis Tsirtsis and Manuel Gomez Rodriguez. 2020. Decisions, counterfactual explanations and strategic behavior. Advances in Neural Information Processing Systems 33 (2020), 16749–16760.

[20] Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19.

[21] Guillaume Voegeli, Werner Hediger, and Franco Romerio. 2019. Sustainability assessment of hydropower: Using causal diagram to seize the importance of impact pathways. Environmental Impact Assessment Review 77 (2019), 69–84.

[22] World Economic Forum. 2023. World economic forum. Retrieved from https://www.weforum.org/reports/

[23] World Intellectual Property Organization. 2023. Global brand database. Retrieved from https://branddb.wipo.int/en/

[24] Susie R Wu, Jiquan Chen, Defne Apul, Peilei Fan, Yanfa Yan, Yi Fan, and Peiling Zhou. 2015. Causality in social life cycle impact assessment (SLCIA). The International Journal of Life Cycle Assessment 20, 9 (2015), 1312–1323.

[25] I-Cheng Yeh and Che-hui Lien. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36, 2 (2009), 2473–2480.

[26] Keli Zhang, Shengyu Zhu, Marcus Kalander, Ignavier Ng, Junjian Ye, Zhitang Chen, and Lujia Pan. 2021. gCastle: A Python toolbox for causal discovery. arXiv preprint arXiv:2111.15155 (2021).

Footnotes

  1. https://www.lendingclub.com/investing/peer-to-peer

  2. Available at https://github.com/ustunb/actionable-recourse/tree/master/examples/paper/data under the name "credit_processed.csv"

  3. http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592