2022 RISC IL Project Plan

Needs for Knowledge Management

With a growing wealth of public reports about world events in both news and social media, analysts who use reports to summarize trends for policymakers are challenged to absorb and reason over large swaths of reports in a timely way. To reduce this burden, there is increased interest in knowledge management technologies that create explicitly structured representations of document data (e.g., knowledge graphs, semantic web); these allow data-driven inferences across documents, entities, objects, and properties nested within these structures. Beyond this, commonly curated and archived knowledge bases allow for interoperability across analysts and purposes.

Capability Gaps

Knowledge graphs impose explicit structure on data, exposing a wealth of information that is traversable. However, in analytical use-cases these large information structures are constructed by data scientists from endogenous information (what can be extracted from the data itself). Knowledge graphs built for analytical purposes rarely have mechanisms to capitalize on user knowledge: the domain expertise that may be expressed in how users search for information. This can make knowledge graphs harder for users to adopt, and it presents a lost opportunity--commercial industry uses similar approaches to augment how it indexes and organizes content based on how users explore and interact with that content. Streaming platforms and eCommerce sites have adopted 'Recommender Systems', and in some cases Knowledge Graphs, to accomplish this. Modern Recommender Systems consume user data to model users and continuously retrain their models against predictions about user content to improve service delivery. Analogously, such methods might improve analytical use-cases for knowledge graphs by restructuring graphs to reflect domain knowledge, enhancing how analysts discover information as well as how they work together. However, there remain technical gaps, owing to differences between content/product use-cases and analytical use-cases, that make these approaches more difficult to implement in the latter than the former. Notably, streaming platforms and eCommerce sites structure their user interfaces to capitalize on collecting user data. In contrast, analytical applications are structured to facilitate information visualization and synthesis, which can make collecting user data more difficult and obscure how to use the data that one can collect.
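
To make the recommender analogy concrete, the sketch below shows the kind of implicit-feedback signal such systems derive from interaction logs. It is a minimal, illustrative example only; the field names ('user', 'item', 'action') and the action weights are assumptions, not a schema used by any particular platform.

```python
# Minimal sketch (illustrative only): deriving item "interest" scores from
# interaction logs, the kind of implicit feedback recommender systems use.
# Field names and weights below are hypothetical assumptions.
from collections import Counter, defaultdict

# Example interaction log: which user acted on which content item.
events = [
    {"user": "a1", "item": "report-14", "action": "click"},
    {"user": "a1", "item": "report-27", "action": "dwell"},
    {"user": "b2", "item": "report-14", "action": "click"},
]

# Weight actions differently; these weights are assumptions, not tuned values.
weights = {"click": 1.0, "dwell": 2.0}

interest = defaultdict(Counter)
for e in events:
    interest[e["user"]][e["item"]] += weights.get(e["action"], 0.5)

# Items a user engaged with most become candidate links or re-ranked nodes
# in an underlying knowledge structure.
for user, items in interest.items():
    print(user, items.most_common(3))
```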

Objective

2022 ARLIS RISC interns will develop a series of examples and libraries that fill gaps in the ability to collect user data from analytical applications, for the purpose of learning how to better structure underlying data. Interns will utilize existing open-source projects for user logging in analytical settings, such as Apache Flagon UserALE.js, and analytical libraries used to process this data, such as Apache Flagon Distill (see docs). Other sources of user data, such as queries and filter usage, will also be explored. Interns will also have access to relevant analytical use cases and data, such as the AIID. Additionally, open-source dashboards like Apache Superset will provide working examples of analytical tools from which to collect user data. Interns will use these resources to bootstrap development of resources that collect data from analytical tool features reflecting user domain knowledge, which may in turn supplement structured knowledge. Finally, interns may perform trade studies of pertinent literature on Recommender Systems and Active Learning, and develop behavioral visualization examples that can be stand-alone, organized into a workflow (e.g., dashboard), or establish how stand-alone visualizations might be used in research/analytic workflows for policy-related decisions. Each example should establish how it supports the analytical needs described above. Interns are encouraged to use public, robust, open-source resources to their fullest extent, and to explore multiple languages and frameworks in implementing examples.

In order to keep development centered on timely challenges, Project GA will focus on a Responsible AI use-case, using public data from the Artificial Intelligence Incident Database (AIID). The AIID contains a corpus of news reports regarding incidents involving AI that result in societal harm, broadly defined. This is a timely challenge for policymakers in understanding how AI should be governed within supply chains, utilities, media, and governments themselves.
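
As a starting point for working with such logs, the following is a minimal sketch of aggregating raw UserALE.js-style interaction records to surface the UI elements analysts engage with most. The field names ('type', 'target', 'clientTime') reflect the general UserALE.js log schema but should be verified against the deployed version, and 'userale_logs.json' is a hypothetical newline-delimited export (e.g., pulled from Elasticsearch).

```python
# Minimal sketch: aggregating UserALE.js-style interaction logs to see which
# UI elements analysts interact with most. Field names follow the UserALE.js
# schema as commonly documented but should be checked against the deployment;
# 'userale_logs.json' is a hypothetical export file.
import json
from collections import Counter

with open("userale_logs.json") as f:
    logs = [json.loads(line) for line in f if line.strip()]

# Count interactions per (event type, target element) pair.
activity = Counter((log.get("type"), log.get("target")) for log in logs)

# Heavily used elements are candidate signals for which entities or filters
# reflect analyst domain interest.
for (event_type, target), count in activity.most_common(10):
    print(f"{count:5d}  {event_type}  {target}")
```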

Approach

2022 ARLIS RISC students will work with existing open-source libraries to develop examples of how to harvest, from analytical applications, user data that might be used to structure underlying information. These libraries will be immediately valuable to ongoing ARLIS work on knowledge management for AI incidents but, ideally, generalizable to other related use-cases. This work includes a growing corpus of news reports characterizing AI Incidents across the globe (see AIID). To bootstrap and expedite experimentation, Project GA will utilize the existing open-source behavioral logging software mentioned above. ARLIS RISC Interns will:

  • Utilize open-source utilities
  • Use existing open-source data related to Artificial Intelligence Incidents
  • Experiment with importing AIID into Graph Databases to serve client logging (see the sketch after this list)
  • Utilize existing open-source analytical visualizations to understand contextually-relevant user behavior
  • Where available, use existing knowledge graphs in the public domain for requirements development
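
The graph-import bullet above could begin with something like the following sketch, which loads AIID-style incident records into Neo4j via the official Python driver. The record fields ('incident_id', 'title', 'entities'), the connection details, and the node/relationship labels are hypothetical; the real schema should come from the public AIID dataset.

```python
# Minimal sketch (assumptions labeled): importing AIID-style incident records
# into Neo4j with the official Python driver. Record fields, credentials, and
# labels below are hypothetical placeholders.
from neo4j import GraphDatabase

incidents = [
    {"incident_id": 1, "title": "Example AI incident", "entities": ["Org A", "Org B"]},
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_incident(tx, rec):
    # Create the incident node, then link each involved entity to it.
    tx.run(
        "MERGE (i:Incident {id: $id}) SET i.title = $title",
        id=rec["incident_id"], title=rec["title"],
    )
    for name in rec["entities"]:
        tx.run(
            "MERGE (e:Entity {name: $name}) "
            "WITH e MATCH (i:Incident {id: $id}) MERGE (e)-[:INVOLVED_IN]->(i)",
            name=name, id=rec["incident_id"],
        )

with driver.session() as session:
    for rec in incidents:
        # Older driver versions use session.write_transaction instead.
        session.execute_write(load_incident, rec)
driver.close()
```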

Assumptions:

  1. 2022 ARLIS RISC interns will use existing FOSS libraries (e.g., Apache Flagon UserALE.js, Apache Flagon Distill, Apache Superset)
  2. 2022 ARLIS RISC interns will have direct access to the AIID data through public repositories
  3. 2022 ARLIS RISC interns will have access to a working taxonomy for AI Incidents (i.e., CSET).

Tasks:

...

| Task | Description | Accountable Manager |
| --- | --- | --- |
| name | descr. | githubId |
| Set Up Prototype Environment | get Superset, ELK, UserALE.js functional for development | |
| Extract SVG metadata for logs | extract SVG data from map into a custom or existing log (see sketch below) | |
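
For the SVG metadata task, the sketch below illustrates one way to pull attributes off SVG elements (e.g., a map visualization exported from the DOM) and package them as log-ready records using only the Python standard library. The set of attributes kept is an illustrative choice, not a requirement.

```python
# Minimal sketch: extract selected attributes from SVG elements and package
# them as log-ready records. Attribute names kept here are illustrative.
import json
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
KEEP = {"id", "class", "d", "transform", "fill"}

def svg_metadata(svg_text):
    """Return one log-ready dict per SVG path/group/shape element."""
    root = ET.fromstring(svg_text)
    records = []
    for elem in root.iter():
        tag = elem.tag.replace(SVG_NS, "")
        if tag in {"path", "g", "circle", "rect"}:
            attrs = {k: v for k, v in elem.attrib.items() if k in KEEP}
            if attrs:
                records.append({"element": tag, "attributes": attrs})
    return records

svg = '<svg xmlns="http://www.w3.org/2000/svg"><path id="county-24" d="M0 0 L10 10" fill="#ccc"/></svg>'
print(json.dumps(svg_metadata(svg), indent=2))
```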

Milestones:

| Task | Description | Entrance | Exit | Date |
| --- | --- | --- | --- | --- |

References:

  1. LineUp Example
  2. [Plotly Dash Example](https://github.com/sindresorhus/awesome#readme)
  3. [Superset Example](https://github.com/UMD-ARLIS/superset)
  4. SVG attributes