
Temporal Explanations for Deep Reinforcement Learning

Despite significant progress in deep reinforcement learning across a range of environments, there are still limited tools to understand why agents make decisions. In particular, we consider how certain actions enable an agent to collect rewards or achieve its goals: understanding this temporal context for actions is critical to explaining an agent's choices. To date, however, little research has explored such explanations, and the approaches that do exist depend on domain knowledge. We address this by developing three novel types of local temporal explanations, two of which do not require domain knowledge, along with two novel metrics to evaluate agent skills. We conduct a comprehensive user survey of our explanations against two state-of-the-art local non-temporal explanations for Atari environments and find that users prefer our explanations over the state-of-the-art ones 80.7% of the time.

Example Explanations

The video below is an example contrastive question from the user survey, showing an observation from Breakout with our novel Plan explanation and, on the right, a perturbation-based saliency map. All observations / explanations used in the user survey are contained in user-survey along with the survey results and analysis.

contrastive-18.mp4

Click on the following dropdowns to see more examples with all of the evaluated explanation mechanisms (Dataset Similarity Explanation, Skill Explanation, Plan Explanation, Grad-CAM and Perturbation-based Saliency Maps).

Breakout
  • Example observation: Breakout observation
  • Dataset Similarity Explanation: dataset-similarity-explanation.mp4
  • Skill Explanation: skill-explanation.mp4
  • Plan Explanation: plan-explanation.mp4
  • Grad-CAM Explanation: Grad-CAM explanation
  • Perturbation-based Saliency Maps: Perturbation-based Saliency map

Space Invaders
  • Example observation: Space Invaders observation
  • Dataset Similarity Explanation: dataset-similarity-explanation.mp4
  • Skill Explanation: skill-explanation.mp4
  • Plan Explanation: plan-explanation.mp4
  • Grad-CAM Explanation: Grad-CAM explanation
  • Perturbation-based Saliency Maps: Perturbation-based Saliency map

Seaquest
  • Example observation: Seaquest observation
  • Dataset Similarity Explanation: dataset-similarity-explanation.mp4
  • Skill Explanation: skill-explanation.mp4
  • Plan Explanation: plan-explanation.mp4
  • Grad-CAM Explanation: Grad-CAM explanation
  • Perturbation-based Saliency Maps: Perturbation-based Saliency map

User Survey Results

Figure 4 in the paper presents the user ratings for each explanation mechanism across four different questions.

User rating

Figure 5 in the paper presents a heatmap of user preferences for each question and between each pair of explanation mechanisms. Each grid element gives the percentage of comparisons in which the row explanation mechanism was preferred over the column explanation mechanism.

Comparative Rating

All observation explanations shown to the users are provided in user-survey with the raw survey data and analysis notebook.
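
As a toy illustration of how such a pairwise preference matrix can be computed, the sketch below tallies hypothetical head-to-head survey answers; the data layout and names are assumptions for illustration only, not the repository's actual analysis code (see the analysis notebook in user-survey for that).

```python
import numpy as np

mechanisms = ["Dataset Similarity", "Skill", "Plan", "Grad-CAM", "Saliency"]
# Hypothetical answers: (preferred, other) mechanism indices for each comparison.
answers = [(2, 4), (2, 3), (0, 4), (2, 4), (3, 0), (2, 0)]

wins = np.zeros((len(mechanisms), len(mechanisms)))
for preferred, other in answers:
    wins[preferred, other] += 1

# Cell (row, col): percentage of row-vs-col comparisons won by the row mechanism.
totals = wins + wins.T
with np.errstate(invalid="ignore"):
    preference = 100 * wins / totals  # NaN where a pair was never compared
```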

Code

Python requirements can be found in requirements.txt and installed with pip install -r requirements.txt. Additionally, using the project may require installing temporal_explanations_4_drl with pip install -e . in the project root; no PyPI package currently exists.

To understand the project structure, we have outlined the purpose of the most important files.

  • temporal_explanations_4_drl/explain.py - Explanation code for all of our novel explanations, code to save the explanations with the relevant observations (both individually and side by side for comparison), along with implementations of Grad-CAM and Perturbation-based Saliency Maps (a sketch of the perturbation-saliency idea appears after this list).
  • temporal_explanations_4_drl/skill.py - Skill instance class and implementations of the skill alignment and skill distribution metrics.
  • temporal_explanations_4_drl/plan.py - Plan class with methods for computing several metrics across all skills and for each skill individually.
  • temporal_explanations_4_drl/graying_the_black_box.py - Implementation of Zahavy et al., 2016 "Graying the black box: Understanding DQNs"
  • datasets/annotate-domain-knowledge.py - A command-line Python script to load pre-defined skills for a set of episodes and provide text-based explanations of each skill's purpose.
  • datasets/hand-label-skills.py - A command-line Python script to hand-label skills for individual episodes; each observation can be assigned a skill number between 0 and 9.
  • datasets/generate_datasets.py - A Python script to generate datasets for several environments, with options for the dataset size, agent type, etc.
  • datasets/discover_skills.py - A Python script that uses pre-generated datasets to discover agent skills with the algorithm proposed by Zahavy et al., 2016 (a simplified sketch of the idea is shown below).
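
To make the perturbation-based saliency maps implemented in explain.py more concrete, here is a minimal, self-contained sketch of the underlying idea (Greydanus et al., 2018): blur each region of the observation and score it by how much the policy's output changes. The policy callable and all names below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perturbation_saliency(policy, obs, stride=4, sigma=5):
    """Score each location by how much blurring around it changes the policy.

    `policy` is assumed to map an observation (an (H, W[, C]) float array)
    to a vector of action probabilities.
    """
    # Fully blurred copy of the frame (blur the spatial dimensions only).
    spatial_sigma = (sigma, sigma) + (0,) * (obs.ndim - 2)
    blurred = gaussian_filter(obs, sigma=spatial_sigma)
    base = policy(obs)
    h, w = obs.shape[:2]
    saliency = np.zeros((h, w))
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            # A Gaussian mask centred at (i, j) interpolates between the
            # original and blurred frames.
            mask = np.zeros((h, w))
            mask[i, j] = 1.0
            mask = gaussian_filter(mask, sigma=sigma)
            mask /= mask.max()
            if obs.ndim == 3:
                mask = mask[..., None]
            perturbed = obs * (1 - mask) + blurred * mask
            # Saliency: squared change in the policy's output.
            saliency[i, j] = 0.5 * float(np.sum((policy(perturbed) - base) ** 2))
    return saliency
```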
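Similarly, the skill discovery performed by graying_the_black_box.py and discover_skills.py can be pictured as clustering the agent's hidden-layer activations so that contiguous runs of the same cluster form skills. The sketch below is a deliberately simplified stand-in (k-means on dummy embeddings); the actual algorithm from Zahavy et al., 2016 is more involved (t-SNE with spatio-temporal clustering), and all names here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_skills(embeddings, n_skills=8, seed=0):
    """Cluster per-timestep embeddings and segment the trajectory into skills."""
    labels = KMeans(n_clusters=n_skills, random_state=seed, n_init=10).fit_predict(embeddings)
    # A new "skill" starts wherever the cluster label changes.
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    segments = np.split(np.arange(len(labels)), boundaries)
    return labels, segments  # per-step labels and per-skill index ranges

# Usage on dummy data: 1,000 timesteps of 512-dimensional hidden activations.
rng = np.random.default_rng(0)
labels, segments = discover_skills(rng.normal(size=(1000, 512)))
```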
