Despite significant progress in deep reinforcement learning across a range of environments, there are still limited tools for understanding why agents make decisions. In particular, we consider how certain actions enable an agent to collect rewards or achieve its goals. Understanding this temporal context for actions is critical to explaining an agent's choices. To date, however, little research has explored such explanations, and the approaches that do exist depend on domain knowledge. We address this by developing three novel types of local temporal explanations, two of which require no domain knowledge, and two novel metrics to evaluate agent skills. In a comprehensive user survey comparing our explanations against two state-of-the-art local non-temporal explanations for Atari environments, users preferred our explanations 80.7% of the time.
The video below shows an example contrastive question from the user survey, using an observation from Breakout, our novel Plan explanation, and, on the right, a perturbation-based saliency map. All observations and explanations used in the user survey are contained in user-survey, along with the survey results and analysis.
contrastive-18.mp4
Click on the following dropdowns to see more examples with all the evaluated explanation mechanisms (Dataset Similarity Explanation, Skill Explanation, Plan Explanation, Grad-CAM and Perturbation-based Saliency Map).
Example observation for Breakout
Dataset Similarity Explanation
dataset-similarity-explanation.mp4
Skill Explanation
skill-explanation.mp4
Plan Explanation
plan-explanation.mp4
Example observation for Space Invaders
Dataset Similarity Explanation
dataset-similarity-explanation.mp4
Skill Explanation
skill-explanation.mp4
Plan Explanation
plan-explanation.mp4
Example observation for Seaquest
Dataset Similarity Explanation
dataset-similarity-explanation.mp4
Skill Explanation
skill-explanation.mp4
Plan Explanation
plan-explanation.mp4
Figure 4 in the paper presents the user ratings for each explanation mechanism across four different questions.
Figure 5 in the paper presents a heatmap of user preferences for each question and each pair of explanation mechanisms. Each grid element gives the percentage of times the row explanation mechanism was preferred over the column explanation mechanism.
All observations and explanations shown to users are provided in user-survey, along with the raw survey data and the analysis notebook.
Python requirements are listed in requirements.txt and can be installed with `pip install -r requirements.txt`. Additionally, using the project may require installing `temporal_explanations_4_drl` with `pip install -e .` in the root directory; no PyPI package currently exists.
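A quick sanity check for the editable install (a minimal sketch, assuming the package is importable as `temporal_explanations_4_drl` from the repository root):

```python
# Verify the editable install; assumes the package is importable as
# `temporal_explanations_4_drl` after `pip install -e .`.
import temporal_explanations_4_drl

print(temporal_explanations_4_drl.__file__)  # should point into this repository
```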
To understand the project structure, we have outlined the purpose of the most important files.
`temporal_explanations_4_drl/explain.py`
- Explanation code for all of our novel explanations, code to save the explanations with the relevant observation (both individually and for comparison), along with implementations of Grad-CAM and Perturbation-based Saliency Maps (a minimal sketch of the perturbation-based technique follows this list).

`temporal_explanations_4_drl/skill.py`
- Skill instance class and implementations of the skill alignment and distribution metrics.

`temporal_explanations_4_drl/plan.py`
- Plan class with methods for computing several metrics across all skills and for each skill individually.

`temporal_explanations_4_drl/graying_the_black_box.py`
- Implementation of Zahavy et al., 2016, "Graying the black box: Understanding DQNs".

`datasets/annotate-domain-knowledge.py`
- A command-line Python script to load pre-defined skills for a set of episodes and provide text-based explanations of the purpose of each skill.

`datasets/hand-label-skills.py`
- A command-line Python script to hand-label skills for individual episodes; each observation can be assigned an individual skill number between 0 and 9.

`datasets/generate_datasets.py`
- A Python script to generate datasets for several environments, with options for the size, agent type, etc.

`datasets/discover_skills.py`
- A Python script that uses pre-generated datasets to discover agent skills with the algorithm proposed by Zahavy et al., 2016.
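For reference, below is a minimal, self-contained sketch of a perturbation-based saliency map in the style commonly used for Atari agents (Greydanus et al., 2018, "Visualizing and Understanding Atari Agents"). It is illustrative only and not the repository's API: the `perturbation_saliency` function, the `policy` callable, and all shapes and parameters here are assumptions.

```python
# Minimal sketch of a perturbation-based saliency map (Greydanus et al., 2018
# style). Illustrative only -- `perturbation_saliency` and `policy` are
# assumptions, not the API of temporal_explanations_4_drl/explain.py.
import numpy as np
from scipy.ndimage import gaussian_filter


def perturbation_saliency(obs, policy, sigma=5.0, stride=5):
    """Score each region by how much blurring it changes the policy's output.

    obs:    (H, W) greyscale observation with float values in [0, 1]
    policy: callable mapping an observation to a vector of action scores
    """
    blurred = gaussian_filter(obs, sigma=3.0)  # information-removed copy
    base = policy(obs)
    saliency = np.zeros_like(obs)
    for i in range(0, obs.shape[0], stride):
        for j in range(0, obs.shape[1], stride):
            # A Gaussian mask centred at (i, j) interpolates the observation
            # towards its blurred copy, removing information in that region.
            mask = np.zeros_like(obs)
            mask[i, j] = 1.0
            mask = gaussian_filter(mask, sigma=sigma)
            mask /= mask.max()
            perturbed = obs * (1.0 - mask) + blurred * mask
            # Saliency = squared change in the policy output under perturbation.
            saliency[i, j] = 0.5 * np.sum((policy(perturbed) - base) ** 2)
    return saliency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.random((84, 84))  # stand-in for an Atari frame
    policy = lambda o: np.array([o.mean(), o.std()])  # stand-in for a DQN head
    print(perturbation_saliency(obs, policy).shape)  # (84, 84)
```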