Introduce interactive policies to gather data from a user #776

michalzajac-ml · 2023-09-06T19:46:14Z

Description

This PR introduces interactive policies that query the user for actions, as requested in #701.
Such policies can be used, e.g., in behavioral cloning or DAgger.
An example showing the use for Atari is included.
Acknowledgement: tests were heavily based on the ones from #768 by @jas-ho.

Testing

pytest tests/policies/test_interactive.py to run unit tests.
python examples/train_dagger_atari_interactive_policy.py to run the interactive demo.


tests/policies/test_interactive.py

src/imitation/policies/interactive.py

AdamGleave · 2023-09-07T02:44:18Z

src/imitation/policies/interactive.py

+        assert isinstance(action_space, gym.spaces.Discrete)
+        assert len(action_names) == len(action_keys) == action_space.n
+        # Names and keys should be unique.
+        assert len(set(action_names)) == len(set(action_keys)) == action_space.n


Why do we have these both as sequences rather than a dictionary mapping from action key to action name (or vice-versa)? A dictionary would enforce equal lengths and uniqueness of the keys, so one would only need to check that len(the_dict) == action_space.n and that the values (action names) are unique.

I guess we would need an ordered dictionary so perhaps that's a reason against, although plain dictionaries preserve insertion order as of Python 3.7 (and already did in CPython 3.6 as an implementation detail).

Yeah, I did this because of ordering and had doubts about enforcing the OrderedDict type, but I now agree it's more elegant, so I'm changing it to an OrderedDict.
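A minimal sketch of the mapping-based validation discussed in this thread. The helper name `validate_action_keys_names` is hypothetical, and a plain integer `n_actions` stands in for `action_space.n`; this is an illustration, not the code merged in the PR.

```python
import collections
from typing import Dict


def validate_action_keys_names(
    action_keys_names: "collections.OrderedDict[str, str]",
    n_actions: int,
) -> Dict[str, int]:
    """Check a key -> name mapping and return a key -> action index mapping."""
    # A mapping already guarantees unique keys and exactly one name per key,
    # so only the size and the uniqueness of the names remain to be checked.
    if len(action_keys_names) != n_actions:
        raise ValueError("Expected exactly one key per action.")
    if len(set(action_keys_names.values())) != n_actions:
        raise ValueError("Action names must be unique.")
    # Insertion order defines the action index.
    return {key: index for index, key in enumerate(action_keys_names)}


# Hypothetical key bindings, used only for illustration.
mapping = collections.OrderedDict([("w", "UP"), ("s", "DOWN"), ("f", "FIRE")])
key_to_index = validate_action_keys_names(mapping, n_actions=3)
```

Compared with two parallel sequences, the two remaining checks cover everything the four original length and uniqueness comparisons did.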

src/imitation/policies/interactive.py

tests/policies/test_interactive.py

AdamGleave · 2023-09-07T03:00:39Z

examples/train_dagger_atari_interactive_policy.py

+    env.seed(0)
+
+    action_names = env.envs[0].get_action_meanings()
+    names_to_keys = {


There's only a small finite number of legal actions in Atari, so we could define a more comprehensive version of these in a constant somewhere (or even subclass ImageObsDiscreteInteractivePolicy to handle this directly) rather than it having to live in an example.

Done, thanks for suggestion!
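A rough sketch of what such a shared constant could look like. The names `ATARI_NAME_TO_KEY` and `keys_for_env` and the specific key bindings are assumptions made for illustration; only the 18 action names come from the Atari action set, and the constant actually added in the PR may differ.

```python
import collections
from typing import List

# Hypothetical key bindings for the 18 legal Atari actions.
ATARI_NAME_TO_KEY = {
    "NOOP": "1",
    "FIRE": "2",
    "UP": "w",
    "RIGHT": "d",
    "LEFT": "a",
    "DOWN": "x",
    "UPRIGHT": "e",
    "UPLEFT": "q",
    "DOWNRIGHT": "c",
    "DOWNLEFT": "z",
    "UPFIRE": "t",
    "RIGHTFIRE": "h",
    "LEFTFIRE": "f",
    "DOWNFIRE": "b",
    "UPRIGHTFIRE": "y",
    "UPLEFTFIRE": "r",
    "DOWNRIGHTFIRE": "n",
    "DOWNLEFTFIRE": "v",
}


def keys_for_env(action_meanings: List[str]) -> "collections.OrderedDict[str, str]":
    """Build an ordered key -> name mapping following the env's action order."""
    return collections.OrderedDict(
        (ATARI_NAME_TO_KEY[name], name) for name in action_meanings
    )


# Example list of action meanings, as an environment exposing a subset might report.
example_meanings = ["NOOP", "FIRE", "RIGHT", "LEFT"]
example_keys_names = keys_for_env(example_meanings)
```

With the constant defined once, an example script only needs to pass the result of `get_action_meanings()` through the helper instead of spelling out its own bindings.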

examples/train_dagger_atari_interactive_policy.py

src/imitation/policies/interactive.py

Co-authored-by: Jason Hoelscher-Obermaier <jas-ho@users.noreply.github.com>

michalzajac-ml · 2023-09-07T15:43:07Z

tests/test_examples.py

@@ -30,9 +30,12 @@ def _paths_to_strs(x: Iterable[pathlib.Path]) -> Sequence[str]:
 EXAMPLES_DIR = THIS_DIR / ".." / "examples"
 TUTORIALS_DIR = THIS_DIR / ".." / "docs" / "tutorials"

-SH_PATHS = _paths_to_strs(EXAMPLES_DIR.glob("*.sh"))
+EXCLUDED_EXAMPLE_FILES = ["train_dagger_atari_interactive_policy.py"]


One alternative would be to mock parts of the example script. However, that does not seem very useful, since our unit tests already check what the mocked version would check.
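For reference, a self-contained sketch of how an exclusion list like this can filter globbed example files. Placing the filter inside a `paths_to_strs` helper and the file name `quickstart.py` are assumptions for illustration; the actual change in tests/test_examples.py may apply the filter elsewhere.

```python
import pathlib
import tempfile
from typing import Iterable, Sequence

EXCLUDED_EXAMPLE_FILES = ["train_dagger_atari_interactive_policy.py"]


def paths_to_strs(paths: Iterable[pathlib.Path]) -> Sequence[str]:
    """Convert paths to sorted strings, skipping excluded file names."""
    return sorted(str(p) for p in paths if p.name not in EXCLUDED_EXAMPLE_FILES)


# Build a throwaway examples directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    examples_dir = pathlib.Path(tmp)
    for name in ["quickstart.py", "train_dagger_atari_interactive_policy.py"]:
        (examples_dir / name).touch()
    py_paths = paths_to_strs(examples_dir.glob("*.py"))
    collected_names = [pathlib.Path(p).name for p in py_paths]
```

Only the non-interactive script is collected, so the test suite never blocks waiting for keyboard input.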

AdamGleave

Thanks for this PR! Looks nearly ready -- only major suggestion is to add some more tests (covering AtariInteractivePolicy), others are pretty minor suggestions.

tests/test_examples.py

AdamGleave · 2023-09-07T22:28:29Z

examples/train_dagger_atari_interactive_policy.py

@@ -0,0 +1,41 @@
+"""Training DAgger with an interactive policy that queries the user for actions.


If a matplotlib GUI backend isn't installed, it'll fail with a somewhat cryptic error:

fig.show()

and indeed no figure displays.

Installing the relevant backend seems out-of-scope for this project. But might want to check if the backend is interactive (I think plt.isinteractive() checks for this) and warn if not with link to relevant docs e.g. https://matplotlib.org/stable/users/explain/backends.html

Could you please check again what message you are getting and if this is an error or a warning? In my case, when I set a non-GUI backend like Agg, I get a warning like this (and the execution continues):

UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure. fig.show()

Actually, plt.isinteractive() checks for interactive mode; GUI backends like MacOSX can be in either mode (and we do not need to turn interactive mode on for it to work). What we would actually like is to check whether the backend is "GUI" or "non-GUI", but from a quick search there does not seem to be a nice way to do this other than checking against a whitelist of backends. Given that, and the fact that the message quoted above is not that bad, I'd keep this as-is for now. Alternatively, we could raise an error or assert instead of warning, but again this would require a whitelist of backends. WDYT?

For additional context: on my laptop, the example runs nicely although the mode is not interactive by default.
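To make the whitelist option concrete, here is a minimal sketch. The set `GUI_BACKENDS` and the helpers `is_gui_backend` and `backend_warning` are hypothetical, the list of names is an assumption that is not exhaustive, and nothing like this was added in the PR.

```python
from typing import FrozenSet

# Assumed, non-exhaustive whitelist of GUI backend names (lower-cased).
GUI_BACKENDS: FrozenSet[str] = frozenset(
    {"qtagg", "qt5agg", "tkagg", "gtk3agg", "gtk4agg", "wxagg", "macosx"}
)

BACKENDS_DOC_URL = "https://matplotlib.org/stable/users/explain/backends.html"


def is_gui_backend(backend_name: str) -> bool:
    """Return True if the backend name is on the assumed GUI whitelist."""
    return backend_name.lower() in GUI_BACKENDS


def backend_warning(backend_name: str) -> str:
    """Return a warning message for non-GUI backends, or an empty string."""
    if is_gui_backend(backend_name):
        return ""
    return (
        f"Matplotlib backend {backend_name!r} does not look like a GUI backend, "
        f"so observations may not be displayed. See {BACKENDS_DOC_URL}"
    )
```

In a script, the name would come from `matplotlib.get_backend()`. The drawback raised in the thread still applies: any backend missing from the whitelist is treated as non-GUI, so the list has to be maintained by hand.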

src/imitation/policies/interactive.py

AdamGleave · 2023-09-07T22:32:13Z

src/imitation/policies/interactive.py

-        self.action_key_to_index = {k: i for i, k in enumerate(action_keys)}
+        self.action_keys_names = action_keys_names
+        self.action_key_to_index = {
+            k: i for i, k in enumerate(action_keys_names.keys())


Iterating over a dict gives you the keys by default (you can leave as-is if you want to be explicit about it)

Suggested change

k: i for i, k in enumerate(action_keys_names.keys())

k: i for i, k in enumerate(action_keys_names)

Yeah, in this case I slightly prefer to keep it.
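A quick check that both spellings build the same mapping; the key bindings here are made up for illustration.

```python
import collections

# Hypothetical key -> name mapping.
action_keys_names = collections.OrderedDict([("w", "UP"), ("s", "DOWN")])

# Explicit: iterate over the keys view.
explicit = {k: i for i, k in enumerate(action_keys_names.keys())}
# Implicit: iterating over a dict yields its keys in insertion order.
implicit = {k: i for i, k in enumerate(action_keys_names)}
```

The choice is purely stylistic, which is why keeping `.keys()` for explicitness is fine.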

src/imitation/policies/interactive.py

AdamGleave · 2023-09-07T22:34:40Z

src/imitation/policies/interactive.py

 import abc
-from typing import Optional, List
+import collections
+import typing


Our style guide allows importing types directly from typing (i.e. from typing import Optional is permissible) although it's not obligatory -- fine to use this style if you prefer. https://google.github.io/styleguide/pyguide.html#2241-exemptions

Thanks, good to know!

src/imitation/policies/interactive.py

tests/policies/test_interactive.py

AdamGleave · 2023-09-07T22:44:59Z

tests/policies/test_interactive.py

+    with mock.patch("builtins.input", mock_input_invalid_then_valid()):
+        interactive_policy.predict(obs)
+        stdout = capsys.readouterr().out
+        assert "Your choice" in stdout and "Invalid" in stdout


Tests for DiscreteInteractivePolicy look great. We're not testing AtariInteractivePolicy at all, though. It's pretty simple, granted, but it might still be worth testing, even if just with a simple smoke test (it runs; if we feed in a key corresponding to "FIRE" we get the correct action back; etc.).
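A sketch of the kind of smoke test suggested here, using the same `mock.patch("builtins.input", ...)` pattern as the existing tests. `ToyInteractivePolicy` is a hypothetical stand-in written for this example, since the real AtariInteractivePolicy constructor and `predict` signature are not shown in this thread; the key bindings are also assumptions.

```python
from unittest import mock


class ToyInteractivePolicy:
    """Hypothetical stand-in that maps a typed key to an action index."""

    def __init__(self, key_to_name):
        self.names = list(key_to_name.values())
        self.key_to_index = {k: i for i, k in enumerate(key_to_name)}

    def predict(self):
        # Keep asking until the user types a known key.
        while True:
            key = input("Your choice: ")
            if key in self.key_to_index:
                return self.key_to_index[key]
            print("Invalid key, please try again.")


def test_fire_key_returns_fire_action():
    policy = ToyInteractivePolicy({"1": "NOOP", "2": "FIRE", "d": "RIGHT"})
    # The first answer is invalid; the second selects FIRE.
    with mock.patch("builtins.input", side_effect=["?", "2"]):
        action = policy.predict()
    assert policy.names[action] == "FIRE"
    return action


fire_action = test_fire_key_returns_fire_action()
```

A test against the real class would follow the same shape: build the policy from an Atari environment, feed the key bound to "FIRE", and compare the returned action with the index of "FIRE" in the environment's action meanings.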

Co-authored-by: Adam Gleave <adam@gleave.me>

AdamGleave

LGTM

…tibleAI#776)

* Pin huggingface_sb3 version.
* Properly specify the compatible seals version so it does not auto-upgrade to 0.2.
* Make random_mdp test deterministic by seeding the environment.
* WIP: Introduce interactive policies to gather data from a user
* Addressing remarks from review
* fixes
* fix types
* formatting
* Dummy commit to acknowledge co-authorship. Co-authored-by: Jason Hoelscher-Obermaier <jas-ho@users.noreply.github.com>
* Exclude interactive example from running during tests
* formatting
* Apply suggestions from code review Co-authored-by: Adam Gleave <adam@gleave.me>
* Adressing further suggestions from review
* formatting
* formatting

---------

Co-authored-by: Maximilian Ernestus <maximilian@ernestus.de>
Co-authored-by: Jason Hoelscher-Obermaier <jas-ho@users.noreply.github.com>
Co-authored-by: Adam Gleave <adam@gleave.me>

ernestum and others added 4 commits September 5, 2023 21:32

Pin huggingface_sb3 version.

b8d1616

Properly specify the compatible seals version so it does not auto-upgrade to 0.2.

09c5f2f

Make random_mdp test deterministic by seeding the environment.

4872ceb

WIP: Introduce interactive policies to gather data from a user

b0efd61

michalzajac-ml commented Sep 6, 2023


michalzajac-ml requested a review from AdamGleave September 6, 2023 19:50

AdamGleave reviewed Sep 7, 2023


michalzajac-ml added 2 commits September 7, 2023 12:13

Addressing remarks from review

6a9389e

fixes

8cb822a

michalzajac-ml commented Sep 7, 2023


michalzajac-ml and others added 3 commits September 7, 2023 13:03

fix types

3a1d10f

formatting

f63bf5e

Dummy commit to acknowledge co-authorship.

79749dd

Co-authored-by: Jason Hoelscher-Obermaier <jas-ho@users.noreply.github.com>

michalzajac-ml changed the title ~~WIP: Introduce interactive policies to gather data from a user~~ Introduce interactive policies to gather data from a user Sep 7, 2023

Exclude interactive example from running during tests

f6ebbc2

michalzajac-ml commented Sep 7, 2023


formatting

a3864c3

AdamGleave requested changes Sep 7, 2023


Base automatically changed from dependency_fixes to master September 7, 2023 22:56

AdamGleave and others added 5 commits September 7, 2023 16:34

Merge branch 'master' into 701-interactive-data

d0e0ecd

Apply suggestions from code review

10936f7

Co-authored-by: Adam Gleave <adam@gleave.me>

Adressing further suggestions from review

73a33da

formatting

b94faf4

formatting

9d93854

AdamGleave approved these changes Sep 8, 2023


AdamGleave merged commit f6a4888 into master Sep 8, 2023

AdamGleave deleted the 701-interactive-data branch September 8, 2023 16:20

AdamGleave mentioned this pull request Sep 9, 2023

Use data acquired by users #701

Open

ZiyueWang25 mentioned this pull request Oct 3, 2023

Add rgb observation to obs for interactive policy prediction #795

Open

3 tasks
