Commit 3cd62f6

Miscellaneous fixes (#448)

dniku authored Aug 4, 2020
1 parent cdfbaa6 commit 3cd62f6
Showing 18 changed files with 41 additions and 39 deletions.
2 changes: 1 addition & 1 deletion week01_intro/project_starter_evolution_strategies.ipynb
@@ -72,7 +72,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tips on atari games\n",
"### Tips on Atari games\n",
"* There's all the pre-processing and tuning done for you in the code below\n",
" * Images rescaled to 42x42 to speed up computation\n",
" * We use last 4 frames as observations to account for ball velocity\n",
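Aside: the "last 4 frames" tip above is usually implemented as a small observation wrapper. A minimal sketch, assuming frames are already preprocessed to a fixed `H x W x C` shape scaled to [0, 1]; the class name and deque-based buffer are illustrative, not the course's actual helper:

```python
import collections

import gym
import numpy as np


class FrameStack(gym.ObservationWrapper):
    """Return the last `n_frames` observations stacked along the channel axis."""

    def __init__(self, env, n_frames=4):
        super().__init__(env)
        self.frames = collections.deque(maxlen=n_frames)
        h, w, c = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0.0, 1.0, (h, w, c * n_frames), dtype=np.float32)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)          # pre-fill the buffer with the first frame
        return np.concatenate(self.frames, axis=-1)

    def observation(self, obs):
        self.frames.append(obs)              # called by step() on every new frame
        return np.concatenate(self.frames, axis=-1)
```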
2 changes: 1 addition & 1 deletion week01_intro/seminar_gym_interface.ipynb
@@ -181,8 +181,8 @@
" # Draw game image on display.\n",
" plt.imshow(env.render('rgb_array'))\n",
" \n",
" display.clear_output(wait=True)\n",
" display.display(plt.gcf())\n",
" display.clear_output(wait=True)\n",
"\n",
" if done:\n",
" print(\"Well done!\")\n",
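The reordering above matters because `clear_output(wait=True)` should come after drawing the new frame, so each frame replaces the previous one without flicker. A self-contained sketch of the whole loop (the environment name is just an example):

```python
import gym
import matplotlib.pyplot as plt
from IPython import display

env = gym.make("MountainCar-v0")
obs = env.reset()

for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())  # random policy
    plt.imshow(env.render('rgb_array'))   # draw game image on the current figure
    display.display(plt.gcf())            # show it first...
    display.clear_output(wait=True)       # ...then clear, so the next frame replaces it
    if done:
        print("Well done!")
        break
```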
10 changes: 5 additions & 5 deletions week03_model_free/homework.ipynb
@@ -388,9 +388,9 @@
"Here are some of the things you can do if you feel like it:\n",
"\n",
"* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
"* Expected Value SASRSA for softmax policy __(2pts)__:\n",
"$$ \\pi(a_i|s) = softmax({Q(s,a_i) \\over \\tau}) = {e ^ {Q(s,a_i)/ \\tau} \\over {\\sum_{a_j} e ^{Q(s,a_j) / \\tau }}} $$\n",
"* Implement N-step algorithms and TD($\\lambda$): see [Sutton's book](http://incompleteideas.net/book/bookdraft2018jan1.pdf) chapter 7 and chapter 12.\n",
"* Expected Value SARSA for softmax policy __(2pts)__:\n",
"$$ \\pi(a_i \\mid s) = \\operatorname{softmax} \\left( \\left\\{ {Q(s, a_j) \\over \\tau} \\right\\}_{j=1}^n \\right)_i = {\\operatorname{exp} \\left( Q(s,a_i) / \\tau \\right) \\over {\\sum_{j} \\operatorname{exp} \\left( Q(s,a_j) / \\tau \\right)}} $$\n",
"* Implement N-step algorithms and TD($\\lambda$): see [Sutton's book](http://incompleteideas.net/book/RLbook2020.pdf) chapter 7 and chapter 12.\n",
"* Use those algorithms to train on CartPole in previous / next assignment for this week."
]
},
@@ -704,8 +704,8 @@
"### Bonus I: TD($\\lambda$) (5+ points)\n",
"\n",
"There's a number of advanced algorithms you can find in week 3 materials (Silver lecture II and/or reading about eligibility traces). One such algorithm is TD(lambda), which is based on the idea of eligibility traces. You can also view it as a combination of N-step updates for alll N.\n",
"* N-step temporal difference from Sutton's book - [url](http://incompleteideas.net/book/the-book-2nd.html), page 142 / chapter 7 \n",
"* Eligibility traces from Sutton's book - same url, chapter 12 / page 278\n",
"* N-step temporal difference from Sutton's book - [url](http://incompleteideas.net/book/the-book-2nd.html), Chapter 7 (page 142 in the 2020 edition)\n",
"* Eligibility traces from Sutton's book - same url, Chapter 12 (page 287)\n",
"* Blog post on eligibility traces - [url](http://pierrelucbacon.com/traces/)\n",
"\n",
"Here's a practical algorithm you can start with: [url](https://stackoverflow.com/questions/40862578/how-to-understand-watkinss-q%CE%BB-learning-algorithm-in-suttonbartos-rl-book/40892302)\n",
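For reference, the softmax policy defined above and the corresponding Expected Value SARSA update fit in a few lines; a sketch with a numerically stable softmax (function names and the default temperature are illustrative):

```python
import numpy as np


def softmax_policy(q_values, tau=1.0):
    """pi(a|s) = exp(Q(s,a)/tau) / sum_a' exp(Q(s,a')/tau), computed stably."""
    logits = np.asarray(q_values) / tau
    logits = logits - logits.max()            # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def expected_value_sarsa_target(reward, next_q_values, gamma=0.99, tau=1.0):
    """TD target r + gamma * E_{a' ~ pi}[Q(s', a')] under the softmax policy."""
    pi = softmax_policy(next_q_values, tau)
    return reward + gamma * float(np.dot(pi, next_q_values))
```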
2 changes: 1 addition & 1 deletion week04_[recap]_deep_learning/seminar_pytorch.ipynb
@@ -23,7 +23,7 @@
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import sys, os\n",
"if 'google.colab' in sys.modules and not os.path.exists('.setup_complete'):\n",
" !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/week04_%5Brecap%5D_deep_learning/notmnist.py\n",
"\n",
6 changes: 3 additions & 3 deletions week04_approx_rl/homework_lasagne.ipynb
@@ -39,7 +39,7 @@
"source": [
"# Processing game image (2 pts)\n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape\n",
@@ -271,7 +271,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create and manage a pool of atari sessions to play with\n",
"# Create and manage a pool of Atari sessions to play with\n",
"\n",
"* To make training more stable, we shall have an entire batch of game sessions each happening independent of others\n",
"* Why several parallel agents help training: http://arxiv.org/pdf/1602.01783v1.pdf\n",
@@ -627,7 +627,7 @@
"\n",
"You will need to implement loading weights from original network to target network.\n",
"\n",
"We recommend thoroughly debugging your code on simple tests before applying it in atari dqn.\n",
"We recommend thoroughly debugging your code on simple tests before applying it in Atari dqn.\n",
"\n",
"__2)__ Use pre-build functionality from [here](http://agentnet.readthedocs.io/en/master/modules/target_network.html)\n",
"\n",
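A rough sketch of the preprocessing step described above (grayscale, downscale, rescale to [0, 1]), assuming OpenCV is available; the target size and the absence of cropping are arbitrary choices here, not the notebook's reference solution:

```python
import cv2
import gym
import numpy as np


class PreprocessAtari(gym.ObservationWrapper):
    """Grayscale + downscale raw 210x160x3 Atari frames and rescale them to [0, 1]."""

    def __init__(self, env, height=64, width=64):
        super().__init__(env)
        self.height, self.width = height, width
        self.observation_space = gym.spaces.Box(
            0.0, 1.0, (height, width, 1), dtype=np.float32)

    def observation(self, obs):
        obs = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)        # drop color
        obs = cv2.resize(obs, (self.width, self.height))   # shrink the image
        return obs.astype(np.float32)[..., None] / 255.0   # add channel dim, scale
```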
3 changes: 2 additions & 1 deletion week04_approx_rl/homework_pytorch_debug.ipynb
@@ -85,7 +85,8 @@
"source": [
"import gym\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
11 changes: 6 additions & 5 deletions week04_approx_rl/homework_pytorch_main.ipynb
@@ -82,7 +82,8 @@
"source": [
"import gym\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
@@ -92,7 +93,7 @@
"### Let's play some old videogames\n",
"![img](https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/nerd.png)\n",
"\n",
"This time we're gonna apply approximate q-learning to an atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
"This time we're gonna apply approximate q-learning to an Atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
]
},
{
@@ -167,7 +168,7 @@
"metadata": {},
"outputs": [],
"source": [
"# # does not work in colab.\n",
"# # does not work in Colab.\n",
"# # make keyboard interrupt to continue\n",
"\n",
"# from gym.utils.play import play\n",
@@ -181,7 +182,7 @@
"source": [
"### Processing game image \n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape, 64 x 64\n",
@@ -335,7 +336,7 @@
"metadata": {},
"outputs": [],
"source": [
"# # does not work in colab.\n",
"# # does not work in Colab.\n",
"# # make keyboard interrupt to continue\n",
"\n",
"# from gym.utils.play import play\n",
6 changes: 3 additions & 3 deletions week04_approx_rl/homework_tf.ipynb
@@ -61,7 +61,7 @@
"### Let's play some old videogames\n",
"![img](https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/nerd.png)\n",
"\n",
"This time we're gonna apply approximate q-learning to an atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
"This time we're gonna apply approximate q-learning to an Atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
]
},
{
@@ -70,7 +70,7 @@
"source": [
"### Processing game image \n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape, 64 x 64\n",
@@ -770,7 +770,7 @@
"\n",
"To do that you should use TensorFlow functionality. \n",
"\n",
"We recommend thoroughly debugging your code on simple tests before applying it in atari dqn."
"We recommend thoroughly debugging your code on simple tests before applying it in Atari dqn."
]
},
{
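On "loading weights from original network to target network": with Keras-style models this reduces to copying weight lists. A sketch under that assumption, including the optional Polyak-averaged ("soft") variant:

```python
# Assumes `network` and `target_network` are tf.keras models with identical architectures.

def load_weights_into_target_network(network, target_network):
    """Copy the online network's weights into the target network (hard update)."""
    target_network.set_weights(network.get_weights())


def soft_update(network, target_network, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    new_weights = [tau * w + (1.0 - tau) * tw
                   for w, tw in zip(network.get_weights(), target_network.get_weights())]
    target_network.set_weights(new_weights)
```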
2 changes: 1 addition & 1 deletion week04_approx_rl/seminar_tf.ipynb
@@ -123,7 +123,7 @@
" recap: with p = epsilon pick random action, else pick action with highest Q(s,a)\n",
" \"\"\"\n",
" \n",
" q_values = network.predict(state[None])[0]\n",
" q_values = network(state[None])[0]\n",
" \n",
" <YOUR CODE>\n",
"\n",
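For reference, the rule in the docstring above ("with p = epsilon pick random action, else pick action with highest Q(s,a)") is the standard epsilon-greedy policy; a generic sketch over a vector of Q-values:

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon=0.1):
    """With probability epsilon pick a random action, otherwise the argmax of Q(s, a)."""
    n_actions = len(q_values)
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(np.argmax(q_values))           # exploit
```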
6 changes: 3 additions & 3 deletions week06_policy_based/README.md
@@ -4,7 +4,7 @@
* Our [lecture](https://yadi.sk/i/yPIPkO_f3TPsNK), [seminar(pytorch)](https://yadi.sk/i/flW8ezGk3TPsQ5), [seminar(theano)](https://yadi.sk/i/8f9NX_E73GKBkT)
* Alternative lecture by J. Schulman part 1 - [video](https://www.youtube.com/watch?v=BB-BhTn6DCM)
* Alternative lecture by J. Schulman part 2 - [video](https://www.youtube.com/watch?v=Wnl-Qh2UHGg)
* Andrej Karpathy's [post](http://karpathy.github.io/2016/05/31/rl/) on policy gradients
* Andrej Karpathy's [post](http://karpathy.github.io/2016/05/31/rl/) on policy gradients


## More materials
@@ -19,6 +19,6 @@
* Adversarial review of policy gradient - [blog](http://www.argmin.net/2018/02/20/reinforce/)


Run seminar notebook in colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/reinforce_pytorch.ipynb)
Run seminar notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/reinforce_pytorch.ipynb)

Run optional homework notebook in colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/a2c-optional.ipynb)
Run optional homework notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/a2c-optional.ipynb)
4 changes: 2 additions & 2 deletions week06_policy_based/atari_wrappers.py
@@ -133,7 +133,7 @@ def __init__(self, env):
if (isinstance(env.unwrapped, atari.AtariEnv) and
"NoFrameskip" not in env.spec.id):
raise ValueError(
"MaxBetweenFrames requires NoFrameskip in atari env id")
"MaxBetweenFrames requires NoFrameskip in Atari env id")
super(MaxBetweenFrames, self).__init__(env)
self.last_obs = None

@@ -182,7 +182,7 @@ def __init__(self, env, nskip=4):
super(SkipFrames, self).__init__(env)
if (isinstance(env.unwrapped, atari.AtariEnv) and
"NoFrameskip" not in env.spec.id):
raise ValueError("SkipFrames requires NoFrameskip in atari env id")
raise ValueError("SkipFrames requires NoFrameskip in Atari env id")
self.nskip = nskip

def step(self, action):
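For context, `MaxBetweenFrames` and `SkipFrames` insist on `NoFrameskip` env ids because they re-implement frame skipping themselves. A stripped-down sketch of the skipping logic (the repository's actual wrapper differs in details such as observation maxing):

```python
import gym


class SimpleSkipFrames(gym.Wrapper):
    """Repeat each action `nskip` times and accumulate the reward."""

    def __init__(self, env, nskip=4):
        super().__init__(env)
        self.nskip = nskip

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.nskip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break                      # stop early if the episode ended mid-skip
        return obs, total_reward, done, info
```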
2 changes: 1 addition & 1 deletion week06_policy_based/reinforce_pytorch.ipynb
@@ -45,7 +45,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A caveat: we have received reports that the following cell may crash with `NameError: name 'base' is not defined`. The [suggested workaround](https://www.coursera.org/learn/practical-rl/discussions/all/threads/N2Pw652iEemRYQ6W2GuqHg/replies/te3HpQwOQ62tx6UMDoOt2Q/comments/o08gTqelT9KPIE6npX_S3A) is to install `gym==0.14.0` and `pyglet==1.3.2`."
"A caveat: with some versions of `pyglet`, the following cell may crash with `NameError: name 'base' is not defined`. The corresponding bug report is [here](https://github.com/pyglet/pyglet/issues/134). If you see this error, try restarting the kernel."
]
},
{
2 changes: 1 addition & 1 deletion week06_policy_based/reinforce_tensorflow.ipynb
@@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A caveat: we have received reports that the following cell may crash with `NameError: name 'base' is not defined`. The [suggested workaround](https://www.coursera.org/learn/practical-rl/discussions/all/threads/N2Pw652iEemRYQ6W2GuqHg/replies/te3HpQwOQ62tx6UMDoOt2Q/comments/o08gTqelT9KPIE6npX_S3A) is to install `gym==0.14.0` and `pyglet==1.3.2`."
"A caveat: with some versions of `pyglet`, the following cell may crash with `NameError: name 'base' is not defined`. The corresponding bug report is [here](https://github.com/pyglet/pyglet/issues/134). If you see this error, try restarting the kernel."
]
},
{
6 changes: 3 additions & 3 deletions week08_pomdp/env_pool.py
@@ -1,5 +1,5 @@
"""
A thin wrapper for openAI gym environments that maintains a set of parallel games and has a method to generate
A thin wrapper for OpenAI gym environments that maintains a set of parallel games and has a method to generate
interaction sessions given agent one-step applier function.
"""

@@ -19,7 +19,7 @@ def __init__(self, agent, make_env, n_parallel_games=1):
:param n_games: Number of parallel games. One game by default.
:param max_size: Max pool size by default (if appending sessions). By default, pool is not constrained in size.
"""
# Create atari games.
# Create Atari games.
self.agent = agent
self.make_env = make_env
self.envs = [self.make_env() for _ in range(n_parallel_games)]
@@ -35,7 +35,7 @@ def __init__(self, agent, make_env, n_parallel_games=1):
self.just_ended = [False] * len(self.envs)

def interact(self, n_steps=100, verbose=False):
"""Generate interaction sessions with ataries (openAI gym atari environments)
"""Generate interaction sessions with ataries (OpenAI gym Atari environments)
Sessions will have length n_steps. Each time one of games is finished, it is immediately getting reset
and this time is recorded in is_alive_log (See returned values).
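The docstring above describes stepping several games in lockstep and resetting finished ones immediately; a stripped-down sketch of that interaction loop (the real `EnvPool` also carries agent memory between steps, so take this as an outline only):

```python
import numpy as np


def interact(envs, agent_step, observations, n_steps=100):
    """Step a list of gym envs in lockstep; `agent_step(batch_of_obs) -> batch_of_actions`."""
    action_log, reward_log, is_alive_log = [], [], []
    for _ in range(n_steps):
        actions = agent_step(np.array(observations))
        step_results = [env.step(a) for env, a in zip(envs, actions)]
        # Reset finished games immediately, as the pool's docstring describes.
        observations = [env.reset() if done else obs
                        for env, (obs, _, done, _) in zip(envs, step_results)]
        action_log.append(actions)
        reward_log.append([r for _, r, _, _ in step_results])
        is_alive_log.append([not d for _, _, d, _ in step_results])
    return observations, action_log, reward_log, is_alive_log
```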
6 changes: 3 additions & 3 deletions week08_pomdp/homework_common_part2.ipynb
@@ -31,7 +31,7 @@
"\n",
"Choose a partially-observable environment for experimentation out of [atari](https://gym.openai.com/envs#atari), [doom](https://gym.openai.com/envs#doom) or [pygame](https://gym.openai.com/envs#pygame) catalogue (if you really want to try some other pomdp, feel free to proceed at your own risk).\n",
"\n",
"Not all atari environements are bug free and these minor bugs can hurt learning performance. \n",
"Not all Atari environements are bug free and these minor bugs can hurt learning performance. \n",
"We recommend to pick one of those:\n",
"* [Assault-v0](https://gym.openai.com/envs/Assault-v0) \n",
"* [DoomDefendCenter-v0](https://gym.openai.com/envs/DoomDefendCenter-v0) (use env code from [this](https://github.com/yandexdataschool/Practical_RL/blob/master/week4/Seminar4.2_conv_agent.ipynb) notebook)\n",
@@ -65,7 +65,7 @@
" * ```from gym.wrappers import SkipWrapper```\n",
" * ```env = SkipWrapper(how_many_frames_to_skip)(your_env)``` in your make_env\n",
" \n",
" * For atari only, consider __training__ on deterministic version of environment\n",
" * For Atari only, consider __training__ on deterministic version of environment\n",
" * Works by appending Deterministic to env name: `AssaultDeterministic-v0`, `KungFuMasterDeterministic-v0`\n",
" * Expect faster training due to less variance.\n",
" * You still need to __switch back to normal env for evaluation__ (there's no leaderbord for deterministic envs)\n",
@@ -291,7 +291,7 @@
"plt.xticks(np.arange(len(game_names)), np.array(\n",
" game_names)[idxs], rotation='vertical')\n",
"plt.grid()\n",
"plt.title(\"Comparison A3C on atari games: with and without LSTM memory\")\n",
"plt.title(\"Comparison A3C on Atari games: with and without LSTM memory\")\n",
"plt.ylabel(\"Difference between A3C_LSTM and A3C_FeadForward scores\")"
]
}
4 changes: 2 additions & 2 deletions week08_pomdp/practice_pytorch.ipynb
@@ -37,7 +37,7 @@
"source": [
"### Kung-Fu, recurrent style\n",
"\n",
"In this notebook we'll once again train RL agent for for atari [KungFuMaster](https://gym.openai.com/envs/KungFuMaster-v0/), this time using recurrent neural networks.\n",
"In this notebook we'll once again train RL agent for for Atari [KungFuMaster](https://gym.openai.com/envs/KungFuMaster-v0/), this time using recurrent neural networks.\n",
"\n",
"![img](https://upload.wikimedia.org/wikipedia/en/6/66/Kung_fu_master_mame.png)"
]
@@ -137,7 +137,7 @@
"source": [
"### POMDP setting\n",
"\n",
"The atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"The Atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"\n",
"Let's design another agent that has a recurrent neural net memory to solve this. Here's a sketch.\n",
"\n",
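As a concrete picture of the "recurrent neural net memory" sketch mentioned above, a common design is a convolutional encoder feeding a GRU cell whose hidden state is carried between time steps. A minimal outline, assuming 64x64 single-channel observations; the layer sizes are arbitrary and not the notebook's reference architecture:

```python
import torch
import torch.nn as nn


class SimpleRecurrentAgent(nn.Module):
    def __init__(self, n_actions, obs_channels=1, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.rnn = nn.GRUCell(32 * 15 * 15, hidden_size)   # 15x15 feature map for 64x64 inputs
        self.logits = nn.Linear(hidden_size, n_actions)    # policy head
        self.state_value = nn.Linear(hidden_size, 1)        # value head

    def forward(self, prev_hidden, obs):
        """One time step: returns the new memory and (action logits, state value)."""
        features = self.encoder(obs)
        hidden = self.rnn(features, prev_hidden)             # memory carried across steps
        return hidden, (self.logits(hidden), self.state_value(hidden))

    def initial_state(self, batch_size):
        return torch.zeros(batch_size, self.rnn.hidden_size)
```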
2 changes: 1 addition & 1 deletion week08_pomdp/practice_tensorflow.ipynb
@@ -564,7 +564,7 @@
"source": [
"### POMDP setting\n",
"\n",
"The atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"The Atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"\n",
"Let's design another agent that has a recurrent neural net memory to solve this.\n",
"\n",
4 changes: 2 additions & 2 deletions week10_planning/seminar_MCTS.ipynb
@@ -594,7 +594,7 @@
" assert not root.is_leaf(), \\\n",
" \"We ran out of tree! Need more planning! Try growing the tree right inside the loop.\"\n",
"\n",
" # you may want to expand tree here\n",
" # You may want to run more planning here\n",
" # <YOUR CODE>"
]
},
@@ -624,7 +624,7 @@
"\n",
"\"Build this\" assignment\n",
"\n",
"Apply MCTS to play atari games. In particular, let's start with ```gym.make(\"MsPacman-ramDeterministic-v0\")```.\n",
"Apply MCTS to play Atari games. In particular, let's start with ```gym.make(\"MsPacman-ramDeterministic-v0\")```.\n",
"\n",
"This requires two things:\n",
"* Slightly modify WithSnapshots wrapper to work with atari.\n",
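On "slightly modify WithSnapshots wrapper to work with atari": RAM-based Atari environments already expose emulator snapshots through the ALE, which is what such a wrapper can delegate to. A sketch of the core calls (API names as in 2020-era `gym`/`atari-py`; treat them as an assumption to verify):

```python
import gym

env = gym.make("MsPacman-ramDeterministic-v0")
env.reset()

snapshot = env.unwrapped.clone_full_state()        # save full emulator state (incl. RNG)

obs1, r1, done1, _ = env.step(env.action_space.sample())   # explore one branch...

env.unwrapped.restore_full_state(snapshot)          # ...rewind...
obs2, r2, done2, _ = env.step(env.action_space.sample())   # ...and try another action
```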

1 comment on commit 3cd62f6

@review-notebook-app
Review Jupyter notebook diffs for this commit on ReviewNB.