Commit 3cd62f6

Miscellaneous fixes (#448)

dniku authored Aug 4, 2020
1 parent cdfbaa6 commit 3cd62f6
Showing 18 changed files with 41 additions and 39 deletions.
2 changes: 1 addition & 1 deletion week01_intro/project_starter_evolution_strategies.ipynb
@@ -72,7 +72,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tips on atari games\n",
"### Tips on Atari games\n",
"* There's all the pre-processing and tuning done for you in the code below\n",
" * Images rescaled to 42x42 to speed up computation\n",
" * We use last 4 frames as observations to account for ball velocity\n",
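Aside: the "last 4 frames" tip above is usually implemented as a small observation wrapper. A minimal sketch, assuming frames are already preprocessed to a fixed `H x W x C` shape scaled to [0, 1]; the class name and deque-based buffer are illustrative, not the course's actual helper:

```python
import collections

import gym
import numpy as np


class FrameStack(gym.ObservationWrapper):
    """Return the last `n_frames` observations stacked along the channel axis."""

    def __init__(self, env, n_frames=4):
        super().__init__(env)
        self.frames = collections.deque(maxlen=n_frames)
        h, w, c = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0.0, 1.0, (h, w, c * n_frames), dtype=np.float32)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)          # pre-fill the buffer with the first frame
        return np.concatenate(self.frames, axis=-1)

    def observation(self, obs):
        self.frames.append(obs)              # called by step() on every new frame
        return np.concatenate(self.frames, axis=-1)
```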
2 changes: 1 addition & 1 deletion week01_intro/seminar_gym_interface.ipynb
@@ -181,8 +181,8 @@
" # Draw game image on display.\n",
" plt.imshow(env.render('rgb_array'))\n",
" \n",
" display.clear_output(wait=True)\n",
" display.display(plt.gcf())\n",
" display.clear_output(wait=True)\n",
"\n",
" if done:\n",
" print(\"Well done!\")\n",
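The reordering above matters because `clear_output(wait=True)` should come after drawing the new frame, so each frame replaces the previous one without flicker. A self-contained sketch of the whole loop (the environment name is just an example):

```python
import gym
import matplotlib.pyplot as plt
from IPython import display

env = gym.make("MountainCar-v0")
obs = env.reset()

for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())  # random policy
    plt.imshow(env.render('rgb_array'))   # draw game image on the current figure
    display.display(plt.gcf())            # show it first...
    display.clear_output(wait=True)       # ...then clear, so the next frame replaces it
    if done:
        print("Well done!")
        break
```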
10 changes: 5 additions & 5 deletions week03_model_free/homework.ipynb
@@ -388,9 +388,9 @@
"Here are some of the things you can do if you feel like it:\n",
"\n",
"* Play with epsilon. See learned how policies change if you set epsilon to higher/lower values (e.g. 0.75).\n",
"* Expected Value SASRSA for softmax policy __(2pts)__:\n",
"$$ \\pi(a_i|s) = softmax({Q(s,a_i) \\over \\tau}) = {e ^ {Q(s,a_i)/ \\tau} \\over {\\sum_{a_j} e ^{Q(s,a_j) / \\tau }}} $$\n",
"* Implement N-step algorithms and TD($\\lambda$): see [Sutton's book](http://incompleteideas.net/book/bookdraft2018jan1.pdf) chapter 7 and chapter 12.\n",
"* Expected Value SARSA for softmax policy __(2pts)__:\n",
"$$ \\pi(a_i \\mid s) = \\operatorname{softmax} \\left( \\left\\{ {Q(s, a_j) \\over \\tau} \\right\\}_{j=1}^n \\right)_i = {\\operatorname{exp} \\left( Q(s,a_i) / \\tau \\right) \\over {\\sum_{j} \\operatorname{exp} \\left( Q(s,a_j) / \\tau \\right)}} $$\n",
"* Implement N-step algorithms and TD($\\lambda$): see [Sutton's book](http://incompleteideas.net/book/RLbook2020.pdf) chapter 7 and chapter 12.\n",
"* Use those algorithms to train on CartPole in previous / next assignment for this week."
]
},
@@ -704,8 +704,8 @@
"### Bonus I: TD($\\lambda$) (5+ points)\n",
"\n",
"There's a number of advanced algorithms you can find in week 3 materials (Silver lecture II and/or reading about eligibility traces). One such algorithm is TD(lambda), which is based on the idea of eligibility traces. You can also view it as a combination of N-step updates for alll N.\n",
"* N-step temporal difference from Sutton's book - [url](http://incompleteideas.net/book/the-book-2nd.html), page 142 / chapter 7 \n",
"* Eligibility traces from Sutton's book - same url, chapter 12 / page 278\n",
"* N-step temporal difference from Sutton's book - [url](http://incompleteideas.net/book/the-book-2nd.html), Chapter 7 (page 142 in the 2020 edition)\n",
"* Eligibility traces from Sutton's book - same url, Chapter 12 (page 287)\n",
"* Blog post on eligibility traces - [url](http://pierrelucbacon.com/traces/)\n",
"\n",
"Here's a practical algorithm you can start with: [url](https://stackoverflow.com/questions/40862578/how-to-understand-watkinss-q%CE%BB-learning-algorithm-in-suttonbartos-rl-book/40892302)\n",
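For reference, the softmax policy defined above and the corresponding Expected Value SARSA update fit in a few lines; a sketch with a numerically stable softmax (function names and the default temperature are illustrative):

```python
import numpy as np


def softmax_policy(q_values, tau=1.0):
    """pi(a|s) = exp(Q(s,a)/tau) / sum_a' exp(Q(s,a')/tau), computed stably."""
    logits = np.asarray(q_values) / tau
    logits = logits - logits.max()            # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def expected_value_sarsa_target(reward, next_q_values, gamma=0.99, tau=1.0):
    """TD target r + gamma * E_{a' ~ pi}[Q(s', a')] under the softmax policy."""
    pi = softmax_policy(next_q_values, tau)
    return reward + gamma * float(np.dot(pi, next_q_values))
```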
2 changes: 1 addition & 1 deletion week04_[recap]_deep_learning/seminar_pytorch.ipynb
@@ -23,7 +23,7 @@
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"import sys, os\n",
"if 'google.colab' in sys.modules and not os.path.exists('.setup_complete'):\n",
" !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/spring20/week04_%5Brecap%5D_deep_learning/notmnist.py\n",
"\n",
6 changes: 3 additions & 3 deletions week04_approx_rl/homework_lasagne.ipynb
@@ -39,7 +39,7 @@
"source": [
"# Processing game image (2 pts)\n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape\n",
@@ -271,7 +271,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create and manage a pool of atari sessions to play with\n",
"# Create and manage a pool of Atari sessions to play with\n",
"\n",
"* To make training more stable, we shall have an entire batch of game sessions each happening independent of others\n",
"* Why several parallel agents help training: http://arxiv.org/pdf/1602.01783v1.pdf\n",
@@ -627,7 +627,7 @@
"\n",
"You will need to implement loading weights from original network to target network.\n",
"\n",
"We recommend thoroughly debugging your code on simple tests before applying it in atari dqn.\n",
"We recommend thoroughly debugging your code on simple tests before applying it in Atari dqn.\n",
"\n",
"__2)__ Use pre-build functionality from [here](http://agentnet.readthedocs.io/en/master/modules/target_network.html)\n",
"\n",
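A rough sketch of the preprocessing step described above (grayscale, downscale, rescale to [0, 1]), assuming OpenCV is available; the target size and the absence of cropping are arbitrary choices here, not the notebook's reference solution:

```python
import cv2
import gym
import numpy as np


class PreprocessAtari(gym.ObservationWrapper):
    """Grayscale + downscale raw 210x160x3 Atari frames and rescale them to [0, 1]."""

    def __init__(self, env, height=64, width=64):
        super().__init__(env)
        self.height, self.width = height, width
        self.observation_space = gym.spaces.Box(
            0.0, 1.0, (height, width, 1), dtype=np.float32)

    def observation(self, obs):
        obs = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)        # drop color
        obs = cv2.resize(obs, (self.width, self.height))   # shrink the image
        return obs.astype(np.float32)[..., None] / 255.0   # add channel dim, scale
```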
3 changes: 2 additions & 1 deletion week04_approx_rl/homework_pytorch_debug.ipynb
@@ -85,7 +85,8 @@
"source": [
"import gym\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
11 changes: 6 additions & 5 deletions week04_approx_rl/homework_pytorch_main.ipynb
@@ -82,7 +82,8 @@
"source": [
"import gym\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
@@ -92,7 +93,7 @@
"### Let's play some old videogames\n",
"![img](https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/nerd.png)\n",
"\n",
"This time we're gonna apply approximate q-learning to an atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
"This time we're gonna apply approximate q-learning to an Atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
]
},
{
@@ -167,7 +168,7 @@
"metadata": {},
"outputs": [],
"source": [
"# # does not work in colab.\n",
"# # does not work in Colab.\n",
"# # make keyboard interrupt to continue\n",
"\n",
"# from gym.utils.play import play\n",
@@ -181,7 +182,7 @@
"source": [
"### Processing game image \n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape, 64 x 64\n",
@@ -335,7 +336,7 @@
"metadata": {},
"outputs": [],
"source": [
"# # does not work in colab.\n",
"# # does not work in Colab.\n",
"# # make keyboard interrupt to continue\n",
"\n",
"# from gym.utils.play import play\n",
6 changes: 3 additions & 3 deletions week04_approx_rl/homework_tf.ipynb
@@ -61,7 +61,7 @@
"### Let's play some old videogames\n",
"![img](https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/nerd.png)\n",
"\n",
"This time we're gonna apply approximate q-learning to an atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
"This time we're gonna apply approximate q-learning to an Atari game called Breakout. It's not the hardest thing out there, but it's definitely way more complex than anything we tried before.\n"
]
},
{
@@ -70,7 +70,7 @@
"source": [
"### Processing game image \n",
"\n",
"Raw atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"Raw Atari images are large, 210x160x3 by default. However, we don't need that level of detail in order to learn them.\n",
"\n",
"We can thus save a lot of time by preprocessing game image, including\n",
"* Resizing to a smaller shape, 64 x 64\n",
@@ -770,7 +770,7 @@
"\n",
"To do that you should use TensorFlow functionality. \n",
"\n",
"We recommend thoroughly debugging your code on simple tests before applying it in atari dqn."
"We recommend thoroughly debugging your code on simple tests before applying it in Atari dqn."
]
},
{
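On "loading weights from original network to target network": with Keras-style models this reduces to copying weight lists. A sketch under that assumption, including the optional Polyak-averaged ("soft") variant:

```python
# Assumes `network` and `target_network` are tf.keras models with identical architectures.

def load_weights_into_target_network(network, target_network):
    """Copy the online network's weights into the target network (hard update)."""
    target_network.set_weights(network.get_weights())


def soft_update(network, target_network, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    new_weights = [tau * w + (1.0 - tau) * tw
                   for w, tw in zip(network.get_weights(), target_network.get_weights())]
    target_network.set_weights(new_weights)
```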
2 changes: 1 addition & 1 deletion week04_approx_rl/seminar_tf.ipynb
@@ -123,7 +123,7 @@
" recap: with p = epsilon pick random action, else pick action with highest Q(s,a)\n",
" \"\"\"\n",
" \n",
" q_values = network.predict(state[None])[0]\n",
" q_values = network(state[None])[0]\n",
" \n",
" <YOUR CODE>\n",
"\n",
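For reference, the rule in the docstring above ("with p = epsilon pick random action, else pick action with highest Q(s,a)") is the standard epsilon-greedy policy; a generic sketch over a vector of Q-values:

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon=0.1):
    """With probability epsilon pick a random action, otherwise the argmax of Q(s, a)."""
    n_actions = len(q_values)
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # explore
    return int(np.argmax(q_values))           # exploit
```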
6 changes: 3 additions & 3 deletions week06_policy_based/README.md
@@ -4,7 +4,7 @@
* Our [lecture](https://yadi.sk/i/yPIPkO_f3TPsNK), [seminar(pytorch)](https://yadi.sk/i/flW8ezGk3TPsQ5), [seminar(theano)](https://yadi.sk/i/8f9NX_E73GKBkT)
* Alternative lecture by J. Schulman part 1 - [video](https://www.youtube.com/watch?v=BB-BhTn6DCM)
* Alternative lecture by J. Schulman part 2 - [video](https://www.youtube.com/watch?v=Wnl-Qh2UHGg)
* Andrej Karpathy's [post](http://karpathy.github.io/2016/05/31/rl/) on policy gradients
* Andrej Karpathy's [post](http://karpathy.github.io/2016/05/31/rl/) on policy gradients


## More materials
@@ -19,6 +19,6 @@
* Adversarial review of policy gradient - [blog](http://www.argmin.net/2018/02/20/reinforce/)


Run seminar notebook in colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/reinforce_pytorch.ipynb)
Run seminar notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/reinforce_pytorch.ipynb)

Run optional homework notebook in colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/a2c-optional.ipynb)
Run optional homework notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/master/week06_policy_based/a2c-optional.ipynb)
4 changes: 2 additions & 2 deletions week06_policy_based/atari_wrappers.py
@@ -133,7 +133,7 @@ def __init__(self, env):
if (isinstance(env.unwrapped, atari.AtariEnv) and
"NoFrameskip" not in env.spec.id):
raise ValueError(
"MaxBetweenFrames requires NoFrameskip in atari env id")
"MaxBetweenFrames requires NoFrameskip in Atari env id")
super(MaxBetweenFrames, self).__init__(env)
self.last_obs = None

@@ -182,7 +182,7 @@ def __init__(self, env, nskip=4):
super(SkipFrames, self).__init__(env)
if (isinstance(env.unwrapped, atari.AtariEnv) and
"NoFrameskip" not in env.spec.id):
raise ValueError("SkipFrames requires NoFrameskip in atari env id")
raise ValueError("SkipFrames requires NoFrameskip in Atari env id")
self.nskip = nskip

def step(self, action):
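For context, `MaxBetweenFrames` and `SkipFrames` insist on `NoFrameskip` env ids because they re-implement frame skipping themselves. A stripped-down sketch of the skipping logic (the repository's actual wrapper differs in details such as observation maxing):

```python
import gym


class SimpleSkipFrames(gym.Wrapper):
    """Repeat each action `nskip` times and accumulate the reward."""

    def __init__(self, env, nskip=4):
        super().__init__(env)
        self.nskip = nskip

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.nskip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break                      # stop early if the episode ended mid-skip
        return obs, total_reward, done, info
```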
2 changes: 1 addition & 1 deletion week06_policy_based/reinforce_pytorch.ipynb
@@ -45,7 +45,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A caveat: we have received reports that the following cell may crash with `NameError: name 'base' is not defined`. The [suggested workaround](https://www.coursera.org/learn/practical-rl/discussions/all/threads/N2Pw652iEemRYQ6W2GuqHg/replies/te3HpQwOQ62tx6UMDoOt2Q/comments/o08gTqelT9KPIE6npX_S3A) is to install `gym==0.14.0` and `pyglet==1.3.2`."
"A caveat: with some versions of `pyglet`, the following cell may crash with `NameError: name 'base' is not defined`. The corresponding bug report is [here](https://github.com/pyglet/pyglet/issues/134). If you see this error, try restarting the kernel."
]
},
{
2 changes: 1 addition & 1 deletion week06_policy_based/reinforce_tensorflow.ipynb
@@ -48,7 +48,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A caveat: we have received reports that the following cell may crash with `NameError: name 'base' is not defined`. The [suggested workaround](https://www.coursera.org/learn/practical-rl/discussions/all/threads/N2Pw652iEemRYQ6W2GuqHg/replies/te3HpQwOQ62tx6UMDoOt2Q/comments/o08gTqelT9KPIE6npX_S3A) is to install `gym==0.14.0` and `pyglet==1.3.2`."
"A caveat: with some versions of `pyglet`, the following cell may crash with `NameError: name 'base' is not defined`. The corresponding bug report is [here](https://github.com/pyglet/pyglet/issues/134). If you see this error, try restarting the kernel."
]
},
{
6 changes: 3 additions & 3 deletions week08_pomdp/env_pool.py
@@ -1,5 +1,5 @@
"""
A thin wrapper for openAI gym environments that maintains a set of parallel games and has a method to generate
A thin wrapper for OpenAI gym environments that maintains a set of parallel games and has a method to generate
interaction sessions given agent one-step applier function.
"""

@@ -19,7 +19,7 @@ def __init__(self, agent, make_env, n_parallel_games=1):
:param n_games: Number of parallel games. One game by default.
:param max_size: Max pool size by default (if appending sessions). By default, pool is not constrained in size.
"""
# Create atari games.
# Create Atari games.
self.agent = agent
self.make_env = make_env
self.envs = [self.make_env() for _ in range(n_parallel_games)]
@@ -35,7 +35,7 @@ def __init__(self, agent, make_env, n_parallel_games=1):
self.just_ended = [False] * len(self.envs)

def interact(self, n_steps=100, verbose=False):
"""Generate interaction sessions with ataries (openAI gym atari environments)
"""Generate interaction sessions with ataries (OpenAI gym Atari environments)
Sessions will have length n_steps. Each time one of games is finished, it is immediately getting reset
and this time is recorded in is_alive_log (See returned values).
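The docstring above describes stepping several games in lockstep and resetting finished ones immediately; a stripped-down sketch of that interaction loop (the real `EnvPool` also carries agent memory between steps, so take this as an outline only):

```python
import numpy as np


def interact(envs, agent_step, observations, n_steps=100):
    """Step a list of gym envs in lockstep; `agent_step(batch_of_obs) -> batch_of_actions`."""
    action_log, reward_log, is_alive_log = [], [], []
    for _ in range(n_steps):
        actions = agent_step(np.array(observations))
        step_results = [env.step(a) for env, a in zip(envs, actions)]
        # Reset finished games immediately, as the pool's docstring describes.
        observations = [env.reset() if done else obs
                        for env, (obs, _, done, _) in zip(envs, step_results)]
        action_log.append(actions)
        reward_log.append([r for _, r, _, _ in step_results])
        is_alive_log.append([not d for _, _, d, _ in step_results])
    return observations, action_log, reward_log, is_alive_log
```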
6 changes: 3 additions & 3 deletions week08_pomdp/homework_common_part2.ipynb
@@ -31,7 +31,7 @@
"\n",
"Choose a partially-observable environment for experimentation out of [atari](https://gym.openai.com/envs#atari), [doom](https://gym.openai.com/envs#doom) or [pygame](https://gym.openai.com/envs#pygame) catalogue (if you really want to try some other pomdp, feel free to proceed at your own risk).\n",
"\n",
"Not all atari environements are bug free and these minor bugs can hurt learning performance. \n",
"Not all Atari environements are bug free and these minor bugs can hurt learning performance. \n",
"We recommend to pick one of those:\n",
"* [Assault-v0](https://gym.openai.com/envs/Assault-v0) \n",
"* [DoomDefendCenter-v0](https://gym.openai.com/envs/DoomDefendCenter-v0) (use env code from [this](https://github.com/yandexdataschool/Practical_RL/blob/master/week4/Seminar4.2_conv_agent.ipynb) notebook)\n",
@@ -65,7 +65,7 @@
" * ```from gym.wrappers import SkipWrapper```\n",
" * ```env = SkipWrapper(how_many_frames_to_skip)(your_env)``` in your make_env\n",
" \n",
" * For atari only, consider __training__ on deterministic version of environment\n",
" * For Atari only, consider __training__ on deterministic version of environment\n",
" * Works by appending Deterministic to env name: `AssaultDeterministic-v0`, `KungFuMasterDeterministic-v0`\n",
" * Expect faster training due to less variance.\n",
" * You still need to __switch back to normal env for evaluation__ (there's no leaderbord for deterministic envs)\n",
@@ -291,7 +291,7 @@
"plt.xticks(np.arange(len(game_names)), np.array(\n",
" game_names)[idxs], rotation='vertical')\n",
"plt.grid()\n",
"plt.title(\"Comparison A3C on atari games: with and without LSTM memory\")\n",
"plt.title(\"Comparison A3C on Atari games: with and without LSTM memory\")\n",
"plt.ylabel(\"Difference between A3C_LSTM and A3C_FeadForward scores\")"
]
}
4 changes: 2 additions & 2 deletions week08_pomdp/practice_pytorch.ipynb
@@ -37,7 +37,7 @@
"source": [
"### Kung-Fu, recurrent style\n",
"\n",
"In this notebook we'll once again train RL agent for for atari [KungFuMaster](https://gym.openai.com/envs/KungFuMaster-v0/), this time using recurrent neural networks.\n",
"In this notebook we'll once again train RL agent for for Atari [KungFuMaster](https://gym.openai.com/envs/KungFuMaster-v0/), this time using recurrent neural networks.\n",
"\n",
"![img](https://upload.wikimedia.org/wikipedia/en/6/66/Kung_fu_master_mame.png)"
]
@@ -137,7 +137,7 @@
"source": [
"### POMDP setting\n",
"\n",
"The atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"The Atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"\n",
"Let's design another agent that has a recurrent neural net memory to solve this. Here's a sketch.\n",
"\n",
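As a concrete picture of the "recurrent neural net memory" sketch mentioned above, a common design is a convolutional encoder feeding a GRU cell whose hidden state is carried between time steps. A minimal outline, assuming 64x64 single-channel observations; the layer sizes are arbitrary and not the notebook's reference architecture:

```python
import torch
import torch.nn as nn


class SimpleRecurrentAgent(nn.Module):
    def __init__(self, n_actions, obs_channels=1, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.rnn = nn.GRUCell(32 * 15 * 15, hidden_size)   # 15x15 feature map for 64x64 inputs
        self.logits = nn.Linear(hidden_size, n_actions)    # policy head
        self.state_value = nn.Linear(hidden_size, 1)        # value head

    def forward(self, prev_hidden, obs):
        """One time step: returns the new memory and (action logits, state value)."""
        features = self.encoder(obs)
        hidden = self.rnn(features, prev_hidden)             # memory carried across steps
        return hidden, (self.logits(hidden), self.state_value(hidden))

    def initial_state(self, batch_size):
        return torch.zeros(batch_size, self.rnn.hidden_size)
```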
2 changes: 1 addition & 1 deletion week08_pomdp/practice_tensorflow.ipynb
@@ -564,7 +564,7 @@
"source": [
"### POMDP setting\n",
"\n",
"The atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"The Atari game we're working with is actually a POMDP: your agent needs to know timing at which enemies spawn and move, but cannot do so unless it has some memory. \n",
"\n",
"Let's design another agent that has a recurrent neural net memory to solve this.\n",
"\n",
4 changes: 2 additions & 2 deletions week10_planning/seminar_MCTS.ipynb
@@ -594,7 +594,7 @@
" assert not root.is_leaf(), \\\n",
" \"We ran out of tree! Need more planning! Try growing the tree right inside the loop.\"\n",
"\n",
" # you may want to expand tree here\n",
" # You may want to run more planning here\n",
" # <YOUR CODE>"
]
},
@@ -624,7 +624,7 @@
"\n",
"\"Build this\" assignment\n",
"\n",
"Apply MCTS to play atari games. In particular, let's start with ```gym.make(\"MsPacman-ramDeterministic-v0\")```.\n",
"Apply MCTS to play Atari games. In particular, let's start with ```gym.make(\"MsPacman-ramDeterministic-v0\")```.\n",
"\n",
"This requires two things:\n",
"* Slightly modify WithSnapshots wrapper to work with atari.\n",
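On "slightly modify WithSnapshots wrapper to work with atari": RAM-based Atari environments already expose emulator snapshots through the ALE, which is what such a wrapper can delegate to. A sketch of the core calls (API names as in 2020-era `gym`/`atari-py`; treat them as an assumption to verify):

```python
import gym

env = gym.make("MsPacman-ramDeterministic-v0")
env.reset()

snapshot = env.unwrapped.clone_full_state()        # save full emulator state (incl. RNG)

obs1, r1, done1, _ = env.step(env.action_space.sample())   # explore one branch...

env.unwrapped.restore_full_state(snapshot)          # ...rewind...
obs2, r2, done2, _ = env.step(env.action_space.sample())   # ...and try another action
```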

1 comment on commit 3cd62f6

@review-notebook-app
Review Jupyter notebook diffs for this commit on ReviewNB.