Commit 9a3a688 (0 parents)
Showing 98 changed files with 20,896 additions and 0 deletions.
111 changes: 111 additions & 0 deletions
1. Introduction to Reinforcement Learning/1.1 What is Reinforcement Learning.ipynb
@@ -0,0 +1,111 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is Reinforcement Learning?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider that you are teaching a dog to catch a ball. You cannot explicitly teach the dog\n",
"how to catch a ball; instead, you just throw a ball, and every time the dog catches it, you\n",
"give the dog a cookie. If it fails to catch the ball, you do not give it a cookie. The dog will\n",
"figure out which actions earned it a cookie and will repeat those actions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, in an RL environment, you do not teach the agent what to do or how to do it;\n",
"instead, you give the agent feedback for each action it takes. The feedback may be\n",
"positive (a reward) or negative (a punishment). A learning system that receives\n",
"punishment will improve itself; thus, RL is a trial-and-error process. The reinforcement\n",
"learning algorithm retains the outputs that maximize the received reward over time. In\n",
"the above analogy, the dog represents the agent, giving the dog a cookie for catching the\n",
"ball is a reward, and not giving a cookie is a punishment.\n",
"\n",
"Rewards might be delayed: you may not get a reward at each step, and a reward may be\n",
"given only after the completion of the whole task. In other cases, you get a reward at\n",
"each step, which tells you whether you are making any mistakes.\n",
"\n",
"An RL agent can explore different actions that might yield a good reward, or it can\n",
"exploit the previous action that resulted in a good reward. If the agent explores different\n",
"actions, there is a good chance it will receive a poor reward. If the agent only exploits its\n",
"past actions, there is also a good chance it will miss the best action, which might give a\n",
"better reward. There is always a trade-off between exploration and exploitation: we\n",
"cannot perform both at the same time. We will discuss the exploration-exploitation\n",
"dilemma in detail in the upcoming chapters."
]
},
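{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following is a minimal sketch of the exploration-exploitation trade-off, using an\n",
"epsilon-greedy strategy on a hypothetical two-armed bandit: with probability epsilon the\n",
"agent explores a random action; otherwise it exploits the action with the highest\n",
"estimated reward. The arm probabilities and the value of epsilon are illustrative\n",
"assumptions, not part of the text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Hypothetical bandit: each arm pays 1 with the given probability (illustrative values)\n",
"arm_probs = [0.4, 0.8]\n",
"estimates = [0.0, 0.0]  # estimated reward of each arm\n",
"counts = [0, 0]         # number of times each arm was pulled\n",
"epsilon = 0.1           # fraction of steps spent exploring\n",
"\n",
"for t in range(1000):\n",
"    if random.random() < epsilon:\n",
"        action = random.randrange(len(arm_probs))  # explore: pick a random arm\n",
"    else:\n",
"        action = estimates.index(max(estimates))   # exploit: pick the best arm so far\n",
"    reward = 1 if random.random() < arm_probs[action] else 0\n",
"    counts[action] += 1\n",
"    # update the running average of observed rewards for this arm\n",
"    estimates[action] += (reward - estimates[action]) / counts[action]\n",
"\n",
"print(estimates)  # the estimate for the better arm should approach 0.8"
]
},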
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Say you want to teach a robot to walk without getting stuck by hitting a mountain. You\n",
"will not explicitly teach the robot not to go in the direction of the mountain:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_01.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, if the robot hits the mountain and gets stuck, you deduct 10 points so that the\n",
"robot understands that hitting the mountain yields a negative reward, and it will not go\n",
"in that direction again:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_02.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you give the robot 20 points when it walks in the right direction without getting\n",
"stuck. The robot then learns which path leads to rewards and tries to maximize the\n",
"rewards by going in the right direction:"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"![title](images/B09792_01_03.png)"
]
},
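{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of this reward scheme, the hypothetical step function below returns\n",
"+20 for walking in a safe direction and -10 for hitting the mountain, and the agent simply\n",
"accumulates the reward signal. The environment, the action names, and the random\n",
"policy are made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"def step(action):\n",
"    # Hypothetical environment: walking toward the mountain gets the robot\n",
"    # stuck (-10 points); any other direction is a successful step (+20 points)\n",
"    return -10 if action == 'mountain' else 20\n",
"\n",
"actions = ['mountain', 'left', 'right']\n",
"total_reward = 0\n",
"for _ in range(10):\n",
"    action = random.choice(actions)  # a random policy, just for illustration\n",
"    total_reward += step(action)\n",
"\n",
"print(total_reward)"
]
}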
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:anaconda]",
"language": "python",
"name": "conda-env-anaconda-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}