Commit 9a3a688: added code
sudharsan13296 committed Jun 11, 2018
Showing 98 changed files with 20,896 additions and 0 deletions.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is Reinforcement Learning?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider you are teaching the dog to catch a ball, but you cannot teach the dog explicitly to\n",
"catch a ball, instead, you will just throw a ball, every time the dog catches a ball, you will\n",
"give a cookie. If it fails to catch a dog, you will not give a cookie. So the dog will figure out\n",
"what actions it does that made it receive a cookie and repeat that action."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly in an RL environment, you will not teach the agent what to do or how to do,\n",
"instead, you will give feedback to the agent for each action it does. The feedback may be\n",
"positive (reward) or negative (punishment). The learning system which receives the\n",
"punishment will improve itself. Thus it is a trial and error process. The reinforcement\n",
"learning algorithm retains outputs that maximize the received reward over time. In the\n",
"above analogy, the dog represents the agent, giving a cookie to the dog on catching a ball is\n",
"a reward and not giving a cookie is punishment.\n",
"\n",
"There might be delayed rewards. You may not get a reward at each step. A reward may be\n",
"given only after the completion of the whole task. In some cases, you get a reward at each\n",
"step to find out that whether you are making any mistake.\n",
"\n",
"An RL agent can explore for different actions which might give a good reward or it can\n",
"(exploit) use the previous action which resulted in a good reward. If the RL agent explores\n",
"different actions, there is a great possibility to get a poor reward. If the RL agent exploits\n",
"past action, there is also a great possibility of missing out the best action which might give a\n",
"good reward. There is always a trade-off between exploration and exploitation. We cannot\n",
"perform both exploration and exploitation at the same time. We will discuss exploration exploitation\n",
"dilemma detail in the upcoming chapters.\n",
"\n",
"Say, If you want to teach a robot to walk, without getting stuck by hitting at the mountain,\n",
"you will not explicitly teach the robot not to go in the direction of mountain,"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_01.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, if the robot hits and get stuck on the mountain you will reduce 10 points so that\n",
"robot will understand that hitting mountain will give it a negative reward so it will not go\n",
"in that direction again."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_02.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you will give 20 points to the robot when it walks in the right direction without getting\n",
"stuck. So robot will understand which is the right path to rewards and try to maximize the\n",
"rewards by going in a right direction."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"![title](images/B09792_01_03.png)"
]
}
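,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reward scheme above can be expressed as a simple function. This is a toy sketch\n",
"with made-up state names, not a controller for a real robot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A toy reward function for the robot example; the state names are hypothetical.\n",
"def reward(state):\n",
"    if state == 'hit_mountain':   # robot hits the mountain and gets stuck\n",
"        return -10\n",
"    if state == 'reached_goal':   # robot walks the right path without getting stuck\n",
"        return 20\n",
"    return 0                      # no feedback for intermediate steps\n",
"\n",
"print(reward('hit_mountain'), reward('reached_goal'))"
]
}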
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:anaconda]",
"language": "python",
"name": "conda-env-anaconda-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}