Commit 9a3a688 (0 parents)
Showing 98 changed files with 20,896 additions and 0 deletions.
111 changes: 111 additions & 0 deletions
1. Introduction to Reinforcement Learning/1.1 What is Reinforcement Learning.ipynb
@@ -0,0 +1,111 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is Reinforcement Learning?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider that you are teaching a dog to catch a ball. You cannot explicitly teach the dog\n",
"how to catch a ball; instead, you just throw a ball, and every time the dog catches it, you\n",
"give the dog a cookie. If it fails to catch the ball, you do not give it a cookie. The dog will\n",
"figure out which actions earned it a cookie and will repeat those actions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, in an RL environment, you do not teach the agent what to do or how to do it;\n",
"instead, you give the agent feedback for each action it takes. The feedback may be\n",
"positive (a reward) or negative (a punishment). A learning system that receives\n",
"punishment will improve itself; thus, RL is a trial-and-error process. The reinforcement\n",
"learning algorithm retains the outputs that maximize the received reward over time. In\n",
"the above analogy, the dog represents the agent, giving the dog a cookie for catching the\n",
"ball is a reward, and not giving a cookie is a punishment.\n",
"\n",
"Rewards might be delayed: you may not get a reward at each step, and a reward may be\n",
"given only after the completion of the whole task. In other cases, you get a reward at\n",
"each step, which tells you whether you are making any mistakes.\n",
"\n",
"An RL agent can explore different actions that might yield a good reward, or it can\n",
"exploit the previous action that resulted in a good reward. If the agent explores different\n",
"actions, there is a good chance it will receive a poor reward. If the agent only exploits its\n",
"past actions, there is also a good chance it will miss the best action, which might give a\n",
"better reward. There is always a trade-off between exploration and exploitation: we\n",
"cannot perform both at the same time. We will discuss the exploration-exploitation\n",
"dilemma in detail in the upcoming chapters."
]
},
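{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following is a minimal sketch of the exploration-exploitation trade-off, using an\n",
"epsilon-greedy strategy on a hypothetical two-armed bandit: with probability epsilon the\n",
"agent explores a random action; otherwise it exploits the action with the highest\n",
"estimated reward. The arm probabilities and the value of epsilon are illustrative\n",
"assumptions, not part of the text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Hypothetical bandit: each arm pays 1 with the given probability (illustrative values)\n",
"arm_probs = [0.4, 0.8]\n",
"estimates = [0.0, 0.0]  # estimated reward of each arm\n",
"counts = [0, 0]         # number of times each arm was pulled\n",
"epsilon = 0.1           # fraction of steps spent exploring\n",
"\n",
"for t in range(1000):\n",
"    if random.random() < epsilon:\n",
"        action = random.randrange(len(arm_probs))  # explore: pick a random arm\n",
"    else:\n",
"        action = estimates.index(max(estimates))   # exploit: pick the best arm so far\n",
"    reward = 1 if random.random() < arm_probs[action] else 0\n",
"    counts[action] += 1\n",
"    # update the running average of observed rewards for this arm\n",
"    estimates[action] += (reward - estimates[action]) / counts[action]\n",
"\n",
"print(estimates)  # the estimate for the better arm should approach 0.8"
]
},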
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Say you want to teach a robot to walk without getting stuck by hitting a mountain. You\n",
"will not explicitly teach the robot not to go in the direction of the mountain:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_01.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, if the robot hits the mountain and gets stuck, you deduct 10 points so that the\n",
"robot understands that hitting the mountain yields a negative reward, and it will not go\n",
"in that direction again:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](images/B09792_01_02.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you give the robot 20 points when it walks in the right direction without getting\n",
"stuck. The robot then learns which path leads to rewards and tries to maximize the\n",
"rewards by going in the right direction:"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"![title](images/B09792_01_03.png)"
]
},
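{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of this reward scheme, the hypothetical step function below returns\n",
"+20 for walking in a safe direction and -10 for hitting the mountain, and the agent simply\n",
"accumulates the reward signal. The environment, the action names, and the random\n",
"policy are made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"def step(action):\n",
"    # Hypothetical environment: walking toward the mountain gets the robot\n",
"    # stuck (-10 points); any other direction is a successful step (+20 points)\n",
"    return -10 if action == 'mountain' else 20\n",
"\n",
"actions = ['mountain', 'left', 'right']\n",
"total_reward = 0\n",
"for _ in range(10):\n",
"    action = random.choice(actions)  # a random policy, just for illustration\n",
"    total_reward += step(action)\n",
"\n",
"print(total_reward)"
]
}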
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:anaconda]",
"language": "python",
"name": "conda-env-anaconda-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}