diff --git a/Ch12_Optimization_Algorithms/RMSProp.ipynb b/Ch12_Optimization_Algorithms/RMSProp.ipynb
new file mode 100644
index 00000000..db613f02
--- /dev/null
+++ b/Ch12_Optimization_Algorithms/RMSProp.ipynb
@@ -0,0 +1,1712 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMSProp\n",
+ "\n",
+ "In the experiment in the Adagrad section, the learning rate of each element in the independent variable\n",
+ "of the objective function declines (or remains unchanged) during iteration because the variable $s_t$ in\n",
+ "the denominator is increased by the square by element operation of the mini-batch stochastic gradient,\n",
+ "adjusting the learning rate. Therefore, when the learning rate declines very fast during early iteration, yet\n",
+ "the current solution is still not desirable, Adagrad might have difficulty finding a useful solution because\n",
+ "the learning rate will be too small at later stages of iteration. To tackle this problem, the RMSProp\n",
+ "algorithm made a small modification to Adagrad.\n",
+ "\n",
+ "## 8.6.1 The Algorithm\n",
+ "\n",
+ "We introduced EWMA (exponentially weighted moving average) in the Momentum section. Unlike in\n",
+ "Adagrad, the state variable $s_t$ is the sum of the square by element all the mini-batch stochastic gradients\n",
+ "$g_t$ up to the time step t, RMSProp uses the EWMA on the square by element results of these gradients.\n",
+ "Specifically, given the hyperparameter 0 ≤ $ \\gamma $ < 1, RMSProp is computed at time step t > 0.\n",
+ "\n",
+ "$$ \\begin{aligned} \\mathbf{s}_t \\leftarrow \\gamma \\mathbf{s}_{t-1} + (1 - \\gamma) \\mathbf{g}_t * \\mathbf{g}_t \\end{aligned} $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Like Adagrad, RMSProp re-adjusts the learning rate of each element in the independent variable of the\n",
+ "objective function with element operations and then updates the independent variable.\n",
+ "\n",
+ "$$ \\begin{aligned} \\mathbf{x}_t \\leftarrow \\mathbf{x}_{t-1} (\\frac{\\eta}{\\sqrt{\\mathbf{s}_t + \\epsilon}}) * \\mathbf{g}_t \\end{aligned} $$ \n",
+ "\n",
+ "Here, η is the learning rate while ε is a constant added to maintain numerical stability, such as $10 ^ {−6}$ .\n",
+ "Because the state variable of RMSProp is an EWMA of the squared term $g_t * g_t$ , it can be seen as the\n",
+ "weighted average of the mini-batch stochastic gradient’s squared terms from the last 1/(1 − $ \\gamma $) time steps.\n",
+ "Therefore, the learning rate of each element in the independent variable will not always decline (or remain\n",
+ "unchanged) during iteration.\n",
+ "\n",
+ "By convention, we will use the objective function f (x) = 0.1x 21 + 2x 22 to observe the iterative trajectory\n",
+ "of the independent variable in RMSProp. Recall that in the Adagrad section, when we used Adagrad with\n",
+ "a learning rate of 0.4, the independent variable moved less in later stages of iteration. However, at the\n",
+ "same learning rate, RMSProp can approach the optimal solution faster."
+ ]
+ },
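+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the two update rules above concrete, the next cell gives a minimal NumPy sketch of a single RMSProp step.\n",
+ "The function name `rmsprop_step`, the starting point $(-5, -2)$, and the value $\\gamma = 0.9$ are illustrative\n",
+ "assumptions rather than part of this notebook's experiment; only the learning rate of 0.4 comes from the discussion\n",
+ "above. The cell after it then runs the full 2-D trajectory experiment described in the text."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def rmsprop_step(x, s, grad, eta=0.4, gamma=0.9, eps=1e-6):\n",
+ "    # s_t <- gamma * s_{t-1} + (1 - gamma) * g_t * g_t  (element-wise square)\n",
+ "    s = gamma * s + (1 - gamma) * grad * grad\n",
+ "    # x_t <- x_{t-1} - eta / sqrt(s_t + eps) * g_t  (element-wise rescaling)\n",
+ "    x = x - eta / np.sqrt(s + eps) * grad\n",
+ "    return x, s\n",
+ "\n",
+ "# Toy run on f(x) = 0.1 * x1**2 + 2 * x2**2, using its analytic gradient.\n",
+ "x = np.array([-5.0, -2.0])\n",
+ "s = np.zeros_like(x)\n",
+ "for _ in range(20):\n",
+ "    grad = np.array([0.2 * x[0], 4.0 * x[1]])\n",
+ "    x, s = rmsprop_step(x, s, grad)\n",
+ "print('after 20 steps: x1 %.6f, x2 %.6f' % (x[0], x[1]))"
+ ]
+ },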
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch 20, x1 -0.010599, x2 0.000000\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "