microsoft · calebrob6 · Dec 19, 2024 · Dec 12, 2024
diff --git a/docs/tutorials/geospatial.ipynb b/docs/tutorials/geospatial.ipynb
@@ -126,7 +126,7 @@
     "\n",
     "Similar to radar, lidar is another active remote sensing method that replaces microwave pulses with lasers. By measuring the time it takes light to reflect off of an object and return to the sensor, we can generate a 3D point cloud mapping object structures. Mathematically, our dataset would then become:\n",
     "\n",
-    "$$\\mathcal{D} = \\left\\{\\left(x^{(i)}, y^{(i)}, z^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
+    "$$D = \\left\\{\\left(x^{(i)}, y^{(i)}, z^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
     "\n",
     "This technology is frequently used in several different application domains:\n",
     "\n",

diff --git a/docs/tutorials/pytorch.ipynb b/docs/tutorials/pytorch.ipynb
@@ -115,9 +115,9 @@
     "\n",
     "In order to learn by example, we first need examples. In machine learning, we construct datasets of the form:\n",
     "\n",
-    "$$\\mathcal{D} = \\left\\{\\left(x^{(i)}, y^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
+    "$$D = \\left\\{\\left(x^{(i)}, y^{(i)}\\right)\\right\\}_{i=1}^N$$\n",
     "\n",
-    "Written in English, dataset $\\mathcal{D}$ is composed of $N$ pairs of inputs $x$ and expected outputs $y$. $x$ and $y$ can be tabular data, images, text, or any other object that can be represented mathematically.\n",
+    "Written in English, dataset $D$ is composed of $N$ pairs of inputs $x$ and expected outputs $y$. $x$ and $y$ can be tabular data, images, text, or any other object that can be represented mathematically.\n",
     "\n",
     "![EuroSAT](https://github.com/phelber/EuroSAT/blob/master/eurosat-overview.png?raw=true)\n",
     "\n",
@@ -261,11 +261,11 @@
     "\n",
     "If $y$ is our expected output (also called \"ground truth\") and $\\hat{y}$ is our predicted output, our goal is to minimize the difference between $y$ and $\\hat{y}$. This difference is referred to as *error* or *loss*, and the loss function tells us how big of a mistake we made. For regression tasks, a simple mean squared error is sufficient:\n",
     "\n",
-    "$$\\mathcal{L}(y, \\hat{y}) = \\left(y - \\hat{y}\\right)^2$$\n",
+    "$$L(y, \\hat{y}) = \\left(y - \\hat{y}\\right)^2$$\n",
     "\n",
     "For classification tasks, such as EuroSAT, we instead use a negative log-likelihood:\n",
     "\n",
-    "$$\\mathcal{L}_c(y, \\hat{y}) = - \\sum_{c=1}^C \\mathbb{1}_{y=\\hat{y}}\\log{p_c}$$\n",
+    "$$L_c(y, \\hat{y}) = - \\sum_{c=1}^C \\mathbb{1}_{y=\\hat{y}}\\log{p_c}$$\n",
     "\n",
     "where $\\mathbb{1}$ is the indicator function and $p_c$ is the probability with which the model predicts class $c$. By normalizing this over the log probability of all classes, we get the cross-entropy loss."
    ]
@@ -289,7 +289,7 @@
     "\n",
     "In order to minimize our loss, we compute the gradient of the loss function with respect to model parameters $\\theta$. We then take a small step $\\alpha$ (also called the *learning rate*) in the direction of the negative gradient to update our model parameters in a process called *backpropagation*:\n",
     "\n",
-    "$$\\theta \\leftarrow \\theta - \\alpha \\nabla_\\theta \\mathcal{L}(y, \\hat{y})$$\n",
+    "$$\\theta \\leftarrow \\theta - \\alpha \\nabla_\\theta L(y, \\hat{y})$$\n",
     "\n",
     "When done one image or one mini-batch at a time, this is known as *stochastic gradient descent* (SGD)."
    ]