diff --git a/docs/source/explanation/explanation_of_numerical_optimizers.md b/docs/source/explanation/explanation_of_numerical_optimizers.md
index 45f99fa93..6ef1834f0 100644
--- a/docs/source/explanation/explanation_of_numerical_optimizers.md
+++ b/docs/source/explanation/explanation_of_numerical_optimizers.md
@@ -14,8 +14,9 @@ The main principles we describe here are:
 - Derivative free trust region algorithms
 - Derivative free direct search algorithms
 
-This covers a large range of the algorithms that come with optimagic. We do currently
-not cover:
+This covers a large range of the algorithms that come with optimagic. In contrast, the
+following classes of optimizers are also accessible via optimagic, but not yet covered
+in this overview:
 
 - Conjugate gradient methods
 - Genetic algorithms
diff --git a/docs/source/explanation/internal_optimizers.md b/docs/source/explanation/internal_optimizers.md
index bb004e104..89871e8c4 100644
--- a/docs/source/explanation/internal_optimizers.md
+++ b/docs/source/explanation/internal_optimizers.md
@@ -9,7 +9,7 @@ internal optimizer interface.
 
 The advantages of using the algorithm with optimagic over using it directly are:
 
-- optimagic turns an unconstrained optimizer into constrained ones.
+- optimagic turns unconstrained optimizers into constrained ones.
 - You can use logging.
 - You get great error handling for exceptions in the criterion function or gradient.
 - You get a parallelized and customizable numerical gradient if the user did not provide
diff --git a/docs/source/explanation/numdiff_background.md b/docs/source/explanation/numdiff_background.md
index 2f55627bf..d9c368200 100644
--- a/docs/source/explanation/numdiff_background.md
+++ b/docs/source/explanation/numdiff_background.md
@@ -1,4 +1,4 @@
-# Background and methods
+# Numerical differentiation: methods
 
 In this section we explain the mathematical background of forward, backward and central
 differences. The main ideas in this chapter are taken from {cite}`Dennis1996`. x is used
@@ -24,9 +24,9 @@ The central difference for the gradient is given by:
 
 $$
 \nabla f(x) =
-\begin{pmatrix}\frac{f(x + e_0 * h_0) - f(x - e_0 * h_0)}{h_0}\\
-\frac{f(x + e_1 * h_1) - f(x - e_1 * h_1)}{h_1}\\.\\.\\.\\ \frac{f(x + e_n * h_n)
-- f(x - e_n * h_n)}{h_n} \end{pmatrix}
+\begin{pmatrix}\frac{f(x + e_0 * h_0) - f(x - e_0 * h_0)}{2 h_0}\\
+\frac{f(x + e_1 * h_1) - f(x - e_1 * h_1)}{2 h_1}\\.\\.\\.\\ \frac{f(x + e_n * h_n)
+- f(x - e_n * h_n)}{2 h_n} \end{pmatrix}
 $$
 
 For the optimal stepsize h the following rule of thumb is applied:
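Not part of the patch itself, but a quick way to sanity-check the corrected denominator in the last hunk: the central difference uses a step of `2 * h_i` because the two evaluation points lie `h_i` on either side of `x`. Below is a minimal sketch; the function name `central_difference_gradient` and the quadratic test are illustrative only, not optimagic's API.

```python
import numpy as np


def central_difference_gradient(f, x, h):
    """Approximate the gradient of f at x via central differences.

    Each entry uses (f(x + e_i * h_i) - f(x - e_i * h_i)) / (2 * h_i),
    matching the corrected formula in the patch above.
    """
    x = np.asarray(x, dtype=float)
    h = np.broadcast_to(np.asarray(h, dtype=float), x.shape)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h[i]
        grad[i] = (f(x + step) - f(x - step)) / (2 * h[i])
    return grad


# Check against the analytic gradient of a simple quadratic:
# f(x) = sum(x**2) has gradient 2 * x.
x0 = np.array([1.0, -2.0, 0.5])
approx = central_difference_gradient(lambda x: np.sum(x**2), x0, h=1e-6)
print(approx)  # close to [2.0, -4.0, 1.0]; dropping the factor 2 would double it
```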