diff --git a/docs/examples/batch-to-online.ipynb b/docs/examples/batch-to-online.ipynb index 703a2cbb23..60f5892f99 100644 --- a/docs/examples/batch-to-online.ipynb +++ b/docs/examples/batch-to-online.ipynb @@ -67,7 +67,7 @@ "scorer = metrics.make_scorer(metrics.roc_auc_score)\n", "scores = model_selection.cross_val_score(model, X, y, scoring=scorer, cv=cv)\n", "\n", - "# Display the average score and it's standard deviation\n", + "# Display the average score and its standard deviation\n", "print(f'ROC AUC: {scores.mean():.3f} (± {scores.std():.3f})')" ] }, @@ -94,7 +94,7 @@ "source": [ "## A hands-on introduction to incremental learning\n", "\n", - "Incremental learning is also often called *online learning* or *stream learning*, but if you [google online learning](https://www.google.com/search?q=online+learning) a lot of the results will point to educational websites. Hence, the terms \"incremental learning\" and \"stream learning\" (from which River derives it's name) are prefered. The point of incremental learning is to fit a model to a stream of data. In other words, the data isn't available in it's entirety, but rather the observations are provided one by one. As an example let's stream through the dataset used previously." + "Incremental learning is also often called *online learning* or *stream learning*, but if you [google online learning](https://www.google.com/search?q=online+learning) a lot of the results will point to educational websites. Hence, the terms \"incremental learning\" and \"stream learning\" (from which River derives its name) are preferred. The point of incremental learning is to fit a model to a stream of data. In other words, the data isn't available in its entirety, but rather the observations are provided one by one. As an example let's stream through the dataset used previously." ] }, { @@ -484,7 +484,7 @@ "# We compute the CV scores using the same CV scheme and the same scoring\n", "scores = model_selection.cross_val_score(model, X, y, scoring=scorer, cv=cv)\n", "\n", - "# Display the average score and it's standard deviation\n", + "# Display the average score and its standard deviation\n", "print(f'ROC AUC: {scores.mean():.3f} (± {scores.std():.3f})')" ] }, diff --git a/docs/introduction/basic-concepts.md b/docs/introduction/basic-concepts.md index 025399d3c9..669ab4a7d6 100644 --- a/docs/introduction/basic-concepts.md +++ b/docs/introduction/basic-concepts.md @@ -22,13 +22,13 @@ The challenge for machine learning is to ensure models you train offline on proa ## Online processing -Online processing is the act of processing a data stream one element at a time. In the case of machine learning, that means training a model by teaching it one sample at a time. This is completely opposite to the traditional way of doing machine learning, which is to train a model on a whole batch data at a time. +Online processing is the act of processing a data stream one element at a time. In the case of machine learning, that means training a model by teaching it one sample at a time. This is completely opposite to the traditional way of doing machine learning, which is to train a model on whole batches of data at a time. An online model is therefore a stateful, dynamic object. It keeps learning and doesn't have to revisit past data. It's a different way of doing things, and therefore has its own set of pros and cons. ## Tasks -Machine learning encompasses many different tasks: classification, regression, anomaly detection, time series forecasting, etc. 
The ideology behind River is to be a generic machine learning which allows to perform these tasks in a streaming manner. Indeed, many batch machine learning algorithms have online equivalents.
+Machine learning encompasses many different tasks: classification, regression, anomaly detection, time series forecasting, etc. The ideology behind River is to be a generic machine learning approach which allows these tasks to be performed in a streaming manner. Indeed, many batch machine learning algorithms have online equivalents.
 
 Note that River also supports some more basic tasks. For instance, you might just want to calculate a running average of a data stream. These are usually smaller parts of a whole stream processing pipeline.
 
@@ -36,13 +36,13 @@ Note that River also supports some more basic tasks. For instance, you might jus
 
 River is a Python library. It is composed of a bunch of classes which implement various online processing algorithms. Most of these classes are machine learning models which can process a single sample, be it for learning or for inference.
 
-We made the choice to use dictionaries as the basic building block. First of all, online processing is different to batch processing, in that vectorization doesn't bring any speedup. Therefore numeric processing libraries such as numpy and PyTorch actually bring too much overhead. Using native Python data structures is faster.
+We made the choice to use dictionaries as the basic building block. First of all, online processing is different to batch processing, in that vectorization doesn't bring any speed-up. Therefore numeric processing libraries such as NumPy and PyTorch actually bring too much overhead. Using native Python data structures is faster.
 
-Dictionaries are therefore a perfect fit. They're native to Python and have excellent support in the standard library. They allow naming each feature. They can hold any kind of data type. They allow transparent support of JSON payloads, allowing seemless integration with web apps.
+Dictionaries are therefore a perfect fit. They're native to Python and have excellent support in the standard library. They allow the naming of each feature. They can hold any kind of data type. They allow transparent support of JSON payloads, allowing seamless integration with web apps.
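+
+As a rough sketch of what this looks like, a single sample is nothing more than a plain dictionary (the feature names and values here are invented for illustration):
+
+```python
+x = {"shop": "Ikea", "item": "Billy bookcase", "price": 59.99}
+```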
 
 ## Datasets
 
-In production, you're almost always going to face data streams which you have to react to. Such as users visiting your website. The advantage of online machine learning is that you can design models which make predictions as well as learn from this data stream as it flows.
+In production, you're almost always going to face data streams which you have to react to, such as users visiting your website. The advantage of online machine learning is that you can design models that make predictions as well as learn from this data stream as it flows.
 
-But of course, when you're developping a model, you don't usually have access to a real-time feed on which to evaluate your model. You usually have an offline dataset which you want to evaluate your model on. River provides some datasets which can be read in online manner, one sample at a time. It is however crucial to keep in mind that the goal is to reproduce a production scenario as closely as possible, in order to ensure your model will perform just as well in production.
+But of course, when you're developing a model, you don't usually have access to a real-time feed on which to evaluate your model. You usually have an offline dataset which you want to evaluate your model on. River provides some datasets which can be read in an online manner, one sample at a time. It is however crucial to keep in mind that the goal is to reproduce a production scenario as closely as possible, in order to ensure your model will perform just as well in production.
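+
+For instance, here is a minimal sketch that loops over the bundled `datasets.Phishing` dataset one sample at a time (any of the bundled datasets iterates the same way):
+
+```python
+from river import datasets
+
+for x, y in datasets.Phishing():
+    # x is a dict of features, y is the label
+    print(x, y)
+    break
+```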
 
@@ -58,4 +58,4 @@ This is what makes online machine learning powerful. By replaying datasets in th
 
 The main reason why an offline model might not perform as expected in production is because of concept drift. But this is true for all machine learning models, be they offline or online.
 
-The advantage of online models over offline models is that they can cope with drift. Indeed, because they can keep learning, they usually adapt to concept drift in a seemless manner. As opposed to batch models which have to be retrained from scratch.
+The advantage of online models over offline models is that they can cope with drift. Indeed, because they can keep learning, they usually adapt to concept drift in a seamless manner, as opposed to batch models, which have to be retrained from scratch.
diff --git a/docs/introduction/getting-started/concept-drift-detection.ipynb b/docs/introduction/getting-started/concept-drift-detection.ipynb
index 88b23030de..922936ec1e 100644
--- a/docs/introduction/getting-started/concept-drift-detection.ipynb
+++ b/docs/introduction/getting-started/concept-drift-detection.ipynb
@@ -20,9 +20,9 @@
     "\n",
     "Concept drifts might happen in the electricity demand across the year, in the stock market, in buying preferences, and in the likelihood of a new movie's success, among others.\n",
     "\n",
-    "Let us consider the movie example: two movies made at different epochs can have similar features such as famous actors/directors, storyline, production budget, marketing campaigns, etc., yet it is not certain that both will be similarly successful. What the target audience *considers* is worth watching (and their money) is constantly changing, and production companies must adapt accordingly to avoid \"box office flops\".\n",
+    "Let us consider the movie example: two movies made at different epochs can have similar features such as famous actors/directors, storyline, production budget, marketing campaigns, etc., yet it is not certain that both will be similarly successful. What the target audience *considers* worth watching (and worth spending their money on) is constantly changing, and production companies must adapt accordingly to avoid \"box office flops\".\n",
     "\n",
-    "Prior to the pandemics, the usage of hand sanitizers and facial masks was not widespread. When the cases of COVID-19 started increasing, there was a lack of such products for the final consumer. Imagine a batch-learning model deciding how much of each product a supermarket should stock during those times. What a mess!\n",
+    "Prior to the pandemic, the usage of hand sanitizers and facial masks was not widespread. When the cases of COVID-19 started increasing, there was a lack of such products for the end consumer. Imagine a batch-learning model deciding how much of each product a supermarket should stock during those times. What a mess!\n",
     "\n",
     "## Impact of drift on learning\n",
     "\n",
diff --git a/docs/introduction/why-use-river.md b/docs/introduction/why-use-river.md
index af110b006f..81efb4d41e 100644
--- a/docs/introduction/why-use-river.md
+++ b/docs/introduction/why-use-river.md
@@ -10,7 +10,7 @@ In the streaming setting, data can evolve. Adaptive methods are specifically des
 
 ## General purpose
 
-River supports different machine learning tasks, including regression, classification, and unsupervised learning. It can also be used for adhoc tasks, such as computing online metrics, as well as concept drift detection.
+River supports different machine learning tasks, including regression, classification, and unsupervised learning. It can also be used for ad hoc tasks, such as computing online metrics, as well as concept drift detection.
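+
+As a small sketch of the ad hoc side of things, here is a running mean computed one value at a time (the input values are invented for illustration):
+
+```python
+from river import stats
+
+mean = stats.Mean()
+for x in [5, 10, 15]:
+    mean.update(x)  # each value is processed on the fly
+
+print(mean.get())  # 10.0
+```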
 
 ## User experience
 
diff --git a/docs/releases/0.4.1.md b/docs/releases/0.4.1.md
index c9bd6c8694..d2be0a7cd0 100644
--- a/docs/releases/0.4.1.md
+++ b/docs/releases/0.4.1.md
@@ -19,7 +19,7 @@
 
 ## ensemble
 
-- Removed `ensemble.HedgeBinaryClassifier` because it's performance was subpar.
+- Removed `ensemble.HedgeBinaryClassifier` because its performance was subpar.
 - Removed `ensemble.GroupRegressor`, as this should be a special case of `ensemble.StackingRegressor`.
 
 ## feature_extraction
diff --git a/river/compose/pipeline.py b/river/compose/pipeline.py
index 66b93e9f1b..2c894ede04 100644
--- a/river/compose/pipeline.py
+++ b/river/compose/pipeline.py
@@ -122,7 +122,7 @@ class Pipeline(base.Estimator):
    """A pipeline of estimators.
 
    Pipelines allow you to chain different steps into a sequence. Typically, when doing supervised
-    learning, a pipeline contains one ore more transformation steps, whilst it's is a regressor or
+    learning, a pipeline contains one or more transformation steps, whilst the last step is a regressor or
    a classifier. It is highly recommended to use pipelines with River. Indeed, in an online
    learning setting, it is very practical to have a model defined as a single object. Take a look
    at the [user guide](/recipes/pipelines) for further information and
diff --git a/river/imblearn/hard_sampling.py b/river/imblearn/hard_sampling.py
index a7eb07e745..b4c1faf1ba 100644
--- a/river/imblearn/hard_sampling.py
+++ b/river/imblearn/hard_sampling.py
@@ -78,7 +78,7 @@ class HardSamplingRegressor(HardSampling, base.Regressor):
 
    The hardness of an observation is evaluated with a loss function that compares the sample's
    ground truth with the wrapped model's prediction. If the buffer is not full, then the sample
    is added to the buffer. If the buffer is full and the new sample has a bigger loss than the
-    lowest loss in the buffer, then the sample takes it's place.
+    lowest loss in the buffer, then the sample takes its place.
 
    Parameters
    ----------
@@ -159,7 +159,7 @@ class HardSamplingClassifier(HardSampling, base.Classifier):
 
    The hardness of an observation is evaluated with a loss function that compares the sample's
    ground truth with the wrapped model's prediction. If the buffer is not full, then the sample
    is added to the buffer. If the buffer is full and the new sample has a bigger loss than the
-    lowest loss in the buffer, then the sample takes it's place.
+    lowest loss in the buffer, then the sample takes its place.
 
    Parameters
    ----------
diff --git a/river/neighbors/knn_classifier.py b/river/neighbors/knn_classifier.py
index b8da74b437..a3e12f3c08 100644
--- a/river/neighbors/knn_classifier.py
+++ b/river/neighbors/knn_classifier.py
@@ -24,7 +24,7 @@ class KNNClassifier(base.Classifier):
        documentation of each available search engine for more details on its usage. By default,
        use the `SWINN` search engine for approximate search queries.
    weighted
-        Weight the contribution of each neighbor by it's inverse distance.
+        Weight the contribution of each neighbor by its inverse distance.
    cleanup_every
        This determines at which rate old classes are cleaned up. Classes that have been seen in
        the past but that are not present in the current
diff --git a/river/optim/losses.py b/river/optim/losses.py
index 559bc9b894..bc03a9fe2f 100644
--- a/river/optim/losses.py
+++ b/river/optim/losses.py
@@ -67,7 +67,7 @@ class Absolute(RegressionLoss):
 
    $$L = |p_i - y_i|$$
 
-    It's gradient w.r.t. to $p_i$ is
+    Its gradient w.r.t. $p_i$ is
 
    $$\\frac{\\partial L}{\\partial p_i} = sgn(p_i - y_i)$$
 
@@ -203,7 +203,7 @@ class Hinge(BinaryLoss):
 
    $$L = max(0, 1 - p_i * y_i)$$
 
-    It's gradient w.r.t. to $p_i$ is
+    Its gradient w.r.t. $p_i$ is
 
    $$
-    \\frac{\\partial L}{\\partial y_i} = \\left\{
+    \\frac{\\partial L}{\\partial p_i} = \\left\{
@@ -404,7 +404,7 @@ class Squared(RegressionLoss):
 
    $$L = (p_i - y_i) ^ 2$$
 
-    It's gradient w.r.t. to $p_i$ is
+    Its gradient w.r.t. $p_i$ is
 
    $$\\frac{\\partial L}{\\partial p_i} = 2 (p_i - y_i)$$
 
@@ -539,7 +539,7 @@ class Poisson(RegressionLoss):
 
    $$L = exp(p_i) - y_i \\times p_i$$
 
-    It's gradient w.r.t. to $p_i$ is
+    Its gradient w.r.t. $p_i$ is
 
    $$\\frac{\\partial L}{\\partial p_i} = exp(p_i) - y_i$$
 
diff --git a/river/stats/link.py b/river/stats/link.py
index 9f27efae11..459dd4038a 100644
--- a/river/stats/link.py
+++ b/river/stats/link.py
@@ -37,7 +37,7 @@ class Link(stats.base.Univariate):
 
    >>> stat.update(1)
 
-    The output from `get` will still be 0. The reason is that `stats.Shift` has not enough
-    values, and therefore outputs it's default value, which is `None`. The `stats.Mean`
+    The output from `get` will still be 0. The reason is that `stats.Shift` does not have enough
+    values, and therefore outputs its default value, which is `None`. The `stats.Mean`
    instance is therefore not updated.
 
    >>> stat.get()
@@ -57,7 +57,7 @@
 
    >>> stat.get()
    2.0
 
-    Note that composing statistics returns a new statistic with it's own name.
+    Note that composing statistics returns a new statistic with its own name.
 
    >>> stat.name
    'mean_of_shift_1'
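+
+    The same pattern composes further. As a rough sketch, the name below is inferred
+    from the convention above rather than taken from the library's documentation:
+
+    >>> stat = stats.Shift(2) | stats.Mean()
+    >>> stat.name
+    'mean_of_shift_2'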