REFACTOR-#4513: Fix spelling mistakes in docs and docstrings (#4514)
Co-authored-by: Rehan Sohail Durrani <rdurrani@berkeley.edu>
Signed-off-by: jeffreykennethli <jkli@ponder.io>
jeffreykennethli and RehanSD authored Jun 6, 2022
1 parent c1d5dbd commit 57e29bc
Showing 43 changed files with 152 additions and 151 deletions.
2 changes: 1 addition & 1 deletion docs/development/contributing.rst
@@ -13,7 +13,7 @@ want to review in order to get started.

Also, feel free to join the discussions on the `developer mailing list`_.

-If you want a quick guide to getting your development enviroment setup, please
+If you want a quick guide to getting your development environment setup, please
use `the contributing instructions on GitHub`_.

Certificate of Origin
2 changes: 1 addition & 1 deletion docs/development/partition_api.rst
@@ -9,7 +9,7 @@ from raw futures objects.
Partition IPs
-------------
For finer grained placement control, Modin also provides an API to get the IP addresses of the nodes that hold each partition.
-You can pass the partitions having needed IPs to your function. It can help with minimazing of data movement between nodes.
+You can pass the partitions having needed IPs to your function. It can help with minimizing data movement between nodes.
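
A hedged sketch of what this looks like in practice, using the ``unwrap_partitions`` helper from Modin's partition API (treat the exact signature as an assumption and check the API reference for your Modin version):

.. code-block:: python

    import modin.pandas as pd
    from modin.distributed.dataframe.pandas import unwrap_partitions

    df = pd.read_csv("large.csv")

    # Ask for (IP, partition) pairs so work can be scheduled near the data.
    partitions_with_ips = unwrap_partitions(df, axis=0, get_ip=True)
    for ip, partition in partitions_with_ips:
        print(ip, partition)  # e.g. route `partition` to a worker on `ip`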

Partition API implementations
-----------------------------
2 changes: 1 addition & 1 deletion docs/flow/modin/config.rst
@@ -5,7 +5,7 @@ Modin Configuration Settings

To adjust Modin's default behavior, you can set the value of Modin
configs by setting an environment variable or by using the
-``modin.config`` API. To list all avaliable configs in Modin, please
+``modin.config`` API. To list all available configs in Modin, please
run ``python -m modin.config`` to print all
Modin configs with descriptions.
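
For example, a minimal sketch of both approaches (``Engine`` and ``NPartitions`` are existing Modin configs; consult the generated list for everything else):

.. code-block:: python

    import modin.config as cfg

    cfg.Engine.put("dask")    # same effect as: export MODIN_ENGINE=dask
    cfg.NPartitions.put(16)   # same effect as: export MODIN_NPARTITIONS=16
    print(cfg.Engine.get())   # -> "dask"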

2 changes: 1 addition & 1 deletion docs/flow/modin/core/dataframe/index.rst
@@ -3,7 +3,7 @@
Core Modin Dataframe Objects
============================

-Modin paritions data to scale efficiently.
+Modin partitions data to scale efficiently.
To keep track of everything a few key classes are introduced: ``Dataframe``, ``Partition``, ``AxisPartiton`` and ``PartitionManager``.

* ``Dataframe`` is the class conforming to Dataframe Algebra.
2 changes: 1 addition & 1 deletion docs/flow/modin/core/dataframe/pandas/dataframe.rst
@@ -8,7 +8,7 @@ The class serves as the intermediate level
between ``pandas`` query compiler and conforming partition manager. All queries formed
at the query compiler layer are ingested by this class and then conveyed jointly with the stored partitions
into the partition manager for processing. Direct partitions manipulation by this class is prohibited except
-cases if an operation is striclty private or protected and called inside of the class only. The class provides
+cases if an operation is strictly private or protected and called inside of the class only. The class provides
significantly reduced set of operations that fit plenty of pandas operations.

Main tasks of :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe` are storage of partitions, manipulation with labels of axes and
@@ -18,7 +18,7 @@ Partition manager can apply user-passed (arbitrary) function in different modes:

* block-wise (apply a function to individual block partitions):

-* optinally accepting partition indices along each axis
+* optionally accepting partition indices along each axis
* optionally accepting an item to be split so parts of it would be sent to each partition

* along a full axis (apply a function to an entire column or row made up of block partitions when user function needs information about the whole axis)
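
The difference between the two modes can be pictured with plain pandas (a conceptual toy, not the actual ``PandasDataframePartitionManager`` API):

.. code-block:: python

    import numpy as np
    import pandas as pd

    # A 2x2 grid of pandas DataFrames standing in for block partitions.
    blocks = [[pd.DataFrame(np.arange(4).reshape(2, 2)) for _ in range(2)]
              for _ in range(2)]

    # Block-wise mode: the function sees one block at a time.
    doubled = [[blk * 2 for blk in row] for row in blocks]

    # Full-axis mode: a cumulative sum needs the entire column, so the
    # column of blocks is glued together before the function is applied.
    first_column = pd.concat([row[0] for row in blocks], axis=0)
    cumulative = first_column.cumsum()
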
@@ -2,7 +2,7 @@ PandasOnDaskDataframePartitionManager
"""""""""""""""""""""""""""""""""""""

This class is the specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager`
-using Dask as the execution engine. This class is responsible for partition manipulation and applying a funcion to
+using Dask as the execution engine. This class is responsible for partition manipulation and applying a function to
block/row/column partitions.

Public API
@@ -33,7 +33,7 @@ PandasOnPython Dataframe implementation
This page describes implementation of :doc:`Modin PandasDataframe Objects </flow/modin/core/dataframe/pandas/index>`
specific for `PandasOnPython` execution. Since Python engine doesn't allow computation parallelization,
operations on partitions are performed sequentially. The absence of parallelization doesn't give any
-perfomance speed-up, so ``PandasOnPython`` is used for testing purposes only.
+performance speed-up, so ``PandasOnPython`` is used for testing purposes only.

* :doc:`PandasOnPythonDataframe <dataframe>`
* :doc:`PandasOnPythonDataframePartition <partitioning/partition>`
@@ -3,7 +3,7 @@ PandasOnPythonDataframePartition

The class is specific implementation of :py:class:`~modin.core.dataframe.pandas.partitioning.partition_manager.PandasDataframePartitionManager`
using Python as the execution engine. This class is responsible for partitions manipulation and applying
-a funcion to block/row/column partitions.
+a function to block/row/column partitions.

Public API
----------
@@ -2,7 +2,7 @@ PandasOnRayDataframePartitionManager
""""""""""""""""""""""""""""""""""""

This class is the specific implementation of :py:class:`~modin.core.execution.ray.generic.partitioning.GenericRayDataframePartitionManager`
-using Ray distributed engine. This class is responsible for partition manipulation and applying a funcion to
+using Ray distributed engine. This class is responsible for partition manipulation and applying a function to
block/row/column partitions.

Public API
37 changes: 18 additions & 19 deletions docs/flow/modin/core/io/index.rst
@@ -6,34 +6,33 @@ IO Module Description
Dispatcher Classes Workflow Overview
''''''''''''''''''''''''''''''''''''

-Call from ``read_*`` function of execution-specific IO class (for example, ``PandasOnRayIO`` for
-Ray engine and pandas storage format) is forwarded to the ``_read`` function of file
+Calls from ``read_*`` functions of execution-specific IO classes (for example, ``PandasOnRayIO`` for
+Ray engine and pandas storage format) are forwarded to the ``_read`` function of the file
format-specific class (for example ``CSVDispatcher`` for CSV files), where function parameters are
-preprocessed to check if they are supported (otherwise default pandas implementation
-is used) and compute some metadata common for all partitions. Then file is splitted
-into chunks (mechanism of splitting is described below) and using this data, tasks
-are launched on the remote workers. After remote tasks are finished, additional
-results postprocessing is performed, and new query compiler with imported data will
+preprocessed to check if they are supported (defaulting to pandas if not)
+and common metadata is computed for all partitions. The file is then split
+into chunks (splitting mechanism described below) and the data is used to launch tasks
+on the remote workers. After the remote tasks finish, additional
+postprocessing is performed on the results, and a new query compiler with the imported data will
be returned.
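
From the user's side, all of this is triggered by an ordinary pandas-style call (the step comments map onto the workflow above; a sketch of the flow, not the internal call chain verbatim):

.. code-block:: python

    import modin.pandas as pd

    # 1. pd.read_csv -> execution-specific IO class -> CSVDispatcher._read
    # 2. parameters checked; unsupported ones fall back to default pandas
    # 3. file split into chunks, parse tasks launched on remote workers
    # 4. results postprocessed into a new query compiler backing `df`
    df = pd.read_csv("large_dataset.csv")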

Data File Splitting Mechanism
'''''''''''''''''''''''''''''

-Modin file splitting mechanism differs depending on the data format type:
+Modin's file splitting mechanism differs depending on the data format type:

-* text format type - file is splitted into bytes according user specified needs.
+* text format type - the file is split into bytes according to user-specified arguments.
In the simplest case, when no row related parameters (such as ``nrows`` or
-``skiprows``) are passed, data chunks limits (start and end bytes) are derived
-by just roughly dividing the file size by the number of partitions (chunks can
+``skiprows``) are passed, data chunk limits (start and end bytes) are derived
+by dividing the file size by the number of partitions (chunks can
slightly differ between each other because usually end byte may occurs inside a
line and in that case the last byte of the line should be used instead of initial
-value). In other cases the same splitting into bytes is used, but chunks sizes are
+value). In other cases the same splitting mechanism is used, but chunk sizes are
defined according to the number of lines that each partition should contain.

-* columnar store type - file is splitted by even distribution of columns that should
-be read between chunks.
+* columnar store type - the file is split so that each chunk contains approximately the same number of columns.

-* SQL type - chunking is obtained by wrapping initial SQL query into query that
+* SQL type - chunking is obtained by wrapping the initial SQL query with a query that
specifies initial row offset and number of rows in the chunk.
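
A rough sketch of the byte-splitting idea for the text format type (illustrative only; ``compute_chunk_offsets`` is a made-up name, not a Modin internal):

.. code-block:: python

    import os

    def compute_chunk_offsets(path, num_partitions):
        """Split a file into byte ranges, aligning boundaries to newlines."""
        file_size = os.path.getsize(path)
        target = file_size // num_partitions  # rough chunk size
        offsets, start = [], 0
        with open(path, "rb") as f:
            for _ in range(num_partitions - 1):
                f.seek(start + target)  # jump to the rough boundary
                f.readline()            # advance to the end of the current line
                end = min(f.tell(), file_size)
                offsets.append((start, end))
                start = end
        offsets.append((start, file_size))  # last chunk takes the remainder
        return offsets

    # For the SQL type the same idea lives in the query itself, e.g.:
    #   SELECT * FROM (<initial query>) LIMIT <rows_per_chunk> OFFSET <row_offset>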

After file splitting is complete, chunks data is passed to the parser functions
@@ -121,10 +120,10 @@ of ``header`` and ``skiprows`` parameters:
df = pandas.read_csv(StringIO(data), skiprows=[2, 3, 4], header=2)
In the examples above list-like ``skiprows`` values are fixed and ``header`` is varied. In the first
-example with no ``header`` provided, rows 2, 3, 4 are skipped and row 0 is considered as a header.
-In the second example ``header == 1``, so 0th row is skipped and the next available row is
-considered as a header. The third example shows the case when ``header`` and ``skiprows`` parameters
-values are intersected - in this case skipped rows are dropped first and only then ``header`` is got
+example with no ``header`` provided, rows 2, 3, 4 are skipped and row 0 is considered as the header.
+In the second example ``header == 1``, so the zeroth row is skipped and the next available row is
+considered the header. The third example illustrates when the ``header`` and ``skiprows`` parameters
+values are both present - in this case ``skiprows`` rows are dropped first and then the ``header`` is derived
from the remaining rows (rows before header are skipped too).

In the examples above only list-like ``skiprows`` and integer ``header`` parameters are considered,
5 changes: 3 additions & 2 deletions docs/flow/modin/core/storage_formats/index.rst
@@ -13,8 +13,8 @@ limited to the objects that conform to pandas API. There are formats that are ab
SQL-like databases (:doc:`OmniSci storage format </flow/modin/experimental/core/storage_formats/omnisci/index>`)
inside Modin Dataframe's partitions.

-An honor of converting high-level pandas API calls to the ones that are understandable
-by the corresponding execution implementation belongs to the Query Compiler (QC) object.
+The storage format + execution engine (Ray, Dask, etc.) form the execution backend.
+The Query Compiler (QC) converts high-level pandas API calls to queries that are understood
+by the execution backend.

.. _query_compiler_def:

7 changes: 3 additions & 4 deletions docs/flow/modin/core/storage_formats/pandas/parsers.rst
@@ -8,10 +8,9 @@ and util functions for handling parsing results. ``PandasParser`` is base class
classes with pandas storage format, that contains methods common for all child classes. Other
module classes implement ``parse`` function that performs parsing of specific format data
basing on the chunk information computed in the ``modin.core.io`` module. After
-chunk data parsing is completed, resulting ``DataFrame``-s will be splitted into smaller
-``DataFrame``-s according to ``num_splits`` parameter, data type and number or
-rows/columns in the parsed chunk, and then these frames and some additional metadata will
-be returned.
+the chunk is parsed, the resulting ``DataFrame``-s will be split into smaller
+``DataFrame``-s according to the ``num_splits`` parameter, data type, or number of
+rows/columns in the parsed chunk. These frames, along with some additional metadata, are then returned.
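
The row-wise part of that split can be pictured like this (a toy sketch, not the actual parser code):

.. code-block:: python

    import numpy as np
    import pandas as pd

    def split_parsed_chunk(df: pd.DataFrame, num_splits: int):
        """Split one parsed chunk row-wise; return pieces plus metadata."""
        splits = np.array_split(df, num_splits)  # num_splits smaller frames
        metadata = (df.index, df.dtypes)         # metadata returned alongside
        return splits, metadata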

.. note::
If you are interested in the data parsing mechanism implementation details, please refer
@@ -108,14 +108,14 @@ e.g. validating a parameter from the query and defining specific intermediate va
to provide more context to the query compiler.

The :py:class:`~modin.experimental.core.storage_formats.omnisci.query_compiler.DFAlgQueryCompiler`
-is responsible for reducing the recieved query to the pre-defined Dataframe algebra operators
-and pass their execution to the
+is responsible for reducing the query to the pre-defined Dataframe algebra operators
+and triggering execution on the
:py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe`.

-When :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe`
-recieves a query it determines whether the operation requires data materialization
-or can be performed lazily. Depending on that the operation is either appended to a
-lazy computation tree or executed.
+When the :py:class:`~modin.experimental.core.execution.native.implementations.omnisci_on_native.dataframe.dataframe.OmnisciOnNativeDataframe`
+receives a query, it determines whether the operation requires data materialization
+or whether it can be performed lazily. The operation is then either appended to a
+lazy computation tree or executed immediately.

Lazy execution
""""""""""""""
@@ -1,7 +1,7 @@
:orphan:

-IO module Description For Pandas-on-Ray Excecution
-""""""""""""""""""""""""""""""""""""""""""""""""""
+IO module Description For Pandas-on-Ray Execution
+"""""""""""""""""""""""""""""""""""""""""""""""""

High-Level Module Overview
''''''''''''''''''''''''''
@@ -25,8 +25,8 @@ statement as follows:
Submodules Description
''''''''''''''''''''''

-``modin.experimental.core.execution.ray.implementations.pandas_on_ray`` module is used mostly for storing utils and
-functions for experimanetal IO class:
+The ``modin.experimental.core.execution.ray.implementations.pandas_on_ray`` module primarily houses utils and
+functions for the experimental IO class:

* ``io.py`` - submodule containing IO class and parse functions, which are responsible
for data processing on the workers.
@@ -18,10 +18,10 @@ by the pandas creator, pandas internal architecture is not optimal and sometimes
needs up to ten times more memory than the original dataset size
(note, that pandas rule of thumb: `have 5 to 10 times as much RAM as the size of your
dataset`). In order to fix this issue (or at least to reduce needed memory amount and
-needed data copying), ``PyArrow-on-Ray`` module was added. Due to optimized architecture
-of PyArrow Tables, number of needed copies can be decreased `down to zero
+needed data copying), ``PyArrow-on-Ray`` module was added. Due to the optimized architecture
+of PyArrow Tables, `no additional copies are needed
<https://arrow.apache.org/docs/python/pandas.html#zero-copy-series-conversions>`_ in some
-corner cases, that can signifficantly improve Modin performance. The downside of this approach
-is that PyArrow and pandas do not support the same APIs and some functions/parameters can have
-incompatibilities or output different results, so for now ``PyArrow-on-Ray`` engine is
+corner cases, which can significantly improve Modin performance. The downside of this approach
+is that PyArrow and pandas do not support the same APIs and some functions/parameters may have
+different signatures or output different results, so for now the ``PyArrow-on-Ray`` engine is
under development and marked as experimental.
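
The zero-copy behavior referenced above can be observed directly in PyArrow (assuming a column type pandas can view without copying, such as null-free float64):

.. code-block:: python

    import pyarrow as pa

    arr = pa.array([1.0, 2.0, 3.0])
    # Raises instead of silently copying when zero-copy is impossible.
    series = arr.to_pandas(zero_copy_only=True)
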
7 changes: 4 additions & 3 deletions docs/flow/modin/experimental/xgboost.rst
@@ -48,9 +48,10 @@ Internal functions :py:func:`~modin.experimental.xgboost.xgboost_ray._train` and
Training
********

-1. The data is passed to :py:func:`~modin.experimental.xgboost.xgboost_ray._train`
-function as a :py:class:`~modin.experimental.xgboost.DMatrix` object. Using an iterator of
-:py:class:`~modin.experimental.xgboost.DMatrix`, lists of ``ray.ObjectRef`` with row partitions of Modin DataFrame are exctracted. Example:
+1. The data is passed to the :py:func:`~modin.experimental.xgboost.xgboost_ray._train`
+function as a :py:class:`~modin.experimental.xgboost.DMatrix` object. Lists of ``ray.ObjectRef``
+corresponding to row partitions of Modin DataFrames are extracted by iterating over the
+:py:class:`~modin.experimental.xgboost.DMatrix`. Example:

.. code-block:: python
4 changes: 2 additions & 2 deletions docs/getting_started/quickstart.rst
@@ -137,8 +137,8 @@ create the large dataframe, while pandas took close to a minute.
Faster ``apply`` over a single column
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The performance benefits of Modin becomes aparent when we operate on large
-gigabyte-scale datasets. For example, let's say that we want to round up the number
+The performance benefits of Modin become apparent when we operate on large
+gigabyte-scale datasets. Let's say we want to round up values
across a single column via the ``apply`` operation.

.. code-block:: python