Modin vs. pandas
================

Modin exposes the pandas API through ``modin.pandas``, but it does not inherit the
pitfalls and design decisions that make pandas difficult to scale. This page discusses
how Modin's dataframe implementation differs from pandas and how Modin scales pandas.

Scalability of implementation
-----------------------------

The pandas implementation is inherently single-threaded, which means that only one of
your CPU cores can be used at any given time. On a laptop, pandas looks something like
this:

.. image:: /img/pandas_multicore.png
   :alt: pandas is single-threaded!
   :align: center
   :scale: 80%

Modin's implementation, however, enables you to use all of the cores on your machine, or
all of the cores in an entire cluster. On a laptop, Modin looks something like this:

.. image:: /img/modin_multicore.png
   :alt: modin uses all of the cores!
   :align: center
   :scale: 80%

The additional utilization leads to improved performance. If you want to scale to an
entire cluster, Modin looks something like this:

.. image:: /img/modin_cluster.png
   :alt: modin works on a cluster too!
   :align: center
   :scale: 30%

Modin is able to efficiently make use of all of the hardware available to it!

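The parallelism described above can be illustrated with a minimal sketch that uses only
the Python standard library. It is not how Modin is implemented; it only shows the general
idea of splitting rows into partitions, processing each partition in a separate worker
process, and combining the partial results. The helper names (``split_into_partitions``,
``partial_sum``, ``parallel_sum_of_squares``) are hypothetical.

.. code-block:: python

   import os
   from concurrent.futures import ProcessPoolExecutor


   def split_into_partitions(values, n_parts):
       """Split a list into n_parts contiguous row partitions of near-equal size."""
       size, remainder = divmod(len(values), n_parts)
       partitions, start = [], 0
       for i in range(n_parts):
           # The first `remainder` partitions each take one extra row.
           end = start + size + (1 if i < remainder else 0)
           partitions.append(values[start:end])
           start = end
       return partitions


   def partial_sum(partition):
       """Work done independently on one partition: a sum of squares."""
       return sum(x * x for x in partition)


   def parallel_sum_of_squares(values, n_workers=None):
       """Process each partition in its own worker process, then combine the results."""
       n_workers = n_workers or os.cpu_count() or 1
       partitions = split_into_partitions(values, n_workers)
       with ProcessPoolExecutor(max_workers=n_workers) as pool:
           # pool.map runs partial_sum on each partition in a separate process.
           return sum(pool.map(partial_sum, partitions))

Call ``parallel_sum_of_squares(list(range(1_000_000)))`` from under an
``if __name__ == "__main__":`` guard; the result matches the single-process
``sum(x * x for x in data)``, but the work is spread across the available cores. Worker
processes are used instead of threads because CPython's global interpreter lock keeps
pure-Python threads from running this kind of loop in parallel.
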
Memory usage and immutability
-----------------------------

The pandas API contains many operations that support in-place updates through the
``inplace`` parameter, and these are known to be controversial. This is due in part to
the way pandas manages memory: users may think they are saving memory, but pandas
usually copies the data whether or not the operation is performed in place.

Modin supports in-place semantics, but unlike in pandas, the underlying data structures
within Modin's implementation are immutable. Because these structures are never
modified, Modin can chain operators internally and manage memory layouts more
effectively. In many common cases this reduces memory usage compared to pandas, since
dataframes can share common memory blocks.

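A toy model helps show why immutability enables sharing. In the hypothetical
``ImmutableFrame`` below, which is an illustration and not Modin's internal data
structure, every column is stored as an immutable tuple, so deriving a new frame copies
only references to existing columns, never the values themselves.

.. code-block:: python

   from collections.abc import Mapping
   from dataclasses import dataclass
   from types import MappingProxyType


   @dataclass(frozen=True)
   class ImmutableFrame:
       """Toy immutable dataframe: column name -> immutable tuple of values."""

       columns: Mapping[str, tuple]

       @staticmethod
       def from_dict(data):
           # Freeze each column into a tuple and wrap the dict in a read-only view.
           return ImmutableFrame(MappingProxyType({k: tuple(v) for k, v in data.items()}))

       def with_column(self, name, values):
           """Return a new frame with one extra column; existing columns are shared."""
           new_columns = dict(self.columns)  # copies references, not column data
           new_columns[name] = tuple(values)
           return ImmutableFrame(MappingProxyType(new_columns))

       def select(self, *names):
           """Return a new frame holding only the requested (shared) columns."""
           return ImmutableFrame(MappingProxyType({n: self.columns[n] for n in names}))


   df1 = ImmutableFrame.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]})
   df2 = df1.with_column("c", [7, 8, 9])
   df3 = df2.select("a", "c")
   # Column "a" is the very same object in all three frames: nothing was copied.
   print(df1.columns["a"] is df2.columns["a"] is df3.columns["a"])  # True

Because no frame can modify a column in place, handing the same tuple to several frames
is safe; a mutable design would have to copy defensively instead.
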
Modin provides in-place semantics through a mutable pointer to the immutable internal
Modin dataframe. The pointer can change, but the underlying data cannot, so when an
in-place update is triggered, Modin treats it as an ordinary out-of-place operation and
simply updates the pointer to the resulting Modin dataframe.

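The pointer-swap approach can be sketched the same way. The hypothetical ``FrameHandle``
class below plays the role of the user-facing dataframe; its ``fillna`` method is a
simplified stand-in, modeled loosely on the pandas method of the same name, that always
builds a new immutable frame and, when ``inplace=True``, merely repoints the handle.

.. code-block:: python

   from types import MappingProxyType


   class FrameHandle:
       """User-facing object holding a mutable reference to an immutable frame."""

       def __init__(self, frame):
           self._frame = frame  # read-only mapping: column name -> tuple

       @classmethod
       def from_dict(cls, data):
           return cls(MappingProxyType({k: tuple(v) for k, v in data.items()}))

       def fillna(self, value, inplace=False):
           # Always build a brand-new immutable frame; the old one is never touched.
           new_frame = MappingProxyType({
               name: tuple(value if x is None else x for x in col)
               for name, col in self._frame.items()
           })
           if inplace:
               # "In-place" only repoints this handle at the new frame.
               self._frame = new_frame
               return None
           return FrameHandle(new_frame)


   df = FrameHandle.from_dict({"a": [1, None, 3]})
   old_frame = df._frame
   df.fillna(0, inplace=True)
   print(dict(df._frame))   # {'a': (1, 0, 3)}
   print(dict(old_frame))   # {'a': (1, None, 3)}

Any other handle that still referenced ``old_frame`` would keep seeing the original
values, which is what makes sharing the underlying blocks safe.
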
API vs. implementation
----------------------

It is well known that the pandas API contains many duplicate ways of performing the same
operation. Modin instead ensures that each behavior has one and only one internal
implementation. This guarantee lets Modin focus on and optimize a smaller code footprint
while still covering the entire pandas API. Modin's internal algebra consists of roughly
15 operators, narrowed down from the more than 200 that exist in pandas. The algebra is
grounded in both practical and theoretical work; learn more in our `VLDB 2020 paper`_.
More information about this algebra can be found in the :doc:`../developer/architecture`
documentation.

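The "one behavior, one implementation" idea can be sketched as a tiny class whose public
methods are all thin wrappers over two internal operators, an element-wise map and a
reduction. The class and the operator names ``_map`` and ``_reduce`` are hypothetical and
are not Modin's actual algebra; they only illustrate how several user-facing methods can
share a single code path.

.. code-block:: python

   import math
   from typing import Callable, Iterable


   class TinyAlgebra:
       """Toy 1-D frame whose public methods all funnel into two internal operators."""

       def __init__(self, values: Iterable):
           self._values = tuple(values)

       # ---- internal algebra: the only code paths that touch the data ----
       def _map(self, func: Callable) -> "TinyAlgebra":
           """Apply func to every element, producing a new frame."""
           return TinyAlgebra(func(v) for v in self._values)

       def _reduce(self, func: Callable, initial):
           """Fold all elements into a single value."""
           acc = initial
           for v in self._values:
               acc = func(acc, v)
           return acc

       # ---- public API: many methods, each a thin wrapper over the algebra ----
       def abs(self):
           return self._map(abs)  # the builtin abs, applied element-wise

       def isna(self):
           return self._map(lambda v: v is None or (isinstance(v, float) and math.isnan(v)))

       def notna(self):
           return self.isna()._map(lambda v: not v)

       def sum(self):
           return self._reduce(lambda acc, v: acc + v, 0)

       def count(self):
           # Count non-missing values by composing existing operators.
           return self.notna()._reduce(lambda acc, v: acc + int(v), 0)

       def to_list(self):
           return list(self._values)


   s = TinyAlgebra([-1.0, 2.0, float("nan")])
   print(s.abs().to_list()[:2])  # [1.0, 2.0]
   print(s.count())              # 2

Here ``count`` is composed from ``notna`` and ``_reduce`` rather than written as a
separate loop, so optimizing ``_map`` or ``_reduce`` benefits every method built on top
of them.
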
.. _VLDB 2020 paper: https://arxiv.org/abs/2001.00888