Releases: modin-project/modin

Modin 0.15.1

16 Jun 13:45
0.15.1
efdc97c

This release pins Ray < 1.13.0 to avoid a deserialization race condition.

Key Features and Updates

  • Stability and Bugfixes
    • FIX-#4566: Pin Ray < 1.13.0 to avoid deserialization race condition. (#4567)
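
To make the effect of the pin concrete, here is a minimal, stdlib-only sketch that checks whether a version string falls below the `1.13.0` bound stated above. The helper names `release_tuple` and `is_below_pin` are hypothetical and are not part of Modin or Ray; real installers apply PEP 440 rules, which this simplified comparison only approximates.

```python
def release_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # a non-numeric piece such as "dev0" ends the release part
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as "0rc1"
    return tuple(parts)


def is_below_pin(version: str, upper_bound: str = "1.13.0") -> bool:
    """Hypothetical check: is `version` strictly below the pinned upper bound?"""
    found, bound = release_tuple(version), release_tuple(upper_bound)
    # Pad with zeros so that "1.13" and "1.13.0" compare as equal.
    width = max(len(found), len(bound))
    found += (0,) * (width - len(found))
    bound += (0,) * (width - len(bound))
    return found < bound
```

With Ray installed, the same check could be applied to `ray.__version__`.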

Contributors

@mvashishtha

Modin 0.15.0

08 Jun 16:38
0.15.0
efdfbac

This release includes updated support for pandas 1.4.2, new Batch and Logging APIs, and a plethora
of bug fixes and documentation improvements.

Key Features and Updates

  • Stability and Bugfixes
    • FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
    • FIX-#3615: Relax some deps in development env (#4365)
    • FIX-#4370: Fix broken docstring links (#4375)
    • FIX-#4392: Align Modin XGBoost with xgb>=1.6 (#4393)
    • FIX-#4385: Get rid of use-deprecated option in pip (#4386)
    • FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
    • FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs (#4335)
    • FIX-#4407: Align insert function with pandas in case of numpy array with several columns (#4408)
    • FIX-#4373: Fix invalid file path when trying read_csv_glob with usecols parameter (#4405)
    • FIX-#4394: Fix issue with multiindex metadata desync (#4395)
    • FIX-#4438: Fix reindex function that doesn't preserve initial index metadata (#4442)
    • FIX-#4425: Add parameters to groupby pct_change (#4429)
    • FIX-#4457: Fix loc in case when need reindex item (#4457)
    • FIX-#4414: Add missing f prefix on f-strings found at https://codereview.doctor/ (#4415)
    • FIX-#4461: Fix S3 CSV data path (#4462)
    • FIX-#4467: drop_duplicates no longer removes items based on index values (#4468)
    • FIX-#4449: Drain the call queue before waiting on result in benchmark mode (#4472)
    • FIX-#4518: Fix Modin Logging to report specific Modin warnings/errors (#4519)
    • FIX-#4481: Allow clipping with a Modin Series of bounds (#4486)
    • FIX-#4504: Support na_action in applymap (#4505)
    • FIX-#4503: Stop the memory logging thread after session exit (#4515)
    • FIX-#4531: Fix a makedirs race condition in to_parquet (#4533)
    • FIX-#4464: Refactor Ray utils and quick fix groupby.count failing on virtual partitions (#4490)
    • FIX-#4436: Fix to_pydatetime dtype for timezone None (#4437)
    • FIX-#4541: Fix merge_asof with non-unique right index (#4542)
  • Performance enhancements
    • FEAT-#4320: Add connectorx as an alternative engine for read_sql (#4346)
    • PERF-#4493: Use partition size caches more in Modin dataframe (#4495)
  • Benchmarking enhancements
    • FEAT-#4371: Add logging to Modin (#4372)
    • FEAT-#4501: Add RSS Memory Profiling to Modin Logging (#4502)
    • FEAT-#4524: Split Modin API and Memory log files (#4526)
  • Refactor Codebase
    • REFACTOR-#4284: use variable length unpacking when getting results from deploy function (#4285)
    • REFACTOR-#3642: Move PyArrow storage format usage from main feature to experimental ones (#4374)
    • REFACTOR-#4003: Delete the deprecated cloud mortgage example (#4406)
    • REFACTOR-#4513: Fix spelling mistakes in docs and docstrings (#4514)
    • REFACTOR-#4510: Align experimental and regular IO modules initializations (#4511)
  • Developer API enhancements
    • FEAT-#4359: Add dataframe method to the protocol dataframe (#4360)
  • Update testing suite
    • TEST-#4363: Use Ray from pypi in CI (#4364)
    • FIX-#4422: get rid of case sensitivity for warns_that_defaulting_to_pandas (#4423)
    • TEST-#4426: Stop passing is_default kwarg to Modin and pandas (#4428)
    • FIX-#4439: Fix flake8 CI fail (#4440)
    • FIX-#4409: Fix eval_insert utility that doesn't actually check results of insert function (#4410)
    • TEST-#4482: Fix getitem and loc with series of bools (#4483)
  • Documentation improvements
  • Dependencies
    • FIX-#4327: Update min pin for xgboost version (#4328)
    • FIX-#4383: Remove pathlib from deps (#4384)
    • FIX-#4390: Add redis to Modin dependencies (#4396)
    • FIX-#3689: Add black and flake8 into development environment files (#4480)
    • TEST-#4516: Add numpydoc to developer requirements (#4517)
  • New Features
    • FEAT-#4412: Add Batch Pipeline API to Modin (#4452)
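
FIX-#4531 above concerns a makedirs race condition in to_parquet. The snippet below is a generic sketch of that class of bug and one common remedy; it is not the code from the Modin patch, and `ensure_directory` is a hypothetical helper name. The assumption is that several workers may try to create the same output directory at roughly the same time.

```python
import os
import tempfile


def ensure_directory(path: str) -> None:
    """Create `path` if needed, tolerating concurrent creation by another worker."""
    # Racy alternative: `if not os.path.exists(path): os.makedirs(path)` can raise
    # FileExistsError when another worker creates the directory between the check
    # and the call. Passing exist_ok=True makes the call safe to repeat.
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "dataset.parquet")
    ensure_directory(target)
    ensure_directory(target)  # a second call does not raise
    created = os.path.isdir(target)
```

Calling the helper twice stands in for two workers reaching the same step; with `exist_ok=True` neither call fails.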

Contributors

@YarShev
@Garra1980
@prutskov
@alexander3774
@amyskov
@wangxiaoying
@jeffreykennethli
@mvashishtha
@anmyachev
@dchigarev
@devin-petersohn
@jrsacher
@orcahmlee
@naren-ponder
@RehanSD

Modin 0.14.1

04 May 15:39
0.14.1
d7eb019

This release contains a few key bugfixes and a pandas version update.

Key Features and Updates

  • FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
  • FIX-#4390: Add redis to Modin dependencies (#4396)
  • FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
  • FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs. (#4335)
  • FIX-#4394: Fix issue with multiindex metadata desync (#4395)
  • FIX-#4373: fix usage of 'read_csv_glob' with 'usecols' parameter (#4405)
  • FIX-#4425: Add parameters to groupby pct_change. (#4429)

Contributors

@Garra1980, @devin-petersohn, @dchigarev, @jeffreykennethli, @mvashishtha, @YarShev, @anmyachev

Modin 0.14.0

29 Mar 16:54
c5f623f

This release contains significant upgrades to the Developer API and to Modin's documentation,
codebase refactoring, performance enhancements, and multiple bugfixes.

Key Features and Updates

  • Stability and Bugfixes
    • FIX-#4058: Allow pickling empty dataframes and series (#4095)
    • FIX-#4136: Fix exercise_3.ipynb example notebook (#4137)
    • FIX-#4105: Fix names of pandas options to avoid OptionError (#4109)
    • FIX-#3417: Fix read_csv with skiprows and header parameters (#3419)
    • FIX-#4142: Fix OmniSci enabling (#4146)
    • FIX-#4162: Use skipif instead of skip for compatibility with pytest 7.0 (#4163)
    • FIX-#4158: Do not print OmniSci logs to stdout by default (#4159)
    • FIX-#4177: Support read_feather from pathlike objects (#4177)
    • FIX-#4234: Upgrade pandas to 1.4.1 (#4235)
    • FIX-#3368: support unsigned integers in OmniSci backend (#4256)
    • FIX-#4057: Allow reading an empty parquet file (#4075)
    • FIX-#3884: Fix read_excel() dropping empty rows (#4161)
    • FIX-#4257: Fix Categorical() for scalar categories (#4258)
    • FIX-#4300: Fix Modin Categorical column dtype categories (#4276)
    • FIX-#4208: Fix lazy metadata update for PandasDataFrame.from_labels (#4209)
    • FIX-#3981, FIX-#3801, FIX-#4149: Stop broadcasting scalars to set items (#4160)
    • FIX-#4185: Fix rolling across column partitions (#4262)
    • FIX-#4303: Fix the syntax error in reading from postgres (#4304)
    • FIX-#4308: Add proper error handling in df.set_index (#4309)
    • FIX-#4056: Allow an empty parse_date list in read_csv_glob (#4074)
    • FIX-#4312: Fix constructing categorical frame with duplicate column names (#4313)
    • FIX-#4314: Allow passing a series of dtypes to astype (#4318)
    • FIX-#4310: Handle lists of lists of ints in read_csv_glob (#4319)
  • Performance enhancements
    • FIX-#4138, FIX-#4009: remove redundant sorting in the internal '.mask()' flow (#4140)
    • FIX-#4183: Stop shallow copies from creating global shared state. (#4184)
  • Benchmarking enhancements
    • FIX-#4221: add wait method for PandasOnRayDataframeColumnPartition class (#4231)
  • Refactor Codebase
    • REFACTOR-#3990: remove code duplication in PandasDataframePartition hierarchy (#3991)
    • REFACTOR-#4229: remove unused dask_client global variable in modin\pandas\__init__.py (#4230)
    • REFACTOR-#3997: remove code duplication for broadcast_apply method (#3996)
    • REFACTOR-#3994: remove code duplication for get_indices function (#3995)
    • REFACTOR-#4331: remove code duplication for to_pandas, to_numpy functions in QueryCompiler hierarchy (#4332)
    • REFACTOR-#4213: Refactor modin/examples/tutorial/ directory (#4214)
    • REFACTOR-#4206: add assert check into __init__ method of PandasOnDaskDataframePartition class (#4207)
    • REFACTOR-#3900: add flake8-no-implicit-concat plugin and refactor flake8 error codes (#3901)
    • REFACTOR-#4093: Refactor base to be smaller (#4220)
    • REFACTOR-#4047: Rename cluster directory to cloud in examples (#4212)
    • REFACTOR-#3853: interacting with Dask interface through DaskWrapper class (#3854)
    • REFACTOR-#4322: Move is_reduce_fn outside of groupby_agg (#4323)
  • Pandas API implementations and improvements
    • FEAT-#3603: add experimental read_custom_text function that can read custom line-by-line text files (#3441)
    • FEAT-#979: Enable reading from SQL server (#4279)
  • Developer API enhancements
    • FEAT-#4245: Define base interface for dataframe exchange protocol (#4246)
    • FEAT-#4244: Implement dataframe exchange protocol for OmnisciOnNative execution (#4269)
    • FEAT-#4144: Implement dataframe exchange protocol for pandas storage format (#4150)
    • FEAT-#4342: Support `from_dataframe` for pandas storage format (#4343)
  • Update testing suite
    • TEST-#3628: Report coverage data for test-internals CI job (#4198)
    • TEST-#3938: Test tutorial notebooks in CI (#4145)
    • TEST-#4153: Fix condition of running lint-commit and set of CI triggers (#4156)
    • TEST-#4201: Add read_parquet, explode, tail, and various arithmetic functions to asv_bench (#4203)
  • Documentation improvements
    • DOCS-#4077: Add release notes template to docs folder (#4078)
    • DOCS-#4082: Add pdf/epub/htmlzip formats for doc builds (#4083)
    • DOCS-#4168: Fix rendering the examples on troubleshooting page (#4169)
    • DOCS-#4151: Add info in troubleshooting page related to Dask engine usage (#4152)
    • DOCS-#4172: Refresh Intel Distribution of Modin paragraph (#4175)
    • DOCS-#4173: Mention strict channel priority in conda install section (#4178)
    • DOCS-#4176: Update OmniSci usage section (#4192)
    • DOCS-#4027: Add GIF images and chart to Modin README demonstrating speedups (https://github.com/modin-project/m...

Modin 0.13.3

18 Mar 08:08
0.13.3
bac4031

This release contains a few key bugfixes and a pandas version update.

Key Features and Updates

  • Stability and Bugfixes
    • Stop shallow dataframe copies from creating global shared state (#4184)
    • Make PandasOnRayDataframeColumnPartition conformant to partition interface (#4231)
    • Fix lazy metadata update for PandasDataFrame.from_labels (#4209)
    • Fix Categorical() for scalar categories (#4258)
    • Fix some cases when assigning a scalar to a subset of dataframe or series. (#4160)
    • Align read_excel() behaviour on empty rows with pandas 1.3+ (#4161)
    • Allow reading an empty parquet file. (#4075)
    • Pin Dask<2022.2.0 as a temporary fix. (#4218)
    • Add proper error handling in df.set_index. (#4309)
  • Documentation improvements
    • Clarify OmniSci activation in its usage section. (#4192)
  • Upgrade pandas to 1.4.1 (#4235)

Contributors

@mvashishtha @anmyachev @prutskov @devin-petersohn @naren-ponder @YarShev @Garra1980

Modin 0.13.2

10 Feb 18:43
0.13.2
ea6951c

This release contains documentation polishing and small user experience
improvements.

Key Features and Updates

  • Mention strict channel priority in conda install section (#4178)
  • Refresh Intel Distribution of Modin paragraph (#4175)
  • Add info in troubleshooting page related to Dask engine usage (#4152)
  • Do not print OmniSci logs to stdout by default (#4159)
  • Fix rendering the examples on troubleshooting page (#4169)
  • Use skipif instead of skip for compatibility with pytest 7.0 (#4163)

Contributors

@RehanSD, @YarShev, @dchigarev, @prutskov, @Garra1980

Modin 0.13.1

04 Feb 18:07
0.13.1
f2aa03f

This release contains a few key bugfixes and updates to the documentation.

Key Features and Updates

  • Stability and Bugfixes
    • FIX-#4058: Allow pickling empty dataframes and series (#4095)
    • FIX-#4105: Fix names of pandas options to avoid OptionError (#4109)
    • FIX-#4142: Fix OmniSci enabling (#4146)
  • Documentation improvements
    • DOCS-#4082: Add pdf/epub/htmlzip formats for doc builds (#4083)
    • DOCS-#4079: Fix link to PandasDataframe in docs (#4108)

Contributors

@prutskov, @paulovn, @YarShev, @RehanSD, @devin-petersohn,
@mvashishtha

Modin 0.13.0

27 Jan 01:08
0.13.0
8743203

This release contains significant upgrades to Modin's documentation,
support for pandas 1.4, new algebra and partitioning layer APIs, and some bugfixes.

Key Features and Updates

  • Stability and bugfixes
    • Support for subscripting Resampler (1a1edfd)
    • Fix groupby with column name for by (a04d7b7)
    • Workaround for groupby with sort=False with categorical keys (c67a7c5)
    • Align default value of REDIS_PASSWORD with Ray's DEFAULT_REDIS_PASSWORD (f79cb85)
    • Fix groupby dictionary aggregation when by and columns to aggregate overlap (d42c070)
    • Fix read_csv when callables are provided for skip_rows parameter (7c84758)
    • Ensure address is not passed to ray.init when running Ray in local mode (02a23d4)
    • Ensure that groupby.indices returns positional indices (e9c06f2)
    • Fix setting of categorical values (0e36e22)
    • Ensure df.__getitem__ respects step attribute of slice (7e85c5d)
    • Ensure data argument is delivered to the Dataframe in experimental cloud mode (2f7da1f)
    • Fix assigning to a Series with a single item (0d9d14e)
    • Fix the default to pandas in pd.DataFrame.sparse.from_spmatrix (ab2855b)
    • Fix apply result type inference (ac17ca1)
    • Exclude "scripts" from setup package (6224aba)
    • Fix assigning a Categorical to a column (cb4e727)
    • Ensure df.to_csv propagates metadata (e.g. index) (154697b)
    • Update pyarrow requirement in environment files (b55b08d)
  • Performance enhancements
    • Optimize __getitem__ flow for .loc/.iloc (0947ee8)
    • Delay instantiation of lazy dtypes on transpose (cd8db0c)
  • Benchmarking enhancements
    • Update benchmarks for groupby that are more representative (0582aa2)
  • Refactor Codebase
    • Update CODEOWNERS to reflect repository after refactor (cde6390)
    • Remove duplicate import of FactoryDispatcher in Modin experimental pandas IO (2cfabaf)
    • Update Modin to incorporate dataframe algebra (58bbcc3)
  • Pandas API implementations and improvements
    • Add support for storage_options argument to read_csv_glob (7c33afe)
    • Add support for dropna argument for groupby.indices and groupby.groups (144a613)
    • Ensure relabeling Modin Frame does not lose partition shape (3c740db)
    • Update Series.values to default to to_numpy() (67228ef)
    • Add support for modin.pandas.show_versions and python -m modin --versions (efe717f)
    • Upgrade pandas support to 1.4 (39fbc57)
  • OmniSci enhancements
    • Update benchmarks for groupby that are more representative (9396f23)
    • Update documentation on Native + OmniSci (edc1608)
    • Add support for getArrowTable() (6882ec2)
    • Fix segfault during init when only OmniSci is present (8c8a6a3)
    • Optimize append with default arguments (67013f9)
    • Fix OmniSci engine enabling for IO functions (9d1a334)
  • XGBoost enhancements
  • Developer API enhancements
    • Add parameter for minimum partition size (1be66d1)
    • Improve documentation for read_csv_glob and ensure warning raised if wildcard not in filepath_or_buffer (be10ba9)
    • Expand virtual partitioning utility (8d1004f)
  • Update testing suite
  • Documentation improvements
    • Improve documentation on pandas on Ray execution (b76dc57)
    • Reformat documentation to match pandas documentation theme (cc96f5d)
    • Improve documentation on pandas on Python execution (d590de0)
    • Improve System view in architecture documentation (6d51921)
    • Improve documentation on using pandas on Dask (003f338)
    • Improve documentation on pandas on Dask execution (61bf043)
    • Add documentation on using pandas on Python (195b668)
    • Improve Modin Out of Core documentation (cf426c4)
    • Improve documentation on OmniSci on native execution (689faee)
    • Improve documentation on IO (ffa67c7)
    • Add documentation on factories and parsers (6ca66db)
    • Improve documentation for experimental pandas on Ray execution (20abddd)
    • Improve documentation for modin.core.dataframe.base and modin.core.dataframe.pandas (cf1e541)
    • Update troubleshooting documentation and add FAQs (cc95ae2)
    • Improve README introduction and installation sections (a632d1f)
    • Update copyright year (7da1dc8)
    • Update a link to pandas.read_json (0315823)
    • Improve documentation for Modin vs. Dask (34732cb)
    • Fix links to the contributing page (81a06d6)
    • Remove broken links from supported apis (c04502d)
    • Change docs copyright statement to 'Modin Developers' (ed2a7a4)
    • Rename Developer page to Development in docs (406af7c)
    • Improve "Getting Started" section (4a62bba)
    • Update Modin tutorials (76707bf)
    • Add back quickstart notebook (4dd97ab)
    • Fix links in README and update README and FAQs (5d84042)
    • Update Modin module layout in architecture docs (7fcafa7)
    • Update documentation with new algebra operators and ModinDataframe (4b70725)
    • Add usage guide to documentation (4511566)
    • Build docs with Python 3.8 (01c1876)
  • Dependencies
    • Update PyArrow to 6.0 and OmniSci to 5.10.1 (018515f)

Contributors

@anmyachev, @prutskov, @Rubtsowa, @vnlitvinov, @dchigarev, @YarShev, @amyskov,
@mvashishtha, @dorisjlee, @devin-petersohn, @jeffreykennethli, @RehanSD,
@novichkovg, @Lozovskii-Aleksandr, @naren-ponder, @ahallermed, @fexolm,
@adityagp, @susmitpy, @ienkovich

Modin 0.12.1

19 Dec 03:35
0.12.1
34962ec
This release contains an update to the pandas version and a few bugfixes.

Key Features and Updates
------------------------
* Update supported pandas version to 1.3.5 (b79989a)
* Improvements to groupby
  * Fix `groupby` for case `by` is `None` (40d45c8)
  * Fix handling of dictionary aggregation (29f927b)
  * Return positional indices for Groupby property (c66324d)
* Fix slicing dataframes with `step` property (5651844)
* Fix assignment of data to category column (23dd3f8)
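
The slicing fix above is about honoring a slice's `step`. As an illustration of the expected semantics only (this is not Modin's internal code, and `positions_for_slice` is a hypothetical helper), Python's built-in `slice.indices` can resolve a slice with a step into the row positions it should select:

```python
def positions_for_slice(key: slice, length: int) -> list:
    """Resolve a slice (including its step) into explicit row positions."""
    # slice.indices clamps start/stop to the given length and fills in defaults,
    # so negative steps and open-ended slices are handled uniformly.
    start, stop, step = key.indices(length)
    return list(range(start, stop, step))


every_other = positions_for_slice(slice(None, None, 2), 5)
reversed_rows = positions_for_slice(slice(None, None, -1), 3)
```

A step-aware `df[::2]` on a five-row frame should therefore return the rows at positions 0, 2, and 4, which matches what pandas does.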

Contributors this release
-------------------------

@Rubtsowa, @prutskov, @dchigarev, @amyskov, @vnlitvinov, @mvashishtha,
@YarShev, @devin-petersohn

Modin 0.12.0

24 Nov 01:48
0.12.0
054e7fb
This release contains a refactor to the codebase, encapsulating
significant amounts of improvements to the maintainability of the code,
and a plethora of bugfixes.

This release also introduces a Slack community for Modin users to interact
with Modin developers. Please join us at our [Slack](https://modin.org/slack.html)
to continue the conversation!

Key Features and Updates
------------------------
* Stability and bugfixes
  * Support allowing callables and scalars together in .loc/.iloc (25ea7fd)
  * Ensure .loc with slice and scalar column returns Series (9492878)
  * Fix Modin OmniSci Docker example (b853c51)
  * Ensure Modin OmniSci + Modin Ray Docker containers install packages from conda-forge (032afd6)
  * Determine return type (Series or DataFrame) from one element Series (17ad1f0)
  * Update cloud examples (648b6a0)
  * Fix Modin OmniSci memory leak during `read_csv` (8581ba1)
  * Use `floor` for casting `float` to `int` for OmniSci 5.8.0 (c67a936)
  * Fix .loc on empty DataFrame (2260431)
  * Ensure Modin on Ray does not duplicate writes to disk on `to_csv` when workers die (6178a57)
  * Add support for `storage_options` argument in `read_*` functions except `read_excel` (77a00cc)
  * Ensure Modin Ray correctly raises exceptions when `to_parquet` or `to_csv` fail (8d67cd3)
  * Ensure Modin Ray does not hang when workers crash on `to_csv` (73bf061)
  * Remove platform specific code from `setup.py` to ensure distributions are pure Python (b186e40)
* Refactor Codebase
  * Update import of public index classes to import from `pandas.core.indexes.api` module (488357a)
  * Replace `try...finally` with pytest fixtures (c349a94)
  * Restructure project files (b37bcf8)
  * Use `fsspec` to open files (b8a9c07)
  * Add LGTM Service to CI (b193fef)
  * Remove extraneous `*NUM_THREADS` environment variables from CI (b925625)
  * Update documentation + code + comment language to reflect new project structure (7a81588)
  * Update language to reflect new project structure and add implementation to BaseDataframeAxisPartition (7ab2d90)
  * De-dupe `read_fwf` and `read_csv` code (2f824f8)
  * Reformat entire codebase with `black` and `flake8` (75f698c)
* Pandas API implementations and improvements
  * Add support for `{true|false}_values` for `read_csv` for Modin OmniSci (9cd93f2)
  * Implement `explode` for Series and DataFrame (ddd4afe)
  * Support reading gzipped fwf (a80cb3b)
  * Add support for `to_parquet` Modin Ray (643596d)
  * Add support for creating an `sqlalchemy` connection with arbitrary arguments (ece98a6, 4a42e04)
  * Add support for `set_index` with different input types (cab37f2)
* XGBoost enhancements
  * Support new DMatrix parameters (4d7f6d4)
* Developer API enhancements
  * Throw custom errors when optional dependencies are missing (53bb047)
  * Improve Modin OmniSci quickstart (167957b)
* Update testing suite
* Documentation improvements
* Dependencies
  * Add fsspec (dependency for IO) to dependencies (44e3f10)
  * Make `botocore` import optional (adc15c6)
  * Pin minimum `s3fs` dependency to fix `aiobotocore` issue (8acad95)
  * Update PyArrow to 5.0 and OmniSci to 5.8 (4121358)
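
Two entries above deal with optional dependencies: throwing custom errors when they are missing and making the `botocore` import optional. A minimal sketch of that general pattern follows; the names `MissingOptionalDependency` and `import_optional` are hypothetical and do not come from the Modin codebase.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Hypothetical error raised when an optional package is not installed."""


def import_optional(name: str, purpose: str):
    """Import `name` lazily, raising a descriptive error if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise MissingOptionalDependency(
            f"'{name}' is required for {purpose}; install it to use this feature."
        ) from err
```

Deferring the import to the call site means that users who never touch the feature do not need the package installed, and those who do get an actionable message instead of a bare `ModuleNotFoundError`.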

Contributors
------------
@ienkovich, @vnlitvinov, @mvashishtha, @devin-petersohn, @dchigarev, @prutskov, @amyskov,
@gshimansky, @anmyachev, @YarShev, @Garra1980, @Rubtsowa, @jeffreykennethli, @RehanSD,
@dorisjlee, @naren-ponder