Modin 0.13.0

@RehanSD RehanSD released this 27 Jan 01:08
· 1184 commits to master since this release
0.13.0
8743203

This release contains significant upgrades to Modin's documentation,
support for pandas 1.4, new algebra and partitioning layer APIs, and some bugfixes.

Key Features and Updates

  • Stability and bugfixes
    • Support for subscripting Resampler (1a1edfd)
    • Fix groupby with column name for by (a04d7b7)
    • Workaround for groupby with sort=False with categorical keys (c67a7c5)
    • Align default value of REDIS_PASSWORD with Ray's DEFAULT_REDIS_PASSWORD (f79cb85)
    • Fix groupby dictionary aggregation when by and columns to aggregate overlap (d42c070)
    • Fix read_csv when a callable is provided for the skiprows parameter (7c84758)
    • Ensure address is not passed to ray.init when running Ray in local mode (02a23d4)
    • Ensure that groupby.indices returns positional indices (e9c06f2)
    • Fix setting of categorical values (0e36e22)
    • Ensure df.__getitem__ respects step attribute of slice (7e85c5d)
    • Ensure data argument is delivered to the DataFrame in experimental cloud mode (2f7da1f)
    • Fix assigning to a Series with a single item (0d9d14e)
    • Fix the default to pandas in pd.DataFrame.sparse.from_spmatrix (ab2855b)
    • Fix apply result type inference (ac17ca1)
    • Exclude "scripts" from setup package (6224aba)
    • Fix assigning a Categorical to a column (cb4e727)
    • Ensure df.to_csv propagates metadata (e.g. index) (154697b)
    • Update pyarrow requirement in environment files (b55b08d)
  • Performance enhancements
    • Optimize __getitem__ flow for .loc/.iloc (0947ee8)
    • Delay instantiation of lazy dtypes on transpose (cd8db0c)
  • Benchmarking enhancements
    • Update groupby benchmarks to be more representative (0582aa2)
  • Refactor Codebase
    • Update CODEOWNERS to reflect repository after refactor (cde6390)
    • Remove duplicate import of FactoryDispatcher in Modin experimental pandas IO (2cfabaf)
    • Update Modin to incorporate dataframe algebra (58bbcc3)
  • Pandas API implementations and improvements
    • Add support for storage_options argument to read_csv_glob (7c33afe)
    • Add support for dropna argument for groupby.indices and groupby.groups (144a613)
    • Ensure relabeling Modin Frame does not lose partition shape (3c740db)
    • Update Series.values to default to to_numpy() (67228ef)
    • Add support for modin.pandas.show_versions and python -m modin --versions (efe717f)
    • Upgrade pandas support to 1.4 (39fbc57)
  • OmniSci enhancements
    • Update groupby benchmarks to be more representative (9396f23)
    • Update documentation on Native + OmniSci (edc1608)
    • Add support for getArrowTable() (6882ec2)
    • Fix segfault during init when only OmniSci is present (8c8a6a3)
    • Optimize append with default arguments (67013f9)
    • Fix OmniSci engine enabling for IO functions (9d1a334)
  • XGBoost enhancements
  • Developer API enhancements
    • Add parameter for minimum partition size (1be66d1)
    • Improve documentation for read_csv_glob and ensure a warning is raised if filepath_or_buffer contains no wildcard (be10ba9)
    • Expand virtual partitioning utility (8d1004f)
  • Update testing suite
  • Documentation improvements
    • Improve documentation on pandas on Ray execution (b76dc57)
    • Reformat documentation to match pandas documentation theme (cc96f5d)
    • Improve documentation on pandas on Python execution (d590de0)
    • Improve System view in architecture documentation (6d51921)
    • Improve documentation on using pandas on Dask (003f338)
    • Improve documentation on pandas on Dask execution (61bf043)
    • Add documentation on using pandas on Python (195b668)
    • Improve Modin Out of Core documentation (cf426c4)
    • Improve documentation on OmniSci on native execution (689faee)
    • Improve documentation on IO (ffa67c7)
    • Add documentation on factories and parsers (6ca66db)
    • Improve documentation for experimental pandas on Ray execution (20abddd)
    • Improve documentation for modin.core.dataframe.base and modin.core.dataframe.pandas (cf1e541)
    • Update troubleshooting documentation and add FAQs (cc95ae2)
    • Improve README introduction and installation sections (a632d1f)
    • Update copyright year (7da1dc8)
    • Update a link to pandas.read_json (0315823)
    • Improve documentation for Modin vs. Dask (34732cb)
    • Fix links to the contributing page (81a06d6)
    • Remove broken links from supported apis (c04502d)
    • Change docs copyright statement to 'Modin Developers' (ed2a7a4)
    • Rename Developer page to Development in docs (406af7c)
    • Improve "Getting Started" section (4a62bba)
    • Update Modin tutorials (76707bf)
    • Add back quickstart notebook (4dd97ab)
    • Fix links in README and update README and FAQs (5d84042)
    • Update Modin module layout in architecture docs (7fcafa7)
    • Update documentation with new algebra operators and ModinDataframe (4b70725)
    • Add usage guide to documentation (4511566)
    • Build docs with Python 3.8 (01c1876)
  • Dependencies
    • Update PyArrow to 6.0 and OmniSci to 5.10.1 (018515f)
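
A few of the fixes listed above (slice steps in df.__getitem__, positional groupby.indices, and a callable for read_csv's skiprows) can be checked with a short script. This is only a minimal sketch: the sample data is made up for illustration, and the fallback to plain pandas when Modin is not installed is an assumption added here so the snippet runs anywhere, not something the release prescribes.

```python
import io

try:
    import modin.pandas as pd  # Modin's drop-in replacement for pandas
except ImportError:
    # Assumption for illustration: fall back to plain pandas if Modin is absent.
    import pandas as pd

# Hypothetical sample data with a non-default index, so that labels and
# positions differ.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a"], "val": [10, 20, 30, 40, 50]},
    index=[100, 101, 102, 103, 104],
)

# Slicing with a step should keep every second row (labels 100, 102, 104).
every_other = df[::2]

# groupby.indices should map each key to positional indices, not labels.
positions = {
    key: [int(i) for i in idx] for key, idx in df.groupby("key").indices.items()
}
# positions == {"a": [0, 2, 4], "b": [1, 3]}

# read_csv with a callable for skiprows: skip odd-numbered file lines.
# Line 0 is the header, so the data lines "1" and "3" are dropped.
csv_text = "x\n1\n2\n3\n4\n"
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % 2 == 1)
```

With either library the three results should match what pandas itself returns, which is the behavior these fixes aim for.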

Contributors

@anmyachev, @prutskov, @Rubtsowa, @vnlitvinov, @dchigarev, @YarShev, @amyskov,
@mvashishtha, @dorisjlee, @devin-petersohn, @jeffreykennethli, @RehanSD,
@novichkovg, @Lozovskii-Aleksandr, @naren-ponder, @ahallermed, @fexolm,
@adityagp, @susmitpy, @ienkovich