Move defaulting to pandas at groupby aggregation from API level to backend #2269

Closed
pastranaluis opened this issue Oct 18, 2020 · 2 comments · Fixed by #2332
Labels: bug 🦗 Something isn't working, P1 Important tasks that we should complete soon

Comments

@pastranaluis

Hi! I have been working on another backend for Modin, and I noticed that Modin defaults multi-column groupby to pandas. I have two questions/concerns:

  1. Why does Modin classify a one-element list such as ['column_title'] as _is_multi_by? In particular, I am wondering what the logic is behind these lines of code in groupby.py:
self._is_multi_by = (
    isinstance(by, type(self._query_compiler)) and len(by.columns) > 1
) or (
    not isinstance(by, type(self._query_compiler))
    and axis == 0
    and all(obj in self._query_compiler.columns for obj in self._by)
)

The first condition makes sense to me, but the second one is throwing me off.
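Read literally, the second branch never looks at the length of the list: it is true whenever axis == 0, by is not a query compiler, and every entry of self._by is a column label. The minimal sketch below evaluates only that branch with plain pandas. The helper name second_branch is hypothetical, and it assumes self._by is a list of column labels, which the quoted snippet does not show.

```python
import pandas


def second_branch(by_labels, columns, axis=0):
    """Hypothetical helper: evaluate only the second branch of the quoted
    _is_multi_by expression, assuming `by` is not a query compiler and
    `by_labels` plays the role of self._by (a list of column labels)."""
    return axis == 0 and all(obj in columns for obj in by_labels)


columns = pandas.Index(["column_title", "other"])
print(second_branch(["column_title"], columns))           # True: one-element list
print(second_branch(["column_title", "other"], columns))  # True: two labels
print(second_branch(["missing"], columns))                # False: unknown label
```

So a one-element list satisfies this branch on its own; whether such a list ever reaches the check unchanged depends on any normalization of by performed earlier.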

  2. What is the reason for defaulting to pandas in the multi_col case? I would expect the engine to take care of this.

Thanks!
LP

@pastranaluis pastranaluis added the question ❓ Questions about Modin label Oct 18, 2020
@dchigarev
Collaborator

dchigarev commented Oct 20, 2020

Hi @pastranakike, thanks for your question! Currently, Modin groupby defaults to pandas when an aggregation function is applied to multiple columns. Defaulting to pandas at the API level in this case is effectively deprecated behavior and should definitely be changed. I have classified this issue as a bug, and I think we'll fix it in this release.

About one-element lists passed as by: I've found this if statement, which should filter out those cases. Do you have a reproducer where it doesn't work, i.e. where groupby is treated as multi-column even though by is a one-element list?

if (
    not isinstance(by, (pandas.Series, Series))
    and is_list_like(by)
    and len(by) == 1
):
    by = by[0]
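A minimal sketch of how that normalization interacts with a multi-column check, using plain pandas. Both function names are hypothetical, Modin's own Series type is left out (only pandas.Series is excluded), and is_multi_column_by is a simplified stand-in rather than Modin's actual logic.

```python
import pandas
from pandas.api.types import is_list_like


def normalize_by(by):
    """Hypothetical helper mirroring the quoted if statement: unwrap a
    one-element list-like `by` into a scalar label. A Series is left as-is
    because it is a grouping key, not a list of column labels."""
    if not isinstance(by, pandas.Series) and is_list_like(by) and len(by) == 1:
        return by[0]
    return by


def is_multi_column_by(by, columns, axis=0):
    """Hypothetical, simplified check: True only for 2+ column labels."""
    by = normalize_by(by)
    return (
        axis == 0
        and is_list_like(by)
        and not isinstance(by, pandas.Series)
        and len(by) > 1
        and all(label in columns for label in by)
    )


df = pandas.DataFrame({"a": [1, 2], "b": [3, 4]})
print(normalize_by(["a"]))                       # a
print(is_multi_column_by(["a"], df.columns))     # False
print(is_multi_column_by(["a", "b"], df.columns))  # True
```

Because the unwrapping runs first, a one-element list arrives as a plain label, and is_list_like returns False for a string, so the multi-column check is never satisfied by it.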

@dchigarev dchigarev added bug 🦗 Something isn't working P1 Important tasks that we should complete soon and removed question ❓ Questions about Modin labels Oct 20, 2020
@dchigarev dchigarev added this to the 0.8.2 milestone Oct 20, 2020
@dchigarev dchigarev changed the title groupby with multiple index Move defaulting to pandas at groupby aggregation from API level to backend Oct 20, 2020
@dchigarev dchigarev self-assigned this Oct 20, 2020
@Garra1980 Garra1980 assigned YarShev and unassigned dchigarev Oct 27, 2020
@YarShev
Collaborator

YarShev commented Oct 28, 2020

Hi @pastranakike, regarding the first question: I don't see a case where by is a one-element list and Modin treats it as is_multi_by=True. Taking a simple example from the pandas documentation, we can see that Modin treats is_multi_by as False:

import modin.pandas as pd
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
df.groupby(['Animal']).mean() # is_multi_by=False while performing the operation
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
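To compare Modin's result against a reference, here is a sketch that runs the same example with plain pandas (not Modin) and checks that grouping by a one-element list and by the bare column label give identical results.

```python
import pandas

df = pandas.DataFrame({"Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
                       "Max Speed": [380.0, 370.0, 24.0, 26.0]})

# Grouping by a one-element list and by the bare label should agree.
by_list = df.groupby(["Animal"]).mean()
by_label = df.groupby("Animal").mean()

# Raises AssertionError if the two frames differ (values, index, or columns).
pandas.testing.assert_frame_equal(by_list, by_label)
print(by_label)
```

The same comparison can be repeated with modin.pandas in place of pandas for the grouping calls to confirm that Modin matches the pandas reference.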

If you have a reproducer where Modin treats a one-element list by as is_multi_by=True, feel free to open a new issue.
Regarding the second question: we do not currently handle the is_multi_col=True case, which is why we default to pandas. For now, we could move the logic for is_multi_col=True from the API layer to the query compiler layer.
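The proposed move can be pictured with a toy model: the API-layer groupby object only forwards the request and a multi-column flag, and the backend's query compiler decides whether to run natively or fall back to pandas. Everything below (ToyQueryCompiler, ToyGroupBy, last_path, the private method names) is hypothetical and illustrative, not Modin's real classes or signatures.

```python
import pandas


class ToyQueryCompiler:
    """Hypothetical stand-in for a backend query compiler; not Modin's class."""

    def __init__(self, frame):
        self._frame = frame      # backing pandas.DataFrame for this sketch
        self.last_path = None    # records which code path the last call took

    def groupby_agg(self, by, agg_func, is_multi_by=False):
        # The backend, not the API layer, decides how to handle the case.
        if is_multi_by:
            self.last_path = "pandas"
            return self._default_to_pandas_groupby(by, agg_func)
        self.last_path = "native"
        return self._native_groupby_agg(by, agg_func)

    def _default_to_pandas_groupby(self, by, agg_func):
        # Fallback: run the whole aggregation eagerly with pandas.
        return ToyQueryCompiler(self._frame.groupby(by).agg(agg_func))

    def _native_groupby_agg(self, by, agg_func):
        # Placeholder for a backend-specific kernel; this sketch reuses pandas.
        return ToyQueryCompiler(self._frame.groupby(by).agg(agg_func))


class ToyGroupBy:
    """Hypothetical API-layer object: it only forwards to the query compiler."""

    def __init__(self, query_compiler, by):
        self._query_compiler = query_compiler
        self._by = by
        self._is_multi_by = isinstance(by, list) and len(by) > 1

    def agg(self, func):
        # No default-to-pandas decision here; the backend receives the flag.
        return self._query_compiler.groupby_agg(
            self._by, func, is_multi_by=self._is_multi_by
        )


df = pandas.DataFrame({"a": [1, 1, 2], "b": [1, 2, 2], "c": [10, 20, 30]})
qc = ToyQueryCompiler(df)
multi = ToyGroupBy(qc, ["a", "b"]).agg("sum")
print(qc.last_path)  # pandas
single = ToyGroupBy(qc, "a").agg("sum")
print(qc.last_path)  # native
```

In this sketch, a backend that can handle the multi-column case natively would only need to change its own groupby_agg; the API layer would stay untouched instead of bypassing the backend.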

YarShev added a commit to YarShev/modin that referenced this issue Oct 30, 2020
… to backend

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
YarShev pushed a commit to YarShev/modin that referenced this issue Oct 30, 2020
Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
YarShev pushed a commit to YarShev/modin that referenced this issue Oct 30, 2020
Moved wrap_udf_function into backend because omnisci doesn't support
executing lambdas.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
YarShev added a commit to YarShev/modin that referenced this issue Oct 30, 2020
…backend,

refactor default to pandas functions in BaseQC

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
YarShev pushed a commit to YarShev/modin that referenced this issue Oct 30, 2020
into private function of Pandas backend because it is not used anywhere
else.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
YarShev pushed a commit to YarShev/modin that referenced this issue Oct 30, 2020
now it is possible to specify --backend=PandasOnDask,
--backend=PandasOnRay or --backend=PandasOnPython, not just
--backend=BaseOnPython.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
YarShev added a commit to YarShev/modin that referenced this issue Oct 30, 2020
Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
YarShev added a commit to YarShev/modin that referenced this issue Oct 30, 2020
Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
dchigarev added a commit to dchigarev/modin that referenced this issue Oct 30, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
dchigarev added a commit to YarShev/modin that referenced this issue Oct 30, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
gshimansky added a commit that referenced this issue Oct 30, 2020
…2332)

* FIX-#2269: Move `default_to_pandas` logic from API layer to backend

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-#2269: Added a test which calls _apply_agg_function

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-#2269: Added required arguments for groupby_agg

Moved wrap_udf_function into backend because omnisci doesn't support
executing lambdas.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-#2269: Use correct default_to_pandas for groupby in backend,
refactor default to pandas functions in BaseQC

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-#2269: Renamed new default_to_pandas_groupby function

into private function of Pandas backend because it is not used anywhere
else.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-#2269: Fixed specification of backend

now it is possible to specify --backend=PandasOnDask,
--backend=PandasOnRay or --backend=PandasOnPython, not just
--backend=BaseOnPython.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-#2269: Fix BaseOnPython tests

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-#2269: Remove default_to_pandas_groupby

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-#2269: logic of dropping 'by' moved back to API level

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

Co-authored-by: Gregory Shimansky <gregory.shimansky@intel.com>
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
aregm added a commit to aregm/modin that referenced this issue Feb 18, 2021
* FIX-modin-project#2195: fix describe error for datasets with datetimes (modin-project#2272)

* FIX-modin-project#2195: fix describe error for datasets with datetimes

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2195: add test

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2195: enable fix

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2195: Update modin/pandas/test/dataframe/test_reduction.py

Co-authored-by: Dmitry Chigarev <62142979+dchigarev@users.noreply.github.com>

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#1906: fixed incorrect behaviour of 'groupby.__getattr' (modin-project#2276)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2277: applied Title Case to the names of DATASET_SIZE_DICT keys (modin-project#2278)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2280: use 32 bytes in secrets.token_hex (modin-project#2286)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2260: use recommended pandas testing api (modin-project#2273)

* TEST-modin-project#2260: use recommended pandas testing api

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2260: replace getSeriesData with test_data

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2260: remove assert_categories_equal

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2254: handling dict functions at groupby.agg improved (modin-project#2267)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FEAT-modin-project#2282: support DataFrame.[count|max|min|sum] for OmniSci backend (modin-project#2283)

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* FIX-modin-project#1976: indices matching at reduction functions fixed (modin-project#2270)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FEAT-modin-project#2299: support value_counts in OmniSci backend. (modin-project#2300)

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* FIX-modin-project#1765: Fix support of s3 in `read_parquet` (modin-project#2287)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FIX-modin-project#2285: Default to pandas warning message improved (modin-project#2302)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FEAT-modin-project#2303: fix OmniSci aggregates and add mean (modin-project#2304)

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* FIX-modin-project#2258: return 'Commit Message formatting' topic (modin-project#2306)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2133 modin-project#2265: Fix binary operations for modin frames in case when partitioning isn't aligned (modin-project#2256)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FIX-modin-project#2239: Compute row index start using pandas (modin-project#2240)

* FIX-modin-project#2239: Compute row index start using pandas

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2239: Documentation

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2239: Improve testing for case

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2253: loc assignment fixed in case of (1, 1) shape frame (modin-project#2316)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2311: fixed performance bottleneck at reduction operations (modin-project#2314)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2288: Cover by tests delimiters parameters of read_csv (modin-project#2310)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FIX-modin-project#2234: update dask_deps in setup.py (modin-project#2325)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2326: move s3fs import in _read function (modin-project#2327)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2329: TypeError while creating cluster  (modin-project#2330)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-#0000: Indexing regression (modin-project#2333)

* FIX-#0000: Indexing regression

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-#0000: Fix `loc`

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-#0000: Fix DatetimeIndex

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-#0000: Fix Datetime and checks

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2334: Add tutorials to main repo (modin-project#2335)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2193: Add contributing doc in checklist (modin-project#2216)

* DOCS-modin-project#2193: update contributing doc

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* REFACTOR-modin-project#2343: refactor offset, _read_rows, partitioned_file (modin-project#2344)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#1927: Fix performance issue related to `sparse` attribute access (modin-project#2318)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2269: Move `default_to_pandas` logic from API layer to backend (modin-project#2332)

* FIX-modin-project#2269: Move `default_to_pandas` logic from API layer to backend

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2269: Added a test which calls _apply_agg_function

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2269: Added required arguments for groupby_agg

Moved wrap_udf_function into backend because omnisci doesn't support
executing lambdas.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2269: Use correct default_to_pandas for groupby in backend,
refactor default to pandas functions in BaseQC

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2269: Renamed new default_to_pandas_groupby function

into private function of Pandas backend because it is not used anywhere
else.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2269: Fixed specification of backend

now it is possible to specify --backend=PandasOnDask,
--backend=PandasOnRay or --backend=PandasOnPython, not just
--backend=BaseOnPython.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2269: Fix BaseOnPython tests

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2269: Remove default_to_pandas_groupby

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2269: logic of dropping 'by' moved back to API level

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

Co-authored-by: Gregory Shimansky <gregory.shimansky@intel.com>
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2292: Cover by tests Datetime Handling parameters of read_csv (modin-project#2336)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FEAT-modin-project#2271: Add implementation of `groupby.shift` (modin-project#2323)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FIX-modin-project#2348: Fix default to pandas warnings (modin-project#2349)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2357: Fix path to documentation for contributing (modin-project#2358)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2352: remove deprecated option: 'num-redis-shards' (modin-project#2353)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2339: Fix links to documentation (modin-project#2361)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2354: use conda activate instead of conda run (modin-project#2355)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2363: introduce getter and setter for index name (modin-project#2368)

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* FEAT-modin-project#1844: upgrade pyarrow to 1.0 (modin-project#2347)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2365: Fix `Series.value_counts` when `dropna=False` (modin-project#2366)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2369: Update pandas version to 1.1.4 (modin-project#2371)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2322: add aligning partition' blocks (modin-project#2367)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* Bump version to 0.8.2 (modin-project#2383)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2386: add new location for import ray functions (modin-project#2387)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2388: Fixed requirements for omnisci binaries (modin-project#2389)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2380: don't ignore lengths parameter for dask engine (modin-project#2381)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2390: Fix inserting Series into DataFrame (modin-project#2391)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-2200: Enable Calcite by default in OmniSci backend (modin-project#2385)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2289: Columns, Index Locations and Names parameters of read_csv (modin-project#2319)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* REFACTOR-modin-project#2397: remove redundant assigment (modin-project#2398)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2363: fix index name setter in OmniSci backend (modin-project#2379)

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* Merged groupby_agg and groupby_dict_agg to implement dictionary functions aggregations (modin-project#2317)

* FIX-modin-project#2254: Added dictionary functions to groupby aggregate tests

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Initial implementation of dictionary functions aggregation

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Remove lambda wrapper to allow dictionary to go to backend

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Fixed AttributeError not being thrown from getattr

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Lint fixes

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FEAT-modin-project#2363: fix index name setter in OmniSci backend

Signed-off-by: ienkovich <ilya.enkovich@intel.com>

* FIX-modin-project#2254: Removed obsolete groupby_dict_agg API function

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Fixed dict aggregate for base backend

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Address reformatting comments

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Remove whitespace

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2254: Removed redundant argument conversion

because it is already done inside of base backend.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

Co-authored-by: ienkovich <ilya.enkovich@intel.com>

* FIX-modin-project#2406: filter dictionary aggregation keys to limit them to keys only present in current partition (modin-project#2407)

* FIX-modin-project#2406: Added test to detect this bug

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2406: Added filter for keys absent in current partition

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2406: Attemt to fix broken test on BaseOnPython backend

This test gets a corrupted dataframe with "col2" removed by previous
test cases.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* DOCS-modin-project#2413: Add examples page to documentation (modin-project#2414)

* Resolves modin-project#2413

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2415: Add comparisons section to documentation with stubs (modin-project#2416)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2417: add sklearn example (modin-project#2425)

Signed-off-by: reshamas <reshama.stat@gmail.com>

* DOCS-modin-project#2421: Fixes bad link on contributing from architecture.rst (modin-project#2427)

Signed-off-by: Victor Fomin <vfdev.5@gmail.com>

* DOCS-modin-project#2419: Updated CONTRIBUTING.rst (modin-project#2423)

Signed-off-by: Victor Fomin <vfdev.5@gmail.com>

* DOCS-modin-project#2426,DOCS-modin-project#2424: Fixed two issues (modin-project#2431)

- Closes modin-project#2424, CONTRIBUTING.rst does not render the commit message formatting example
- Closes modin-project#2426, Bad links in index.rst
- Renamed CONTRIBUTING.rst into contributing.rst

Signed-off-by: Victor Fomin <vfdev.5@gmail.com>

* DOCS-modin-project#2420: Changed documentation to numpydoc style (modin-project#2429)

Signed-off-by: Mohammed Kashif <md.kashif.py93@gmail.com>

Co-authored-by: Mohammed Kashif <md.kashif.py93@gmail.com>

* DOCS-modin-project#2433: Updated README.md with modin_vs_dask.md doc (modin-project#2435)

Signed-off-by: Abdulelah S. Al Mesfer <abdulelah.almesfer@gmail.com>

* FIX-modin-project#2450: fix CI recipe (modin-project#2449)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* DOCS-modin-project#2437: Add documentation contrasting Modin and Dask (modin-project#2441)

* Resolves modin-project#2437

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FEAT-modin-project#2444: add docker file for nyc on omnisci (modin-project#2445)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2458: fix 'psutil' install (modin-project#2452)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2456: update taxi queries with .copy usage (modin-project#2457)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2447: add docker file for census on omnisci (modin-project#2448)

Also add instructions for building docker images

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2470: revert b867edf (modin-project#2471)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FIX-modin-project#2473: Some configuration values should not be transformed (modin-project#2476)

* FIX-modin-project#2473: Some configuration values should not be transformed

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2473: Add tests for ExactStr

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2402: Fix read_excel when files come from older windows (modin-project#2403)

* Resolves modin-project#2402
* Search for the content files instead of assuming location

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2467: Convert internal base dataframe objects to ABC (modin-project#2468)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2459: Updated TeamCity tests image to use Ray as base image (modin-project#2460)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* TEST-modin-project#2488: Increase commitlint message length limit to 88 characters from 70 (modin-project#2489)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2439: Add Documentation for Modin vs. pandas (modin-project#2487)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* TEST-modin-project#2290: Cover by tests General Parsing Configuration parameters of read_csv (modin-project#2331)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FIX-modin-project#2453: Remove sorting indices for equal values in `Series.value_counts` (modin-project#2454)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* TEST-modin-project#2291: Cover by tests NA and Missing Data Handling parameters of read_csv (modin-project#2337)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* REFACTOR-modin-project#2496: Change internal reader names to dispatcher (modin-project#2497)

* Resolves modin-project#2496

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* TEST-modin-project#2294: add iteration parameters for read_csv tests (modin-project#2477)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FIX-modin-project#2463: Added test with callable functions as aggregate argument (modin-project#2503)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* TEST-modin-project#2296: Error Handling parameters of read_csv (modin-project#2501)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2295: Cover by tests Quoting, Compression, and File Format parameters of read_csv (modin-project#2495)

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>
Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FEAT-modin-project#2479: integrate asv (modin-project#2484)

* FEAT-modin-project#2479: integrate asv

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: add merge pytest-benchmark in asv style

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: add CI job for check asv benchmarks

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: increase verbosity

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: use launch-method=spawn

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: add CpuCount usage to control number of partitions

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2479: change: TestDatasetSize -> MODIN_TEST_DATASET_SIZE

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2374: remove extra code; add pandas way to handle duplicate values in reindex func for binary operations (modin-project#2378)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2297: Cover by tests Internal parameters of read_csv (modin-project#2502)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* Ensure excel reader closes file if it is passed as path (modin-project#2514)

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FEAT-modin-project#2375: implementation of multi-column groupby aggregation (modin-project#2461)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2442: fixed Series assignment with different indices (modin-project#2443)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FEAT-modin-project#2013: merge_asof that is a little more efficient (modin-project#2510)

* FEAT-modin-project#2013: merge_asof that is a little more efficient.

Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-modin-project#2436: Explicit local / single node backend (modin-project#2483)

Signed-off-by: raphaelauv <raphaelauv@users.noreply.github.com>

* Fix indices when reading Excel files in parallel (modin-project#2526)

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2527: Use random name for hdf file test, clean file after testing (modin-project#2528)

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2524: Update pandas version to 1.1.5 (modin-project#2525)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2408: Fix read_csv and read_table args when used inside a decora… (modin-project#2486)

Signed-off-by: Weiwen Gu <gwengww@gmail.com>

* FIX-modin-project#2169: avoid unnecessary index access in groupby (modin-project#2469)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2313: improved handling non-numeric types at 'mean' when 'axis=1' (modin-project#2535)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2509: Io tests refactoring (modin-project#2523)

* TEST-modin-project#2509: refactor read_csv tests

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: refactor tests with warnings

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: read_parquet tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: read_json tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: read_excel tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: read_hdf tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: add html and sql tests

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: fwf tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: further tests refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: mark xfailed tests and fix

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: fix

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: further refactoring

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

TEST-modin-project#2509: correct teardown stage

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: mark failed tests

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: fix

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: correct test_HDFStore test

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: use common teardown function

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: typo fix

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: fix

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: addressing review comments

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>
Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2509: addressing review comments

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>

* FIX-modin-project#2540: add __iter__ implementation (modin-project#2541)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2520: add most important operations for asv benchmarks (modin-project#2539)

* FEAT-modin-project#2520: add most important operations for asv benchmarks

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2520: add groupby microbenchmarks

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2520: address review comments

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2498: Fix possible number of partitions for Dask engine (modin-project#2532)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2550: remove decorators usage for asv tested functions (modin-project#2551)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2236: Handling of space limited Ray Plasma directories (modin-project#2547)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* DOCS-modin-project#2518: add asv usage topic (modin-project#2549)

* DOCS-modin-project#2518: add asv usage topic

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* DOCS-modin-project#2518: fix style

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* DOCS-modin-project#2518: address review comments

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2491: optimized groupby dictionary aggregation (modin-project#2534)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FEAT-modin-project#2553: add ability to run microbenchmarks for old Modin version (modin-project#2554)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* Fix .loc[] assignment for Modin Series (modin-project#2555)

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2482: improved handling non-str 'by' (modin-project#2548)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* Fix taxi-runner.py cluster example (modin-project#2557)

* Added regression test
* Fix modin package installation

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* Fix loc/iloc assignments when columns are selected (modin-project#2536)

* FIX-modin-project#1620: Add test for reported issue

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: Use pandas.reindex() properly

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: Improve tests

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: Convert lookups to values for both indices and columns

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: Add test for .loc[] ordering

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: XFail a test that unearths internal sorting

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#1620: Improve test robustness a bit per code review

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2559: Ignore files from /proc/ when detecting file leaks (modin-project#2560)

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* Switch to Ray from conda-forge (modin-project#2562)

* FIX-modin-project#2561: Switch to Ray from conda-forge, abandon pip caching

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2561: Remove pip caching from push CI actions

Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>

* FIX-modin-project#2566: Ensure `Series.unique` does not return a scalar when there is only one unique value (modin-project#2567)

* FIX-modin-project#2566: Ensure unique doesn't return a scalar using np.atleast_1d

Signed-off-by: Richard Lin <richard.lin.047@berkeley.edu>

* FIX-modin-project#2566: Check array shapes match for test_unique

Signed-off-by: Richard Lin <richard.lin.047@berkeley.edu>

* FIX-modin-project#2566: Reduce unique dimensions using constructor instead

Signed-off-by: Richard Lin <richard.lin.047@berkeley.edu>

* FIX-modin-project#2572: fixed arrow version in OmniSci dependencies (modin-project#2571)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* DOCS-modin-project#2578: fix simple typo, parition -> partition (modin-project#2573)

There is a small typo in modin/engines/dask/pandas_on_dask/frame/partition.py, modin/engines/ray/pandas_on_ray/frame/partition.py.

Should read `partition` rather than `parition`.

Signed-off-by: Tim Gates <tim.gates@iress.com>

* FIX-#0000: pin xlrd<=1.2.0 (modin-project#2594)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2543: fixed handling 'as_index' at groupby dictionary renaming aggregation (modin-project#2592)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* Release commit for version 0.8.3 (modin-project#2597)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2580: Move automatic engine init to after data ingestion (modin-project#2581)

* REFACTOR-modin-project#2580: Move automatic engine init to after data ingestion

* Resolves modin-project#2580

Instead of automatically starting the engine when Modin is imported,
we start it after the first time the user reads or creates a dataframe.
This is intended to let downstream libraries check types without
needing to start the engine, and to clear up some transient errors that
can occur with certain engines on large machines.

I have also added a warning message that informs the user how to clear
the message. We will likely need a way to suppress these messages,
because many users will not care about them and may want to silence
them. We will probably also want to add a documentation page on
benchmarking best practices, because this change can give the
impression of a performance degradation on data ingestion even though
ingestion itself is unchanged.

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2580: Add to experimental API

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2580: Add `read_feather` and `read_clipboard`

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2580: Remove redundant error message

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* TEST-modin-project#2598: Add test for clean install from source (modin-project#2599)

* TEST-modin-project#2598: Add test for clean install from source

* Resolves modin-project#2598

This change adds a test for installing Modin without all of the testing
dependencies.

It is intended to test how a user who does not have all of the test
dependencies will see a Modin import.

* TEST-modin-project#2598: Target Python3

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#976: add encoding parameter to read_csv call (modin-project#2593)

* FIX-modin-project#976: add failed test

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#976: add encoding parameter to read_csv call

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#976: fix test in experimental mode

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2342: Add axis partitions API (modin-project#2515)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

Co-authored-by: Devin Petersohn <devin.petersohn@gmail.com>

* Fixed MultiIndex.from_frame implementation (modin-project#2587)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2608: Disable proxy for commands running inside container (modin-project#2609)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FIX-modin-project#2601: reduce data size for some asv tests (modin-project#2602)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2611: Fixed crash and sklearn version (modin-project#2612)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FEAT-modin-project#2604: add docker file with plasticc benchmark on omnisci (modin-project#2605)

* FEAT-modin-project#2604: add docker file with plasticc benchmark on omnisci
* FEAT-modin-project#2604: change xgboost verbose_eval

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* DOCS-modin-project#2618: Add code of conduct (modin-project#2619)

* Resolves modin-project#2618

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FEAT-modin-project#2373: Add distributed xgboost on Modin with Ray (modin-project#2545)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

Co-authored-by: Devin Petersohn <devin.petersohn@gmail.com>

* FEAT-2624: Improve performance of read_* methods when file handles are passed in (modin-project#2625)

Signed-off-by: Zain Patel <zain.patel06@gmail.com>

* FIX-modin-project#2616: Add config for num partitions, deprecate DEFAULT_NPARTITIONS (modin-project#2622)

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FEAT-modin-project#2091: add distributed dataframe compare (modin-project#2579)

Signed-off-by: Khang Vu <khangvu200391845@gmail.com>

* DOCS-modin-project#2649: Fix github pr template's dead link. (modin-project#2650)

Signed-off-by: William Ma <williamwma5@gmail.com>

* FEAT-modin-project#2606: Support creating DataFrame from remote partitions (modin-project#2613)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* FIX-modin-project#2637: Fix deprecation warnings due to invalid escape sequences. (modin-project#2641)

Signed-off-by: Karthikeyan Singaravelan <tir.karthi@gmail.com>

* REFACTOR-modin-project#2648: Correct uses of MapReduceFunction and metadata manipu… (modin-project#2655)

* REFACTOR-modin-project#2648: Correct uses of MapReduceFunction and metadata manipulation

Resolves modin-project#2648

Removes some code that is problematic for performance. There was a mix
of use cases for modifying the external metadata and internal metadata,
and some problematic components of these APIs that could hide bugs. The
implementation has been updated to ensure that these bugs do not
resurface.

Previously, the internal and external indices were compared, and then
updated according to some arguments that were passed in. This is not
scalable because collecting the indices is expensive. The possible bugs
hidden in this implementation decision could end up being very difficult
to detect: it implicitly updates the internal or external indices based
on a somewhat cryptic string pattern combined with a boolean flag.
Another very large issue is that sometimes external indices are updated
based on the partition lengths metadata. This was likely done to solve a
use case in which the APIs were not being used properly.

This implementation has been removed and replaced with something more
explicit. If the internal indices need to be updated, they are updated
explicitly via existing APIs. Likewise if external indices need to be
updated, they are updated with a different API.

Several QueryCompiler APIs had to be reverted because they were misusing
the ReductionFunction or MapReduceFunction, thus the need for the
implicit modification of metadata. When this implicit modification was
removed, these APIs no longer worked, and so were reverted until they
can be reimplemented using correct APIs. The following APIs were
reverted as a part of this commit:

* `is_monotonic_increasing`
* `is_monotonic_decreasing`
* `value_counts`
* `searchsorted`
* `dt_tz`
* `dt_freq`

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2648: Remove debug code

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* REFACTOR-modin-project#2648: Fix explicit rename

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* DOCS-2653: Fix links in Modin's documentation (modin-project#2654)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FEAT-modin-project#2663: Add algebraic operator `from_labels` (modin-project#2665)

Resolves modin-project#2663

This operator is necessary for efficient `reset_index` operations. See
this paper for more information on the operator:
http://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf

Co-authored-by: William Ma <12377941+williamma12@users.noreply.github.com>

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2672: pin numpy>=1.16.5,<1.20 (modin-project#2673)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FEAT-modin-project#2675: Added benchmark for sort_values (modin-project#2676)

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>

* FEAT-modin-project#2664: Add `to_labels` algebraic operator (modin-project#2666)

Resolves modin-project#2664

This adds the algebraic operator `to_labels`, which enables Modin to
better optimize the movement of data to metadata. See the paper for
more details on the algebraic operator:
http://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf

Co-authored-by: William Ma <williamwma5@gmail.com>

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#1806: Resolved error when reverting to Pandas for Multiindex (modin-project#2660)

Signed-off-by: Todd Yu <toddyu@berkeley.edu>

* FIX-modin-project#2614: Up python version for test jobs (modin-project#2615)

Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* DOCS-2633: Add documentation for distributed XGBoost on Modin (modin-project#2640)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FIX-modin-project#2667: Change names of files for development env (modin-project#2668)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FIX-modin-project#2658: Move backend check in xgb to train/predict (modin-project#2659)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FEAT-modin-project#2451: Read multiple csv files simultaneously via glob paths (modin-project#2662)

Signed-off-by: William Ma <williamwma5@gmail.com>

* FIX-modin-project#2681: pin numpy<1.20.0 for docker containers with omnisci (modin-project#2682)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: some updates to improve asv tests stability (modin-project#2671)

* TEST-modin-project#2670: some updates to improve asv tests stability

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: fixes

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: data_size -> shape

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: use dict approach

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: use CpuCount when Npartitions isn't defined

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: fix ASV_DATASET_SIZE

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: update TimeSortValues

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: modify asv tests for using with old modin version

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: reply to review comments

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2670: use env variables for default values

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2686: add fillna benchmark (modin-project#2687)

* TEST-modin-project#2686: add fillna benchmark

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2686: reply to review comments

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2686: add inplace parameter

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2692: add drop benchmark (modin-project#2693)

* TEST-modin-project#2692: add drop benchmark

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2692: add one column case

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2688: Update ray.ObjectID to ray.ObjectRef for Ray 2.0 (modin-project#2695)

* FIX-modin-project#2688: Update ray.ObjectID to ray.ObjectRef for Ray 2.0

Resolves modin-project#2688

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* FIX-modin-project#2688: Address comments

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

* TEST-modin-project#2707: add lint check for ASV benchmarks (modin-project#2708)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2699: add append benchmark (modin-project#2700)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2684: Add method level docs for Modin XGBoost (modin-project#2685)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* TEST-modin-project#2694: add head benchmark (modin-project#2696)

* TEST-modin-project#2694: add head benchmark

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2694: add small number for head op

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2705: add 'value_counts' benchmarks (modin-project#2706)

* TEST-modin-project#2705: add 'value_counts' benchmarks

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2705: apply suggestions from review

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2709: fixed typo in '_copartition' (modin-project#2710)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2596: Update pandas version to 1.2.1 (modin-project#2600)

Co-authored-by: Alexey Prutskov <alexey.prutskov@intel.com>
Co-authored-by: Devin Petersohn <devin.petersohn@gmail.com>
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Co-authored-by: Devin Petersohn <devin-petersohn@users.noreply.github.com>
Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>

* TEST-modin-project#2690: add astype benchmark (modin-project#2691)

* TEST-modin-project#2690: add astype benchmark

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2690: add category dtype; use df.types

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2690: add case with one column

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2702: add loc/iloc benchmark (modin-project#2703)

* TEST-modin-project#2702: add loc/iloc benchmark

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2702: add multiindex loc bench

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2702: add row_loc check

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* TEST-modin-project#2716: add describe bench (modin-project#2718)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* DOCS-modin-project#2717: Fix version of Modin for building latest docs (modin-project#2719)

Signed-off-by: Alexey Prutskov <alexey.prutskov@intel.com>

* FEAT-modin-project#1611: Add mod operation (modin-project#2726)

Signed-off-by: Alina <alina.bykovskaya@intel.com>

* TEST-modin-project#2725: add index, columns, shape benchmarks (modin-project#2727)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

* FIX-modin-project#2305: fix handling of renaming aggregation (modin-project#2732)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2362: fix key handling in 'Series.__setitem__' (modin-project#2731)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#2722: add ASV read_csv skiprows benchmark (modin-project#2724)

* TEST-modin-project#2722: add ASV read_csv skiprows benchmark

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>
Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* FIX-modin-project#2735: move '.reindex' logic about axis dispatching from the base class (modin-project#2736)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* TEST-modin-project#1496: add tests for setting a new column whose length differs from the frame's length (modin-project#2733)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* REFACTOR-modin-project#2739: io tests refactoring (modin-project#2740)

Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* TEST-modin-project#2753: add GroupBy benchmarks with a huge number of groups (modin-project#2754)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2362: fix handling slices in 'DataFrame.__setitem__' (modin-project#2741)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2742: fix performance degradation for dictionary GroupBy aggregation (modin-project#2743)

* FIX-modin-project#2742: changed callable functions to their names in dict aggregation

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2742: comments added

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

* FIX-modin-project#2737: fix handling of dates for read_csv with OmniSci backend (modin-project#2738)

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>
Signed-off-by: Alexander Myskov <alexander.myskov@intel.com>

* DOCS-modin-project#2584: Add CODEOWNERS file (modin-project#2759)

* Resolves modin-project#2584

Signed-off-by: Devin Petersohn <devin.petersohn@gmail.com>

Co-authored-by: Anatoly Myachev <45976948+anmyachev@users.noreply.github.com>
Co-authored-by: Dmitry Chigarev <62142979+dchigarev@users.noreply.github.com>
Co-authored-by: ienkovich <ilya.enkovich@intel.com>
Co-authored-by: Alexey Prutskov <alexey.prutskov@intel.com>
Co-authored-by: Devin Petersohn <devin-petersohn@users.noreply.github.com>
Co-authored-by: amyskov <55585026+amyskov@users.noreply.github.com>
Co-authored-by: YarShev <yaroslav.igoshev@intel.com>
Co-authored-by: Gregory Shimansky <gregory.shimansky@intel.com>
Co-authored-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Co-authored-by: Gregory Shimansky <gshimansky@users.noreply.github.com>
Co-authored-by: Reshama Shaikh <reshama.stat@gmail.com>
Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Mohammed Kashif <mohammed15035@iiitd.ac.in>
Co-authored-by: Mohammed Kashif <md.kashif.py93@gmail.com>
Co-authored-by: Abdulelah S. Al Mesfer <28743265+abdulelahsm@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: Vasily Litvinov <45396231+vnlitvinov@users.noreply.github.com>
Co-authored-by: Itamar Turner-Trauring <itamar@itamarst.org>
Co-authored-by: raphaelauv <raphaelauv@users.noreply.github.com>
Co-authored-by: Weiwen Gu <gwengww@gmail.com>
Co-authored-by: Richard Lin <35508487+richardlin047@users.noreply.github.com>
Co-authored-by: Tim Gates <tim.gates@iress.com>
Co-authored-by: Devin Petersohn <devin.petersohn@gmail.com>
Co-authored-by: Zain Patel <zain.patel06@gmail.com>
Co-authored-by: Khang Vu <khangvu200391845@gmail.com>
Co-authored-by: William Ma <12377941+williamma12@users.noreply.github.com>
Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com>
Co-authored-by: Todd Yu <toddyu@berkeley.edu>
Co-authored-by: Alina Bykovskaya <alina.bykovskaya@intel.com>