FEAT-#4035: Upgrade pandas support to 1.4 #4036

Merged
merged 68 commits into from
Jan 26, 2022
68 commits
9904cfc
FEAT-#4035: Upgrade pandas support to 1.4
devin-petersohn Jan 24, 2022
c9270b7
FEAT-#4035: Upgrade pandas support to 1.4
devin-petersohn Jan 24, 2022
cfdfa51
Upgrade pandas to 1.4.0 in env files
YarShev Jan 24, 2022
de9d095
Upgrade min python version in setup.py and ci.yml
YarShev Jan 24, 2022
81955c6
Upgrade min numpy version
YarShev Jan 24, 2022
ec62f84
Remove FilePathOrBuffer import
YarShev Jan 24, 2022
10b970e
Handle axis more carefully
YarShev Jan 24, 2022
c250e6b
Fix `test_resample_getitem`.
prutskov Jan 24, 2022
5b925e1
Merge branch 'issues/4035' of github.com:devin-petersohn/modin into i…
devin-petersohn Jan 24, 2022
c6ad1aa
Fix kurtosis exception type
devin-petersohn Jan 24, 2022
2f01bf4
Fix test_append by removing stale sort workaround
RehanSD Jan 24, 2022
b4639de
Fix more tests
devin-petersohn Jan 24, 2022
9b1bfdd
Fix more skipna changes
devin-petersohn Jan 24, 2022
13379eb
Update series.py __repr__ to use display.max_{rows|cols} instead of m…
RehanSD Jan 24, 2022
45c2fe6
Merge branch 'issues/4035' of https://github.com/devin-petersohn/modi…
RehanSD Jan 24, 2022
8fa36a0
Update simple_row_groupby to specify categorical data is ordered
RehanSD Jan 24, 2022
6bf97b6
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
b98bfc4
Add comment to explain new codepath
RehanSD Jan 25, 2022
892bc8e
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
f17181c
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
f4c4f54
FEAT-#4035: Upgrade pandas support to 1.4
devin-petersohn Jan 24, 2022
8e44e7c
Upgrade pandas to 1.4.0 in env files
YarShev Jan 24, 2022
a25bf13
Upgrade min python version in setup.py and ci.yml
YarShev Jan 24, 2022
11d7a97
Upgrade min numpy version
YarShev Jan 24, 2022
2f3bdf3
Remove FilePathOrBuffer import
YarShev Jan 24, 2022
daf2d9c
Handle axis more carefully
YarShev Jan 24, 2022
0b9d617
Fix `test_resample_getitem`.
prutskov Jan 24, 2022
63ff1b8
Fix kurtosis exception type
devin-petersohn Jan 24, 2022
35d4511
Fix test_append by removing stale sort workaround
RehanSD Jan 24, 2022
40be69f
Update series.py __repr__ to use display.max_{rows|cols} instead of m…
RehanSD Jan 24, 2022
19a71e6
Fix more tests
devin-petersohn Jan 24, 2022
09eda82
Fix more skipna changes
devin-petersohn Jan 24, 2022
c892e9b
Update simple_row_groupby to specify categorical data is ordered
RehanSD Jan 24, 2022
2dbc295
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
eb282a6
Add comment to explain new codepath
RehanSD Jan 25, 2022
6de34e7
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
f98f0e7
Add codepath to check that Modin raises ValueError when passing None …
RehanSD Jan 25, 2022
161e9b3
Merge branch 'issues/4035' of https://github.com/devin-petersohn/modi…
RehanSD Jan 25, 2022
01241c1
Merge with master and remove merge conflict
RehanSD Jan 25, 2022
fe11d81
Fix linting
YarShev Jan 25, 2022
2f4a1a1
Fix tests for merge
YarShev Jan 25, 2022
70ba389
Fix reset_index
YarShev Jan 25, 2022
57fdeef
Adjust number of warnings for OmniSci tests.
ienkovich Jan 25, 2022
4df0606
Fix reset_index
YarShev Jan 25, 2022
47000f7
Fix series.asof, series.reindex
prutskov Jan 25, 2022
4e49e47
Revert "Add codepath to check that Modin raises ValueError when passi…
RehanSD Jan 25, 2022
4dc5b49
Revert "Add codepath to check that Modin raises ValueError when passi…
RehanSD Jan 25, 2022
ed265fe
Revert "Add comment to explain new codepath"
RehanSD Jan 25, 2022
1a908c7
Revert "Add codepath to check that Modin raises ValueError when passi…
RehanSD Jan 25, 2022
ccad5c1
Fix test cases where pandas throws errors
devin-petersohn Jan 25, 2022
e578af8
Lint
devin-petersohn Jan 25, 2022
d977687
Fix warnings
devin-petersohn Jan 25, 2022
8565d74
Fix import and validation
devin-petersohn Jan 25, 2022
eb0cd2a
Fix test_join_sort::test_sort_values by skipping ascending = None
RehanSD Jan 25, 2022
af1565a
lint
RehanSD Jan 25, 2022
d0a6d9d
Fix read_fwf issue
devin-petersohn Jan 25, 2022
51f0616
Update insert to throw IndexError if negative index is out of bounds,…
RehanSD Jan 26, 2022
72cfadf
Lint
RehanSD Jan 26, 2022
c2b5dfe
Convert error strings to f-strings
RehanSD Jan 26, 2022
061b3a7
Resolve fileno error by setting memory_map to False when using BytesI…
RehanSD Jan 26, 2022
f3a6f53
Remove unused imports
RehanSD Jan 26, 2022
b7b9d96
Remove keyerror for memory_map
RehanSD Jan 26, 2022
5ff2baa
lint
RehanSD Jan 26, 2022
8bd0766
Address comments
YarShev Jan 26, 2022
d9e983d
Apply suggestions from code review
devin-petersohn Jan 26, 2022
3149280
Update modin/experimental/core/execution/native/implementations/omnis…
devin-petersohn Jan 26, 2022
913ee69
Update modin/pandas/test/dataframe/test_join_sort.py
YarShev Jan 26, 2022
62f2946
Resolve merge conflicts
RehanSD Jan 26, 2022
38 changes: 19 additions & 19 deletions .github/workflows/ci.yml
Expand Up @@ -29,7 +29,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- run: pip install black
- run: black --check --diff modin/ asv_bench/benchmarks scripts/doc_checker.py
Expand All @@ -43,7 +43,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- run: pip install -r docs/requirements-doc.txt
- run: cd docs && sphinx-build -T -E -b html . build
Expand All @@ -57,7 +57,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- run: pip install pytest pytest-cov pydocstyle numpydoc==1.1.0 xgboost
- run: pytest scripts/test
Expand Down Expand Up @@ -132,7 +132,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- run: pip install flake8 flake8-print
- run: flake8 --enable=T modin/ asv_bench/benchmarks scripts/doc_checker.py
Expand All @@ -152,7 +152,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an HTTP error. Retry
Expand Down Expand Up @@ -185,7 +185,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
Expand Down Expand Up @@ -214,7 +214,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- name: Clean install and run
run: |
Expand All @@ -235,7 +235,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v2
with:
-python-version: "3.7.x"
+python-version: "3.8.x"
architecture: "x64"
- name: Clean install and run
run: |
Expand All @@ -258,7 +258,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
Expand Down Expand Up @@ -294,7 +294,7 @@ jobs:
env:
MODIN_MEMORY: 1000000000
MODIN_TEST_DATASET_SIZE: "small"
-name: Test ${{ matrix.execution }} execution, Python 3.7
+name: Test ${{ matrix.execution }} execution, Python 3.8
steps:
- uses: actions/checkout@v2
with:
Expand All @@ -303,7 +303,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
Expand Down Expand Up @@ -355,7 +355,7 @@ jobs:
shell: bash -l {0}
env:
MODIN_STORAGE_FORMAT: "omnisci"
-name: Test OmniSci storage format, Python 3.7
+name: Test OmniSci storage format, Python 3.8
steps:
- uses: actions/checkout@v2
with:
Expand All @@ -365,7 +365,7 @@ jobs:
with:
activate-environment: modin_on_omnisci
environment-file: requirements/env_omnisci.yml
-python-version: 3.7
+python-version: 3.8
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
# it once if it fails. todo(https://github.com/conda-incubator/setup-miniconda/issues/129):
Expand Down Expand Up @@ -469,7 +469,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
engine: ["python", "ray", "dask"]
env:
MODIN_ENGINE: ${{matrix.engine}}
Expand Down Expand Up @@ -556,7 +556,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
Expand Down Expand Up @@ -602,7 +602,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
# Miniconda setup sometimes fails because of an http error. retry
Expand Down Expand Up @@ -640,7 +640,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
engine: ["ray", "dask"]
test-task:
- modin/pandas/test/dataframe/test_binary.py
Expand Down Expand Up @@ -702,7 +702,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
env:
MODIN_STORAGE_FORMAT: pyarrow
MODIN_EXPERIMENTAL: "True"
Expand Down Expand Up @@ -739,7 +739,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: [ "3.7", "3.8" ]
+python-version: ["3.8" ]
engine: ["ray", "dask"]
env:
MODIN_EXPERIMENTAL: "True"
Expand Down
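Dropping "3.7" from each `python-version` matrix also shrinks the number of CI jobs, since GitHub Actions creates one job per combination of matrix values. A minimal sketch of that expansion, assuming a plain matrix with no `include`/`exclude` entries; `expand_matrix` is a hypothetical helper for illustration, not part of any workflow tooling:

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (hypothetical helper)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# The python-version x engine matrix before and after this change.
before = expand_matrix(
    {"python-version": ["3.7", "3.8"], "engine": ["python", "ray", "dask"]}
)
after = expand_matrix({"python-version": ["3.8"], "engine": ["python", "ray", "dask"]})

print(len(before), len(after))  # prints "6 3"
```

For the matrices that also list `test-task` entries, the same halving applies per task.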
10 changes: 4 additions & 6 deletions .github/workflows/push-to-master.yml
Expand Up @@ -18,11 +18,11 @@ jobs:
with:
activate-environment: modin
environment-file: requirements/requirements-no-engine.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: install Ray nightly build
-run: pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
+run: pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp38-cp38-manylinux2014_x86_64.whl
- name: Conda environment
run: |
conda info
Expand Down Expand Up @@ -63,7 +63,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: Conda environment
Expand All @@ -81,7 +81,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
test-task:
- modin/pandas/test/dataframe/test_binary.py
- modin/pandas/test/dataframe/test_default.py
Expand Down Expand Up @@ -118,8 +118,6 @@ jobs:
- run: pip install -r requirements-dev.txt --use-deprecated=legacy-resolver
# Use a ray master commit that includes the fix here: https://github.com/ray-project/ray/pull/16278
# Can be changed after a Ray version > 1.4 is released.
-- run: pip install https://s3-us-west-2.amazonaws.com/ray-wheels/master/c8e3ed9eec30119092ef966ee7b8982c8954c333/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
-if: matrix.python-version == '3.7'
- run: pip install https://s3-us-west-2.amazonaws.com/ray-wheels/master/c8e3ed9eec30119092ef966ee7b8982c8954c333/ray-2.0.0.dev0-cp38-cp38-manylinux2014_x86_64.whl
if: matrix.python-version == '3.8'
- name: Install HDF5
Expand Down
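The Ray wheel filename changes in two places, not one, because CPython 3.8 dropped the `m` ABI flag: the 3.7 wheel is tagged `cp37-cp37m`, while the 3.8 wheel is tagged `cp38-cp38`. A small sketch of that naming rule, assuming default CPython builds; `cpython_wheel_tags` is a hypothetical helper for illustration only:

```python
def cpython_wheel_tags(major, minor):
    """Build the '<python tag>-<abi tag>' part of a CPython wheel filename.

    Assumption: a default CPython build. Before 3.8 the ABI tag carried
    an 'm' (pymalloc) suffix; from 3.8 onwards that suffix is gone.
    """
    python_tag = f"cp{major}{minor}"
    abi_suffix = "m" if (major, minor) < (3, 8) else ""
    return f"{python_tag}-{python_tag}{abi_suffix}"


old_tags = cpython_wheel_tags(3, 7)
new_tags = cpython_wheel_tags(3, 8)
print(old_tags, new_tags)  # prints "cp37-cp37m cp38-cp38"
```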
18 changes: 9 additions & 9 deletions .github/workflows/push.yml
Expand Up @@ -15,7 +15,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: Conda environment
Expand All @@ -41,7 +41,7 @@ jobs:
env:
MODIN_MEMORY: 1000000000
MODIN_TEST_DATASET_SIZE: "small"
-name: Test ${{ matrix.execution }} execution, Python 3.7
+name: Test ${{ matrix.execution }} execution, Python 3.8
steps:
- uses: actions/checkout@v2
with:
Expand All @@ -50,7 +50,7 @@ jobs:
with:
activate-environment: modin
environment-file: environment-dev.yml
-python-version: 3.7
+python-version: 3.8
channel-priority: strict
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: Conda environment
Expand Down Expand Up @@ -96,7 +96,7 @@ jobs:
MODIN_EXPERIMENTAL: "True"
MODIN_ENGINE: "native"
MODIN_STORAGE_FORMAT: "omnisci"
-name: Test OmniSci storage format, Python 3.7
+name: Test OmniSci storage format, Python 3.8
steps:
- uses: actions/checkout@v2
with:
Expand All @@ -106,7 +106,7 @@ jobs:
with:
activate-environment: modin_on_omnisci
environment-file: requirements/env_omnisci.yml
-python-version: 3.7
+python-version: 3.8
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: Conda environment
run: |
Expand Down Expand Up @@ -135,7 +135,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
engine: ["python", "ray", "dask"]
env:
MODIN_ENGINE: ${{matrix.engine}}
Expand Down Expand Up @@ -202,7 +202,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
engine: ["ray", "dask"]
test-task:
- modin/pandas/test/dataframe/test_binary.py
Expand Down Expand Up @@ -257,7 +257,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: ["3.7", "3.8"]
+python-version: ["3.8"]
env:
MODIN_STORAGE_FORMAT: pyarrow
MODIN_EXPERIMENTAL: "True"
Expand Down Expand Up @@ -287,7 +287,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
-python-version: [ "3.7", "3.8" ]
+python-version: ["3.8"]
engine: ["ray", "dask"]
env:
MODIN_EXPERIMENTAL: "True"
Expand Down
4 changes: 2 additions & 2 deletions environment-dev.yml
Expand Up @@ -2,8 +2,8 @@ name: modin
channels:
- conda-forge
dependencies:
-- pandas==1.3.5
-- numpy>=1.16.5
+- pandas==1.4.0
+- numpy>=1.18.5
- pyarrow>=4.0.1
- dask[complete]>=2.22.0
- distributed>=2.22.0
Expand Down
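The new pins raise the NumPy floor from 1.16.5 to 1.18.5 alongside the pandas bump. A floor like this has to be compared numerically, not as text. A minimal sketch, assuming plain `X.Y.Z` version strings with no pre-release or local suffixes; both helper names are hypothetical, and real tooling would normally rely on a dedicated version parser instead:

```python
def parse_version(version):
    """Split a plain 'X.Y.Z' string into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    return parse_version(installed) >= parse_version(minimum)


old_floor_ok = satisfies_minimum("1.16.5", "1.18.5")  # the previous floor is now too old
newer_ok = satisfies_minimum("1.21.0", "1.18.5")
# Comparing the raw strings would get this one wrong, since "1.9.0" > "1.18.5" as text.
single_digit_ok = satisfies_minimum("1.9.0", "1.18.5")
```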
4 changes: 2 additions & 2 deletions modin/core/io/io.py
Expand Up @@ -23,7 +23,7 @@

import pandas
import pandas._libs.lib as lib
-from pandas._typing import CompressionOptions, FilePathOrBuffer, StorageOptions
+from pandas._typing import CompressionOptions, StorageOptions
from pandas.util._decorators import doc

from modin.db_conn import ModinDatabaseConnection
Expand Down Expand Up @@ -826,7 +826,7 @@ def to_sql(
def to_pickle(
cls,
obj: Any,
-filepath_or_buffer: FilePathOrBuffer,
+filepath_or_buffer,
compression: CompressionOptions = "infer",
protocol: int = pickle.HIGHEST_PROTOCOL,
storage_options: StorageOptions = None,
Expand Down
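The annotation is simply dropped here, in line with the "Remove FilePathOrBuffer import" commit. If a type hint were still wanted, one alternative (not what this PR does) would be a project-local alias built from the standard library. A hedged sketch; the alias below and `describe_source` are hypothetical names, not pandas or Modin APIs:

```python
import os
from io import BytesIO
from pathlib import Path
from typing import IO, Any, Union

# Hypothetical local alias standing in for the removed pandas typing name.
FilePathOrBuffer = Union[str, os.PathLike, IO[Any]]


def describe_source(filepath_or_buffer: FilePathOrBuffer) -> str:
    """Classify the argument as a filesystem path or a file-like buffer."""
    if isinstance(filepath_or_buffer, (str, os.PathLike)):
        return "path"
    if hasattr(filepath_or_buffer, "read") or hasattr(filepath_or_buffer, "write"):
        return "buffer"
    raise TypeError(f"unsupported source type: {type(filepath_or_buffer).__name__}")


kinds = [
    describe_source("data.pkl"),
    describe_source(Path("data.pkl")),
    describe_source(BytesIO(b"")),
]
```

Leaving the parameter unannotated, as the diff does, avoids depending on a private pandas module at all.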
3 changes: 1 addition & 2 deletions modin/core/io/text/fwf_dispatcher.py
Expand Up @@ -14,7 +14,6 @@
"""Module houses `FWFDispatcher` class, that is used for reading of tables with fixed-width formatted lines."""

import pandas
-from pandas._typing import FilePathOrBuffer

from modin.core.io.text.text_file_dispatcher import TextFileDispatcher

Expand All @@ -27,7 +26,7 @@ class FWFDispatcher(TextFileDispatcher):
@classmethod
def check_parameters_support(
cls,
-filepath_or_buffer: FilePathOrBuffer,
+filepath_or_buffer,
read_kwargs: dict,
):
"""
Expand Down
7 changes: 3 additions & 4 deletions modin/core/io/text/text_file_dispatcher.py
Expand Up @@ -27,7 +27,6 @@
import numpy as np
import pandas
import pandas._libs.lib as lib
-from pandas._typing import FilePathOrBuffer
from pandas.core.dtypes.common import is_list_like

from modin.core.io.file_dispatcher import FileDispatcher, OpenFile
Expand All @@ -36,7 +35,7 @@
from modin.core.io.text.utils import CustomNewlineIterator
from modin.config import NPartitions

-ColumnNamesTypes = Tuple[Union[pandas.Index, pandas.MultiIndex, pandas.Int64Index]]
+ColumnNamesTypes = Tuple[Union[pandas.Index, pandas.MultiIndex]]
IndexColType = Union[int, str, bool, Sequence[int], Sequence[str], None]


Expand Down Expand Up @@ -614,7 +613,7 @@ def _launch_tasks(cls, splits: list, **partition_kwargs) -> Tuple[list, list, li
@classmethod
def check_parameters_support(
cls,
-filepath_or_buffer: FilePathOrBuffer,
+filepath_or_buffer,
read_kwargs: dict,
) -> bool:
"""
Expand Down Expand Up @@ -912,7 +911,7 @@ def _get_new_qc(
return new_query_compiler

@classmethod
-def _read(cls, filepath_or_buffer: FilePathOrBuffer, **kwargs):
+def _read(cls, filepath_or_buffer, **kwargs):
"""
Read data from `filepath_or_buffer` according to `kwargs` parameters.

Expand Down
3 changes: 3 additions & 0 deletions modin/core/storage_formats/pandas/parsers.py
Expand Up @@ -189,6 +189,9 @@ def generic_parse(fname, **kwargs):

bio.seek(start)
to_read = header + bio.read(end - start)
+if "memory_map" in kwargs:
+    kwargs = kwargs.copy()
+    del kwargs["memory_map"]
pandas_df = callback(BytesIO(to_read), **kwargs)
index = (
pandas_df.index
Expand Down
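The added lines drop `memory_map` before the callback is handed an in-memory `BytesIO`; the commit list describes this as resolving a `fileno` error. Memory mapping needs a real file descriptor, which `BytesIO` does not provide. A minimal standard-library sketch of both points; `parse_chunk` and `fake_reader` are hypothetical stand-ins, not Modin or pandas functions:

```python
import io
from io import BytesIO


def parse_chunk(raw, callback, kwargs):
    """Call `callback` on an in-memory buffer without passing `memory_map`."""
    if "memory_map" in kwargs:
        # Copy first so the caller's dict is left untouched.
        kwargs = kwargs.copy()
        del kwargs["memory_map"]
    return callback(BytesIO(raw), **kwargs)


def fake_reader(buffer, **kwargs):
    # Stand-in for a real reader: returns the bytes and the option names it saw.
    return buffer.read(), sorted(kwargs)


caller_kwargs = {"memory_map": True, "sep": ","}
data, seen = parse_chunk(b"a,b\n1,2\n", fake_reader, caller_kwargs)

# BytesIO has no underlying file descriptor, so fileno() is unsupported.
try:
    BytesIO(b"").fileno()
    has_fileno = True
except io.UnsupportedOperation:
    has_fileno = False
```

Copying before deleting keeps the option available to any other code path that still reads from a real file.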
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,6 @@

import pandas
import pandas._libs.lib as lib
-from pandas._typing import FilePathOrBuffer
from pandas.io.common import is_url

ReadCsvKwargsType = Dict[
Expand All @@ -51,7 +50,6 @@
Sequence,
Callable,
Dialect,
-FilePathOrBuffer,
None,
],
]
Expand Down