Commit

Merge branch 'master' into 4310-read-csv-glob-lists-of-lists-of-ints
YarShev authored Mar 24, 2022
2 parents f1200a1 + 2809f7c commit f6aa01c
Showing 27 changed files with 3,497 additions and 296 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -115,6 +115,7 @@ jobs:
modin/experimental/core/execution/native/implementations/omnisci_on_native/expr.py \
modin/experimental/core/execution/native/implementations/omnisci_on_native/omnisci_worker.py \
- run: python scripts/doc_checker.py modin/experimental/core/storage_formats/omnisci
- run: python scripts/doc_checker.py modin/experimental/core/execution/native/implementations/omnisci_on_native/exchange/dataframe_protocol

lint-flake8:
name: lint (flake8)
@@ -339,6 +340,8 @@ jobs:
- run: MODIN_BENCHMARK_MODE=True pytest modin/pandas/test/internals/test_benchmark_mode.py
- run: pytest modin/experimental/core/execution/native/implementations/omnisci_on_native/test/test_dataframe.py
- run: pytest modin/pandas/test/test_io.py::TestCsv --verbose
- run: pytest modin/test/exchange/dataframe_protocol/test_general.py
- run: pytest modin/test/exchange/dataframe_protocol/omnisci
- uses: codecov/codecov-action@v2

test-asv-benchmarks:
@@ -480,6 +483,7 @@ jobs:
- run: python -m pytest modin/experimental/pandas/test/test_io_exp.py
- run: pip install "dfsql>=0.4.2" "pyparsing<=2.4.7" && pytest modin/experimental/sql/test/test_sql.py
- run: pytest modin/test/exchange/dataframe_protocol/test_general.py
- run: pytest modin/test/exchange/dataframe_protocol/pandas/test_protocol.py
- uses: codecov/codecov-action@v2

test-experimental:
3 changes: 3 additions & 0 deletions .github/workflows/push.yml
@@ -110,6 +110,8 @@ jobs:
- run: pytest modin/test/storage_formats/omnisci/test_internals.py
- run: pytest modin/experimental/core/execution/native/implementations/omnisci_on_native/test/test_dataframe.py
- run: pytest modin/pandas/test/test_io.py::TestCsv
- run: pytest modin/test/exchange/dataframe_protocol/test_general.py
- run: pytest modin/test/exchange/dataframe_protocol/omnisci
- uses: codecov/codecov-action@v2

test-all:
@@ -182,6 +184,7 @@ jobs:
- run: python -m pytest modin/pandas/test/test_io.py
- run: python -m pytest modin/experimental/pandas/test/test_io_exp.py
- run: pytest modin/test/exchange/dataframe_protocol/test_general.py
- run: pytest modin/test/exchange/dataframe_protocol/pandas/test_protocol.py
- uses: codecov/codecov-action@v2

test-windows:
12 changes: 9 additions & 3 deletions docs/release_notes/release_notes-0.14.0.rst
@@ -24,9 +24,9 @@ Key Features and Updates
* FIX-#4303: Fix the syntax error in reading from postgres (#4304)
* FIX-#4308: Add proper error handling in df.set_index (#4309)
* FIX-#4056: Allow an empty parse_date list in `read_csv_glob` (#4074)
* FIX-#4312: Fix constructing categorical frame with duplicate column names (#4313).
* FIX-#4314: Allow passing a series of dtypes to astype (#4318)
* FIX-#4310: Handle lists of lists of ints in read_csv_glob (#4319)
* FIX-#4312: Fix constructing categorical frame with duplicate column names (#4313).
* FIX-#4314: Allow passing a series of dtypes to astype (#4318)
* FIX-#4310: Handle lists of lists of ints in read_csv_glob (#4319)
* Performance enhancements
* FIX-#4138, FIX-#4009: remove redundant sorting in the internal '.mask()' flow (#4140)
* FIX-#4183: Stop shallow copies from creating global shared state. (#4184)
@@ -54,6 +54,9 @@ Key Features and Updates
*
* Developer API enhancements
* FEAT-#4245: Define base interface for dataframe exchange protocol (#4246)
* FEAT-#4244: Implement dataframe exchange protocol for OmnisciOnNative execution (#4269)
* FEAT-#4144: Implement dataframe exchange protocol for pandas storage format (#4150)
* FEAT-#4342: Support `from_dataframe` for pandas storage format (#4343)
* Update testing suite
* TEST-#3628: Report coverage data for `test-internals` CI job (#4198)
* TEST-#3938: Test tutorial notebooks in CI (#4145)
@@ -74,6 +77,9 @@ Key Features and Updates
* DOCS-#4280: Change links in jupyter notebooks (#4281)
* DOCS-#4290: Add changes for OmniSci notebooks (#4291)
* DOCS-#4241: Update warnings and docs regarding defaulting to pandas (#4242)
* DOCS-#3099: Fix `BasePandasDataSet` docstrings warnings (#4333)
* DOCS-#4339: Reformat I/O functions docstrings (#4341)
* DOCS-#4336: Reformat general utilities docstrings (#4338)
* Dependencies
* FIX-#4113, FIX-#4116, FIX-#4115: Apply new `black` formatting, fix pydocstyle check and readthedocs build (#4114)
* TEST-#3227: Use codecov github action instead of bash form in GA workflows (#3226)
26 changes: 26 additions & 0 deletions modin/core/dataframe/base/exchange/dataframe_protocol/utils.py
@@ -124,6 +124,16 @@ class ArrowCTypes:
    # - microseconds -> 'u'
    # - nanoseconds -> 'n'
    TIMESTAMP = "ts{resolution}:{tz}"
    TIME = "tt{resolution}"


class Endianness:
    """Enum indicating the byte-order of a data-type."""

    LITTLE = "<"
    BIG = ">"
    NATIVE = "="
    NA = "|"


def pandas_dtype_to_arrow_c(dtype) -> str:
@@ -158,3 +168,19 @@ def pandas_dtype_to_arrow_c(dtype) -> str:
    raise NotImplementedError(
        f"Conversion of {dtype} to Arrow C format string is not implemented."
    )
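
The `TIMESTAMP` and `TIME` templates and the `Endianness` markers added in this file are plain strings, so a short sketch may help show how they are meant to be filled in. The block below copies the constants from the hunks above; `RESOLUTION_CODES`, `timestamp_format`, `time_format`, and `resolve_native` are hypothetical helpers written for illustration and are not part of Modin. The one-letter codes for seconds and milliseconds are an assumption taken from the Arrow C data interface, since only the microsecond and nanosecond comments are visible in this diff.

```python
import sys

# Templates copied from the ArrowCTypes hunk above.
TIMESTAMP = "ts{resolution}:{tz}"
TIME = "tt{resolution}"

# Byte-order markers copied from the Endianness hunk above.
LITTLE, BIG, NATIVE, NA = "<", ">", "=", "|"

# Hypothetical lookup from a unit name to the one-letter resolution code.
# 'u' and 'n' appear in the diff; 's' and 'm' are assumed from the Arrow spec.
RESOLUTION_CODES = {"s": "s", "ms": "m", "us": "u", "ns": "n"}


def timestamp_format(unit, tz=""):
    """Fill the TIMESTAMP template for a unit and an optional timezone."""
    return TIMESTAMP.format(resolution=RESOLUTION_CODES[unit], tz=tz)


def time_format(unit):
    """Fill the TIME template for a unit."""
    return TIME.format(resolution=RESOLUTION_CODES[unit])


def resolve_native(byteorder):
    """Map the NATIVE marker to LITTLE or BIG for the running interpreter."""
    if byteorder == NATIVE:
        return LITTLE if sys.byteorder == "little" else BIG
    return byteorder


print(timestamp_format("ns", "UTC"))  # tsn:UTC
print(time_format("us"))              # ttu
```

A timezone-naive timestamp leaves the part after the colon empty, so `timestamp_format("us")` yields `tsu:`.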


def raise_copy_alert(copy_reason=None):
    """
    Raise a ``RuntimeError`` mentioning that there's a copy required.

    Parameters
    ----------
    copy_reason : str, optional
        The reason for making a copy. Should fit the following format:
        'The copy occurred due to {copy_reason}.'.
    """
    msg = "Copy required but 'allow_copy=False' is set."
    if copy_reason:
        msg += f" The copy occurred due to {copy_reason}."
    raise RuntimeError(msg)
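
As a rough illustration of where a helper like `raise_copy_alert` could be called, the sketch below guards a copy behind an `allow_copy` flag. The helper body mirrors the one in the diff above; `export_buffer` and its use of `memoryview` are hypothetical stand-ins chosen to keep the example free of dependencies, and they do not reflect how Modin's protocol classes actually decide that a copy is needed.

```python
def raise_copy_alert(copy_reason=None):
    """Mirror of the helper from the diff: always raises RuntimeError."""
    msg = "Copy required but 'allow_copy=False' is set."
    if copy_reason:
        msg += f" The copy occurred due to {copy_reason}."
    raise RuntimeError(msg)


def export_buffer(view, allow_copy=True):
    """Hypothetical exporter: return a contiguous view, copying only if allowed."""
    if view.contiguous:
        # Zero-copy path: the data is already laid out contiguously.
        return view
    if not allow_copy:
        # The caller forbade copies, so report why one would be needed.
        raise_copy_alert(copy_reason="non-contiguous memory")
    # Copy path: materialize the strided data into a new contiguous buffer.
    return memoryview(view.tobytes())


strided = memoryview(bytearray(b"abcdef"))[::2]  # every second byte, non-contiguous
print(export_buffer(strided, allow_copy=True).tobytes())  # b'ace'
```

Calling `export_buffer(strided, allow_copy=False)` raises a `RuntimeError` whose message names the reason for the copy.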
33 changes: 33 additions & 0 deletions modin/core/dataframe/pandas/dataframe/dataframe.py
@@ -2859,3 +2859,36 @@ def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        return PandasProtocolDataframe(
            self, nan_as_null=nan_as_null, allow_copy=allow_copy
        )

    @classmethod
    def from_dataframe(cls, df: "ProtocolDataframe") -> "PandasDataframe":
        """
        Convert a DataFrame implementing the dataframe exchange protocol to a Core Modin Dataframe.

        See more about the protocol in https://data-apis.org/dataframe-protocol/latest/index.html.

        Parameters
        ----------
        df : ProtocolDataframe
            The DataFrame object supporting the dataframe exchange protocol.

        Returns
        -------
        PandasDataframe
            A new Core Modin Dataframe object.
        """
        if type(df) == cls:
            return df

        if not hasattr(df, "__dataframe__"):
            raise ValueError(
                "`df` does not support DataFrame exchange protocol, i.e. `__dataframe__` method"
            )

        from modin.core.dataframe.pandas.exchange.dataframe_protocol.from_dataframe import (
            from_dataframe_to_pandas,
        )

        ErrorMessage.default_to_pandas(message="`from_dataframe`")
        pandas_df = from_dataframe_to_pandas(df)
        return cls.from_pandas(pandas_df)
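
The three branches of `from_dataframe` (return the object unchanged, reject objects without `__dataframe__`, otherwise convert) can be sketched with toy classes. Everything below is hypothetical: `ToyCoreFrame` and `OtherFrame` are stand-ins, and the "exchange object" is just the frame itself with a `columns` dict, whereas the real protocol object exposes a richer interface and Modin converts through pandas.

```python
class OtherFrame:
    """Hypothetical foreign frame that supports the exchange entry point."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # Toy exchange object: the frame itself stands in for the protocol object.
        return self


class ToyCoreFrame:
    """Hypothetical core frame showing the same dispatch as the diff above."""

    def __init__(self, columns):
        self.columns = dict(columns)

    @classmethod
    def from_dataframe(cls, df):
        if type(df) is cls:
            # Already the target type: nothing to convert.
            return df
        if not hasattr(df, "__dataframe__"):
            raise ValueError(
                "`df` does not support DataFrame exchange protocol, "
                "i.e. `__dataframe__` method"
            )
        # Toy conversion: read the columns from the exchange object.
        exchanged = df.__dataframe__()
        return cls(exchanged.columns)


converted = ToyCoreFrame.from_dataframe(OtherFrame({"a": [1, 2, 3]}))
print(type(converted).__name__, converted.columns)
```

Passing an object with no `__dataframe__` method, such as a plain list, takes the `ValueError` branch.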