Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add "future" versioning #17421

Merged
merged 14 commits into from
Jul 7, 2024
Merged

feat: Add "future" versioning #17421

merged 14 commits into from
Jul 7, 2024

Conversation

ruihe774
Copy link
Contributor

@ruihe774 ruihe774 commented Jul 4, 2024

Previously, future=True renders data structure unstable. When exchanging data with e.g. pyo3-polars, this can result in data corruption if the versions of polars do not match. This PR introduces versioning to "future" (in Python side) and to "pl_flavor" (in Rust side). This allows efficient data exchange and also allows future updates to the data structure.

The current flavors are:

pub enum PlFlavor {
    Compatible,
    Future1,
}

N.B. it is a breaking change to Rust polars. Python polars is backward compatible.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jul 4, 2024
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the benefit of the future parameter internally, but the whole thing feels to me like we're leaking internals to the public API.

py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 79.66102% with 72 lines in your changes missing coverage. Please review.

Project coverage is 80.46%. Comparing base (27ac6cc) to head (0401852).
Report is 8 commits behind head on main.

Files Patch % Lines
crates/polars-core/src/chunked_array/ops/arity.rs 45.00% 11 Missing ⚠️
py-polars/polars/dataframe/frame.py 55.00% 6 Missing and 3 partials ⚠️
py-polars/src/conversion/mod.rs 38.46% 8 Missing ⚠️
crates/polars-core/src/chunked_array/collect.rs 30.00% 7 Missing ⚠️
...core/src/chunked_array/logical/categorical/from.rs 61.53% 5 Missing ⚠️
...es/polars-core/src/chunked_array/ops/unique/mod.rs 50.00% 5 Missing ⚠️
crates/polars-core/src/chunked_array/ops/full.rs 78.94% 4 Missing ⚠️
crates/polars-io/src/ipc/write_async.rs 0.00% 3 Missing ⚠️
py-polars/polars/interchange/protocol.py 87.50% 3 Missing ⚠️
py-polars/polars/series/series.py 70.00% 2 Missing and 1 partial ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17421      +/-   ##
==========================================
- Coverage   80.58%   80.46%   -0.12%     
==========================================
  Files        1480     1483       +3     
  Lines      193682   195006    +1324     
  Branches     2765     2780      +15     
==========================================
+ Hits       156071   156910     +839     
- Misses      37103    37585     +482     
- Partials      508      511       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ruihe774

This comment was marked as resolved.

@ruihe774 ruihe774 force-pushed the future-version branch 3 times, most recently from 644b8d8 to c9f3c55 Compare July 4, 2024 09:43
@ruihe774 ruihe774 requested a review from stinodego July 4, 2024 15:34
@eitsupi
Copy link
Contributor

eitsupi commented Jul 5, 2024

I think this is a breaking change for both Rust and Python. (In other words, something like duckdblabs/db-benchmark#79, which was broken a few days ago, will break again.)
I don't think this change should be made until v2 in Python.

@ruihe774
Copy link
Contributor Author

ruihe774 commented Jul 5, 2024

I think this is a breaking change for both Rust and Python. (In other words, something like duckdblabs/db-benchmark#79, which was broken a few days ago, will break again.) I don't think this change should be made until v2 in Python.

I have renamed flavor back to future. With c9f3c55 and ea92723, I think it will not break existing Python code, despite type checking will fail. @eitsupi would you plz check it?

@eitsupi
Copy link
Contributor

eitsupi commented Jul 5, 2024

Thanks, this looks good to me.

(By the way, shouldn't this have been unmarked unstable when that was changed to future=True default before v1.0.0? StringView is now an official specification of Apache Arrow and is supported by reference implementations like pyarrow)

@ritchie46
Copy link
Member

@eitsupi this wasn't a breaking change in the benchmarks. We also don't guarantee stability of in memory formats and we may export string views once consumers support them.

For instance latest pyarrow can handle string views. That said. I don't want to leak this inner future to the users like in this PR.

@ritchie46
Copy link
Member

Could you explain more what you were doing in pyo3-polars?

@ruihe774
Copy link
Contributor Author

ruihe774 commented Jul 5, 2024

Could you explain more what you were doing in pyo3-polars?

See pola-rs/pyo3-polars#85. I'm trying to add future=True to to_arrow to allow faster data exchange, but simply adding future=True will result in forward compatibility problem giving the versions of polars (more specifically, polars-arrow) used by python and used by plugin can differ. The python polars may be newer and include incompatible change to the data exported with future=True. To address this, data structure versioning is necessary: in the plugin, pyo3-polars call to_arrow with highest supported flavor (PlFlavor::highest()) in its side, and the python polars side can return data that conforms to the flavor.

For example, a new version of polars introduces a new arrow data type, which is a breaking change to the data structure, so the flavor version increases (PlFlavor::Future2). Old versions of polars cannot handle this. If pyo3-polars simply set future=True to to_arrow, plugins built with old versions of polars cannot work anymore. But with flavor versioning, pyo3-polars can set PlFlavor::Future1. According to the requested flavor, the python polars does not export the new data type, so the compatibility is maintained.

@ritchie46
Copy link
Member

Right, now I understand more. Note that we don't call that plugins, but rather pyo3 extensions. The plugins are versioned separately.

Instead of an enum, can make it a struct with an internal version: PlFlavor{ version: u16 }.

@ruihe774
Copy link
Contributor Author

ruihe774 commented Jul 6, 2024

Instead of an enum, can make it a struct with an internal version: PlFlavor{ version: u16 }.

Good idea. In this way, we can hide the implementation detail.

py-polars/polars/interchange/protocol.py Outdated Show resolved Hide resolved
py-polars/tests/unit/io/test_ipc.py Outdated Show resolved Hide resolved
crates/polars-core/src/datatypes/dtype.rs Outdated Show resolved Hide resolved
@ruihe774 ruihe774 requested a review from ritchie46 July 6, 2024 17:26
@ruihe774 ruihe774 force-pushed the future-version branch 4 times, most recently from 341f66a to 9362a7f Compare July 6, 2024 19:18
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ruihe774.

I think it looks great now. I only have a few minor comments left and then it is good to go.

py-polars/tests/unit/io/test_ipc.py Outdated Show resolved Hide resolved
py-polars/tests/unit/io/test_ipc.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
@ritchie46 ritchie46 merged commit ad836bd into pola-rs:main Jul 7, 2024
24 of 25 checks passed
@stinodego stinodego changed the title feat: add "future" versioning feat: Add "future" versioning Jul 8, 2024
henryharbeck pushed a commit to henryharbeck/polars that referenced this pull request Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants