Rework the python bindings [WIP] #856
Looks great. Thanks a lot, @kszucs!
My only concern is the addition of the Py prefix in a lot of public stuff. Doesn't this cause all public APIs in Python to be prepended by Py also? IMO there should not be any prefix.
Right, the intent there is to expose the Rust Python objects under an internals module:
from datafusion import internals
class DataFrame(internals.PyDataFrame):
    # additional functionality
One additional advantage of using the …
Note that this is a common approach for Python bindings; see py-polars for example.
from .internals import PyExpr as Expr
from .internals import functions

__all__ = [
@jorgecarleitao the symbols are exported without the Py prefix.
ahhh, that makes sense. Thanks for the clarification! LGTM
    fn to_pyarrow(&self, py: Python) -> PyResult<PyObject>;
}

impl PyArrowConvert for DataType {
I'm going to try adding this module to arrow-rs as an optional one, so we can implement the PyO3 conversion traits, like FromPyObject, directly for the arrow types.
This will further reduce the required conversion boilerplate in the Python bindings.
For example we should be able to write
fn create_dataframe(
    &mut self,
    partitions: Vec<Vec<RecordBatch>>,
) -> PyResult<PyDataFrame> {
    // partitions are going to be converted by PyO3 automatically
instead of the current
fn create_dataframe(
    &mut self,
    partitions: Vec<Vec<&PyAny>>,
) -> PyResult<PyDataFrame> {
    let partitions: Vec<Vec<RecordBatch>> = partitions
        .into_iter()
        .map(|batches| {
            batches
                .into_iter()
                .map(RecordBatch::from_pyarrow)
                .collect::<PyResult<_>>()
        })
        .collect::<PyResult<_>>()?;
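The difference between the two signatures above can be sketched without PyO3 or arrow at all. This is only an illustration under stated assumptions: `Foreign`, `FromForeign`, `Batch`, `convert_manually`, and `convert_via_trait` are hypothetical stand-ins (for a dynamically typed Python value, a `FromPyObject`-style trait, and `RecordBatch` respectively), not names from this PR or from any library. The point is that once the element type implements the conversion trait, a blanket impl for `Vec<T>` handles arbitrary nesting, so the hand-written nested `collect` goes away.

```rust
// Hypothetical stand-in for a dynamically typed value coming from Python.
#[derive(Debug, Clone, PartialEq)]
enum Foreign {
    Int(i64),
    Text(String),
    List(Vec<Foreign>),
}

// Hypothetical analogue of a `FromPyObject`-style conversion trait.
trait FromForeign: Sized {
    fn from_foreign(value: &Foreign) -> Result<Self, String>;
}

// Hypothetical stand-in for `RecordBatch`.
#[derive(Debug, PartialEq)]
struct Batch(i64);

impl FromForeign for Batch {
    fn from_foreign(value: &Foreign) -> Result<Self, String> {
        match value {
            Foreign::Int(n) => Ok(Batch(*n)),
            other => Err(format!("expected Int, got {:?}", other)),
        }
    }
}

// Blanket impl: a list of convertible items is itself convertible.
// Collecting into `Result` stops at the first failing element.
impl<T: FromForeign> FromForeign for Vec<T> {
    fn from_foreign(value: &Foreign) -> Result<Self, String> {
        match value {
            Foreign::List(items) => items.iter().map(T::from_foreign).collect(),
            other => Err(format!("expected List, got {:?}", other)),
        }
    }
}

// Manual variant: the caller spells out the nested `collect`,
// as in the "current" snippet above.
fn convert_manually(partitions: &[Vec<Foreign>]) -> Result<Vec<Vec<Batch>>, String> {
    partitions
        .iter()
        .map(|batches| {
            batches
                .iter()
                .map(Batch::from_foreign)
                .collect::<Result<Vec<_>, _>>()
        })
        .collect()
}

// Trait-driven variant: the nesting is resolved by the blanket impl,
// so the call site only names the target type.
fn convert_via_trait(partitions: &Foreign) -> Result<Vec<Vec<Batch>>, String> {
    Vec::<Vec<Batch>>::from_foreign(partitions)
}
```

Both functions return the same nested vector for well-formed input and an `Err` as soon as one element has the wrong shape. In the trait-driven variant the framework could perform that call on the caller's behalf, which is what lets the function signature take the converted type directly.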
I admit part of the reason I avoided something like that at the time was to avoid cannibalizing pyarrow, since it makes it appealing from there to have arrow-rs exposed as a Python library, which IMO would add confusion to the Python ecosystem (having two official Python libraries, one from each implementation).
OTOH, polars, and likely others, would benefit from such a pyo3 library, as anyone building bindings on top of arrow-rs could re-use that code.
@jorgecarleitao created a prototype, see the referenced PRs below. I guess the code should be portable to arrow2 as well.
Closing in favor of #873