-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: deprecate register api #8863
Conversation
81f2f50
to
df9b8ad
Compare
68a4d5b
to
ce51755
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @ncclementi !
The deprecation implementation is good -- I think some of the tests can be updated now and some should probably wait until we have a standardized read_in_memory
ibis/backends/tests/test_register.py
Outdated
with pytest.warns(FutureWarning, match="v9.0"): | ||
t = con.register(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The in-memory tests we can probably leave alone for now, but we should add a comment that these should be updated to work with the upcoming read_in_memory
and we should also add a polars dataframe to the mix
I modified the tests in question to use |
7097b5d
to
5b02581
Compare
Update regarding adapting Turns out that doing this would require to modify the tests quite a bit given that now this will trigger some XPASS failures, and the tests will become way more convoluted than what they are at the moment. After chatting with @cpcloud, the suggestion was to leave the tests as they are now, and discuss a decision for when we remove the |
Ok, I'm good with the changes here, but I think we need a consensus from a few folks on whether to merge now or not. |
@gforsyth Brings up a good point. Let's hold off on merging to 9.0 for now to avoid delaying its release. We can merge this soon after 9.0 is cut (not immediately, just to wait for the bug reports to come in and cut a 9.0.1 or 9.1 bug fix release). |
5200c97
to
cb920e2
Compare
I just rebased, I think when CI finishes we can get this one in since 9.0 is out |
cb920e2
to
c99c3d9
Compare
@gforsyth should we get this one in, or are there still concerns merging this deprecation without having an alternative |
My preference is to have a replacement available -- I lost track of that work, but let me see if I can pick it back up. |
c99c3d9
to
0608259
Compare
Ready, waiting for #9251 to be merged. |
This PR adds/codifies support for passing in-memory data to `create_table`. The default behavior for most backends is to first create a `memtable` with whatever `obj` is passed to `create_table`, then we create a table based on that `memtable` -- because of this, semantics around `temp` tables and `catalog.database` locations are handled correctly. After the new table (that the user has provided a name for) is created, we drop the intermediate `memtable` so we don't add two tables for every in-memory object passed to `create_table`. Currently most backends fail when passed `RecordBatchReaders`, or a single `RecordBatch`, or a `pyarrow.Dataset` -- if we add support for these to `memtable`, all of those backends would start working, so I've marked those xfails as `notimpl` for now. A few backends _don't_ work this way: `polars` reads in the table directly using their fast-path local-memory reading stuff. `datafusion` uses a fast-path read, then creates a table from the table that is created by the fast-path -- this is because the `datafusion` dataframe API has no way to specify things like `overwrite`, or table location, but the CTAS from already present tables is very quick (and _possibly_ zero-copy?) so no issue there. `duckdb` has a refactored `read_in_memory` (which we should deprecate), but it isn't entirely hooked up inside of `create_table` yet, so some paths may go via `memtable` creation, but `memtable` creation on DuckDB is especially fast, so I'm all for fixing this up eventually. `pyspark` works with the intermediate `memtable` -- there are possibly fast-paths available, but they aren't currently implemented. `pandas` and `dask` have a custom `_convert_object` path TODO: * ~[ ] Flink~ Flink can't create tables from in-memory data? * [x] Impala * [x] BigQuery * [x] Remove `read_in_memory` from datafusion and polars Resolves #6593 xref #8863 Signed-off-by: Gil Forsyth <gil@forsyth.dev> - refactor(duckdb): add polars df as option, move test to backend suite - feat(polars): enable passing in-memory data to create_table - feat(datafusion): enable passing in-memory data to create_table - feat(datafusion): use info_schema for list_tables - feat(duckdb): enable passing in-memory data to create_table - feat(postgres): allow passing in-memory data to create_table - feat(trino): allow passing in-memory date to create_table - feat(mysql): allow passing in-memory data to create_table - feat(mssql): allow passing in-memory data to create_table - feat(exasol): allow passing in-memory data to create_table - feat(risingwave): allow passing in-memory data to create_table - feat(sqlite): allow passing in-memory data to create_table - feat(clickhouse): enable passing in-memory data to create_table - feat(oracle): enable passing in-memory data to create_table - feat(snowflake): allow passing in-memory data to create_table - feat(pyspark): enable passing in-memory data to create_table - feat(pandas,dask): allow passing in-memory data to create_table --------- Signed-off-by: Gil Forsyth <gil@forsyth.dev> Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Looks like I messed something up in the rebase, I'll fix it soon.