refactor: deprecate register api #8863

ncclementi · 2024-04-02T17:57:29Z

Looks like I messed something up in the rebase, I'll fix it soon.

Closes chore: deprecate register functionality #8735

gforsyth

Sorry for the delay @ncclementi !

The deprecation implementation is good -- I think some of the tests can be updated now and some should probably wait until we have a standardized read_in_memory

ibis/backends/datafusion/tests/test_register.py

ibis/backends/duckdb/tests/test_client.py

ibis/backends/duckdb/tests/test_register.py

ibis/backends/tests/test_register.py

gforsyth · 2024-04-07T15:44:49Z

ibis/backends/tests/test_register.py

+    with pytest.warns(FutureWarning, match="v9.0"):
+        t = con.register(df)


We can probably leave the in-memory tests alone for now, but we should add a comment noting that they should be updated to work with the upcoming read_in_memory. We should also add a polars dataframe to the mix.

ibis/backends/tests/test_register.py

ncclementi · 2024-04-09T20:35:26Z

I modified the tests in question to use read_csv or read_parquet. I wasn't sure what the tests should be named, or whether we still need them given that there are separate read_csv and read_parquet tests, though the register tests seem to be more complete. Happy to revisit.

ncclementi · 2024-04-10T20:16:14Z

Update regarding adapting test_register_csv and similar to use con.read_csv

It turns out that doing this would require modifying the tests quite a bit, since it now triggers some XPASS failures, and the tests would become far more convoluted than they are at the moment.

After chatting with @cpcloud, the suggestion was to leave the tests as they are now, and discuss a decision for when we remove the register api completely.

gforsyth · 2024-04-11T14:44:40Z

Ok, I'm good with the changes here, but I think we need a consensus from a few folks on whether to merge now or not.
If we want to add this in Ibis 9.0, I think we also need to handle standardizing read_in_memory xref #6593 before we release, because it seems irresponsible to deprecate functionality without offering a replacement.

cpcloud · 2024-04-12T11:25:31Z

@gforsyth brings up a good point. Let's hold off on merging to 9.0 for now to avoid delaying its release. We can merge this soon after 9.0 is cut (not immediately, just to wait for the bug reports to come in and cut a 9.0.1 or 9.1 bug fix release).

ncclementi · 2024-05-21T15:56:43Z

I just rebased. I think we can get this one in once CI finishes, since 9.0 is out.

ncclementi · 2024-05-23T17:57:07Z

@gforsyth should we get this one in, or are there still concerns merging this deprecation without having an alternative read_in_memory api?

gforsyth · 2024-05-23T18:15:44Z

> should we get this one in, or are there still concerns merging this deprecation without having an alternative read_in_memory api?

My preference is to have a replacement available -- I lost track of that work, but let me see if I can pick it back up.
In the interim, we'll want to update all the "deprecated as of" messages to say 9.1 instead of 9.0.
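For readers following along, the version bump discussed here can be illustrated with a small, self-contained sketch. The `deprecated` helper, the `ToyBackend` class, and the message wording below are hypothetical stand-ins written for this example; they are not the Ibis implementation. The sketch only shows how a "deprecated as of" version ends up in a `FutureWarning` and how a test can check for it.

```python
import functools
import warnings


def deprecated(*, as_of: str, instead: str):
    """Hypothetical helper: mark a function as deprecated since version `as_of`."""

    def decorator(func):
        msg = f"`{func.__name__}` is deprecated as of v{as_of}; {instead}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper.
            warnings.warn(msg, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ToyBackend:
    """Stand-in for a backend; not the real Ibis class."""

    # The replacement advice in `instead` is an assumption made for this example.
    @deprecated(as_of="9.1", instead="use `read_csv` or `read_parquet` instead")
    def register(self, source):
        return f"table<{source}>"


def call_and_collect_warnings(func, *args):
    """Call `func` and return its result plus any warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args)
    return result, caught
```

In a pytest suite, the equivalent check would be the `pytest.warns(FutureWarning, match=...)` pattern from the review comment above, with the match string bumped from `"v9.0"` to `"v9.1"`.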

ncclementi · 2024-05-28T20:43:23Z

Ready, waiting for #9251 to be merged.

This PR adds/codifies support for passing in-memory data to `create_table`.

The default behavior for most backends is to first create a `memtable` with whatever `obj` is passed to `create_table`, then we create a table based on that `memtable` -- because of this, semantics around `temp` tables and `catalog.database` locations are handled correctly.

After the new table (that the user has provided a name for) is created, we drop the intermediate `memtable` so we don't add two tables for every in-memory object passed to `create_table`.

Currently most backends fail when passed `RecordBatchReaders`, or a single `RecordBatch`, or a `pyarrow.Dataset` -- if we add support for these to `memtable`, all of those backends would start working, so I've marked those xfails as `notimpl` for now.

A few backends _don't_ work this way:

* `polars` reads in the table directly using their fast-path local-memory reading stuff.
* `datafusion` uses a fast-path read, then creates a table from the table that is created by the fast-path -- this is because the `datafusion` dataframe API has no way to specify things like `overwrite`, or table location, but the CTAS from already present tables is very quick (and _possibly_ zero-copy?) so no issue there.
* `duckdb` has a refactored `read_in_memory` (which we should deprecate), but it isn't entirely hooked up inside of `create_table` yet, so some paths may go via `memtable` creation, but `memtable` creation on DuckDB is especially fast, so I'm all for fixing this up eventually.
* `pyspark` works with the intermediate `memtable` -- there are possibly fast-paths available, but they aren't currently implemented.
* `pandas` and `dask` have a custom `_convert_object` path

TODO:

* ~[ ] Flink~ Flink can't create tables from in-memory data?
* [x] Impala
* [x] BigQuery
* [x] Remove `read_in_memory` from datafusion and polars

Resolves #6593

xref #8863

Signed-off-by: Gil Forsyth <gil@forsyth.dev>

- refactor(duckdb): add polars df as option, move test to backend suite
- feat(polars): enable passing in-memory data to create_table
- feat(datafusion): enable passing in-memory data to create_table
- feat(datafusion): use info_schema for list_tables
- feat(duckdb): enable passing in-memory data to create_table
- feat(postgres): allow passing in-memory data to create_table
- feat(trino): allow passing in-memory date to create_table
- feat(mysql): allow passing in-memory data to create_table
- feat(mssql): allow passing in-memory data to create_table
- feat(exasol): allow passing in-memory data to create_table
- feat(risingwave): allow passing in-memory data to create_table
- feat(sqlite): allow passing in-memory data to create_table
- feat(clickhouse): enable passing in-memory data to create_table
- feat(oracle): enable passing in-memory data to create_table
- feat(snowflake): allow passing in-memory data to create_table
- feat(pyspark): enable passing in-memory data to create_table
- feat(pandas,dask): allow passing in-memory data to create_table

---------

Signed-off-by: Gil Forsyth <gil@forsyth.dev>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
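
The default path described in the #9251 summary above (stage the in-memory data in an intermediate table, create the user-named table from it, then drop the intermediate) can be sketched roughly as follows. This is a toy illustration using the standard-library `sqlite3` module, not Ibis code: `create_table_from_memory`, `list_tables`, and the `_tmp_memtable_` prefix are hypothetical names chosen for this example, and a SQLite temp table stands in for a `memtable`.

```python
import sqlite3
import uuid


def create_table_from_memory(con, name, rows, columns, overwrite=False):
    """Toy version of the 'intermediate table, CTAS, drop' flow."""
    # Hypothetical naming scheme for the intermediate table.
    tmp = f"_tmp_memtable_{uuid.uuid4().hex}"
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)

    # 1. Stage the in-memory rows in a temporary table (stand-in for a memtable).
    con.execute(f'CREATE TEMP TABLE "{tmp}" ({col_list})')
    con.executemany(f'INSERT INTO "{tmp}" VALUES ({placeholders})', rows)
    try:
        # 2. Create the user-named table from the staged data (CTAS).
        if overwrite:
            con.execute(f'DROP TABLE IF EXISTS "{name}"')
        con.execute(f'CREATE TABLE "{name}" AS SELECT * FROM "{tmp}"')
    finally:
        # 3. Drop the intermediate table so only the named table remains.
        con.execute(f'DROP TABLE "{tmp}"')
    con.commit()


def list_tables(con):
    """Return the names of all regular and temporary tables, sorted."""
    query = "SELECT name FROM {} WHERE type = 'table'"
    main = con.execute(query.format("sqlite_master")).fetchall()
    temp = con.execute(query.format("sqlite_temp_master")).fetchall()
    return sorted(row[0] for row in main + temp)
```

Because the named table is always created with an ordinary CTAS statement, options such as `overwrite` are handled in one place regardless of what kind of in-memory object was staged, which is the benefit the summary attributes to going through a `memtable`.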

ncclementi force-pushed the deprecate-register-api branch from 81f2f50 to df9b8ad Compare April 2, 2024 19:07

ncclementi changed the title ~~chore: deprecate register api [WIP]~~ refactor: deprecate register api Apr 4, 2024

ncclementi force-pushed the deprecate-register-api branch from 68a4d5b to ce51755 Compare April 4, 2024 17:30

ncclementi requested a review from gforsyth April 4, 2024 18:15

gforsyth requested changes Apr 7, 2024

ncclementi force-pushed the deprecate-register-api branch 2 times, most recently from 7097b5d to 5b02581 Compare April 10, 2024 20:06

ncclementi mentioned this pull request Apr 26, 2024

[meta] Ibis expression API stability #8996

Closed

10 tasks

ncclementi mentioned this pull request May 6, 2024

[meta] Ibis backend public API stability #8994

Closed

ncclementi force-pushed the deprecate-register-api branch 2 times, most recently from 5200c97 to cb920e2 Compare May 21, 2024 15:54

ncclementi force-pushed the deprecate-register-api branch from cb920e2 to c99c3d9 Compare May 23, 2024 17:29

gforsyth mentioned this pull request May 24, 2024

feat(all): enable passing in-memory data to create_table #9251

Merged

3 tasks

ncclementi added 2 commits May 28, 2024 10:33

chore: fix imports

3102cd6

chore: bump deprecation version to 9.1

0608259

ncclementi force-pushed the deprecate-register-api branch from c99c3d9 to 0608259 Compare May 28, 2024 14:49

cpcloud approved these changes May 29, 2024

cpcloud merged commit 7a39bd3 into ibis-project:main May 29, 2024
74 checks passed

ncclementi mentioned this pull request Sep 19, 2024

refactor(api): remove register API #10175

Closed
