
Does not find data frame with duckdb #1033

Open
charlax opened this issue Sep 17, 2024 · 5 comments
Comments

@charlax

charlax commented Sep 17, 2024

What happens?

For some reason, JupySQL can't find my DataFrame, even though I used exactly the example from the docs.

To Reproduce

As of writing, I have the latest versions of JupyterLab, JupySQL, and DuckDB. I'm following the docs and getting "Catalog Error: Table with name df does not exist!" with both native and SQLAlchemy connections.


OS:

macOS

JupySQL Version:

0.10.13

Full Name:

Charles-Axel Dein

Affiliation:

Stealth Startup

@charlax changed the title "Does not find data frame with ducked" to "Does not find data frame with duckdb" on Sep 17, 2024
@Alex-Monahan

Alex-Monahan commented Sep 18, 2024

I just encountered the same issue using Google Colab here: https://colab.research.google.com/drive/1eOA2FYHqEfZWLYssbUxdIpSL3PFxWVjk?usp=sharing#scrollTo=uNxSRUVu4YvY

The DuckDB configuration setting to allow DataFrame access is set to true, so that isn't the issue.

I also tried with a native DuckDB connection within JupySQL instead of SQLAlchemy and received a similar error.

However, running DuckDB directly on the DataFrame works without an issue:

```python
import duckdb
conn = duckdb.connect()
# input_df is a pandas DataFrame already defined in the notebook
conn.execute('''SELECT sum(i) as total_i FROM input_df''').df()
```

@Alex-Monahan

Alex-Monahan commented Sep 18, 2024

This may be related to duckdb/duckdb#13896. A fix is coming in DuckDB 1.1.1, tentatively planned for about a week from now.

@Alex-Monahan

My current hypothesis is that this is related to the DuckDB issue above (duckdb/duckdb#13896): maybe JupySQL is trying to access a DataFrame that is no longer in the same frame. I see two paths: either adjust how the code is executed so that the DataFrames are in the same frame, or, once DuckDB 1.1.1 arrives, have JupySQL automatically set the upcoming python_scan_all_frames setting to true.
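To make the frame hypothesis concrete, here is a toy model in plain Python. It is not DuckDB's implementation, and `lookup`, `query`, `library_helper`, and `user_cell` are hypothetical names. `lookup` resolves a table name against the local variables of the frame that called `query`, and only walks further up the call stack when `scan_all_frames` is true, loosely mirroring what the thread expects the python_scan_all_frames setting to do.

```python
import inspect
from types import FrameType
from typing import Any, Optional


def lookup(name: str, scan_all_frames: bool) -> Optional[Any]:
    """Toy replacement scan (hypothetical, not DuckDB's code).

    Starts at the frame that called query() and checks its local variables.
    Only keeps walking outward through caller frames if scan_all_frames is True.
    """
    # currentframe() is lookup itself; f_back is query(); f_back.f_back is query's caller.
    frame: Optional[FrameType] = inspect.currentframe().f_back.f_back
    while frame is not None:
        if name in frame.f_locals:
            return frame.f_locals[name]
        if not scan_all_frames:
            return None
        frame = frame.f_back
    return None


def query(table: str, scan_all_frames: bool = False) -> Optional[Any]:
    """Stands in for a DuckDB call that resolves `table` by variable name."""
    return lookup(table, scan_all_frames)


def library_helper(table: str, scan_all_frames: bool = False) -> Optional[Any]:
    """Stands in for a library layer (e.g. a magic) that calls query() for the user."""
    return query(table, scan_all_frames)


def user_cell():
    df = [1, 2, 3]  # stands in for the user's DataFrame
    direct = query("df")  # df lives in query's immediate caller frame: found
    nested = library_helper("df")  # an extra frame sits in between: not found
    nested_all = library_helper("df", scan_all_frames=True)  # walks up to user_cell: found
    return direct, nested, nested_all


print(user_cell())  # → ([1, 2, 3], None, [1, 2, 3])
```

In this model, calling `query("df")` directly finds the data, while going through the extra `library_helper` frame only finds it with `scan_all_frames=True`. That would match the reported symptom if JupySQL places its own frames between the notebook cell and the DuckDB call, which is an assumption here, not something confirmed in this thread.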

@edublancas

I tried this with duckdb 1.0.0 and it works:

```
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"x":range(10)})

In [3]: df
Out[3]:
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

In [5]: %load_ext sql
The 'toml' package isn't installed. To load settings from pyproject.toml or ~/.jupysql/config, install with: pip install toml

In [6]: %sql duckdb://
Connecting to 'duckdb://'

In [7]: %sql select * from df
Running query in 'duckdb://'
Out[7]:
+---+
| x |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Truncated to displaylimit of 10.
```

My recommendation for now is to downgrade:

```
pip install 'duckdb<1.1'
```

Once the DuckDB fix is in, we'll update JupySQL.
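If you follow the downgrade advice, it can help to confirm which DuckDB version is actually installed in the notebook's environment. The following is a minimal, standard-library-only sketch; `parse_major_minor` and `satisfies_downgrade_pin` are hypothetical helper names, and the check compares only major.minor against the `duckdb<1.1` pin suggested above.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string like '1.1.0' or '1.0.0.dev5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def satisfies_downgrade_pin(version: str) -> bool:
    """True if `version` is below 1.1, i.e. allowed by the pin 'duckdb<1.1'.

    Only major.minor is compared, which is enough for a '<1.1' pin.
    """
    return parse_major_minor(version) < (1, 1)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version as installed_version

    try:
        installed = installed_version("duckdb")
    except PackageNotFoundError:
        print("duckdb is not installed")
    else:
        status = "matches" if satisfies_downgrade_pin(installed) else "does NOT match"
        print(f"duckdb {installed} {status} the suggested pin duckdb<1.1")
```

Tuple comparison keeps the check numeric, so a hypothetical "1.10.2" is correctly treated as newer than 1.1 rather than compared as text.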
