More robust SQL query generation #158
Comments
Thanks a lot for the research. I can work on this immediately. I agree this is a more robust approach than just removing. @edublancas The write parameter will be dynamic, based on the current database connection. We can already get the current dialect info (we did this earlier for the telemetry database version collection), so it's doable.
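A minimal sketch of how the write value could be derived from the connection's dialect name. The helper names (`sqlglot_write_dialect`, `transpile_for`) and the mapping table are hypothetical, not existing JupySQL code. The mapping covers only names known to differ (SQLAlchemy reports `postgresql` and `mssql`, while sqlglot uses `postgres` and `tsql`); treating `mariadb` as `mysql` is an assumption.

```python
# Hypothetical mapping from SQLAlchemy dialect names to the names sqlglot
# accepts for its `read`/`write` arguments. Names not listed are passed
# through unchanged (e.g. "mysql", "sqlite", "duckdb").
SQLALCHEMY_TO_SQLGLOT = {
    "postgresql": "postgres",
    "mssql": "tsql",
    "mariadb": "mysql",  # assumption: treat MariaDB like MySQL
}


def sqlglot_write_dialect(sqlalchemy_name):
    """Return the sqlglot dialect name for a SQLAlchemy dialect name."""
    name = sqlalchemy_name.lower()
    return SQLALCHEMY_TO_SQLGLOT.get(name, name)


def transpile_for(sql, sqlalchemy_name, read="duckdb"):
    """Transpile `sql` (written in the `read` dialect) for the target database."""
    import sqlglot  # third-party; imported lazily so the mapping works without it

    write = sqlglot_write_dialect(sqlalchemy_name)
    # sqlglot.transpile returns a list with one string per statement
    return sqlglot.transpile(sql, read=read, write=write)
```

With a SQLAlchemy connection, the name would come from something like `engine.dialect.name`, so `transpile_for(query, engine.dialect.name)` would pick the target dialect dynamically.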
Ref: transpile
I think we can use DuckDB since we can test that one easily. But we should keep an eye on the functions we use: if we use functions that are only available in DuckDB, the transpiled queries will fail, because sqlglot does not check that the functions exist in the target dialect. Down the road, we can decide whether to keep using DuckDB or switch to another dialect. I think another good option would be PostgreSQL, since many databases attempt to be Postgres-compatible.
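Since sqlglot will not catch missing functions, one option is to run each generated query against the real target engine in tests. Below is a rough sketch for SQLite only, using the standard library; the helper name `runs_on_sqlite` is hypothetical, and MySQL or PostgreSQL would need a live server and their own drivers.

```python
import sqlite3


def runs_on_sqlite(query, setup_sql=""):
    """Ask an in-memory SQLite database to compile `query`.

    Returns (True, None) if SQLite accepts the statement, otherwise
    (False, error_message). EXPLAIN compiles the statement without
    running it, which is enough to surface unknown functions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        if setup_sql:
            conn.executescript(setup_sql)  # create the tables the query needs
        conn.execute("EXPLAIN " + query)
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


setup = 'CREATE TABLE "my table" ("my col" INTEGER);'
ok, err = runs_on_sqlite('SELECT "my col" FROM "my table"', setup)
bad_ok, bad_err = runs_on_sqlite(
    'SELECT not_a_real_function("my col") FROM "my table"', setup
)
```

Here the second query stands in for a transpiled query that still contains a function the target database does not provide.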
OK let's start with
➤ Tony Kuo commented: We have some blocking issues in sqlglot and need to wait for them to be resolved before we can unblock this issue. Moving this to blocked for now.
Currently, we have a few SQL templates that we use to generate queries (examples: here, and here). All of these templates use double quotes to wrap identifiers such as table or column names (I added this to support identifiers with spaces). However, this isn't compatible with MySQL (and possibly other databases as well).
We've had reports (CTEs and plotting) where JupySQL fails on MySQL because the default configuration uses backticks and breaks with double quotes.
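To illustrate the difference, here is a small sketch of dialect-aware identifier quoting. The function `quote_identifier` is hypothetical and not part of JupySQL; it assumes MySQL's default configuration (backticks) and standard double quotes everywhere else.

```python
def quote_identifier(name, dialect):
    """Wrap an identifier in the default quote character for `dialect`.

    Embedded quote characters are escaped by doubling them.
    """
    if dialect.lower() in {"mysql", "mariadb"}:
        # MySQL's default configuration quotes identifiers with backticks
        return "`" + name.replace("`", "``") + "`"
    # Standard SQL (PostgreSQL, SQLite, DuckDB, ...) uses double quotes
    return '"' + name.replace('"', '""') + '"'


mysql_query = "SELECT {} FROM {}".format(
    quote_identifier("my col", "mysql"), quote_identifier("my table", "mysql")
)
duckdb_query = "SELECT {} FROM {}".format(
    quote_identifier("my col", "duckdb"), quote_identifier("my table", "duckdb")
)
```

A hand-rolled helper like this only covers quoting; it does not translate functions or syntax between dialects.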
solution: sqlglot
I did a quick comparison and determined that sqlglot is the best solution: https://github.com/ploomber/contributing/blob/main/notes/sqlalchemy-sqlglot.ipynb
requirements:
- percentile_disc use case (namely a SQL query that has percentile_disc([0.25, 0.50]), which is valid in DuckDB): let's validate that the output from sqlglot is valid in MySQL, PostgreSQL, and SQLite
- write parameter: sqlalchemy's dialect string might be different from the parameter that sqlglot expects
comments
Note that this is a configurable parameter, and users can configure MySQL (or other databases) to use double quotes and perhaps other characters. So even if we generate SQL statements with the default character, it might fail. For now, we can add a section in our docs explaining this issue (that JupySQL will generate SQL statements with the default delimiter).