Adding --with for CTE #705

Merged 2 commits on Jul 11, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@
* [Fix] Refactored `ResultSet` to lazy loading (#470)
* [Fix] Removed `WITH` when a snippet does not have a dependency (#657)
* [Fix] Used display module when generating CTE (#649)
* [Fix] Added `--with` back because of issues with sqlglot query parser (#684)

## 0.7.9 (2023-06-19)

48 changes: 47 additions & 1 deletion doc/compose.md
@@ -27,7 +27,7 @@ pip install jupysql matplotlib
```


```{versionchanged} 0.7.8
```{versionchanged} 0.7.10
```

```{note}
@@ -158,6 +158,52 @@ We can verify the retrieved query returns the same result:
{{final}}
```

#### `--with` argument

JupySQL also lets you name the snippets a query depends on explicitly by passing the `--with` argument. This is useful when the parsing logic cannot determine the table name because of dialect variations. Consider the following example:

```{code-cell} ipython3
%sql duckdb://
```

```{code-cell} ipython3
%%sql --save first_cte --no-execute
SELECT 1 AS column1, 2 AS column2
```

```{code-cell} ipython3
%%sql --save second_cte --no-execute
SELECT
sum(column1),
sum(column2) FILTER (column2 = 2)
FROM first_cte
```

```{code-cell} ipython3
:tags: [raises-exception]

%%sql
SELECT * FROM second_cte
```

The query fails because the clause `FILTER (column2 = 2)` prevents the parser from extracting the table name. Some dialects, such as `DuckDB`, accept this syntax, but the more common form includes the `WHERE` keyword: `FILTER (WHERE column2 = 2)`.

Now let's run the same query, this time passing the `--with` argument.

```{code-cell} ipython3
%%sql --with first_cte --save second_cte --no-execute
SELECT
sum(column1),
sum(column2) FILTER (column2 = 2)
FROM first_cte
```

```{code-cell} ipython3
%%sql
SELECT * FROM second_cte
```
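
To make the effect of `--with` more concrete, here is a minimal sketch of how a named snippet could be prepended to a query as a CTE. This is not JupySQL's implementation: `compose_with_cte` and the `snippets` dictionary are hypothetical names used only for illustration, and the real rendering logic may differ.

```python
def compose_with_cte(query, with_, snippets):
    """Prepend the named snippets to `query` as common table expressions.

    Hypothetical helper for illustration only; it is not part of JupySQL.
    """
    if not with_:
        # Nothing to prepend: return the query unchanged
        return query
    ctes = ", ".join(f"{name} AS ({snippets[name]})" for name in with_)
    return f"WITH {ctes} {query}"


# Assumed stand-in for the saved snippets
snippets = {"first_cte": "SELECT 1 AS column1, 2 AS column2"}
query = "SELECT sum(column1), sum(column2) FILTER (column2 = 2) FROM first_cte"

final = compose_with_cte(query, ["first_cte"], snippets)
print(final)
```

Because the snippet name is supplied by the caller, this approach never has to parse the query body, which is why an explicit `--with` sidesteps dialect-specific syntax.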


## Summary

In this example, we used JupySQL to manage a large SQL query in a Jupyter notebook. We broke a complex query into smaller, organized parts, which simplified the analysis of a record store's sales database. With JupySQL, users can maintain and reuse their queries more easily.
7 changes: 7 additions & 0 deletions doc/plot.md
@@ -131,6 +131,13 @@ We can see the highest value is a bit over 6, that's expected since we set a 6.3

+++

To specify the saved snippet explicitly, use the `--with` argument.
See [the compose guide](../compose) for details on when to pass `--with` explicitly.

```{code-cell} ipython3
%sqlplot boxplot --table short_trips --column trip_distance --with short_trips
```

## Histogram

To create a histogram, call `%sqlplot histogram`, and pass the name of the table, the column you want to plot, and the number of bins. As in the [Boxplot](#boxplot) example, JupySQL detects the saved snippet and plots only that subset of the data.
2 changes: 0 additions & 2 deletions src/sql/__init__.py
@@ -1,5 +1,4 @@
from .magic import RenderMagic, SqlMagic, load_ipython_extension
from .error_message import SYNTAX_ERROR
from .connection import PLOOMBER_DOCS_LINK_STR

__version__ = "0.7.10dev"
@@ -9,6 +8,5 @@
"RenderMagic",
"SqlMagic",
"load_ipython_extension",
"SYNTAX_ERROR",
"PLOOMBER_DOCS_LINK_STR",
]
4 changes: 4 additions & 0 deletions src/sql/command.py
@@ -71,6 +71,10 @@ def __init__(self, magic, user_ns, line, cell) -> None:
if add_alias:
self.parsed["connection"] = self.args.line[0]

if self.args.with_:
final = store.render(self.parsed["sql"], with_=self.args.with_)
self.parsed["sql"] = str(final)

@property
def sql(self):
"""
85 changes: 27 additions & 58 deletions src/sql/error_message.py
@@ -1,70 +1,39 @@
import sqlglot
import sqlparse

SYNTAX_ERROR = "\nLooks like there is a syntax error in your query."
ORIGINAL_ERROR = "\nOriginal error message from DB driver:\n"
CTE_MSG = (
"If using snippets, you may pass the --with argument explicitly.\n"
"For more details please refer : "
"https://jupysql.ploomber.io/en/latest/compose.html#with-argument"
)


def parse_sqlglot_error(e, q):
def _is_syntax_error(error):
"""
Function to parse the error message from sqlglot

Parameters
----------
e: sqlglot.errors.ParseError, exception
while parsing through sqlglot
q : str, user query

Returns
-------
str
Formatted error message containing description
and positions
Function to detect whether error message from DB driver
is related to syntax error in user query.
"""
err = e.errors
position = ""
for item in err:
position += (
f"Syntax Error in {q}: {item['description']} at "
f"Line {item['line']}, Column {item['col']}\n"
)
msg = "Possible reason: \n" + position if position else ""
return msg
error_lower = error.lower()
return (
"syntax error" in error_lower
or ("catalog error" in error_lower and "does not exist" in error_lower)
or "error in your sql syntax" in error_lower
or "incorrect syntax" in error_lower
or "not found" in error_lower
)


def detail(original_error, query=None):
def detail(original_error):
original_error = str(original_error)
return_msg = SYNTAX_ERROR
if "syntax error" in original_error:
query_list = sqlparse.split(query)
for q in query_list:
try:
q = q.strip()
q = q[:-1] if q.endswith(";") else q
parse = sqlglot.transpile(q)
suggestions = ""
if q.upper() not in [suggestion.upper() for suggestion in parse]:
suggestions += f"Did you mean : {parse}\n"
return_msg = (
return_msg + "Possible reason: \n" + suggestions
if suggestions
else return_msg
)

except sqlglot.errors.ParseError as e:
parse_msg = parse_sqlglot_error(e, q)
return_msg = return_msg + parse_msg if parse_msg else return_msg

return return_msg + "\n" + ORIGINAL_ERROR + original_error + "\n"
if _is_syntax_error(original_error):
return f"{CTE_MSG}\n\n{ORIGINAL_ERROR}{original_error}\n"

if "fe_sendauth: no password supplied" in original_error:
return (
"\nLooks like you have run into some issues. "
"Review our DB connection via URL strings guide: "
"https://jupysql.ploomber.io/en/latest/connecting.html ."
" Using Ubuntu? Check out this guide: "
"https://help.ubuntu.com/community/PostgreSQL#fe_sendauth:_"
"no_password_supplied\n" + ORIGINAL_ERROR + original_error + "\n"
)
# Adjacent string literals are concatenated, so the URL stays intact
# and no stray quote character leaks into the message.
specific_error = (
"\nLooks like you have run into some issues. "
"Review our DB connection via URL strings guide: "
"https://jupysql.ploomber.io/en/latest/connecting.html ."
" Using Ubuntu? Check out this guide: "
"https://help.ubuntu.com/community/PostgreSQL#fe_sendauth:_"
"no_password_supplied\n"
)

return f"{specific_error}\n{ORIGINAL_ERROR}{original_error}\n"

return None
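
A self-contained sketch of how the substring heuristic in `_is_syntax_error` classifies driver messages. The function body mirrors the diff above (renamed `is_syntax_error` here); the sample messages are made up for illustration and are not real driver output.

```python
def is_syntax_error(error):
    """Return True if the driver message looks like a syntax-related error."""
    error_lower = error.lower()
    return (
        "syntax error" in error_lower
        or ("catalog error" in error_lower and "does not exist" in error_lower)
        or "error in your sql syntax" in error_lower
        or "incorrect syntax" in error_lower
        or "not found" in error_lower
    )


# Illustrative messages (assumptions, not captured from real drivers)
samples = [
    'Parser Error: syntax error at or near "FROM"',
    "Catalog Error: Table with name second_cte does not exist!",
    "Column not found in result set",
    "fe_sendauth: no password supplied",
]

results = {msg: is_syntax_error(msg) for msg in samples}
for msg, flagged in results.items():
    print(f"{flagged!s:5} | {msg}")
```

The third sample shows that the `"not found"` check is broad: any message containing that phrase triggers the `--with` hint, whether or not a snippet is involved.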
79 changes: 47 additions & 32 deletions src/sql/magic.py
@@ -31,7 +31,7 @@
from sql.magic_cmd import SqlCmdMagic
from sql._patch import patch_ipython_usage_error
from sql import query_util
from sql.util import get_suggestions_message, show_deprecation_warning
from sql.util import get_suggestions_message, pretty_print
from ploomber_core.dependencies import check_installed

from sql.error_message import detail
@@ -214,6 +214,35 @@ def check_random_arguments(self, line="", cell=""):
"Unrecognized argument(s): {}".format(check_argument)
)

def _error_handling(self, e, query):
detailed_msg = detail(e)
if self.short_errors:
if detailed_msg is not None:
err = exceptions.UsageError(detailed_msg)
raise err
# TODO: move to error_messages.py
# Added here due to circular dependency issue (#545)
elif "no such table" in str(e):
tables = query_util.extract_tables_from_query(query)
for table in tables:
suggestions = get_close_matches(table, list(self._store))
err_message = f"There is no table with name {table!r}."
# with_message = "Alternatively, please specify table
# name using --with argument"
if len(suggestions) > 0:
suggestions_message = get_suggestions_message(suggestions)
raise exceptions.TableNotFoundError(
f"{err_message}{suggestions_message}"
)
display.message(str(e))
else:
display.message(str(e))
else:
if detailed_msg is not None:
display.message(detailed_msg)
e.modify_exception = True
raise e

@no_var_expand
@needs_local_scope
@line_magic("sql")
@@ -364,12 +393,17 @@ def interactive_execute_wrapper(**kwargs):

args = command.args

with_ = self._store.infer_dependencies(command.sql_original, args.save)
if with_:
command.set_sql_with(with_)
display.message(f"Generating CTE with stored snippets: {', '.join(with_)}")
if args.with_:
with_ = args.with_
else:
with_ = None
with_ = self._store.infer_dependencies(command.sql_original, args.save)
if with_:
command.set_sql_with(with_)
display.message(
f"Generating CTE with stored snippets : {pretty_print(with_)}"
)
else:
with_ = None

# Create the interactive slider
if args.interact and not is_interactive_mode:
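
The precedence this hunk appears to introduce (an explicit `--with` wins, otherwise dependencies are inferred from the query) can be sketched in isolation. `resolve_with` and `fake_infer` are hypothetical stand-ins; the actual code calls `self._store.infer_dependencies` and `command.set_sql_with`.

```python
def resolve_with(explicit_with, infer_dependencies):
    """Pick the snippet names to use for CTE generation.

    explicit_with: names passed via --with (may be None or empty)
    infer_dependencies: zero-argument callable returning inferred names
    """
    if explicit_with:
        # Explicit names take priority; inference is skipped entirely
        return list(explicit_with)
    inferred = infer_dependencies()
    return list(inferred) if inferred else None


calls = []


def fake_infer():
    # Hypothetical stand-in for dependency inference; records each call
    calls.append("called")
    return ["first_cte"]


explicit = resolve_with(["first_cte"], fake_infer)  # inference not invoked
inferred = resolve_with(None, fake_infer)           # falls back to inference
nothing = resolve_with(None, lambda: [])            # no dependencies found
print(explicit, inferred, nothing, calls)
```

Skipping inference when names are given explicitly means a query the parser cannot handle is never parsed for dependencies.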
@@ -405,7 +439,7 @@ def interactive_execute_wrapper(**kwargs):
raw_args = raw_args[1:-1]
args.connection_arguments = json.loads(raw_args)
except Exception as e:
print(e)
display.message(str(e))
raise e
else:
args.connection_arguments = {}
@@ -458,8 +492,7 @@ def interactive_execute_wrapper(**kwargs):

if not command.sql:
return
if args.with_:
show_deprecation_warning()

# store the query if needed
if args.save:
if "-" in args.save:
@@ -514,30 +547,12 @@ def interactive_execute_wrapper(**kwargs):
# JA: added DatabaseError for MySQL
except (ProgrammingError, OperationalError, DatabaseError) as e:
# Sqlite apparently return all errors as OperationalError :/
detailed_msg = detail(e, command.sql)
if self.short_errors:
if detailed_msg is not None:
err = exceptions.UsageError(detailed_msg)
raise err
# TODO: move to error_messages.py
# Added here due to circular dependency issue (#545)
elif "no such table" in str(e):
tables = query_util.extract_tables_from_query(command.sql)
for table in tables:
suggestions = get_close_matches(table, list(self._store))
if len(suggestions) > 0:
err_message = f"There is no table with name {table!r}."
suggestions_message = get_suggestions_message(suggestions)
raise exceptions.TableNotFoundError(
f"{err_message}{suggestions_message}"
)
print(e)
else:
print(e)
self._error_handling(e, command.sql)
except Exception as e:
# handle DuckDB exceptions
if "Catalog Error" in str(e):
self._error_handling(e, command.sql)
else:
if detailed_msg is not None:
print(detailed_msg)
e.modify_exception = True
raise e

legal_sql_identifier = re.compile(r"^[A-Za-z0-9#_$]+")
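
The table-name suggestions in `_error_handling` rely on `difflib.get_close_matches` from the standard library. A minimal sketch of that lookup follows; `suggest_snippets` and the `stored` list are hypothetical, standing in for the names held by the snippet store.

```python
from difflib import get_close_matches


def suggest_snippets(table, stored_names):
    """Return stored snippet names that closely resemble `table`."""
    # Uses difflib's defaults: at most 3 matches, similarity cutoff 0.6
    return get_close_matches(table, stored_names)


stored = ["first_cte", "second_cte"]  # assumed saved snippet names

typo = suggest_snippets("first_ct", stored)   # near miss of a saved snippet
unrelated = suggest_snippets("orders", stored)  # nothing similar stored
print(typo, unrelated)
```

When the list of suggestions is empty, the handler in the diff falls through to displaying the original error message.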
19 changes: 9 additions & 10 deletions src/sql/magic_plot.py
@@ -80,14 +80,21 @@ def execute(self, line="", cell="", local_ns=None):
"Example: %sqlplot histogram"
)

if cmd.args.line[0] not in SUPPORTED_PLOTS + ["hist", "box"]:
plot_str = util.pretty_print(SUPPORTED_PLOTS, last_delimiter="or")
raise exceptions.UsageError(
f"Unknown plot {cmd.args.line[0]!r}. Must be any of: " f"{plot_str}"
)

column = util.sanitize_identifier(column)
table = util.sanitize_identifier(cmd.args.table)

if cmd.args.with_:
util.show_deprecation_warning()
if cmd.args.line[0] in {"box", "boxplot"}:
with_ = cmd.args.with_
else:
with_ = self._check_table_exists(table)

if cmd.args.line[0] in {"box", "boxplot"}:
return plot.boxplot(
table=table,
column=column,
@@ -96,7 +103,6 @@ def execute(self, line="", cell="", local_ns=None):
conn=None,
)
elif cmd.args.line[0] in {"hist", "histogram"}:
with_ = self._check_table_exists(table)
return plot.histogram(
table=table,
column=column,
@@ -105,7 +111,6 @@ def execute(self, line="", cell="", local_ns=None):
conn=None,
)
elif cmd.args.line[0] in {"bar"}:
with_ = self._check_table_exists(table)
return plot.bar(
table=table,
column=column,
@@ -115,19 +120,13 @@ def execute(self, line="", cell="", local_ns=None):
conn=None,
)
elif cmd.args.line[0] in {"pie"}:
with_ = self._check_table_exists(table)
return plot.pie(
table=table,
column=column,
with_=with_,
show_num=cmd.args.show_numbers,
conn=None,
)
else:
plot_str = util.pretty_print(SUPPORTED_PLOTS, last_delimiter="or")
raise exceptions.UsageError(
f"Unknown plot {cmd.args.line[0]!r}. Must be any of: " f"{plot_str}"
)

@staticmethod
def _check_table_exists(table):
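
The change to `execute` moves plot-name validation ahead of the table lookup, so an unknown plot type fails before any snippet resolution happens. A rough sketch of that fail-fast pattern follows. The contents of `SUPPORTED_PLOTS`, the `pretty_print` stand-in, and the use of `ValueError` in place of `exceptions.UsageError` are all assumptions made for illustration.

```python
SUPPORTED_PLOTS = ["histogram", "boxplot", "bar", "pie"]  # assumed contents
ALIASES = ["hist", "box"]


def pretty_print(items, last_delimiter="and"):
    """Hypothetical stand-in for util.pretty_print; real formatting may differ."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return f"{', '.join(items[:-1])}, {last_delimiter} {items[-1]}"


def validate_plot_name(name):
    """Raise early if `name` is not a known plot type or alias."""
    if name not in SUPPORTED_PLOTS + ALIASES:
        plot_str = pretty_print(SUPPORTED_PLOTS, last_delimiter="or")
        # ValueError stands in for the project's UsageError
        raise ValueError(f"Unknown plot {name!r}. Must be any of: {plot_str}")


validate_plot_name("box")  # accepted alias, no error

try:
    validate_plot_name("scatter")
except ValueError as e:
    print(e)
```

Validating first also lets the `with_` resolution be written once, instead of being repeated inside each plot branch.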