--persist-replace argument #440

Merged

Changes from 30 commits
39 commits
b8497fd
Add: init
tonykploomber Apr 24, 2023
b94064e
Add: doc
tonykploomber Apr 24, 2023
11460bb
fix: test_command
tonykploomber Apr 24, 2023
f53e7a2
fix: lint
tonykploomber Apr 24, 2023
e56996e
Fix: comments
tonykploomber Apr 25, 2023
2914f5d
Fix: comments
tonykploomber Apr 25, 2023
f38618d
Fix: doc
tonykploomber Apr 25, 2023
778b4c7
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber Apr 25, 2023
53faac7
fix: error
tonykploomber Apr 25, 2023
5e00266
fix: lint
tonykploomber Apr 25, 2023
fc63265
Resolve conflict
tonykploomber Apr 26, 2023
db86e06
fix test case
tonykploomber Apr 27, 2023
f8d0981
fix: comment
tonykploomber Apr 27, 2023
831304f
Remove: empty cell
tonykploomber Apr 27, 2023
0f92d52
fix comment
tonykploomber Apr 28, 2023
9a2f476
fix: comment
tonykploomber Apr 28, 2023
b99cf09
fix: --persist and --persist-replace used together
tonykploomber Apr 28, 2023
483bc5a
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber Apr 28, 2023
e61f037
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber Apr 29, 2023
d56695d
fix Table already exists
tonykploomber May 6, 2023
7e0ea1a
fix overridden
tonykploomber May 6, 2023
5ed823f
fix: You cannot simultaneously replace
tonykploomber May 6, 2023
be7a9f7
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber May 6, 2023
09fbd34
fix lint
tonykploomber May 6, 2023
13a3a5b
fix: persist and args.persist_replace
tonykploomber May 15, 2023
2d4a54c
fix: test cases
tonykploomber May 16, 2023
f1b9ce5
address comments
tonykploomber May 17, 2023
264f414
fix: lint
tonykploomber May 17, 2023
bf644bf
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber May 22, 2023
0341738
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber May 22, 2023
f0a78a4
add changelog
tonykploomber May 24, 2023
01b3fe8
update already exists message
tonykploomber May 24, 2023
0b2742c
fix to usage error
tonykploomber May 24, 2023
4909a44
fix You cannot simultaneously persist
tonykploomber May 24, 2023
b433319
print -> warnings.warn
tonykploomber May 26, 2023
4fa9e22
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber May 26, 2023
e737c9d
rebuild
tonykploomber May 26, 2023
fe6ade0
update testing with ploomber-core
tonykploomber May 26, 2023
d336210
Merge remote-tracking branch 'upstream/master' into 154-proposal-add-…
tonykploomber May 27, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
 * [Feature] Better error messages when function used in plotting API unsupported by DB driver (#159)
 * [Fix] Fix the default value of %config SqlMagic.displaylimit to 10 (#462)
 * [Feature] Detailed error messages when syntax error in SQL query, postgres connection password missing or inaccessible, invalid DuckDB connection string (#229)
+* [Feature] Adds `--persist-replace` argument (#440)
 
 
 ## 0.7.4 (2023-04-28)
30 changes: 27 additions & 3 deletions doc/integrations/pandas.md
@@ -1,8 +1,8 @@
 ---
 jupytext:
-  notebook_metadata_filter: myst
   cell_metadata_filter: -all
   formats: md:myst
+  notebook_metadata_filter: myst
   text_representation:
     extension: .md
     format_name: myst
@@ -14,7 +14,8 @@ kernelspec:
   name: python3
 myst:
   html_meta:
-    description lang=en: Convert outputs from SQL queries to pandas data frames using JupySQL
+    description lang=en: Convert outputs from SQL queries to pandas data frames using
+      JupySQL
     keywords: jupyter, sql, jupysql, pandas
     property=og:locale: en_US
 ---
@@ -86,7 +87,9 @@ df
 
 +++
 
-The `--persist` argument, with the name of a DataFrame object in memory,
+### `--persist`
+
+The `--persist` argument, with the name of a DataFrame object in memory,
 will create a table name in the database from the named DataFrame. Or use `--append` to add rows to an existing table by that name.
 
 ```{code-cell} ipython3
@@ -97,6 +100,27 @@ will create a table name in the database from the named DataFrame. Or use `--a
 %sql SELECT * FROM df;
 ```
 
+### `--persist-replace`
+
+The `--persist-replace` argument works like `--persist`,
+but it drops the existing table before writing the new one.
+
+#### Declare the dataframe again
+
+```{code-cell} ipython3
+df = %sql SELECT * FROM writer LIMIT 1
+df
+```
+
+#### Use `--persist-replace`
+
+```{code-cell} ipython3
+%sql --persist-replace df
+```
+
+#### The `df` table is overwritten
+
+```{code-cell} ipython3
 %sql SELECT * FROM df;
 ```
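Outside a notebook, the behavior documented above can be reproduced with pandas alone. This sketch assumes pandas is installed and uses an in-memory `sqlite3` connection in place of the `%sql` connection; following the `_persist_dataframe` changes in this PR, it treats `--persist` as `to_sql(..., if_exists="fail")` and `--persist-replace` as `to_sql(..., if_exists="replace")`. The column values are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the %sql connection (illustrative)
conn = sqlite3.connect(":memory:")

# First write, like `%sql --persist df`: if_exists="fail" creates the table
df = pd.DataFrame({"first_name": ["William"], "last_name": ["Shakespeare"]})
df.to_sql("df", conn, if_exists="fail", index=True)

# Rebind df to new data, then try the same "fail" write again
df = pd.DataFrame({"first_name": ["Bertold"], "last_name": ["Brecht"]})
try:
    df.to_sql("df", conn, if_exists="fail", index=True)
    second_persist_failed = False
except ValueError:
    # pandas refuses to touch an existing table when if_exists="fail"
    second_persist_failed = True

# Like `%sql --persist-replace df`: drop the old table and recreate it from df
df.to_sql("df", conn, if_exists="replace", index=True)

rows = conn.execute("SELECT * FROM df").fetchall()
print(second_persist_failed, rows)
```

After the replace step the `df` table holds only the second row (with the DataFrame index stored as the first column), mirroring the final `SELECT * FROM df` cell in the notebook.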

50 changes: 44 additions & 6 deletions src/sql/magic.py
@@ -202,6 +202,12 @@ def _mutex_autopandas_autopolars(self, change):
         action="store_true",
         help="create a table name in the database from the named DataFrame",
     )
+    @argument(
+        "-P",
+        "--persist-replace",
+        action="store_true",
+        help="replace the DataFrame if it exists, otherwise perform --persist",
+    )
     @argument(
         "-n",
         "--no-index",
@@ -367,11 +373,34 @@ def interactive_execute_wrapper(**kwargs):
             alias=args.alias,
         )
         payload["connection_info"] = conn._get_curr_sqlalchemy_connection_info()
-        if args.persist:
+        if args.persist_replace and args.append:
+            raise exceptions.ValueError(
+                """You cannot simultaneously replace and append data to a dataframe;
+                  please choose to utilize either one or the other."""
+            )
+        if args.persist and args.persist_replace:
+            print("Please use either --persist or --persist-replace")
+            return self._persist_dataframe(
+                command.sql,
+                conn,
+                user_ns,
+                append=False,
+                index=not args.no_index,
+                replace=True,
+            )
+        elif args.persist:
             return self._persist_dataframe(
                 command.sql, conn, user_ns, append=False, index=not args.no_index
             )
-
+        elif args.persist_replace:
+            return self._persist_dataframe(
+                command.sql,
+                conn,
+                user_ns,
+                append=False,
+                index=not args.no_index,
+                replace=True,
+            )
         if args.append:
             return self._persist_dataframe(
                 command.sql, conn, user_ns, append=True, index=not args.no_index
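The branching above reduces to a small decision table. Below is a hypothetical, side-effect-free restatement as a function, `resolve_if_exists` (not part of JupySQL), returning the `if_exists` value that `_persist_dataframe` ends up passing to pandas. It follows the precedence in this diff: `--persist-replace` with `--append` is an error, `--persist` with `--persist-replace` emits a message and then replaces, and plain `--persist` still takes priority over `--append`. The sketch uses `warnings.warn` rather than `print`, in line with the later "print -> warnings.warn" commit in the list above, and builds the error text by implicit string concatenation so it carries no embedded newline or indentation.

```python
import warnings


def resolve_if_exists(persist=False, persist_replace=False, append=False):
    """Return the pandas ``to_sql`` ``if_exists`` value implied by the flags.

    Hypothetical helper mirroring the branches in this diff; returns None
    when no persistence flag is set.
    """
    if persist_replace and append:
        raise ValueError(
            "You cannot simultaneously replace and append data to a dataframe; "
            "please choose to utilize either one or the other."
        )
    if persist and persist_replace:
        # Both flags given: tell the user, then behave like --persist-replace
        warnings.warn("Please use either --persist or --persist-replace")
        return "replace"
    if persist:
        return "fail"  # create the table; error out if it already exists
    if persist_replace:
        return "replace"  # drop any existing table, then create it
    if append:
        return "append"  # add rows to the existing table
    return None


print(resolve_if_exists(persist_replace=True))
```

Keeping the flag logic in a pure function like this would let each combination be unit-tested without running a cell through IPython.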
@@ -447,7 +476,9 @@ def interactive_execute_wrapper(**kwargs):
     legal_sql_identifier = re.compile(r"^[A-Za-z0-9#_$]+")
 
     @modify_exceptions
-    def _persist_dataframe(self, raw, conn, user_ns, append=False, index=True):
+    def _persist_dataframe(
+        self, raw, conn, user_ns, append=False, index=True, replace=False
+    ):
         """Implements PERSIST, which writes a DataFrame to the RDBMS"""
         if not DataFrame:
             raise exceptions.MissingPackageError(
@@ -486,14 +517,21 @@ def _persist_dataframe(self, raw, conn, user_ns, append=False, index=True):
         table_name = frame_name.lower()
         table_name = self.legal_sql_identifier.search(table_name).group(0)
 
-        if_exists = "append" if append else "fail"
+        if replace:
+            if_exists = "replace"
+        elif append:
+            if_exists = "append"
+        else:
+            if_exists = "fail"
 
         try:
             frame.to_sql(
                 table_name, conn.session.engine, if_exists=if_exists, index=index
             )
-        except ValueError as e:
-            raise exceptions.ValueError(e) from e
+        except ValueError:
+            raise exceptions.ValueError(
+                "Table already exists; consider using --persist-replace."
+            )
 
         return "Persisted %s" % table_name
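The new `except ValueError:` clause reports every `ValueError` raised by `to_sql` as "Table already exists" and no longer chains the original exception with `from e`, as the removed lines did. A variant that keeps pandas' own error attached as `__cause__` is sketched below. `write_frame` is a hypothetical name, the sketch assumes pandas is installed, and it only adds the `--persist-replace` hint when `if_exists` is `"fail"`, the mode in which pandas rejects an existing table.

```python
import sqlite3

import pandas as pd


def write_frame(frame, table_name, conn, if_exists="fail", index=True):
    """Hypothetical wrapper around DataFrame.to_sql with a friendlier error.

    ``raise ... from err`` keeps pandas' own ValueError as ``__cause__`` so the
    original message survives in the traceback.
    """
    try:
        frame.to_sql(table_name, conn, if_exists=if_exists, index=index)
    except ValueError as err:
        # Only the "fail" mode rejects an existing table, so only hint there
        hint = " Consider using --persist-replace." if if_exists == "fail" else ""
        raise ValueError(f"{err}{hint}") from err
    return f"Persisted {table_name}"


conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"a": [1]})

first = write_frame(df, "df", conn)  # creates table "df"
try:
    write_frame(df, "df", conn)  # same table again with if_exists="fail"
    caught = None
except ValueError as exc:
    caught = exc

print(first)
print(caught)
print(repr(caught.__cause__))
```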

1 change: 1 addition & 0 deletions src/tests/test_command.py
@@ -132,6 +132,7 @@ def test_args(ip, sql_magic):
         "creator": None,
         "section": None,
         "persist": False,
+        "persist_replace": False,
         "no_index": False,
         "append": False,
         "connection_arguments": None,
209 changes: 209 additions & 0 deletions src/tests/test_magic.py
@@ -201,6 +201,215 @@ def test_persist_bare(ip):
     assert result.error_in_exec
 
 
+def get_table_rows_as_dataframe(ip, table, name=None):
+    """Store the first row of ``table`` as a pandas DataFrame in the IPython
+    namespace and return its variable name (``name``, else ``df_<table>``)"""
+    if name:
+        saved_df_name = name
+    else:
+        saved_df_name = f"df_{table}"
+    ip.run_cell(f"results = %sql SELECT * FROM {table} LIMIT 1;")
+    ip.run_cell(f"{saved_df_name} = results.DataFrame()")
+    return saved_df_name
+
+
+@pytest.mark.parametrize(
+    "test_table, expected_result",
+    [
+        ("test", [(0, 1, "foo")]),
+        ("author", [(0, "William", "Shakespeare", 1616)]),
+        (
+            "website",
+            [
+                (
+                    0,
+                    "Bertold Brecht",
+                    "https://en.wikipedia.org/wiki/Bertolt_Brecht",
+                    1954,
+                )
+            ],
+        ),
+        ("number_table", [(0, 4, -2)]),
+    ],
+)
+def test_persist_replace_abbr_no_override(ip, test_table, expected_result):
+    saved_df_name = get_table_rows_as_dataframe(ip, table=test_table)
+    ip.run_cell(f"%sql -P sqlite:// {saved_df_name}")
+    out = ip.run_cell(f"%sql SELECT * FROM {saved_df_name}")
+    assert out.result == expected_result
+    assert out.error_in_exec is None
+
+
+@pytest.mark.parametrize(
+    "test_table, expected_result",
+    [
+        ("test", [(0, 1, "foo")]),
+        ("author", [(0, "William", "Shakespeare", 1616)]),
+        (
+            "website",
+            [
+                (
+                    0,
+                    "Bertold Brecht",
+                    "https://en.wikipedia.org/wiki/Bertolt_Brecht",
+                    1954,
+                )
+            ],
+        ),
+        ("number_table", [(0, 4, -2)]),
+    ],
+)
+def test_persist_replace_no_override(ip, test_table, expected_result):
+    saved_df_name = get_table_rows_as_dataframe(ip, table=test_table)
+    ip.run_cell(f"%sql --persist-replace sqlite:// {saved_df_name}")
+    out = ip.run_cell(f"%sql SELECT * FROM {saved_df_name}")
+    assert out.result == expected_result
+    assert out.error_in_exec is None
+
+
+@pytest.mark.parametrize(
+    "first_test_table, second_test_table, expected_result",
+    [
+        ("test", "author", [(0, "William", "Shakespeare", 1616)]),
+        ("author", "test", [(0, 1, "foo")]),
+        ("test", "number_table", [(0, 4, -2)]),
+        ("number_table", "test", [(0, 1, "foo")]),
+    ],
+)
+def test_persist_replace_override(
+    ip, first_test_table, second_test_table, expected_result
+):
+    saved_df_name = "dummy_df_name"
+    table_df = get_table_rows_as_dataframe(
+        ip, table=first_test_table, name=saved_df_name
+    )
+    ip.run_cell(f"%sql --persist sqlite:// {table_df}")
+    table_df = get_table_rows_as_dataframe(
+        ip, table=second_test_table, name=saved_df_name
+    )
+    # Check that the second call, with --persist-replace, succeeds
+    persist_replace_out = ip.run_cell(f"%sql --persist-replace sqlite:// {table_df}")
+    assert persist_replace_out.error_in_exec is None
+
+    # Check that the stored rows now come from --persist-replace (second table)
+    out = ip.run_cell(f"%sql SELECT * FROM {table_df}")
+    assert out.result == expected_result
+    assert out.error_in_exec is None
+
+
+@pytest.mark.parametrize(
+    "first_test_table, second_test_table, expected_result",
+    [
+        ("test", "author", [(0, 1, "foo")]),
+        ("author", "test", [(0, "William", "Shakespeare", 1616)]),
+        ("test", "number_table", [(0, 1, "foo")]),
+        ("number_table", "test", [(0, 4, -2)]),
+    ],
+)
+def test_persist_replace_override_reverted_order(
+    ip, first_test_table, second_test_table, expected_result
+):
+    saved_df_name = "dummy_df_name"
+    table_df = get_table_rows_as_dataframe(
+        ip, table=first_test_table, name=saved_df_name
+    )
+    ip.run_cell(f"%sql --persist-replace sqlite:// {table_df}")
+    table_df = get_table_rows_as_dataframe(
+        ip, table=second_test_table, name=saved_df_name
+    )
+    persist_out = ip.run_cell(f"%sql --persist sqlite:// {table_df}")
+
+    # Check that the second call, with plain --persist, fails
+    assert "Table already exists; consider using --persist-replace." in str(
+        persist_out.error_in_exec
+    )
+
+    out = ip.run_cell(f"%sql SELECT * FROM {table_df}")
+    # Check that the stored rows still come from the first --persist-replace
+    assert out.result == expected_result
+    assert out.error_in_exec is None
+
+
+@pytest.mark.parametrize(
+    "test_table", [("test"), ("author"), ("website"), ("number_table")]
+)
+def test_persist_and_append_use_together(ip, test_table):
+    # Check the error raised when --persist-replace and --append are combined
+    saved_df_name = get_table_rows_as_dataframe(ip, table=test_table)
+    out = ip.run_cell(f"%sql --persist-replace --append sqlite:// {saved_df_name}")
+
+    assert """You cannot simultaneously replace and append data to a dataframe;
+                  please choose to utilize either one or the other.""" in str(
+        out.error_in_exec
+    )
+    assert (out.error_in_exec.error_type) == "ValueError"
+
+
+@pytest.mark.parametrize(
+    "test_table, expected_result",
+    [
+        ("test", [(0, 1, "foo")]),
+        ("author", [(0, "William", "Shakespeare", 1616)]),
+        (
+            "website",
+            [
+                (
+                    0,
+                    "Bertold Brecht",
+                    "https://en.wikipedia.org/wiki/Bertolt_Brecht",
+                    1954,
+                )
+            ],
+        ),
+        ("number_table", [(0, 4, -2)]),
+    ],
+)
+def test_persist_and_persist_replace_use_together(
+    ip, capsys, test_table, expected_result
+):
+    # Check the message printed when --persist and --persist-replace are combined
+    saved_df_name = get_table_rows_as_dataframe(ip, table=test_table)
+    ip.run_cell(f"%sql --persist --persist-replace sqlite:// {saved_df_name}")
+    persist_replace_out, _ = capsys.readouterr()
+    execute_out = ip.run_cell(f"%sql SELECT * FROM {saved_df_name}")
+
+    # Test the printed message
+    assert "Please use either --persist or --persist-replace" in persist_replace_out
+    # Test that --persist-replace took effect
+    assert execute_out.result == expected_result
+    assert execute_out.error_in_exec is None
+
+
+@pytest.mark.parametrize(
+    "first_test_table, second_test_table, expected_result",
+    [
+        ("test", "author", [(0, "William", "Shakespeare", 1616)]),
+        ("author", "test", [(0, 1, "foo")]),
+        ("test", "number_table", [(0, 4, -2)]),
+        ("number_table", "test", [(0, 1, "foo")]),
+    ],
+)
+def test_persist_replace_twice(
+    ip, first_test_table, second_test_table, expected_result
+):
+    saved_df_name = "dummy_df_name"
+
+    table_df = get_table_rows_as_dataframe(
+        ip, table=first_test_table, name=saved_df_name
+    )
+    ip.run_cell(f"%sql --persist-replace sqlite:// {table_df}")
+
+    table_df = get_table_rows_as_dataframe(
+        ip, table=second_test_table, name=saved_df_name
+    )
+    ip.run_cell(f"%sql --persist-replace sqlite:// {table_df}")
+
+    out = ip.run_cell(f"%sql SELECT * FROM {table_df}")
+    # Check that the stored rows come from the second --persist-replace
+    assert out.result == expected_result
+    assert out.error_in_exec is None
+
+
 def test_connection_args_enforce_json(ip):
     result = ip.run_cell('%sql --connection_arguments {"badlyformed":true')
     assert result.error_in_exec
1 change: 1 addition & 0 deletions src/tests/test_parse.py
@@ -185,6 +185,7 @@ def complete_with_defaults(mapping):
         "creator": None,
         "section": None,
         "persist": False,
+        "persist_replace": False,
         "no_index": False,
         "append": False,
         "connection_arguments": None,
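The `"persist_replace": False` defaults added to `test_command.py` and `test_parse.py` follow from ordinary argparse rules: `-P` and `--persist-replace` name the same `store_true` flag, and the attribute name is derived from the long option by turning dashes into underscores. The sketch below uses a plain `argparse.ArgumentParser` and a hypothetical positional `line` argument instead of IPython's `@argument` decorators (which build on argparse), so it only illustrates those shared parsing rules, not the magic's full parser.

```python
import argparse

# Plain argparse stand-in for the magic's parser (the real one is built with
# IPython's @argument decorators); only the flags relevant here are declared
parser = argparse.ArgumentParser(prog="sql")
parser.add_argument("--persist", action="store_true")
parser.add_argument("-P", "--persist-replace", action="store_true")
parser.add_argument("--append", action="store_true")
parser.add_argument("line", nargs="*")  # hypothetical: remaining tokens

defaults = parser.parse_args([])
short_form = parser.parse_args(["-P", "df"])
long_form = parser.parse_args(["--persist-replace", "df"])

# The dest comes from the long option: "--persist-replace" -> persist_replace
print(defaults.persist_replace, short_form.persist_replace, long_form.line)
```

Because `store_true` defaults to `False`, any test that compares the full parsed namespace against a dictionary needs the new key, which is exactly what the two one-line test changes add.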