%sqlplot bar and pie charts #508

Merged
merged 44 commits into from
Jun 1, 2023
Commits
93e0672
Added bar and pie charts
mehtamohit013 May 23, 2023
63598ac
Added tests
mehtamohit013 May 24, 2023
1abbab8
Format
mehtamohit013 May 24, 2023
1d56891
Added support for orient in bar graph
mehtamohit013 May 24, 2023
10dc8a5
test and formatting
mehtamohit013 May 24, 2023
8ed37ac
Added doc
mehtamohit013 May 24, 2023
e7eae4d
changelog
mehtamohit013 May 24, 2023
3739263
Fix api doc
mehtamohit013 May 25, 2023
f8d962c
Fix
mehtamohit013 May 25, 2023
4e6b609
Removed facet
mehtamohit013 May 25, 2023
3a7edd7
Fix
mehtamohit013 May 25, 2023
f381423
Fix
mehtamohit013 May 25, 2023
fed5bbd
Added --show_number argument
mehtamohit013 May 25, 2023
64ce72d
Modified doc
mehtamohit013 May 25, 2023
3896866
Updated tests
mehtamohit013 May 25, 2023
8697181
Fixed error type
mehtamohit013 May 28, 2023
db3e167
typo
mehtamohit013 May 28, 2023
ad81d68
Merge branch 'master' into plot
mehtamohit013 May 28, 2023
e49ecb5
Changed arg name
mehtamohit013 May 28, 2023
f6befa2
Merge branch 'plot' of github.com:mehtamohit013/jupysql into plot
mehtamohit013 May 28, 2023
b57626f
Fix error type
mehtamohit013 May 28, 2023
5bd5045
Added version admonition
mehtamohit013 May 28, 2023
36ab740
Added matplotlib test
mehtamohit013 May 29, 2023
2748470
Format
mehtamohit013 May 29, 2023
e52e909
Fix tests
mehtamohit013 May 29, 2023
dbf1425
Format
mehtamohit013 May 29, 2023
955463d
Merge branch 'master' into plot
mehtamohit013 May 29, 2023
61817df
Fix tests
mehtamohit013 May 29, 2023
4163e5d
Fix tests
mehtamohit013 May 29, 2023
46aacb0
Added support for NULLs
mehtamohit013 May 29, 2023
c60d48e
Fix
mehtamohit013 May 30, 2023
c125c91
Lint
mehtamohit013 May 30, 2023
d43cc9e
Updated imgs
mehtamohit013 May 30, 2023
d2ed0f3
Fix doc
mehtamohit013 May 30, 2023
1633599
Minor
mehtamohit013 May 30, 2023
3192c9a
lint
mehtamohit013 May 30, 2023
4ab7954
Merge branch 'master' into plot
mehtamohit013 May 31, 2023
43f99ed
Added print statement
mehtamohit013 May 31, 2023
8cfce28
Merge branch 'master' into plot
mehtamohit013 May 31, 2023
548890f
Added null test
mehtamohit013 May 31, 2023
e259e33
Lint
mehtamohit013 May 31, 2023
5f43ecb
Fix
mehtamohit013 May 31, 2023
ec33a6d
Fix tests
mehtamohit013 May 31, 2023
54afbef
Format
mehtamohit013 May 31, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
* [Feature] Better error messages when function used in plotting API unsupported by DB driver (#159)
* [Fix] Fix the default value of %config SqlMagic.displaylimit to 10 (#462)
* [Feature] Detailed error messages when syntax error in SQL query, postgres connection password missing or inaccessible, invalid DuckDB connection string (#229)
* [Feature] Added bar plot and pie charts to %sqlplot (#508)


## 0.7.4 (2023-04-28)
97 changes: 97 additions & 0 deletions doc/api/magic-plot.md
@@ -160,3 +160,100 @@ ax = %sqlplot histogram --table no_nulls --column body_mass_g --with no_nulls
ax.set_title("Body mass (grams)")
_ = ax.grid()
```
## `%sqlplot bar`

Shortcut: `%sqlplot bar`

`-t`/`--table` Table to use (if using DuckDB: path to the file to query)

`-c`/`--column` Column to plot.

`-o`/`--orient` Barplot orientation (`h` for horizontal, `v` for vertical)

`-w`/`--with` Use a previously saved query as input data

`-sn`/`--show_num` Show number on top of the bar

Barplot does not support NULL values, so let's remove them:


can we add support for NULL values? you can run a query to filter them first

Author


Should I raise some warning when there is NULL?


good point. not a warnings.warn, but printing a message saying that we're removing NULLs makes sense

Author


Should the message be printed only when NULLs are present, or always as a generic message?
For now I have added a generic message:

print(f"Removing NULLs, if there exists any from {column}")


i didn't think about this. we have two options

  1. check if there is a null (by querying the table). if so, show a message and filter them out
  2. embed the null filter in the existing SQL queries as a subquery - this will impact performance but I'm guessing most SQL engines are smart enough to just pass the data once

let's implement 2. this implies that we won't show any message to the user since we'll filter out NULLs all the time - so there's no point in showing a message. we can later optimize.
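
The two options above can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `penguins` table, its toy rows, and the aggregate query are made up for the example and are not the queries used by the plotting code.

```python
import sqlite3

# Toy in-memory table (assumption: not the real penguins dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT)")
conn.executemany(
    "INSERT INTO penguins VALUES (?)",
    [("Adelie",), ("Gentoo",), (None,), ("Adelie",)],
)

# Option 1: one extra query to detect NULLs, message only if any exist.
null_count = conn.execute(
    "SELECT COUNT(*) FROM penguins WHERE species IS NULL"
).fetchone()[0]
if null_count:
    print(f"Removing {null_count} NULL value(s) from 'species'")

# Option 2: always read from a NULL-filtering subquery, no detection step.
source = "(SELECT species FROM penguins WHERE species IS NOT NULL)"
rows = conn.execute(
    f"SELECT species, COUNT(*) AS cnt FROM {source} "
    "GROUP BY species ORDER BY species"
).fetchall()
```

Option 1 costs an extra round trip but can report exactly what was dropped. Option 2 keeps a single query, which is why no per-table message is possible without further work.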

Author

@mehtamohit013 mehtamohit013 May 31, 2023


@edublancas
Implemented option 2, plus a generic NULL message for information:

Sample SQL query:

select "{{x_}}" as x,
"{{height_}}" as height
from "{{table}}"
where "{{x_}}" is not null
and "{{height_}}" is not null;
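
To see what the sample query returns once its placeholders are filled in, here is a minimal sketch. The `render` helper is a hypothetical stand-in for whatever template engine the project uses, and the `add_col` rows are toy data; plain string substitution like this is not safe for untrusted input.

```python
import re
import sqlite3

TEMPLATE = """
select "{{x_}}" as x,
"{{height_}}" as height
from "{{table}}"
where "{{x_}}" is not null
and "{{height_}}" is not null;
"""


def render(template, **params):
    """Hypothetical helper: replace each {{name}} slot with params[name]."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: params[m.group(1)], template)


query = render(TEMPLATE, x_="species", height_="cnt", table="add_col")

# Toy table with a NULL in each column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE add_col (species TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO add_col VALUES (?, ?)",
    [("Adelie", 3), (None, 5), ("Gentoo", None), ("Chinstrap", 2)],
)
rows = conn.execute(query).fetchall()
```

Only rows where both columns are non-NULL survive, so the row with a missing `species` and the row with a missing `cnt` are both dropped.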


```{code-cell} ipython3
%%sql --save no_nulls --no-execute
SELECT *
FROM penguins.csv
WHERE species IS NOT NULL
```

```{code-cell} ipython3
%sqlplot bar --table no_nulls --column species --with no_nulls
```

You can also pass two columns to the bar plot: the `x` column followed by the `height` column.

```{code-cell} ipython3
%%sql --save add_col --no-execute --with no_nulls
SELECT species, count(species) as cnt
FROM no_nulls
group by species
```

```{code-cell} ipython3
%sqlplot bar --table add_col --column species cnt --with add_col
```
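
As a rough illustration of the two-column form, the sketch below runs the same kind of aggregate query with `sqlite3` and splits the result into the `x` and `height` sequences. The table contents are toy data, and the mapping of the first column to `x` and the second to `height` is an assumption based on the example above.

```python
import sqlite3

# Toy stand-in for the saved no_nulls query result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE no_nulls (species TEXT)")
conn.executemany(
    "INSERT INTO no_nulls VALUES (?)",
    [("Adelie",), ("Gentoo",), ("Adelie",), ("Chinstrap",)],
)

rows = conn.execute(
    "SELECT species, count(species) AS cnt "
    "FROM no_nulls GROUP BY species ORDER BY species"
).fetchall()

# First column becomes the bar positions, second the bar heights.
x = [species for species, _ in rows]
height = [cnt for _, cnt in rows]
```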

You can also pass the orientation using the `orient` argument.

```{code-cell} ipython3
%sqlplot bar --table add_col --column species cnt --with add_col --orient h
```

You can also show the number on top of the bar using the `show_num` argument.

```{code-cell} ipython3
%sqlplot bar --table no_nulls --column species --with no_nulls -sn
```
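
The `-sn` flag takes no value: it is either present or absent. The sketch below shows that behaviour with the standard-library `argparse`. The magic itself declares its arguments through decorators, so this parser, including the `line` positional and `nargs="+"` on `--column`, is an assumption made for illustration.

```python
import argparse

# Hypothetical parser mirroring a few of the documented options.
parser = argparse.ArgumentParser(prog="%sqlplot", add_help=False)
parser.add_argument("line", nargs="*")
parser.add_argument("-t", "--table")
parser.add_argument("-c", "--column", nargs="+")
parser.add_argument("-sn", "--show_num", action="store_true")

with_flag = parser.parse_args(
    "bar --table no_nulls --column species -sn".split()
)
without_flag = parser.parse_args(
    "bar --table no_nulls --column species".split()
)
```

With `action="store_true"`, `show_num` defaults to `False` and flips to `True` only when the flag appears on the line.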

## `%sqlplot pie`

Shortcut: `%sqlplot pie`

`-t`/`--table` Table to use (if using DuckDB: path to the file to query)

`-c`/`--column` Column to plot

`-w`/`--with` Use a previously saved query as input data

`-sn`/`--show_num` Show the percentage on top of the pie

Pie chart does not support NULL values, so let's remove them:

```{code-cell} ipython3
%%sql --save no_nulls --no-execute
SELECT *
FROM penguins.csv
WHERE species IS NOT NULL
```

```{code-cell} ipython3
%sqlplot pie --table no_nulls --column species --with no_nulls
```

You can also pass two columns to the pie chart: the `labels` column followed by the `x` column.

```{code-cell} ipython3
%%sql --save add_col --no-execute --with no_nulls
SELECT species, count(species) as cnt
FROM no_nulls
group by species
```

```{code-cell} ipython3
%sqlplot pie --table add_col --column species cnt --with add_col
```
Here, `species` is the `labels` column and `cnt` is the `x` column.


You can also show the percentage on top of the pie using the `show_num` argument.

```{code-cell} ipython3
%sqlplot pie --table no_nulls --column species --with no_nulls -sn
```
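
The percentage attached to each slice is that slice's share of the total. A minimal sketch of the arithmetic, using made-up counts; the label format is an assumption and may not match what `%sqlplot pie` renders.

```python
# Toy counts per label (assumption: not real dataset values).
counts = {"Adelie": 3, "Chinstrap": 1, "Gentoo": 4}

total = sum(counts.values())
# Each slice's share of the whole, in percent.
percentages = {label: 100 * value / total for label, value in counts.items()}
slice_labels = [f"{label}: {pct:.1f}%" for label, pct in percentages.items()]
```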
36 changes: 34 additions & 2 deletions src/sql/magic_plot.py
@@ -17,6 +17,8 @@
from sql import exceptions
from sql import util

SUPPORTED_PLOTS = ["histogram", "boxplot", "bar", "pie"]


@magics_class
class SqlPlotMagic(Magics, Configurable):
@@ -51,6 +53,12 @@ class SqlPlotMagic(Magics, Configurable):
action="append",
dest="with_",
)
@argument(
"-sn",
"--show_num",
action="store_true",
help="Show number of observations",
)
@modify_exceptions
def execute(self, line="", cell="", local_ns=None):
"""
@@ -65,8 +73,10 @@ def execute(self, line="", cell="", local_ns=None):
column = cmd.args.column

         if not cmd.args.line:
+            plot_str = util.pretty_print(SUPPORTED_PLOTS, last_delimiter="or")
             raise exceptions.UsageError(
-                "Missing the first argument, must be: 'histogram' or 'boxplot'. "
+                "Missing the first argument, must be any of: "
+                f"{plot_str}\n"
                 "Example: %sqlplot histogram"
             )

@@ -92,7 +102,29 @@ def execute(self, line="", cell="", local_ns=None):
with_=cmd.args.with_,
conn=None,
)
elif cmd.args.line[0] in {"bar"}:
util.is_table_exists(table, with_=cmd.args.with_)

return plot.bar(
table=table,
column=column,
with_=cmd.args.with_,
orient=cmd.args.orient,
show_num=cmd.args.show_num,
conn=None,
)
elif cmd.args.line[0] in {"pie"}:
util.is_table_exists(table, with_=cmd.args.with_)

return plot.pie(
table=table,
column=column,
with_=cmd.args.with_,
show_num=cmd.args.show_num,
conn=None,
)
         else:
+            plot_str = util.pretty_print(SUPPORTED_PLOTS, last_delimiter="or")
             raise exceptions.UsageError(
-                f"Unknown plot {cmd.args.line[0]!r}. Must be: 'histogram' or 'boxplot'"
+                f"Unknown plot {cmd.args.line[0]!r}. Must be any of: " f"{plot_str}"
             )
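
The change above replaces two hard-coded plot lists with one derived from `SUPPORTED_PLOTS`. The standalone sketch below shows the idea. `join_options` is a hypothetical stand-in for `util.pretty_print`, whose exact output format is not shown in this diff, and `ValueError` stands in for `exceptions.UsageError`.

```python
SUPPORTED_PLOTS = ["histogram", "boxplot", "bar", "pie"]


def join_options(options, last_delimiter="or"):
    """Hypothetical helper; the real util.pretty_print may format differently."""
    quoted = [repr(option) for option in options]
    if len(quoted) <= 1:
        return "".join(quoted)
    return f"{', '.join(quoted[:-1])} {last_delimiter} {quoted[-1]}"


def validate_plot_name(line):
    """Return the plot name, or raise with a message built from the list."""
    if not line:
        raise ValueError(
            "Missing the first argument, must be any of: "
            f"{join_options(SUPPORTED_PLOTS)}"
        )
    if line[0] not in SUPPORTED_PLOTS:
        raise ValueError(
            f"Unknown plot {line[0]!r}. Must be any of: "
            f"{join_options(SUPPORTED_PLOTS)}"
        )
    return line[0]
```

Building both messages from the same list means a newly supported plot type only has to be added in one place for the error text to stay accurate.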