add support for spark (#968)
* add basic spark support to library

* adding tests

* formatting

* add spark connection

* add spark connection

* fixed test and formating

* added docs

* exclude execution

* documentation updates

* adjust doc string for close

* add generic

* integrated better with existing functionality

* finishing integration tests

* pass config and alias correctly

* fixed issue with backticks and also implemented fake cursor

* change configuration name

* fix env variable error integration tests CI

* fixing lint errors

* change log formating

* metadata ipynb

* addressing comments

* update changelog

* changelog

* fix row count

* spelling

* spelling

* remove pypark dev dependency

* review comments

* missed readStream in connection.py
gilandose authored Dec 24, 2023
1 parent 4fda165 commit f4088a3
Showing 25 changed files with 1,858 additions and 11 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,8 @@

## 0.10.7dev

* [Feature] Add Spark Connection as a dialect for Jupysql ([#965](https://github.com/ploomber/jupysql/issues/965)) (by [@gilandose](https://github.com/gilandose))

## 0.10.6 (2023-12-21)

* [Fix] Fix error when `%sql` includes a query with negative numbers ([#958](https://github.com/ploomber/jupysql/issues/958))
1 change: 1 addition & 0 deletions doc/_toc.yml
@@ -43,6 +43,7 @@ parts:
- file: integrations/duckdb-native
- file: integrations/compatibility
- file: integrations/chdb
- file: integrations/spark

- caption: API Reference
chapters:
20 changes: 20 additions & 0 deletions doc/api/configuration.md
@@ -234,6 +234,26 @@ value enables the ones from previous values plus new ones:
- `2`: All feedback
- Footer to distinguish pandas/polars data frames from JupySQL's result sets

## `lazy_execution`

```{versionadded} 0.10.7
This option only works when connecting to Spark
```

Default: `False`

Return a lazy relation to the dataset rather than executing the query through JupySQL.

```{code-cell} ipython3
%config SqlMagic.lazy_execution = True
df = %sql SELECT * FROM languages
```

```{code-cell} ipython3
%config SqlMagic.lazy_execution = False
res = %sql SELECT * FROM languages
```
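
The difference between the two settings can be illustrated with a minimal sketch. This is not JupySQL's or Spark's implementation: `LazyRelation` and `run` are hypothetical names, and `sqlite3` stands in for a Spark session purely so the example is self-contained. The assumption is that a lazy relation defers the query until results are requested, while the default path executes immediately and returns rows.

```python
import sqlite3


class LazyRelation:
    """Hypothetical stand-in for a lazy relation: stores the query, runs it on demand."""

    def __init__(self, conn, query):
        self.conn = conn
        self.query = query
        self.executed = False

    def collect(self):
        # The query only reaches the database when results are requested
        self.executed = True
        return self.conn.execute(self.query).fetchall()


def run(conn, query, lazy_execution=False):
    """Return a LazyRelation when lazy_execution is True, else the fetched rows."""
    relation = LazyRelation(conn, query)
    return relation if lazy_execution else relation.collect()


# In-memory database used only as a stand-in for a Spark session
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (name TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)", [("Python", 10), ("Scala", 8)]
)

query = "SELECT * FROM languages ORDER BY name"
lazy = run(conn, query, lazy_execution=True)    # nothing executed yet
eager = run(conn, query, lazy_execution=False)  # rows fetched immediately
```

In this sketch, `lazy.collect()` returns the same rows as `eager`, but only at the point it is called.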

## `named_parameters`

```{versionadded} 0.9
1 change: 1 addition & 0 deletions doc/conf.py
@@ -27,6 +27,7 @@
"integrations/oracle.ipynb",
"integrations/snowflake.ipynb",
"integrations/redshift.ipynb",
"integrations/spark.ipynb",
]
nb_execution_in_temp = True
nb_execution_show_tb = True
18 changes: 17 additions & 1 deletion doc/integrations/compatibility.md
@@ -114,4 +114,20 @@ These table reflects the compatibility status of JupySQL `>=0.7`
- Listing tables with `%sqlcmd tables` ✅
- Listing columns with `%sqlcmd columns` ✅
- Parametrized SQL queries via `{{parameter}}` ✅
- Interactive SQL queries via `--interact` ✅
- Interactive SQL queries via `--interact` ✅

## Spark

- Running queries with `%%sql` ✅
- CTEs with `%%sql --save NAME` ✅
- Plotting with `%%sqlplot boxplot` ❓
- Plotting with `%%sqlplot bar` ✅
- Plotting with `%%sqlplot pie` ✅
- Plotting with `%%sqlplot histogram` ✅
- Plotting with `ggplot` ✅
- Profiling tables with `%sqlcmd profile` ✅
- Listing tables with `%sqlcmd tables` ❌
- Listing columns with `%sqlcmd columns` ❌
- Parametrized SQL queries via `{{parameter}}` ✅
- Interactive SQL queries via `--interact` ✅
- Persisting Dataframes via `--persist` ✅