[Draft] Investigating SQL Support. (#11)
Beginning to investigate SQL support. Right now, for anything beyond a
small Xarray dataset, this is terribly slow.
This PR also introduces a profiling methodology using py-spy
(statistical profiling).
alxmrs authored Feb 15, 2024
1 parent c75c10b commit 429e5b4
Showing 14 changed files with 1,823 additions and 4 deletions.
19 changes: 19 additions & 0 deletions perf_tests/README.md
@@ -0,0 +1,19 @@
# Performance testing & profiling

So far, this includes statistical profiles via py-spy.

## Dev Process

1. Run a profile test with the `profile.sh` script like so:

```shell
# PROFILE_CASE_PY=groupby_air.py
sudo ./profile.sh $PROFILE_CASE_PY
```

This will open a flame graph in the browser.

2. After tuning code in qarray, run another profile to generate an SVG.

3. Please commit the "after" profile SVG along with the performance improvements.
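
The contents of `profile.sh` are not shown here. As a rough sketch of what step 1 amounts to, assuming the script wraps `py-spy record` (the helper name `py_spy_command` and the output naming are hypothetical, not taken from the script), the command could be assembled like this:

```python
import sys
from pathlib import Path


def py_spy_command(case: str, out_dir: str = ".") -> list[str]:
    """Build a `py-spy record` command that profiles `case` into an SVG.

    Assumption: the output is named after the test case, e.g.
    groupby_air.py -> groupby_air.svg. The real profile.sh may differ.
    """
    svg = Path(out_dir) / (Path(case).stem + ".svg")
    return [
        "py-spy", "record",
        "--output", str(svg),        # flame graph SVG is py-spy's default format
        "--", sys.executable, case,  # everything after `--` is the profiled program
    ]


# Print the command rather than running it, so the sketch has no side effects.
print(" ".join(py_spy_command("groupby_air.py")))
```

Passing the resulting list to `subprocess.run(..., check=True)` would run the profile. On some platforms py-spy needs elevated privileges, which is consistent with the `sudo` in the command above.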

34 changes: 34 additions & 0 deletions perf_tests/groupby_air.py
@@ -0,0 +1,34 @@
#!/usr/bin/env python3

import xarray as xr
import qarray as qr
from dask_sql import Context


if __name__ == '__main__':
    air = xr.tutorial.open_dataset('air_temperature')
    chunks = {'time': 240, 'lat': 5, 'lon': 7}
    air = air.chunk(chunks)
    air_small = air.isel(
        time=slice(0, 12), lat=slice(0, 11), lon=slice(0, 10)
    ).chunk(chunks)  # small slice only: larger inputs are currently very slow

    df = qr.to_dd(air_small)

    c = Context()
    c.create_table('air', df)

    query = c.sql('''
        SELECT
          "lat", "lon", SUM("air") as air_total
        FROM
          "air"
        GROUP BY
          "lat", "lon"
    ''')

    result = query.compute()

    expected = air_small.sizes['lat'] * air_small.sizes['lon']  # one row per (lat, lon) group
    assert len(result) == expected, f'Length must be {expected}.'
    print(expected)
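
The length check at the end of the script relies on `GROUP BY` semantics: each distinct `(lat, lon)` pair produces exactly one output row. A minimal sketch of that reasoning, using the standard library's `sqlite3` in place of dask-sql and qarray, and a synthetic table whose coordinates and values are made up for illustration:

```python
import sqlite3
from itertools import product

# Synthetic stand-in for the flattened dataset: one row per (time, lat, lon).
# The sizes mirror the `air_small` slice; the values are invented.
n_time, n_lat, n_lon = 12, 11, 10
rows = [
    (t, 15.0 + 2.5 * i, 200.0 + 2.5 * j, float(t))
    for t, i, j in product(range(n_time), range(n_lat), range(n_lon))
]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE air ("time" INTEGER, "lat" REAL, "lon" REAL, "air" REAL)')
con.executemany("INSERT INTO air VALUES (?, ?, ?, ?)", rows)

# Same query as in the profiling script, run against the in-memory table.
result = con.execute('''
    SELECT "lat", "lon", SUM("air") AS air_total
    FROM "air"
    GROUP BY "lat", "lon"
''').fetchall()

# The time axis is aggregated away, leaving one row per (lat, lon) pair.
assert len(result) == n_lat * n_lon
print(len(result))
```

Since this runs without Dask or a network connection, it can act as a quick reference for the expected row count when the dask-sql path is too slow to iterate on.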