[Draft] Investigating SQL Support. (#11)
Beginning to investigate SQL support. Right now, for anything beyond a small Xarray dataset, this is terribly slow. This PR also introduces a profiling methodology using py-spy (statistical profiling).
Showing 14 changed files with 1,823 additions and 4 deletions.
@@ -0,0 +1,19 @@
# Performance testing & profiling

So far, this includes statistical profiles via py-spy.

## Dev Process

1. Run a profile test with the `profile.sh` script like so:

   ```shell
   # PROFILE_CASE_PY=groupby_air.py
   sudo ./profile.sh $PROFILE_CASE_PY
   ```

   This will open a flame graph in the browser.

2. After tuning code in qarray, run another profile to generate an SVG.

3. Please commit the "after" profile SVG along with the performance improvements.
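The `profile.sh` script itself is not shown in this part of the diff, so its exact contents are unknown. As a rough illustration of what such a wrapper might do, the sketch below assembles a `py-spy record` command for one profile case. The helper name `build_py_spy_command` and the output file name are hypothetical; `record`, `--output`, and `--format` are existing py-spy options.

```python
import shlex
import sys
from typing import List


def build_py_spy_command(case_py: str, output_svg: str, use_sudo: bool = False) -> List[str]:
    """Assemble a `py-spy record` command line for one profile case.

    Hypothetical helper: it only builds the argument list and does not
    claim to reproduce what profile.sh actually runs.
    """
    cmd = [
        'py-spy', 'record',
        '--output', output_svg,    # where the flame graph SVG is written
        '--format', 'flamegraph',  # py-spy's default output format
        '--', sys.executable, case_py,
    ]
    # py-spy needs elevated privileges on some platforms (e.g. macOS).
    return ['sudo'] + cmd if use_sudo else cmd


if __name__ == '__main__':
    cmd = build_py_spy_command('groupby_air.py', 'groupby_air.svg', use_sudo=True)
    # Print the command rather than running it; pass `cmd` to
    # subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.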
@@ -0,0 +1,34 @@
#!/usr/bin/env python3

import xarray as xr
import qarray as qr
from dask_sql import Context


if __name__ == '__main__':
    air = xr.tutorial.open_dataset('air_temperature')
    chunks = {'time': 240, 'lat': 5, 'lon': 7}
    air = air.chunk(chunks)
    # Profile only a small slice; larger inputs are currently very slow.
    air_small = air.isel(
        time=slice(0, 12), lat=slice(0, 11), lon=slice(0, 10)
    ).chunk(chunks)

    # Flatten the dataset into a Dask DataFrame and register it as a SQL table.
    df = qr.to_dd(air_small)

    c = Context()
    c.create_table('air', df)

    query = c.sql('''
    SELECT
      "lat", "lon", SUM("air") as air_total
    FROM
      "air"
    GROUP BY
      "lat", "lon"
    ''')

    result = query.compute()

    # One row per (lat, lon) pair of the dataset that was actually queried.
    expected = air_small.sizes['lat'] * air_small.sizes['lon']
    assert len(result) == expected, f'Length must be {expected}.'
    print(expected)
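The final assertion relies on `GROUP BY "lat", "lon"` producing exactly one row per (lat, lon) pair once the time axis is summed away. The sketch below illustrates that property with the standard-library `sqlite3` module standing in for dask-sql, and a synthetic grid flattened with `itertools.product` standing in for `qr.to_dd`. The coordinate values and the `air` data are made up for illustration; only the grid shape (12 x 11 x 10) mirrors the slice used in the script.

```python
import itertools
import sqlite3

# Hypothetical grid with the same shape as the sliced dataset:
# 12 time steps, 11 latitudes, 10 longitudes.
times = range(12)
lats = [75.0 - 2.5 * i for i in range(11)]
lons = [200.0 + 2.5 * j for j in range(10)]

# Flatten the grid into one row per (time, lat, lon) cell.
# The 'air' value is synthetic: it simply equals the time index.
rows = [(t, lat, lon, float(t)) for t, lat, lon in itertools.product(times, lats, lons)]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE air (time INTEGER, lat REAL, lon REAL, air REAL)')
conn.executemany('INSERT INTO air VALUES (?, ?, ?, ?)', rows)

# Same query shape as in the profile case above.
result = conn.execute('''
    SELECT "lat", "lon", SUM("air") AS air_total
    FROM "air"
    GROUP BY "lat", "lon"
''').fetchall()

# Grouping collapses the time axis, leaving one row per (lat, lon) pair.
expected = len(lats) * len(lons)
assert len(result) == expected, f'Length must be {expected}.'
```

Because the grouped row count depends only on the queried table, the expected value has to be computed from the same subset that was registered with the SQL context.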