Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(bigquery): port to sqlglot #7968

Merged
merged 3 commits into from
Jan 18, 2024

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Jan 12, 2024

Description of changes

This PR ports the BigQuery dialect to sqlglot.

A number of changes were required to make this work:

  • I removed the ibis.bigquery.udf module in favor of the newer
    ibis.expr.operations.udf-based UDFs

  • As part of the previous point I removed explicit support for udf.sql and
    udf.js. udf.sql was only used in the typeof implementation, and
    udf.js was only used to implementation Python UDFs.

  • BigQuery has only partial support for NULL FIRST/LAST in RANGE windows:

    • sum(x) over (order by y desc nulls last)
    • 🚫 sum(x) over (order by y asc nulls last)
    • sum(x) over (order by y asc nulls first)
    • 🚫 sum(x) over (order by y desc nulls first)

    To workaround this while allowing null ordering in ORDER BY (which seems to
    work fine) I added a sqlglot transform to remove null ordering in the above
    cases that do not work.

  • I adjusted the generic UDF implementation to allow for redefinitions of the same UDF, which BigQuery allowed. This implementation is the same strategy used by the previous BQ implementation which is to attach a number to the name of every identically-named UDF. For example, if you define my_udf twice, the compiled name will be my_udf_0 wherever the first definition is invoked and my_udf_1 wherever the second definition is invoked.

@cpcloud cpcloud requested review from tswast and kszucs January 12, 2024 12:24
@cpcloud cpcloud added refactor Issues or PRs related to refactoring the codebase bigquery The BigQuery backend tes-required-for-merge Issues that must addressed before merging the-epic-split branch into main labels Jan 12, 2024
@cpcloud
Copy link
Member Author

cpcloud commented Jan 12, 2024

BigQuery tests are passing:

…/ibis on  tes-bigquery is 📦 v7.2.0 via 🐍 v3.10.13 via ❄️  impure (ibis-3.10.13-env)
❯ pytest -m bigquery --snapshot-update -n auto --dist loadgroup
==================================================================================================== test session starts ====================================================================================================
platform linux -- Python 3.10.13, pytest-7.4.4, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=1392314196
rootdir: /home/cloud/src/ibis
configfile: pyproject.toml
plugins: xdist-3.5.0, cov-4.1.0, benchmark-4.0.0, hypothesis-6.92.2, randomly-3.15.0, mock-3.12.0, httpserver-1.0.8, repeat-0.9.3, snapshot-0.9.0, clarity-1.0.1
16 workers [1682 items]
ssssssssssssssssssssss......xxxxxx......x.....xx.x.xxx..xx......x.xxx.x.x.x.x..xx..........x.x..x.xx.xxx.xx..xxxx..x...x...x.xx....x....x........x........x.................x..................................x..... [ 12%]
.x..x....x.........x............x......x.............x..x..x.x.x...x...........x..x.xxx......x..x..x..........x.....x...................x...........xx.......x....x...x...x...................x.............x........ [ 25%]
......x..x....x.x...x.....x...xx..x.x......x.........x.x..s.x...xx..xx..x......s......sx..........xx......x....x...x.x...............x......x..x.........x........x..x...x.........x............x......xx............ [ 37%]
....x..............................x.......x......x.x.........................................xx...............x....x.x..xx.......x.....x....................s.............x....x................x.......x........... [ 50%]
.........................x..............................................................x.......x.x.x.....x........x.......x.....xx.....x........x....x...x....x..x.........x....x......x....xx...x.....x...x...xx.xx [ 63%]
..x.....x......x.............xxx.xx.x.x......xx..x.xx..xxx..x.x....xxx.x..xxx..x.x.x.....x...x..x......x..x............x...x.....x.x....x..x.............x........xxx.x..x.xxxxxxx.xxxx.xxxxx.........x.x.x.......... [ 75%]
............................................................................x.x...................xx.x............................................................................................................... [ 88%]
.............................x..x.......x............................s..........x................................x.......................x..................x.................x...xx..xx.......                       [100%]
================================================================================= 1408 passed, 27 skipped, 247 xfailed in 366.12s (0:06:06) =================================================================================

@cpcloud cpcloud force-pushed the tes-bigquery branch 3 times, most recently from 457f9f5 to 6603c80 Compare January 12, 2024 14:25
@cpcloud cpcloud force-pushed the tes-bigquery branch 3 times, most recently from e7205ce to 5b4721f Compare January 13, 2024 11:33
@cpcloud cpcloud force-pushed the tes-bigquery branch 4 times, most recently from ad5f6d0 to 18ef5e3 Compare January 13, 2024 13:42
@cpcloud
Copy link
Member Author

cpcloud commented Jan 13, 2024

BigQuery tests are passing:

cloud in 🌐 falcon in …/ibis on  tes-bigquery is 📦 v7.2.0 via 🐍 v3.10.13 via ❄️  impure (ibis-3.10.13-env)
❯ pytest -m bigquery --snapshot-update -n auto --dist loadgroup -q
bringing up nodes...
ssssssssssssssssssssss.xxx.xxxxxxx..xx....x....x.....x..x.x...s.x..x..s.....s..x...x.......x.....x.................xxx.x....x..x.x..x.............x.................xx.x...x...x. [ 10%]
...xx....x......x....xxx..xx..x...xx.s.x.....xx......x..x...x....x..x......x...x...........x...x.x........x..x.xx.x.xxx.xxxx.xxxx.xx.xxxxx...x...xx..x......x.x....x..x.......... [ 21%]
...........x.....xx.....x.......x....x..........x.x.xxx.x....x.xx.xx..xxx.x..x..x..x..xx.xx.x....x.xxxxx..x.x...x..x......x.....xxx...xxxxx..................................x... [ 31%]
......................x..........x....x.....................x..x.x..x.....x......x.............x.x..................x...x.........xx.x....x.....x..........x.x.....x.x.....x..... [ 42%]
..x...x..x......x.x......x...xx.x..x...xxx...x..x......x...xx.........x......x...x.......x....x....x.....x.x......x.........x...x..xx.........x...xx..........x.....x..........x. [ 52%]
....x............x......................x.......xx...x.x..............x.x..x.....................x.x...........................................................................x. [ 63%]
..........x.....x...............................x.......x...x....x.....x...............x......x..........x........x.....x...x.x..xx......x..x........x.......x...x.....x......xx. [ 73%]
.........................................................................................................................x.................................x.x.......s........... [ 84%]
........................................................x......................................................................................................................... [ 94%]
...............................x......................................x......x..........                                                                                          [100%]
1409 passed, 27 skipped, 246 xfailed in 243.04s (0:04:03)

@cpcloud cpcloud force-pushed the tes-bigquery branch 2 times, most recently from 871d4f4 to a607fea Compare January 13, 2024 14:39
@cpcloud
Copy link
Member Author

cpcloud commented Jan 16, 2024

BigQuery tests passing after rebase:

cloud in 🌐 falcon in …/ibis on  tes-bigquery is 📦 v7.2.0 via 🐍 v3.10.13 via ❄️  impure (ibis-3.10.13-env)
❯ pytest -m bigquery --snapshot-update -n auto --dist loadgroup -q
bringing up nodes...
ssssssssssssssssssssss...x.....xxx......xxxx.x.x......x.xx.....x..x....x.....xxxx.xx...x..x...xxx..x.x.xxx.x..xxx...x.xxx.x...x.x..xxx.x.x.x.x..x.x.xxxx.x..x.xx..xx.xx.x......x..x........... [ 11%]
....x............x.......x...x.x.x....x..x...x..........xx....x...x.......x.x............x...x....x...........x.x.....x......x..x..x.xxx.x.xx.....x...xxx..x.x..x..xx..xxxx.x.xxx..x.xx.x.x... [ 22%]
.xx.xxx.x.x..x.........x.x....x........x...x....x...........x................................x..................x...............................x...............................x............. [ 33%]
.................x.......s.........x...........s..........s....x.......x..x.............xx.xx..x.x.....xxs...xxxxxx....xxx.x.x......x....x.xx.x..xx.xx....x.........x....x.......xxx.....xx.x. [ 45%]
............x....x...x........x.x..x..x....x.x................x..x.............................x.....x..................x.x.....x......x..x........................x.x......x............x.... [ 56%]
...........x.......xx.....x....x.......xx.xx.......................x................x..xx...............xx............x.....x....x..................x.x.........................x............. [ 67%]
................xx...x.xx......x..................x........x............xx..........x..x....x........x.................................x...................................................... [ 79%]
...........x................................................................................................................................................................................x. [ 90%]
.....................................................................................x...x.........x..............................x.............xs...............                              [100%]
1409 passed, 27 skipped, 245 xfailed in 188.27s (0:03:08)

@cpcloud cpcloud force-pushed the tes-bigquery branch 2 times, most recently from 6ae24df to 9a2ed28 Compare January 16, 2024 18:05
Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to review every line, but overall looking great.

P.S. I tried to run this against googleapis/python-bigquery-dataframes#277 more-or-less unchanged, but still having issues related to #7632. I think I'll probably need #7781 before BigQuery DataFrames can migrate.

@cpcloud
Copy link
Member Author

cpcloud commented Jan 16, 2024

BigQuery passing:

❯ pytest -m bigquery --snapshot-update -n auto --dist loadgroup -q
bringing up nodes...
xxxssssssssssssssssssssss.xx..x..xxx.xx.xxxxx..x...xxxx.xx.x.xxx.xxx.x.xxxxx..x..x.........x.......x..x.xxxx.........................................xx.....................x...x............... [ 11%]
.............xxx....x...x.....x..........x................x.x......x.x.x...............x....x..xxx.x.x.........xx.....x..xx.x..x....x....................x...x...xx......x..............x....... [ 22%]
........x......................x.x......x....x..x.............x............x....x.........x...xx.................x..x....x...............x..x..........x.......x........x..x.................... [ 34%]
................x.......x.x..........x...........xx......x...x......xx.......x...x..x...x.............x.x.xxx..x..x...x.......x....................x..................................s......... [ 45%]
.........x....................................................x...................x.....x..........x....xxx.x...x..xx....x.......x..x.....x......x....x.x............x...........x....x.xxx.x..x [ 57%]
......x...x...........x......xx...xx...x.xx......x.x......xx...xxxx.xxx.x.xxxxs............x...xx......x....x.x.x..........x..........sxx.x.x..x.............x....x.............x.x.x.xxxx.....x [ 68%]
..x.......x.............x.................x......x...x.x.x.xxxxxxxxxx.xx.xx....xx...x...xx.xxx.x.x..x...........................s.................x............................................. [ 79%]
..........x......................x.....................................................x................x....x.................................................................................. [ 91%]
.............................................x...................................x......................x..........................s.............                                                [100%]
1409 passed, 27 skipped, 245 xfailed in 197.96s (0:03:17)

@cpcloud cpcloud added the tes-required-for-release Things that must be addressed before *release* of `main` after merging in `the-epic-split` label Jan 17, 2024
@cpcloud cpcloud force-pushed the tes-bigquery branch 2 times, most recently from bf5d359 to 8a44d75 Compare January 17, 2024 22:16
@cpcloud
Copy link
Member Author

cpcloud commented Jan 17, 2024

BigQuery passing:

cloud in 🌐 falcon in …/ibis on  tes-bigquery is 📦 v7.2.0 via 🐍 v3.10.13 via ❄️  impure (ibis-3.10.13-env)
❯ pytest -m bigquery -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
xxxxxxxxxx.xxxxxxxxxx...xx..xxxx.xxxx..x.....xx.xx..x...xx.xxx.xxxxxxx.x.x.....x.xx.x.xxxx.....xx.x.x........xx.x..........xx.x.x.......x.ssssssssssssssssssssss..x....x.x.......xx............ [ 11%]
.........x...xx.x.............x.xx.x...x..x...x.xx.xx.....x...................x...x..x....xxx....x..x.x.....x.........................x....xx.xxx........xx.x......x.x......x.x..........x..... [ 22%]
.........xxs.....x....x......x.x.....x.......x.........x........x......x..x...............x...x......x......x...x.x..x.xx.xx.xxxx.xxx..xx...x...x....xx...x.xxx.x.x..x.xxx..x...x.............. [ 34%]
.x..x.........x...xx....x.......xx.x......x....xx.......x.x...............x.......xxxxx...x..........xx.........x...xx....x.x.x.......x....x....x....x...x...x.............xx.............x.... [ 45%]
..........x..........................................................x....................x............x........x....................x.................................................x....... [ 56%]
....................................................................................x.x.x..........x................x................x...........x.x..x.............xx....x........x...x.x..... [ 68%]
...x......x.....x.....x.......x.s.x.........xx.....x.......x.x.....x.......x..................................x................................................................................ [ 79%]
.........................................................................x...........................................................x......................................................... [ 90%]
.......................................................................................x......x......x....x......s......s.x...x.....x.................s... [100%]
1409 passed, 27 skipped, 246 xfailed in 215.05s (0:03:35)

Copy link
Member

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@kszucs kszucs merged commit 319f345 into ibis-project:the-epic-split Jan 18, 2024
45 checks passed
@cpcloud cpcloud deleted the tes-bigquery branch January 18, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend refactor Issues or PRs related to refactoring the codebase tes-required-for-merge Issues that must addressed before merging the-epic-split branch into main tes-required-for-release Things that must be addressed before *release* of `main` after merging in `the-epic-split`
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants