feat(bigquery): general approx_quantile similar to approx_median but for arbitrary quantile #9541

Closed
1 task done
tswast opened this issue Jul 9, 2024 · 3 comments · Fixed by #9881
Labels
bigquery The BigQuery backend feature Features or general enhancements

Comments

@tswast
Collaborator

tswast commented Jul 9, 2024

Is your feature request related to a problem?

There isn't a great way to use BigQuery's APPROX_QUANTILES feature to get values besides the approximate median in Ibis.

What is the motivation behind your request?

To mimic pandas describe, BigQuery DataFrames uses APPROX_QUANTILES to get an approximate 25th percentile, median, and 75th percentile.

Note: the percentiles are configurable in pandas, but unfortunately BigQuery SQL's number-of-bins approach makes it difficult to support arbitrary percentiles.
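
To illustrate the bins approach: `APPROX_QUANTILES(expr, n)` returns an array of n + 1 boundaries, running from the approximate minimum at offset 0 to the approximate maximum at offset n, so the three describe-style values can come from a single call with n = 4. The sketch below only builds the SQL text. The helper name `describe_quartiles_sql` and the column and table names are made up for illustration, and this is not necessarily how BigQuery DataFrames generates its query.

```python
def describe_quartiles_sql(column: str, table: str) -> str:
    """Build a BigQuery query for approximate 25%/50%/75% values.

    Hypothetical helper: identifiers are interpolated as-is (no quoting or
    escaping), so treat this as an illustration only.
    """
    # 4 bins -> 5 boundaries at offsets 0..4 (min, 25%, 50%, 75%, max).
    return (
        "SELECT\n"
        "  q[OFFSET(1)] AS p25,\n"
        "  q[OFFSET(2)] AS p50,\n"
        "  q[OFFSET(3)] AS p75\n"
        f"FROM (SELECT APPROX_QUANTILES({column}, 4) AS q FROM {table})"
    )


print(describe_quartiles_sql("trip_distance", "my_dataset.trips"))
```

A percentile such as 0.9 is not one of these five boundaries, which is why each new set of percentiles needs a different bin count.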

Describe the solution you'd like

From BigQuery DataFrames' perspective, it'd be great if there was an API to get evenly-spaced approximate quantiles, but perhaps an approx_quantile function that takes an integer from 0 to 100 would be more flexible? Or 0 to 1 to mimic pandas, but with the note that some backends like BigQuery only support precision up to a certain point (maybe nearest 0.05 or nearest 0.01)?
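
One possible reading of the 0 to 1 option: round the requested quantile to a fraction k/n with a bounded n, then take offset k of `APPROX_QUANTILES(expr, n)`. The following is a minimal sketch under that assumption. The names `bins_and_offset`, `approx_quantile_sql`, and `max_bins` are hypothetical and are not part of Ibis or BigQuery.

```python
from fractions import Fraction


def bins_and_offset(q: float, max_bins: int = 100) -> tuple[int, int]:
    """Map a quantile in [0, 1] to (number_of_bins, offset)."""
    if not 0 <= q <= 1:
        raise ValueError("quantile must be between 0 and 1")
    # Parse the decimal repr so 0.25 becomes exactly 1/4, then cap the
    # denominator: the result is the closest fraction with <= max_bins bins.
    frac = Fraction(str(q)).limit_denominator(max_bins)
    return frac.denominator, frac.numerator


def approx_quantile_sql(column: str, q: float) -> str:
    """Render the BigQuery expression for one approximate quantile.

    The column name is interpolated without quoting (illustration only).
    """
    bins, offset = bins_and_offset(q)
    return f"APPROX_QUANTILES({column}, {bins})[OFFSET({offset})]"


for q in (0.25, 0.5, 0.75, 0.05):
    print(q, approx_quantile_sql("x", q))
```

With the default cap of 100 bins, a request that is not a multiple of 0.01 is silently rounded to a nearby fraction, which is the precision caveat raised above. A real API would need to decide whether to round, warn, or raise in that case.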

What version of ibis are you running?

8.x, but working on 9.x upgrade

What backend(s) are you using, if any?

BigQuery

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tswast tswast added the feature Features or general enhancements label Jul 9, 2024
@deepyaman
Contributor

I think approx_quantile makes sense to expose (other backends like DuckDB also offer this).

@chloeh13q pointed out a bit of potential inconsistency, where we implement median for the BigQuery backend using the approx_quantile method, but don't call it approx_median; in that case, would it be more consistent to expose approx_ versions of both quantile and median?

I think it makes sense to revisit this next week once most of the maintainers are back from SciPy.

@ncclementi
Contributor

Tangential but ...

where we implement median for the BigQuery backend using the approx_quantile method, but don't call it approx_median ...

@deepyaman Can you point to where this happens?

We do have an Approximate Median supported using approximate quantiles: see

def visit_ApproxMedian(self, op, *, arg, where):
    return self.agg.approx_quantiles(arg, 2, where=where)[self.f.offset(1)]

and the exact Median is not supported: https://github.com/ibis-project/ibis/blob/b44dac2a5d0346ed0f3dbdc05597104f64e40779/ibis/backends/bigquery/compiler.py#L38C1-L43C20

@ncclementi ncclementi added the bigquery The BigQuery backend label Jul 13, 2024
@cpcloud
Member

cpcloud commented Jul 17, 2024

@tswast Thanks for the issue!

I think it makes sense to add a Column.approx_quantile method that mirrors the quantile method that we have now.

Any takers from the BigFrames team?


5 participants