feat(bigquery): Include offset parameter to unnest #8356
Comments
Let's work through the API design first before jumping into compiled SQL. The problem we need to solve to expose |
Thinking about this some more, I think this functionality warrants a new API. I'll propose one: This API would be analogous to Python's built-in function We'd do something similar, so the full signature of
# for ArrayValue[T]
def enumerate(self, start: int = 0) -> dt.StructValue[index: int, value: T]:
    ...
Thoughts? We should probably experiment with two different flavors of backends (BigQuery and DuckDB) to determine feasibility of compilation here. |
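To make the proposed semantics concrete, here is a minimal plain-Python sketch of what an array `enumerate` with a `start` argument could return for a single array value. It does not use ibis at all; the helper name `enumerate_array` and the `index`/`value` dictionary keys are assumptions chosen to mirror the struct fields in the signature above, not an existing API.

```python
from typing import Any


def enumerate_array(values: list[Any], start: int = 0) -> list[dict[str, Any]]:
    """Model the proposed array enumerate: one {index, value} struct per element.

    Hypothetical helper for illustration only; it mirrors the struct fields
    named in the proposed signature and is not part of ibis.
    """
    return [
        {"index": index, "value": value}
        for index, value in enumerate(values, start=start)
    ]


# Each element is paired with its position, counting from `start`.
pairs = enumerate_array(["a", "b", "c"], start=1)
```

Unnesting such an array of structs would then carry the position alongside each value, which is the information the issue asks for.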
While the proposed
|
If we want to continue the implementation for |
driveby-ing: |
@NickCrews We do have array zip, not sure if you've seen that: https://ibis-project.org/reference/expression-collections#ibis.expr.types.arrays.ArrayValue.zip |
@chelsea-lin I'll try to address your concerns here and provide a path forward:
Yes. Is that different from any other expression that takes input arguments?
table.select("grouper", table.x.enumerate())
Start by implementing the Here's the approach I would take:
|
@cpcloud thanks! The implementation looks good overall. My remaining question is about |
@cpcloud Trying to simplify my previous question. If |
Ok, after experimenting in #8429, I don't think I'm going to take a different approach, and experiment with adding a |
Thanks @cpcloud for prioritizing the |
@cpcloud We've found a workaround solution (reliant on the feature in #8892). This code effectively unnests two columns sequentially, includes an offset ID, and handles empty arrays as null. Thanks to @TrevorBergeron for the helpful ideas!
|
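The workaround code itself is not reproduced in this thread. As a rough plain-Python model of the behavior described in the comment above (unnest columns one after another, attach an offset, and keep rows whose array is empty by emitting nulls), something like the following could serve as a reference. The function name `unnest_keep_empty`, the column names, and the row-as-dict representation are all assumptions for illustration; this is not the ibis code from the workaround.

```python
from typing import Any

Row = dict[str, Any]


def unnest_keep_empty(rows: list[Row], column: str, offset_name: str) -> list[Row]:
    """Explode `column`, adding `offset_name`; empty or null arrays yield one null row.

    Hypothetical model of the described behavior, not an ibis API.
    """
    out: list[Row] = []
    for row in rows:
        values = row[column]
        if not values:
            # Preserve the row instead of dropping it: value and offset become null.
            out.append({**row, column: None, offset_name: None})
            continue
        for offset, value in enumerate(values):
            out.append({**row, column: value, offset_name: offset})
    return out


rows = [
    {"id": 1, "a": [10, 20], "b": ["x"]},
    {"id": 2, "a": [], "b": ["y", "z"]},
]
# Unnest the two array columns sequentially, each with its own offset column.
step1 = unnest_keep_empty(rows, "a", "a_offset")
step2 = unnest_keep_empty(step1, "b", "b_offset")
```

In this model the row with `id == 2` survives the first step with `a` and `a_offset` set to `None`, and is then expanded normally on `b`.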
This was closed by #9423 ( |
Is your feature request related to a problem?
It's not possible to express that I want to save the original order, such as with the WITH OFFSET AS clause for UNNEST in BigQuery. With the current version of unnest, it only returns a single column. I suppose it would be possible to add a with_offset parameter here, but then the return type might have to be a struct column or maybe a tuple of two columns.
This issue has the same request as #7781
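For readers unfamiliar with the BigQuery clause, the following plain-Python sketch models what unnesting with an offset produces: each array element becomes its own row, paired with its zero-based position in the original array. The helper `unnest_with_offset` and the dict-based rows are illustrative assumptions, not ibis or BigQuery APIs.

```python
from typing import Any

Row = dict[str, Any]


def unnest_with_offset(rows: list[Row], column: str, offset_name: str = "offset") -> list[Row]:
    """Explode `column` into one row per element, recording each element's position.

    Hypothetical model for illustration. Rows with empty or null arrays produce
    no output rows, as with a cross join against UNNEST.
    """
    out: list[Row] = []
    for row in rows:
        for offset, value in enumerate(row[column] or []):
            out.append({**row, column: value, offset_name: offset})
    return out


table = [
    {"id": 1, "xs": ["a", "b"]},
    {"id": 2, "xs": []},
]
exploded = unnest_with_offset(table, "xs")
```

The key point is that the result has two output columns per unnested array (value and offset), which is why a single-column return type is not enough.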
Describe the solution you'd like
I'd like to add an offset parameter to the unnest function. When offset is True, the compiler can return a sge.Posexplode() rather than a sge.Explode() node. Then, sqlglot can handle the SQL translation as expected. Here is the draft PR: https://github.com/ibis-project/ibis/pull/8354/files
With that change, the translated SQL looks good to me:
However, the ibis node type is not correct. Ibis assumes this node only returns a single value and ignores pos_2 here. Here is the repro code:
Any suggestions to correct the change so that ibis can return pos_2 as the offset?
What version of ibis are you running?
8.0.0
What backend(s) are you using, if any?
BigQuery
Code of Conduct