Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): improve ibis.array() #9458

Closed
wants to merge 1 commit into from

Conversation

NickCrews
Copy link
Contributor

Picking out the array stuff from #8666

@NickCrews NickCrews changed the title feat: support empty arrays, improve ibis.array() API feat(api): ibis.array() Jun 27, 2024
@NickCrews NickCrews changed the title feat(api): ibis.array() feat(api): improve ibis.array() Jun 27, 2024
@NickCrews NickCrews force-pushed the api-array branch 3 times, most recently from a55b1c6 to 61e1a2c Compare June 28, 2024 00:04
@NickCrews
Copy link
Contributor Author

I'm thinking about this more and how I don't like the current conditional casting logic, and how only sometimes is the dtype in ops.Array needed. Currently, when constructing an op.Array, all the arguments are taken as they are, and the casting only happens during compile time. So you can have an op.Array(exprs=(1,2), dtype="array<string>"). This is leading to some conditional logic at compile time, deciding when a cast is needed or not. What could be simpler is if we ensure that all the exprs inside an op.Array are already the correct type during/before construction. so we would never find a op.Array(exprs=(1,2), dtype="array<string>"), instead we would have found the value_type of string, and pre-cast all the arguments, so we would have op.Array(exprs=(cast(1, string),cast(2, string)), dtype="array<string>"). Then in the compile step, we would need to look at the dtype iff the length was 0. This then logically leads to @cpcloud 's suggestion of having two different ops, one for 0-length arrays with a dtype, and another for arrays with vals. Also, by pushing this member-casting to construction time, we would catch more misuses, such as ibis.array([1,2], type="array<array<string>>".

So the logic would happen earlier, something like

def array(vals, *, type):
    vals = promote_tuple(vals)
    type = ibis.dtype(type)
    if not type.is_array():
         raise
    if not vals:
         if not type:
               raise
         return ops.EmptyArray(type)
    val_type = type.value_type
    casted = [value(e).cast(val_type) for e in exprs]
    return ops.Array(casted)

def value(v):
     if isinstance(v, (ibis.Expr, ibis.Deferred)):
         return v
     return ibis.literal(v)

@NickCrews
Copy link
Contributor Author

closing in favor of #9473

@NickCrews NickCrews closed this Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant