Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't panic when creating Field from FFI_ArrowSchema with no name #6251

Closed
kylebarron opened this issue Aug 14, 2024 · 10 comments · Fixed by #6273
Closed

Don't panic when creating Field from FFI_ArrowSchema with no name #6251

kylebarron opened this issue Aug 14, 2024 · 10 comments · Fixed by #6273
Labels
arrow Changes to the arrow crate bug

Comments

@kylebarron
Copy link
Contributor

kylebarron commented Aug 14, 2024

Describe the bug

When constructing a Field from an FFI_ArrowSchema, it calls the name() method:

let mut field = Field::new(c_schema.name(), dtype, c_schema.nullable());

That method asserts that name is not null:
assert!(!self.name.is_null());

But according to the C Data Interface spec, it is allowed for the name to be null:

const char *ArrowSchema.name
Optional. A null-terminated, UTF8-encoded string of the field or array name. This is mainly used to reconstruct child fields of nested types.

Producers MAY decide not to provide this information, and consumers MAY decide to ignore it. If omitted, MAY be NULL or an empty string.

I would argue that this function should at least not panic on spec-valid input. It should either infer a blank name for that field or return Option<&str>.

To Reproduce

I can try to provide a full rust repro case if necessary; the claimed bug should be clear without a full repro case.

Expected behavior

Expected not to panic.

Additional context

I hit this in my arro3 project, where I pass in a pyarrow Schema and try to reconstruct it on the Rust side as a Field. This is slightly simpler because I can directly get a single DataType::Struct with all fields, rather than accessing the fields of a Schema and then passing that to DataType::Struct. But since the pyarrow Schema doesn't set a name on the Arrow C Schema, this fails.

Since they're communicating via a valid Arrow C Schema, I would argue this should work, and at least not panic.

@kylebarron kylebarron added the bug label Aug 14, 2024
@ByteBaker
Copy link
Contributor

Hi, I'd like to take this up.

@kylebarron
Copy link
Contributor Author

@ByteBaker sorry but it was only a one line change and I needed a git tag I could point to from my project, so I created #6273

@ByteBaker
Copy link
Contributor

@kylebarron @tustvold @alamb

  • name field is optional in Arrow spec.
  • name field in arrow::datatypes::Field is required.

I suppose the non-null assertion inside FFI_ArrowSchema::name() was originally meant to establish this guarantee for arrow::datatypes::Field.field but since this is the issue, what should be the course of action to resolve this?

  1. Make arrow::datatypes::Field.name optional and make FFI_ArrowSchema::name() return Option<&str>.
  2. Make FFI_ArrowSchema::name() return Option<&str> and let the user handle None.
  3. Make FFI_ArrowSchema::name() return "".

I'm personally with 2 because it leaves the control with the user and the change is minimal (in addition to being C-spec compliant) what's your thoughts?

@ByteBaker
Copy link
Contributor

@ByteBaker sorry but it was only a one line change and I needed a git tag I could point to from my project, so I created #6273

@kylebarron sorry I had already typed and this message didn't sync up. But do you still wanna carry forward this discussion for a more permanent and agreed upon solution or should I delete the one above?

@kylebarron
Copy link
Contributor Author

It's good to have this discussion here. In #6273 I'm suggesting something like option 2.

I think to be spec compliant, FFI_ArrowSchema must return Option<&str>. Unless we change the return type to String, we can't allocate there, and so we can't return "". But I also think it would be way too disruptive for arrow::datatypes::Field.name to be Option<String>. So I think we should default to "" when importing to Field.

For reference, this is the same behavior as pyarrow, which defaults to an empty string when importing a null name to a Field:

import pyarrow as pa
dt = pa.uint8()
field = pa.field(dt)
assert field.name == "" # passes

@ByteBaker
Copy link
Contributor

So, the 3rd one, I suppose?

@kylebarron
Copy link
Contributor Author

My proposal is more in line with your number 2. FFI_ArrowSchema should return Option<&str>, see https://github.com/apache/arrow-rs/pull/6273/files#diff-7e788382f74d78e1e32885a29b3040a5b82915542087c703d9e0b158271e2357R264-R274. So any user that interacts with FFI_ArrowSchema directly will have a breaking change and need to think about how they handle None. But any user who uses the default TryFrom<FFI_ArrowSchema> for Field will have it default to "".

@ByteBaker
Copy link
Contributor

Understood. In that case, #6273 should be sufficient and I need not write anything at all.

Let's see what @alamb @tustvold have to say.

@alamb
Copy link
Contributor

alamb commented Aug 19, 2024

Returning Option<&str> seems like a good idea to me

@alamb alamb added the arrow Changes to the arrow crate label Aug 31, 2024
@alamb
Copy link
Contributor

alamb commented Aug 31, 2024

label_issue.py automatically added labels {'arrow'} from #6273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants