-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38718: [Go][Format][Integration] Add StringView/BinaryView to Go implementation #35769
Conversation
func (*BinaryViewType) binary() {} | ||
func (*BinaryViewType) view() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convenience interfaces. in datatype.go
there's BinaryDataType
which defines an interface adding IsUtf8() bool
which requires a binary()
method so that you can easily write code that works for all of these binary interfaces via type switches or interfaces.
Then I added this view()
method for a BinaryViewDataType
so you can write code targeting both BinaryView and StringView without needing to duplicate code by just using the interface in the type switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting optimization. I'm not entirely sure why these new types of builders/arrays are named XXXView, considering that, from what I understand, they store small strings inline in a fixed width data array and encode the index/offset for longer strings, with the remainder of the string being stored externally. However, I don't have a better name to suggest at the moment :)
In general, as a user of this library, I would find it useful to add more comments to methods whose usage isn't immediately clear from the name.
return &a.values[a.array.data.offset+i] | ||
} | ||
|
||
func (a *BinaryView) Value(i int) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no compiler guarantee in Go that the slice returned by this method will remain unchanged (whether intentionally or unintentionally) by users of this API, I would suggest adding a comment to this method. This comment should clearly specify that it's unsafe to make any kind of changes to the returned slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a concern present everywhere in Arrow though, so a comment here could be understood as implying that places without a comment like this allow buffers to be mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a reflex from my Rust experience where this slice will be immutable. In the context of Go, I agree with you that sporadically adding a comment might be counterproductive if we do not apply this globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the top level, there do already exist comments that state that it is intended that all Arrow Arrays be immutable. I agree with the concern that adding a comment here specifically could be counterproductive. If you can think of a good place to put such a comment that would be more universal, I'd be more than happy to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a first round
func (sh *StringHeader) InlineData() (data string) { | ||
debug.Assert(sh.IsInline(), "calling InlineData on non-inline StringHeader") | ||
|
||
h := (*reflect.StringHeader)(unsafe.Pointer(&data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any diff with go1.19 implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea. The Len
in the StringHeader
for tinygo is a uintptr
instead of an int
which is what go1.19 has. Compare line 33 of each file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how sad, but now I see why it's warranted.
However, a comment to note the diff for those implementations would be nice to have (not required, but nice to have nonetheless).
Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com>
Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com>
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 26149d9. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…o Go implementation (apache#35769) ### Rationale for this change See apache#35628 for the rationale and description of the StringView/BinaryView array types. This change is adding Go as a second implementation of it. ### What changes are included in this PR? Add Array Types for `StringView` and `BinaryView` along with `StringViewType` and `BinaryViewType` and necessary enums and builders. These arrays can be round tripped through JSON and IPC. ### Are these changes tested? Yes, unit tests have been added and integration tests run * Closes: [apache#38718](apache#38718) * Closes: apache#38718 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Alex Shcherbakov <candiduslynx@users.noreply.github.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Rationale for this change
See #35628 for the rationale and description of the StringView/BinaryView array types.
This change is adding Go as a second implementation of it.
What changes are included in this PR?
Add Array Types for
StringView
andBinaryView
along withStringViewType
andBinaryViewType
and necessary enums and builders. These arrays can be round tripped through JSON and IPC.Are these changes tested?
Yes, unit tests have been added and integration tests run