Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue in Chunk#toBytes #943

Closed

Conversation

adamretter
Copy link

@adamretter adamretter commented Oct 5, 2017

Fixes an issue where too many bytes are returned from Chunk#toBytes, when a Chunk.Bytes has a length less than the backing array.

This may not be the most elegant, efficient or even correct way to fix this, but it certainly causes the test case that I have added to pass, where it previously failed.

…en a Chunk.Bytes has a length less than the backing array
@SystemFw
Copy link
Collaborator

SystemFw commented Oct 5, 2017

Hi @adamretter, thanks for sending this in. I'll investigate some more to see if we can keep this optimisation without compromising md5. Good catch!

@SystemFw SystemFw self-assigned this Oct 5, 2017
@SystemFw
Copy link
Collaborator

SystemFw commented Oct 5, 2017

I think we can keep this optimisation by having the Bytes class have both a values and rawValues. values would be rawValues.take(size). This way you should still get access to the raw array should one need to (I haven't checked we do, but I'm thinking Chunks backed by a subsequence of an array), but still fixing this bug, and avoiding pointless copies. The same applies to the other Chunk.PrimitiveTypes classes.

@mpilquist WDYT?

EDIT: Alternatively, we could simply modify md5 to do the right thing, but I'm not sure we wouldn't be bitten by a similar bug again in another scenario

@mpilquist
Copy link
Member

Hm, I think this is okay as-is. The Bytes class has a values array along with an offset and length. We could rename values to underlying or something to help clarify that values is the source. If folks want the subsequence array, they can call .toArray.

Note that hash is also incorrectly ignoring the offset parameter.

@SystemFw
Copy link
Collaborator

SystemFw commented Oct 6, 2017

Right, so Bytes should stay the same, save for maybe renaming values to underlying.

Should we change toBytes, or just fix hash to take only the portion of the array going from offset to offset + size?

@mpilquist
Copy link
Member

Yeah, let's fix hash to operate on the proper slice. It did that correctly in 0.9: https://github.com/functional-streams-for-scala/fs2/blob/series/0.9/core/jvm/src/main/scala/fs2/hash/hash.scala#L19

I think I know how this bug was introduced in 0.10. Early in the 0.10 work, Chunk.Bytes didn't have an offset or size. The hash object must have been ported at that time. Then later in 0.10 development, we added back the offset/size feature for performance reasons but forgot to update hash.

@SystemFw
Copy link
Collaborator

SystemFw commented Oct 6, 2017

I'll send a PR in asap

@SystemFw
Copy link
Collaborator

SystemFw commented Oct 6, 2017

Thanks a lot @adamretter for spotting this! Closing :)

@SystemFw SystemFw closed this Oct 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants