Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed unnecessary copy to heap, see https://github.com/apache/pulsar/pull/10330 #2701

Merged
merged 3 commits into from
Apr 26, 2021

Conversation

dlg99
Copy link
Contributor

@dlg99 dlg99 commented Apr 23, 2021

Descriptions of the changes in this PR:

Handling CompositeByteBuf in a way that avoids unnecessary data copy.

Motivation

apache/pulsar#10330

apache/pulsar#10330 (comment)

Changes

Handling CompositeByteBuf in a way that avoids unnecessary data copy.


In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks.


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

If this PR is a BookKeeper Proposal (BP):

  • Make sure the PR title is formatted like:
    <BP-#>: Description of bookkeeper proposal
    e.g. BP-1: 64 bits ledger is support
  • Attach the master issue link in the description of this PR.
  • Attach the google doc link if the BP is written in Google Doc.

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good.
Can you please add a couple of test cases in order to ensure that it works as expected and that we are covering this case in our tests?

@jiazhai jiazhai added this to the 4.14.0 milestone Apr 26, 2021
@jiazhai jiazhai merged commit 3c9c710 into apache:master Apr 26, 2021
@dlg99 dlg99 deleted the fix-message-copy branch October 14, 2021 23:40
lhotari added a commit to lhotari/bookkeeper that referenced this pull request Jan 30, 2024
lhotari added a commit to lhotari/bookkeeper that referenced this pull request Jan 31, 2024
merlimat pushed a commit that referenced this pull request Feb 7, 2024
…ith readerIndex > 0 (#4196)

* Add a test that reproduces a bug in checksum calculation

* Revert "Fixed unnecessary copy to heap (#2701)" changes to ByteBufList

This partially reverts commit 3c9c710.

* Remove CompositeBuffer unwrapping in DigestManager

* Rename update -> internalUpdate so that unwrapping logic could be added to update

* Remove unnecessary unwrapping logic in Java9IntHash

* Add safe way to handle CompositeByteBuf

* Add license header

* Fix checkstyle

* Refactor ByteBuf visitor solution

* Fix checkstyle

* Reformat

* Refactor recursive visiting

* Revisit equals, hashCode and toString

* Refactor test case

* Add support for UnpooledHeapByteBuf.getBytes which passes an array

* Add support for visiting buffers backed by byte[] arrays

- getBytes calls setBytes with a byte[] argument for
  heap ByteBufs

* Move ByteBufVisitor to org.apache.bookkeeper.util package

* Update javadoc

* Refactor to use stateless visitor so that instance can be shared

* Improve test so that a single scenario can be used for debugging

* Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)

- Java9IntHash uses private methods from java.util.zip.CRC32C class,
  updateBytes and updateDirectByteBuffer.
  When inspecting the use and interface contract, it doesn't match
  how it is used in Java9IntHash. This PR addresses that by introducing
  a separate initial value for initializing the accumulated value
  so that the initial value could match the logic in
  java.util.zip.CRC32C.reset method. There's also a separate
  method for finalizing the accumulated value into a final
  checksum value. This is to match the java.util.zip.CRC32C.getValue
  method's logic (uses bitwise complement operator ~).

- With a quick glance, it might appear that the previous logic is similar.
  However it isn't since I have a failing test which gets fixed with this
  change. I haven't yet added the Java9IntHash level unit test case to prove how
  it differs. It must be related to integer value overflow. For the CRC32C function,
  I believe it means that it cannot be assumed in all cases that
  func(x) == ~func(~x). That's the assumption that the previous code was making.
  It probably applies for many inputs, but not all. It would break in overflow
  cases.

* Fix checkstyle

* Fix checkstyle

* Fix missing depth increment that prevents StackOverflowException

* Properly handle the depth increase and decrease

* Remove unnecessary condition

* Use more efficient way to read bytes to the target array

* Don't use ByteBufVisitor if it's not necessary

* Revert "Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)"

This reverts commit 272e962.

* Fix issue in resume byte[] version that was added

- input and output should be complemented. explanation has been added to the
  resume ByteBuf method

* Polish ByteBufVisitor

- reuse GetBytesCallbackByteBuf instance for handling the root ByteBuf instance

* Use extracted method

* Fix bug with array handling

* Polish ByteBufVisitor

* Optimize the buffer copying in the case where array or memory address cannot be accessed

- read-only buffers will need to be copied before reading
  - use ByteBuf.copy for direct buffers with pooled allocator when the algorithm can accept
    a memory address buffer
- use the 64kB threadlocal byte[] buffer for copying all other inputs

* Check if memory address is accepted

* Improve comments about complement (current = ~current) in resume

* Print thread dump when build is cancelled

* Filter empty buffers and arrays in ByteBufVisitor
lhotari added a commit to lhotari/bookkeeper that referenced this pull request Feb 8, 2024
…ith readerIndex > 0 (apache#4196)

* Add a test that reproduces a bug in checksum calculation

* Revert "Fixed unnecessary copy to heap (apache#2701)" changes to ByteBufList

This partially reverts commit 3c9c710.

* Remove CompositeBuffer unwrapping in DigestManager

* Rename update -> internalUpdate so that unwrapping logic could be added to update

* Remove unnecessary unwrapping logic in Java9IntHash

* Add safe way to handle CompositeByteBuf

* Add license header

* Fix checkstyle

* Refactor ByteBuf visitor solution

* Fix checkstyle

* Reformat

* Refactor recursive visiting

* Revisit equals, hashCode and toString

* Refactor test case

* Add support for UnpooledHeapByteBuf.getBytes which passes an array

* Add support for visiting buffers backed by byte[] arrays

- getBytes calls setBytes with a byte[] argument for
  heap ByteBufs

* Move ByteBufVisitor to org.apache.bookkeeper.util package

* Update javadoc

* Refactor to use stateless visitor so that instance can be shared

* Improve test so that a single scenario can be used for debugging

* Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)

- Java9IntHash uses private methods from java.util.zip.CRC32C class,
  updateBytes and updateDirectByteBuffer.
  When inspecting the use and interface contract, it doesn't match
  how it is used in Java9IntHash. This PR addresses that by introducing
  a separate initial value for initializing the accumulated value
  so that the initial value could match the logic in
  java.util.zip.CRC32C.reset method. There's also a separate
  method for finalizing the accumulated value into a final
  checksum value. This is to match the java.util.zip.CRC32C.getValue
  method's logic (uses bitwise complement operator ~).

- With a quick glance, it might appear that the previous logic is similar.
  However it isn't since I have a failing test which gets fixed with this
  change. I haven't yet added the Java9IntHash level unit test case to prove how
  it differs. It must be related to integer value overflow. For the CRC32C function,
  I believe it means that it cannot be assumed in all cases that
  func(x) == ~func(~x). That's the assumption that the previous code was making.
  It probably applies for many inputs, but not all. It would break in overflow
  cases.

* Fix checkstyle

* Fix checkstyle

* Fix missing depth increment that prevents StackOverflowException

* Properly handle the depth increase and decrease

* Remove unnecessary condition

* Use more efficient way to read bytes to the target array

* Don't use ByteBufVisitor if it's not necessary

* Revert "Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)"

This reverts commit 272e962.

* Fix issue in resume byte[] version that was added

- input and output should be complemented. explanation has been added to the
  resume ByteBuf method

* Polish ByteBufVisitor

- reuse GetBytesCallbackByteBuf instance for handling the root ByteBuf instance

* Use extracted method

* Fix bug with array handling

* Polish ByteBufVisitor

* Optimize the buffer copying in the case where array or memory address cannot be accessed

- read-only buffers will need to be copied before reading
  - use ByteBuf.copy for direct buffers with pooled allocator when the algorithm can accept
    a memory address buffer
- use the 64kB threadlocal byte[] buffer for copying all other inputs

* Check if memory address is accepted

* Improve comments about complement (current = ~current) in resume

* Print thread dump when build is cancelled

* Filter empty buffers and arrays in ByteBufVisitor

(cherry picked from commit 9c373f7)
merlimat pushed a commit that referenced this pull request Feb 9, 2024
…ith readerIndex > 0 (#4196) (#4205)

* Add a test that reproduces a bug in checksum calculation

* Revert "Fixed unnecessary copy to heap (#2701)" changes to ByteBufList

This partially reverts commit 3c9c710.

* Remove CompositeBuffer unwrapping in DigestManager

* Rename update -> internalUpdate so that unwrapping logic could be added to update

* Remove unnecessary unwrapping logic in Java9IntHash

* Add safe way to handle CompositeByteBuf

* Add license header

* Fix checkstyle

* Refactor ByteBuf visitor solution

* Fix checkstyle

* Reformat

* Refactor recursive visiting

* Revisit equals, hashCode and toString

* Refactor test case

* Add support for UnpooledHeapByteBuf.getBytes which passes an array

* Add support for visiting buffers backed by byte[] arrays

- getBytes calls setBytes with a byte[] argument for
  heap ByteBufs

* Move ByteBufVisitor to org.apache.bookkeeper.util package

* Update javadoc

* Refactor to use stateless visitor so that instance can be shared

* Improve test so that a single scenario can be used for debugging

* Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)

- Java9IntHash uses private methods from java.util.zip.CRC32C class,
  updateBytes and updateDirectByteBuffer.
  When inspecting the use and interface contract, it doesn't match
  how it is used in Java9IntHash. This PR addresses that by introducing
  a separate initial value for initializing the accumulated value
  so that the initial value could match the logic in
  java.util.zip.CRC32C.reset method. There's also a separate
  method for finalizing the accumulated value into a final
  checksum value. This is to match the java.util.zip.CRC32C.getValue
  method's logic (uses bitwise complement operator ~).

- With a quick glance, it might appear that the previous logic is similar.
  However it isn't since I have a failing test which gets fixed with this
  change. I haven't yet added the Java9IntHash level unit test case to prove how
  it differs. It must be related to integer value overflow. For the CRC32C function,
  I believe it means that it cannot be assumed in all cases that
  func(x) == ~func(~x). That's the assumption that the previous code was making.
  It probably applies for many inputs, but not all. It would break in overflow
  cases.

* Fix checkstyle

* Fix checkstyle

* Fix missing depth increment that prevents StackOverflowException

* Properly handle the depth increase and decrease

* Remove unnecessary condition

* Use more efficient way to read bytes to the target array

* Don't use ByteBufVisitor if it's not necessary

* Revert "Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)"

This reverts commit 272e962.

* Fix issue in resume byte[] version that was added

- input and output should be complemented. explanation has been added to the
  resume ByteBuf method

* Polish ByteBufVisitor

- reuse GetBytesCallbackByteBuf instance for handling the root ByteBuf instance

* Use extracted method

* Fix bug with array handling

* Polish ByteBufVisitor

* Optimize the buffer copying in the case where array or memory address cannot be accessed

- read-only buffers will need to be copied before reading
  - use ByteBuf.copy for direct buffers with pooled allocator when the algorithm can accept
    a memory address buffer
- use the 64kB threadlocal byte[] buffer for copying all other inputs

* Check if memory address is accepted

* Improve comments about complement (current = ~current) in resume

* Print thread dump when build is cancelled

* Filter empty buffers and arrays in ByteBufVisitor

(cherry picked from commit 9c373f7)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…ith readerIndex > 0 (apache#4196)

* Add a test that reproduces a bug in checksum calculation

* Revert "Fixed unnecessary copy to heap (apache#2701)" changes to ByteBufList

This partially reverts commit 3c9c710.

* Remove CompositeBuffer unwrapping in DigestManager

* Rename update -> internalUpdate so that unwrapping logic could be added to update

* Remove unnecessary unwrapping logic in Java9IntHash

* Add safe way to handle CompositeByteBuf

* Add license header

* Fix checkstyle

* Refactor ByteBuf visitor solution

* Fix checkstyle

* Reformat

* Refactor recursive visiting

* Revisit equals, hashCode and toString

* Refactor test case

* Add support for UnpooledHeapByteBuf.getBytes which passes an array

* Add support for visiting buffers backed by byte[] arrays

- getBytes calls setBytes with a byte[] argument for
  heap ByteBufs

* Move ByteBufVisitor to org.apache.bookkeeper.util package

* Update javadoc

* Refactor to use stateless visitor so that instance can be shared

* Improve test so that a single scenario can be used for debugging

* Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)

- Java9IntHash uses private methods from java.util.zip.CRC32C class,
  updateBytes and updateDirectByteBuffer.
  When inspecting the use and interface contract, it doesn't match
  how it is used in Java9IntHash. This PR addresses that by introducing
  a separate initial value for initializing the accumulated value
  so that the initial value could match the logic in
  java.util.zip.CRC32C.reset method. There's also a separate
  method for finalizing the accumulated value into a final
  checksum value. This is to match the java.util.zip.CRC32C.getValue
  method's logic (uses bitwise complement operator ~).

- With a quick glance, it might appear that the previous logic is similar.
  However it isn't since I have a failing test which gets fixed with this
  change. I haven't yet added the Java9IntHash level unit test case to prove how
  it differs. It must be related to integer value overflow. For the CRC32C function,
  I believe it means that it cannot be assumed in all cases that
  func(x) == ~func(~x). That's the assumption that the previous code was making.
  It probably applies for many inputs, but not all. It would break in overflow
  cases.

* Fix checkstyle

* Fix checkstyle

* Fix missing depth increment that prevents StackOverflowException

* Properly handle the depth increase and decrease

* Remove unnecessary condition

* Use more efficient way to read bytes to the target array

* Don't use ByteBufVisitor if it's not necessary

* Revert "Fix bug in Java9IntHash calculation that assumed crc32c_update(x) == ~crc32c_update(~x)"

This reverts commit 272e962.

* Fix issue in resume byte[] version that was added

- input and output should be complemented. explanation has been added to the
  resume ByteBuf method

* Polish ByteBufVisitor

- reuse GetBytesCallbackByteBuf instance for handling the root ByteBuf instance

* Use extracted method

* Fix bug with array handling

* Polish ByteBufVisitor

* Optimize the buffer copying in the case where array or memory address cannot be accessed

- read-only buffers will need to be copied before reading
  - use ByteBuf.copy for direct buffers with pooled allocator when the algorithm can accept
    a memory address buffer
- use the 64kB threadlocal byte[] buffer for copying all other inputs

* Check if memory address is accepted

* Improve comments about complement (current = ~current) in resume

* Print thread dump when build is cancelled

* Filter empty buffers and arrays in ByteBufVisitor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants