-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(java): standardizing fury java spec #1240
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
chaokunyang
force-pushed
the
refine_fury_java_spec
branch
from
December 21, 2023 12:14
3317c83
to
a8fce2f
Compare
PragmaTwice
reviewed
Dec 21, 2023
@wangweipeng2 @PragmaTwice @bytemain Could you help review this PR? |
theweipeng
approved these changes
Dec 21, 2023
theweipeng
reviewed
Dec 22, 2023
theweipeng
reviewed
Dec 22, 2023
theweipeng
approved these changes
Dec 22, 2023
bytemain
reviewed
Dec 22, 2023
bytemain
suggested changes
Dec 23, 2023
Hi @wangweipeng2 @PragmaTwice @bytemain, do you have other comments about this PR? I hope all concerns has been addressed. |
bytemain
approved these changes
Dec 23, 2023
PragmaTwice
reviewed
Dec 24, 2023
PragmaTwice
reviewed
Dec 24, 2023
PragmaTwice
reviewed
Dec 24, 2023
PragmaTwice
reviewed
Feb 27, 2024
PragmaTwice
reviewed
Feb 27, 2024
PragmaTwice
reviewed
Feb 27, 2024
PragmaTwice
reviewed
Feb 27, 2024
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
PragmaTwice
reviewed
Feb 28, 2024
PragmaTwice
reviewed
Feb 28, 2024
PragmaTwice
reviewed
Feb 28, 2024
PragmaTwice
reviewed
Feb 28, 2024
PragmaTwice
reviewed
Feb 28, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest looks good to me.
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
2 tasks
chaokunyang
added a commit
that referenced
this pull request
Apr 12, 2024
…th (#1486) <!-- **Thanks for contributing to Fury.** **If this is your first time opening a PR on fury, you can refer to [CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fury (incubating)** community has restrictions on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md). - Fury has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> ## What does this PR do? This PR optimize string serialization by concating coder and length into a long and serialize it together, it can save one byte for small string which the length is less than 32 <!-- Describe the purpose of this PR. --> ## Related issues #1240 <!-- Is there any related issue? Please attach here. - #xxxx0 - #xxxx1 - #xxxx2 --> ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/incubator-fury/issues/new/choose) describing the need to do so and update the document if necessary. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark Before ``` Benchmark (bufferType) (objectType) (references) Mode Cnt Score Error Units UserTypeDeserializeSuite.fury_deserialize array MEDIA_CONTENT false thrpt 50 2576580.395 ± 55698.851 ops/s ``` This PR: ```java Benchmark (bufferType) (objectType) (references) Mode Cnt Score Error Units UserTypeDeserializeSuite.fury_deserialize array MEDIA_CONTENT false thrpt 50 2591695.102 ± 48174.940 ops/s ``` <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. -->
2 tasks
chaokunyang
added a commit
that referenced
this pull request
Apr 15, 2024
<!-- **Thanks for contributing to Fury.** **If this is your first time opening a PR on fury, you can refer to [CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fury (incubating)** community has restrictions on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md). - Fury has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> ## What does this PR do? This PR implements meta string encoding described in [fury java serialization spec](https://fury.apache.org/docs/specification/fury_java_serialization_spec#meta-string) and [xlang serialization spec](https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string) We have `3/8` space saveing for most string: ```java // utf8 use 30 bytes, we use only 19 bytes assertEquals(encoder.encode("org.apache.fury.benchmark.data").getBytes().length, 19); // utf8 use 12 bytes, we use only 9 bytes. assertEquals(encoder.encode("MediaContent").getBytes().length, 9); ``` The integration with ClassResolver is left in another PR. ## Related issues #1240 #1413 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/incubator-fury/issues/new/choose) describing the need to do so and update the document if necessary. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. -->
2 tasks
chaokunyang
added a commit
that referenced
this pull request
Apr 27, 2024
## What does this PR do? This PR implements meta string encoding described in [xlang serialization spec](https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string) ## Related issues - #1240 - #1413 - #1514 ## Does this PR introduce any user-facing change? - [No] Does this PR introduce any public API change? - [No] Does this PR introduce any binary protocol compatibility change? --------- Co-authored-by: Shawn Yang <chaokunyang@apache.org>
chaokunyang
added a commit
that referenced
this pull request
May 2, 2024
## What does this PR do? This PR implements type meta encoding for java proposed in #1240 . The type meta encoding in xlang spec proposed in #1413 will be finished in another PR based on this PR. The spec has been updated too: type meta header ``` | 8 bytes meta header | meta size | variable bytes | variable bytes | variable bytes | +-------------------------------+-----------|--------------------+-------------------+----------------+ | 7 bytes hash + 1 bytes header | 1~2 bytes | current class meta | parent class meta | ... | ``` And the encoding for packge/class/field name has been updated to: ``` - Package name encoding(omitted when class is registered): - encoding algorithm: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL` - Header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `0~62`, the value `63` the size need more byte to read, the encoding will encode `size - 62` as a varint next. - Class name encoding(omitted when class is registered): - encoding algorithm: `UTF8/LOWER_UPPER_DIGIT_SPECIAL/FIRST_TO_LOWER_SPECIAL/ALL_TO_LOWER_SPECIAL` - header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `1~64`, the value `63` the size need more byte to read, the encoding will encode `size - 63` as a varint next. - Field info: - header(8 bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`. Users can use annotation to provide those info. - 2 bits field name encoding: - encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID` - If tag id is used, i.e. field name is written by an unsigned varint tag id. 2 bits encoding will be `11`. - size of field name: - The `3 bits size: 0~7` will be used to indicate length `1~7`, the value `6` the size read more bytes, the encoding will encode `size - 7` as a varint next. - If encoding is `TAG_ID`, then num_bytes of field name will be used to store tag id. - Field name: If type id is set, type id will be used instead. Otherwise meta string encoding length and data will be written instead. ``` ## Meta size Before this PR: ```java class org.apache.fury.benchmark.data.MediaContent 78 class org.apache.fury.benchmark.data.Media 208 class org.apache.fury.benchmark.data.Image 114 ``` With this PR: ```java class org.apache.fury.benchmark.data.MediaContent 53 class org.apache.fury.benchmark.data.Media 114 class org.apache.fury.benchmark.data.Image 68 ``` The size of class meta reduced by half, which is a great gain. The size can be reduded more if we introduce field name hash, but it's not related to this PR. We can discuss it in another PR. ## Related issues #1240 #203 #202 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/incubator-fury/issues/new/choose) describing the need to do so and update the document if necessary. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. -->
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What do these changes do?
This PR refine fury java serialization format spec. The cross-language object graph serialization spec is similar and will be added in a later PR, but it needs more discuss.
This PR added some new spec which hasn't been implemented in current java implementation:
Some parts has been omitted in this spec:
Currently fury doesn't provide binary compatibility, the spec may be revised in the future.
Related issue number
Closes #1239
#1238
Check code requirements