[Java] Auto meta shared mode for type forward/backward compatibility #202

Open
chaokunyang opened this issue May 11, 2023 · 0 comments
Labels
enhancement New feature or request java

Comments

@chaokunyang
Collaborator

Is your feature request related to a problem? Please describe.
We supported type forward/backward compatibility in #197, but as that issue noted, the proposed solution writes class meta every time an object is serialized. If multiple objects of the same type are serialized together, the meta is written multiple times, which is unnecessary.

We can use meta sharing to write the meta only once per serialization of an object graph. The meta can also be pre-encoded to binary, so writing it is just a memory copy, which is much faster.

Issue #80 proposed sharing meta across serializations, which requires the RPC framework or users to maintain the MetaContext; this is inconvenient for users.

Describe the solution you'd like
We can support auto meta sharing to reduce the meta cost of every serialization. This ensures that multiple objects of the same type write their meta only once, which saves space, and it improves performance because the meta binary is written with a memory copy.
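The idea can be illustrated with a minimal sketch. This is not Fury's implementation: the class `ScopedMetaWriter`, the one-byte markers, and the `encodeMeta` stand-in (which just uses the class name bytes) are all hypothetical, chosen only to show "write full meta on first sight, write a small index afterwards" within a single serialization.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch of per-serialization meta sharing; not Fury's actual code. */
final class ScopedMetaWriter {
  // Class -> index assigned the first time the class is seen in this serialization.
  private final Map<Class<?>, Integer> written = new IdentityHashMap<>();
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  /** Stand-in for a pre-encoded meta binary; here it is just the class name bytes. */
  static byte[] encodeMeta(Class<?> cls) {
    return cls.getName().getBytes(StandardCharsets.UTF_8);
  }

  /** Writes full meta the first time a class is seen, and only an index afterwards. */
  void writeClassMeta(Class<?> cls) {
    Integer index = written.get(cls);
    if (index != null) {
      out.write(1);     // marker: meta was already written, an index follows
      out.write(index); // sketch only: assumes fewer than 256 distinct classes
      return;
    }
    written.put(cls, written.size());
    byte[] meta = encodeMeta(cls);
    out.write(0);                    // marker: full meta follows
    out.write(meta.length);          // sketch only: assumes meta shorter than 256 bytes
    out.write(meta, 0, meta.length); // plain copy of the pre-encoded bytes
  }

  /** Number of bytes written so far. */
  int size() {
    return out.size();
  }
}
```

Because the map lives inside the writer, no MetaContext has to be maintained by the caller; a new writer per serialization starts with an empty map.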

Currently the meta encoding is not compressed; this will be discussed in another issue.

Additional context
#80 #197

@chaokunyang chaokunyang added enhancement New feature or request java labels May 11, 2023
chaokunyang added a commit that referenced this issue May 2, 2024
## What does this PR do?

This PR implements the type meta encoding for Java proposed in #1240.

The type meta encoding in xlang spec proposed in #1413 will be finished
in another PR based on this PR.

The spec has been updated too:

type meta header
```
|      8 bytes meta header      | meta size |   variable bytes   |  variable bytes   | variable bytes |
+-------------------------------+-----------|--------------------+-------------------+----------------+
| 7 bytes hash + 1 bytes header | 1~2 bytes | current class meta | parent class meta |      ...       |
```
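As a rough illustration of the 8-byte header, the sketch below packs a 7-byte hash and a 1-byte header into a single `long`. The class name `MetaHeader` and the bit placement (hash in the high 56 bits, header byte in the low 8 bits) are assumptions made for the example; the layout above does not state which end holds the header byte.

```java
/** Hypothetical helper for the 8-byte type meta header; bit placement is an assumption. */
final class MetaHeader {
  /** Packs a 56-bit hash (high bits) and an 8-bit header (low bits) into one long. */
  static long pack(long hash56, int header8) {
    // Shifting left by 8 drops any bits of the hash above bit 55.
    return (hash56 << 8) | (header8 & 0xFFL);
  }

  /** Extracts the 56-bit hash from a packed header. */
  static long hash(long packed) {
    return packed >>> 8;
  }

  /** Extracts the 8-bit header from a packed header. */
  static int header(long packed) {
    return (int) (packed & 0xFF);
  }
}
```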

The encoding for package/class/field names has been updated to:
```
- Package name encoding (omitted when class is registered):
    - encoding algorithm: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL`
    - header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `0~62`;
      the value `63` means more bytes must be read for the size, and the encoding will write `size - 62` as a varint next.
- Class name encoding (omitted when class is registered):
    - encoding algorithm: `UTF8/LOWER_UPPER_DIGIT_SPECIAL/FIRST_TO_LOWER_SPECIAL/ALL_TO_LOWER_SPECIAL`
    - header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` will be used to indicate size `1~64`;
      the value `63` means more bytes must be read for the size, and the encoding will write `size - 63` as a varint next.
- Field info:
    - header (8 bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`.
      Users can use annotations to provide this info.
        - 2 bits field name encoding:
            - encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID`
            - If a tag id is used, i.e. the field name is written as an unsigned varint tag id, the 2 bits encoding will be `11`.
        - size of field name:
            - The `3 bits size: 0~7` will be used to indicate length `1~7`; the value `6` means more bytes must be read for the size,
              and the encoding will write `size - 7` as a varint next.
            - If encoding is `TAG_ID`, then num_bytes of field name will be used to store tag id.
    - Field name: If type id is set, type id will be used instead. Otherwise meta string encoding length and data will
      be written instead.
```
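The package name header rule above can be sketched as follows. This is an illustration under stated assumptions, not Fury's code: the class `NameHeader` is hypothetical, the size is assumed to sit in the high 6 bits with the encoding flags in the low 2 bits, and the varint is assumed to be a LEB128-style unsigned varint. Sizes `0~62` are stored inline; larger sizes store `63` and then `size - 62` as a varint, exactly as the text states.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the `6 bits size | 2 bits encoding flags` package name header. */
final class NameHeader {
  static final int MAX_INLINE = 62; // largest size stored directly in the 6 bits
  static final int ESCAPE = 63;     // signals that a varint with the rest of the size follows

  /** Writes the header byte, plus a varint of `size - 62` when the size does not fit inline. */
  static void write(ByteArrayOutputStream out, int size, int encodingFlags) {
    int flags = encodingFlags & 0b11;
    if (size <= MAX_INLINE) {
      out.write((size << 2) | flags);
      return;
    }
    out.write((ESCAPE << 2) | flags);
    writeVarUint(out, size - MAX_INLINE);
  }

  /** LEB128-style unsigned varint: 7 data bits per byte, high bit set on all but the last. */
  static void writeVarUint(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Reads a header starting at index 0 and returns {size, encodingFlags}. */
  static int[] read(byte[] bytes) {
    int header = bytes[0] & 0xFF;
    int flags = header & 0b11;
    int size = header >>> 2;
    if (size == ESCAPE) {
      int extra = 0;
      int shift = 0;
      int pos = 1;
      int b;
      do {
        b = bytes[pos++] & 0xFF;
        extra |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      size = MAX_INLINE + extra;
    }
    return new int[] {size, flags};
  }
}
```

Short names, which are the common case, cost a single header byte; only names longer than 62 bytes pay for the extra varint.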

## Meta size
Before this PR:
```
class org.apache.fury.benchmark.data.MediaContent 78
class org.apache.fury.benchmark.data.Media 208
class org.apache.fury.benchmark.data.Image 114
```

With this PR:
```
class org.apache.fury.benchmark.data.MediaContent 53
class org.apache.fury.benchmark.data.Media 114
class org.apache.fury.benchmark.data.Image 68
```

The size of class meta is reduced by roughly 32% to 45% per class (from 400 to 235 in total, about 41%), which is a great gain.

The size can be reduced further if we introduce field name hashing, but that is
not related to this PR. We can discuss it in another PR.

## Related issues

#1240 
#203 
#202 


## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/incubator-fury/issues/new/choose)
describing the need to do so and update the document if necessary.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark

<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.
-->
chaokunyang added a commit that referenced this issue May 30, 2024
…lity (#1660)

## What does this PR do?

This PR implements scoped meta share mode for type forward/backward
compatibility.

## Related issues

#202 


## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark
Throughput of `fury_deserialize_compatible` increased from `1900102.586` to `2430410.064` ops/s (about 28%):
```
Before:

Benchmark                                                       (bufferType)   (objectType)  (references)   Mode  Cnt        Score        Error  Units
fury_deserialize                              array  MEDIA_CONTENT         false  thrpt   10  2734151.212 ± 253921.628  ops/s
fury_deserialize_compatible                   array  MEDIA_CONTENT         false  thrpt   10  1900102.586 ±  62176.872  ops/s
furymetashared_deserialize_compatible         array  MEDIA_CONTENT         false  thrpt   10  3011439.327 ± 260518.752  ops/s

After:

Benchmark                                                       (bufferType)   (objectType)  (references)   Mode  Cnt        Score        Error  Units
fury_deserialize                              array  MEDIA_CONTENT         false  thrpt   10  2661186.814 ± 279377.198  ops/s
fury_deserialize_compatible                   array  MEDIA_CONTENT         false  thrpt   10  2430410.064 ± 164165.865  ops/s
furymetashared_deserialize_compatible         array  MEDIA_CONTENT         false  thrpt   10  3098083.064 ± 259391.053  ops/s
```

Size decreased from **732 to 577**:
```
Before
2024-05-30 01:00:49 INFO  FuryState:157 [fury_deserialize_compatible-jmh-worker-1] - ======> Fury | MEDIA_CONTENT | false | array | 732 |

After
2024-05-30 12:57:00 INFO  FuryState:157 [fury_deserialize_compatible-jmh-worker-1] - ======> Fury | MEDIA_CONTENT | false | array | 577 |
```

The