Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some serializations in MapView. #3036

Merged
merged 10 commits into from
Dec 17, 2024

Conversation

MathieuDutSik
Copy link
Contributor

@MathieuDutSik MathieuDutSik commented Dec 16, 2024

Motivation

The performance of the MapView has recently emerged as a subject of concern. While the use of async Iterators is under discussion, it is clear that there are other issues.

Proposal

The following was done:

  • The enum type ValueOrBytes was introduced to represent the fact that in some cases we have the values and in some other cases its serialization.
  • The function for_each_key_value_while no longer serializes the values; it just creates a ValueOrBytes::Value.
  • The use of iterator() is replaced by a into_iterator_owned() since this avoids a .to_vec() in the construction of the ValueOrBytes::Bytes. Note that we cannot have a Bytes(&a [u8])since indeed the scope off` is larger than the one in which the key values are created.
  • Some Clone traits have to be added. But this should be viewed as a gain since before we were serializing and then deserializing, i.e. a very expensive clone.
  • The recently introduced functions index_values were needlessly complex. They are replaced by simpler code.

Possible criticism:

  • The functions for_each_key_value and for_each_key_value_while are kept even if their use is complex. This is not an issue as those functions are not used outside of the linera-views code. They could be made private.
  • Lack of benchmark. But I consider the elimination of serialization/deserialization sufficient as a win so as to not need additional justifications.

Test Plan

The CI should do the job.

Release Plan

The PR could be deployed on TestNet / DevNet as desired.

Links

None.

@MathieuDutSik MathieuDutSik marked this pull request as ready for review December 17, 2024 04:33
Copy link
Contributor

@deuszx deuszx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem rather arbitrary - I don't see any explanation why we're doing that.

@MathieuDutSik
Copy link
Contributor Author

Changes seem rather arbitrary - I don't see any explanation why we're doing that.

Serializing data and deserializing is a very inefficient method. Replacing by Clone is much better in terms of speed.

@afck
Copy link
Contributor

afck commented Dec 17, 2024

I'm not sure about this: It considerably complicates the public API, and in almost all cases it's the values we want, not the bytes.
The only exception seems to be when we compute the hash. So maybe this optimization should be done only in a private function that can then be used in the implementation of for_each_key_value_while and hash?

@@ -1279,7 +1367,7 @@ where
{
let prefix = Vec::new();
self.map
.for_each_key_value(
.for_each_key_value_or_bytes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these can be just for_each_key_value, so we don't need the to_value call below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you.

@@ -776,7 +776,7 @@ where
/// assert_eq!(count, 1);
/// # })
/// ```
pub(crate) async fn for_each_key_value_or_bytes<'a, F>(
pub async fn for_each_key_value_or_bytes<'a, F>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to make it public after all? I'd keep this and the ValueOrBytes type private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed.

mut f: F,
prefix: Vec<u8>,
) -> Result<(), ViewError>
where
F: FnMut(&[u8], &[u8]) -> Result<bool, ViewError> + Send,
F: FnMut(&[u8], ValueOrBytes<'a, V>) -> Result<bool, ViewError> + Send,
Copy link
Contributor

@ma2bd ma2bd Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is public API. Why would a user want to see the bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, wrong move. I wanted the test functions to be running, but that was a wrong idea.

@ma2bd
Copy link
Contributor

ma2bd commented Dec 17, 2024

I personally don't like the added code complexity with no performance gains to show for.
Are you trying to accelerate count? maybe there are better ways

Copy link
Contributor

@ma2bd ma2bd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

back to your queue for the questions

@afck afck self-requested a review December 17, 2024 12:19
@MathieuDutSik MathieuDutSik merged commit f201d21 into linera-io:main Dec 17, 2024
23 checks passed
|index, value| {
count += 1;
hasher.update_with_bytes(index)?;
hasher.update_with_bytes(value)?;
let bytes = value.to_bytes()?;
hasher.update_with_bytes(&bytes)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok now I understand the motivation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

T: Clone + DeserializeOwned,
{
/// Convert to a value.
pub fn to_value(self) -> Result<T, ViewError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use to_cow to avoid a clone

@@ -838,11 +920,12 @@ where
let mut hasher = sha3::Sha3_256::default();
let mut count = 0;
Copy link
Contributor

@ma2bd ma2bd Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should write 0u32 here. Otherwise the hashing scheme is not well defined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, and this impacts other containers, and it makes the change TestNet/DevNet breaking.

@@ -67,6 +67,40 @@ pub struct ByteMapView<C, V> {
updates: BTreeMap<Vec<u8>, Update<V>>,
}

/// Whether we have a value or its serialization.
pub enum ValueOrBytes<'a, T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not private

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants