DA compression #1609

Draft · wants to merge 106 commits into base: master

Conversation

@Dentosal (Member) commented Jan 18, 2024

Related #1605. VM PR FuelLabs/fuel-vm#670.

This PR adds a DA compression crate for Fuel blocks; compression is performed upon block creation. The compressed blocks are stored in the offchain database and can be fetched via the GraphQL API.

Note for reviewers

To keep this reasonably compact, decompression support is not included here and will be done as a follow-up. As a result, full data-roundtrip testing is not part of this PR, and there is no proof yet that compression of full blocks is reversible.

TODO

Features

  • Temporal registry db support
  • Optimize temporal registry eviction implementation
  • Implement TxId ↔ TxPointer lookups
  • Integrate with the block committer (GraphQL interface, probably)

Tests

  • compressed blocks are available from non-block-producer nodes
  • e2e test for the full decompression cycle (moved to a follow-up)

Follow-up issues

@Dentosal self-assigned this on Jan 18, 2024
@Dentosal changed the title from "DA compression as a crate" to "DA compression" on Feb 1, 2024
@Dentosal (Member, Author) commented:

There is one more question that we need to resolve:

When do we want to enable the DA compression? Do we want to enable it starting at a specific block height, or from the genesis block?

If the former, we need to coordinate it with the ecosystem. If the latter, we need to come up with a migration solution.

Don't we already post the blocks to L1? Maybe we could call those "V0" compressed blocks, and allow the decompressor to work on those as well? I'm not sure if they are easily recognizable, though.

Is anyone in the ecosystem depending on the posted blocks? What kind of coordination is needed?

@xgreenx (Collaborator) commented Sep 25, 2024

> Don't we already post the blocks to L1? Maybe we could call those "V0" compressed blocks, and allow the decompressor to work on those as well? I'm not sure if they are easily recognizable, though.

We do. Those blocks don't have versioning.

> Is anyone in the ecosystem depending on the posted blocks? What kind of coordination is needed?

Our sentries depend on it=) And anyone in the network who wants to follow the compression. The network should agree on the block height at which we start doing compression.

It can be the first blob with a new compression type, but then we have a problem with our cluster (how do sentries know at which block to start?). Plus, we need to add support for blobs in our Relayer to track this information=)

I think just setting the block height in the ChainConfig will be the simplest option. But we need to discuss it cc @Voxelot
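
A minimal sketch of what such a chain-config field could look like; the field name `da_compression_activation_height` and its placement are assumptions for illustration, not the actual fuel-core schema:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical excerpt of the chain configuration, for illustration only.
#[derive(Serialize, Deserialize)]
pub struct ChainConfig {
    // ... existing fields ...
    /// Block height starting from which DA compression is performed.
    /// `None` disables compression entirely.
    pub da_compression_activation_height: Option<u32>,
}
```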

@@ -169,9 +169,9 @@ dependencies = [

[[package]]
Collaborator:

Could you revert changes in this file please?=)

Collaborator:

Ping on that one=)

postcard = { version = "1.0", features = ["use-std"] }
rand = { workspace = true, optional = true }
serde = { version = "1.0", features = ["derive"] }
thiserror = { workspace = true }
Collaborator:

thiserror is not no_std compatible, let's not use it=)

Dentosal (Member, Author):

Removed in c5ac5db
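
For reference, a no_std-compatible alternative is a hand-written error type; a minimal sketch with a hypothetical variant (the real `DecompressError` differs):

```rust
/// Hand-rolled error type; works in no_std because it only relies on `core`.
#[derive(Debug)]
pub enum DecompressError {
    /// A registry key was not found in the temporal registry.
    KeyNotFound,
}

impl core::fmt::Display for DecompressError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            Self::KeyNotFound => write!(f, "registry key not found"),
        }
    }
}
```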

ctx: &DecompressCtx<D>,
) -> Result<Self, DecompressError> {
Ok(Transaction::mint(
Default::default(), // TODO: what should we do with this?
Collaborator:

At the end of decompression, you can modify the TxPointer with the block height and transaction index.
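
A minimal sketch of that post-processing pass, using simplified stand-ins for the fuel-tx types (the real types and accessors differ):

```rust
/// Simplified stand-ins for the fuel-tx types, for illustration only.
struct TxPointer {
    block_height: u32,
    tx_index: u16,
}

struct MintTransaction {
    tx_pointer: TxPointer,
    // ... other fields ...
}

/// After decompressing a block, patch each Mint transaction's placeholder
/// TxPointer with the now-known block height and transaction index.
fn fix_tx_pointers(mints: &mut [MintTransaction], block_height: u32) {
    for (tx_index, mint) in mints.iter_mut().enumerate() {
        mint.tx_pointer = TxPointer {
            block_height,
            tx_index: tx_index as u16,
        };
    }
}
```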

{
// Pick first key not in the set
// TODO: use a proper algo, maybe LRU?
let mut key = db.read_latest(keyspace)?;
Collaborator:

One nanosecond is much faster than one microsecond (in the case of the database).

Plus, if we cache the key, then we don't rely on the database implementation of db.read_latest and db.write_latest=)
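
A minimal sketch of the caching idea; the type and method names are assumptions, not the actual implementation:

```rust
/// Keeps the next registry key in memory so that each eviction is a plain
/// integer increment instead of a database read. The latest key is loaded
/// once at startup and persisted back when the batch is committed.
struct CacheEvictor {
    next_key: u32,
}

impl CacheEvictor {
    /// Load the latest persisted key once, e.g. at node startup.
    fn new(latest_persisted_key: u32) -> Self {
        Self { next_key: latest_persisted_key }
    }

    /// Hand out the next key; wraps around when the keyspace is exhausted,
    /// evicting the oldest entry.
    fn next_key(&mut self) -> u32 {
        let key = self.next_key;
        self.next_key = self.next_key.wrapping_add(1);
        key
    }
}
```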

};
}

// Arguments here should match the tables! macro from crates/compression/src/tables.rs
Collaborator:

It would be better if the implementation of the storage were independent of the tables! macro. It seems that removing RegistryKeyspace and RegistryKeyspaceValue from the implementation would solve that.

Dentosal (Member, Author):

It is independent. The comment here just tells the future person modifying the code where to look for values that are used here. But now it's removed in 946a2e8 anyway.

xgreenx and others added 4 commits September 26, 2024 15:44
With this approach, we have more macro rules, but the logic is split
between different modules, and we can see the dependency hierarchy
between modules (instead of everyone depending on everyone). Plus, we
don't need the intermediate `CompressCtxKeyspaces` type anymore.
@@ -169,9 +169,9 @@ dependencies = [

[[package]]
Collaborator:

Ping on that one=)

Comment on lines 187 to 193
if let Some(found) = ctx.db.registry_index_lookup(self)? {
return Ok(found);
}

let key = ctx.$ident.cache_evictor.next_key();
let old = ctx.$ident.changes.insert(key, self.clone());
assert!(old.is_none(), "Key collision in registry substitution");
Collaborator:

Where do we handle the case when the same value is used several times during one compression? For example, a transaction has 2 inputs with the same owner.

It seems that we don't handle it, and we will hit the assert.

Dentosal (Member, Author):

We don't handle it here, and allocate two keys in that case. That's not correct, and it's fixed in 40a1a3d.

We would not hit the assert anyway, since that would require the cache evictor to return the same key twice, which will not happen.
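
A minimal sketch of the deduplication fix, under the assumption that pending in-memory changes are consulted before a new key is allocated (hypothetical types and method names):

```rust
use std::collections::HashMap;

/// Hypothetical per-keyspace compression state, for illustration.
struct KeyspaceCtx {
    /// Values registered during this compression, keyed by the value itself,
    /// so a repeated value (e.g. two inputs with the same owner) reuses the
    /// key allocated on first sight instead of getting a second one.
    seen: HashMap<Vec<u8>, u32>,
    next_key: u32,
}

impl KeyspaceCtx {
    fn register(&mut self, value: Vec<u8>) -> u32 {
        if let Some(&key) = self.seen.get(&value) {
            return key; // already substituted earlier in this same block
        }
        let key = self.next_key;
        self.next_key = self.next_key.wrapping_add(1);
        self.seen.insert(value, key);
        key
    }
}
```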

registrations.$ident.push((key, value));
}
)*
Ok(registrations)
Collaborator:

What do you think about calling registrations.write_to_registry(&mut db)?; here as well? In this case, CompressCtx will update the latest key along with all registrations.

Comment on lines +74 to +75
/// Serialization for compressed transactions is already tested in fuel-vm,
/// but the rest of the block de/serialization is tested here.
Collaborator:

While that is true, it doesn't change the fact that we need to test compression and decompression here. In fuel-tx, we test those traits for compression and that their deriving works correctly. Here, we need to test that the logic of the compress and decompress contexts is correct. We need to be sure that we collect the right addresses, IDs, and codes and store them in the database; that we assign correct registry keys; and that we can handle cases where the same registry key is used during one compression.

Dentosal (Member, Author):

Yep. The testing is clearly not sufficient. I've taken this PR back to draft stage until I manage to write those.

Collaborator:

To simplify the review process, we could merge this PR without enabling block compression in fuel-core; what do you think?
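
A minimal sketch of the kind of test discussed above, building on the hypothetical `KeyspaceCtx` from the earlier sketch (the real tests would exercise the actual compress and decompress contexts):

```rust
#[test]
fn repeated_value_reuses_registry_key() {
    // Uses the hypothetical `KeyspaceCtx` from the earlier sketch.
    let mut ctx = KeyspaceCtx {
        seen: std::collections::HashMap::new(),
        next_key: 0,
    };
    let owner = vec![0xAA; 32]; // e.g. two inputs sharing the same owner
    let k1 = ctx.register(owner.clone());
    let k2 = ctx.register(owner);
    assert_eq!(k1, k2, "the same value within one block must map to one key");
}
```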

@@ -44,6 +45,7 @@ hyper = { workspace = true }
indicatif = { workspace = true, default-features = true }
itertools = { workspace = true }
num_cpus = { version = "1.16.0", optional = true }
paste = "1"
Collaborator:

We use it in several places; I think it would be better to use { workspace = true } instead.

Dentosal (Member, Author):

Sure. 1e70c77

Collaborator:

We have two more places=)

(screenshot omitted)

Comment on lines 96 to 104
// Remove the overwritten value from index, if any
self.db_tx
.storage_as_mut::<[< DaCompressionTemporalRegistryIndex $type >]>()
.remove(&value_in_index)?;

// Add the new value to the index
self.db_tx
.storage_as_mut::<[< DaCompressionTemporalRegistryIndex $type >]>()
.insert(&value_in_index, key)?;
Collaborator:

Why do we need to remove it first and insert after? Insert will replace the value automatically.

Dentosal (Member, Author):

This was broken in some refactor. Fixed in ee3e844.

Dentosal and others added 3 commits September 27, 2024 06:19

- Minimized the number of tables for metadata and reverse index since we
don't need control over this information.
- Use explicitly the `VersionedCompressedBlock` type as the input type
for compression and decompression. It is up to `fuel-core` to decide
how to represent the final compressed block (postcard or something else).
- Added initialization of the `next_key`.
- Use a more performant codec for each table type instead of the
Postcard codec.

---------

Co-authored-by: Hannes Karppila <2204863+Dentosal@users.noreply.github.com>
@Dentosal marked this pull request as draft on September 27, 2024
@Dentosal (Member, Author) commented Sep 27, 2024

Design notes from a call with @xgreenx:

  • only one dedicated node handling compression
    • if it crashes, it syncs from L1; it must not be down longer than the L1 retention time
    • the compressor is part of fuel-core, with a flag to enable it
    • enable it for only one sentry
  • only store data for the past two weeks in the temporal registry
    • the block header timestamp is used for timing
    • add a CLI flag for controlling retention (see the sketch below)
  • the block committer can binary search to figure out which compressed blocks are available
  • remove the SMT root field from v0 compression
  • postpone some testing to a follow-up PR
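
A minimal sketch of what the retention flag could look like, using clap; the flag names and defaults are assumptions, not the actual fuel-core CLI:

```rust
use clap::Parser;

/// Hypothetical excerpt of the fuel-core CLI; flag names and defaults are
/// assumptions for illustration only.
#[derive(Parser)]
struct CompressionArgs {
    /// Enable DA compression on this node.
    #[arg(long = "da-compression")]
    da_compression: bool,

    /// How long entries are kept in the temporal registry, in days.
    /// Eviction is driven by the block header timestamp.
    #[arg(long = "da-compression-retention-days", default_value_t = 14)]
    retention_days: u32,
}
```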
