
Disk-based cache for the downloader #306

Merged: 34 commits merged on Aug 1, 2024

syntrust (Collaborator) commented Jul 5, 2024

This PR is intended to solve issue #298.

To hold downloaded blobs in es-node in scenarios such as long finalization delays, the blob cache uses https://github.com/holiman/billy as a disk-backed store. Billy is a very simple data store with the following properties:

  • It has fixed item-sized buckets, represented by flat files.
  • The workload is expected to be roughly equally heavy on writes, deletes, and reads.
  • The data is expected to be transient rather than persistently stored.
  • Upon startup, Billy compacts the files to remove any accumulated gaps.
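The startup compaction can be pictured with a small sketch. This is a hypothetical, stand-alone helper written for illustration, not billy's actual code: a shelf is modeled as a sequence of fixed-size slots where `nil` marks a gap, and items are moved from the tail into the lowest gaps so that the trailing space can be truncated.

```go
package main

import "fmt"

// compactSlots is a hypothetical illustration of gap compaction on a shelf of
// fixed-size slots (nil = gap). Items from the tail are moved into the lowest
// gaps, and the trailing gaps are truncated. It returns the compacted slots and
// a map from each moved item's old index to its new index.
func compactSlots(slots [][]byte) ([][]byte, map[int]int) {
	moved := make(map[int]int)
	head, tail := 0, len(slots)-1
	for {
		// Advance head to the first gap.
		for head < len(slots) && slots[head] != nil {
			head++
		}
		// Move tail back to the last occupied slot.
		for tail >= 0 && slots[tail] == nil {
			tail--
		}
		if head >= tail {
			// Everything before head is occupied and everything from head on is a gap.
			return slots[:head], moved
		}
		slots[head], slots[tail] = slots[tail], nil
		moved[tail] = head
	}
}

func main() {
	slots := [][]byte{[]byte("a"), nil, []byte("c"), nil, []byte("e")}
	compacted, moved := compactSlots(slots)
	fmt.Println(len(compacted), moved) // 3 map[4:1]
}
```

Returning the old-to-new index map matters because any caller holding item IDs (slot positions) must update them after a move.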

Since a blob transaction can carry between 1 and 6 blobs, items stored in the blob cache are expected to range in size from 128 KB to 128 KB * 6, so the cache uses buckets sized from 128 KB up to 128 KB * 6.
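A minimal sketch of how such bucket sizes could be generated, assuming a slot-size generator in the style of the callback that billy is opened with (as geth's blobpool does). `newBlobSlotter`, `bucketSizes`, and `bucketFor` are hypothetical names, and the sketch ignores any per-item header or metadata that a real store adds.

```go
package main

import "fmt"

const (
	blobSize        = 131072 // 4096 field elements * 32 bytes = 128 KiB per EIP-4844 blob
	maxBlobsPerItem = 6      // upper bound taken from the PR description
)

// newBlobSlotter is a hypothetical generator: each call returns the next bucket
// size (1, 2, ... 6 blobs), and done is true on the last one.
func newBlobSlotter() func() (size uint32, done bool) {
	n := 0
	return func() (uint32, bool) {
		n++
		return uint32(n * blobSize), n >= maxBlobsPerItem
	}
}

// bucketSizes drains the generator into a slice of bucket sizes.
func bucketSizes() []uint32 {
	next := newBlobSlotter()
	var sizes []uint32
	for {
		size, done := next()
		sizes = append(sizes, size)
		if done {
			return sizes
		}
	}
}

// bucketFor returns the index of the smallest bucket that can hold itemLen
// bytes, or -1 if the item is larger than every bucket.
func bucketFor(sizes []uint32, itemLen int) int {
	for i, s := range sizes {
		if uint32(itemLen) <= s {
			return i
		}
	}
	return -1
}

func main() {
	sizes := bucketSizes()
	fmt.Println(sizes)                        // [131072 262144 393216 524288 655360 786432]
	fmt.Println(bucketFor(sizes, 3*blobSize)) // 2
}
```

With one bucket per blob count, an item never wastes more than the unused part of its own bucket, which is the trade-off fixed-size buckets make against a general-purpose allocator.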

For a similar implementation, see the geth blobpool: ethereum/go-ethereum#26940

Other changes include:

  • Replace the block hash with the block number as the key in the blockblob cache, to handle re-orgs.
  • Use a customized billy implementation that supports reading samples directly from the disk cache, for better performance.
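The effect of keying by block number can be sketched as follows. This is a hypothetical in-memory model (`blockBlobs`, `numberKeyedCache` are illustrative names, not es-node types): when a re-org replaces the block at a given height, the new entry overwrites the stale one, whereas a hash-keyed map would keep both and leave the orphaned block's blobs behind.

```go
package main

import "fmt"

// blockBlobs is a hypothetical cache entry: the blobs carried by one block.
type blockBlobs struct {
	number uint64
	hash   string
	blobs  [][]byte
}

// numberKeyedCache holds at most one entry per block number.
type numberKeyedCache struct {
	items map[uint64]*blockBlobs
}

func newNumberKeyedCache() *numberKeyedCache {
	return &numberKeyedCache{items: make(map[uint64]*blockBlobs)}
}

// put stores b, replacing any entry previously cached at the same height.
func (c *numberKeyedCache) put(b *blockBlobs) { c.items[b.number] = b }

// get returns the entry cached for a block number, if any.
func (c *numberKeyedCache) get(number uint64) (*blockBlobs, bool) {
	b, ok := c.items[number]
	return b, ok
}

func main() {
	c := newNumberKeyedCache()
	c.put(&blockBlobs{number: 100, hash: "0xaaa"})
	// A re-org replaces block 100 with a different block at the same height.
	c.put(&blockBlobs{number: 100, hash: "0xbbb"})
	b, _ := c.get(100)
	fmt.Println(len(c.items), b.hash) // 1 0xbbb
}
```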

Tests should verify that sampling from cached blobs is as fast as sampling from the storage file, and that mining transactions are confirmed successfully.

syntrust marked this pull request as draft July 5, 2024 10:21
syntrust changed the base branch from main to miner_fix_l2 July 16, 2024 02:33
syntrust changed the base branch from miner_fix_l2 to main July 17, 2024 08:07
syntrust marked this pull request as ready for review July 17, 2024 09:45
syntrust requested review from qzhodl and ping-ke July 17, 2024 09:45
Review thread on ethstorage/downloader/downloader.go (outdated, resolved)
Review thread on ethstorage/downloader/blob_disk_cache.go (outdated, resolved)
Review thread on ethstorage/downloader/blob_disk_cache.go (outdated, resolved)
syntrust (Collaborator, Author) commented Jul 24, 2024

Performance impact:

With 24 mining threads per shard, sampling time increases as the number of blobs in the cache grows:

Blobs in cache    Sampling time (s)    Note
0                 2.1
1                 5.3
2                 8.1
3                 13.3                 90% of nonces tried
4                 24                   50% of nonces tried

Meanwhile, timing statistics collected in the ReadSample() method show that reading a sample from the cache takes roughly 30 to 50 times longer than reading it from the storage file, with no significant change as the number of cached blobs grows.
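A minimal sketch of how per-call statistics like these could be gathered and compared. `sampleReader`, `avgLatency`, and `slowdown` are hypothetical helpers, not the instrumentation actually used in ReadSample(); the demo reader simply slices an in-memory buffer.

```go
package main

import (
	"fmt"
	"time"
)

// sampleReader stands in for any sample source (disk cache or storage file).
// Hypothetical signature used only for this sketch.
type sampleReader func(sampleIdx uint64) []byte

// avgLatency calls read n times and returns the mean wall-clock time per call.
func avgLatency(read sampleReader, n int) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		_ = read(uint64(i))
	}
	return time.Since(start) / time.Duration(n)
}

// slowdown reports how many times slower a is than b per call; it returns 0
// if b's measured mean is 0 (e.g. below the clock's resolution).
func slowdown(a, b sampleReader, n int) float64 {
	da, db := avgLatency(a, n), avgLatency(b, n)
	if db == 0 {
		return 0
	}
	return float64(da) / float64(db)
}

func main() {
	data := make([]byte, 1<<20)
	fromMemory := func(i uint64) []byte {
		off := (i * 32) % uint64(len(data)) // 32-byte samples, wrapping around the buffer
		return data[off : off+32]
	}
	fmt.Printf("mean latency: %v\n", avgLatency(fromMemory, 1000))
	fmt.Printf("slowdown vs itself: %.2f\n", slowdown(fromMemory, fromMemory, 1000))
}
```

Averaging over many calls smooths out timer granularity; comparing a ratio rather than absolute times matches how the 30 to 50 times figure above is stated.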

syntrust requested a review from qzhodl July 24, 2024 08:16
syntrust (Collaborator, Author):

After adding an extended GetSample API to billy, the performance issue disappeared.

Changes to billy can be reviewed here:
holiman/billy@main...ethstorage:billy:main

Review thread on go.mod (outdated, resolved)
Review thread on ethstorage/downloader/blob_disk_cache.go (outdated, resolved)
Review thread on ethstorage/downloader/blob_disk_cache.go (resolved)
Review thread on ethstorage/downloader/blob_cache_test.go (resolved)
syntrust requested a review from qzhodl July 31, 2024 06:15
syntrust requested a review from qzhodl July 31, 2024 08:56
syntrust merged commit 264bf03 into main Aug 1, 2024
2 checks passed
syntrust (Collaborator, Author) commented Aug 5, 2024

This is for the unit test. I think we may need to set up a real test environment to see the results:

  1. Deploy a test storage contract with a shard size of 512 GB.
  2. Upload 3 to 6 blobs every 2 seconds.

Then, we should collect the results, including:

  1. Whether the memory usage of es-node remains stable.
  2. Whether the sampling time stays the same.
  3. The size of the disk-backed cache just before finalization, which should be around 16 GB (131072 * 3 * 3600 * 12 bytes).
  4. How long it takes to delete all the cache items afterward.
  5. How long it takes to write the 16 GB of data to disk after finalization, noting any exceptions or issues during this process.
  6. How long it takes to close the cache when shutting down es-node.
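Evaluating the size estimate as written (the factors are taken verbatim from the expression above) shows which unit "around 16 GB" refers to:

```go
package main

import "fmt"

const blobSize = 131072 // bytes per blob (128 KiB)

// expectedCacheBytes evaluates the estimate 131072 * 3 * 3600 * 12.
func expectedCacheBytes() int64 {
	return int64(blobSize) * 3 * 3600 * 12
}

func main() {
	b := expectedCacheBytes()
	fmt.Println(b)                               // 16986931200
	fmt.Printf("%.2f GB\n", float64(b)/1e9)      // 16.99 GB
	fmt.Printf("%.2f GiB\n", float64(b)/(1<<30)) // 15.82 GiB
}
```

So the "around 16 GB" figure corresponds to the binary reading (about 15.8 GiB); in decimal units it is about 17 GB.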

Update on the performance test, based on the comments above:

The test was run on a build of commit 264bf03.

  1. The memory usage of the blob cache remains stable.
  2. The sampling time remains unchanged whether the cache is empty, lightly filled, or heavily filled.
  3. Every 30 to 40 minutes, cleanup() deletes up to 1400 blobs, so peak disk usage is about 131072 B * 1400 ≈ 183 MB.
  4. cleanup() takes around 10 ms to delete more than 1000 blobs.
  5. No exceptions were observed during finalization.
  6. Deleting the cache file takes negligible time.
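A minimal sketch of a cleanup pass over a cache keyed by block number, as a hypothetical illustration of the cleanup() behavior measured above, not its actual implementation. It assumes that entries at or below the finalized block number are no longer needed and can be dropped in one sweep.

```go
package main

import "fmt"

// cleanupFinalized is a hypothetical helper: it deletes every cached entry whose
// block number is at or below finalized and returns how many were removed.
// Deleting map entries while ranging over the map is safe in Go.
func cleanupFinalized(cache map[uint64][][]byte, finalized uint64) int {
	removed := 0
	for num := range cache {
		if num <= finalized {
			delete(cache, num)
			removed++
		}
	}
	return removed
}

func main() {
	cache := map[uint64][][]byte{98: nil, 99: nil, 100: nil, 101: nil, 102: nil}
	n := cleanupFinalized(cache, 100)
	fmt.Println(n, len(cache)) // 3 2
}
```

Because the key is the block number, one comparison against the finalized height decides each entry, which fits the short cleanup times reported above.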


3 participants