
feat(torii): limit number of blocks processed in one go #2505

Merged
1 commit merged into main on Oct 9, 2024

Conversation

lambda-0x (Collaborator) commented Oct 8, 2024

Summary by CodeRabbit

  • New Features
    • Introduced a new command-line argument blocks_chunk_size for configuring the number of blocks to process before committing to the database, enhancing user control over data processing.
  • Improvements
    • Updated processing logic to handle block chunks more efficiently, improving performance during data fetching and event processing.


codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 67.73%. Comparing base (e591364) to head (67dc049).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
crates/torii/core/src/engine.rs | 16.66% | 5 Missing ⚠️
bin/torii/src/main.rs | 0.00% | 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2505   +/-   ##
=======================================
  Coverage   67.73%   67.73%           
=======================================
  Files         388      388           
  Lines       50421    50427    +6     
=======================================
+ Hits        34153    34158    +5     
- Misses      16268    16269    +1     

☔ View full report in Codecov by Sentry.

Base automatically changed from feat/torii/ercs to main October 9, 2024 14:02

coderabbitai bot commented Oct 9, 2024

Walkthrough

Ohayo, sensei! This pull request introduces a new command-line argument blocks_chunk_size to the Args struct in the torii binary. This allows users to specify how many blocks to process before committing to the database, with a default of 10240. The EngineConfig struct is updated to include this field, and the fetch_data and fetch_range methods in the Engine struct are modified to handle chunked processing. A new process_tasks method is also added to manage concurrency during event processing.
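For orientation, here is a minimal sketch of how such an argument can be wired from the CLI into an engine config with clap's derive API. The field name follows the walkthrough, but both structs are simplified stand-ins, not the actual torii code.

// Simplified sketch, not the torii source: a CLI flag parsed by clap and
// forwarded into the engine configuration.
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// Number of blocks to process before committing to the database.
    #[arg(long, default_value_t = 10240)]
    blocks_chunk_size: u64,
}

#[derive(Debug, Clone)]
struct EngineConfig {
    blocks_chunk_size: u64,
}

fn main() {
    let args = Args::parse();
    let config = EngineConfig { blocks_chunk_size: args.blocks_chunk_size };
    println!("committing every {} blocks", config.blocks_chunk_size);
}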

Changes

File Path | Change Summary
bin/torii/src/main.rs | Added new command-line argument blocks_chunk_size to the Args struct.
crates/torii/core/src/engine.rs | Added pub blocks_chunk_size: u64 to the EngineConfig struct; default set to 10240.
crates/torii/core/src/engine.rs | Updated fetch_data and fetch_range methods to process blocks in chunks based on the new config.
crates/torii/core/src/engine.rs | Introduced new process_tasks method for managing concurrency during event processing.
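To make the chunked processing described above concrete, here is a rough sketch of a chunk-bounded indexing loop. fetch_range and commit are placeholders, and the real engine's cursor handling and return types differ.

// Illustrative only: process at most `blocks_chunk_size` blocks per pass,
// committing the cursor after each chunk. `head` is the last committed block.
fn index_in_chunks(mut head: u64, latest: u64, blocks_chunk_size: u64) {
    while head < latest {
        let to = (head + blocks_chunk_size).min(latest);
        fetch_range(head + 1, to); // process blocks (head, to]
        commit(to);                // persist progress before the next chunk
        head = to;
    }
}

fn fetch_range(from: u64, to: u64) {
    println!("processing blocks {from}..={to}");
}

fn commit(head: u64) {
    println!("committed cursor at block {head}");
}

fn main() {
    index_in_chunks(0, 25_000, 10_240);
}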

Possibly related PRs

Suggested reviewers

  • Larkooo


coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
bin/torii/src/main.rs (2)

117-119: Ohayo, sensei! New argument looks good, but let's add some spice to the docs!

The new blocks_chunk_size argument is a great addition! It allows users to fine-tune the indexing process. However, to make it even more user-friendly, consider adding a brief explanation of its impact on performance and memory usage in the argument's help text.

Here's a suggestion to enhance the documentation:

 /// Number of blocks to process before commiting to DB
-#[arg(long, default_value = "10240")]
+#[arg(long, default_value = "10240", help = "Number of blocks to process before committing to DB. Higher values may improve performance but increase memory usage.")]
 blocks_chunk_size: u64,
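As an aside on the suggestion above: with clap's derive API, the doc comment itself is rendered as the argument's help text, so the extra guidance could also live there instead of a separate help attribute. A sketch of that alternative follows; the Args struct is simplified and not torii's.

// Sketch only: the `///` doc comment becomes the flag's help text under clap derive.
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// Number of blocks to process before committing to the DB.
    /// Higher values can improve throughput but increase memory usage.
    #[arg(long, default_value_t = 10240)]
    blocks_chunk_size: u64,
}

fn main() {
    let args = Args::parse();
    println!("blocks_chunk_size = {}", args.blocks_chunk_size);
}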

246-246: Ohayo again, sensei! The engine config update looks solid!

The blocks_chunk_size is correctly added to the EngineConfig initialization. This ensures that the engine respects the user's preference (or the default value) for block processing.

To improve code readability, consider aligning the new field with the others:

 EngineConfig {
     max_concurrent_tasks: args.max_concurrent_tasks,
     start_block: 0,
-    blocks_chunk_size: args.blocks_chunk_size,
+    blocks_chunk_size:    args.blocks_chunk_size,
     events_chunk_size: args.events_chunk_size,
     index_pending: args.index_pending,
     polling_interval: Duration::from_millis(args.polling_interval),
     flags,
 },
crates/torii/core/src/engine.rs (1)

291-291: Ohayo, sensei! Let's tackle the TODO for parallelizing data fetching.

Now that blocks are processed in chunks, implementing parallel data fetching could significantly improve performance.

Would you like assistance in implementing parallel fetching? I can help with code suggestions or open a GitHub issue to track this enhancement.
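For reference, one common shape for such parallel fetching, sketched under the assumption that per-range fetches are independent. It assumes the futures and tokio crates; fetch_chunk is a stand-in, not a torii API.

// Sketch only: concurrent fetching of independent block ranges with `buffer_unordered`.
use futures::stream::{self, StreamExt};

async fn fetch_chunk(from: u64, to: u64) -> Result<Vec<u64>, String> {
    // Placeholder: pretend each block number in the range was fetched.
    Ok((from..=to).collect())
}

async fn fetch_chunks_parallel(
    chunks: Vec<(u64, u64)>,
    max_in_flight: usize,
) -> Result<Vec<Vec<u64>>, String> {
    // Run up to `max_in_flight` fetches at a time; results complete out of
    // order, so ordering (and therefore commit order) must be restored later.
    stream::iter(chunks)
        .map(|(from, to)| fetch_chunk(from, to))
        .buffer_unordered(max_in_flight)
        .collect::<Vec<_>>()
        .await
        .into_iter()
        .collect()
}

#[tokio::main]
async fn main() -> Result<(), String> {
    let ranges = vec![(0, 9), (10, 19), (20, 29)];
    let fetched = fetch_chunks_parallel(ranges, 2).await?;
    println!("fetched {} ranges", fetched.len());
    Ok(())
}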

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed between the base of the PR (e591364) and head (67dc049).

📒 Files selected for processing (2)
  • bin/torii/src/main.rs (2 hunks)
  • crates/torii/core/src/engine.rs (3 hunks)
🧰 Additional context used
🔇 Additional comments (4)
bin/torii/src/main.rs (1)

246-246: Ohayo once more, sensei! Let's make sure our engine is tuned up!

The blocks_chunk_size configuration looks good, but we should verify that the Engine implementation correctly utilizes this new parameter. This ensures that the chunked processing works as intended, maintaining data consistency and optimizing performance.

Could you please run the following script to check the Engine implementation?

This will help us confirm that the Engine is properly implementing the chunked processing based on the new blocks_chunk_size parameter.

✅ Verification successful

Ohayo once more, sensei! It looks like engine.rs wasn't found at the expected location. Let's broaden our search to locate the correct implementation.


Ohayo once more, sensei! The blocks_chunk_size parameter is correctly utilized within the Engine implementation. Everything looks good!

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the Engine implementation correctly uses the blocks_chunk_size parameter

# Search for blocks_chunk_size usage in the Engine implementation
rg --type rust 'blocks_chunk_size' crates/torii-core/src/engine.rs

# Look for any methods that might be using this parameter for chunked processing
rg --type rust 'fn\s+process_blocks|fn\s+index_blocks' crates/torii-core/src/engine.rs

Length of output: 299


Script:

#!/bin/bash
# Description: Locate the Engine implementation and verify usage of blocks_chunk_size

# Search for blocks_chunk_size usage across the entire codebase
rg 'blocks_chunk_size'

# Search for engine-related files in case the path has changed
rg --files | rg 'engine\.rs'

Length of output: 469

crates/torii/core/src/engine.rs (3)

134-134: Ohayo, sensei! Great addition of blocks_chunk_size to EngineConfig.

Introducing blocks_chunk_size provides better control over block processing, enhancing the engine's flexibility and scalability.


146-146: Setting default blocks_chunk_size to 10240.

The default value appears reasonable for standard workloads, ensuring efficient processing without overwhelming system resources.
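For illustration, this is how such a default is typically expressed on a config struct; the real EngineConfig carries more fields than shown here.

// Illustrative only: a simplified EngineConfig defaulting the new field to 10240.
#[derive(Debug, Clone)]
pub struct EngineConfig {
    pub blocks_chunk_size: u64,
}

impl Default for EngineConfig {
    fn default() -> Self {
        Self { blocks_chunk_size: 10240 }
    }
}

fn main() {
    assert_eq!(EngineConfig::default().blocks_chunk_size, 10240);
}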


303-303: ⚠️ Potential issue

Potential off-by-one error when incrementing from.

Incrementing from when it's not zero might skip processing the block at from. Please verify if this behavior is intentional.

For reference, the block range calculation in question:

let from = cursors.head.unwrap_or(0);
let total_remaining_blocks = latest_block_number - from;
let blocks_to_process = total_remaining_blocks.min(self.config.blocks_chunk_size);
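To make the concern concrete, a tiny illustrative check of the two possible cursor semantics (not torii code): if head records the last block already processed, resuming from head + 1 is correct; if it records the next block still to process, the increment skips one block.

// Illustrative only: `head + 1` is correct when `head` is the last fully
// processed block; it skips a block if `head` is the next block to process.
fn start_block(head: Option<u64>) -> u64 {
    head.map(|last_processed| last_processed + 1).unwrap_or(0)
}

fn main() {
    assert_eq!(start_block(None), 0);       // fresh index starts at block 0
    assert_eq!(start_block(Some(99)), 100); // resume after block 99
}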

⚠️ Potential issue

Possible underflow risk when calculating total_remaining_blocks.

If from is greater than latest_block_number, subtracting may cause an underflow panic. Consider adding a check to handle this scenario gracefully.

Apply this change to prevent potential underflow:

+if from > latest_block_number {
+    return Ok(FetchDataResult::None);
+}
 let total_remaining_blocks = latest_block_number - from;

Committable suggestion was skipped due to low confidence.
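As an alternative to the explicit guard, saturating arithmetic yields zero remaining blocks instead of panicking when the cursor is ahead of the chain head; whether an early return is still desirable then depends on the surrounding fetch logic. A small sketch:

// Sketch only: saturating_sub avoids the underflow panic without a branch.
fn remaining_blocks(latest_block_number: u64, from: u64) -> u64 {
    latest_block_number.saturating_sub(from)
}

fn main() {
    assert_eq!(remaining_blocks(100, 90), 10);
    assert_eq!(remaining_blocks(90, 100), 0); // cursor ahead of head: nothing to index
}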

@lambda-0x lambda-0x merged commit 975f3e4 into main Oct 9, 2024
13 of 15 checks passed
@lambda-0x lambda-0x deleted the torii-chunk-blocks branch October 9, 2024 19:41