feat: Seq scanner scans data by time range #4809

Merged on Oct 17, 2024 (13 commits)

evenyag commented Oct 10, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#4757

What's changed and what's your intention?

The seq scanner (SeqScan) now scans data in the order of the partition ranges.
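As a rough illustration of scanning range by range, here is a minimal sketch. It is not the SeqScan implementation: the `Row` type, the `scan_by_ranges` function, and the in-memory `sources` are hypothetical stand-ins, and each range is assumed to be half-open, `[start, end)`. For every range, in the order given, it gathers the rows that fall inside the range, orders them by timestamp, and only then moves on to the next range.

```rust
/// Hypothetical row: (timestamp, value).
type Row = (i64, u64);

/// Scans `sources` one range at a time, in the order the ranges are given.
/// Each range is assumed to be half-open: `[start, end)`.
fn scan_by_ranges(ranges: &[(i64, i64)], sources: &[Vec<Row>]) -> Vec<Row> {
    let mut output = Vec::new();
    for &(start, end) in ranges {
        // Collect the rows of every source that fall into the current range.
        let mut rows: Vec<Row> = sources
            .iter()
            .flatten()
            .copied()
            .filter(|(ts, _)| start <= *ts && *ts < end)
            .collect();
        // Stand-in for merging the sources of this range by timestamp.
        rows.sort_by_key(|(ts, _)| *ts);
        // Rows of one range are emitted before the next range is touched.
        output.extend(rows);
    }
    output
}
```

The output order follows the order of the ranges, not a global time order, so passing the ranges in reverse yields the later range first.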

Moves scan_file_ranges and scan_mem_ranges to scan_util. Refactors them so that SeqScan and UnorderedScan can reuse them.

To reuse code, this PR adds a new metrics struct, PartitionMetrics, and shares it between the streams of the same partition. The struct also prints the debug log in drop(), so the log is still emitted when a stream is dropped before it is exhausted.
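The pattern of sharing one metrics object and logging on drop could look roughly like the sketch below. It only illustrates the idea and is not the PartitionMetrics code from this PR: the fields, the `record_batch` method, and the `log` sink (a plain vector standing in for a debug logging macro) are assumptions. The handle wraps an `Arc`, so the `Drop` of the inner value runs exactly once, when the last stream holding a clone goes away.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical counters collected while scanning one partition.
#[derive(Default)]
struct Counters {
    num_rows: usize,
    num_batches: usize,
}

struct Inner {
    partition: usize,
    counters: Mutex<Counters>,
    /// Stand-in for a logger; the real code would call a debug log macro.
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Runs once, when the last handle is dropped, even if no stream
        // was read to the end.
        let counters = self.counters.get_mut().unwrap_or_else(|e| e.into_inner());
        let line = format!(
            "partition {} finished: num_rows={}, num_batches={}",
            self.partition, counters.num_rows, counters.num_batches
        );
        if let Ok(mut log) = self.log.lock() {
            log.push(line);
        }
    }
}

/// Cheaply cloneable handle shared by all streams of one partition.
#[derive(Clone)]
struct PartitionMetrics(Arc<Inner>);

impl PartitionMetrics {
    fn new(partition: usize, log: Arc<Mutex<Vec<String>>>) -> Self {
        Self(Arc::new(Inner {
            partition,
            counters: Mutex::new(Counters::default()),
            log,
        }))
    }

    /// Records one batch of `rows` rows produced by any stream.
    fn record_batch(&self, rows: usize) {
        let mut counters = self.0.counters.lock().unwrap();
        counters.num_rows += rows;
        counters.num_batches += 1;
    }
}
```

Each stream would hold its own clone and call `record_batch` as it yields data; dropping a stream early simply releases its clone, and the summary line is written when the final clone is released.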

It also removes all unused code.

Some work remains:

  • Remove the scan parallelism. But we need to support file-level parallelism.
  • Support splitting multiple row groups in SeqScan.
  • Support field pruning in last_non_null mode if the time range only has one file.
  • More tests for RangeMeta and StreamContext

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Oct 10, 2024
@evenyag evenyag marked this pull request as ready for review October 10, 2024 06:02

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 95.79288% with 13 lines in your changes missing coverage. Please review.

Project coverage is 84.01%. Comparing base (a8ed3db) to head (67257ab).
Report is 27 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4809      +/-   ##
==========================================
- Coverage   84.42%   84.01%   -0.42%     
==========================================
  Files        1124     1128       +4     
  Lines      204759   208260    +3501     
==========================================
+ Hits       172873   174963    +2090     
- Misses      31886    33297    +1411     

Review thread on src/mito2/src/read/seq_scan.rs (resolved)
@evenyag evenyag mentioned this pull request Oct 14, 2024
11 tasks

evenyag commented Oct 16, 2024

Commit 74b4088 changed PartitionRange::end to be exclusive. @waynexia @discord9
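For reviewers checking boundary handling, an exclusive end means a range covers `[start, end)`. The sketch below uses a hypothetical `HalfOpenRange` with plain `i64` timestamps; it is not the actual PartitionRange definition, and the `contains` and `overlaps` helpers are assumptions added for illustration.

```rust
/// Hypothetical stand-in for a time range with an exclusive end.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HalfOpenRange {
    /// Inclusive start timestamp.
    start: i64,
    /// Exclusive end timestamp.
    end: i64,
}

impl HalfOpenRange {
    /// Returns true if `ts` lies in `[start, end)`.
    fn contains(&self, ts: i64) -> bool {
        self.start <= ts && ts < self.end
    }

    /// Returns true if the two half-open ranges share at least one timestamp.
    fn overlaps(&self, other: &HalfOpenRange) -> bool {
        self.start < other.end && other.start < self.end
    }
}
```

With this convention, adjacent ranges such as `[0, 10)` and `[10, 20)` do not overlap, and timestamp 10 belongs only to the second one.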


@discord9 discord9 left a comment


others LGTM

Review thread on src/mito2/src/read/range.rs (resolved)
@evenyag evenyag added this pull request to the merge queue Oct 17, 2024
Merged via the queue into GreptimeTeam:main with commit e0c4157 Oct 17, 2024
33 checks passed
@evenyag evenyag deleted the feat/seq-scan-by-part branch October 17, 2024 11:28
3 participants