storage: use fueled output handles in sources #27814

Merged: 9 commits, Jun 27, 2024

@petrosagg petrosagg commented Jun 23, 2024

Motivation

This PR makes sources produce collections backed by StackWrapper<T> timely containers instead of Vec<T> containers. The StackWrapper container stores its data in flat regions (currently backed by Columnation), which makes it easy to measure the heap size of the data produced so far. Currently these containers survive until the reclocking boundary, where they turn back into normal Vec<T> containers in the mz scope. We will probably want to use these region-allocated containers throughout the storage dataflows, but I didn't want to implement a bigger change than this feature requires.
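To illustrate why a flat region makes heap accounting cheap, here is a minimal sketch. It is not the StackWrapper or Columnation implementation; `FlatRegion` and all of its methods are hypothetical names invented for this example. The idea it shows is only that when every item is copied into one contiguous buffer, the heap footprint can be read off a couple of capacities instead of walking each item.

```rust
/// Hypothetical flat region (not the real StackWrapper): every item's bytes
/// are appended to one contiguous buffer, and `bounds` records where each
/// item starts and ends.
pub struct FlatRegion {
    bytes: Vec<u8>,
    bounds: Vec<(usize, usize)>,
}

impl FlatRegion {
    pub fn new() -> Self {
        FlatRegion { bytes: Vec::new(), bounds: Vec::new() }
    }

    /// Copies `item` into the region and returns its index.
    pub fn push(&mut self, item: &[u8]) -> usize {
        let start = self.bytes.len();
        self.bytes.extend_from_slice(item);
        self.bounds.push((start, self.bytes.len()));
        self.bounds.len() - 1
    }

    /// Returns the bytes stored for item `idx`.
    pub fn get(&self, idx: usize) -> &[u8] {
        let (start, end) = self.bounds[idx];
        &self.bytes[start..end]
    }

    /// Heap bytes reserved by the region: two capacities, no per-item walk.
    pub fn heap_size(&self) -> usize {
        self.bytes.capacity()
            + self.bounds.capacity() * std::mem::size_of::<(usize, usize)>()
    }
}
```

With a Vec of individually allocated items, the same measurement would require visiting every element and summing its own allocations.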

Creating an output in an async operator with a StackWrapper container and the newly introduced AccountedStackBuilder container builder unlocks a .give_fueled() API, which automatically yields back to timely once a certain number of megabytes has been emitted into the dataflow. The limit is currently set to 128MB.
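The fuel mechanism can be sketched in a few lines. This is a synchronous simplification under stated assumptions, not the PR's code: `FueledOutput`, its fields, and `count_yields` are hypothetical, and where the real `.give_fueled()` is described as yielding to timely by itself, this version just returns a flag telling the caller that the budget is spent.

```rust
/// Hypothetical fueled output handle; names and shapes are illustrative only.
pub struct FueledOutput {
    emitted: Vec<Vec<u8>>,
    bytes_since_yield: usize,
    fuel_limit: usize,
}

impl FueledOutput {
    pub fn new(fuel_limit: usize) -> Self {
        FueledOutput { emitted: Vec::new(), bytes_since_yield: 0, fuel_limit }
    }

    /// Emits `item` and returns `true` once the byte budget is used up,
    /// telling the caller to hand control back to the scheduler.
    pub fn give_fueled(&mut self, item: Vec<u8>) -> bool {
        self.bytes_since_yield += item.len();
        self.emitted.push(item);
        if self.bytes_since_yield >= self.fuel_limit {
            self.bytes_since_yield = 0; // refuel for the next stretch
            true
        } else {
            false
        }
    }

    /// Number of items emitted so far.
    pub fn emitted_len(&self) -> usize {
        self.emitted.len()
    }
}

/// Drives a fake source and counts how often it would have yielded.
pub fn count_yields(fuel_limit: usize, items: usize, item_size: usize) -> usize {
    let mut output = FueledOutput::new(fuel_limit);
    let mut yields = 0;
    for _ in 0..items {
        if output.give_fueled(vec![0u8; item_size]) {
            // In an async operator this is where the task would yield.
            yields += 1;
        }
    }
    yields
}
```

For example, `count_yields(100, 10, 30)` crosses the 100-byte budget after the fourth and eighth items, so it reports two yields.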

I have gone through the feature benchmark and confirmed that this change does not produce a performance regression for any of the ingestion workloads. You can find the results here: https://buildkite.com/materialize/nightly/builds/8237

Closes #27211

Tips for reviewer

The PR is made of 5 simple commits that add Columnation implementations for relevant types and change the return type required by SourceRender::render to be a Collection that uses StackWrapper containers.

Then there are 4 separate commits, one for each source type, with the source-specific changes required to produce these stack-container-based collections.

The files that each commit touches are disjoint so this can also be reviewed using the full diff.

@petrosagg petrosagg force-pushed the columnated-ingestion branch 8 times, most recently from dd3f894 to 9cb5037, on June 26, 2024 12:28
Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
@petrosagg petrosagg changed the title from "WIP: Columnated ingestion" to "storage: use fueled output handles in sources" Jun 26, 2024
@petrosagg petrosagg requested a review from rjobanp June 26, 2024 14:47
@petrosagg petrosagg marked this pull request as ready for review June 26, 2024 14:47
@petrosagg petrosagg requested a review from a team as a code owner June 26, 2024 14:47

shepherdlybot bot commented Jun 26, 2024

Risk Score: 80 / 100, Bug Hotspots: 1, Resilience Coverage: 16%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk score for this pull request is high at 80, indicating a significant chance of introducing a bug, especially since one of the modified files is a known bug hotspot. The repository's predicted bug trend is on the rise, although the observed trend has remained steady. It's important to note that, historically, pull requests with similar characteristics to this one are 106% more likely to cause a bug compared to the repository's baseline. The predictors driving this risk score are the sum of bug reports of the files affected and the change in executable lines of code.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File: ../src/sources.rs, Percentile: 97

@rjobanp rjobanp left a comment

LGTM - too bad you had to add back in all the .await calls that you previously removed when updating the give method!

impl Region for SourceMessageRegion {
type Item = SourceMessage;

unsafe fn copy(&mut self, item: &Self::Item) -> Self::Item {
Contributor

"This crate is wildly unsafe" 😨 (from the columnation docstring)

Contributor Author

heh, yeah.. fortunately we will soon migrate to flatcontainer which is safer

});
row_temp.push(c);
}
Ok(std::mem::take(&mut row_temp))
Contributor

this works because Vec::default() doesn't allocate, right?

Contributor Author

That's right!
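A small standalone sketch of the pattern discussed in this thread, for readers who want to check it themselves. `build_rows` and its input shape are hypothetical and only mirror the `row_temp` snippet above; the point it demonstrates is that `std::mem::take` swaps in `Vec::default()`, which is empty and holds no heap allocation until something is pushed.

```rust
/// Builds one Vec per input row using a single scratch variable.
/// Hypothetical helper; it only mirrors the `row_temp` pattern above.
pub fn build_rows(inputs: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut row_temp: Vec<i64> = Vec::new();
    let mut rows = Vec::with_capacity(inputs.len());
    for cols in inputs {
        for &c in cols {
            row_temp.push(c);
        }
        // Moves the filled buffer out and leaves `Vec::default()` behind.
        rows.push(std::mem::take(&mut row_temp));
        // The replacement is empty and owns no heap memory.
        assert_eq!(row_temp.capacity(), 0);
    }
    rows
}
```

Note that the taken buffer is handed to the caller, so the next row allocates afresh on its first push; the take itself costs nothing.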

@petrosagg

Will wait for a Nightly run before merging https://buildkite.com/materialize/nightly/builds/8246

@petrosagg petrosagg merged commit 5877d09 into MaterializeInc:main Jun 27, 2024
191 of 193 checks passed
@petrosagg petrosagg deleted the columnated-ingestion branch June 27, 2024 10:45
@@ -13,6 +13,7 @@ workspace = true
differential-dataflow = "0.12.0"
either = "1"
lgalloc = "0.3"
columnation = { git = "https://github.com/frankmcsherry/columnation" }
Member

Nit: Don't depend on it, but use timely::container::columnation.

@@ -102,6 +106,62 @@ pub struct SourceMessage {
pub metadata: Row,
}

mod columnation {
Member

Implementation LGTM.

Linked issue: storage: yield periodically in source operators
3 participants