
storage: switch sources to columnated DataflowError #27833

Merged

Conversation

@petrosagg (Contributor) commented Jun 24, 2024

Motivation

This change enables storage dataflows to make use of columnated/lgalloced containers when moving data between operators. We could have gone the other way and implemented a columnation region for the `SourceReaderError` type, but since there aren't any actual downsides to switching to `DataflowError`, we go for this simpler approach.
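
As a rough illustration of the simpler approach, here is a minimal sketch of converting the source-specific error into the shared dataflow error at the operator boundary, so that only the shared type needs columnation-aware container support. The type definitions are simplified stand-ins rather than the actual Materialize definitions, and `into_dataflow_result` is a hypothetical helper, not a function in the codebase.

```rust
// Simplified stand-ins for the real Materialize types; the actual
// definitions carry more variants and structured payloads.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SourceReaderError {
    pub inner: String,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DataflowError {
    SourceError(String),
    DecodeError(String),
}

impl From<SourceReaderError> for DataflowError {
    fn from(err: SourceReaderError) -> Self {
        DataflowError::SourceError(err.inner)
    }
}

/// Hypothetical helper: convert at the edge of the source operator so that
/// downstream operators only ever exchange `DataflowError`, the type that
/// gets the columnated/lgalloced container support.
pub fn into_dataflow_result<T>(
    result: Result<T, SourceReaderError>,
) -> Result<T, DataflowError> {
    result.map_err(DataflowError::from)
}
```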

Tips for reviewer

Checklist

@petrosagg requested a review from a team as a code owner June 24, 2024 11:46
@@ -38,7 +37,7 @@ message ProtoSourceErrorDetails {
 }
 
 message ProtoSourceError {
-    mz_repr.global_id.ProtoGlobalId source_id = 1;
+    reserved 1;
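
(`reserved 1;` prevents the removed field number from ever being reassigned to a new field, which could otherwise cause previously encoded messages to be misread.)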

@petrosagg (PR author) commented on this change:

Since persist consolidates based on byte contents while at runtime we consolidate based on the Rust Eq implementation, this change has the potential to leave permanently uncompacted values in shards that contain errors written before the upgrade that get retracted after the upgrade.

This is purely storage overhead; nothing changes as far as dataflow results are concerned. The expectation is that the number of errors is too small to be something to worry about. Eventually we want persist's compaction to work identically to the in-memory one.
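
To make the mismatch concrete, here is a small self-contained sketch (illustrative only; the byte encodings below are made up rather than persist's actual codec output) of consolidating by encoded bytes versus by decoded value. An error written with the old schema and retracted with the new one encodes to different byte strings, so the +1/-1 pair never cancels under byte-based consolidation, while Eq-based consolidation of the decoded values cancels it.

```rust
use std::collections::BTreeMap;

// Consolidate by encoded bytes, roughly how persist's compaction sees the data.
fn consolidate_by_bytes(updates: &[(Vec<u8>, i64)]) -> Vec<(Vec<u8>, i64)> {
    let mut acc: BTreeMap<Vec<u8>, i64> = BTreeMap::new();
    for (bytes, diff) in updates {
        *acc.entry(bytes.clone()).or_insert(0) += *diff;
    }
    acc.into_iter().filter(|(_, d)| *d != 0).collect()
}

// Consolidate by decoded value, roughly how in-memory consolidation works via Eq/Ord.
fn consolidate_by_value<T: Ord + Clone>(updates: &[(T, i64)]) -> Vec<(T, i64)> {
    let mut acc: BTreeMap<T, i64> = BTreeMap::new();
    for (value, diff) in updates {
        *acc.entry(value.clone()).or_insert(0) += *diff;
    }
    acc.into_iter().filter(|(_, d)| *d != 0).collect()
}

fn main() {
    // Hypothetical encodings of the "same" error before and after the schema
    // change: the old bytes still carry the now-removed source_id field.
    let old_bytes = b"source_id=u1;error=oops".to_vec();
    let new_bytes = b"error=oops".to_vec();

    // Byte-based consolidation cannot cancel the +1 / -1 pair ...
    let in_shard = consolidate_by_bytes(&[(old_bytes, 1), (new_bytes, -1)]);
    assert_eq!(in_shard.len(), 2);

    // ... while value-based consolidation cancels it, because Eq considers
    // the decoded errors equal.
    let decoded = "oops".to_string();
    let in_memory = consolidate_by_value(&[(decoded.clone(), 1), (decoded, -1)]);
    assert!(in_memory.is_empty());
}
```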

A reviewer (Member) replied:

> The expectation is that the number of errors is too small to be something to worry about.

History tells us this is a wrong assumption (a while ago a customer had 50+ GiB of errors).

A more important question: did we ever write this down in persist? If yes, this might be a scary change to make because it has the potential to change how we surface errors. Or at least, we need to show how it changes.

@petrosagg replied:

> History tells us this is a wrong assumption (a while ago a customer had 50+ GiB of errors).

I should have been more specific. Those 50+ GiB are most likely decode errors (i.e. not this error variant). This error variant is almost always produced when a source hits a fatal error; after that the source advances to the empty frontier and no more output is produced. Beyond those fatal cases, some sources (MySQL and Postgres) have the ability to retract this specific error variant in some cases (e.g. if we encounter malformed UTF-8 data). These cases are very rare, if they happen at all.

So it is true that this specific error variant is both:

  1. Not produced in large quantities
  2. When it is produced it is not retracted anyway

> A more important question: did we ever write this down in persist? If yes, this might be a scary change to make because it has the potential to change how we surface errors. Or at least, we need to show how it changes.

We do. The only change is that the error message of this particular error variant will no longer show the global ID of the source, which is in line with every other error that can happen.

@petrosagg requested a review from rjobanp June 24, 2024 11:50
@petrosagg merged commit 6b86f93 into MaterializeInc:main Jun 24, 2024
76 checks passed
@petrosagg deleted the sources-with-dataflow-error branch June 24, 2024 15:06