storage: switch sources to columnated DataflowError #27833
Conversation
This change enables storage dataflows to make use of columnated/lgalloced containers when exchanging data between operators. We could have gone the other way and implemented a columnation region for the `SourceReaderError` type, but since there aren't any actual downsides to switching to `DataflowError`, we go for this simpler approach.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
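To illustrate the idea behind columnated containers (this is a minimal sketch, not Materialize's actual `Columnation`/lgalloc API): instead of one heap allocation per record, variable-length payloads are copied back to back into a few large regions, which is what makes them cheap to exchange and amenable to lgalloc-backed memory.

```rust
// Hypothetical toy container: all string payloads live in one contiguous
// byte buffer, with an offsets table marking record boundaries.
struct StringColumn {
    bytes: Vec<u8>,      // all payloads, back to back
    offsets: Vec<usize>, // offsets[i]..offsets[i+1] is record i
}

impl StringColumn {
    fn new() -> Self {
        StringColumn { bytes: Vec::new(), offsets: vec![0] }
    }

    // Copy a record into the shared region; no per-record allocation.
    fn push(&mut self, s: &str) {
        self.bytes.extend_from_slice(s.as_bytes());
        self.offsets.push(self.bytes.len());
    }

    // Borrow record i back out of the region.
    fn get(&self, i: usize) -> &str {
        std::str::from_utf8(&self.bytes[self.offsets[i]..self.offsets[i + 1]]).unwrap()
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

fn main() {
    let mut col = StringColumn::new();
    col.push("decode error");
    col.push("source error");
    assert_eq!(col.len(), 2);
    assert_eq!(col.get(1), "source error");
}
```

The real containers generalize this pattern to arbitrary nested types via per-type copy regions, which is why `DataflowError` (which already has such an implementation) is the simpler target than adding one for `SourceReaderError`.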
```diff
@@ -38,7 +37,7 @@ message ProtoSourceErrorDetails {
 }

 message ProtoSourceError {
-    mz_repr.global_id.ProtoGlobalId source_id = 1;
+    reserved 1;
```
Since persist consolidates based on the byte contents, but at runtime we consolidate based on the Rust `Eq` implementation, this change has the potential to leave permanently uncompacted values in shards that contain errors from before the upgrade that get retracted after the upgrade.

This is purely storage overhead; nothing changes as far as dataflow results are concerned. The expectation is that the number of errors is too small to be something to worry about. Eventually we want persist's compaction to work identically to the in-memory one.
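A sketch of why the two consolidation strategies can disagree (hypothetical encodings, not persist's actual format): an error written before the upgrade carries `source_id` in its bytes, while its retraction written after the upgrade does not. The decoded values compare equal under `Eq`, so in-memory consolidation cancels them, but their byte encodings differ, so byte-level consolidation keeps both.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Two hypothetical encodings of the "same" error value.
fn encode_old(msg: &str, source_id: u64) -> Vec<u8> {
    format!("{source_id}:{msg}").into_bytes() // pre-upgrade: source_id included
}
fn encode_new(msg: &str) -> Vec<u8> {
    msg.as_bytes().to_vec() // post-upgrade: source_id dropped
}

// Decoded form: `source_id` no longer participates in equality.
#[derive(PartialEq, Eq, Hash, Debug)]
struct SourceError {
    msg: String,
}

fn decode(bytes: &[u8]) -> SourceError {
    let s = String::from_utf8_lossy(bytes);
    SourceError { msg: s.rsplit(':').next().unwrap().to_string() }
}

// Generic consolidation: sum diffs per key, drop keys that cancel to zero.
fn consolidate<K: Eq + Hash>(updates: Vec<(K, i64)>) -> Vec<(K, i64)> {
    let mut acc: HashMap<K, i64> = HashMap::new();
    for (k, d) in updates {
        *acc.entry(k).or_insert(0) += d;
    }
    acc.into_iter().filter(|(_, d)| *d != 0).collect()
}

fn main() {
    let insert = encode_old("oops", 42); // written before the upgrade
    let retract = encode_new("oops");    // retraction written after the upgrade

    // Byte-level consolidation (what persist does): encodings differ, both survive.
    let bytes = consolidate(vec![(insert.clone(), 1i64), (retract.clone(), -1)]);
    assert_eq!(bytes.len(), 2);

    // Runtime consolidation (Rust `Eq`): decoded values cancel.
    let decoded = consolidate(vec![(decode(&insert), 1i64), (decode(&retract), -1)]);
    assert!(decoded.is_empty());
}
```

The leftover pair is exactly the "permanently uncompacted values" described above: harmless for query results, but never reclaimed until persist's compaction understands the runtime equality.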
> The expectation is that the number of errors is too small to be something to worry about.
History tells us this is a wrong assumption (a while ago a customer had 50+GiB of errors).
A more important question: Did we ever write this down in persist? If yes, this might be a scary change to make because it has the potential to change how we surface errors. Or at least, we need to show how it changes.
> History tells us this is a wrong assumption (a while ago a customer had 50+GiB of errors).
I should have been more specific. Those 50+GiB are most likely decode errors (i.e. not this error variant). This error variant is almost always produced when a source hits a fatal error. After that the source advances to the empty frontier and no more output is produced. Beyond those fatal cases, some sources (MySQL and Postgres) have the ability to retract this specific variant of error in some cases (e.g. if we encounter malformed UTF-8 data). These cases are very rare, if they happen at all.
So it is true that this specific error variant is both:
- Not produced in large quantities
- When it is produced, it is not retracted anyway
> A more important question: Did we ever write this down in persist? If yes, this might be a scary change to make because it has the potential to change how we surface errors. Or at least, we need to show how it changes.
We do. The only change is that the error message of this particular error variant will no longer show the global ID of the source, which brings it in line with every other error that can happen.
Motivation

This change enables storage dataflows to make use of columnated/lgalloced containers when moving data between operators. We could have gone the other way and implemented a columnation region for the `SourceReaderError` type, but since there aren't any actual downsides to switching to `DataflowError`, we go for this simpler approach.

Tips for reviewer
Checklist
- If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.