-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs,jobspb: rename ExecutionLog, populate #69370
jobs,jobspb: rename ExecutionLog, populate #69370
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 14 of 14 files at r2, 15 of 15 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rhu713 and @sajjadrizvi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 14 of 14 files at r2, 15 of 15 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @rhu713)
pkg/jobs/config.go, line 69 at r4 (raw file):
// defaultExecutionErrorsMaxEntrySize is the maximum allowed size of an // error. If this size is exceeded, the error will formatted as a string
nit: s/ will formatted/will be formatted
Release justification: non-production code changes Release note: None
7649f96
to
d52689c
Compare
Also moves the code to a new file. Release justification: category 2 Release note: None
d52689c
to
0c9833b
Compare
d762bcb
to
4825d71
Compare
I got cold feet on the `ExecutionLog` in the `Payload` being populated at the beginning of job execution. I become uncomfortable with the idea of writing to the `Payload` an extra time for every job execution in the happy case. These things can get big. Instead we only populate events when a retriable error occurs. Given that, the events have been renamed to `RetriableExecutionError`. This data is exposed in the crdb_internal.jobs table via the `execution_errors` string array. The decision to use an array of strings instead of json was predicated upon the idea that these errors are for human consumption. The structured data continues to exist in the protobuf. Work to truncate these logs will come in the next commit. Release justification: bug fixes and low-risk updates to new functionality. Release note (general change): When jobs encounter retriable errors during execution, they will now record these errors into their state. The errors as well as metadata about the execution can be inspected via the newly added `execution_errors` field of `crdb_internal.jobs` which is a `STRING[]` column.
4825d71
to
870ff19
Compare
This change limits the size of execution error entries based on cluster settings (which are not public). Release justification: low risk change to new functionality Release note: None
870ff19
to
b7301c6
Compare
bors r+ |
Build succeeded: |
I got cold feet on the
ExecutionLog
in thePayload
being populated at thebeginning of job execution. I become uncomfortable with the idea of writing to
the
Payload
an extra time for every job execution in the happy case. Thesethings can get big. Instead we only populate events when a retriable error
occurs. Given that, the events have been renamed to
RetriableExecutionError
.This data is exposed in the crdb_internal.jobs table via the
execution_errors
string array. The decision to use an array of strings instead of json was
predicated upon the idea that these errors are for human consumption. The
structured data continues to exist in the protobuf.
These events are retained in the
Payload
inRetriableExecutionErrorLog
.A finite number of events are retained, and that number is controlled by the new
non-public cluster setting
jobs.execution_errors.max_entries
. The size of theevents is also controlled by a cluster setting:
jobs.execution_errors.max_entry_size
.If an event would be larger than this size, the error it would contain is formatted
to a string and truncated to the max size. This loses structure but at least retains
some observability.
Fixes #68800.
Release justification: bug fixes and low-risk updates to new functionality.
Release note (general change): When jobs encounter retriable errors during
execution, they will now record these errors into their state. The errors as
well as metadata about the execution can be inspected via the newly added
execution_errors
field ofcrdb_internal.jobs
which is aSTRING[]
column.