-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki: Return an __error_details__ label for any line which incurs a __error__ while being processed #6543
Conversation
…about a logql error. Signed-off-by: Edward Welch <edward.welch@grafana.com>
Signed-off-by: Edward Welch <edward.welch@grafana.com>
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ErrPipeline = errors.New("failed execute pipeline") | ||
ErrLimit = errors.New("limit reached while evaluating the query") | ||
ErrorLabel = "__error__" | ||
ErrorDetailsLabel = "__error_details__" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probably my biggest question... Is this the right name for this label??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ErrPipeline = errors.New("failed execute pipeline") | ||
ErrLimit = errors.New("limit reached while evaluating the query") | ||
ErrorLabel = "__error__" | ||
ErrorDetailsLabel = "__error_details__" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function func (b *LabelsBuilder) Reset()
also needs to reset the errDetails
.
Lines 132 to 137 in 942d659
// Reset clears all current state for the builder. | |
func (b *LabelsBuilder) Reset() { | |
b.del = b.del[:0] | |
b.add = b.add[:0] | |
b.err = "" | |
} |
Otherwise all subsequent lines after an error have this label set.
Signed-off-by: Edward Welch <edward.welch@grafana.com>
Great catch, fixed. |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
* Update production-ready Loki in docker-compose (#6691) * Improved docker-compose setup: - S3-like storage - Interactive debugging by optionally exposing dlv ports - Improve performance - Use SSD mode (read/write) - GroupCache - Latest versions of loki, promtail, grafana - Scaling * Adding diagram & updated README Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Upgrade guide Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Adding steps to build debug image Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Add prometheus Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Add datasources and disable Grafana auth Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Add configurable groupcache capacity (#6678) * Add configurable groupcache capacity Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Improving the verbiage for clarity - thanks Travis! Changed the default size to 100MB Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Adding config docs Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * Removed test - it was pretty redundant, plus groupcache panics if you try init its http server more than once Signed-off-by: Danny Kopping <danny.kopping@grafana.com> * updated versions to the latest release v2.6.1 (#6703) * updated versions to the latest release v2.6.1 (#6705) * Add function to reset boltDBIndexClientWithShipper singleton In a downstream project I have a test case that creates multiple stores with different BoltDB shipper schema configurations. Since the client for the store is a singleton and is re-used, there needs to be a function to reset it as well. This function must only be used in tests. Signed-off-by: Christian Haudum <christian.haudum@gmail.com> * promtail: Inject tenant ID when receiving X-Scope-OrgID in heroku target (#6695) * inject tenant ID header * fix import order * Add push route (#6616) * Loki: Return an __error_details__ label for any line which incurs a __error__ while being processed (#6543) * adds an __error_details__ label which includes any details available about a logql error. Signed-off-by: Edward Welch <edward.welch@grafana.com> * fix lint, changed a test a little Signed-off-by: Edward Welch <edward.welch@grafana.com> * reset error details in Reset() function Signed-off-by: Edward Welch <edward.welch@grafana.com> * Loki: Do not store exact log line duplicates (now including unordered inserts) (#6642) * do not insert log lines which are an exact duplicate in both timestamp and log content Signed-off-by: Edward Welch <edward.welch@grafana.com> * update test to now ignore dupes pushed out of order Signed-off-by: Edward Welch <edward.welch@grafana.com> * make the test unordered to be more thorough Signed-off-by: Edward Welch <edward.welch@grafana.com> Co-authored-by: Danny Kopping <danny.kopping@grafana.com> Co-authored-by: Vladyslav Diachenko <82767850+vlad-diachenko@users.noreply.github.com> Co-authored-by: Christian Haudum <christian.haudum@gmail.com> Co-authored-by: Pablo <2617411+thepalbi@users.noreply.github.com> Co-authored-by: Salva Corts <salva.corts@grafana.com> Co-authored-by: Ed Welch <edward.welch@grafana.com>
…_error__ while being processed (grafana#6543) * adds an __error_details__ label which includes any details available about a logql error. Signed-off-by: Edward Welch <edward.welch@grafana.com> * fix lint, changed a test a little Signed-off-by: Edward Welch <edward.welch@grafana.com> * reset error details in Reset() function Signed-off-by: Edward Welch <edward.welch@grafana.com>
What this PR does / why we need it:
Currently when Loki process a line, there are a numbe of reasons the processing won't go as expected, when this happens Loki sets an
__error__
label on each line that had issues, the value of this__error__
label is one of several values e.g. JSONParserErr or TemplateFormatErrKeeping the value of
__error__
to a simple set of values gives users granularity over removing only certain log lines with errors from a result, e.g.__error__ != "TemplateFormatErr"
However currently it's often difficult to understand why there was an error, rather than modify any existing behavior this PR introduces an additiona label
__error_details__
which contains more verbose error information with each line as to why the__error__
label was set.For example, perhaps you were trying to use a template function in your query
line_format "{{replace \"test\" .foo }}"
This would set the
"__error__": "TemplateFormatErr"
in the response but doesn't give you a lot of understanding as to what went wrong.Now in addition to that existing
__error__
label, Loki would also set this label:"__error_details__": "template: label:1:2: executing \"label\" at <replace>: wrong number of args for replace: want 3 got 2"
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md