Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki: Return an __error_details__ label for any line which incurs a __error__ while being processed #6543

Merged
merged 3 commits into from
Jul 18, 2022

Conversation

slim-bean
Copy link
Collaborator

@slim-bean slim-bean commented Jun 29, 2022

What this PR does / why we need it:

Currently when Loki process a line, there are a numbe of reasons the processing won't go as expected, when this happens Loki sets an __error__ label on each line that had issues, the value of this __error__ label is one of several values e.g. JSONParserErr or TemplateFormatErr

Keeping the value of __error__ to a simple set of values gives users granularity over removing only certain log lines with errors from a result, e.g. __error__ != "TemplateFormatErr"

However currently it's often difficult to understand why there was an error, rather than modify any existing behavior this PR introduces an additiona label __error_details__ which contains more verbose error information with each line as to why the __error__ label was set.

For example, perhaps you were trying to use a template function in your query line_format "{{replace \"test\" .foo }}"

This would set the "__error__": "TemplateFormatErr" in the response but doesn't give you a lot of understanding as to what went wrong.

Now in addition to that existing __error__ label, Loki would also set this label:

"__error_details__": "template: label:1:2: executing \"label\" at <replace>: wrong number of args for replace: want 3 got 2"

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

…about a logql error.

Signed-off-by: Edward Welch <edward.welch@grafana.com>
@slim-bean slim-bean requested a review from a team as a code owner June 29, 2022 14:50
Signed-off-by: Edward Welch <edward.welch@grafana.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@MichelHollands MichelHollands left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ErrPipeline = errors.New("failed execute pipeline")
ErrLimit = errors.New("limit reached while evaluating the query")
ErrorLabel = "__error__"
ErrorDetailsLabel = "__error_details__"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably my biggest question... Is this the right name for this label??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ErrPipeline = errors.New("failed execute pipeline")
ErrLimit = errors.New("limit reached while evaluating the query")
ErrorLabel = "__error__"
ErrorDetailsLabel = "__error_details__"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@chaudum chaudum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function func (b *LabelsBuilder) Reset() also needs to reset the errDetails.

// Reset clears all current state for the builder.
func (b *LabelsBuilder) Reset() {
b.del = b.del[:0]
b.add = b.add[:0]
b.err = ""
}

Otherwise all subsequent lines after an error have this label set.

Signed-off-by: Edward Welch <edward.welch@grafana.com>
@slim-bean
Copy link
Collaborator Author

I think the function func (b *LabelsBuilder) Reset() also needs to reset the errDetails.

// Reset clears all current state for the builder.
func (b *LabelsBuilder) Reset() {
b.del = b.del[:0]
b.add = b.add[:0]
b.err = ""
}

Otherwise all subsequent lines after an error have this label set.

Great catch, fixed.

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
-        distributor	-0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@slim-bean slim-bean merged commit 1a75bb8 into main Jul 18, 2022
@slim-bean slim-bean deleted the include-details-with-query-errors branch July 18, 2022 18:14
DylanGuedes added a commit that referenced this pull request Jul 18, 2022
* Update production-ready Loki in docker-compose (#6691)

* Improved docker-compose setup:

- S3-like storage
- Interactive debugging by optionally exposing dlv ports
- Improve performance
- Use SSD mode (read/write)
- GroupCache
- Latest versions of loki, promtail, grafana
- Scaling

* Adding diagram & updated README

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Upgrade guide

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Adding steps to build debug image

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Add prometheus

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Add datasources and disable Grafana auth

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Add configurable groupcache capacity (#6678)

* Add configurable groupcache capacity

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Improving the verbiage for clarity - thanks Travis!

Changed the default size to 100MB

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Adding config docs

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* Removed test - it was pretty redundant, plus groupcache panics if you try init its http server more than once

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>

* updated versions to the latest release v2.6.1 (#6703)

* updated versions to the latest release v2.6.1 (#6705)

* Add function to reset boltDBIndexClientWithShipper singleton

In a downstream project I have a test case that creates multiple stores
with different BoltDB shipper schema configurations. Since the client
for the store is a singleton and is re-used, there needs to be a
function to reset it as well.

This function must only be used in tests.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* promtail: Inject tenant ID when receiving X-Scope-OrgID in heroku target (#6695)

* inject tenant ID header

* fix import order

* Add push route (#6616)

* Loki: Return an __error_details__ label for any line which incurs a __error__ while being processed (#6543)

* adds an __error_details__ label which includes any details available about a logql error.

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* fix lint, changed a test a little

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* reset error details in Reset() function

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* Loki: Do not store exact log line duplicates (now including unordered inserts) (#6642)

* do not insert log lines which are an exact duplicate in both timestamp and log content

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* update test to now ignore dupes pushed out of order

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* make the test unordered to be more thorough

Signed-off-by: Edward Welch <edward.welch@grafana.com>

Co-authored-by: Danny Kopping <danny.kopping@grafana.com>
Co-authored-by: Vladyslav Diachenko <82767850+vlad-diachenko@users.noreply.github.com>
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
Co-authored-by: Pablo <2617411+thepalbi@users.noreply.github.com>
Co-authored-by: Salva Corts <salva.corts@grafana.com>
Co-authored-by: Ed Welch <edward.welch@grafana.com>
lxwzy pushed a commit to lxwzy/loki that referenced this pull request Nov 7, 2022
…_error__ while being processed (grafana#6543)

* adds an __error_details__ label which includes any details available about a logql error.

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* fix lint, changed a test a little

Signed-off-by: Edward Welch <edward.welch@grafana.com>

* reset error details in Reset() function

Signed-off-by: Edward Welch <edward.welch@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants