Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync upstream-main to upstream-v2.7.0-pre #60

Merged

Conversation

periklis
Copy link

Sync upstream/main into openshift/upstream-v2.7.0-pre to unblock Red Hat Loki Operator work. Includes to cherry-picks as carry-patches:

cc @Red-GV @xperimental

Danny Kopping and others added 30 commits October 5, 2022 09:01
The `production/docker` docker-compose setup was broken because of a
missing configuration.
**What this PR does / why we need it**:
Updated the `loki-build-image` to Go 1.19.2 as a precursor for
grafana#7243.

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
This documents the existing command completion that kingpin provides
(docs
[here](https://github.com/alecthomas/kingpin#bashzsh-shell-completion)
on that).

**Which issue(s) this PR fixes**:
Fixes grafana#2949
…7337)

**What this PR does / why we need it**:
Currently the `loki-build-image` has a version of golangci-lint which is
based on Go 1.18.
[This issue](https://drone.grafana.net/grafana/loki/16278/2/9) has
arisen in our CI: golangci/golangci-lint#3107
It adds bearer token support for lambda-promtail, as it supports only
http basic auth.

Signed-off-by: Thomas Belian <thomas.belian@bt909.de>
Upgrade Loki and all related components to Go 1.19.2.
**What this PR does / why we need it**:
In PR grafana#7321, we re-introduced mmap for reading tsdb index, but it does
not cover all the paths where we open the index. This PR takes care of
it.

I have also removed the code for decompressing the index file since we
always get a decompressed file from the `indexshipper`.
I removed these checks in grafana#7337 to
not make grafana#7243 overly complex to
review.

Refactored code to fix valid lint issues, ignored the
irrelevant/dangerous ones.
I also removed `unused` and `unparam` as these produce low-quality
results that are largely useless.
…7342)

**What this PR does / why we need it**:
Adds the handling of negative/zeroed desired rate to our distributor
stream sharding. Without this, invalid desired rates results in a -Inf
number of shards. This also adds a log line to report that such a thing
happened.

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Co-authored-by: Danny Kopping <danny.kopping@grafana.com>
Single binary deployment is not working when going through default
service. The issue is with the labels that currently includes the canary
pods.

Fixes grafana#7260
Bumped version of the chart and added changelog. 
These changes were missed in grafana#7301
This PR introduces an API on the Ingester that reports all current
streams in memory and their current rate in bytes/second. The reponse
from the API contains a slice of `StreamRate` structs:

```go
type StreamRate struct {
	StreamHash        uint64 `protobuf:"varint,1,opt,name=streamHash,proto3" json:"streamHash,omitempty"`
	StreamHashNoShard uint64 `protobuf:"varint,2,opt,name=streamHashNoShard,proto3" json:"streamHashNoShard,omitempty"`
	Rate              int64  `protobuf:"varint,3,opt,name=rate,proto3" json:"rate,omitempty"`
}
```

In the interest of space on the caller and in transmission, only hashes
and rates are sent.
- `StreamHash` will be used to deduplicate replicas
- `StreamHashNoShard` will be used to aggregate a sharded stream's total
rate.

Co-authored-by: Dylan Guedes <djmgguedes@gmail.com>
…7346)

**What this PR does / why we need it**:
The config file URL mentioned in the docs contain config options that
doesn't match the latest version of Loki, which, as of now, is v2.6.1.

**Which issue(s) this PR fixes**:
Fixes grafana#6915 

**Special notes for your reviewer**:
Disclaimer: I am a Grafanista from k6.

**Checklist**
- [x] Reviewed the `CONTRIBUTING.md` guide
- [ ] Documentation added
- [ ] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
**What this PR does / why we need it**:
- Move stream sharding configuration to its own package to avoid cyclic
imports
- Change stream sharding to be a per-tenant configuration
- Change ingesters to reject whole streams due to rate-limit based on
per-tenant stream sharding
- Change stream sharding flags prefix from `distributor.shard-stream` to
`shard-stream`
…rafana#7355)

There's a bug in TSDB where empty label values alter the series
fingerprints used in chunk addresses, but are stripped out and create
different fingerprints in our TSDB based index. I plan to open a PR to
Prometheus, but I also prefer normalizing labels here by removing empty
label values.

According to
[Prometheus](https://prometheus.io/docs/concepts/data_model/),
> A label with an empty label value is considered equivalent to a label
that does not exist.

Our bug comes from the Prometheus model `LabelBuilder` stripping out
empty label values,
[here](https://github.com/grafana/loki/blob/main/vendor/github.com/prometheus/prometheus/model/labels/labels.go#L419-L423).
This means that `Labels.NewBuilder({job="foo",bar=""}) => {job="foo"}`.
The
[labels.Hash()](https://github.com/grafana/loki/blob/main/vendor/github.com/prometheus/prometheus/model/labels/labels.go#L136-L182)
function does not skip empty label values, meaning it's easy to end up
with different fingerprints for effectively the same series (an empty
label value is supposed to be equivalent to a missing label in
prometheus, but they generate different label hashes).
These were introduced in an earlier revision of the TSDB work, but are
unused now.
Allow the OS kernel to decide what's an open port instead of trying to
guess
Mostly related to the job selector; I deployed these against a local
grafana instance along the way and I think I got everything correct but
we might still need to update some of these again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Signed-off-by: Callum Styan <callumstyan@gmail.com>
The `RateStore` concurrently queries all ingester `GetStreamRate` APIs
and stores the results. `RateStore` deduplicates replicas and combines
sharded streams so a caller of `RateFor` gets an unsharded view of a
stream's rate.
This PR does two things:
* Includes the series fingerprint in TSDB WALs. Doing this ensures we
recover fingerprint used in chunk addresses after restart even if loki
is upgraded and the label hashing algorithm changes.
* Removes much of the extra hashing in the TSDB write path + manager.
Occasionally, the fingerprint may be different than the hash of labels.
Historically this has happened in a few places:
* When the `__name__="logs"` label was injected for non-tsdb indices
(not applicable to TSDB)
* When building a multitenant tsdb index and adding a synthetic
`__loki_tenant__` label
* In a previous bug(grafana#7355) when
empty label values altered calculated fingerprints.

Explicitly storing/retrieving the desired fingerprint throughout helps
avoid calculating it in error.
**What this PR does / why we need it**:
We noticed ingesters and index-gateways panicking when using `tsdb` as
an index store with the latest build. This is caused by closing `mmap`
while the index queries are still being processed. This PR takes care of
it by locking the index until we are done querying the index.

**Checklist**
- [x] Tests updated
migrated step `deploy` to `updater` plugin from the deprecated
`deploy-image` plugin and added `logql-analyzer` to updater-config.
It's needed to automate adding a new version to logql-analyzer namespace
once the new version is released or once the new version is published on
docker hub.
Reverts grafana#7348

I expect we'll merge this again in the future, but we'll need some more
time for downstream projects to be ready for this change.
**What this PR does / why we need it**:
correct indentation of `relabel_configs` to make it valid and consistent
with
https://grafana.com/docs/loki/latest/clients/promtail/scraping/#syslog-receiver

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the `CONTRIBUTING.md` guide
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
Summary:
This imports our canary into the Loki mixin so it can be used by the
Helm chart. The dashboard is disabled by default. It can be enabled by
setting

```jsonnet
{
  _config+:: {
    canary+: {
      enabled: true,
    },
  },
}
```

Co-authored-by: Vladyslav Diachenko <82767850+vlad-diachenko@users.noreply.github.com>
SadFaceSmith and others added 17 commits October 12, 2022 22:20
**What this PR does / why we need it**:
This change allows client certificates signed by a self-signed
certificate authority to be used by the Loki canary.

**Which issue(s) this PR fixes**:
Fixes grafana#4366 

**Special notes for your reviewer**:
This has been tested on linux amd64 with self-signed certificates.

**Checklist**
- [x] Reviewed the `CONTRIBUTING.md` guide
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
**What this PR does / why we need it**:
This should make it easier for users to set the proper policies for
their S3 storage.

**Which issue(s) this PR fixes**:
Relates to grafana#7403

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the `CONTRIBUTING.md` guide
- [x] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
…ds (grafana#7420)

**What this PR does / why we need it**:
Fixing the documentation about exposed promtail metrics for
promtail_request_duration_seconds

**Which issue(s) this PR fixes**:
No issue related

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the `CONTRIBUTING.md` guide
- [x] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
**What this PR does / why we need it**:
After been running the GCP Logs Push target in actual workloads, we've
noticed that GCP sends a lot of traffic, and adjusts that send rate
according to the rate at which the receiver can process. To perform that
adjustment, GCP considers as a NACK [non 2xx HTTP
responses](https://cloud.google.com/pubsub/docs/push#receive_push).

This PR adds a target handler timeout (that includes as well the
`api.Entry` channel send) to allow the user to configure this maximum
processing time, therefore giving GCP notice that the sending rate is
too high.

This new timeout is optional, and disabled by default.

**Which issue(s) this PR fixes**:
Related to grafana/cloud-onboarding#2067

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the `CONTRIBUTING.md` guide
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
@periklis periklis self-assigned this Oct 14, 2022
@Red-GV
Copy link

Red-GV commented Oct 14, 2022

/lgtm

@periklis
Copy link
Author

/approve

@periklis
Copy link
Author

/lgtm cancel

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Oct 14, 2022
@openshift-ci
Copy link

openshift-ci bot commented Oct 14, 2022

@periklis: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@periklis
Copy link
Author

/approve

@openshift-ci
Copy link

openshift-ci bot commented Oct 14, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: periklis, Red-GV

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@periklis periklis merged commit 6ab81bf into openshift:upstream-v2.7.0-pre Oct 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.