
S3 storage configuration of LokiStack through the secret does not work properly #12608

Closed
kai-uwe-rommel opened this issue Apr 14, 2024 · 5 comments
Labels
sig/operator type/bug Something is not working as expected

Comments

@kai-uwe-rommel commented Apr 14, 2024

Describe the bug
I am trying to deploy a LokiStack using the Operator on OpenShift, with a local MinIO instance in the same namespace. The deployment itself works correctly (as far as I can see), but afterwards the compactor pod crashloops, complaining that it cannot find the endpoint in the secret. But it is there.

To Reproduce
Steps to reproduce the behavior:

  1. Install the Red Hat Loki operator on OpenShift
  2. Create a MinIO deployment in the target namespace where the LokiStack instance will be created
  3. Provide the secret with the configuration data for the MinIO S3 store according to this: https://loki-operator.dev/docs/object_storage.md/
  4. Create the LokiStack instance with the operator
  5. When the compactor-0 pod starts, it fails with this error message:
     failed to init delete store: failed to get s3 object: MissingEndpoint: 'Endpoint' configuration is required for this service

Expected behavior
The configured secret DOES contain the endpoint, so the compactor should not fail.

Environment:

  • OpenShift 4.15 cluster
  • Red Hat LokiStack 5.9 operator
  • simple MinIO deployment

Screenshots, Promtail config, or terminal output
This is what the secret looks like in a test environment:

apiVersion: v1
kind: Secret
metadata:
  name: minio-s3
  namespace: openshift-logging
type: Opaque
data:
  # minio
  access_key_id: bWluaW8=
  # PassW0rd
  access_key_secret: UGFzc1cwcmQ=
  # logging
  bucketnames: bG9nZ2luZw==
  # http://minio:9000
  endpoint: aHR0cDovL21pbmlvOjkwMDA=

And this is the LokiStack instance object:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-logging
  namespace: openshift-logging
spec:
  managementState: Managed
  size: 1x.extra-small
  hashRing:
    type: memberlist
  limits:
    global:
      queries:
        queryTimeout: 3m
      retention:
        days: 7
  rules:
    enabled: true
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v12
    secret:
      name: minio-s3
      type: s3
  storageClassName: thin
  tenants:
    mode: openshift-logging
@kai-uwe-rommel (Author)

The configuration of the S3 store for the LokiStack instance through this secret seems to have even more bugs. For example, in my first attempt I accidentally had a data.region value (of "US") in it, left over from a template I copied. This made the LokiStack assume the S3 store lives on AWS, even though the secret was set to "type: s3" and not "type: aws", which also led to a strange error message.
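For illustration, a minimal sketch of the offending fragment (key names as in the secret above; the base64 value decodes to "US"):

data:
  # ... other keys as shown above ...
  # "US" - leftover from the copied template; its mere presence is enough
  # for the operator to treat the store as AWS S3
  region: VVM=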

@JStickler added the sig/operator and type/bug (Something is not working as expected) labels on Apr 15, 2024
@kai-uwe-rommel (Author)

It turns out that the bug is that the compactor (and perhaps other components that use S3/MinIO) fails if the endpoint URL in the secret is a "short" one, e.g. "http://minio:9000". The problem is solved by using an FQDN instead, e.g. "http://minio.openshift-logging.svc.cluster.local:9000". However, a short URL should normally work equally well; the bug is that with Loki it doesn't.
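For anyone hitting the same problem, here is a sketch of the working secret, written with stringData so the values stay readable (Kubernetes base64-encodes stringData fields on admission; the values are the test ones from above):

apiVersion: v1
kind: Secret
metadata:
  name: minio-s3
  namespace: openshift-logging
type: Opaque
stringData:
  access_key_id: minio
  access_key_secret: PassW0rd
  bucketnames: logging
  # FQDN of the service instead of the short "http://minio:9000"
  endpoint: http://minio.openshift-logging.svc.cluster.local:9000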

@periklis (Collaborator)

@kai-uwe-rommel Thanks for reporting this. I believe the Loki S3 configuration has always been a bit tricky; this stems from the fact that it supports two ways to declare the endpoint: s3: your-url-here and endpoint: your-url-here. What you report is something we addressed in the following two PRs (not released in OpenShift Logging yet, ETA 5.9.1):

TL;DR: We streamlined the config generation to always use endpoint: ... and added some validation cases to keep Loki from crashlooping.

@kai-uwe-rommel (Author)

So, one of these validation cases fails (although the secret's endpoint element was there and valid) and causes the pod to crashloop? :-)

@periklis (Collaborator)

Not exactly. First things first: we don't separate the secret type into one category for s3 and one for aws; for legacy reasons, the type s3 is used for both. However, the Loki s3_storage_config allows two combinations to declare the target endpoint: s3 + region (meant for AWS) and endpoint (meant for everything else, including private/enterprise proxies that relay to AWS). Historically, users and the operator maintainers (ourselves included) made the mistake of mixing these two, which in turn triggered unwanted behavior in the AWS SDK (e.g. replacing your endpoint with .amazonaws.com). Therefore we concluded to use only endpoint and make the right choices for everybody using AWS, i.e. building the URL from the region and setting virtual-host style.
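For illustration, a sketch of the two combinations as they would appear in a Loki storage_config (values taken from this issue, the region is a made-up example; treat this as an assumption-laden sketch, the Loki storage documentation is authoritative):

# combination 1: s3/region, meant for AWS; the SDK derives the
# *.amazonaws.com endpoint from the region
storage_config:
  aws:
    region: us-east-1
    bucketnames: logging

# combination 2: explicit endpoint, meant for everything else
# (MinIO, private proxies relaying to AWS, ...)
storage_config:
  aws:
    endpoint: http://minio.openshift-logging.svc.cluster.local:9000
    bucketnames: logging
    s3forcepathstyle: true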
