
S3 storage configuration of LokiStack through the secret does not work properly #12608

Closed
kai-uwe-rommel opened this issue Apr 14, 2024 · 5 comments
Labels
sig/operator type/bug Something is not working as expected

Comments

@kai-uwe-rommel commented Apr 14, 2024

Describe the bug
I am trying to deploy a LokiStack using the Operator on OpenShift, with a local MinIO instance in the same namespace. The deployment itself works correctly (as far as I can see), but afterwards the compactor pod crashloops, complaining that it cannot find the endpoint in the secret. But it is there.

To Reproduce
Steps to reproduce the behavior:

  1. Install the Red Hat Loki operator on OpenShift
  2. Create a MinIO deployment in the target namespace where the LokiStack instance will be created
  3. Provide the secret with the configuration data for the MinIO S3 store according to this: https://loki-operator.dev/docs/object_storage.md/
  4. Create the LokiStack instance with the operator
  5. When the compactor-0 pod starts, it fails with this error message:
     failed to init delete store: failed to get s3 object: MissingEndpoint: 'Endpoint' configuration is required for this service

Expected behavior
The configured secret DOES contain the endpoint, so the compactor should not fail.

Environment:

  • OpenShift 4.15 cluster
  • Red Hat LokiStack 5.9 operator
  • simple MinIO deployment

Screenshots, Promtail config, or terminal output
This is what the secret looks like in a test environment:

apiVersion: v1
kind: Secret
metadata:
  name: minio-s3
  namespace: openshift-logging
type: Opaque
data:
  # minio
  access_key_id: bWluaW8=
  # PassW0rd
  access_key_secret: UGFzc1cwcmQ=
  # logging
  bucketnames: bG9nZ2luZw==
  # http://minio:9000
  endpoint: aHR0cDovL21pbmlvOjkwMDA=

And this is the LokiStack instance object:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-logging
  namespace: openshift-logging
spec:
  managementState: Managed
  size: 1x.extra-small
  hashRing:
    type: memberlist
  limits:
    global:
      queries:
        queryTimeout: 3m
      retention:
        days: 7
  rules:
    enabled: true
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v12
    secret:
      name: minio-s3
      type: s3
  storageClassName: thin
  tenants:
    mode: openshift-logging
@kai-uwe-rommel (Author)

The configuration of the S3 store for the LokiStack instance through this secret seems to have even more bugs. For example, in my first attempt I accidentally had a data.region value (of "US") in it, left over from a template I copied. This made the LokiStack assume the S3 store lives on AWS, even though the secret was set to "type: s3" and not "type: aws", which also led to a strange error message.
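For illustration, a minimal sketch of the offending fragment (key names as in the secret above; the base64 value decodes to "US"):

data:
  # ... other keys as shown above ...
  # "US" - leftover from the copied template; its mere presence is enough
  # for the operator to treat the store as AWS S3
  region: VVM=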

@JStickler added the sig/operator and type/bug (Something is not working as expected) labels on Apr 15, 2024
@kai-uwe-rommel (Author)

It turns out that the bug is that the compactor (and perhaps other components that use S3/MinIO) fails if the endpoint URL in the secret is a "short" one, e.g. "http://minio:9000". The problem is solved by using an FQDN instead, e.g. "http://minio.openshift-logging.svc.cluster.local:9000". However, a short URL should normally work equally well; the bug is that with Loki it doesn't.
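For anyone hitting the same problem, here is a sketch of the working secret, written with stringData so the values stay readable (Kubernetes base64-encodes stringData fields on admission; the values are the test ones from above):

apiVersion: v1
kind: Secret
metadata:
  name: minio-s3
  namespace: openshift-logging
type: Opaque
stringData:
  access_key_id: minio
  access_key_secret: PassW0rd
  bucketnames: logging
  # FQDN of the service instead of the short "http://minio:9000"
  endpoint: http://minio.openshift-logging.svc.cluster.local:9000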

@periklis (Collaborator)

@kai-uwe-rommel Thanks for reporting this. I believe the Loki S3 configuration has always been a bit tricky; this stems from the fact that it supports two ways to declare the endpoint: s3: your-url-here and endpoint: your-url-here. What you report is something we addressed in the following two PRs (not released in OpenShift Logging yet, ETA 5.9.1):

TL;DR: We streamlined the config generation to always use endpoint: ... and added some validation cases to keep Loki from crashlooping.

@kai-uwe-rommel (Author)

So, one of these validation cases fails (although the secret's endpoint element was there and valid) and causes the pod to crashloop? :-)

@periklis (Collaborator)

Not exactly. First things first: we don't separate the secret type into one category for s3 and one for aws; for legacy reasons, the type s3 is used for both. However, the Loki s3_storage_config allows two combinations to declare the target endpoint: s3 + region (meant for AWS) and endpoint (meant for everything else, including private/enterprise proxies that relay to AWS). Historically, users and the operator maintainers (ourselves included) made the mistake of mixing these two, which in turn triggered unwanted behavior in the AWS SDK (e.g. replacing your endpoint with .amazonaws.com). Therefore we concluded to use only endpoint and make the right choices for everybody using AWS, i.e. building the URL from the region and setting virtual-host style.
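For illustration, a sketch of the two combinations as they would appear in a Loki storage_config (values taken from this issue, the region is a made-up example; treat this as an assumption-laden sketch, the Loki storage documentation is authoritative):

# combination 1: s3/region, meant for AWS; the SDK derives the
# *.amazonaws.com endpoint from the region
storage_config:
  aws:
    region: us-east-1
    bucketnames: logging

# combination 2: explicit endpoint, meant for everything else
# (MinIO, private proxies relaying to AWS, ...)
storage_config:
  aws:
    endpoint: http://minio.openshift-logging.svc.cluster.local:9000
    bucketnames: logging
    s3forcepathstyle: true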
