Loki App


Giant Swarm offers Loki as a managed app. This chart provides a distributed loki setup based on this upstream chart. It tunes some options from upstream to make the chart easier to deploy.

This chart is meant to be used with S3 compatible storage only. Access to the S3 storage must be ensured for the chart to work.

  • See the Deploying on AWS section below for the configuration you need on the AWS side.
  • See the Deploying on Azure section below for the configuration you need on the Azure side.


Requirements

  • You need to ensure that pods deployed can access S3 storage (as explained above).
  • On Giant Swarm clusters, you have to run a release that is based on helm 3. This means you need at least:
    • v12.1.2 for Azure
    • v12.5.1 for AWS
    • v12.3.1 for KVM

Install

There are several ways to install this app onto a workload cluster.

Upgrading

Upgrading an existing Release to a new major version

A major chart version change (like v0.5.0 -> v1.0.0) indicates that there is an incompatible breaking change needing manual actions.

Versions before v1.0.0 are not stable, and can even have breaking changes between "minor" versions (like v0.5.0 -> v0.6.0).

From 0.19.x to 0.20.x

⚠️ Upgrading to 0.20.x from any older version is a breaking change as described below

Be aware that this upgrade will cause a slight downtime of Loki as the ingress needs to be recreated (grafana/loki#12554)

Current list of open issues around loki 3 upgrade can be found here: grafana/loki#12506

From 0.8.x to 0.9.x

⚠️ Upgrading to 0.9.x from any older version can be a breaking change as described below

From 0.6.x to 0.7.x

⚠️ Upgrading to 0.6.x from any older version can be a breaking change as described below

  • The nginx file definition has been changed for easier maintenance. But there is a drawback: if you had defined it in your values, you should add these values:
    loki:
      gateway:
        nginxConfig:
          customReadUrl: http://loki-multi-tenant-proxy.default.svc.cluster.local:3100
          customWriteUrl: http://loki-multi-tenant-proxy.default.svc.cluster.local:3101
          customBackendUrl: http://loki-multi-tenant-proxy.default.svc.cluster.local:3100
    

From 0.5.x to 0.6.x

⚠️ Upgrading to 0.6.x from any older version is a breaking change as described below

  • nginx file definition for loki-multi-tenant has moved to a helper template. If you had defined it in your values, you should:
    • remove .loki.gateway.nginxConfig.file from your values
    • set .loki.gateway.nginxConfig.genMultiTenant: true in your values
    • We now maintain this template for you, so you can keep a cleaner values config (see the sketch below).
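
For illustration, here is a minimal sketch of what the relevant part of the values could look like after the migration (the loki/gateway nesting follows the other examples in this README):

loki:
  gateway:
    nginxConfig:
      genMultiTenant: true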

From 0.4.x to 0.5.x

⚠️ Upgrading to 0.5.x from any older version is a breaking change as described below

The chart used as a base moved from a community chart to the officially maintained chart.

The structure of the values changed in 0.5.0 as we now rely on helm chart dependency mechanism to manage the application.

Basic upgrade procedure

  1. Retrieve current values.yaml
    • for manual/happa deployments you could do it with a command like k get cm -n [mycluster] loki-user-values -oyaml | yq '.data.values' on the management cluster
    • for gitops deployments, you should have it in git
  2. keep a backup: cp values.yaml values.yaml_0.4
  3. prepare your new values file (see "Most notable changes" section hereafter for details on what to change)
  4. open grafana, check that you can access your logs
  5. uninstall loki
  6. install newer loki version, with new values
  7. check in grafana that you can still access old and new logs

Note:

Uninstalling before re-installing is not mandatory. You can also change the config and the app version at the same time; this works well with Flux, for instance.

Details

Your values.yaml file needs some adjustments.

Most notable changes:

New Loki defaults to multi-tenant mode.

If you set an org ID (the X-Scope-OrgID header) when sending logs, you now have to make sure you also set it when reading logs. You can read multiple tenants at once with an org ID built like this: tenant1|tenant2. Logs sent with no tenant are stored under the tenant fake. You can see all your tenants by listing your object storage. Here, there are fake, tenant1 and tenant2 tenants (see the query example after this listing):

fake/
tenant1/
tenant2/
index/
loki_cluster_seed.json
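
For example, here is a hedged illustration of a multi-tenant read, assuming logcli and a port-forward to Loki on localhost:3100 (see the testing section below):

# List all streams for tenant1 and tenant2 at the same time
logcli --org-id="tenant1|tenant2" --addr="http://localhost:3100" series '{}'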

Rollback

You can rollback to your previous Loki version, and see your old logs. However, because of multi-tenancy, seeing logs that were stored with the new version may require some config tweaking.

Configuration

As this application is built upon the Grafana Loki upstream chart as a dependency, most of the values to override can be found here.

Some samples can be found here.

General recommendations

The number of replicas in the default values file is generally considered safe. If you reduce the number of replicas below the default recommended values, expect undefined behaviour and problems.

Prepare config file

  1. Create the app config file. Grab the included sample config file or azure sample config file, read the comments for options and adjust to your needs. To check all available options, please consult the full values.yaml file.

  2. Update nodeSelectorTerms to match your nodes (if unsure, kubectl describe nodes [one worker node] | grep machine- should give you the right id for machine-deployment or machine-pool depending on your provider). Beware, there are two places to update! (obsolete with SSD)

  3. Update gateway.ingress.hosts.host and gateway.ingress.tls.host (see the ingress sketch below).
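
As an illustration, here is a hedged sketch of those two ingress values, following the upstream chart's gateway.ingress structure (the hostname and TLS secret name are placeholders to adapt to your setup):

loki:
  gateway:
    ingress:
      hosts:
        - host: loki.example.gigantic.io
          paths:
            - path: /
      tls:
        - hosts:
            - loki.example.gigantic.io
          secretName: loki-ingress-tls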

Multi-tenant setup

  1. The default Giant Swarm template is prepared for multi-tenancy. In multi-tenant setups, you can enable multi-tenant-proxy to manage credentials for different tenants.

Enable the deployment of multi-tenant-proxy by setting multiTenantAuth.enabled to true.

Write down your credentials in multiTenantAuth.credentials. They should be formatted in your values file like this:

multiTenantAuth:
  enabled: true
  credentials: |-
    users:
      - username: Tenant1
        password: 1tnaneT
        orgid: tenant-1
      - username: Tenant2
        password: 2tnaneT
        orgid: tenant-2
  2. In single-tenant setups with simple basic auth logins, you want to use the gateway.basicAuth.existingSecret config option. To create the secret with the necessary users and passwords, use the following commands:
echo "passwd01" | htpasswd -i -c.htpasswd user01
echo "passwd02" | htpasswd -i .htpasswd user02
echo "passwd03" | htpasswd -i .htpasswd user03

kubectl -n loki create secret generic loki-basic-auth --from-file=.htpasswd

Then, set gateway.basicAuth.existingSecret to loki-basic-auth.
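
For reference, here is a minimal values sketch for this single-tenant setup (field names follow the upstream chart's gateway.basicAuth section; the secret must live in the namespace Loki is deployed to):

loki:
  gateway:
    basicAuth:
      enabled: true
      existingSecret: loki-basic-auth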

Caching

When ingesting logs from workload clusters, Loki may have a hard time processing a user's query because of the huge amount of data. This can lead to read pods being overwhelmed, resulting in timeouts for the user.

To avoid this, Loki can use a memcached cluster to cache chunks and query results and ease the read pods' job. To enable caching, one will have to deploy the memcached-app and set up the loki.loki.memcached field in the Loki config.

This field is composed of two subfields:

  • chunk_cache, in which one may define the batch size for the chunks stored.
  • results_cache, in which one may define the validity period for a cached result as well as the timeout for the query requesting it.

Both subfields also need to have their host and service specified, as shown in the sketch after this list. If you deployed memcached-app with its default values:

  • host should be memcached-app.loki.svc. Otherwise, with custom values for memcached-app, the host value will be memcached's service DNS name.
  • service should be memcache. With custom values for memcached-app, the service value will be memcached's service port name.
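
Here is a hedged sketch of such a caching configuration, assuming memcached-app was deployed in the loki namespace with its default values (the field names follow the upstream chart's memcached section; batch size, timeout and validity are example figures to adapt):

loki:
  loki:
    memcached:
      chunk_cache:
        enabled: true
        host: memcached-app.loki.svc
        service: memcache
        batch_size: 256
      results_cache:
        enabled: true
        host: memcached-app.loki.svc
        service: memcache
        timeout: 500ms
        default_validity: 12h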

Bloom filters

Giant Swarm experimented with bloom filters quite early on after the release of Loki 3.1.0, as can be seen here.

You can quite easily enable blooms in your loki instance by setting the following configuration:

loki:
  loki:
    structuredConfig:
      bloom_compactor:
        enabled: true
        retention:
          enabled: true
          max_lookback_days: 30
      bloom_gateway:
        enabled: true
        client:
          addresses: dns+loki-backend-headless.loki.svc.cluster.local:9095
    limits_config:
      bloom_gateway_enable_filtering: true
      bloom_compactor_enable_compaction: true

We decided against enabling it by default for now for multiple reasons, mostly argued upstream in two comments on grafana/loki#12751:

  • bloom filters are under heavy development
  • architecture may still change quite often/fast
  • documentation is not guaranteed up-to-date
  • nobody knows about performance yet...

Deploying on AWS

The recommended deployment mode is using S3 storage. Assuming your cluster has kiam (https://github.com/uswitch/kiam), cert-manager and external-dns included, you should be good to use the instructions below to set up the S3 bucket and the necessary permissions in your AWS account.

Make sure to create this config for the cluster where you are deploying Loki, and not at installation-level.

Prepare AWS S3 storage.

Create a new private S3 bucket located in the same region as your instances, e.g. gs-loki-storage.

  • encryption is not required, but strongly recommended: Loki won't encrypt your data
  • consider creating private VPC endpoint for S3 - traffic volume might be considerable and this might save you some money for the transfer fees,
  • it is recommended to use an S3 bucket class for frequent access (S3 Standard),
  • create a retention policy for the bucket; currently, Loki won't delete files in S3 for you (check here and here, and see the lifecycle example at the end of this subsection).
  • CLI procedure:
# prepare environment
export CLUSTER_NAME=zj88t
export NODEPOOL_ID=oy9v0
export REGION=eu-central-1
export INSTALLATION=gorilla
export BUCKET_NAME=gs-loki-storage-"$CLUSTER_NAME" # must be globally unique
export AWS_PROFILE=gorilla-atlas # your AWS CLI profile
export LOKI_POLICY="$BUCKET_NAME"-policy
export LOKI_ROLE="$BUCKET_NAME"-role

# create bucket
aws --profile="$AWS_PROFILE" s3 mb s3://"$BUCKET_NAME" --region "$REGION"

Create bucket policy to enforce tls in-transit:

# Create policy
BUCKET_POLICY_DOC='{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EnforceSSLOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::'"$BUCKET_NAME"'",
                "arn:aws:s3:::'"$BUCKET_NAME"'/*"
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        }
    ]
}'

aws --profile="$AWS_PROFILE" s3api put-bucket-policy --bucket $BUCKET_NAME --policy "$BUCKET_POLICY_DOC"

Prepare AWS IAM policy.

Create an IAM policy. If you want to use the AWS web UI, copy/paste the contents of the POLICY_DOC variable below.

# Create policy
POLICY_DOC='{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject" ],
            "Resource": [
                "arn:aws:s3:::'"$BUCKET_NAME"'",
                "arn:aws:s3:::'"$BUCKET_NAME"'/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetAccessPoint",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAccessPoints"
            ],
            "Resource": "*"
        }
    ]
}'
aws --profile="$AWS_PROFILE" iam create-policy --policy-name "$LOKI_POLICY" --policy-document "$POLICY_DOC"

Prepare AWS IAM role

Up to giantswarm v18

Create a new IAM role that allows the necessary instances (k8s masters in the case of using kiam) to access resources from the policy. Set trust to allow the role used by kiam to claim the S3 access role. If you want to use the AWS web UI, copy/paste the contents of the ROLE_DOC variable below.

# Create role
PRINCIPAL_ARN="$(aws --profile="$AWS_PROFILE" iam get-role --role-name "$CLUSTER_NAME"-IAMManager-Role | sed -n 's/.*Arn.*"\(arn:.*\)".*/\1/p')"
ROLE_DOC='{
    "Version": "2012-10-17",
    "Statement": [
        {
        "Effect": "Allow",
        "Principal": {
            "AWS": "'"$PRINCIPAL_ARN"'"
        },
        "Action": "sts:AssumeRole"
        }
    ]
}'

From giantswarm v19

Giant Swarm clusters will use IRSA (Iam Roles for Service Accounts) to allow pods to access S3 buckets' resources. For more details concerning IRSA, you can refer to the official documentation as well as to the giant swarm one.

This means that the role's trust relationship will be different from the one used for kiam (see above):

PRINCIPAL_ARN="$(aws --profile="$AWS_PROFILE" iam get-role --role-name "$CLUSTER_NAME"-IAMManager-Role | sed -n 's/.*Arn.*"\(arn:.*\)".*/\1/p')"
ROLE_DOC='{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::'$PRINCIPAL_ARN':oidc-provider/irsa.'$CLUSTER_NAME'.k8s.'$INSTALLATION'.'$REGION'.aws.gigantic.io"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "irsa.'$CLUSTER_NAME'.k8s.'$INSTALLATION'.'$REGION'.aws.gigantic.io:sub": "system:serviceaccount:loki:loki"
                }
            }
        }
    ]
}'

Create role

Everything is now set to create the role:

aws --profile="$AWS_PROFILE" iam create-role --role-name "$LOKI_ROLE" --assume-role-policy-document "$ROLE_DOC"
# Attach the policy to the role
LOKI_POLICY_ARN="${PRINCIPAL_ARN%:role/*}:policy/$LOKI_POLICY"
aws --profile="$AWS_PROFILE" iam attach-role-policy --policy-arn "$LOKI_POLICY_ARN" --role-name "$LOKI_ROLE"
  • Store the role's ARN in a variable for the next step:
LOKI_ROLE_ARN="${PRINCIPAL_ARN%:role/*}:role/$LOKI_ROLE"

Link IAM role to Kubernetes

Up to giantswarm v18

Currently, you have to manually pre-create the namespace and annotate it with the IAM role required for pods running in the namespace:

kubectl create ns loki
kubectl annotate ns loki iam.amazonaws.com/permitted="$LOKI_ROLE_ARN"

From giantswarm v19

Since IRSA relies on service accounts to grant access rights to the pods, you don't have to manually create and annotate the loki namespace. Instead, edit the chart's values under the loki section with the following:

serviceAccount:
  create: true
  name: loki
  annotations:
    eks.amazonaws.com/role-arn: "$LOKI_ROLE_ARN"

This way, all pods using the loki service account will be able to access the S3 bucket created earlier.

Install the app

  • Fill in the values from the previous steps in your config (values.yaml) file:

    • role annotation for S3
    • cluster ID
    • node pool ID
    • and your custom setup
  • Install the app using your values. Don't forget to use the same namespace as you prepared above for the installation.

Deploying on Azure

Gather data

Find the 'Subscription name' (usually named after your installation) and the 'Resource group' of your cluster (usually named after the cluster ID) inside your 'Azure subscription':

  • list subscriptions:
az account list -otable
export SUBSCRIPTION_NAME="your subscription"
  • list resource groups:
az group list --subscription "$SUBSCRIPTION_NAME" -otable
export RESOURCE_GROUP="your resource group"

Object storage setup

  1. Create a 'Storage account' on Azure (see the 'Create storage account' how-to)
    • 'Account kind' should be 'BlobStorage'
    • Example with Azure CLI:
# Choose your storage account name
export STORAGE_ACCOUNT_NAME="loki$RESOURCE_GROUP"
# then create it
az storage account create \
     --subscription "$SUBSCRIPTION_NAME" \
     --name "$STORAGE_ACCOUNT_NAME" \
     --resource-group "$RESOURCE_GROUP" \
     --sku Standard_GRS \
     --encryption-services blob \
     --https-only true \
     --kind BlobStorage \
     --access-tier Hot

(It may be required to set the location using the --location flag.)

  2. Create a 'Blob service' 'Container' in your storage account
    • Example with the Azure CLI (e.g. in the Azure Cloud Shell):
export CONTAINER_NAME="$STORAGE_ACCOUNT_NAME"container
az storage container create \
     --subscription "$SUBSCRIPTION_NAME" \
     -n "$CONTAINER_NAME" \
     --public-access off \
     --account-name "$STORAGE_ACCOUNT_NAME"
  3. Go to the 'Access keys' page of your 'Storage account'
    • Use the 'Storage account name' for azure_storage.account_name
    • Use the name of the 'Blob service' 'Container' for azure_storage.blob_container_name
    • Use one of the keys for azure.storage_key (see the sketch after this list)
    • With the Azure CLI:
az storage account keys list \
     --subscription "$SUBSCRIPTION_NAME" \
     --account-name "$STORAGE_ACCOUNT_NAME" \
| jq -r '.[]|select(.keyName=="key1").value'
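
A hedged sketch of how these values could map into the Loki storage configuration (the exact nesting depends on the azure sample config file; the field names below follow Loki's azure storage_config section):

loki:
  loki:
    structuredConfig:
      storage_config:
        azure:
          account_name: <your storage account name>
          account_key: <one of the access keys>
          container_name: <your blob container name>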

Install the app

  • Fill in the values from the previous steps in your config (values.yaml) file:

    • cluster ID
    • node pool ID
    • and your custom setup
  • Install the app using your values.

Deploying on a new cluster for testing purposes

You might find yourself in a situation where you want to deploy Loki on a new cluster for testing purposes only. Depending on the testing requirements, you might want to avoid creating object storage with a cloud provider and managing its access permissions for your Loki pods.

In that case, consider deploying Loki with MinIO as the object storage solution. In a nutshell, MinIO is an object storage solution with an S3-like API that uses the nodes' volumes to store its data. For testing purposes it can mock an S3 bucket, giving Loki quick and simple object storage access without the need for complex access permissions.

The good news is that the Loki chart directly provides a minio field where one can configure a minio deployment to serve as object storage for the Loki pods. Such a configuration is displayed in the sample_configs/values-eks-testing.yaml file.

Creating access keys for MinIO access

Once Loki is deployed with MinIO, one will have to create a key pair in the MinIO console to grant Loki pods access to the buckets. To achieve this, first port-forward the appropriate service:

kubectl port-forward -n loki service/loki-minio-console 8080:9001

Change the namespace according to the one in which your loki pods and services are deployed.

Then access the MinIO console at 127.0.0.1:8080. Go to Identity --> Users and create a new user with whatever name and password you want, attaching the permissions needed (most likely the readwrite policy). Then click on the newly created user, go to Service Accounts and click on Create service account. This is where you need to pay attention, because the resulting Access Key and Secret Key correspond to the values mentioned earlier as loki.loki.storage.s3.accessKeyId and loki.loki.storage.s3.secretAccessKey.

Set the Access Key and Secret Key in the console so that they have the same values as the corresponding fields in the Loki values file, and voilà!
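
For reference, here is a minimal sketch of the corresponding part of the values file (the field paths are the ones mentioned above; the key values are the ones created in the MinIO console):

loki:
  loki:
    storage:
      s3:
        accessKeyId: <access key created in the MinIO console>
        secretAccessKey: <secret key created in the MinIO console>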

Everything is now set for testing.

Testing your deployment

Reading data with logcli

  1. Install latest logcli from https://github.com/grafana/loki/releases

  2. Here are a few test queries for Loki; adapt them with your URL and credentials:

  • test from WAN
# List all streams
logcli --username=Tenant1 --password=1tnaneT --addr="http://loki.nx4tn.k8s.gauss.eu-west-1.aws.gigantic.io" series '{}'
  • Test with a port-forward to the gateway:
k port-forward -n loki svc/loki-gateway 8080:80
logcli --username=Tenant1 --password=1tnaneT --addr="http://localhost:8080" series '{}'
  • You can also test direct access to loki-write
# port-forward loki-write to local port 3100
k port-forward -n loki svc/loki-write 3100:3100
# or loki-query-frontend-xxxx port 3100 accepts the same queries

# List all streams
# Note that we use "org-id" rather than "username/password" when we bypass the gateway
$ logcli --org-id="tenant-1" --addr="http://localhost:3100" series '{}'
http://localhost:3100/loki/api/v1/series?end=1654091687961363182&match=%7B%7D&start=1654088087961363182

Ingesting data with promtail

Here is a sample promtail configuration (save it as promtail-test.yml, as referenced by the launch command below) that tails a local log file and pushes it through the gateway port-forward from the previous section:

---
server:
  disable: true
positions:
  filename: /tmp/promtail_test_positions.yaml
clients:
  - url: http://localhost:8080/loki/api/v1/push
    # tenant_id: tenant-1
    basic_auth:
      username: Tenant1
      password: 1tnaneT
    tenant_id: tenant-1
scrape_configs:
  - job_name: logfile
    static_configs:
      - targets:
          - localhost
        labels:
          job: logfile
          host: local
          __path__: /tmp/lokitest.log
  • If you want to bypass the gateway, you can port-forward Loki distributor to localhost:3100
k port-forward -n loki svc/loki-distributor 3100:3100
# Don't forget to change your promtail URL, and use tenant_id rather than basic_auth!
  • Launch promtail
promtail --config.file=promtail-test.yml --inspect
  • Add data to your log file
(while true ; do echo "test log line $(date)"; sleep 1; done ) >> /tmp/lokitest.log
  • Query loki with logcli and see your data

Limitations

The application and its default values have been tailored to work inside Giant Swarm clusters. If you want to use it for any other scenario, know that you might need to adjust some values.

Links

Credit

This application installs the upstream chart below with defaults that ensure it runs smoothly in Giant Swarm clusters.