Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promtail Journald Log Forwarding Not Functioning with systemd v246 (affects FedoraCoreOS / Fedora 33 systems) #2792

Closed
fifofonix opened this issue Oct 22, 2020 · 26 comments · Fixed by #2957
Assignees

Comments

@fifofonix
Copy link

fifofonix commented Oct 22, 2020

Promtail 1.6.1 and 2.0.0 docker images have ceased forwarding Journald when running on 'next' version of FedoraCoreOS which is based on upstream Fedora 33. Issue reported via FCOS issues and suggestion is that there may now be an incompatibility in journald compression level (FedoraCoreOS systemd now with +ZSTD compression).

Steps to reproduce the behavior:

  1. Start Loki 1.6.0 or 2.0.0 with debug mode
  2. Start Promtail 1.6.1 or 2.0.0 with debug mode to tail journald and monitor a local log file.

Expect to see verbose logging in the promtail logs associated with relay of journald (like we see on pre Fedora33 FedoraCoreOS servers). Instead we only see occasional debug messages regarding the local log file config (expected).

Environment:

  • Infrastructure: FedoraCoreOS VMs on VMWare
  • Deployment tool: Custom

Promtail Config:

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug

positions:
  filename: /var/promtail/positions.yaml

clients:
  # Via Traefik with https and basic auth:
  - url: https://loki:****@****/loki/api/v1/push

scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
      path: /var/log/journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
  - job_name: system
    static_configs:
    - targets:
      - localhost
      labels:
        job: varlogs
        __path__: /tmp/*.log

Sample Promtail Log


Oct 22 12:13:29 d-node-1 podman[254870]: level=warn ts=2020-10-22T12:13:29.399516973Z caller=filetargetmanager.go:101 msg="WARNING!!! entry_parser config is deprecated, please change to pipeline_stages"
Oct 22 12:13:29 d-node-1 podman[254870]: level=debug ts=2020-10-22T12:13:29.3996603Z caller=manager.go:226 component=discovery msg="Starting provider" provider=string/0 subs=[system]
Oct 22 12:13:29 d-node-1 podman[254870]: level=debug ts=2020-10-22T12:13:29.399777351Z caller=manager.go:244 component=discovery msg="Discoverer channel closed" provider=string/0
Oct 22 12:13:29 d-node-1 conmon[254962]: level=debug ts=2020-10-22T12:13:29.3996603Z caller=manager.go:226 component=discovery msg="Starting provider" provider=string/0 subs=[system]
Oct 22 12:13:29 d-node-1 conmon[254962]: level=debug ts=2020-10-22T12:13:29.399777351Z caller=manager.go:244 component=discovery msg="Discoverer channel closed" provider=string/0
Oct 22 12:13:29 d-node-1 conmon[254962]: level=info ts=2020-10-22T12:13:29.411579529Z caller=server.go:194 http=[::]:9080 grpc=[::]:45373 msg="server listening on addresses"
Oct 22 12:13:29 d-node-1 podman[254870]: level=info ts=2020-10-22T12:13:29.411579529Z caller=server.go:194 http=[::]:9080 grpc=[::]:45373 msg="server listening on addresses"
Oct 22 12:13:29 d-node-1 conmon[254962]: level=info ts=2020-10-22T12:13:29.411805748Z caller=main.go:108 msg="Starting Promtail" version="(version=1.6.1, branch=HEAD, revision=623858df0)"
Oct 22 12:13:29 d-node-1 podman[254870]: level=info ts=2020-10-22T12:13:29.411805748Z caller=main.go:108 msg="Starting Promtail" version="(version=1.6.1, branch=HEAD, revision=623858df0)"
Oct 22 12:13:34 d-node-1 conmon[254962]: level=debug ts=2020-10-22T12:13:34.400228784Z caller=filetargetmanager.go:231 msg="new target" labels="{__address__=\"localhost\"}"
Oct 22 12:13:34 d-node-1 conmon[254962]: level=info ts=2020-10-22T12:13:34.400339203Z caller=filetargetmanager.go:285 msg="Adding target" key="{host=\"d-node-1\", job=\"varlogs\"}"
Oct 22 12:13:34 d-node-1 podman[254870]: level=debug ts=2020-10-22T12:13:34.400228784Z caller=filetargetmanager.go:231 msg="new target" labels="{__address__=\"localhost\"}"
Oct 22 12:13:34 d-node-1 podman[254870]: level=info ts=2020-10-22T12:13:34.400339203Z caller=filetargetmanager.go:285 msg="Adding target" key="{host=\"d-node-1\", job=\"varlogs\"}"
...
Oct 22 14:16:04 d-node-1 conmon[254962]: level=debug ts=2020-10-22T14:16:04.400626757Z caller=filetarget.go:213 msg="no files matched requested path, nothing will be tailed" path=/tmp/*.log
Oct 22 14:16:04 d-node-1 podman[254870]: level=debug ts=2020-10-22T14:16:04.400626757Z caller=filetarget.go:213 msg="no files matched requested path, nothing will be tailed" path=/tmp/*.log
Oct 22 14:16:14 d-node-1 conmon[254962]: level=debug ts=2020-10-22T14:16:14.400709537Z caller=filetarget.go:213
...
Oct 22 14:16:44 d-node-1 conmon[254962]: level=debug ts=2020-10-22T14:16:44.400711064Z caller=filetarget.go:213 msg="no files matched requested path, nothing will be tailed" path=/tmp/*.log
Oct 22 14:16:44 d-node-1 podman[254870]: level=debug ts=2020-10-22T14:16:44.400711064Z caller=filetarget.go:213 msg="no files matched requested path, nothing will be tailed" path=/tmp/*.log
Oct 22 14:16:54 d-node-1 conmon[254962]: level=debug ts=2020-10-22T14:16:54.401455252Z caller=filetarget.go:213 msg="no files matched requested path, nothing will be tailed" path=/tmp/*.log
@fifofonix
Copy link
Author

Non-working and working systemctl -version output (required apt-get install systemd in promtail containers)

Note +ZSTD in non-working host view.

Non-working Host View

systemd 246 (v246.6-3.fc33)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified

Non-Working Container View

systemd 232
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Working Host View:

systemd 245 (v245.8-2.fc32)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified

Working Container View:

systemd 232
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

@chancez
Copy link
Contributor

chancez commented Oct 23, 2020

Are you sure /var/log/journal is populated? By default logs are stored in /var/run/journal unless /var/log/journal exists.
See https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html#Description :

The journal service stores log data either persistently below /var/log/journal or in a volatile way below /run/log/journal/ (in the latter case it is lost at reboot). By default, log data is stored persistently if /var/log/journal/ exists during boot, with an implicit fallback to volatile storage otherwise. Use Storage= in journald.conf(5) to configure where log data is placed, independently of the existence of /var/log/journal/.

@fifofonix
Copy link
Author

Yes /var/log/journal is most definitely populated, and has a current timestamp on both old working nodes, and new non-working nodes. There is a /var/run/log/journal directory but it is stale according to its timestamp on both working and non-working configurations.

@fifofonix
Copy link
Author

I think any Fedora33 based system will no longer be able to use promtail for journald until this is addressed. @slim-bean any recommendations on how this issue should be handled, or recommendations for Fedora/promtail users? Since Fedora is upstream to Redhat and Openshift (via OKD & FedoraCoreOS) I am imagining the impact is going to broaden.

@fifofonix
Copy link
Author

I was hoping that rebuilding promtail from source for buster, backporting libsystemd-dev which then would be on the right systemd version (246) with the correct flags for FedoraCoreOS (including especially +ZSTD) would solve things. Indeed Loki debug trace reveals correct reading of Journald log lines...but it seems changes would also be required to the Go program as it is giving a stack trace when attempting the send to Loki.

@fifofonix fifofonix changed the title Promtail Journald Log Forwarding Not Functioning on FedoraCoreOS (Fedora 33 Issue?) Promtail Journald Log Forwarding Not Functioning with systemd v246 (affects FedoraCoreOS / Fedora 33 systems) Nov 1, 2020
@fifofonix
Copy link
Author

This issue continues to manifest itself on Promtail 2.0.0 with Loki 2.0.0. Original description updated to reflect this.

@fifofonix
Copy link
Author

Good news. Rebuilding promtail from source for buster, backporting libsystemd-dev does in fact solve things providing one sets the BUILD_IN_CONTAINER switch to force rebuild of the promtail code itself against backported C modules (missing this switch was my earlier 'muppet' mistake).

It is not clear to me exactly how this should be fixed in the project itself. It would seem that the right end goal should be an additional tag available in dockerhub like promtail:2.0.0-systemd-246?

For the record the diff of the changes made on the v2.0.0 tag to the promtail/dockerfile are:

-FROM golang:1.14.2 as build
+FROM golang:1.14.2-buster as build
 # TOUCH_PROTOS signifies if we should touch the compiled proto files and thus not regenerate them.
 # This is helpful when file system timestamps can't be trusted with make
 ARG TOUCH_PROTOS
 COPY . /src/loki
 WORKDIR /src/loki
-RUN apt-get update && apt-get install -qy libsystemd-dev
-RUN make clean && (if [ "${TOUCH_PROTOS}" ]; then make touch-protos; fi) && make BUILD_IN_CONTAINER=false promtail
+RUN echo "deb http://deb.debian.org/debian buster-backports main" >> /etc/apt/sources.list
+RUN apt-get update && apt-get install -t buster-backports -qy libsystemd-dev
+RUN make clean && (if [ "${TOUCH_PROTOS}" ]; then make touch-protos; fi) && make BUILD_IN_CONTAINER=true promtail

 # Promtail requires debian as the base image to support systemd journal reading
-FROM debian:stretch-slim
+FROM debian:buster-slim
 # tzdata required for the timestamp stage to work
+RUN echo "deb http://deb.debian.org/debian buster-backports main" >> /etc/apt/sources.list
 RUN apt-get update && \
   apt-get install -qy \
-  tzdata ca-certificates libsystemd-dev && \
+  tzdata ca-certificates
+RUN apt-get install -t buster-backports -qy libsystemd-dev && \
   rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 COPY --from=build /src/loki/cmd/promtail/promtail /usr/bin/promtail
 COPY cmd/promtail/promtail-docker-config.yaml /etc/promtail/config.yml

@slim-bean
Copy link
Collaborator

slim-bean commented Nov 6, 2020

I'm a little confused following this, why is the backport necessary? Is it enough to just update the build image and container image to buster?

@fifofonix
Copy link
Author

My understanding is that buster ships with systemd version v241 which shows the feature flags listed below. These do not include a v246 feature flag +ZSTD which is present in Fedora33 and apparently references a compression algorithm (https://facebook.github.io/zstd/).

Backporting the newer v246 to buster therefore seems necessary to ensure that the loki go project, which uses the systemd go project, which uses native systemd C apis to access journald, does so correctly. I did experiment with switching journald compression off altogether but this did not fix things, and in any case is non-ideal. Disclaimer however that I have never played with journald configuration before so it is possible I made a mistake with this.

% systemctl --version
systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

Fedora of course is upstream to Redhat but also FedoraCoreOS which is upstream to the Open Kubernetes Distribution (OKD) which is upstream to OpenShift. So, I guess this is of significance to a fairly wide base. FedoraCoreOS are running their 'next' stream with Fedora33 already to shake out issues like these.

@cinemast
Copy link

cinemast commented Nov 8, 2020

Same problem for ArchLinux:

systemctl --version
systemd 246 (246.6-1.1-arch)
+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

@T4cC0re
Copy link

T4cC0re commented Nov 16, 2020

Systemd journals created with 246+ are deliberately incompatible with earlier versions. The backport is required to ship those logs as the C linkage needs to be up to date here.

@mmahut
Copy link

mmahut commented Nov 16, 2020

@cyriltovena do you mind looking at this, please?

@slim-bean
Copy link
Collaborator

Will take a look at this tomorrow and get this merged, thanks everyone, apologies for delay

@slim-bean slim-bean self-assigned this Nov 16, 2020
@fifofonix
Copy link
Author

This issue now manifesting itself on the 'testing' FedoraCoreOS stream with an 11/30 timetable to go to the 'stable' stream at which point the impact of this issue will become more widespread.

@slim-bean
Copy link
Collaborator

working on this now, should be in a release early next week

@slim-bean
Copy link
Collaborator

@fifofonix can you (or anyone else) test out this image to see if it works:

grafana/promtail:update-libsystem-dev-0bd2ca6

I verified it worked on our existing systems but I don't have a newer system handy to make sure if fixes the original issue.

@slim-bean
Copy link
Collaborator

oh hrm where you testing with the docker image or a binary?

@fifofonix
Copy link
Author

fifofonix commented Nov 19, 2020

Looks good right now. I am running the container you built on both old (FCOS Fedora 32 w/systemd 245) and new (FCOS Fedora 33 w/systemd 246 with +ZSTD compression) servers and journald logs are aggregating from both. I'm going to leave those running and continue to monitor.

@tomkukral
Copy link

I'm testing image grafana/promtail:update-libsystem-dev-0bd2ca6 on Fedora 33 Workstation and it looks good so far.

@mmahut
Copy link

mmahut commented Nov 20, 2020

grafana/promtail:update-libsystem-dev-0bd2ca6 works fine on NixOS 20.09.

@T4cC0re
Copy link

T4cC0re commented Nov 20, 2020

grafana/promtail:update-libsystem-dev-0bd2ca6 confirmed working for an up-to-date Arch Linux, too.

@slim-bean
Copy link
Collaborator

Thanks everyone for testing this!

@v-stickykeys
Copy link
Contributor

v-stickykeys commented Dec 8, 2020

Hey @slim-bean I'm seeing what may be a similar issue just running on docker--if the file is created after promtail starts, it says it can't be found. If the file is created before, it ingests the existing logs but doesn't tail.

@abferm
Copy link
Contributor

abferm commented Apr 14, 2022

I appear to have run into this issue using the grafana/promtail:2.4.2-arm docker image on balena os.

Balena OS recently updated their systemd

systemd 244 (244.5+)
-PAM -AUDIT -SELINUX +IMA -APPARMOR -SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP -GCRYPT -GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN -PCRE2 default-hierarchy=hybrid

systemd 249 (249.3+)
-PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK +SECCOMP -GCRYPT -GNUTLS -OPENSSL +ACL +BLKID -CURL -ELFUTILS -FIDO2 -IDN2 -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT -QRENCODE -BZIP2 -LZ4 -XZ -ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=hybrid

Is there an official ARM docker image that addresses this issue?

@abferm
Copy link
Contributor

abferm commented Apr 14, 2022

Upon further inspection, it looks like the related pull request #2957 only touched clients/cmd/promtail/Dockerfile (cmd/promtail/Dockerfile at time of merge), and the arm image appears to be built using either Dockerfile.arm32 or Dockerfile.cross

@slim-bean
Copy link
Collaborator

the promtail:2.4.2-arm is the 32bit image and would be built with Dockerfile.arm32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants