
Pods restart because their log volume exceeds its limit #620

Closed
siegfriedweber opened this issue Jul 11, 2023 · 2 comments

Comments

@siegfriedweber
Member

Pods restart because their log volume exceeds its limit:

Usage of EmptyDir volume "log" exceeds the limit "11Mi".

This was observed in an OpenShift cluster for ZooKeeper 3.8.0 and HBase.

@siegfriedweber
Member Author

siegfriedweber commented Jul 11, 2023

Observations with ZooKeeper 3.8.1 in a local kind cluster:

The log file rollover works as desired:

  1. The logs are written to zookeeper.log4j.xml.
  2. When the log file exceeds 5 MiB (5,244,013 bytes in the test, i.e. 5 MiB + 1,133 bytes), it is renamed to zookeeper.log4j.xml.1 and new logs are again written to zookeeper.log4j.xml.
  3. When zookeeper.log4j.xml again exceeds 5 MiB (5,243,584 bytes in the test, i.e. 5 MiB + 704 bytes), zookeeper.log4j.xml.1 is deleted first (see also https://github.com/qos-ch/logback/blob/v_1.2.10/logback-core/src/main/java/ch/qos/logback/core/rolling/FixedWindowRollingPolicy.java#L128-L132), and then zookeeper.log4j.xml is renamed to zookeeper.log4j.xml.1. A minimal model of this rollover is sketched below.
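
A minimal sketch of that rollover behavior in Python (not the actual logback code; the file names and the 5 MiB threshold are taken from the observations above):

```python
from pathlib import Path

# Minimal model of a size-based rollover with a fixed window of one archive
# file, mirroring the steps observed above (not the actual logback code).
LOG = Path("zookeeper.log4j.xml")
ARCHIVE = Path("zookeeper.log4j.xml.1")
MAX_SIZE = 5 * 1024 * 1024  # rollover threshold: 5 MiB

def append_log_entry(entry: str) -> None:
    # Roll over once the active file has grown beyond the threshold.
    if LOG.exists() and LOG.stat().st_size > MAX_SIZE:
        if ARCHIVE.exists():
            ARCHIVE.unlink()   # delete zookeeper.log4j.xml.1 first
        LOG.rename(ARCHIVE)    # then rename the active file to .1
    with LOG.open("a") as f:   # new entries go to a fresh zookeeper.log4j.xml
        f.write(entry + "\n")
```

So at peak, two files of slightly more than 5 MiB each exist at the same time, which matches the totals below.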

The logs of the prepare container used 515 bytes.

A total of 10,488,112 bytes was used, which means that nearly 1 MiB (1,046,224 bytes) of the 11 MiB volume remained unused.
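
The numbers add up as follows (assuming the 11 MiB volume limit from the error message above):

```python
# Peak usage just before the second rollover (assumption: 11 MiB volume
# limit, as in the reported error message).
archived = 5_244_013  # zookeeper.log4j.xml.1, created by the first rollover
active   = 5_243_584  # zookeeper.log4j.xml, about to be rolled over
prepare  = 515        # logs of the prepare container

used  = archived + active + prepare  # 10,488,112 bytes
limit = 11 * 1024 * 1024             # 11 MiB = 11,534,336 bytes
print(used, limit - used)            # 10488112 1046224
```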

If the size limit of the log volume is set to 2 MiB, then the Pod is evicted with the error message shown above once the log file exceeds 2,300,856 bytes (~ 2.2 MiB). So the size limit is enforced in a kind cluster.

Possible explanations for the behavior in OpenShift:

  • The file system uses a large block size.
  • The deletion does not happen immediately, or the renaming temporarily occupies twice the file size.
  • The node ephemeral storage is not sufficient:

    A size limit can be specified for the default medium, which limits the capacity of the emptyDir volume. The storage is allocated from node ephemeral storage. If that is filled up from another source (for example, log files or image overlays), the emptyDir may run out of capacity before this limit.

    https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

  • Some log entries are huge (more than 0.5 MiB).

@siegfriedweber
Member Author

The issue can be reproduced on our OpenShift cluster.

The problem is that the disk usage of a log file is not just the minimally required number of blocks times the block size; the file system can reserve considerably more blocks. For instance, a zookeeper.log4j.xml with 4,127,151 bytes occupied 4,032 blocks. Then further log entries were written, the actual file size increased to 4,132,477 bytes, and the file suddenly occupied 8,128 blocks. For smaller files, fewer additional blocks are reserved. The number of reserved blocks can also decrease again: for instance, zookeeper.log4j.xml.1 contains 5,243,985 bytes and occupies only 5,124 blocks.
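
A simple way to see the difference between the apparent file size and the allocated space is sketched below; this snippet is only an illustration, not what was run on the cluster:

```python
import os
import sys

# Compare a file's apparent size with the space actually allocated for it.
# os.stat reports st_blocks in 512-byte units; on file systems that
# preallocate space for growing files, the allocated size can be much
# larger than the apparent size and can shrink again later.
for path in sys.argv[1:]:
    st = os.stat(path)
    allocated = st.st_blocks * 512
    print(f"{path}: {st.st_size} bytes apparent, "
          f"{allocated} bytes allocated ({st.st_blocks} blocks)")
```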
