Replaces Vector with fluent-bit #665

Merged
merged 30 commits into from
Mar 30, 2022

nkinkade
Contributor

@nkinkade nkinkade commented Mar 29, 2022

Not long ago we discovered that, probably due to a bug in Vector, many pods were not pushing logs to Stackdriver. We upgraded Vector to a version that supposedly fixed the bug, and added a new alert to let us know when container logs for a node are not showing up in Stackdriver.

Lately, the new alerts for missing logs have started to fire in production and staging. It seems there is some other unresolved bug in Vector. Logs are just too important for us to be missing, and for alerts to be firing all the time.

This PR replaces Vector with fluent-bit. We don't know that fluent-bit is necessarily better than Vector, but it is worth a shot. Vector seems very nice, and its configuration format is certainly far more elegant and powerful than that of fluent-bit, but as long as logs consistently make it to Stackdriver, the one-time cost of the extra configuration complexity is worth it.

NOTE: This PR also includes a random, unrelated fix to the script that creates virtual platform nodes. If the directory /etc/kubernetes/manifests does not exist, the kubelet spams its log every few seconds with messages complaining about the missing directory. The change just creates the directory if it doesn't already exist.
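
For illustration only, the "create it if it is missing" idea can be sketched as below. This is not the script from this PR (which is not shown here and may well be written in another language); the function name `ensure_manifests_dir` is hypothetical, and the demo runs against a temporary directory rather than the real /etc/kubernetes/manifests.

```python
import os
import tempfile


def ensure_manifests_dir(path="/etc/kubernetes/manifests"):
    """Create the directory (and any missing parents) if it does not exist.

    Hypothetical helper: calling it again when the directory already
    exists is a no-op, so it is safe to run on every node setup.
    """
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demonstrate against a throwaway root instead of the real /etc.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "etc", "kubernetes", "manifests")
        ensure_manifests_dir(target)
        ensure_manifests_dir(target)  # second call does nothing
        print(os.path.isdir(target))
```

The `exist_ok=True` flag is what makes the call idempotent, which matches the "if it doesn't already exist" behavior described above.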


nkinkade added 29 commits March 25, 2022 14:07
Also, uses a debug image of fluent-bit, which should have a shell for
debugging.
Also, changes name of secret data for fluentbit
fluent-bit's filtering is not rich enough to support combining && and
|| logic into a single INPUT, so this commit splits the single INPUT
into several others so that we can filter "kernel" messages of
particular priorities.
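
The reasoning behind that split can be modeled roughly as follows. This is a toy sketch in Python, not fluent-bit's implementation and not the configuration from this PR: it assumes each input ANDs its own field filters and that a record is collected if any input accepts it, which gives OR across inputs. `_TRANSPORT` and `PRIORITY` are standard journal field names, but the specific priority values chosen here are assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]
# One "input" is a set of field=value requirements that must ALL hold (AND).
InputFilter = Dict[str, str]


def matches_input(record: Record, input_filter: InputFilter) -> bool:
    """True if the record satisfies every requirement of this one input."""
    return all(record.get(key) == value for key, value in input_filter.items())


def collected(record: Record, inputs: List[InputFilter]) -> bool:
    """True if ANY input accepts the record (OR across inputs)."""
    return any(matches_input(record, f) for f in inputs)


# Assumed example: kernel messages at priorities 0 through 3, expressed as
# (kernel AND prio 0) OR (kernel AND prio 1) OR ... using one input each.
INPUTS: List[InputFilter] = [
    {"_TRANSPORT": "kernel", "PRIORITY": str(priority)} for priority in range(4)
]

if __name__ == "__main__":
    print(collected({"_TRANSPORT": "kernel", "PRIORITY": "2"}, INPUTS))
    print(collected({"_TRANSPORT": "kernel", "PRIORITY": "6"}, INPUTS))
```

Splitting into one input per AND-group is what lets a mixed AND/OR expression be written even when a single input can only combine its filters one way.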
This is to allow the use of the "Use_Kubelet" feature where fluent-bit
will get k8s metadata for logs from the local kubelet instead of having
to request the data from the API server.
This option currently requires hostNetwork=true, which is a non-starter.
Earlier I was getting an error that the CRI parser was already defined,
but now I'm seeing an error that the CRI parser is not registered.
fluent-bit docs say that it is built into fluent-bit. We'll see what
happens again.

Also, modifies tags for various inputs and outputs.
Also sets Read_From_Tail for systemd input plugin, which causes the
plugin to read only new entries in journal on startup instead of trying
to ingest the entire journal.

Also, one small unrelated fix to creating virtual sites.
Also adds a comment about why we have so many systemd inputs.
@nkinkade nkinkade requested a review from robertodauria March 29, 2022 22:39
Contributor

@robertodauria robertodauria left a comment

:lgtm: - Thanks.

Reviewed 5 of 6 files at r1.
Reviewable status: 0 of 1 approvals obtained

@nkinkade nkinkade merged commit 6de33c1 into master Mar 30, 2022
@nkinkade nkinkade deleted the sandbox-kinkade branch March 30, 2022 16:53
nkinkade added a commit that referenced this pull request Dec 19, 2022
This is more or less the inverse of this closed PR from about 9 months
ago: #665

fluent-bit has _not_ turned out to be more reliable than Vector, or at
least not any better. fluent-bit's reliability today is at least as bad
as Vector's was a year ago, and Vector's configuration and documentation
are worlds better than fluent-bit's. The bugs that seemed like they
might be responsible for the unreliability of Vector have all
apparently been resolved. Let's try Vector again.
nkinkade added a commit that referenced this pull request Dec 20, 2022
nkinkade added a commit that referenced this pull request Dec 21, 2022
* Replaces fluent-bit with Vector

This is more or less the inverse of this closed PR from about 9 months
ago: #665

fluent-bit has _not_ turned out to be more reliable than Vector, or at
least not any better. fluent-bit's reliability today is at least as bad
as Vector's was a year ago, and Vector's configuration and documentation
are worlds better than fluent-bit's. The bugs that seemed like they
might be responsible for the unreliability of Vector have all
apparently been resolved. Let's try Vector again.

* Fixes a syntax error and removes condition.type

The default type is "vrl", and a shorthand is to just make the value of
condition be a string that represents the filter.

* Adds a data_dir config for Vector

* Fixes the VectorMissing alert

Vector is not configured to export any metrics, so the usual
`up{deployment="vector"}` expression does not work. However,
kube-state-metrics still knows about it.

* Moves helm values override files to top level dir

Previously, the helm values override files were mixed in with various
jsonnet config files in the config/ directory. Since helm is its own
thing, this commit creates a new top-level directory named "helm/" and
moves all the value override files to separate subdirectories of that
directory.