Log Collection Proof of Concept: Validate Chained Approach #955
Comments
One option for log file tail benchmarking: https://github.com/awslabs/amazon-log-agent-benchmark-tool |
Fluent Forward can be used over TCP or a unix socket, we'd probably want the latter, since it is supposed to be more efficient. fluent/fluent-bit#2181 |
We may still want to use TCP since unix sockets are not available on Windows (or we can choose depending on the platform). |
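A minimal sketch of the "choose depending on the platform" idea from the two comments above. The socket path and the function name are hypothetical, picked only for illustration; port 24224 is the conventional Forward port. This is not how either Fluent Bit or the Collector actually selects a transport.

```python
import sys

# Hypothetical defaults for illustration; neither value comes from this thread.
UNIX_SOCKET_PATH = "/var/run/fluent-forward.sock"
TCP_ENDPOINT = ("127.0.0.1", 24224)  # 24224 is the conventional Forward port


def choose_forward_transport(platform=sys.platform):
    """Return ("unix", path) where unix sockets are assumed usable, else ("tcp", (host, port))."""
    # startswith() matters here: "darwin" contains "win" but is not Windows.
    if platform.startswith("win"):
        return ("tcp", TCP_ENDPOINT)
    return ("unix", UNIX_SOCKET_PATH)
```

Calling `choose_forward_transport()` with no argument uses the platform of the running interpreter, so the same agent configuration logic could fall back to TCP on Windows and prefer the unix socket elsewhere.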
Vector seems like a better and more flexible lightweight forwarder than fluentbit IMO. Is there a reason we want to go with fluentbit? |
@jkowall I’m not an OpenTelemetry maintainer; I’m not in charge here. I’m just trying to help. But I think that the reasoning is that Fluent Bit is part of an established, graduated CNCF project, is widely used in production (160 million image downloads on Docker Hub for the main distro, considerable usage in other builds too), and has a good set of logging inputs that can be used. As I understand it, the plan here is not to certify Fluent Bit as the OT Logging implementation. The OT Collector will be the unified telemetry agent for OT. However, right now it is primarily focused on metrics and tracing, and that work will continue to be prioritized for some time. We still want it to be a unified collector for telemetry on the timeline of the OT GA at the end of this year. So the proposal is that we implement support for the log data model/data type in the OT collector, but use Fluent Bit for its input plugins (forward to the collector). That way, we don’t have to spend effort on writing a bunch of log receivers right away. I suspect (and propose) that the OT Collector Logging implementation will fall into phases:
All of that being said, Tigran and the other OT folks are very data-driven. I have immense respect for everyone whom I have met thus far in this community. I am sure they will happily review and consider alternate proposals. This idea is just a proposal; it may or may not actually happen. |
I quite like the idea of using Fluent forward protocol. I think that's very useful as a receiver even if Fluent Bit is not bundled with the collector. It means that you could run Fluent Bit or Fluentd on a set of hosts, and forward to the Collector running as an "Aggregator" on another host. It also means you can integrate with the Fluentd Docker Log Driver. |
@jkowall FluentBit was selected as the most suitable currently available option as a companion to OpenTelemetry Collector for multiple reasons (Wesley listed several). This is not intended to be a logging agent comparison initiative. We do not have the capacity to do a PoC with multiple logging agents in parallel, so we went with the one that looks most promising based on preliminary research. If the PoC shows FluentBit is not a good fit we will consider another agent. We also have no desire or intent to prevent any other logging agents from being integrated and used in conjunction with OpenTelemetry Collector. Collector is explicitly built with extensibility in mind. Anyone who wishes can replicate the PoC and the steps listed using a different logging agent and is welcome to share the results. |
Sounds reasonable to me @tigrannajaryan and @PettitWesley I just could see that having better data pipelines available in a heavy forwarder (fluentd) versus a lightweight forwarder (fluent bit) would seem like a beneficial tradeoff. Keeping things in CNCF is fine, but not a good technical reason to select one technology to standardize upon versus another. I think both solutions would work fine and likely would fit into this pipeline easily. |
Did anybody take a look at Grafana/loki (https://grafana.com/oss/loki/) with Grafana/Promtail? |
@tigrannajaryan I think the majority of this is done. Any progress update here? Maybe consider closing this and opening smaller issues if only small things are left. |
I'm going to do a bit more performance testing on the fluentbit standalone vs fluentbit+collector combination to get some more exact numbers on performance. That is the main thing remaining from the list in the description. |
Performance test results are outlined in https://docs.google.com/document/d/1uMO0DRlesOMGTjla4Ucq1W0wlnepc8DLe3tPJBvUZo4/edit?usp=sharing. Comments welcome. |
Just to summarize a few important points from the test results document:
I believe the tests prove the validity of the approach. It certainly is usable for many use cases and we can recommend it as the default OpenTelemetry approach for now. Over time we may gradually add more log collection capabilities directly in the Collector, thus eliminating the need to use Fluent Bit in certain cases where it is important. |
Hi, |
@sl1316 I have not tried this myself but it might help: https://medium.com/opentelemetry/introducing-the-fluentbit-exporter-for-opentelemetry-574ec133b4b4 |
We want to validate that chaining FluentBit and OpenTelemetry Collector
is a viable approach to support collection of logs in formats that FluentBit
supports and which we would like to avoid implementing directly in Collector.
The chained approach means that FluentBit collects logs from log files (and
possibly from other sources) and sends them to the Collector, and the Collector
then sends the logs to the backend.
We will use Fluentd Forward Protocol v1 to send logs from FluentBit to Collector.
We will use OTLP to send logs from Collector to the backend.
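To make the FluentBit-to-Collector hop more concrete: the Forward protocol carries MessagePack-encoded entries, and in "Message" mode a single event is the array `[tag, time, record]`. The sketch below hand-encodes one such entry. It is an illustration only, under several assumptions: the function names are hypothetical, the encoder handles just short strings, a 32-bit integer timestamp, and small string-to-string records, and a real sender would use a MessagePack library rather than this code.

```python
import struct


def _pack_str(s):
    """Encode a short string as MessagePack fixstr or str 8."""
    data = s.encode("utf-8")
    if len(data) <= 31:
        return bytes([0xA0 | len(data)]) + data   # fixstr
    if len(data) <= 255:
        return bytes([0xD9, len(data)]) + data    # str 8
    raise ValueError("string too long for this sketch")


def _pack_uint32(n):
    """Encode an integer as MessagePack uint 32 (big-endian)."""
    return b"\xce" + struct.pack(">I", n)


def pack_forward_message(tag, timestamp, record):
    """Encode one event as a Forward 'Message' mode entry: [tag, time, record]."""
    if len(record) > 15:
        raise ValueError("record too large for this sketch")
    out = bytes([0x93])                            # fixarray with 3 elements
    out += _pack_str(tag)
    out += _pack_uint32(timestamp)
    out += bytes([0x80 | len(record)])             # fixmap header
    for key, value in record.items():
        out += _pack_str(key) + _pack_str(value)
    return out
```

For example, `pack_forward_message("app.log", 1, {"log": "hi"})` produces a 22-byte payload that a Forward receiver would decode back into the tag, the timestamp, and the record map.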
The primary concerns that need to be clarified are:
What is the performance impact of the chained approach? How much more CPU and RAM
is used by Collector when logs are passed through it compared to the scenario
where the same logs are collected by FluentBit and sent directly to the backend?
How much latency (delay in log delivery) is added by Collector?
What is the impact of Collector crashing? How many logs are queued in memory
and will be lost? Is queuing necessary, or can it be minimized/avoided to minimize
the losses? Can we run with a 0-sized queue in Collector and rely on queuing and
batching done by FluentBit?
If queuing is necessary what is the impact of adding persistent queues that
survive the crash and restart of Collector?
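The queuing concerns above can be reasoned about with a toy model. The sketch below is a hypothetical simulation, not the Collector's actual queue: a bounded in-memory queue either accepts a record or rejects it (backpressure to the sender), and a crash loses whatever is still queued. All class and function names are made up for illustration.

```python
from collections import deque


class BoundedQueue:
    """Toy in-memory queue sitting between a receiver and an exporter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def offer(self, record):
        """Accept a record if there is room; otherwise signal backpressure."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(record)
        return True

    def crash(self):
        """Simulate a crash: everything still queued in memory is lost."""
        lost = len(self.items)
        self.items.clear()
        return lost


def simulate(capacity, incoming):
    """Offer `incoming` records with no consumer running, then crash."""
    queue = BoundedQueue(capacity)
    rejected = sum(0 if queue.offer(r) else 1 for r in range(incoming))
    return {"rejected": rejected, "lost_on_crash": queue.crash()}
```

In this model, `simulate(100, 250)` loses 100 records on a crash, while `simulate(0, 250)` loses none but rejects every record, which pushes all buffering and retrying back to the sender. That is the trade-off behind the question of running with a 0-sized queue and relying on FluentBit's own queuing.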
In order to clarify these concerns the following tasks need to be performed:
logsproto
logsproto
logs exporter in Collector (add to otlpexporter).
testbed to automate it.
chained.
Note: logsproto is the experimental version of the OTLP Logs Protocol. A receiver and exporter for this protocol will be added to the existing otlp receiver and exporter, but the implementation is experimental and subject to change, so we will not document for end users how it is configured and enabled (developer documentation is still necessary).