Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Make Stanza adapter more more synchronous by removing channels and workers #35453

Open
andrzej-stencel opened this issue Sep 27, 2024 · 8 comments · May be fixed by #35669
Open
Assignees
Labels
enhancement New feature or request pkg/stanza

Comments

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Sep 27, 2024

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

This issue is created as a result of discussion in #31074. From #31074 (comment):

In my view, the current implementation of the Stanza adapter with multiple channels and asynchronous workers if overly complex without delivering a lot of value, instead creating the possibility of data loss.

and:

The conversion [from Stanza entries to OTLP log records] is currently pretty complex, involving multiple asynchronous workers and a bunch of channels.

Describe the solution you'd like

Remove the channels and workers by having LogEmitter convert Stanza entries to log records and call ConsumeLogs synchronously. Leave the 100-log buffering in place for this change, to minimize the impact. Compare benchmarks.

Describe alternatives you've considered

Leaving the 100-log buffering in place leaves the issue of losing logs still not completely resolved. Ideally we want to get rid of this buffering, but we need to step cautiously, as this may have serious performance impact. We should measure this impact and possibly explore possibilities for Stanza receivers like Filelog receiver / File consumer to emit entries in batches, not one by one. Implementing this would alleviate the possible performance impact that removing of batching in LogEmitter may introduce.

Additional context

See #31074

@andrzej-stencel andrzej-stencel added enhancement New feature or request needs triage New item requiring triage labels Sep 27, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@andrzej-stencel
Copy link
Member Author

Removing needs triage label as this was discussed with the code owner.

@andrzej-stencel andrzej-stencel changed the title [pkg/stanza] Make Stanza adapter synchronuous [pkg/stanza] Make Stanza adapter synchronous Oct 4, 2024
@bacherfl
Copy link
Contributor

bacherfl commented Oct 7, 2024

Hi! I would like to take this on if this is not taken yet

@andrzej-stencel
Copy link
Member Author

It's yours @bacherfl, thanks for picking it up!

@bacherfl
Copy link
Contributor

bacherfl commented Oct 8, 2024

Thanks @andrzej-stencel! One quick question to clarify: Currently the LogEmitter and its logChannel are also used by two other components:

Should we remove the async channels from the LogEmitter altogether and also adapt those two other components that use it, or do you think we should provide both options (async and sync) in the LogEmitter and leave those two other components unchanged for now?
Both of these other components would be fairly simple to adapt to also work synchronously though.

@andrzej-stencel
Copy link
Member Author

Good point @bacherfl. I wasn't aware that the container parser uses LogEmitter internally. Yes, ideally the logs channel should be removed from the LogEmitter altogether and both the container parser and the Logs Transform processor should use it the same way as the Stanza adapter, having the LogEmitter emit logs (almost) synchronously.

I put "almost" in the sentence above, because we want to keep using the 100-entry buffer in the LogEmitter, which makes it not completely synchronous after removing the channels and workers.

@andrzej-stencel andrzej-stencel changed the title [pkg/stanza] Make Stanza adapter synchronous [pkg/stanza] Make Stanza adapter more more synchronous by removing channels and workers Oct 9, 2024
@bacherfl
Copy link
Contributor

bacherfl commented Oct 9, 2024

Good point @bacherfl. I wasn't aware that the container parser uses LogEmitter internally. Yes, ideally the logs channel should be removed from the LogEmitter altogether and both the container parser and the Logs Transform processor should use it the same way as the Stanza adapter, having the LogEmitter emit logs (almost) synchronously.

I put "almost" in the sentence above, because we want to keep using the 100-entry buffer in the LogEmitter, which makes it not completely synchronous after removing the channels and workers.

Thanks for the clarification @andrzej-stencel - then I will proceed with also adapting the log transform processor and the container parser. I also saw that you were doing some performance evaluations in #35454, so I will also use that method to compare the changes I make and update the PR once I have the results

@bacherfl
Copy link
Contributor

alright, I think the PR is ready for a first round of reviews - I have added the performance test results to the PR description. CC @andrzej-stencel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/stanza
Projects
None yet
2 participants