feat(processors.batch): Add batch processor #15869

LarsStegman · 2024-09-11T10:21:09Z

Summary

This new processor distributes metrics across batches by adding a tag to each metric that indicates which batch it belongs to. This makes it possible to spread the load of a high number of metrics across multiple instances of the same output plugin.
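To illustrate the idea, here is a minimal sketch of round-robin batch tagging in Go. It is not the code from this PR: the `Metric` and `Batcher` types are hypothetical stand-ins for the real Telegraf interfaces, and only the atomic counter and the `batch_tag` / `batches` settings are taken from the discussion.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// Metric is a simplified, hypothetical stand-in for a Telegraf metric.
type Metric struct {
	Name string
	Tags map[string]string
}

// Batcher assigns batch numbers to metrics in round-robin order.
// The field names mirror the batch_tag and batches settings from the thread.
type Batcher struct {
	BatchTag string // name of the tag to write, e.g. "batch"
	Batches  uint64 // number of batches; must be greater than zero
	count    atomic.Uint64
}

// Apply tags every metric with the next batch number, wrapping around
// once Batches is reached.
func (b *Batcher) Apply(metrics ...*Metric) {
	for _, m := range metrics {
		// Add returns the incremented value, so subtract 1 to start at batch 0.
		n := (b.count.Add(1) - 1) % b.Batches
		// Tag values are strings, so batch 0 is stored as "0".
		m.Tags[b.BatchTag] = strconv.FormatUint(n, 10)
	}
}

func main() {
	b := &Batcher{BatchTag: "batch", Batches: 2}
	metrics := []*Metric{
		{Name: "cpu", Tags: map[string]string{}},
		{Name: "mem", Tags: map[string]string{}},
		{Name: "disk", Tags: map[string]string{}},
	}
	b.Apply(metrics...)
	for _, m := range metrics {
		fmt.Println(m.Name, m.Tags["batch"])
	}
}
```

Each output instance would then select its share of the metrics with a `tagpass` filter on the batch tag.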

Checklist

No AI generated code was used in this PR

Related issues

resolves #15621
resolves #11707

LarsStegman · 2024-09-11T10:22:38Z

@srebhan the implementation is a little different than what you suggested. I did not see the added benefit of also specifying the batch size, since any overflow would probably just overflow into the next batch, which then also overflows. Please let me know if it should still be added.

srebhan

Thanks @LarsStegman for the contribution! Just two small comments from my side. Furthermore, should we also add a force_rebatch option that will only overwrite the batch tag if it does not already exist? I'm asking because, by default, Telegraf runs each processor twice, once before and once after aggregators, if any.

plugins/processors/batch/batch.go

plugins/processors/batch/README.md

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

LarsStegman · 2024-09-11T12:32:20Z

should we also add a force_rebatch option that will only overwrite the batch tag if it does not already exist? I'm asking because, by default, Telegraf runs each processor twice, once before and once after aggregators, if any.

Hmmm interesting. The results after the second pass will indeed be different, because the processor will already have run a pass and the count will have increased. I think it is better to add that feature indeed. It will be more predictable for users.

LarsStegman · 2024-09-11T12:54:05Z

I enabled rebatching by default because it requires less computation. By default, the processor now does not check for an existing batch tag.
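The behaviour discussed here can be sketched as follows. This is an illustration under assumptions, not the merged code: `tagBatch` and its `forceRebatch` parameter are hypothetical names borrowed from the `force_rebatch` suggestion above, and the option name that was finally merged is not stated in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// tagBatch writes the batch tag and returns the next counter value.
// When forceRebatch is false, a metric that already carries the tag is
// left untouched and the counter does not advance. This is a hypothetical
// helper; batches must be greater than zero.
func tagBatch(tags map[string]string, batchTag string, batches, count uint64, forceRebatch bool) uint64 {
	if !forceRebatch {
		// The extra map lookup is the cost that rebatching-by-default avoids.
		if _, ok := tags[batchTag]; ok {
			return count
		}
	}
	tags[batchTag] = strconv.FormatUint(count%batches, 10)
	return count + 1
}

func main() {
	const batches = 3
	var count uint64
	a := map[string]string{}
	b := map[string]string{}

	// First pass, e.g. before aggregators.
	count = tagBatch(a, "batch", batches, count, true)
	count = tagBatch(b, "batch", batches, count, true)

	// Second pass, e.g. after aggregators.
	count = tagBatch(a, "batch", batches, count, true)  // tag is overwritten
	count = tagBatch(b, "batch", batches, count, false) // tag is kept

	fmt.Println(a["batch"], b["batch"], count) // prints "2 1 3"
}
```

With rebatching forced, the second pass moves metric `a` from batch 0 to batch 2, which is the unpredictability mentioned above. Without it, metric `b` keeps its original batch.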

srebhan

Two more comments. Regarding the flag, I'm fine either way but lean slightly towards your approach...

plugins/processors/batch/sample.conf

plugins/processors/batch/batch_test.go

srebhan

@LarsStegman awesome! Maybe just avoid abbreviations in config options? How about naming this just batches?

plugins/processors/batch/sample.conf

LarsStegman · 2024-09-12T09:56:14Z

@srebhan looks like the test runner timed out or something

srebhan

Thanks @LarsStegman!

telegraf-tiger · 2024-09-12T11:01:42Z

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by -8.01 % for linux amd64 (new size: 239.8 MB, nightly size 260.6 MB)

Artifact URLs

DEB	RPM	TAR GZ	ZIP
amd64.deb	aarch64.rpm	darwin_amd64.tar.gz	windows_amd64.zip
arm64.deb	armel.rpm	darwin_arm64.tar.gz	windows_arm64.zip
armel.deb	armv6hl.rpm	freebsd_amd64.tar.gz	windows_i386.zip
armhf.deb	i386.rpm	freebsd_armv7.tar.gz
i386.deb	ppc64le.rpm	freebsd_i386.tar.gz
mips.deb	riscv64.rpm	linux_amd64.tar.gz
mipsel.deb	s390x.rpm	linux_arm64.tar.gz
ppc64el.deb	x86_64.rpm	linux_armel.tar.gz
riscv64.deb		linux_armhf.tar.gz
s390x.deb		linux_i386.tar.gz
		linux_mips.tar.gz
		linux_mipsel.tar.gz
		linux_ppc64le.tar.gz
		linux_riscv64.tar.gz
		linux_s390x.tar.gz

knollet · 2024-10-01T11:25:09Z

But, to parallelize processing, this doesn't help to create batch instances of, say, some regex processor, right? It only does the round-robin tagging.

Also: Is this as efficient as it could be? With this, all the metrics still go down the same pipeline (and therefore the same Go channel) and don't split up into multiple batch processing pipelines. Every metric still has to be sorted out by a metricpass/tagpass deciding "no, this metric isn't batch-tagged for me" on every processor defined.

I have to say: I don't see this resolving any of the linked feature requests.
This just recasts a really bad duct-tape solution made from Starlark as an equally bad duct-tape solution made from Go.

There shouldn't be batching tags. Instead:

The slow processors should be spawnable in parallel.
The distribution into these parallel processor instances should be realized by multiple Go channel consumers.
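As a rough sketch of what this proposal could look like (not how Telegraf is implemented, and not part of this PR), several goroutines can consume from one shared channel. The `runParallel` helper and the use of plain strings as metrics are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// runParallel starts the given number of workers, all reading from the same
// input channel. Each item is handled by exactly one worker, so no batch tag
// or tagpass filter is needed. Output order is not preserved.
func runParallel(in <-chan string, workers int, process func(string) string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range in {
				out <- process(m)
			}
		}()
	}
	// Close the output once every worker has drained the input.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, m := range []string{"cpu", "mem", "disk", "net"} {
			in <- m
		}
	}()
	// strings.ToUpper stands in for a slow processor such as a regex step.
	for r := range runParallel(in, 2, strings.ToUpper) {
		fmt.Println(r)
	}
}
```

The processing function is defined once and the worker count is a single number, which avoids duplicating configuration per batch. The trade-off is that metric ordering is no longer guaranteed.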

LarsStegman · 2024-10-01T11:37:04Z

@knollet I agree, this processor does not fix #11707. @srebhan can that issue be reopened? I see you added it to this PR.

knollet · 2024-10-01T11:39:41Z

I mean, if I batch into, let's say, 2, I have to duplicate my complex code...

[[processors.batch]]
  batch_tag = "batch"
  batches = 2

[[processors.regex]]
  tagpass = { "batch" = [ "0" ] }  # "0", not 0... srsly?
  # ...some slow regex stuff...

# I have to replicate this here... exactly, no mistakes allowed, batches - 1 times,
# else there's not even a resemblance of parallel processing. Hard-to-catch bugs,
# even if you generate this with a templating engine like jinja.
[[processors.regex]]
  tagpass = { "batch" = [ "1" ] }  # "1", not 1... srsly?
  # ...some slow regex stuff...

LarsStegman · 2024-10-01T11:42:56Z

@knollet this processor was not meant to increase the efficiency of the processor pipeline, but to increase the output capacity at the end of the pipeline. See #15621 (comment) and the comments that follow.

knollet · 2024-10-01T12:20:45Z

Yeah, ok. I don't wanna trash-talk your contribution.
Still: don't you have the problem of having to duplicate whatever your batch processor feeds into? Whether it is a processor or an output plugin doesn't really matter, does it?

srebhan · 2024-10-01T20:14:49Z

@knollet I reopened #11707.

telegraf-tiger bot added the feat and plugin/processor labels Sep 11, 2024

LarsStegman mentioned this pull request Sep 11, 2024

Asynchronous InfluxDB V2 output #15621

Closed

feat(processors.batch): create batch processor

a8490ff

LarsStegman force-pushed the feat/processor-batch branch from 7bb64c2 to a8490ff Compare September 11, 2024 10:26

srebhan reviewed Sep 11, 2024

plugins/processors/batch/batch.go (outdated)

plugins/processors/batch/README.md (outdated)

srebhan self-assigned this Sep 11, 2024

srebhan added the new plugin label Sep 11, 2024

LarsStegman and others added 2 commits September 11, 2024 13:29

feat(processors.batch): use atomic uint64

4209ee5

feat(processors.batch): update README.md

f8c0574

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

feat(processors.batch): add option to not rebatch

3d63787

srebhan reviewed Sep 11, 2024

plugins/processors/batch/sample.conf (outdated)

plugins/processors/batch/batch_test.go (outdated)

srebhan changed the title from "feat(processors.batch): create batch processor" to "feat(processors.batch): Add batch processor" Sep 11, 2024

feat(processors.batch): fix bug and simplify tests

7e841d8

LarsStegman force-pushed the feat/processor-batch branch from d94682e to 7e841d8 Compare September 11, 2024 13:39

feat(processors.batch): make lint happy

2d80838

srebhan reviewed Sep 11, 2024

plugins/processors/batch/sample.conf (outdated)

feat(processors.batch): rename num_batches to batches

b3212e8

srebhan approved these changes Sep 12, 2024

srebhan added the ready for final review label Sep 13, 2024

srebhan assigned DStrand1 and unassigned srebhan Sep 13, 2024

DStrand1 approved these changes Sep 30, 2024

DStrand1 merged commit 338282b into influxdata:master Sep 30, 2024
27 checks passed

github-actions bot added this to the v1.33.0 milestone Sep 30, 2024

LarsStegman deleted the feat/processor-batch branch October 1, 2024 11:34

asaharn pushed a commit to asaharn/telegraf that referenced this pull request Oct 16, 2024

feat(processors.batch): Add batch processor (influxdata#15869)

5dab147
