Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix value of system_config.workers at run_configure #4066

Conversation

abetomo
Copy link
Contributor

@abetomo abetomo commented Feb 20, 2023

Which issue(s) this PR fixes:
Fixes #4051

What this PR does / why we need it:

Override system_config only if conf.target_worker_ids.size >= 1.

  • If <worker N> or <worker 0-N> is not set, conf.target_worker_ids is empty.
  • If <system> workers N </system> is not set, system_config.workers defaults to 1.

Docs Changes:

Release Note:

  • Fix value of system_config.workers at run_configure
    • (Fix bug that system_config.workers value can be wrongly 1 at config check at startup.)
  • Change argument type of Fluent::Plugin::Base::configure() to Fluent::Config::Element only.
    • (Note: This doesn't change the specification. It was a bug that a variable whose type is not Fluent::Config::Element could be passed to Fluent::Plugin::Base::configure().)

@daipom daipom self-requested a review February 20, 2023 23:45
@daipom
Copy link
Contributor

daipom commented Feb 21, 2023

Thanks for the fix!

The status of this PR is Draft. Please let me know which part is TODO.

@abetomo
Copy link
Contributor Author

abetomo commented Feb 21, 2023

Thanks for your comments!
I will add some tests.

@daipom
Copy link
Contributor

daipom commented Feb 21, 2023

I will add some tests.

Thanks! We need tests for this issue!

In addition, this problem is difficult. Hard to know what was wrong and how we should fix it.
I want to clarify it.

The specification history related to this is as follows.

What went wrong in which process?

@abetomo
Copy link
Contributor Author

abetomo commented Feb 21, 2023

#2651 is related.

If there is no <worker> directive, conf.for_this_worker? is false.
However, if Fluent::Engine.supervisor_mode is true, it will overwrite system_config.workers regardless of conf.for_this_worker?.

Since conf.target_worker_ids is empty when there is no <worker> directive, this fix checks for it and does not override system_config.workers.

@daipom
Copy link
Contributor

daipom commented Feb 21, 2023

Thanks so much!
It certainly seems that #2651 has something to do with this problem.

There was nothing wrong at the point of #1507 and #2292.
(The method of rewriting the value of the system config in the process seems a bit too complicated, but I guess it was inevitable since SystemConfig.workers was used to determine if the plugin works in the multi-worker or not.
It seems to me that this complicated specification could be the cause of bugs in the future..., anyway, it is not directly related to this issue.)

I will see #2651!

@daipom
Copy link
Contributor

daipom commented Feb 22, 2023

It is hard to understand the purpose of code changes in #2651...

I'm concerned about the following points.

I don't think all the points should be solved now.
I just noted points of concern as I work on this issue.
If you have noticed anything in the process of making this fix, I'd like to know.

The current fix may be enough to solve the issue #4051, but it may be better to leave some code comments for the future.

@abetomo
Copy link
Contributor Author

abetomo commented Feb 22, 2023

Thanks for your comment!
I'll look into it.

@abetomo
Copy link
Contributor Author

abetomo commented Feb 23, 2023

As for #2292 condition, I too think @daipom's code is enough.

@daipom
Copy link
Contributor

daipom commented Feb 23, 2023

Thanks!

When Fluent::Engine.supervisor_mode is true, what is the expected value of workers?
* I think we have to validate the config in one process in this case, but how many workers each plugin will work with will change depending on the setting.
For example, plugin-A works with worker-1 and worker-2, and plugin-B works with only worker-3.
With the current structure, would it be difficult to properly reflect and validate the number of workers?

I have checked this, and I understand this fix allows us to validate the config with the correct workers value!

I used this config.

<system>
  workers 8
</system>

<worker 0>
  <source>
    @id A
    @type sample
    tag test
  </source>
</worker>

<worker 1-2>
  <source>
    @id B
    @type sample
    tag test
  </source>
</worker>

<source>
  @id C
  @type sample
  tag test
</source>

<match test.**>
  @type stdout
</match>

I added a log for debugging to in_sample.

--- a/lib/fluent/plugin/in_sample.rb
+++ b/lib/fluent/plugin/in_sample.rb
@@ -63,6 +63,7 @@ module Fluent::Plugin
 
     def configure(conf)
       super
+      log.info id: @id, workers: system_config.workers
       @sample_index = 0
       config = conf.elements.select{|e| e.name == 'storage' }.first

The result of master branch:

2023-02-24 00:20:14 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-02-24 00:20:14 +0900 [info]: parsing config file is succeeded path="../work/fluent.conf"
...
2023-02-24 00:20:14 +0900 [info]: [C]  id="C" workers=1
2023-02-24 00:20:14 +0900 [info]: [A]  id="A" workers=1
2023-02-24 00:20:14 +0900 [info]: [B]  id="B" workers=2
...
2023-02-24 00:20:14 +0900 [info]: starting fluentd-1.15.3 pid=1286518 ruby="3.2.0"
...
2023-02-24 00:20:15 +0900 [info]: #7 [C]  id="C" workers=8
...
2023-02-24 00:20:15 +0900 [info]: #1 [B]  id="B" workers=2
...
2023-02-24 00:20:15 +0900 [info]: #0 [A]  id="A" workers=1
...

Before starting Fluentd, workers value of C is wrong.

The expected workers values before stating Fluentd are as follows.

[C]  id="C" workers=8
[A]  id="A" workers=1
[B]  id="B" workers=2

When applying this fix, these values are as expected!

@daipom
Copy link
Contributor

daipom commented Feb 23, 2023

Summarize what we've learned so far.

As for #2292 condition, I too think @daipom's code is enough.

def for_this_worker?
@target_worker_ids.include?(Fluent::Engine.worker_id)
end

if Fluent::Engine.supervisor_mode || (conf.respond_to?(:for_this_worker?) && conf.for_this_worker?)
workers = if conf.target_worker_ids && !conf.target_worker_ids.empty?
conf.target_worker_ids.size
else
1
end

Originally, there was no need to overwrite 1.

def worker_id
if @supervisor_mode
return -1
end

if Fluent::Engine.supervisor_mode || (conf.respond_to?(:for_this_worker?) && conf.for_this_worker?)

We need Fluent::Engine.supervisor_mode condition to validate the config before starting Fluentd.

However, with supervisor_mode == true, the current logic overwrites the workers of all plugins that are not inside the worker directive.
This is the bug we have to fix.

After starting Fluentd, the for_this_worker? condition prevented this bug, so it was hard to notice.

This PR completely fixes this problem.

lib/fluent/plugin/base.rb Outdated Show resolved Hide resolved
@abetomo abetomo force-pushed the fix-value-of-system_config_workers-at-run_configure branch from 355ca5f to 6438674 Compare February 25, 2023 02:22
@abetomo
Copy link
Contributor Author

abetomo commented Feb 25, 2023

conf of Fluent::TextParser.configure(conf) is Hash.
Thus, I added a conf.is_a?(Fluent::Config::Element) check.

!conf.target_worker_ids.empty? check is only necessary for Fluent::Engine.supervisor_mode == true.
So I changed to (Fluent::Engine.supervisor_mode && !conf.target_worker_ids.empty?).

@abetomo abetomo marked this pull request as ready for review February 25, 2023 02:26
@daipom
Copy link
Contributor

daipom commented Feb 26, 2023

Thanks for the fix and adding tests! I will see this soon.

lib/fluent/plugin/base.rb Outdated Show resolved Hide resolved
@abetomo
Copy link
Contributor Author

abetomo commented Feb 28, 2023

Thanks for the review!
I fixed the use of Hash in the test with another PR.
#4074

Override `system_config` only if `conf.target_worker_ids.size >= 1`.

* If `<worker N>` or `<worker 0-N>` is not set, `conf.target_worker_ids` is empty.
* If `<system> workers N </system>` is not set, `system_config.workers` defaults to 1.

Signed-off-by: abetomo <abe@enzou.tokyo>
@abetomo abetomo force-pushed the fix-value-of-system_config_workers-at-run_configure branch from 6438674 to b3a3aef Compare March 10, 2023 00:30
lib/fluent/plugin/base.rb Outdated Show resolved Hide resolved
lib/fluent/plugin/base.rb Outdated Show resolved Hide resolved
@abetomo abetomo force-pushed the fix-value-of-system_config_workers-at-run_configure branch from b3a3aef to 74c0a86 Compare March 10, 2023 00:57
test/plugin/test_base.rb Outdated Show resolved Hide resolved
Signed-off-by: abetomo <abe@enzou.tokyo>
@abetomo abetomo force-pushed the fix-value-of-system_config_workers-at-run_configure branch from 7b3315d to d691d6f Compare March 10, 2023 10:25
lib/fluent/plugin/base.rb Outdated Show resolved Hide resolved
* Change of order
* Use `conf.for_every_workers?` instead of `conf.target_worker_ids.empty?`.

Signed-off-by: abetomo <abe@enzou.tokyo>
Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thanks for fixing this difficult issue!

Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashie ashie merged commit bbb304b into fluent:master Mar 13, 2023
@ashie
Copy link
Member

ashie commented Mar 13, 2023

Thanks!

@abetomo abetomo deleted the fix-value-of-system_config_workers-at-run_configure branch March 13, 2023 02:04
@ashie ashie added this to the v1.16.0 milestone Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

system_config.workers value is wrong when validating config on launching
3 participants