Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable minion's IPC channel to aggregate results from spawned jobber processes #61468

Merged

Conversation

devkits
Copy link
Contributor

@devkits devkits commented Jan 17, 2022

What does this PR do?

Enable minion's IPC channel to aggregate results from spawned jobber processes. Use a long-running request channel in the minion parent process to communicate job results back to the master via broker-based or broker-less transport.

This is a necessary optimization at scale for transports that prefer a sustained long-running connection because connection create/dispose operations are expensive. The working assumption is that this change benefits all supported transports.

Testing Done:

  • this tests provide coverage for this use case:
    .../salt/tests/pytests/integration/minion.*

Previous Behavior

Minion would spawn a process that would run a job and create/dispose a new transport connection just to communicate one job result. This does not scale well for transports that prefer persistent connections because connection operations are expensive.

New Behavior

Minion would spawn a process that would run a job and communicate result via an IPC channel to the parent minion process. Parent minion process would use a long-running connection to communicate result back to master. This approach scales better because it reduces connection churn when thousands of minions are responding to the master.

Fixes: #61274

…processes. Use a long-running request channel in the minion parent process to communicate job results back to the master via broker-based or broker-less transport.

This is a necessary optimization for transports that prefer a sustained long-running connection because connection create/dispose operations are expensive. The working assumption is that this change benefits all supported transports.

Testing Done:
 * this tests provide coverage for this use case:
      .../salt/tests/pytests/integration/minion.*
@devkits devkits requested a review from a team as a code owner January 17, 2022 02:52
@devkits devkits requested review from garethgreenaway and removed request for a team January 17, 2022 02:52
@devkits
Copy link
Contributor Author

devkits commented Jan 18, 2022

re-run full all

@jfindlay
Copy link
Contributor

@devkits
Copy link
Contributor Author

devkits commented Jan 21, 2022

re-run full all

devkits and others added 2 commits January 26, 2022 17:28
…r events intended for the master's ReqServer. Tag these specific events with a specific tag "__master_req_channel_payload"

Update the architecture flow accordingly: https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow

Testing Done:
  - updated unit tests,
  - tested manually with zeromq
@Ch3LL Ch3LL added the Phosphorus v3005.0 Release code name and version label Jan 27, 2022
@dwoz
Copy link
Contributor

dwoz commented Feb 4, 2022

@devkits This test needs to be fixed. Otherwise, the PR looks good.

@devkits
Copy link
Contributor Author

devkits commented Feb 8, 2022

Thanks. I can reproduce the test failure locally. Looks like the fix may need be in the TCP transport itself. I will try to update the PR with the fix.

dwoz
dwoz previously approved these changes Feb 8, 2022
twangboy
twangboy previously approved these changes Feb 8, 2022
s0undt3ch
s0undt3ch previously approved these changes Feb 9, 2022
@garethgreenaway
Copy link
Contributor

@devkits Can you please add a changelog for this PR? Thanks!

@dwoz dwoz dismissed stale reviews from s0undt3ch, twangboy, and themself via 7215701 February 9, 2022 21:25
@garethgreenaway garethgreenaway merged commit df7617e into saltstack:master Feb 9, 2022
@welcome
Copy link

welcome bot commented Feb 9, 2022

Congratulations on your first PR being merged! 🎉

frebib added a commit to frebib/salt that referenced this pull request Oct 7, 2022
Since [1] minions now return all returns to all masters, instead of just
the master that spawned the job. The upstream change in behaviour
overloads our global masters making them unusable, so this aims to
revert to the previous behaviour whilst maintaining the single-channel
return improvements also introduced in [1].

[1]: saltstack#61468

Upstream-bug: saltstack#62834
Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
vzhestkov pushed a commit to openSUSE/salt that referenced this pull request Sep 1, 2023
…sc#1213257)

* Revert usage of long running REQ channel (bsc#1213960, bsc#1213630, bsc#1213257)

This reverts commits:
  saltstack/salt@a99ffb5
  saltstack/salt@80ae518
  saltstack/salt@3c7e1ec
  saltstack/salt@171926c

From this PR: saltstack/salt#61468

See: saltstack/salt#62959 (comment)

* Revert "Fix linter"

This reverts commit d09d2d3.

* Revert "add a regression test"

This reverts commit b2c32be.

* Fix failing tests after reverting commits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Phosphorus v3005.0 Release code name and version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Salt Minion should cache ReqChannels.
7 participants