Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for isolation groups to matching simulator #6243

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

natemort
Copy link
Member

This fundamentally restructures how pollers and task generators are configured, but allows for granular control across isolation groups.

What changed?

  • Adds support for isolation groups and restructures poller/task generation to allow for multiple independent configurations to be executing concurrently.

Why?

  • Enables improvements in isolation group throughput

How did you test it?

  • Running the matching simulator

Potential risks

Release notes

Documentation Changes

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.95%. Comparing base (69d0ce7) to head (7aab5d6).
Report is 4 commits behind head on master.

Additional details and impacted files

see 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69d0ce7...7aab5d6. Read the comment docs.

Tasks []SimulationTaskConfiguration
}

SimulationPollerConfiguration struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonblocking:

since onebox is used for both docker/local development and simulation, what do you think about pulling this simulation-specific stuff into a simulation.go file so that it's a bit easier to see at a glance if there's any significant changes to the more general onebox config or it's just simulation config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, when I do further refactoring I'll move it out.

decisionTask := newDecisionTask(domainID, tasklist, scheduleID)
isolationGroup := ""
if len(taskConfig.getIsolationGroups()) > 0 {
isolationGroup = taskConfig.getIsolationGroups()[newTasksGenerated%len(taskConfig.getIsolationGroups())]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good happy path scenario, but you'll want to control this as a config parameter for more tricky 'happy-path' scenarios.

expected scenarios include things like:

  • there being no pollers in a zone, therefore we should see the isolation group logic re-route traffic
  • a poller was in a zone with a sticky tasklist and then went away
  • The isolation-group is being flooded due to skew

as well as pathological scenarios like:

  • traffic is dispatched to an isolation-group with too many partitions (eg partition > pollers per IG)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the scenarios use it yet, but we can configure multiple task generators to create scenarios with skew. For example, we could do something like this to send slight traffic between zones a and b while sending more significant traffic to c, which doesn't have any pollers.

  simulationconfig:
    tasklistwritepartitions: 2
    tasklistreadpartitions: 2
    forwardermaxoutstandingpolls: 1
    forwardermaxoutstandingtasks: 1
    forwardermaxratepersecond: 10
    forwardermaxchildrenpernode: 20
    localpollwaittime: 0ms
    localtaskwaittime: 0ms
    tasks:
      - numtaskgenerators: 2
        taskspersecond: 10
        maxtasktogenerate:  3000
        isolationgroups: ['a', 'b']
      - numtaskgenerators: 2
        taskspersecond: 80
        maxtasktogenerate:  3000
        isolationgroups: ['c']
    pollers:
      - isolationgroup: 'a'
        taskprocesstime: 1ms
        numpollers: 4
        polltimeout: 60s
      - isolationgroup: 'b'
        taskprocesstime: 1ms
        numpollers: 4
        polltimeout: 60s

Sticky tasklists aren't something we've looked at in detail yet but otherwise I think all those scenarios are possible to create with the new structure introduced in this PR.

This fundamentally restructures how pollers and task generators are configured, but allows for granular control across isolation groups.
@natemort natemort merged commit b3d1b66 into cadence-workflow:master Aug 22, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants