platforms: rose host-select / using platform groups to manage a simple cluster of nodes #3800

Open
oliver-sanders opened this issue Sep 2, 2020 · 0 comments
oliver-sanders commented Sep 2, 2020

  • Cylc has load-balancing logic (based on psutil) which we use during suite server selection.
  • Cylc will soon have the concept of platform groups, which provide a neat, configurable way to assemble a bunch of platforms behind an abstraction.

Put these two things together and it should be pretty simple to take a bunch of Linux boxes and get Cylc to assemble them into a crude cluster with load balancing.

This isn't the most important functionality in the world, but since the required functionality is all implemented anyway, it's mostly a configuration problem. Giving this issue the "small" flag, as it will take much less time to implement than to discuss.

Putting up an issue after conversations with @dpmatthews.

Example

Say we have a bunch of hosts named my_host_1, my_host_2 ... my_host_N and we want a system to pick a host based on load average, excluding any hosts with low RAM.

With rose host-select

This functionality is currently present in the soon-to-be-deprecated rose host-select command:

[rose-host-select]
group{my_host_group}=my_host_1, my_host_2, ... my_host_N
# memory must be greater than 1GB to be considered for selection
thresholds{my_host_group}=mem:1000
# pick the server based on server load average (presumably 15min average?)
method{my_host_group}=load

To use this in a Cylc 7 workflow, call the rose host-select command in the task's host setting:

[runtime]
     [[my_task]]
         host = $(rose host-select my_host_group)

With Cylc [platforms]

First we need to define one platform per host like so:

[platforms]
    [[my_host_*]]

Then define the platform group like so:

[platforms]
    [[my_platform_group]]
        platforms = my_host_1, my_host_2, ... my_host_N  # perhaps we should support globs here?

So far so good, we now have a system which will pick one of our hosts at random.
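As a rough illustration of the random pick, and of the glob idea floated in the comment above, here is a minimal Python sketch. It is not Cylc's implementation: the platform names, `expand_group` and `pick_platform` are hypothetical, and shell-style matching via `fnmatch` is only one possible reading of "globs".

```python
import fnmatch
import random

# Hypothetical platform names, standing in for my_host_1 ... my_host_N.
DEFINED_PLATFORMS = ["my_host_1", "my_host_2", "my_host_3", "other_box"]


def expand_group(patterns, defined=DEFINED_PLATFORMS):
    """Expand each entry of a group's `platforms` list against the defined
    platform names, treating entries as shell-style globs."""
    members = []
    for pattern in patterns:
        for name in fnmatch.filter(defined, pattern):
            if name not in members:  # keep order, drop duplicates
                members.append(name)
    return members


def pick_platform(patterns, rng=random):
    """Pick one member of the group at random (no ranking yet)."""
    members = expand_group(patterns)
    if not members:
        raise ValueError(f"no platforms match {patterns!r}")
    return rng.choice(members)


group = expand_group(["my_host_*"])
chosen = pick_platform(["my_host_*"])
```

With a glob, adding a new box would only need a new platform definition; the group's membership would follow from the name.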

Next we configure load balancing using the same "ranking" system as we use for suite server selection:

[platforms]
    [[my_platform_group]]
        platforms = my_host_1, my_host_2, ... my_host_N  # perhaps we should support globs here?
        ranking = """
            # memory must be greater than 1GB to be considered for selection
            # (psutil reports available memory in bytes)
            virtual_memory().available > 1000000000
            # pick the server based on 15min server load average
            getloadavg()[2]
        """
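To make the intent of the ranking string concrete, here is a small Python sketch of one way it could be interpreted: comparison lines act as thresholds, and the remaining lines act as sort keys. It assumes lower values rank first, and it uses hard-coded, hypothetical metrics in place of real psutil calls on each host so that it runs anywhere. psutil reports memory in bytes, so the sketch writes 1GB as 1000000000. The helper names are made up for illustration.

```python
import ast
from types import SimpleNamespace

RANKING = """
    # memory must be greater than 1GB to be considered for selection
    virtual_memory().available > 1000000000
    # pick the server based on 15min server load average
    getloadavg()[2]
"""

# Hypothetical, hard-coded metrics (bytes available, load averages).
METRICS = {
    "my_host_1": {"available": 8_000_000_000, "loadavg": (0.9, 0.8, 0.7)},
    "my_host_2": {"available": 500_000_000, "loadavg": (0.1, 0.1, 0.1)},
    "my_host_3": {"available": 4_000_000_000, "loadavg": (0.5, 0.4, 0.3)},
}


def parse_ranking(text):
    """Split a ranking string into threshold and sort-key expressions."""
    thresholds, keys = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        body = ast.parse(line, mode="eval").body
        (thresholds if isinstance(body, ast.Compare) else keys).append(line)
    return thresholds, keys


def evaluate(expr, metrics):
    """Evaluate one expression against a host's metrics.

    eval() is used for brevity only; it is not safe for untrusted input.
    """
    namespace = {
        "__builtins__": {},
        "virtual_memory": lambda: SimpleNamespace(available=metrics["available"]),
        "getloadavg": lambda: metrics["loadavg"],
    }
    return eval(expr, namespace)


def select_host(ranking, all_metrics):
    """Drop hosts failing any threshold, then return the lowest-ranked host."""
    thresholds, keys = parse_ranking(ranking)
    eligible = [
        host for host, m in all_metrics.items()
        if all(evaluate(expr, m) for expr in thresholds)
    ]
    if not eligible:
        raise RuntimeError("no host passes the thresholds")
    return min(
        eligible,
        key=lambda host: tuple(evaluate(e, all_metrics[host]) for e in keys),
    )


best = select_host(RANKING, METRICS)
```

Here my_host_2 has the lowest load but is excluded by the memory threshold, so the choice falls to the least loaded of the remaining hosts.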

Just one problem: this ranking has to be "evaluated" on a host, but platform groups are configured in terms of platforms, not hosts. In this case it's fairly simple since we only have one host per platform; however, this is not guaranteed.

Problem/Solution

Options for handling this misalignment between hosts and platforms:

  1. Don't handle it at all.
    • Expect admins to configure hosts sensibly.
    • From each platform in the group pick one host for selection using the configured method (random by default).
    • If a platform in a group has more than one host, it will still work, but the results might not make sense.
  2. Require that platforms in a group have exactly one host.
    • Extra validation step each time we load in the global config.
    • If you "break" the global config, all running suites may shut down due to being unable to reload the global config.
  3. Let's just not do this in the first place.
    • Users/sites have until we formally retire rose host-select to change their working practices or implement their own infrastructure.
  4. Other?

[Personal preference for (1)]
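For comparison, options (1) and (2) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a proposed implementation: the platform-to-hosts mapping, the load figures and both function names are hypothetical, and random choice stands in for "the configured method".

```python
import random

# Hypothetical platform definitions: platform name -> list of hosts.
PLATFORMS = {
    "my_host_1": ["my_host_1"],
    "my_host_2": ["my_host_2"],
    "shared": ["login_a", "login_b"],  # more than one host
}

# Hypothetical 15min load averages, keyed by host.
LOAD_15MIN = {"my_host_1": 0.7, "my_host_2": 0.3, "login_a": 0.1, "login_b": 2.5}


def validate_single_host(platforms, group):
    """Option 2: list the platforms in the group that do not have
    exactly one host."""
    return [name for name in group if len(platforms[name]) != 1]


def select_platform(platforms, group, load, rng=random):
    """Option 1: take one host per platform (at random here), rank those
    hosts by load, and return the winning (platform, host) pair."""
    representatives = {name: rng.choice(platforms[name]) for name in group}
    best = min(representatives, key=lambda name: load[representatives[name]])
    return best, representatives[best]


simple = select_platform(PLATFORMS, ["my_host_1", "my_host_2"], LOAD_15MIN)
mixed = select_platform(PLATFORMS, list(PLATFORMS), LOAD_15MIN)
offenders = validate_single_host(PLATFORMS, list(PLATFORMS))
```

With one host per platform the result is deterministic. Once the multi-host "shared" platform joins the group, the outcome depends on which of its hosts happened to be sampled, which is the "results might not make sense" caveat under option (1).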

@oliver-sanders oliver-sanders added small question Flag this as a question for the next Cylc project meeting. labels Sep 2, 2020
@oliver-sanders oliver-sanders added this to the cylc-8.0.0 milestone Sep 2, 2020
@hjoliver hjoliver modified the milestones: cylc-8.0.0, cylc-8.x Aug 4, 2021
@oliver-sanders oliver-sanders removed the question Flag this as a question for the next Cylc project meeting. label Apr 22, 2022
@oliver-sanders oliver-sanders modified the milestones: cylc-8.x, some-day Apr 22, 2022