[MRG] add "sticky builds" functionality #949
Conversation
The ranking is computed using the rendezvous hashing algorithm. In the abstract this can be used for all sorts of stateless ranking; however, this PR is part of #946 (comment) and provides a method for ranking possible nodes to schedule build pods on. The goal is to eventually have a setup where rendezvous_rank() is called just before submitting the build pod spec, to compute a preferred node for the pod.

Let me know what you think of building this in small PRs. I'd propose to review and merge this without having to build out the rest.
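For readers following along, here is a minimal sketch of what such a ranking function could look like. The name rendezvous_rank() comes from the PR; the SHA-256 hash and the exact "bucket-key" concatenation are illustrative assumptions, not necessarily what the PR implements:

import hashlib

def rendezvous_rank(buckets, key):
    """Rank buckets for a key using rendezvous (highest-random-weight) hashing."""
    ranking = []
    for bucket in buckets:
        # hash the combined "bucket-key" string so every pair gets
        # an independent pseudo-random score
        score = int(hashlib.sha256(f"{bucket}-{key}".encode()).hexdigest(), 16)
        ranking.append((score, bucket))
    # highest score first: the first entry is the preferred bucket
    return [bucket for score, bucket in sorted(ranking, reverse=True)]

Because each (bucket, key) pair is scored independently, adding or removing a bucket only moves the keys that bucket wins or loses; the relative order of all other buckets is unchanged, which is what makes builds "sticky".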
I'm happy to do this in small PRs if that feels right. It does make for more reviewable, bite-size chunks, but at the same time it makes it harder to figure out "why is it like this?" when the answer is in an as-yet-unwritten PR. So my rough idea for breaking this into smaller PRs:
So if I were breaking this into bits, I would probably do it the other way around: start with the PR where we add the ability to pick a node, with a no-op implementation to start, then the PR implementing a strategy using rendezvous hashing, etc.

That makes sense. I'll try again with a second PR from the other end.
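As an illustration of that "no-op first, strategy later" ordering, the split could look something like the sketch below. The class and method names here are invented for illustration and are not from the PR; it reuses the rendezvous_rank() sketch above:

class NodeSelector:
    """Default hook: express no preference, defer to the scheduler."""

    def select(self, nodes, key):
        # no-op implementation: the first PR only adds this hook
        return None


class RendezvousNodeSelector(NodeSelector):
    """Follow-up PR: prefer the top-ranked node from rendezvous hashing."""

    def select(self, nodes, key):
        return rendezvous_rank(nodes, key)[0]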
This PR ended up being really fun to review; I thought a lot about many things and learned more about BinderHub. I found some ideas for improvements that I hope are concrete enough to apply, if you agree they make sense!

Notes
About the algorithm tests

Currently we have two tests. I think it is fine to have multiple tests, ordered from simple to hard: that way we fail first on easy-to-understand test cases, but also ensure statistically that the algorithm is robust by ending with a strong test.
def test_rendezvous_redistribution():
    # check that approximately a third of keys move to the new bucket
    # when one is added
An idea for how to test that keys move when a new bucket becomes available, and that the pattern of movement is right. I think this is how it should be, but I'm not sure. WDYT?
Excellent!
Since we actually hash "node-key" pairs, this test doesn't need to be run multiple times. But if we hashed nodes separately from keys, as consistent hashing does (in contrast to our rendezvous hashing), then we could by fluke have two node hashes spaced so luckily that the new node catches exactly 1/3 of the keys.

But hmmm, could you position nodes like the hands of a clock so that they have a fair share initially and also a fair share afterwards?

That's a clean test to have as well, I think: check that we have a 1/2 distribution initially and then a 1/3 distribution after. Combined with the previous test about the mapping being perfectly stable, we capture all kinds of logic. You are already doing this to some degree, but it is captured mostly by the abs(from_b1 - from_b2) < 10 statement.
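A sketch of the distribution test described above could look like this, reusing the rendezvous_rank() sketch from earlier; the key count and the slack of 100 are illustrative choices, not from the PR:

def test_rendezvous_distribution():
    # with two buckets each should win about half of the keys,
    # with three buckets about a third each
    n_keys = 3000
    keys = [str(i) for i in range(n_keys)]

    for buckets in (["b1", "b2"], ["b1", "b2", "b3"]):
        counts = {b: 0 for b in buckets}
        for key in keys:
            counts[rendezvous_rank(buckets, key)[0]] += 1
        fair_share = n_keys / len(buckets)
        for bucket, count in counts.items():
            # allow a few standard deviations (~27 here) of slack
            # around the fair share
            assert abs(count - fair_share) < 100, (bucket, count)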
binderhub/tests/test_utils.py
    assert 0.31 < n_moved / n_keys < 0.35
    # keys should move from the two original buckets with approximately
    # equal probability
    assert abs(from_b1 - from_b2) < 10
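For context, a self-contained version of the whole test might read like the sketch below, again reusing the rendezvous_rank() sketch from above. The 3000-key count is inferred from the sqrt(3000) mentioned later in this thread, and the < 30 threshold is the one the discussion below converges on:

def test_rendezvous_redistribution():
    # check that approximately a third of keys move to the new bucket
    # when one is added, and that they come from both old buckets
    n_keys = 3000
    keys = [str(i) for i in range(n_keys)]

    before = {key: rendezvous_rank(["b1", "b2"], key)[0] for key in keys}
    after = {key: rendezvous_rank(["b1", "b2", "b3"], key)[0] for key in keys}

    n_moved = from_b1 = from_b2 = 0
    for key in keys:
        if before[key] != after[key]:
            # keys may only ever move *to* the new bucket
            assert after[key] == "b3"
            n_moved += 1
            if before[key] == "b1":
                from_b1 += 1
            else:
                from_b2 += 1

    # about a third of all keys should end up in the new bucket
    # (the comments below note this range is on the narrow side)
    assert 0.31 < n_moved / n_keys < 0.35
    # keys should move from the two original buckets with roughly
    # equal probability; ~sqrt(1000) is the natural scale here
    assert abs(from_b1 - from_b2) < 30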
Nice that you added a redistribution test; it makes me happy to see that this really works as we expect!

Assuming ~1000 keys move, each from either b1 or b2 like a fair coin flip, the typical (standard deviation) difference between the two counts will be sqrt(1000), which is about 31.6.
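A quick simulation illustrates that scale (purely illustrative, not part of the PR):

import random
import statistics

# treat each of ~1000 moved keys as a fair coin flip deciding whether
# it came from b1 (+1) or b2 (-1); from_b1 - from_b2 is then a
# 1000-step random walk
diffs = [
    sum(random.choice((-1, 1)) for _ in range(1000))
    for _ in range(10_000)
]

# the standard deviation of the walk is sqrt(1000) ~= 31.6, so a
# threshold like abs(from_b1 - from_b2) < 10 would fail most runs
print(statistics.stdev(diffs))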
I rounded it to 30. My feeling is that we just need the right order of magnitude here: a threshold that fails the test when something really weird is happening, while minimising false alarms.
This currently LGTM!! Awesome work on this Tim!

I left some notes on the redistribution test, but since the narrow ranges passed by luck we can stick with them if you want!
    assert abs(from_bucket["b1"] - from_bucket["b2"]) < 30
    # the initial distribution of keys should be roughly the same
    # We pick 30 because it is "about right"
    assert abs(start_in["b1"] - start_in["b2"]) < 30
The scale of "about right" differs between the from_bucket and start_in differences. On average, the random-walk distance if you flip +1 or -1 over and over is sqrt(N). For the from_bucket case that is sqrt(~1000) ≈ 32, and for the start_in case it is sqrt(3000) ≈ 55.
It keeps my LGTM if you want to self-merge; I don't want to merge it myself though, as you may want to finish up some details.

Thanks for the ideas and patience!! I will try to find time today to deploy this on staging and watch what happens.