[WIP] Do not merge - collecting performance patches for scale testing #9436

Draft
wants to merge 6 commits into
base: main

Commits on Nov 13, 2020

  1. 4192bd2
  2. 7342826

Commits on Nov 24, 2020

  1. deploymentwatcher: limit before snapshotting state

    Prior to this change, getAllocs work was split before and after the
    limiter. This change moves all of that work after the limiter.
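
The ordering described above can be sketched as follows. This is a minimal illustration, not the deploymentwatcher code: `waiter`, `limitedQuery`, `recordingLimiter`, and `demoOrder` are hypothetical names, and the snapshot is reduced to a plain function. The only point it makes is that the limiter wait happens first, so no state is snapshotted while the caller is still waiting for its turn.

```go
package main

import "context"

// waiter is a hypothetical interface for the rate limiter. The Limiter type
// in golang.org/x/time/rate has a Wait method of this shape.
type waiter interface {
	Wait(ctx context.Context) error
}

// limitedQuery waits on the limiter first and only then takes the snapshot,
// so all of the query work happens after the limiter.
func limitedQuery(ctx context.Context, l waiter, snapshot func() (string, error)) (string, error) {
	if err := l.Wait(ctx); err != nil {
		return "", err
	}
	return snapshot()
}

// recordingLimiter is a test double that records when Wait was called.
type recordingLimiter struct{ log *[]string }

func (r recordingLimiter) Wait(ctx context.Context) error {
	*r.log = append(*r.log, "wait")
	return ctx.Err()
}

// demoOrder returns the order in which the limiter and the snapshot ran.
func demoOrder() []string {
	var log []string
	snapshot := func() (string, error) {
		log = append(log, "snapshot")
		return "allocs", nil
	}
	if _, err := limitedQuery(context.Background(), recordingLimiter{&log}, snapshot); err != nil {
		log = append(log, "error")
	}
	return log
}
```

Calling `demoOrder()` exercises the sketch with a limiter that never blocks; a real limiter would block in `Wait` until the caller is allowed to proceed.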
    
    **WIP**
    
    I'm unclear how to test this change or assert its benefits, so I'm
    leaving it as a draft for now unless others have ideas.
    schmichael committed Nov 24, 2020
    40cb90c
  2. 59ca810
  3. client: always wait 200ms before sending updates

    Always wait 200ms before calling the Node.UpdateAlloc RPC to send
    allocation updates to servers.
    
    Prior to this change we only reset the update ticker when an error was
    encountered. This meant the 200ms ticker was running while the RPC was
    being performed. If the RPC was slow due to network latency or server
    load and took >=200ms, the ticker would tick during the RPC.
    
    Then on the next loop iteration the select would randomly choose between
    the two viable cases: receive an update or fire the RPC again.
    
    If the RPC case won it would immediately loop again due to there being
    no updates to send.
    
    When the update chan receive is selected a single update is added to the
    slice. The odds are then 50/50 that the subsequent loop will send the
    single update instead of receiving any more updates.
    
    This could cause a couple of problems:
    
    1. Since only a small number of updates are sent, the chan buffer may
       fill, applying backpressure, and slowing down other client
       operations.
    2. The small number of updates sent may already be stale and not
       represent the current state of the allocation locally.
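
The loop behavior this commit aims for can be sketched roughly as below. This is an assumption-laden illustration rather than the client code: `sendBatches` and `demo` are hypothetical names, the `send` callback stands in for the Node.UpdateAlloc RPC, and the demo uses a 10ms wait instead of 200ms to keep it quick. The timer is re-armed only after `send` returns, so a slow send cannot leave a tick pending for the next select.

```go
package main

import (
	"fmt"
	"time"
)

// sendBatches is a hypothetical stand-in for the client's update loop.
// It appends every update received on updateCh to a batch and, each time
// the timer fires, hands the batch to send. The timer is re-armed only
// after send returns, so the full wait always elapses between sends.
func sendBatches(updateCh <-chan string, wait time.Duration, send func([]string), stop <-chan struct{}) {
	timer := time.NewTimer(wait)
	defer timer.Stop()

	var batch []string
	for {
		select {
		case <-stop:
			return
		case u := <-updateCh:
			batch = append(batch, u)
		case <-timer.C:
			if len(batch) > 0 {
				send(batch) // stands in for the Node.UpdateAlloc RPC
				batch = nil
			}
			// The timer has fired and its channel was drained by this
			// case, so calling Reset here is safe.
			timer.Reset(wait)
		}
	}
}

// demo queues n updates (n must be > 0), simulates an RPC that is slower
// than the wait, and returns the size of each batch that was sent.
func demo(n int) []int {
	updateCh := make(chan string, n)
	for i := 0; i < n; i++ {
		updateCh <- fmt.Sprintf("alloc-%d", i)
	}

	stop := make(chan struct{})
	var sizes []int
	sent := 0
	send := func(batch []string) {
		time.Sleep(30 * time.Millisecond) // slower than the 10ms wait below
		sizes = append(sizes, len(batch))
		sent += len(batch)
		if sent == n {
			close(stop) // everything was delivered, so end the loop
		}
	}

	sendBatches(updateCh, 10*time.Millisecond, send, stop)
	return sizes
}
```

Because the updates in `demo` are queued before the first wait elapses, they would normally go out together in one batch; the sum of the returned sizes equals `n` either way.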
    
    A risk here is that it's hard to reason about how this will interact
    with the 50ms batches on servers when the servers are under load.
    
    A further improvement would be to remove the alloc update chan
    completely and instead use a mutex to build a map of alloc updates. I
    wanted to test the lowest-risk change possible on loaded servers first
    before making more drastic changes.
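
One possible shape for the mutex-and-map idea is sketched below. It is not part of this PR: `pendingUpdates`, `Put`, and `Drain` are hypothetical names, and the alloc status is reduced to a string. Writers never block on a full buffer, and a newer update for the same alloc overwrites an unsent older one, which would address both problems listed in the commit message.

```go
package main

import "sync"

// pendingUpdates is a hypothetical replacement for the alloc update chan.
// It keeps only the latest unsent update per alloc ID.
type pendingUpdates struct {
	mu      sync.Mutex
	updates map[string]string // alloc ID -> latest status
}

func newPendingUpdates() *pendingUpdates {
	return &pendingUpdates{updates: make(map[string]string)}
}

// Put records the latest status for an alloc, replacing any older update
// that has not been sent yet. It never blocks on a full buffer.
func (p *pendingUpdates) Put(allocID, status string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.updates[allocID] = status
}

// Drain returns every pending update and leaves the set empty, ready for
// the sender to ship as one batch.
func (p *pendingUpdates) Drain() map[string]string {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := p.updates
	p.updates = make(map[string]string)
	return out
}
```

A sender would call `Drain` once per wait interval and skip the RPC when the returned map is empty.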
    schmichael committed Nov 24, 2020
    d7f9285
  4. 8acaa32