Relay Node draining logic #5

Open
6 tasks
maxwnewcomer opened this issue Sep 15, 2024 · 0 comments · May be fixed by #8

maxwnewcomer commented Sep 15, 2024

Draining logic

I believe that the following draining logic for a worker will work.

  1. Room status is UP
  2. SIGINT is sent to the relay node
  3. Update Redis with DRAINING status, with some TTL, for each room
  4. Spawn a draining worker to keep the status up to date
  5. Deny new connections
  6. Drop current connections
    • Update relay takeover logic to not take over while a DRAINING status has a valid TTL
  7. Start potential room persistence for all rooms
  8. Once room persistence is done, set the room to DOWN
    • Takeover available here
  9. On new connections, start the atomic room takeover: status SYNCING
  10. Sync with potential persistence
  11. Room ready on new node: status UP
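
The takeover rule in steps 6 through 9 can be sketched as a pure decision function. This is only an illustration: `RoomRecord`, `expires_at`, and `can_take_over` are hypothetical names, not contactor's API, and `expires_at` stands in for the Redis TTL (with real Redis key expiry, a lapsed key would simply read back as missing). Treating UP and SYNCING rooms as owned until their TTL lapses is also an assumption.

```rust
use std::time::Instant;

/// Room status values named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoomStatus {
    Up,
    Draining,
    Down,
    Syncing,
}

/// Hypothetical view of what a relay reads back for one room.
/// `expires_at` models the TTL; `None` means the record never expires.
#[derive(Debug, Clone, Copy)]
pub struct RoomRecord {
    pub status: RoomStatus,
    pub expires_at: Option<Instant>,
}

/// Returns true when another relay may start a takeover at time `now`.
pub fn can_take_over(record: Option<RoomRecord>, now: Instant) -> bool {
    match record {
        // No record at all: nobody owns the room.
        None => true,
        Some(r) => {
            let expired = r.expires_at.map_or(false, |t| now >= t);
            match r.status {
                // Step 8: persistence is done, takeover is available.
                RoomStatus::Down => true,
                // Step 6: no takeover while DRAINING has a valid TTL.
                // A lapsed TTL means the draining worker stopped refreshing it.
                RoomStatus::Draining => expired,
                // Assumption: UP and SYNCING rooms are owned until their TTL lapses.
                RoomStatus::Up | RoomStatus::Syncing => expired,
            }
        }
    }
}
```

The draining worker from step 4 would keep pushing `expires_at` forward, so a node that dies mid-drain eventually releases the room instead of blocking takeover forever.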

Notes

  • The process should be independent for each room (we don't want non-persisted rooms to be held up on drains)
  • We accept a pause in functionality of some x ms for the sake of consistency
    • since yjs has a localdb provider, this pause will probably be unobservable (not 100% sure on this)
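
One way to picture the per-room independence is to drain each room on its own thread and record each outcome separately. This is a rough sketch under assumptions: `drain_rooms`, `DrainOutcome`, and the `persist` callback are hypothetical, and a real relay would more likely use async tasks than OS threads.

```rust
use std::collections::HashMap;
use std::thread;

/// Outcome of draining a single room.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DrainOutcome {
    /// Persistence finished and the room can be marked DOWN.
    Down,
    /// Persistence failed with the given message.
    Failed(String),
}

/// Drains every room on its own scoped thread so a slow or failing room
/// cannot hold up the others. `persist` is a hypothetical per-room callback.
pub fn drain_rooms<F>(rooms: &[String], persist: F) -> HashMap<String, DrainOutcome>
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    thread::scope(|scope| {
        // Spawn all drains first so they run concurrently.
        let handles: Vec<_> = rooms
            .iter()
            .map(|room| {
                let persist = &persist;
                let handle = scope.spawn(move || match persist(room.as_str()) {
                    Ok(()) => DrainOutcome::Down,
                    Err(e) => DrainOutcome::Failed(e),
                });
                (room.clone(), handle)
            })
            .collect();

        // Collect each room's result; a panic in one drain only fails that room.
        handles
            .into_iter()
            .map(|(room, handle)| {
                let outcome = handle
                    .join()
                    .unwrap_or_else(|_| DrainOutcome::Failed("drain thread panicked".into()));
                (room, outcome)
            })
            .collect()
    })
}
```

In this sketch a room whose `persist` returns immediately (for example, a non-persisted room) finishes its own drain without waiting on any other room's storage call.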

Node State Flow

stateDiagram-v2
    [*] --> DOWN

    DOWN --> SYNCING: Start Sync

    state SYNCING {
        [*] --> SYNCING_LOAD

        SYNCING_LOAD --> SYNCING_SUCCESS: Load Success
        SYNCING_LOAD --> SYNCING_RETRY_LOAD: Load Fail

        SYNCING_RETRY_LOAD --> SYNCING_LOAD: Retry Load
        SYNCING_RETRY_LOAD --> SYNCING_FAIL: Retry Limit Exceeded

        SYNCING_SUCCESS --> [*]
    }

    SYNCING --> UP: Sync Complete

    UP --> DRAINING: Start Draining

    state DRAINING {
        [*] --> DRAINING_STORE

        DRAINING_STORE --> DRAINING_SUCCESS: Store Success
        DRAINING_STORE --> DRAINING_RETRY_STORE: Store Fail

        DRAINING_RETRY_STORE --> DRAINING_STORE: Retry Store
        DRAINING_RETRY_STORE --> DRAINING_FAIL: Retry Limit Exceeded


        DRAINING_SUCCESS --> [*]
    }

    DRAINING --> DOWN: Drain Complete
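
The diagram above could be encoded as a transition table along these lines. This is a sketch, not the project's implementation: the nested SYNCING and DRAINING sub-states are flattened into one enum, and `NodeState`, `Event`, and `next` are hypothetical names. The diagram defines no transitions out of SYNCING_FAIL or DRAINING_FAIL, so the sketch treats them as terminal.

```rust
/// States from the diagram, with the nested sub-states flattened.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Down,
    SyncingLoad,
    SyncingRetryLoad,
    SyncingSuccess,
    SyncingFail,
    Up,
    DrainingStore,
    DrainingRetryStore,
    DrainingSuccess,
    DrainingFail,
}

/// Events labelling the arrows in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    StartSync,
    LoadSuccess,
    LoadFail,
    RetryLoad,
    SyncComplete,
    StartDraining,
    StoreSuccess,
    StoreFail,
    RetryStore,
    RetryLimitExceeded,
    DrainComplete,
}

/// Returns the next state, or `None` if the diagram has no such arrow.
pub fn next(state: NodeState, event: Event) -> Option<NodeState> {
    match (state, event) {
        (NodeState::Down, Event::StartSync) => Some(NodeState::SyncingLoad),
        (NodeState::SyncingLoad, Event::LoadSuccess) => Some(NodeState::SyncingSuccess),
        (NodeState::SyncingLoad, Event::LoadFail) => Some(NodeState::SyncingRetryLoad),
        (NodeState::SyncingRetryLoad, Event::RetryLoad) => Some(NodeState::SyncingLoad),
        (NodeState::SyncingRetryLoad, Event::RetryLimitExceeded) => Some(NodeState::SyncingFail),
        (NodeState::SyncingSuccess, Event::SyncComplete) => Some(NodeState::Up),
        (NodeState::Up, Event::StartDraining) => Some(NodeState::DrainingStore),
        (NodeState::DrainingStore, Event::StoreSuccess) => Some(NodeState::DrainingSuccess),
        (NodeState::DrainingStore, Event::StoreFail) => Some(NodeState::DrainingRetryStore),
        (NodeState::DrainingRetryStore, Event::RetryStore) => Some(NodeState::DrainingStore),
        (NodeState::DrainingRetryStore, Event::RetryLimitExceeded) => Some(NodeState::DrainingFail),
        (NodeState::DrainingSuccess, Event::DrainComplete) => Some(NodeState::Down),
        // Every other pair is not an arrow in the diagram.
        _ => None,
    }
}
```

Returning `Option` keeps invalid transitions explicit, so a caller can reject, for example, a `StartSync` event on a room that is already UP.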

Changes Needed

  • Room status UP, DOWN, DRAINING, SYNCING
  • Relay takeover logic modification
  • Persistence trait with noop default impl
  • SIGINT trigger of drain
  • Actual drain logic
  • Update to TUI to include node status in table
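
As a starting point for the "Persistence trait with noop default impl" item, something like the following might work. The method names, signatures, and the `drain_store` helper are assumptions made for illustration; the real trait would probably be async and use the project's own error and document types.

```rust
/// Hypothetical persistence hook. Both methods default to no-ops so rooms
/// without a persistence backend can drain and sync immediately.
pub trait Persistence {
    /// Store the room's state during DRAINING.
    fn store(&mut self, _room: &str, _state: &[u8]) -> Result<(), String> {
        Ok(())
    }

    /// Load previously stored state during SYNCING, if any exists.
    fn load(&mut self, _room: &str) -> Result<Option<Vec<u8>>, String> {
        Ok(None)
    }
}

/// Persistence that keeps nothing; it relies entirely on the trait defaults.
pub struct NoopPersistence;

impl Persistence for NoopPersistence {}

/// Runs the store step once plus up to `max_retries` retries.
/// Returns true for DRAINING_SUCCESS and false for DRAINING_FAIL.
pub fn drain_store<P: Persistence>(
    persistence: &mut P,
    room: &str,
    state: &[u8],
    max_retries: u32,
) -> bool {
    for _attempt in 0..=max_retries {
        if persistence.store(room, state).is_ok() {
            return true;
        }
    }
    false
}
```

With `NoopPersistence`, `drain_store` succeeds on the first attempt, which matches the goal that non-persisted rooms are never held up by a drain.
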
@maxwnewcomer maxwnewcomer added the enhancement New feature or request label Sep 15, 2024
@maxwnewcomer maxwnewcomer self-assigned this Sep 15, 2024
@maxwnewcomer maxwnewcomer linked a pull request Sep 17, 2024 that will close this issue
@maxwnewcomer maxwnewcomer moved this from Todo to In Progress in Road to contactor v0.1.0 Sep 17, 2024