-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(rfc): Independent compute release flow #8881
Conversation
4986 tests run: 4822 passed, 0 failed, 164 skipped (full report)Flaky tests (9)Postgres 17
Postgres 16
Postgres 15
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
1939b99 at 2024-09-24T00:02:32.818Z :recycle: |
04575bd
to
2d0491b
Compare
2d0491b
to
6ff8ff3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we adjust the compute pool logic for the proposed release changes? Probably not, but asking just in case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well, I am a bit hesitant to the alternative Helm approach though.
44fb3be
to
0ad928d
Compare
For now, we shouldn't. With this v1 release flow it's expected that we will start all new computes in the pool with a new version, while we still re-utilize old pre-created ones, if there is a shortage of new computes. Later, yes, we would need to teach pools to maintain pools with both versions if we want kinda canary deployments within the same region / control plane |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall plus one, especially the compute_releases
table will be a huge step forward.
I'm concerned about the ignorance in this PR about merge strategies, see my comments on that. I think it needs a bit more forethought.
Maybe I missed it, but, I think there should be a "Unique Selling Point" section outlining why we want the compute_releases
table. I know from our 1:1 discussions, but, it's not written down here. IIRC that was
- ability to prewarm pools with new image to avoid misses
- especially important once we do slow rollout strategies where pools will need to maintain pre-warmed computes for both old and new image for the duration of the rollout
- extension stuff (I forgot the details, we talked about it 1:1 like 3-4 months ago)
I didn't want this to be part of v1 because it requires a lot of changes on the control plane side, but I briefly mentioned that in the Further work section https://github.com/neondatabase/neon/pull/8881/files#diff-39df61b534dc5f736661bd5f139140b573a0f35c32524866e0339e425b9f22e4R299
Yeah, I mentioned that too, but waiting for the Anastasia's input here, as I don't know this part well enough yet #8881 (comment). I only know that we need some metadata in cplane and compute to make remote extensions working, but don't know how exactly it's produced |
0ad928d
to
1939b99
Compare
Related to https://github.com/neondatabase/cloud/issues/11698