-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup for blue/green deployment mode #180
Conversation
@anandswaminathan @mwylde When you have a chance, I'd like to get high-level feedback on the approach. I thought I'd start here to make reviewing the upcoming changes easier. Let me know if you prefer a different way of me making these changes.. |
deploy/crd.yaml
Outdated
- name: Cluster Health | ||
type: string | ||
description: The health of the Flink cluster | ||
JSONPath: .status.clusterStatus.health | ||
JSONPath: .status.appStatus[*].clusterStatus.health |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this return when there are multiple app statuses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I have here doesn't quite work for multi app statuses. Printer columns only seem to support boolean,date,integer,number,string
and also don't support JSONPath operators/expressions :(
I think to surface multi app statuses, I'd have to explicitly specify the array index and the output would look something like:
NAME PHASE APPLICATION VERSION CLUSTER HEALTH JOB HEALTH JOB RESTARTS APPLICATION VERSION CLUSTER HEALTH JOB HEALTH JOB RESTARTS AGE
operator-test-app DualRunning green Green Green blue Green Green 6h4
This would mean a few empty columns when there's only 1 application running.
This PR is now ready for review again. Key changes since the last update:
@mwylde @anandswaminathan PTAL when you have a chance! TIA for the review; I know this is a laborious PR :( |
@mwylde and I chatted offline and came to the following conclusion:
Keeping the above in mind I've now modified the PR. The cumulative effect of the change (since we've iterated a bit, mentioning changelog since last review may be confusing), is as below:
@mwylde I've now modified the status to not initialize the VersionStatuses at all when the deployment mode is not BlueGreen. So, I believe clients that use the status sub-resource can remain unchanged (verified with our command line spcli client). I feel better about the overall change now, thanks for helping think through this :) Example of the status sub-resource: Status:
Cluster Status:
Available Task Slots: 2
Cluster Overview URL: http://localhost:8001/api/v1/namespaces/operator-test-app-staging/services/operator-test-app:8081/proxy/#/overview
Health: Green
Healthy Task Managers: 1
Number Of Task Managers: 1
Number Of Task Slots: 2
Deploy Hash: e3a2e2a0
Job Status:
Completed Checkpoint Count: 94
Entry Class: com.lyft.OperatorTestApp
Health: Green
Jar Name: operator-test-app-1.0.0-SNAPSHOT.jar
Job ID: 3baf7b47dc58b6d4450f20f9eb04b302
Job Overview URL: http://localhost:8001/api/v1/namespaces/operator-test-app-staging/services/operator-test-app:8081/proxy/#/jobs/d4ab43e120f3a30311b588bc81c64308
Last Checkpoint: file:/checkpoints/flink/externalized-checkpoints/d4ab43e120f3a30311b588bc81c64308/chk-94
Last Checkpoint Time: 2020-03-19T22:02:17Z
Parallelism: 2
Running Tasks: 4
Start Time: 2020-03-19T21:54:30Z
State: RUNNING
Total Tasks: 4
Last Updated At: 2020-03-19T22:02:50Z
Phase: SubmittingJob
Savepoint Path: file:/checkpoints/flink/savepoints/savepoint-d4ab43-0878eae508df
Savepoint Trigger Id: d87898ce219937f3c5e167272b73fcb6 |
// This should only ever be encountered once (per application) | ||
// when a new CRD version is deployed and an older version of the application exists | ||
// As a workaround, we try to update the entire resource instead of only the status | ||
// TODO Remove this block when we upgrade to k8s 1.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this around to help future CRD upgrades (though no longer required for this PR because there's no CRD upgrade anymore). Let me know if you don't want this here.
@mwylde Re-requesting review; I made the flink_state_machine.go diffs cleaner by removing unwarranted changes. Thanks for all the reviewing :) |
This PR adds the setup required for blue/green deploys but does not actually implement a blue/green deploy yet. I'm opening this mainly to get feedback on how the status sub-resource would be modified for applications that don't opt-in to the BlueGreen deployment mode.
Changelog
DeploymentModeBlueGreen
: A deployment mode to opt-in to blue green deploysFlinkApplicationDualRunning
: A phase for when we have two flink applications running (not yet implemented)FlinkApplicationTeardown
: A phase for when one of the flink applications is being torn down (not yet implemented)[]FlinkApplicationVersionStatus
: An array of flink application statuses. This in turn contains theFlinkClusterStatus
,FlinkJobStatus
andFlinkApplicationVersion
DesiredApplicationCount
to indicate how many applications are expected in any of the Updating phases.flink.go
to do the index math for the status array and also make updates/gets to/from the status object easier.Output
A new Flink application status for an application in the Dual deployment mode looks like below:
The printer columns look like below: