Replies: 1 comment
-
W.r.t. @beyhan's comment, "how this relate to the existing cloud-check command? Should we deprecate it in favour of this one at some point?": I don't think we should deprecate cloud-check, because in most cases cloud-check is the easier path, rather than running the two-step |
-
Hi BOSH friends!
I wanted to start a discussion about a feature we've been working on that promises to speed up recovery in IaaS disaster scenarios. We've been calling this feature `bosh recover`.

`bosh cloud-check` is great, but it can be frustratingly slow in situations where a large number of VMs are down or unhealthy due to underlying IaaS failures. In most cases we've seen, operators have opted to use an awkward combination of `bosh ignore`, `bosh recreate`, `jq` and `bosh deploy` to try to get their VMs back online faster. In those scenarios, the interactions between all of the commands and the state in the director are very hard to reason about. This is especially the case if you also have a failed deployment.

Much of the underlying functionality of `bosh cloud-check` is sound and will be re-used for this feature. In particular, the `GET` and `PUT` `/deployments/:deployment/problems` endpoints are featured.

The new feature consists of:

- `GET /deployments/:deployment/problems` endpoint to return the `instance_group` of each problem
- `PUT /deployments/:deployment/problems` endpoint to allow overriding `max_in_flight` per instance group
- `bosh create-recovery-plan` command, which scans a deployment for problems, prompts the user to select resolutions per problem type per instance group and optionally override `max_in_flight` for the instance group, and then writes out a YAML file with those resolutions
- `bosh recover` command, which takes the plan generated by `bosh create-recovery-plan` and applies it to the given deployment

Note that the recovery plan has no deployment-specific information in it. This allows you to use the same plan for multiple deployments. An example of this we've seen is for on-demand service-broker deployments, where the deployment name is generated, but the `instance_group`s inside of the deployment are all the same.
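To make the idea of a deployment-agnostic plan a bit more concrete, here is a rough Python sketch of the two steps. It is only an illustration under assumptions, not the CLI implementation or the real plan file format: the field names (`instance_groups_plan`, `planned_resolutions`, `max_in_flight_override`), the helper functions, and the shape of the problem records are all hypothetical, and the problem type and resolution strings are placeholders.

```python
from collections import defaultdict

# Hypothetical problem records, roughly what a problems listing might look
# like once each problem carries its instance group. Field names are assumed.
problems_a = [
    {"id": 1, "type": "missing_vm", "instance_group": "router"},
    {"id": 2, "type": "missing_vm", "instance_group": "router"},
    {"id": 3, "type": "unresponsive_agent", "instance_group": "router"},
    {"id": 4, "type": "missing_vm", "instance_group": "worker"},
]


def create_recovery_plan(problems, choose_resolution, max_in_flight=None):
    """Pick one resolution per problem type per instance group.

    The returned plan deliberately contains no deployment name or problem ids.
    """
    max_in_flight = max_in_flight or {}
    types_by_group = defaultdict(set)
    for problem in problems:
        types_by_group[problem["instance_group"]].add(problem["type"])

    entries = []
    for group in sorted(types_by_group):
        entry = {
            "name": group,
            "planned_resolutions": {
                problem_type: choose_resolution(group, problem_type)
                for problem_type in sorted(types_by_group[group])
            },
        }
        if group in max_in_flight:
            entry["max_in_flight_override"] = max_in_flight[group]
        entries.append(entry)
    return {"instance_groups_plan": entries}


def resolutions_for(plan, problems):
    """Map each problem id of one deployment to the resolution the plan chose.

    Problems the plan does not cover are left out rather than guessed at.
    """
    lookup = {
        entry["name"]: entry["planned_resolutions"]
        for entry in plan["instance_groups_plan"]
    }
    resolutions = {}
    for problem in problems:
        chosen = lookup.get(problem["instance_group"], {}).get(problem["type"])
        if chosen is not None:
            resolutions[problem["id"]] = chosen
    return resolutions


# Stand-in for the interactive prompt: always answer "recreate_vm".
plan = create_recovery_plan(
    problems_a,
    choose_resolution=lambda group, problem_type: "recreate_vm",
    max_in_flight={"worker": "50%"},
)

# A second deployment with the same instance groups but different problem ids.
problems_b = [
    {"id": 10, "type": "missing_vm", "instance_group": "worker"},
    {"id": 11, "type": "inactive_disk", "instance_group": "worker"},
]
resolutions_b = resolutions_for(plan, problems_b)
```

Because the plan is keyed only by instance group and problem type, the same `plan` can be matched against the problems of any deployment that shares those instance group names, which is the property that matters for the on-demand service-broker case. In this sketch, problem `11` gets no resolution because the plan has no entry for its problem type.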