Add cluster health check to drain script before shutting down local node #209
Conversation
Hey iamejboy! Thanks for submitting this pull request! All pull request submitters and commit authors must have a Contributor License Agreement (CLA) on-file with us. Please sign the appropriate CLA (individual or corporate). When sending signed CLA please provide your github username in case of individual CLA or the list of github usernames that can make pull requests on behalf of your organization. If you are confident that you're covered under a Corporate CLA, please make sure you've publicized your membership in the appropriate Github Org, per these instructions. Once you've publicized your membership, one of the owners of this repository can close and reopen this pull request, and dreddbot will take another look.
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/157656403 The labels on this github issue will be updated when the story is started.
Hi @CAFxX,
Hi @ctaymor, We believe we are covered with Corporate CLA and already publicized our membership. Regards
Closing and reopening to get cfdreddbot to pick up the CLA.
Hey iamejboy! Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/157781101 The labels on this github issue will be updated when the story is started.
This is a fantastic contribution, as this release was desperately missing such a drain script. Thank you for contributing it!
As you'll see, I'm not an expert in wsrep internals, but rather in authoring BOSH releases and writing best-practice drain scripts. For the MariaDB expertise, I trust your own expertise and tests.
Please review my comments, push updates, and don't hesitate to answer where appropriate.
jobs/mysql/templates/drain.sh (Outdated)
@@ -1,6 +1,59 @@
-#!/bin/bash -e
+#!/bin/bash -eu
You can see an example drain script here: https://github.com/gstackio/exemplar-release/blob/latest/jobs/paragon/templates/drain
The structure of drain scripts includes quite a few common patterns that are very specific to drain scripts. Copying file descriptor 1 to file descriptor 3 is an elegant way of providing activity logs for the drain script while still conforming to the BOSH requirement that only an integer be output to the original stdout.
Other advice and examples can be found here: https://github.com/gstackio/exemplar-release/tree/latest#drain-docs
As a start here, I would advise you to use the `#!/usr/bin/env bash` shebang for portability, and to use `set -eu -o pipefail` right after it so that it cannot be overridden.
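A minimal sketch of that header pattern, assuming the same log location this PR uses elsewhere:

```bash
#!/usr/bin/env bash
set -eu -o pipefail

# Preserve the original stdout (where BOSH reads the single integer
# drain result) as fd 3, then send all further output to a log file.
exec 3>&1
exec 1>> /var/vcap/sys/log/mysql/drain.log 2>&1

# ... drain logic can now log freely with plain 'echo' ...

echo 0 >&3   # report the drain result to BOSH on the preserved fd
```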
jobs/mysql/templates/drain.sh (Outdated)
return_code=$?
echo 0
exit ${return_code}
NODE_IP=<%= spec.ip %>
Please read the section about encoding: https://github.com/gstackio/exemplar-release#encoding
Every BOSH property injected with ERB in this script should be properly escaped.
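A hedged sketch of what escaped assignments could look like, assuming Ruby's stdlib `Shellwords` is available to the ERB template:

```bash
<% require "shellwords" %>
NODE_IP=<%= spec.ip.to_s.shellescape %>
MYSQL_PORT=<%= p("cf_mysql.mysql.port").to_s.shellescape %>
```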
Thanks! I had that previously, but removed it while changing the code during testing. smh
jobs/mysql/templates/drain.sh (Outdated)
NODE_IP=<%= spec.ip %>
MYSQL_PORT=<%= p("cf_mysql.mysql.port") %>

LOG_DIR="/var/vcap/sys/log/mysql/"
Please remove this line and replace it with the logging pattern demonstrated in https://github.com/gstackio/exemplar-release/blob/latest/jobs/paragon/templates/drain
Then your `echo "..." &>> "$LOG_DIR/drain.log"` statements will become plain `echo "..."` because of the default redirection, and the exit protocol (status and echoing an integer) will be handled automatically at script exit.
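Concretely, assuming the default redirection from the header pattern above is in place:

```bash
# before: each statement targets the log file explicitly
echo "checking cluster status" &>> "$LOG_DIR/drain.log"

# after: stdout already points at the log, so this suffices
echo "checking cluster status"
```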
jobs/mysql/templates/drain.sh (Outdated)
fi
}

CLUSTER_NODES=(`wsrep_var wsrep_incoming_addresses $NODE_IP $MYSQL_PORT | sed -e 's/,/ /g'`)
Please use Bash-isms consistently, and use the `$()` syntax here instead of backticks.
Please quote `"$NODE_IP"` and `"$MYSQL_PORT"` to be sure the exact values are passed.
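Applied to the line above, the suggestion might read as follows (a sketch; the command substitution is left unquoted on purpose so the comma-separated list word-splits into array elements):

```bash
CLUSTER_NODES=( $(wsrep_var wsrep_incoming_addresses "$NODE_IP" "$MYSQL_PORT" | sed -e 's/,/ /g') )
```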
jobs/mysql/templates/drain.sh (Outdated)
# rolling restart can actually cause data loss (e.g. if a node that is out
# of sync is used to bootstrap the cluster): in this case we fail immediately.
for NODE in "${CLUSTER_NODES[@]}"; do
  NODE_IP=`echo $NODE | cut -d ":" -f 1`
Here, please:
- use `$()`
- properly quote `$NODE`
- don't over-quote `":"`, as it is a literal

All in all, I suggest you write `NODE_IP=$(echo "$NODE" | cut -d: -f1)`. And if you agree with this change, please apply it three more times below (see the sketch after this comment).
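Applying the same pattern to the adjacent lines would give (a sketch):

```bash
NODE_IP=$(echo "$NODE" | cut -d: -f1)
NODE_PORT=$(echo "$NODE" | cut -d: -f2)
cluster_status=$(wsrep_var wsrep_cluster_status "$NODE_IP" "$NODE_PORT")
```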
jobs/mysql/templates/drain.sh (Outdated)
  NODE_IP=`echo $NODE | cut -d ":" -f 1`
  NODE_PORT=`echo $NODE | cut -d ":" -f 2`
  cluster_status=`wsrep_var wsrep_cluster_status $NODE_IP $NODE_PORT`
  if [ "$cluster_status" != "Primary" ]; then
Here quoting `"Primary"` is unnecessary, as it is a literal string anyway.
node_ip = spec.ip
hosts = (node_ip.split(".")[0..-2] + ["%"]).join('.')
%>
GRANT USAGE ON mysql.* TO 'drain'@'<%= hosts %>' IDENTIFIED BY '<%= p('cf_mysql.mysql.drain.db_password') %>'
It would be best to find some Ruby syntax to SQL-escape the values that are injected here.
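One hedged possibility, assuming the values only ever appear inside single-quoted SQL string literals (the `sql_escape` helper below is hypothetical, not an existing API):

```erb
<%
  # hypothetical helper: double any embedded single quotes
  def sql_escape(value)
    value.to_s.gsub("'", "''")
  end
%>
GRANT USAGE ON mysql.* TO 'drain'@'<%= sql_escape(hosts) %>' IDENTIFIED BY '<%= sql_escape(p('cf_mysql.mysql.drain.db_password')) %>'
```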
@@ -0,0 +1,5 @@
<%
node_ip = spec.ip
hosts = (node_ip.split(".")[0..-2] + ["%"]).join('.')
This is not the way we should grant the `drain` user.
This script will be run anytime the MariaDB nodes' IPs change anyway, and you cannot assume that the nodes' IPs will follow a specific addressing scheme. So, you could loop over the list of Galera nodes here (using a BOSH link) and output one `GRANT` per IP address.
But there is an even simpler option: this script only needs to allow the current node to connect, i.e. `<%= spec.ip %>`, because a similar drain script will exist on the other nodes and take care of the proper GRANT, just in time when it is necessary.
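Under that simpler option, the whole template might reduce to something like (a sketch):

```erb
GRANT USAGE ON mysql.* TO 'drain'@'<%= spec.ip %>' IDENTIFIED BY '<%= p('cf_mysql.mysql.drain.db_password') %>'
```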
So, you could loop over the list of Galera nodes here (using a BOSH link) and output one GRANT per IP address

- Unfortunately, PostStartSQLFiles cannot accept multiple commands in one file (https://github.com/iamejboy/cf-mysql-release/blob/a51ea90cfe51318ba6f11e5d3497527c9e0dd5a3/jobs/mysql/templates/mariadb_ctl_config.yml.erb#L42), so I resorted to this as a workaround to allow all nodes. This might need a change? https://github.com/cloudfoundry/mariadb_ctrl/blob/master/mariadb_helper/mariadb_helper.go#L257

this script only needs to allow the current node to connect, i.e. <%= spec.ip %>, because a similar drain script will exist on the other nodes and take care of the proper GRANT, just in time when it is necessary.

- The script also needs to allow the other nodes' IPs, because during drain on the local node it connects to the other nodes' databases to check their status.
jobs/mysql/templates/drain.sh (Outdated)
timeout 5 \
  /usr/local/bin/mysql --defaults-file=/var/vcap/jobs/mysql/config/drain.cnf -h "$host" -P "$port" \
  --execute="SHOW STATUS LIKE '$var_name'" -N |\
  awk '{print $2}' | tr -d '\n'
Style: this would look better if you put the pipe `|` at the beginning of this line instead of at the end of the previous line.
E.g.:
  --execute="SHOW STATUS LIKE '$var_name'" -N \
  | awk '{print $2}' \
  | tr -d '\n'
jobs/mysql/templates/drain.sh (Outdated)
if [ "$state" != "Synced" ]; then | ||
echo "wsrep_local_state_comment of node '$NODE_IP' is '$state' (expected 'Synced'): retry drain in 5 seconds" &>> "$LOG_DIR/drain.log" | ||
# TODO: rewrite to avoid using dynamic drain (soon to be deprecated) | ||
echo -5 # retry in 5 seconds |
If you write `-5` here, then you must `exit 0` right away, as per the drain script contract.
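That is, following the dynamic-drain convention:

```bash
echo -5   # ask BOSH to invoke drain again in 5 seconds (dynamic drain)
exit 0    # per the drain contract, exit 0 immediately after echoing
```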
Thanks for the feedback @bgandon. @iamejboy, if you want to make any of these changes, feel free. Otherwise we can take care of any cleanup when we merge the PR.

@iamejboy We were going over the functionality when it checks the … Are we missing an edge case here where you've seen this happen? We suggest … It also seems that it is impossible for a node to be Synced and non-Primary. If this is the case, we probably don't need to check the Primary component. Does this match your understanding of these two fields? Thoughts?
@bgandon thanks for your feedback. @ndhanushkodi Yes, you may take care of the cleanup.
Yes, there are a couple of cases AFAICT: …
We were trying to do some verification and testing around the changes in the drain script. We partitioned off one node so it was non-Primary. Unfortunately, when the node is partitioned off, the other nodes drop it from their wsrep_incoming_addresses.

The script as it is does catch some very timing-dependent cases, but not larger cluster problems. One thought we had was to check all IPs of the cluster, using the BOSH link to get them all. That causes problems during a scale-down though, since BOSH deletes the unneeded nodes first, and then the drain script would be unable to connect to them. Any thoughts on how it might be made better?
I feel like there should be an easier way, but this should work (using some drain magic): …

or, similarly: …

@ndhanushkodi do you think this would work?
@ndhanushkodi any feedback?
hi, …
Hi @CAFxX, We thought of one potential solution which would catch some of these cases, and maybe be less expensive. The drain script could look to the BOSH link to get the IPs of the cluster, and call the Galera Healthcheck endpoint on each of them. This would catch some cases of an unhealthy cluster, but the drain script would not fail in a full network partition.
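A hedged sketch of that idea; the link name `mysql`, port `9200`, and plain-HTTP endpoint are illustrative assumptions, not confirmed details:

```bash
# iterate over all cluster members advertised via the (assumed) BOSH link
<% node_ips = link('mysql').instances.map(&:address) %>
for ip in <%= node_ips.join(' ') %>; do
  # assume galera-healthcheck answers 200 only when the node is healthy
  if ! curl --fail --silent --max-time 5 "http://${ip}:9200/" > /dev/null; then
    echo "node ${ip} failed its health check: refusing to drain"
    exit 1
  fi
done
```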
@ctaymor for now it could work. I would suggest double-checking the unreachability of a node also via ICMP over a period of time (i.e. if the host does not respond to pings for e.g. 60 seconds). This would obviously slow down the shrinking process, but I would argue that such an operation is going to be rarely executed anyway. The rationale for this is that at least some IaaSes (e.g. vSphere) have provisions to immediately restart a missing VM, so attempting to ping the VM for a period of time should hopefully catch the case in which the VM disappears (e.g. due to a host crash) right in the middle of a deploy, as well as the case in which the VM is running but mysql is not working. Better long-term solutions would be either …
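A sketch of the ping-based double-check described above; the 60-second window and 5-second interval are illustrative:

```bash
# return 0 only if the node answered no ping for the whole window
node_unreachable() {
  local ip=$1
  local deadline=$(( $(date +%s) + 60 ))
  while (( $(date +%s) < deadline )); do
    if ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
      return 1   # the node answered: it is still alive
    fi
    sleep 5
  done
  return 0       # silent for the full window: consider it gone
}
```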
@CAFxX I think you hit the nail on the head, that we're trying to come up with a mediocre workaround when the solution to enabling good drain scripts for clustered services should really be pushed down to the bosh level. I'd recommend opening an issue in BOSH. Thanks for spending all this time working on trying to make the drain scripts better.
@ctaymor ok, we'll improve the current good-enough-for-now drain implementation as discussed here, and in the background we'll open an issue with BOSH. Based on the Slack discussion, we should allow the operator to opt out of the ICMP protection (e.g. for the case in which ICMP is not allowed in the deployment networks).
I also reached out to Dmitriy, and it seems there's some activity already happening at SAP for extending BOSH for similar use cases: https://github.com/cloudfoundry/bosh-notes/blob/master/proposals/drain-actions.md
Hi @ctaymor & @ndhanushkodi, I updated the PR with the cleanup plus what was discussed above. Feel free to re-test and amend for any additional errors found. Regards,
See the following references for context:
- http://galeracluster.com/documentation-webpages/monitoringthecluster.html
- http://galeracluster.com/documentation-webpages/nodestates.html#node-state-changes
- http://galeracluster.com/documentation-webpages/galerastatusvariables.html
- https://fromdual.com/murphys-law-is-also-valid-for-galera-cluster-for-mysql
That's what `monit stop` is supposed to do
@ctaymor @ndhanushkodi @bgandon have you had a chance to look at this?
We've added a story to our backlog to pull in some of these changes to fix the drain script issues, and we may change the implementation a little. Unfortunately, our team is stretched a little thin and this will have to be prioritized low for now, but we should be able to eventually get to it. Thanks,
@ndhanushkodi our production rollout of c2c is blocked on this (and the other c2c bug @iamejboy reported a few days ago, but it seems that a candidate fix is available now), so we were hoping it would happen sooner than "eventually". Just to organize work on our side, any rough estimate of when you realistically expect to be able to work on this?
Hi @CAFxX, we do want to go ahead and try to unblock you.
We updated the drain script to use the Galera Healthcheck endpoint rather than determining node health directly (since this is a job Galera Healthcheck already performs). Should be merged into the develop branch now. Let us know if you have any questions or concerns.
Hey @CAFxX. Looks like our pipelines ran into a problem. When deleting a deployment, BOSH does not respect the update order and drains all instances in parallel. You can get around this with a `--skip-drain` flag.

We talked with the BOSH team to see if there is any way for us to differentiate scaling down a node vs deleting a deployment from the perspective of the drain script, but there is no way to differentiate. This is not the first time they've heard of this issue, and they are going to prioritize some work to resolve the parallel draining, but there is no guarantee it will get done soon. We don't think there is a simple enough solution in cf-mysql-release to implement this behavior currently.

However, we did port the behavior over to our new pxc-release. This is intended to be a replacement for cf-mysql-release moving forward. Because we've structured the jobs differently in pxc-release, and separated the galera-agent/galera_health_check job out with its own drain scripts, we think the drain script will work there without any problem.
@jpalermo in my understanding the drain script is designed to aid graceful/zero-downtime lifecycles for BOSH jobs and, at least from my perspective, it's hard to describe deleting a whole deployment as a "graceful" operation. If we agree on this, then using …

We would like to avoid using in production pre-release/non-GA components or, as in the case of the current cf-mysql-release, components with known issues that can affect reliability.
@CAFxX if it were just our pipelines, we would probably consider requiring the `--skip-drain` flag. Open source software is a bit strange, since it can be hard to determine what is pre-release and what is ready for production. I can tell you that our biggest deployments are now using pxc-release in production, and Pivotal's latest version of Pivotal Cloud Foundry just shipped with pxc-release as the default database. We have not hit the 1.0 release banner for pxc yet: https://www.pivotaltracker.com/story/show/157528771 At which point we plan to transition …
@jpalermo would you be open to leaving it behind an opt-in manifest property? We are also considering pxc-release, but we were really close to rolling this out, and moving to pxc is going to require redoing a significant amount of validation work, which would push back our c2c rollout for weeks.
I'll see what the mysql product managers think of that idea today.
Hi @jpalermo @ctaymor @ndhanushkodi, Thanks for adding this as an option. By the way, can we remove …?

Since we already put in the spec description that the `--skip-drain` flag is required in order to delete a deployment, we could then add "scaling down" as well. In this case, we will fail the drain if there is an unreachable node, a crashed node, or simply a monit-stopped node in the cluster. The drain script will be more useful for making sure that the cluster is healthy before shutting down the local node. Unlike when …

cc: @CAFxX
Thanks for opening a PR. Please make sure you've read and followed the Contributing guide, including signing the Contributor License Agreement.
Feature or Bug Description
Amended the mysql drain script to check cluster node health before allowing the local node to shut down.
Motivation
We think this adds safety when updating the cluster, since cluster health is checked during drain before shutting down the local node.
Related Issue
#208