Repair all keyspaces #15

Merged
merged 1 commit into from
Sep 9, 2020
10 changes: 1 addition & 9 deletions README.md
@@ -5,7 +5,7 @@ Ansible role to install an Apache Cassandra cluster supervised by systemd. Inclu
 * Some OS tuning options such as installing jemalloc, setting max_map_count and tcp_keepalive, disabling swap.
 * Bootstraps nodes using the IPs of the servers in the `cassandra_seed` (configurable) inventory group.
 * Weekly scheduled repairs via cron jobs that are non-overlapping (see `cassandra_repair_slots`).
-* requires setting `cassandra_keyspaces` (default `[]` will have no effect)
+* Note that **all** keyspaces will be scheduled for repairs
 * Incremental and full backup scripts as well as a restore script. (disabled by default, optional) (NOTE: needs better testing)
 * backup/restore requires access to S3.
 * prometheus-style metrics using jmx-exporter
@@ -42,12 +42,6 @@ Give your cluster a better name:
 cassandra_cluster_name: default
 ```

-You should override the keyspaces to match keyspaces on which you wish to run weekly repairs:
-
-```yaml
-cassandra_keyspaces: []
-```
-
 You may wish to override the following defaults to enable backups:

 ```yaml
@@ -108,8 +102,6 @@ Then the following should work and start your cluster:
 vars:
 # set cluster_name before running the playbook for the first time; never change it afterwards
 cassandra_cluster_name: my_cluster
-cassandra_keyspaces:
-- my_keyspace1
 # set installed java package version manually. required when using Ubuntu 18.04. see: [A note on Java 8 and Ubuntu 18.04](#a-note-on-Java-8-and-Ubuntu-18.04)
 java_packages: openjdk-8-jdk
 roles:
1 change: 0 additions & 1 deletion defaults/main.yml
@@ -45,7 +45,6 @@ cassandra_env: dev
 cassandra_cluster_name: default
 cassandra_role: 'cassandra_{{ cassandra_cluster_name }}'
 cassandra_seed_role: '{{ cassandra_role }}_seed'
-cassandra_keyspaces: []
 cassandra_metrics_prefix: 'cassandra_{{ cassandra_cluster_name }}'

 cassandra_snitch: '{% if is_aws_environment %}Ec2Snitch{% else %}SimpleSnitch{% endif %}'
18 changes: 7 additions & 11 deletions templates/cassandra_repair.j2
@@ -11,45 +11,41 @@ metrics_file=${metrics_dir}/cassandra_repair.prom
 metrics_tmpl='
 # TYPE cassandra_repair counter
 # HELP cassandra_repair Counts the number of repairs and the status
-cassandra_repair{status="%s",errno="%d",keyspace="%s"} 1 %d
+cassandra_repair{status="%s",errno="%d"} 1 %d

 # TYPE cassandra_repair_duration_seconds gauge
 # HELP cassandra_repair_duration_seconds Track the duration of the repair task, in seconds
-cassandra_repair_duration_seconds{status="%s",keyspace="%s"} %d %d
+cassandra_repair_duration_seconds{status="%s"} %d %d
 '

 started_at=$(date +%s%3N)

 function report_metrics {
 status=$1
 errno=$2
-keyspace=$3

 now=$(date +%s%3N)

 # nb. the date must be milliseconds since the epoch
 printf "$metrics_tmpl" \
-$status $errno $keyspace $now $status $keyspace $(((now-started_at)/1000)) $now > ${metrics_file}.$$
+$status $errno $now $status $(((now-started_at)/1000)) $now > ${metrics_file}.$$
 mv -v ${metrics_file}.$$ ${metrics_file} || true
 }

-{% for keyspace in cassandra_keyspaces %}
 # Check if a repair is still running
-if ps -u cassandra -f | grep "repair {{ keyspace }}" > /dev/null
+if ps -u cassandra -f | grep "repair" > /dev/null
 then
-report_metrics "aborted" 255 "{{ keyspace }}"
+report_metrics "aborted" 255
 exit 255;
 fi

 # Run repair
 su - {{ cassandra_user }} \
--c '/opt/cassandra/bin/nodetool repair {{ keyspace }} -full -pr'
+-c '/opt/cassandra/bin/nodetool repair -full -pr'

 if [ $? -ne 0 ]; then
 st="failed"
 else
 st="succeeded"
 fi
-report_metrics $st $? "{{ keyspace }}"
-
-{% endfor %}
+report_metrics $st $?
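
Note on the last line of `templates/cassandra_repair.j2`: by the time `report_metrics $st $?` runs, `$?` holds the exit status of the `if`/`else` block (the `st=...` assignment), not that of `nodetool`. The `errno` label would therefore likely always be reported as 0, even for failed repairs. A minimal sketch of capturing the status first is below. `run_repair`, `FAKE_EXIT` and the stubbed `report_metrics` are hypothetical stand-ins for illustration only, not part of this role.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the `su - ... -c 'nodetool repair -full -pr'` call.
# FAKE_EXIT is an illustrative variable used to simulate a failing repair.
run_repair() {
  return "${FAKE_EXIT:-0}"
}

# Stub standing in for the script's report_metrics; it only echoes its arguments.
report_metrics() {
  echo "status=$1 errno=$2"
}

run_repair
rc=$?   # capture immediately, before any other command overwrites $?

if [ "$rc" -ne 0 ]; then
  st="failed"
else
  st="succeeded"
fi
report_metrics "$st" "$rc"
```

Storing the status in a variable right after the command keeps the `errno` label tied to the repair itself, regardless of what runs in between.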