This is a DigitalOcean-specific Kubernetes (k8s) setup for running a cluster of Helium validators. Some modifications are necessary to run on other Kubernetes hosts. Currently powering over 20 validators.
Development is still early and pull requests are welcome!
All the core essentials you will need to get your environment set up:
- Install kubectl: brew install kubectl (or Linux/Windows)
- Install doctl: brew install doctl (or Linux/Windows)
- Install helm: brew install helm (or Linux/Windows)
- Install jq: brew install jq or sudo apt install jq
- Install base64: brew install base64
Some very helpful tools to make your Kubernetes life easier:
- k9s - A must have! K9s provides a terminal UI to interact with your Kubernetes clusters.
- BotKube - a Slack/Discord bot for monitoring and debugging Kubernetes clusters.
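For example, k9s is available via Homebrew; a minimal sketch for installing it and pointing it at the namespace created later in this guide:
# install k9s on macOS/Linux via Homebrew
brew install k9s
# browse the helium namespace once your cluster config is in place
k9s -n helium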
Create a new Kubernetes cluster on DigitalOcean (e.g. 'helium-cluster').
Once set up, use doctl to download that cluster's config file locally for use with kubectl. Also, create a new API token for yourself on DigitalOcean:
doctl auth init --context helium
Enter your access token: <DIGITALOCEAN API TOKEN HERE>
# now switch to the 'helium-cluster' context
doctl auth switch --context helium
# Download cluster's config file with doctl
doctl kubernetes cluster kubeconfig save helium-cluster
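To confirm kubectl is now pointed at the new cluster, a quick check (node names will differ for your cluster):
# verify the active context and that the nodes are reachable
kubectl config current-context
kubectl get nodes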
Before you set up the validators, create a helium namespace and set it as your default:
kubectl create ns helium
kubectl config set-context --current --namespace helium
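You can verify the default namespace took effect with:
# should print "helium"
kubectl config view --minify --output 'jsonpath={..namespace}'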
Create your .env file from the sample one provided.
cp .env.sample .env
The main env vars you'll need to setup are:
# Default namespace context as defined earlier
NAMESPACE=helium
# Default to latest or set a fixed version
VALIDATOR_MAINNET_VERSION=latest
# Number of validators you'd like to run
TOTAL_MAINNET_VALIDATORS=2
# To get the name of your cluster run `kubectl config current-context`
MAINNET_CLUSTER=do-nyc1-helium-cluster
# If you'd like to run a staging/testnet cluster, set the TESTNET env vars
VALIDATOR_TESTNET_VERSION=latest
TOTAL_TESTNET_VALIDATORS=1
TESTNET_CLUSTER=do-nyc1-helium-cluster-dev
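As an optional sanity check (assumes a bash-compatible shell), you can load the .env and confirm that MAINNET_CLUSTER matches the context kubectl is actually using:
# load the .env into the current shell
set -a; source .env; set +a
# these two values should match
echo "$MAINNET_CLUSTER"
kubectl config current-context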
The following script will automatically deploy everything you need to run and monitor your validators.
scripts/deploy
# This automatically deploys
# - k8s/exporter-service.yml
# - k8s/validator.yml
# - dynamic-hostports
# - kube-prometheus-stack (Prometheus & Grafana)
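To see what the deploy created, you can list the resources in both namespaces (namespace names follow this guide; adjust if yours differ):
# validator StatefulSet, pods, and services
kubectl get statefulsets,pods,svc -n helium
# monitoring stack installed by the deploy script
kubectl get pods -n kube-prometheus-stack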
If you make changes to your validators in any way, you'll need to restart all of them by running:
scripts/deploy restart
# or
scripts/deploy
scripts/restart pod
You're all set! Try running kubectl get pods to see if everything is working. You should see something like:
NAME READY STATUS RESTARTS AGE
validator-0 2/2 Running 0 5m
validator-1 2/2 Running 0 5m
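If a pod isn't reaching Running, the standard kubectl troubleshooting commands apply (validator-0 here is just the first replica from the output above):
# inspect pod events and container status
kubectl describe pod validator-0
# tail recent logs from all containers in the pod
kubectl logs validator-0 --all-containers --tail=100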
Validators will automatically update themselves whenever a new version is released. If a validator is currently in consensus, it will not update until it is out of consensus.
To disable auto updates, set the VALIDATOR_MAINNET_VERSION env var in your .env file to the version you'd like (e.g. 1.0.11) and then run ./scripts/deploy restart to update all validators.
By default, each validator gets 20GB of disk space. If the validators start to need more space, you will have to modify each of your PVCs:
./scripts/validator pvc $replica_id $disk_size
# Example
./scripts/validator pvc 3 100Gi
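Under the hood this resizes the replica's PVC. If you ever need to do it manually, a rough kubectl equivalent would be the following (the PVC name is hypothetical, so list your PVCs first, and your StorageClass must allow volume expansion):
# find the PVC backing the validator replica
kubectl get pvc -n helium
# request a larger size on that PVC (name below is an example)
kubectl patch pvc validator-data-validator-3 -n helium -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'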
If you look inside /scripts, you'll see there are a bunch of helper scripts written to make validator management easier. Below are some of the most common uses:
Run this to see details on all your validators:
scripts/validator info
# Alternatively, you can specify the replica index to show a specific validator
scripts/validator info $replica_id
And then you should see something like this:
Pod: validator-1
Name: cool-hotspot-name
Address: 1YJSgoGPDpqC339KfysdfsdfVc4sG7JBJEUci1i1dKG
Version: 0.1.82
Validator API: https://api.helium.io/v1/validators/1YJSgoGPDpqC339KfysdfsdfVc4sG7JBJEUci1i1dKG
Not currently in consensus group
+---------+-------+
| name |result |
+---------+-------+
|connected| no |
|dialable | yes |
|nat_type |unknown|
| height | 15145 |
+---------+-------+
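The Validator API URL in that output can also be queried directly, for example (using the placeholder address from the sample output above):
# fetch validator details from the Helium API
curl -s https://api.helium.io/v1/validators/1YJSgoGPDpqC339KfysdfsdfVc4sG7JBJEUci1i1dKG | jq .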
- Edit TOTAL_MAINNET_VALIDATORS in your .env
- Run scripts/deploy and the new validator(s) will automatically deploy (see the sketch after this list).
- Run kubectl get pods -w to monitor the new pods and verify they launched.
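Putting those steps together, a minimal example of scaling from 2 to 3 mainnet validators might look like this (the sed edit is just illustrative; editing .env by hand works the same):
# bump the validator count in .env (GNU sed shown; on macOS use sed -i '')
sed -i 's/^TOTAL_MAINNET_VALIDATORS=.*/TOTAL_MAINNET_VALIDATORS=3/' .env
# deploy the change and watch the new pod come up
scripts/deploy
kubectl get pods -w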
Please refer to Helium's guide on staking a validator. To get a validator's address, use the scripts/validator info command as described above.
A validator will generate a swarm_key for itself when it is first created. If you'd like to download those keys, run:
scripts/swarm-keys sync
# keys will be saved to disk in the /keys/$hotspot-name folder
If you have the 1Password CLI installed, this script can automatically save all the swarm_keys to your vault! Get your vault's UUID and set OP_VAULT_UUID in your .env file. Here's a quick way to fetch the UUID of your Private vault:
op list vaults | jq -r '.[] | select(.name == "Private") | .uuid'
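To do that in one step, something like this should work (it appends the output of the same lookup to your .env and assumes the op CLI is already signed in):
# save the Private vault's UUID into .env for the swarm-keys script
echo "OP_VAULT_UUID=$(op list vaults | jq -r '.[] | select(.name == "Private") | .uuid')" >> .env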
To copy a local swarm_key file to a particular validator replica, run:
scripts/swarm-keys replace $replica_id $path_to_swarm_key
# For example
scripts/swarm-keys replace 1 ~/path/to/swarm_key
And if you have the 1Password CLI set up (as described earlier), then you can use the name of your validator instead:
scripts/swarm-keys replace $replica_id $animal_hotspot_name
This will update the swarm_key and restart the specified pod replica.
Grafana and Prometheus should already be running thanks to the deploy script. Now you can set up a proxy to your Grafana dashboard using:
scripts/dashboard/monitor
If successful, you should see the following:
Visit =>
Grafana: http://localhost:3000
Prometheus query tool: http://localhost:9090
Alertmanager: http://localhost:9093
Visit http://localhost:3000 to see your Grafana dashboard.
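The monitor script sets up the local proxy for you; if you ever need to reach just Grafana by hand, a kubectl port-forward is a rough equivalent (the service name and port below follow kube-prometheus-stack's default naming and are assumptions, so check kubectl get svc -n kube-prometheus-stack for the real ones):
# forward local port 3000 to the Grafana service
kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-grafana 3000:80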
There is a custom validator dashboard that should have nearly everything you need. In Grafana, search for "Helium Validator Dashboard". There are a bunch of alerts set up already (e.g. get notified when a validator enters consensus).
I plan to keep improving this dashboard, so if you want those updates too, pull the latest version of this repo and then run:
scripts/dashboard/setup update-dashboard
Note: This will automatically update the "Helium Validator Dashboard". If you made any modifications to that dashboard, they will be erased. Make a backup copy of your modified dashboard before updating.
Also, if you have improvements you'd like to make to the validator dashboard, please make a PR! After making edits, you can "sync" those changes to this repo by using:
scripts/dashboard/download
# This will automatically download the Helium Validator Dashboard to k8s/grafana/grafana-validator-dashboard.yml
If you're looking to give others access to a Grafana dashboard (or just want to be able to access it from anywhere), you can use ExternalDNS to map a domain/subdomain to the Grafana pod.
First, you'll need to find your DNS provider in this list. You can follow the instructions provided there, but the process should look something like this:
Edit these two YAML files, k8s/grafana/external-dns-values.yml and k8s/grafana/external-dns-service.yml, and replace all the values with your own. Then run:
helm repo add bitnami https://charts.bitnami.com/bitnami
# Run the following to install the DNS resolver
helm upgrade grafana-dns -f k8s/grafana/external-dns-values.yml bitnami/external-dns --install -n kube-prometheus-stack
# Now you'll create a LoadBalancer service that will map an external IP to a domain (or subdomain)
kubectl apply -f k8s/grafana/external-dns-service.yml -n kube-prometheus-stack
# After a few moments, external-dns should see the new LoadBalancer service and automatically update your DNS TXT records
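If you prefer not to maintain a values file, the same kind of configuration can usually be passed to the Bitnami chart with --set flags; a sketch for DigitalOcean (the token and domain are placeholders, the exact keys may vary by chart version, and secrets are better kept in the values file):
# illustrative only: DigitalOcean provider config for external-dns via --set flags
helm upgrade grafana-dns bitnami/external-dns --install -n kube-prometheus-stack \
  --set provider=digitalocean \
  --set digitalocean.apiToken=<DIGITALOCEAN_API_TOKEN> \
  --set "domainFilters={example.com}" \
  --set policy=sync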
DigitalOcean has the Kubernetes Dashboard set up for you already, but if you're running locally or on another host that doesn't have it, you can run:
scripts/setup-k8s-dashboard
Note: I don't really use this dashboard much and primarily just use k9s and Grafana.
Which DigitalOcean Node Pool should I use?
I have been using a CPU-optimized node pool with 16GB of RAM and 8 vCPUs. It's too early to tell whether this is more power than needed, but it has performed well so far.
How many validators can I run per node?
I believe it is safe to run 2-3 validators per node, but this could change over time. I have yet to see more than one validator per node enter a consensus group (CG), and CPU has barely climbed over 25% on average, with occasional spikes up to 50%.
Why does my nat_type say unknown?
This might seem concerning, but it's expected and has no effect on validator penalties. Since we are dealing with k8s and its complex networking setup, we have to give each validator a unique NAT_EXTERNAL_PORT (using dynamic-hostports) to bypass the miner's auto-NAT detection. This keeps the validator from being relayed, but it also means nat_type defaults to unknown. It also means your validator will show up with a pirate flag on the validator explorer.
k8s adds a lot of overhead. Do the validators receive a lot of penalties?
There isn't enough data yet, but it looks positive. There have been a few elections; the majority had zero penalties, and one lasted 9 rounds and accrued a ~0.9 performance penalty. If you are using this setup, please share your data!
How do I move a validator to a different node?
Since the validators are a StatefulSet with no node affinity, simply delete the pod and k8s will reschedule it, spreading validators evenly across your node pool.
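For example (validator-1 is just an example replica; make sure it isn't in consensus first, per the note below):
# delete the pod; the StatefulSet recreates it, possibly on a different node
kubectl delete pod validator-1
# watch which node the new pod lands on
kubectl get pods -o wide -w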
How do I make sure a validator always has the same IP?
A pod will always have the same IP unless it switches to a new node, so as long as you aren't moving pods around, it should stay the same. Improvements are welcome! e.g. you could give a pod a specific node affinity, or use a LoadBalancer to assign a static IP per validator (which would cost extra).
Note: All the scripts for managing validators (e.g. upgrading to a new validator version) will make sure to never restart a validator that is currently in consensus.
Can I copy a snapshot from one validator to another?
Yes!
scripts/snapshot copy $replica_from $replica_to
# Example: This will take a snapshot from validator-0 and import into validator-2
scripts/snapshot copy 0 2
Can the pods use the animal hotspot name instead of "validator-3"?
It's technically possible, but it would require a lot of refactoring. I highly recommend just using Grafana to figure out the names of your validators (or use scripts/validator info $replica_id).
PRs for bug fixes, improvements, and new features are all welcome. And please feel free to file GitHub issues for bugs / enhancements or to check out the loose roadmap.
Huge thanks to the DeWi team for helping fund this project. And special shoutout to jamiew's validator-exporter-k8s and charlietran.