The following procedure deploys Linux and Kubernetes software to the management NCNs. Deployment of the nodes starts with booting the storage nodes followed by the master nodes and worker nodes together. After the operating system boots on each node, there are some configuration actions which take place. Watching the console or the console log for certain nodes can help to understand what happens and when. When the process completes for all nodes, the Ceph storage is initialized and the Kubernetes cluster is created and ready for a workload. The PIT node will join Kubernetes after it is rebooted later in Redeploy PIT Node.
The timing of each set of boots varies based on hardware. Nodes from some manufacturers will POST faster than others or vary based on BIOS setting. After powering on a set of nodes, an administrator can expect a healthy boot session to take about 60 minutes depending on the number of storage and worker nodes.
- Prepare for Management Node Deployment
- Update Management Node Firmware
- Deploy Management Nodes
- Configure after Management Node Deployment
- Validate Management Node Deployment
- Next Topic
Preparation of the environment must be done before attempting to deploy the management nodes.
-
Define shell environment variables that will simplify later commands to deploy management nodes.
Notice that one of them is `IPMI_PASSWORD`. Replace `changeme` with the real root password for the BMCs.

```bash
pit# export mtoken='ncn-m(?!001)\w+-mgmt'
pit# export stoken='ncn-s\w+-mgmt'
pit# export wtoken='ncn-w\w+-mgmt'
pit# export USERNAME=root
pit# export IPMI_PASSWORD=changeme
```
Throughout the guide, simple one-liners can be used to query status of expected nodes. If the shell or environment is terminated, these environment variables should be re-exported.
Examples:
Check power status of all NCNs.
```bash
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power status
```
Power off all NCNs.
```bash
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power off
```
There will be post-boot workarounds as well.

Follow the workaround instructions for the `before-ncn-boot` breakpoint.
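For reference, the workarounds staged for this breakpoint can be listed on the PIT node. This is a hedged sketch that assumes the `before-ncn-boot` workarounds live under the same `/opt/cray/csm/workarounds` directory that this procedure later shows for the `after-ncn-boot` breakpoint:

```bash
pit# ls /opt/cray/csm/workarounds/before-ncn-boot
```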
NOTE: If you wish to use a timezone other than UTC, instead of step 1 below, follow this procedure for setting a local timezone, then proceed to step 2.
-
Ensure that the PIT node has the current and correct time.
The time can be inaccurate if the system has been powered off for a long time, or, for example, the CMOS was cleared on a Gigabyte node. See Clear Gigabyte CMOS.
This step should not be skipped.

Check the time on the PIT node to see whether it matches the current time:

```bash
pit# date "+%Y-%m-%d %H:%M:%S.%6N%z"
```
If the time is inaccurate, set the time manually.

```bash
pit# timedatectl set-time "2019-11-15 00:00:00"
```
Run the NTP script:

```bash
pit# /root/bin/configure-ntp.sh
```
This ensures that the PIT is configured with an accurate date/time, which will be properly propagated to the NCNs during boot.
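As an optional sanity check (a sketch assuming the PIT uses chronyd for NTP; adjust if a different time daemon is in use), the synchronization status can be inspected with:

```bash
pit# chronyc tracking     # offset and stratum relative to the configured time sources
pit# chronyc sources -v   # upstream servers and their reachability
```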
-
Ensure the current time is set in BIOS for all management NCNs.
If each NCN is booted to the BIOS menu, you can check and set the current UTC time.
```bash
pit# export USERNAME=root
pit# export IPMI_PASSWORD=changeme
```
Repeat the following process for each NCN.
-
Start an IPMI console session to the NCN.
```bash
pit# bmc=ncn-w001-mgmt # Change this to be each node in turn.
pit# conman -j $bmc
```
-
Using another terminal to watch the console, boot the node to BIOS.
```bash
pit# bmc=ncn-w001-mgmt # Change this to be each node in turn.
pit# ipmitool -I lanplus -U $USERNAME -E -H $bmc chassis bootdev bios
pit# ipmitool -I lanplus -U $USERNAME -E -H $bmc chassis power off
pit# sleep 10
pit# ipmitool -I lanplus -U $USERNAME -E -H $bmc chassis power on
```
For HPE NCNs, the above process will boot the nodes to their BIOS, but the menu is unavailable through conman because the node is booted into a graphical BIOS menu.

To access the serial version of the BIOS setup, perform the `ipmitool` steps above to boot the node. Then, in conman, press the `ESC+9` key combination when you see the following messages in the console. This will open a menu you can use to enter the BIOS via conman.

```
For access via BIOS Serial Console:
Press 'ESC+9' for System Utilities
Press 'ESC+0' for Intelligent Provisioning
Press 'ESC+!' for One-Time Boot Menu
Press 'ESC+@' for Network Boot
```
For HPE NCNs the date configuration menu can be found at the following path:
System Configuration -> BIOS/Platform Configuration (RBSU) -> Date and Time
Alternatively, for HPE NCNs you can log in to the BMC's web interface and access the HTML5 console for the node to interact with the graphical BIOS. From the administrator's own machine, create an SSH tunnel (`-L` creates the tunnel; `-N` prevents a shell and stubs the connection):

```bash
linux# bmc=ncn-w001-mgmt # Change this to be each node in turn.
linux# ssh -L 9443:$bmc:443 -N root@eniac-ncn-m001
```
Opening a web browser to `https://localhost:9443` will give access to the BMC's web interface.
-
When the node boots, you will be able to use the conman session to see the BIOS menu and to check and set the time to the current UTC time. The process varies depending on the vendor of the NCN.
-
After you have verified the correct time, power off the NCN.
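For example, reusing the `$bmc` variable set in the earlier sub-steps (change it for each node in turn):

```bash
pit# ipmitool -I lanplus -U $USERNAME -E -H $bmc chassis power off
```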
Repeat the above process for each NCN.
-
The management nodes are expected to have certain minimum firmware installed for BMC, node BIOS, and PCIe card firmware. Where possible, the firmware should be updated prior to install. Some firmware can be updated during or after the installation, but it is better to meet the minimum NCN firmware requirement before starting.
-
(Optional) Check these BIOS settings on the management NCNs.

This is optional; the BIOS settings (or lack thereof) do not prevent deployment. The NCN installation will work with the default CMOS BIOS settings. There may be settings that facilitate the speed of deployment, but they can be tuned at a later time.
NOTE: The BIOS tuning will be automated, further reducing this step.
-
The firmware on the management nodes should be checked for compliance with the minimum required version and updated, if necessary, at this point.
See Update NCN Firmware.
WARNING: Gigabyte NCNs running BIOS version C20 can become unusable when Shasta 1.5 is installed. This is a result of a bug in the Gigabyte firmware. This bug has not been observed in BIOS version C17. A key symptom of this bug is that the NCN will not PXE boot and will instead fall through to the boot menu, despite being configured to PXE boot. This behavior will persist until the failing node's CMOS is cleared.
- See Clear Gigabyte CMOS.
Deployment of the nodes starts with booting the storage nodes first. Then, the master nodes and worker nodes should be booted together. After the operating system boots on each node, there are some configuration actions which take place. Watching the console or the console log for certain nodes can help to understand what happens and when. When the process is complete for all nodes, the Ceph storage will have been initialized and the Kubernetes cluster will be created and ready for a workload.
The configuration workflow described here is intended to help understand the expected path for booting and configuring. See the actual steps below for the commands to deploy these management NCNs.
- Start watching the consoles for `ncn-s001` and at least one other storage node.
- Boot all storage nodes at the same time.
  - The first storage node `ncn-s001` will boot and then starts a loop as the ceph-ansible configuration waits for all other storage nodes to boot.
  - The other storage nodes boot and become passive. They will be fully configured when ceph-ansible runs to completion on `ncn-s001`.
- Once `ncn-s001` notices that all other storage nodes have booted, ceph-ansible will begin Ceph configuration. This takes several minutes.
- Once ceph-ansible has finished on `ncn-s001`, then `ncn-s001` waits for `ncn-m002` to create `/etc/kubernetes/admin.conf`.
- Start watching the consoles for `ncn-m002`, `ncn-m003`, and at least one worker node.
- Boot the master nodes (`ncn-m002` and `ncn-m003`) and all worker nodes at the same time.
  - The worker nodes will boot and wait for `ncn-m002` to create `/etc/cray/kubernetes/join-command-control-plane` so they can join Kubernetes.
  - The third master node `ncn-m003` boots and waits for `ncn-m002` to create `/etc/cray/kubernetes/join-command-control-plane` so it can join Kubernetes.
  - The second master node `ncn-m002` boots, runs kubernetes-cloudinit.sh (which will create `/etc/kubernetes/admin.conf` and `/etc/cray/kubernetes/join-command-control-plane`), then waits for the storage node to create `etcd-backup-s3-credentials`.
- Once `ncn-s001` notices that `ncn-m002` has created `/etc/kubernetes/admin.conf`, then `ncn-s001` waits for any worker node to become available.
- Once each worker node notices that `ncn-m002` has created `/etc/cray/kubernetes/join-command-control-plane`, it will join the Kubernetes cluster.
  - Now `ncn-s001` should notice this from any one of the worker nodes and move forward with the creation of ConfigMaps and running the post-Ceph playbooks (s3, OSD pools, quotas, etc.).
- Once `ncn-s001` creates `etcd-backup-s3-credentials` during the benji-backups role, which is one of the last roles after Ceph has been set up, then `ncn-m001` notices this and moves forward.
-
Change the default root password and SSH keys
If you want to avoid using the default install root password and SSH keys for the NCNs, follow the NCN image customization steps in Change NCN Image Root Password and SSH Keys
This step is strongly encouraged for all systems.
-
Create boot directories for any NCN in DNS:
This will create folders for each host in `/var/www`, allowing each host to have its own unique set of artifacts: kernel, initrd, SquashFS, and `script.ipxe` boot script.

```bash
pit# /root/bin/set-sqfs-links.sh
```
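To sanity-check the result, the per-host directories can be listed. This is a hedged sketch that assumes the directories are named after the NCN hostnames; adjust the hostname to match your system:

```bash
pit# ls -d /var/www/ncn-*
pit# ls /var/www/ncn-w001/   # expect kernel, initrd, SquashFS, and script.ipxe artifacts
```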
-
Customize boot scripts for any out-of-baseline NCNs
- Kubernetes worker nodes with more than 2 small disks need adjustments to prevent bare-metal etcd creation.
- A brief overview of what is expected is in the disk plan of record / baseline.
-
Run the BIOS Baseline script to apply configuration to the BMCs. The script applies helper configuration to facilitate more deterministic network booting on any NCN port. The script depends on the `USERNAME` and `IPMI_PASSWORD` variables exported earlier.

```bash
pit# /root/bin/bios-baseline.sh
```
-
Set each node to always UEFI Network Boot, and ensure they are powered off
```bash
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} chassis bootdev pxe options=efiboot,persistent
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power off
```
NOTE: The NCN boot order is further explained in NCN Boot Workflow.
-
Validate that the LiveCD is ready for installing NCNs.
Observe the output of the checks and note any failures, then remediate them.
```bash
pit# csi pit validate --livecd-preflight
```

Note: You can ignore any errors about not being able to resolve arti.dev.cray.com.
-
Print the consoles available to you:
```bash
pit# conman -q
```

Expected output looks similar to the following:

```
ncn-m001-mgmt
ncn-m002-mgmt
ncn-m003-mgmt
ncn-s001-mgmt
ncn-s002-mgmt
ncn-s003-mgmt
ncn-w001-mgmt
ncn-w002-mgmt
ncn-w003-mgmt
```
IMPORTANT: This is the administrator's last chance to run NCN pre-boot workarounds (the `before-ncn-boot` breakpoint).
-
Boot the Storage Nodes
-
Boot all storage nodes except `ncn-s001`:

```bash
pit# grep -oP $stoken /etc/dnsmasq.d/statics.conf | grep -v "ncn-s001-" | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power on
```
-
Wait approximately 1 minute.
-
Boot `ncn-s001`:

```bash
pit# ipmitool -I lanplus -U $USERNAME -E -H ncn-s001-mgmt power on
```
-
-
Wait. Observe the installation through `ncn-s001-mgmt`'s console.

Print the console name:

```bash
pit# conman -q | grep s001
```

Expected output looks similar to the following:

```
ncn-s001-mgmt
```

Then join the console:

```bash
pit# conman -j ncn-s001-mgmt
```
From there, an administrator can watch the console output of the cloud-init scripts.
NOTE: Watch the storage node consoles carefully for error messages. If any are seen, consult Ceph-CSI Troubleshooting.

NOTE: If the nodes have PXE boot issues (e.g. getting PXE errors, not pulling the `ipxe.efi` binary), see PXE boot troubleshooting.

NOTE: If other issues arise, such as cloud-init problems (e.g. NCNs come up to Linux with no hostname), see the CSM workarounds for fixes around mutual symptoms. If there is a workaround here, the output will look similar to the following:

```bash
pit# ls /opt/cray/csm/workarounds/after-ncn-boot
```

Example output:

```
CASMINST-1093
```
-
Wait for the storage nodes before booting the Kubernetes master nodes and worker nodes.

NOTE: Once all storage nodes are up and the message `...sleeping 5 seconds until /etc/kubernetes/admin.conf` appears on `ncn-s001`'s console, it is safe to proceed with booting the Kubernetes master nodes and worker nodes:

```bash
pit# grep -oP "($mtoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power on
```
-
Stop watching the console from `ncn-s001`.

Type the ampersand character and then the period character to exit from the conman session on `ncn-s001`.

```
&.
pit#
```
-
Wait. Observe the installation through `ncn-m002-mgmt`'s console.

Print the console name:

```bash
pit# conman -q | grep m002
```

Expected output looks similar to the following:

```
ncn-m002-mgmt
```

Then join the console:

```bash
pit# conman -j ncn-m002-mgmt
```
NOTE: If the nodes have PXE boot issues (e.g. getting PXE errors, not pulling the `ipxe.efi` binary), see PXE boot troubleshooting.

NOTE: If one of the master nodes seems hung waiting for the storage nodes to create a secret, check the storage node consoles for error messages. If any are found, consult Ceph CSI Troubleshooting.

NOTE: If other issues arise, such as cloud-init problems (e.g. NCNs come up to Linux with no hostname), see the CSM workarounds for fixes around mutual symptoms. If there is a workaround here, the output will look similar to the following:

```bash
pit# ls /opt/cray/csm/workarounds/after-ncn-boot
```

Example output:

```
CASMINST-1093
```
-
Refer to the timing of deployments. It should take no more than 60 minutes for the `kubectl get nodes` command to return output indicating that all the master nodes and worker nodes, aside from the PIT node booted from the LiveCD, are `Ready`:

```bash
pit# ssh ncn-m002
ncn-m002# kubectl get nodes -o wide
```
Expected output looks similar to the following:

```
NAME       STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION         CONTAINER-RUNTIME
ncn-m002   Ready    master   14m     v1.18.6   10.252.1.5    <none>        SUSE Linux Enterprise High Performance Computing 15 SP2   5.3.18-24.43-default   containerd://1.3.4
ncn-m003   Ready    master   13m     v1.18.6   10.252.1.6    <none>        SUSE Linux Enterprise High Performance Computing 15 SP2   5.3.18-24.43-default   containerd://1.3.4
ncn-w001   Ready    <none>   6m30s   v1.18.6   10.252.1.7    <none>        SUSE Linux Enterprise High Performance Computing 15 SP2   5.3.18-24.43-default   containerd://1.3.4
ncn-w002   Ready    <none>   6m16s   v1.18.6   10.252.1.8    <none>        SUSE Linux Enterprise High Performance Computing 15 SP2   5.3.18-24.43-default   containerd://1.3.4
ncn-w003   Ready    <none>   5m58s   v1.18.6   10.252.1.12   <none>        SUSE Linux Enterprise High Performance Computing 15 SP2   5.3.18-24.43-default   containerd://1.3.4
```
-
Stop watching the console from `ncn-m002`.

Type the ampersand character and then the period character to exit from the conman session on `ncn-m002`.

```
&.
pit#
```
IMPORTANT: Do the following if the NCNs are Gigabyte hardware.

IMPORTANT: The `cephadm` command may output the warning "WARNING: The same type, major and minor should not be used for multiple devices.". You can ignore this warning.
If you have OSDs on each node (`ceph osd tree` can show this), then you have all your nodes in Ceph. That means you can utilize the orchestrator to look for the devices.
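For example, a quick hedged check that every storage node appears as a host in the CRUSH tree (the `grep` filter is only an illustration):

```bash
ncn-s# ceph osd tree | grep host
```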
-
Get the number of OSDs in the cluster.

```bash
ncn-s# ceph -f json-pretty osd stat | jq .num_osds
24
```
-
Compare your number of OSDs to the output below.
NOTE: If your Ceph cluster is large and has a lot of nodes, you can specify a node after the following command to limit the results.
```bash
ncn-s# ceph orch device ls
Hostname  Path      Type  Serial              Size   Health   Ident  Fault  Available
ncn-s001  /dev/sda  ssd   PHYF015500M71P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdb  ssd   PHYF016500TZ1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdc  ssd   PHYF016402EB1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdd  ssd   PHYF016504831P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sde  ssd   PHYF016500TV1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdf  ssd   PHYF016501131P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdi  ssd   PHYF016500YB1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s001  /dev/sdj  ssd   PHYF016500WN1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sda  ssd   PHYF0155006W1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdb  ssd   PHYF0155006Z1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdc  ssd   PHYF015500L61P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdd  ssd   PHYF015502631P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sde  ssd   PHYF0153000G1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdf  ssd   PHYF016401T41P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdi  ssd   PHYF016504C21P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s002  /dev/sdj  ssd   PHYF015500GQ1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sda  ssd   PHYF016402FP1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdb  ssd   PHYF016401TE1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdc  ssd   PHYF015500N51P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdd  ssd   PHYF0165010Z1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sde  ssd   PHYF016500YR1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdf  ssd   PHYF016500X01P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdi  ssd   PHYF0165011H1P9DGN  1920G  Unknown  N/A    N/A    No
ncn-s003  /dev/sdj  ssd   PHYF016500TQ1P9DGN  1920G  Unknown  N/A    N/A    No
```
If you have devices that are "Available = Yes" and they are not being automatically added, you may have to zap that device.
IMPORTANT: Prior to zapping any device please ensure it is not being used.
-
Check to see if the number of devices is less than the number of listed drives or your output from step 1.
```bash
ncn-s# ceph orch device ls | grep dev | wc -l
24
```
If the numbers are equal, then you may need to fail your `ceph-mgr` daemon to get a fresh inventory.

```bash
ncn-s# ceph mgr fail $(ceph mgr dump | jq -r .active_name)
```
Give it 5 minutes, then re-check `ceph orch device ls` to see if the drives are still showing as available. If so, then proceed to the next step.
-
`ssh` to the host and look at the `lsblk` output, checking it against the device from the `ceph orch device ls` output above.

```bash
ncn-s# lsblk
NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                                                                                                   7:0    0  4.2G  1 loop /run/rootfsbase
loop1                                                                                                   7:1    0   30G  0 loop
└─live-overlay-pool                                                                                   254:8    0  300G  0 dm
loop2                                                                                                   7:2    0  300G  0 loop
└─live-overlay-pool                                                                                   254:8    0  300G  0 dm
sda                                                                                                     8:0    0  1.8T  0 disk
└─ceph--0a476f53--8b38--450d--8779--4e587402f8a8-osd--data--b620b7ef--184a--46d7--9a99--771239e7a323 254:7    0  1.8T  0 lvm
```
- If it has an LVM volume like above, then it may be in use and you should do the option 2 check below to make sure we can wipe the drive.
-
Log into each ncn-s node and check for unused drives.
```bash
ncn-s# cephadm shell -- ceph-volume inventory
```
IMPORTANT: The `cephadm` command may output the warning `WARNING: The same type, major and minor should not be used for multiple devices.`. You can ignore this warning.

The `available` field would be `True` if Ceph sees the drive as empty and it can be used, e.g.:

```
Device Path               Size         rotates available Model name
/dev/sda                  447.13 GB    False   False     SAMSUNG MZ7LH480
/dev/sdb                  447.13 GB    False   False     SAMSUNG MZ7LH480
/dev/sdc                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
/dev/sdd                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
/dev/sde                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
/dev/sdf                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
/dev/sdg                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
/dev/sdh                  3.49 TB      False   False     SAMSUNG MZ7LH3T8
```
Alternatively, just dump the paths of available drives:
```bash
ncn-s# cephadm shell -- ceph-volume inventory --format json-pretty | jq -r '.[]|select(.available==true)|.path'
```
-
Wipe the drive ONLY after you have confirmed the drive is not being used by the current Ceph cluster via options 1, 2, or both.
The following example wipes drive `/dev/sdc` on `ncn-s002`. You should replace these values with the appropriate ones for your situation.

```bash
ncn-s# ceph orch device zap ncn-s002 /dev/sdc --force
```
-
Add unused drives.
```bash
ncn-s# cephadm shell -- ceph-volume lvm create --data /dev/sd<drive to add> --bluestore
```
More information can be found on the `cephadm` reference page.
Follow the workaround instructions for the `after-ncn-boot` breakpoint.
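As shown earlier in this procedure, the staged workarounds for this breakpoint can be listed with:

```bash
pit# ls /opt/cray/csm/workarounds/after-ncn-boot
```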
After the management nodes have been deployed, configuration can be applied to the booted nodes.
The LiveCD needs to authenticate with the cluster to facilitate the rest of the CSM installation.
-
Copy the Kubernetes config to the LiveCD to be able to use `kubectl` as the cluster administrator.

This will always be whatever node is the `first-master-hostname` in your `/var/www/ephemeral/configs/data.json` file (viewable with `jq`). If you are provisioning your HPE Cray EX system from `ncn-m001`, then you can expect to fetch these from `ncn-m002`.

```bash
pit# mkdir -v ~/.kube
pit# scp ncn-m002.nmn:/etc/kubernetes/admin.conf ~/.kube/config
```
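To confirm the copied config works from the PIT node, a quick hedged check (assuming `kubectl` is available on the LiveCD, which is the point of copying the config here):

```bash
pit# kubectl get nodes
```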
After the NCNs are booted, the BGP peers will need to be checked and updated if the neighbor IP addresses are incorrect on the switches. Follow the steps below and see Check and Update BGP Neighbors for more details on the BGP configuration.
-
Make sure the SYSTEM_NAME variable is set to the name of your system.

```bash
pit# export SYSTEM_NAME=eniac
```
-
Determine the IP addresses of the worker NCNs.

```bash
pit# grep -B1 "name: ncn-w" /var/www/ephemeral/prep/${SYSTEM_NAME}/networks/NMN.yaml
```
-
Determine the IP addresses for the switches that are peering.

```bash
pit# grep peer-address /var/www/ephemeral/prep/${SYSTEM_NAME}/metallb.yaml
```
-
Do the following steps for each of the switch IP addresses that you found in the previous step:
-
Log in to the switch as the `admin` user:

```bash
pit# ssh admin@<switch_ip_address>
```
-
Clear the BGP peering sessions by running the following commands. You should see either "arubanetworks" or "Mellanox" in the first output you see when you log in to the switch.

- Aruba: `clear bgp *`
- Mellanox: First run `enable`, then run `clear ip bgp all`
At this point the peering sessions with the worker IP addresses should be in IDLE, CONNECT, or ACTIVE state but not ESTABLISHED state. This is because the MetalLB speaker pods have not been deployed yet.
You should see that the MsgRcvd and MsgSent columns for the worker IP addresses are 0.
-
-
Check the status of the BGP peering sessions by running the following commands on each switch:
- Aruba: `show bgp ipv4 unicast summary`
- Mellanox: `show ip bgp summary`

You should see a neighbor for each of the worker NCN IP addresses found in an earlier step. If it is an Aruba switch, you will also see a neighbor for the other switch of the pair that is peering.

At this point the peering sessions with the worker IP addresses should be in `IDLE`, `CONNECT`, or `ACTIVE` state (not `ESTABLISHED`). This is due to the MetalLB speaker pods not being deployed yet.

You should see that the `MsgRcvd` and `MsgSent` columns for the worker IP addresses are 0.
-
If the neighbor IP addresses do not match the worker NCN IP addresses, use the helper script for Mellanox and CANU (Cray Automated Network Utility) for Aruba.
-
This command will list the available helper scripts.
```bash
pit# ls -1 /usr/bin/*mellanox_set_bgp_peer*py
```

Expected output looks similar to the following:

```
/usr/bin/mellanox_set_bgp_peers.py
```
-
Run the BGP helper script if you have Mellanox switches.

The BGP helper script requires three parameters: the IP address of switch 1, the IP address of switch 2, and the path to the CSI-generated network files.
- The IP addresses used should be Node Management Network IP addresses (NMN). These IP addresses will be used for the BGP Router-ID.
- The path to the CSI-generated network files must include `CAN.yaml`, `HMN.yaml`, `HMNLB.yaml`, `NMNLB.yaml`, and `NMN.yaml`. The path must include the SYSTEM_NAME.
For Mellanox:

The IP addresses in this example should be replaced by the IP addresses of the switches.

```bash
pit# /usr/bin/mellanox_set_bgp_peers.py 10.252.0.2 10.252.0.3 /var/www/ephemeral/prep/${SYSTEM_NAME}/networks/
```
-
Run CANU if you have Aruba switches.
CANU requires three parameters: the IP address of switch 1, the IP address of switch 2, and the path to the directory containing the file `sls_input_file.json`.

The IP addresses in this example should be replaced by the IP addresses of the switches.

```bash
pit# canu -s 1.5 config bgp --ips 10.252.0.2,10.252.0.3 --csi-folder /var/www/ephemeral/prep/${SYSTEM_NAME}/
```
-
Check the status of the BGP peering sessions on each switch.
- Aruba: `show bgp ipv4 unicast summary`
- Mellanox: `show ip bgp summary`

You should see a neighbor for each of the worker NCN IP addresses found above. If it is an Aruba switch, you will also see a neighbor for the other switch of the pair that is peering.

At this point the peering sessions with the worker IP addresses should be in `IDLE`, `CONNECT`, or `ACTIVE` state (not `ESTABLISHED`). This is due to the MetalLB speaker pods not being deployed yet.

You should see that the `MsgRcvd` and `MsgSent` columns for the worker IP addresses are 0.
-
Check the BGP config on each switch to verify that the NCN neighbors are configured as passive.
-
Aruba: `show run bgp`

The passive neighbor configuration is required:

```
neighbor 10.252.1.7 passive
```

EXAMPLE ONLY

```
sw-spine-001# show run bgp
router bgp 65533
bgp router-id 10.252.0.2
maximum-paths 8
distance bgp 20 70
neighbor 10.252.0.3 remote-as 65533
neighbor 10.252.1.7 remote-as 65533
neighbor 10.252.1.7 passive
neighbor 10.252.1.8 remote-as 65533
neighbor 10.252.1.8 passive
neighbor 10.252.1.9 remote-as 65533
neighbor 10.252.1.9 passive
```
-
Mellanox: `show run protocol bgp`

The passive neighbor configuration is required:

```
router bgp 65533 vrf default neighbor 10.252.1.7 transport connection-mode passive
```

EXAMPLE ONLY

```
protocol bgp
router bgp 65533 vrf default
router bgp 65533 vrf default router-id 10.252.0.2 force
router bgp 65533 vrf default maximum-paths ibgp 32
router bgp 65533 vrf default neighbor 10.252.1.7 remote-as 65533
router bgp 65533 vrf default neighbor 10.252.1.7 route-map ncn-w003
router bgp 65533 vrf default neighbor 10.252.1.8 remote-as 65533
router bgp 65533 vrf default neighbor 10.252.1.8 route-map ncn-w002
router bgp 65533 vrf default neighbor 10.252.1.9 remote-as 65533
router bgp 65533 vrf default neighbor 10.252.1.9 route-map ncn-w001
router bgp 65533 vrf default neighbor 10.252.1.7 transport connection-mode passive
router bgp 65533 vrf default neighbor 10.252.1.8 transport connection-mode passive
router bgp 65533 vrf default neighbor 10.252.1.9 transport connection-mode passive
```
-
-
IMPORTANT: The boot order is set by cloud-init; however, the current setting is still iterating. This manual step is required until further notice.
-
Do the following two steps outlined in Set Boot Order for all NCNs and the PIT node.

Then install the Goss tests:
```bash
pit# export CSM_RELEASE=csm-x.y.z
pit# pushd /var/www/ephemeral
pit# ${CSM_RELEASE}/lib/install-goss-tests.sh
pit# popd
```
Do all of the validation steps. The optional validation steps are manual steps which could be skipped.
The following `csi pit validate` commands will run a series of remote tests on the other nodes to validate that they are healthy and configured correctly.
Observe the output of the checks and note any failures, then remediate them.
-
Check the storage nodes.
Note: Throughout the output of the `csi pit validate` command there will be a test total for each node where the tests run. Be sure to check all of them and not just the final one.

```bash
pit# csi pit validate --ceph | tee csi-pit-validate-ceph.log
```
Once that command has finished, the following will extract the test totals reported for each node:
pit# grep "Total" csi-pit-validate-ceph.log
Example output for a system with 3 storage nodes:
```
Total Tests: 7, Total Passed: 7, Total Failed: 0, Total Execution Time: 1.4226 seconds
Total Tests: 7, Total Passed: 7, Total Failed: 0, Total Execution Time: 1.4077 seconds
Total Tests: 7, Total Passed: 7, Total Failed: 0, Total Execution Time: 1.4246 seconds
```
If these total lines report any failed tests, look through the full output of the test to see which node had the failed test and what the details are for that test.
Note: Please see Utility Storage to help resolve any failed tests.
-
Check the master and worker nodes.
Note: Throughout the output of the `csi pit validate` command there will be a test total for each node where the tests run. Be sure to check all of them and not just the final one.

```bash
pit# csi pit validate --k8s | tee csi-pit-validate-k8s.log
```
Once that command has finished, the following will extract the test totals reported for each node:
pit# grep "Total" csi-pit-validate-k8s.log
Example output for a system with 5 master and worker nodes (other than the PIT node):
```
Total Tests: 16, Total Passed: 16, Total Failed: 0, Total Execution Time: 0.3072 seconds
Total Tests: 16, Total Passed: 16, Total Failed: 0, Total Execution Time: 0.2727 seconds
Total Tests: 12, Total Passed: 12, Total Failed: 0, Total Execution Time: 0.2841 seconds
Total Tests: 12, Total Passed: 12, Total Failed: 0, Total Execution Time: 0.3622 seconds
Total Tests: 12, Total Passed: 12, Total Failed: 0, Total Execution Time: 0.2353 seconds
```
If these total lines report any failed tests, look through the full output of the test to see which node had the failed test and what the details are for that test.
WARNING: If there are failures for tests with names like "Worker Node CONLIB FS Label", then these manual tests should be run on the node which reported the failure. The master nodes have a test looking for the ETCDLVM label. The worker nodes have tests looking for the CONLIB, CONRUN, and K8SLET labels.

Master nodes:

```bash
ncn-m# blkid -L ETCDLVM
/dev/sdc
```

Worker nodes:

```bash
ncn-w# blkid -L CONLIB
/dev/sdc
ncn-w# blkid -L CONRUN
/dev/sdc
ncn-w# blkid -L K8SLET
/dev/sdc
```
WARNING: If these manual tests do not report a disk device such as "/dev/sdc" (this letter will vary and is unimportant) as having the respective label on that node, then the problem must be resolved before continuing to the next step.
- If a master node has the problem, then it is best to wipe and redeploy all of the management nodes before continuing the installation.
  - Wipe each of the worker and master nodes (except `ncn-m001`, because it is the PIT node) using the 'Basic Wipe' section of Wipe NCN Disks for Reinstallation, and then wipe each of the storage nodes using the 'Full Wipe' section of Wipe NCN Disks for Reinstallation.
  - Return to the Boot the Storage Nodes step of the Deploy Management Nodes section above.
- If a worker node has the problem, then it is best to wipe and redeploy that worker node before continuing the installation.
  - Wipe this worker node using the 'Basic Wipe' section of Wipe NCN Disks for Reinstallation.
  - Return to the Boot the Master and Worker Nodes step of the Deploy Management Nodes section above.
-
If your shell terminal is not echoing your input after running the above `csi pit validate` tests, then reset the terminal.

```bash
pit# reset
```
-
Ensure that weave has not become split-brained.
Run the following command on each member of the Kubernetes cluster (master nodes and worker nodes) to ensure that weave is operating as a single cluster:
```bash
ncn# weave --local status connections | grep failed
```
If you see messages like `IP allocation was seeded by different peers`, then weave appears to have become split-brained. At this point, it is necessary to wipe the NCNs and start the PXE boot again:
- Wipe the NCNs using the 'Basic Wipe' section of Wipe NCN Disks for Reinstallation.
- Return to the 'Boot the Storage Nodes' step of Deploy Management Nodes section above.
All validation should be taken care of by the CSI validate commands. The following checks can be done for sanity-checking:
Important common issues should be checked by tests; new pain points in these areas should lead to requests for new tests.
- Verify all nodes have joined the cluster

Check that the status of the Kubernetes nodes is `Ready`.

```bash
ncn# kubectl get nodes
```
If one or more nodes are not in the `Ready` state, the following command can be run to get additional information:

```bash
ncn# kubectl describe node <node-name>   # for example, ncn-m001
```
- Verify etcd is running outside Kubernetes on master nodes

On each Kubernetes master node, check the status of the etcd service and ensure it is Active/Running:

```bash
ncn-m# systemctl status etcd.service
```
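A minimal sketch for checking both master nodes from one session, assuming SSH access from the node you are on and that only `ncn-m002` and `ncn-m003` are Kubernetes masters at this point (hostnames are illustrative):

```bash
ncn# for h in ncn-m002 ncn-m003; do echo "== $h =="; ssh "$h" systemctl is-active etcd.service; done
```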
- Verify that all the pods in the `kube-system` namespace are running

Check that the pods listed are in the `Running` or `Completed` state.

```bash
ncn# kubectl get pods -o wide -n kube-system
```
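To quickly surface anything that is not `Running` or `Completed`, a simple hedged filter (an empty result means all pods are in one of those states):

```bash
ncn# kubectl get pods -n kube-system --no-headers | grep -Ev 'Running|Completed'
```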
- Verify that the ceph-csi requirements are in place; see Ceph CSI Troubleshooting.
Before you move on: this is the last point where you will be able to rebuild nodes without having to rebuild the PIT node. Take time to double-check both the cluster and the validation test results.
After completing the deployment of the management nodes, the next step is to install the CSM services.