The Pre-Install Toolkit (PIT) node needs to be bootstrapped from the LiveCD. There are two media available to bootstrap the PIT node: the RemoteISO or a bootable USB device. This procedure describes using the RemoteISO. If not using the RemoteISO, see Bootstrap PIT Node from LiveCD USB.
The installation process is similar to the USB based installation with adjustments to account for the lack of removable storage.
Important: Before starting this procedure, be sure to complete the procedure to Prepare Configuration Payload for the relevant installation scenario.
- Known Compatibility Issues
- Attaching and Booting the LiveCD with the BMC
- First Login
- Configure the Running LiveCD
- Next Topic
The LiveCD Remote ISO has known compatibility issues for nodes from certain vendors.
- Intel nodes should not attempt to bootstrap using the LiveCD Remote ISO method. Instead, use Bootstrap PIT Node from LiveCD USB.
- Gigabyte nodes should not attempt to bootstrap using the LiveCD Remote ISO method. Instead, use Bootstrap PIT Node from LiveCD USB.
Warning: If this is a re-installation on a system that still has a USB device from a prior installation, then that USB device must be wiped before continuing. Failing to wipe the USB device, if present, may result in the node booting the stale image instead of the remote ISO. If the USB device is still booted, then it can wipe itself using the basic wipe from Wipe NCN Disks for Reinstallation. If it is not currently booted, boot it and wipe it, or disable the USB ports in the BIOS (not available for all vendors).
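For reference, the basic wipe from Wipe NCN Disks for Reinstallation amounts to removing the filesystem and partition signatures from the USB device. A minimal sketch, assuming the stale USB enumerates as /dev/sdd (a hypothetical example; always confirm the device name first):
# Identify the USB device; the TRAN column shows usb for removable devices.
pit# lsblk -o NAME,SIZE,TYPE,TRAN,MODEL
# Remove all signatures from the identified device (destructive; double-check the device name).
pit# wipefs --all --force /dev/sdd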
Obtain and attach the LiveCD cray-pre-install-toolkit ISO file to the BMC. Depending on the vendor of the node, the instructions for attaching to the BMC will differ.
-
The CSM software release should be downloaded and expanded for use.
Important: To ensure that the CSM release plus any patches, workarounds, or hotfixes are included, follow the instructions in Update CSM Product Stream.
The cray-pre-install-toolkit ISO and other files are now available in the directory from the extracted CSM tar. The ISO will have a name similar to
cray-pre-install-toolkit-sle15sp2.x86_64-1.4.10-20210514183447-gc054094.iso
-
Prepare a server on the network to host the cray-pre-install-toolkit ISO.
For this release of CSM software, the cray-pre-install-toolkit ISO should be placed on a server which the PIT node will be able to contact via http or https.
- HPE nodes can use http or https.
Note: A shorter path name is better than a long path name on the webserver.
- The Cray Pre-Install Toolkit ISO is included in the CSM release tarball. It will have a long filename similar to cray-pre-install-toolkit-sle15sp2.x86_64-1.4.10-20210514183447-gc054094.iso, so pick a shorter name on the webserver.
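Any webserver reachable from the PIT node and its BMC will work. As one hedged example (not the only option), Python's built-in HTTP server can serve the ISO from a directory on the chosen host; the shortened filename pit.iso and port 8080 below are arbitrary choices:
linux# cp cray-pre-install-toolkit-*.iso pit.iso
linux# python3 -m http.server 8080
# The ISO is then reachable at http://<webserver-ip>:8080/pit.iso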
-
See the respective procedure below to attach an ISO.
- HPE iLO BMCs
- [Gigabyte BMCs] Should not use the RemoteISO method. See Bootstrap PIT Node from LiveCD USB
- [Intel BMCs] Should not use the RemoteISO method. See Bootstrap PIT Node from LiveCD USB
-
The chosen procedure should have rebooted the server. Observe the server boot into the LiveCD.
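How to watch the boot depends on the vendor's console. One generic option, assuming IPMI over LAN with Serial-over-LAN is enabled on the BMC (the credentials and BMC hostname below are placeholders), is:
external# ipmitool -I lanplus -U root -P <bmc-password> -H <bmc-hostname> sol activate
The vendor's own remote console (for example, the HPE iLO web console) works equally well.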
On first login (over SSH or at local console) the LiveCD will prompt the administrator to change the password.
-
The initial password is empty. Enter the username root and press return twice:
pit login: root
Expected output looks similar to the following:
Password:            <------- just press Enter here for a blank password
You are required to change your password immediately (administrator enforced)
Changing password for root.
Current password:    <------- press Enter here, again, for a blank password
New password:        <------- type new password
Retype new password: <------- retype new password
Welcome to the CRAY Pre-Install Toolkit (LiveOS)
-
Set up the typescript directory and start the initial typescript. This directory will be used for every typescript in the entire CSM installation.
pit# mkdir -pv /var/www/ephemeral/prep/admin
pit# pushd !$
pit# script -af csm-install-remoteiso.$(date +%Y-%m-%d).txt
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
-
Set up the site-link, enabling SSH to work. You can reconnect with SSH after this step.
NOTICE REGARDING DHCP
If your site's network authority or network administrator has already provisioned an IPv4 address for your master node's external NIC(s), then skip this step.
-
Set up variables.
# The IPv4 address for the node's external interface(s); this will be provided by the site's network administrator or network authority, if not already known.
pit# site_ip=172.30.XXX.YYY/20
pit# site_gw=172.30.48.1
pit# site_dns=172.30.84.40
# The actual NIC names for the external site interface; the first onboard or the first 1GbE PCIe (RJ-45).
pit# site_nics='p2p1 p2p2 p2p3'
# another example:
pit# site_nics=em1
-
Run the link setup script.
NOTE: USAGE
All of the /root/bin/csi-* scripts are harmless to run without parameters; doing so will print their usage statements.
pit# /root/bin/csi-setup-lan0.sh $site_ip $site_gw $site_dns $site_nics
-
(recommended) Print lan0, and if it has an IP address, then exit the console and log in again using SSH. The SSH connection provides larger window sizes and better buffer handling (screen wrapping).
pit# ip a show lan0
pit# exit
external# ssh root@${SYSTEM_NAME}-ncn-m001
-
(recommended) After reconnecting, resume the typescript (the -a option appends to an existing file).
pit# pushd /var/www/ephemeral/prep/admin
pit# script -af $(ls -tr csm-install-remoteiso* | head -n 1)
pit# export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
-
Check hostname.
pit# hostnamectl
NOTE
If the hostname returned by the hostnamectl command is still pit, then re-run the above script with the same parameters. Otherwise, feel free to set the hostname by hand with hostnamectl; please continue to use the -pit suffix to prevent masquerading a PIT node as a real NCN to administrators and automation.
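As an illustrative sketch only (the exact hostname is site-specific; eniac is the example system name used elsewhere in this procedure), setting the hostname by hand while keeping the -pit suffix might look like:
pit# hostnamectl set-hostname eniac-ncn-m001-pit
pit# hostnamectl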
-
-
Find a local disk for storing product installers.
pit# disk="$(lsblk -l -o SIZE,NAME,TYPE,TRAN | grep -E '(sata|nvme|sas)' | sort -h | awk '{print $2}' | head -n 1 | tr -d '\n')"
pit# parted --wipesignatures -m --align=opt --ignore-busy -s /dev/$disk -- mklabel gpt mkpart primary ext4 2048s 100%
pit# mkfs.ext4 -L PITDATA "/dev/${disk}1"
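Before moving on, it is worth confirming which disk was selected and that the PITDATA label was applied; a quick check (device names will vary by node) is:
pit# echo $disk
pit# lsblk -o NAME,SIZE,FSTYPE,LABEL "/dev/${disk}"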
-
Mount the local disk, checking the output of each command as you go.
pit# mount -v -L PITDATA
pit# pushd /var/www/ephemeral
pit:/var/www/ephemeral# mkdir -v prep configs data
-
Download the CSM software release to the PIT node.
Important: In an earlier step, the CSM release plus any patches, workarounds, or hotfixes were downloaded to a system using the instructions in Update CSM Product Stream. Either copy the release from that system to the PIT node or set the ENDPOINT variable to the URL and use wget.
-
Set helper variables.
pit:/var/www/ephemeral# export ENDPOINT=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm
pit:/var/www/ephemeral# export CSM_RELEASE=csm-x.y.z
pit:/var/www/ephemeral# export SYSTEM_NAME=eniac
-
Save the CSM_RELEASE for use later; all subsequent shell sessions will have this variable set.
# Prepend a newline to ensure the variable is added on its own line and not appended to the end of another.
pit:/var/www/ephemeral# echo -e "\nCSM_RELEASE=$CSM_RELEASE" >>/etc/environment
-
Fetch the release tarball.
pit:/var/www/ephemeral# wget ${ENDPOINT}/${CSM_RELEASE}.tar.gz -O /var/www/ephemeral/${CSM_RELEASE}.tar.gz
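If the PIT node cannot reach the endpoint directly, copying the tarball from the system where it was downloaded is an alternative to wget. A minimal sketch, assuming CSM_RELEASE is also set in that shell, the tarball sits in the current directory there, and the PIT node is reachable as ncn-m001:
external# scp ${CSM_RELEASE}.tar.gz root@ncn-m001:/var/www/ephemeral/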
-
Expand the tarball on the PIT node.
pit:/var/www/ephemeral# tar -zxvf ${CSM_RELEASE}.tar.gz
pit:/var/www/ephemeral# ls -l ${CSM_RELEASE}
-
Copy the artifacts into place.
pit:/var/www/ephemeral# mkdir -pv data/{k8s,ceph}
pit:/var/www/ephemeral# rsync -a -P --delete ./${CSM_RELEASE}/images/kubernetes/ ./data/k8s/
pit:/var/www/ephemeral# rsync -a -P --delete ./${CSM_RELEASE}/images/storage-ceph/ ./data/ceph/
The PIT ISO, Helm charts/images, and bootstrap RPMs are now available in the extracted CSM tar.
-
-
Install/upgrade the CSI and testing RPMs.
pit:/var/www/ephemeral# rpm -Uvh --force $(find ./${CSM_RELEASE}/rpm/cray/csm/ -name "cray-site-init-*.x86_64.rpm" | sort -V | tail -1)
pit:/var/www/ephemeral# rpm -Uvh --force $(find ./${CSM_RELEASE}/rpm/cray/csm/ -name "goss-servers*.rpm" | sort -V | tail -1)
pit:/var/www/ephemeral# rpm -Uvh --force $(find ./${CSM_RELEASE}/rpm/cray/csm/ -name "csm-testing*.rpm" | sort -V | tail -1)
-
Show the version of CSI installed.
pit# csi version
Expected output looks similar to the following:
CRAY-Site-Init build signature...
Build Commit   : b3ed3046a460d804eb545d21a362b3a5c7d517a3-release-shasta-1.4
Build Time     : 2021-02-04T21:05:32Z
Go Version     : go1.14.9
Git Version    : b3ed3046a460d804eb545d21a362b3a5c7d517a3
Platform       : linux/amd64
App. Version   : 1.5.18
-
Download and install/upgrade the workaround and documentation RPMs.
If this machine does not have direct Internet access, these RPMs will need to be externally downloaded and then copied to the system.
Important: In an earlier step, the CSM release plus any patches, workarounds, or hotfixes were downloaded to a system using the instructions in Check for Latest Workarounds and Documentation Updates. Use that set of RPMs rather than downloading again.
linux# wget https://storage.googleapis.com/csm-release-public/shasta-1.5/docs-csm/docs-csm-latest.noarch.rpm
linux# wget https://storage.googleapis.com/csm-release-public/shasta-1.5/csm-install-workarounds/csm-install-workarounds-latest.noarch.rpm
linux# scp -p docs-csm-*rpm csm-install-workarounds-*rpm ncn-m001:/root
linux# ssh ncn-m001
pit# rpm -Uvh --force docs-csm-latest.noarch.rpm
pit# rpm -Uvh --force csm-install-workarounds-latest.noarch.rpm
-
Generate configuration files.
Some files are needed for generating the configuration payload. See these topics in Prepare Configuration Payload if you have not already prepared the information for this system. At this time, see Create HMN Connections JSON for instructions about creating the hmn_connections.json file.
Pull these files into the current working directory:
- application_node_config.yaml (optional - see below)
- cabinets.yaml (optional - see below)
- hmn_connections.json
- ncn_metadata.csv
- switch_metadata.csv
- system_config.yaml (see below)
The optional application_node_config.yaml file may be provided to further assign application nodes to roles and subroles in the HSM. See Create Application Node YAML.
The optional cabinets.yaml file allows cabinet naming and numbering as well as some VLAN overrides. See Create Cabinets YAML.
The system_config.yaml file is required for a reinstall because it was created during a previous install. For a first-time install, the information in it can be provided as command line arguments to csi config init.
-
Change into the preparation directory.
linux# mkdir -pv /var/www/ephemeral/prep
linux# cd /var/www/ephemeral/prep
After gathering the files into this working directory, generate your configurations.
-
If doing a reinstall and the system_config.yaml parameter file is available, then generate the system configuration by reusing this parameter file (see avoiding parameters).
If not doing a reinstall of Shasta software, then the system_config.yaml file will not be available, so skip the rest of this step.
-
Check for the configuration files. The needed files should be in the current directory.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
system_config.yaml
-
Set an environment variable so this system name can be used in later commands.
linux# export SYSTEM_NAME=eniac
-
Generate the system configuration.
linux# csi config init
A new directory matching your --system-name argument will now exist in your working directory.
These warnings from csi config init for issues in hmn_connections.json can be ignored:
- The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
"Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML.
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
-
If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
{"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
-
Skip the next step and continue with the CSI Workarounds.
-
-
If doing a first-time install, or if the system_config.yaml parameter file for a reinstall is not available, generate the system configuration.
If doing a first-time install, this step is required. If you did the previous step as part of a reinstall, skip this step.
-
Check for the configuration files. The needed files should be in the current directory.
linux# ls -1
Expected output looks similar to the following:
application_node_config.yaml
cabinets.yaml
hmn_connections.json
ncn_metadata.csv
switch_metadata.csv
-
Set an environment variable so this system name can be used in later commands.
linux# export SYSTEM_NAME=eniac
-
Generate the system configuration. See below for an explanation of the command line parameters and some common settings.
linux# csi config init \
    --bootstrap-ncn-bmc-user root \
    --bootstrap-ncn-bmc-pass changeme \
    --system-name ${SYSTEM_NAME} \
    --can-cidr 10.103.11.0/24 \
    --can-external-dns 10.103.11.113 \
    --can-gateway 10.103.11.1 \
    --can-static-pool 10.103.11.112/28 \
    --can-dynamic-pool 10.103.11.128/25 \
    --nmn-cidr 10.252.0.0/17 \
    --hmn-cidr 10.254.0.0/17 \
    --ntp-pool time.nist.gov \
    --site-domain dev.cray.com \
    --site-ip 172.30.53.79/20 \
    --site-gw 172.30.48.1 \
    --site-nic p1p2 \
    --site-dns 172.30.84.40 \
    --install-ncn-bond-members p1p1,p10p1 \
    --application-node-config-yaml application_node_config.yaml \
    --cabinets-yaml cabinets.yaml \
    --hmn-mtn-cidr 10.104.0.0/17 \
    --nmn-mtn-cidr 10.100.0.0/17 \
    --bgp-peers aggregation
A new directory matching your --system-name argument will now exist in your working directory.
After generating a configuration, a visual audit of the generated files for network data should be performed (a brief inspection sketch follows after the parameter notes below).
Run the command csi config init --help to get more information about the parameters mentioned in the example command above and others which are available.
Notes about parameters to csi config init:
- The application_node_config.yaml file is optional, but if you have one describing the mapping between prefixes in hmn_connections.csv that should be mapped to HSM subroles, you need to include a command line option to have it used. See Create Application Node YAML.
- The bootstrap-ncn-bmc-user and bootstrap-ncn-bmc-pass must match what is used for the BMC account and its password for the management NCNs.
- Set site parameters (site-domain, site-ip, site-gw, site-nic, site-dns) for the information which connects ncn-m001 (the PIT node) to the site. The site-nic is the interface on this node connected to the site.
- There are other interfaces possible, but the install-ncn-bond-members are typically:
  - p1p1,p10p1 for HPE nodes
  - p1p1,p1p2 for Gigabyte nodes
  - p801p1,p801p2 for Intel nodes
- If you are not using a cabinets-yaml file, set the three cabinet parameters (mountain-cabinets, hill-cabinets, and river-cabinets) to the number of each type of cabinet which is part of this system.
- The starting cabinet number for each type of cabinet (for example, starting-mountain-cabinet) has a default that can be overridden. See the csi config init --help output.
- For systems that use non-sequential cabinet ID numbers, use cabinets-yaml to include the cabinets.yaml file. This file can include information about the starting ID for each cabinet type and the number of cabinets (which have separate command line options), but it is also a way to specify explicitly the ID of every cabinet in the system. If you are using a cabinets-yaml file, flags specified on the csi command line related to cabinets will be ignored. See Create Cabinets YAML.
- An override to the default cabinet IPv4 subnets can be made with the hmn-mtn-cidr and nmn-mtn-cidr parameters.
- By default, spine switches are used as MetalLB peers. Use --bgp-peers aggregation to use aggregation switches instead.
- Several parameters (can-gateway, can-cidr, can-static-pool, can-dynamic-pool) describe the CAN (Customer Access Network). The can-gateway is the common gateway IP address used for both spine switches, commonly referred to as the Virtual IP address for the CAN. The can-cidr is the IP subnet for the CAN assigned to this system. The can-static-pool and can-dynamic-pool are the MetalLB static and dynamic address pools for the CAN. The can-external-dns is the static IP address assigned to the DNS instance running in the cluster, to which requests for the cluster subdomain will be forwarded. The can-external-dns IP address must be within the can-static-pool range.
- Set ntp-pool to a reachable NTP server.
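For the visual audit mentioned above, a brief inspection of the generated directory is usually enough to catch obvious network mistakes. A minimal sketch, assuming the layout produced by this CSM release (the pit-files/ directory is used later in this procedure; other file names may differ):
linux# ls ${SYSTEM_NAME}/
linux# less ${SYSTEM_NAME}/system_config.yaml
# The SUSE ifcfg files carry the IP addresses that will be applied to the PIT node.
linux# grep -H IPADDR ${SYSTEM_NAME}/pit-files/ifcfg-*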
These warnings from csi config init for issues in hmn_connections.json can be ignored:
- The node with the external connection (ncn-m001) will have a warning similar to this because its BMC is connected to the site and not the HMN like the other management NCNs. It can be ignored.
"Couldn't find switch port for NCN: x3000c0s1b0"
- An unexpected component may have this message. If this component is an application node with an unusual prefix, it should be added to the application_node_config.yaml file. Then rerun csi config init. See the procedure to Create Application Node Config YAML.
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row": {"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
- If a cooling door is found in hmn_connections.json, there may be a message like the following. It can be safely ignored.
{"level":"warn","ts":1612552159.2962296,"msg":"Cooling door found, but xname does not yet exist for cooling doors!","row": {"Source":"x3000door-Motiv","SourceRack":"x3000","SourceLocation":" ","DestinationRack":"x3000","DestinationLocation":"u36","DestinationPort":"j27"}}
- The
-
Continue with the next step to apply the csi-config workarounds.
-
-
CSI Workarounds
Follow the workaround instructions for the
csi-config
breakpoint. -
Copy the interface config files generated earlier by
csi config init
into/etc/sysconfig/network/
.pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/pit-files/* /etc/sysconfig/network/ pit# wicked ifreload all pit# systemctl restart wickedd-nanny && sleep 5
-
Check that IP addresses are set for each interface and investigate any failures.
-
Check the IP addresses. If any are missing, do not run the tests; start triage instead.
pit# wicked show bond0 bond0.nmn0 bond0.hmn0 bond0.can0
bond0           up
    link:     #7, state up, mtu 1500
    type:     bond, mode ieee802-3ad, hwaddr b8:59:9f:fe:49:d4
    config:   compat:suse:/etc/sysconfig/network/ifcfg-bond0
    leases:   ipv4 static granted
    addr:     ipv4 10.1.1.2/16 [static]
bond0.nmn0      up
    link:     #8, state up, mtu 1500
    type:     vlan bond0[2], hwaddr b8:59:9f:fe:49:d4
    config:   compat:suse:/etc/sysconfig/network/ifcfg-bond0.nmn0
    leases:   ipv4 static granted
    addr:     ipv4 10.252.1.4/17 [static]
    route:    ipv4 10.92.100.0/24 via 10.252.0.1 proto boot
bond0.can0      up
    link:     #9, state up, mtu 1500
    type:     vlan bond0[7], hwaddr b8:59:9f:fe:49:d4
    config:   compat:suse:/etc/sysconfig/network/ifcfg-bond0.can0
    leases:   ipv4 static granted
    addr:     ipv4 10.102.9.5/24 [static]
bond0.hmn0      up
    link:     #10, state up, mtu 1500
    type:     vlan bond0[4], hwaddr b8:59:9f:fe:49:d4
    config:   compat:suse:/etc/sysconfig/network/ifcfg-bond0.hmn0
    leases:   ipv4 static granted
    addr:     ipv4 10.254.1.4/17 [static]
-
Run tests, inspect failures.
pit# csi pit validate --network
-
-
Copy the service config files generated earlier by
csi config init
for DNSMasq, Metal Basecamp (cloud-init), and Conman.
-
Copy files (files only; -r is expressly not used).
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/dnsmasq.d/* /etc/dnsmasq.d/
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/conman.conf /etc/conman.conf
pit# cp -pv /var/www/ephemeral/prep/${SYSTEM_NAME}/basecamp/* /var/www/ephemeral/configs/
-
Enable and fully restart all PIT services.
pit# systemctl enable basecamp nexus dnsmasq conman
pit# systemctl stop basecamp nexus dnsmasq conman
pit# systemctl start basecamp nexus dnsmasq conman
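Before running the service validation below, a quick status check can confirm that all four services came up; a minimal sketch:
pit# systemctl status basecamp nexus dnsmasq conman --no-pager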
-
-
Start and configure NTP on the LiveCD as a fallback/recovery server.
pit# /root/bin/configure-ntp.sh
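To confirm the clock is actually synchronizing after the script runs, a daemon-agnostic sanity check via systemd can be used (this is not part of the formal validation):
pit# timedatectl
# Look for "System clock synchronized: yes" (wording varies slightly between systemd versions).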
-
Check that the services are ready and investigate any test failures.
pit# csi pit validate --services
-
Mount a shim to match the Shasta-CFG steps' directory structure.
pit# mkdir -vp /mnt/pitdata
pit# mount -v -L PITDATA /mnt/pitdata
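A quick check that the PITDATA filesystem is now visible at the shim location (in addition to /var/www/ephemeral):
pit# df -h /mnt/pitdata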
-
The following procedure will set up customized CA certificates for deployment using Shasta-CFG.
- Prepare Site-Init to create and prepare the
site-init
directory for your system.
- Prepare Site-Init to create and prepare the
After completing this procedure, the next step is to configure the management network switches.