ZTP High Level Design #281

Open
wants to merge 13 commits into
base: master
135 changes: 135 additions & 0 deletions doc/ztp/ztp_hld.md
@@ -0,0 +1,135 @@
# ZTP High Level Design
# Requirements
- Build option to have ZTP enabled
- SNMP should be available through the ZTP service
- ZTP should verify, download and install SONiC image
- Updategraph handles all configuration during ZTP service
- ZTP should receive a post config validation script that collects switch info and sends it back to a remote location for processing
- ZTP status may be logged through syslog
- Interruption of ZTP service should disable the service gracefully
Contributor

please clarify this requirement

Author

Clarified in the document. If the ZTP service is interrupted at any stage, the ZTP status should be "aborted", with an explanation of the point at which ZTP was stopped.


# Flow
For ZTP using DHCP, provisioning initially takes place over the management network and is initiated through a DHCP hook. A DHCP option is used to specify a configuration script. This script is then requested from the Web server and executed locally on the switch.
1. Switch installation is simplified to the following steps:
- Rack and Stack
- Connect
- Power-on
2. ONIE boots into SONiC (with ZTP enabled) from flash
- On power-on, the ZTP service takes over to load the initial configuration.
- ZTP service flag $enabled is true
- ZTP service flag $post_install is false
- ZTP service flag $post_config is false
3. The switch now enters ZTP mode and does the following:
- Obtains an IP address from the DHCP server
- Obtains the URLs for the SONiC image, provisioning script, and post config validation script
Contributor

please define "provisioning script", "post config validation script", unclear.

Author

Updated the doc with description of these items.

4. At this step, the switch has the information needed to reach the HTTP / TFTP server, supplied through DHCP option 66 or option 240.
Contributor

What are option 66 and option 240? What information are you referring to? Please make it clear.

Author

these are private options for DHCP

5. ZTP service will relay to updategraph service
Contributor

please define relay, what do you mean by relay.

Author

updated the wording in the new doc

- Within updategraph, ztp_enabled is true, and post_install is false
- This triggers updategraph to configure the device with a default config

Please explain where the default config comes from? Where does updategraph get this from, and how is this put into the build?

- Upon exiting updategraph, SNMP will start

Per above, I'm not really seeing why SNMP is required here.

6. ZTP service will now download the software image file.
7. ZTP service will validate that the image is a SONiC image compatible with the current platform.
8. ZTP service will check the remaining disk space before installing the image.
9. ZTP service will install image using sonic-installer
- ZTP service flag $post_install is true
10. ZTP service will enable updategraph
11. ZTP service will call for a reboot to apply the new SONiC image, with option ZTP-enabled –post_install

Can we avoid this re-boot? It doesn't seem necessary, especially in cases where the image install happens in ONIE.

12. Upon reboot, if ZTP service flag $post_install is true, ZTP hands off to updategraph
13. updategraph receives and applies the configuration script, then returns control to the ZTP service

updategraph seems very minigraph like in name, what is the config file being downloaded? a shell script? a minigraph.xml file? a config.json file?

Author

Historically updategraph was used to deploy minigraph.xml files but moving forward SONiC configuration will be done using a config.json file.

Would it be possible to consider loading a shell script and executing it rather than a config.json? Reason is the following: with FRR the config no longer lives in config.json but in a separate frr.conf file, same with snmp config. Having a bash script allows the operator to have more control over the ZTP process and be creative rather than being forced into a static model. The shell script could replace j2 templates for eg. perform certain daemon reloads etc.
I do understand that with the config checking and reporting that happens later in the process a minimal config can be loaded then an external agent could come in and finish the full device commissioning .

Author

Thank you for your input Michel. The tentative agenda for the next community meeting is on ZTP, we can bring up this point then and see what the community says

+1 on what Michel said.

- ZTP service flag $post_config is true
14. Validation:
- SONiC device will download the post config validation script through the ZTP-enabled DHCP option
- The script is customized based on user preference
- The script collects local info and sends it back to the server using information in the script itself
15. Delete the old SONiC image upon successful reimage / ZTP status check:
- Check if $post_install is true
- Check if $post_config is true
- Set $enabled to false
- Verify no errors were thrown during the service
- Output to syslog server and /var/log/syslog
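
The status check in step 15 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ZTP service's actual code: the `ZtpFlags` class, the `final_status_check` function, its `errors` argument, and the returned status strings are all hypothetical; only the three flags and the order of the checks come from the flow above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ZtpFlags:
    """Hypothetical container for the three ZTP service flags in the flow."""
    enabled: bool = True
    post_install: bool = False
    post_config: bool = False


def final_status_check(flags: ZtpFlags, errors: List[str]) -> str:
    """Run the step 15 checks in order and return a status string.

    The status strings are illustrative; the design does not define them.
    """
    if not flags.post_install:
        return "incomplete: image not installed"
    if not flags.post_config:
        return "incomplete: configuration not applied"
    # Both phases finished, so ZTP should not run again on the next boot.
    flags.enabled = False
    if errors:
        return "failed: " + "; ".join(errors)
    return "success"
```

The returned string is what would then be written to the syslog server and /var/log/syslog.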

Contributor

please clearly define the dhcp option ztp is going to send to server, and dhcp option ztp is going to receive from server and use.

Author

Updated within document changes.

Author

Guohan, I've added a table in the document that specifies the private DHCP options given/received

### Updategraph
- Needs to be updated to provide a default config to the switch when ztp_enabled is true and post_install is false
- Needs to continue as currently designed when ztp_enabled is true and post_install is true
- Can be used to download the config file and apply the config

### Interruption of ZTP
- Through the command line utility “ZTP Disable”
- Sets the ZTP $enabled flag to FALSE
- If ZTP is interrupted mid-service, output is shown
- If the ZTP service receives the “NA_ZTP” flag, continue updategraph with the default config
- The ZTP service needs an interruption option that is separate from updategraph's
- If the ZTP service is interrupted, updategraph should still be enabled and allowed to continue

### Syslog
ZTP status should be logged to syslog server and /var/log/syslog
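
One way to meet this requirement is sketched below in Python, using the standard library's `logging.handlers.SysLogHandler`. Everything specific here is an assumption rather than part of the design: the logger name `ztp`, the message format, the `make_ztp_logger` and `status_line` helpers, and the idea that a local syslog daemon listening on a socket such as `/dev/log` is what writes /var/log/syslog.

```python
import logging
import logging.handlers
from typing import Optional, Tuple


def make_ztp_logger(remote: Optional[Tuple[str, int]] = None,
                    local_socket: Optional[str] = None) -> logging.Logger:
    """Build a logger that forwards ZTP status lines to syslog targets.

    remote is a (host, port) pair for a syslog server (UDP by default);
    local_socket is the path of the local syslog socket, if any.
    """
    logger = logging.getLogger("ztp")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("ztp: %(levelname)s %(message)s")
    for address in (remote, local_socket):
        if address is None:
            continue
        handler = logging.handlers.SysLogHandler(address=address)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def status_line(stage: str, status: str) -> str:
    """Format one ZTP status message (illustrative format only)."""
    return f"stage={stage} status={status}"
```

A caller would then log, for example, `make_ztp_logger(remote=("192.0.2.10", 514), local_socket="/dev/log").info(status_line("image_install", "aborted"))`, where the server address is a placeholder.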

### Build Option
Need to allow build time option to compile with ZTP enabled

# Error Cases

Should probably add some cases for network events, such as:

  • DHCP server does not respond
  • URL targets not reachable, or lost connection during load
  • etc

- Case: Switch receives invalid URLs from DHCP server
Contributor

which url? please clarify.

- Switch should output error logs to syslog and apply default config. Kill the ZTP service, and allow other services to come up
- Case: Switch receives invalid OS image
- Switch should output error logs to syslog and apply default config. Kill the ZTP service, and allow other services to come up
- Case: Switch does not have enough disk space to store the image
- Switch should output error logs to syslog and apply default config. Kill the ZTP service, and allow other services to come up
- Case: Switch receives invalid configuration from DHCP server
- Switch should output error logs to syslog, and gracefully exit the ZTP service, logging that image install completed successfully, and configuration script is invalid.
- Case: Post-validation script fails
- The success of the script is determined by a remote server. The ZTP service exits gracefully, logging that the image install completed successfully.
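
The cases above fall into two groups: failures before the image is installed fall back to the default config, and failures after it leave the new image in place. A minimal Python sketch of that mapping follows. The `ZtpError` and `Handling` names, the `handle_error` function, and the log wording are hypothetical, chosen only to illustrate the cases listed here.

```python
from enum import Enum
from typing import NamedTuple


class ZtpError(Enum):
    """Error cases from the list above (names are illustrative)."""
    INVALID_URL = "invalid URL from DHCP server"
    INVALID_IMAGE = "invalid OS image"
    INSUFFICIENT_SPACE = "not enough space to store the image"
    INVALID_CONFIG = "invalid configuration"
    POST_VALIDATION_FAILED = "post-validation script failed"


class Handling(NamedTuple):
    apply_default_config: bool  # fall back to the default config
    image_installed: bool       # the new image was already installed
    log_message: str            # text sent to syslog


# Errors that occur before the image install completes.
_PRE_INSTALL = {
    ZtpError.INVALID_URL,
    ZtpError.INVALID_IMAGE,
    ZtpError.INSUFFICIENT_SPACE,
}


def handle_error(error: ZtpError) -> Handling:
    """Map an error case to the handling described in the list above."""
    if error in _PRE_INSTALL:
        return Handling(True, False,
                        f"ZTP stopped: {error.value}; default config applied")
    if error is ZtpError.INVALID_CONFIG:
        return Handling(False, True,
                        "image install completed successfully; "
                        "configuration script is invalid")
    # Post-validation results are judged by the remote server, not locally.
    return Handling(False, True, "image install completed successfully")
```

In every case the ZTP service then exits so that the other services can come up.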


# ZTP CLI
The ZTP CLI will allow the user to manually start, stop, and obtain status of the ZTP service.

Will show the user whether ZTD is enabled, the date of the last execution of the ZTP script, and the completion status of the ZTP service

Typo - ZTD


show ztp-status
The ZTP status will show the user the current SONiC image, if the configuration was successful, and if the post-configuration script was run, as well as the time of the last successful completion of the ZTP service

ztp reload
The user can re-enable ZTP after a failed state using the CLI. This puts the switch into “factory setting” and reloads with ZTP enabled, allowing for provisioning restart.

This is a nit (so feel free to ignore!), but is this usage really limited to failure cases? Are there not cases where the user wants to re-run ZTP for other reasons (e.g. re-do a configuration)?


ztp cancel
The user can interrupt the ZTP service:
- If cancelled before image download, ZTP service will exit
- If cancelled after image download but before reboot, ZTP service will output sonic_installer list and exit
- If cancelled after reboot, ZTP service will not apply any configuration, and exit
- If cancelled after configuration, ZTP service will not continue with the validation script, and exit
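
The four cancellation points can be expressed as a lookup from the stage reached to the actions taken. The Python sketch below is illustrative only: the `Stage` enum, the `cancel_actions` function, and the action strings are hypothetical names, not part of the proposed CLI.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    """Point in the ZTP flow at which `ztp cancel` arrives (illustrative)."""
    BEFORE_IMAGE_DOWNLOAD = 1
    AFTER_IMAGE_DOWNLOAD = 2   # image downloaded, switch not yet rebooted
    AFTER_REBOOT = 3           # new image running, configuration not applied
    AFTER_CONFIGURATION = 4    # configuration applied, validation not run


_CANCEL_ACTIONS = {
    Stage.BEFORE_IMAGE_DOWNLOAD: ["exit"],
    Stage.AFTER_IMAGE_DOWNLOAD: ["output sonic_installer list", "exit"],
    Stage.AFTER_REBOOT: ["skip configuration", "exit"],
    Stage.AFTER_CONFIGURATION: ["skip validation script", "exit"],
}


def cancel_actions(stage: Stage) -> List[str]:
    """Return the ordered actions the ZTP service takes when cancelled."""
    return list(_CANCEL_ACTIONS[stage])
```

Every path ends in `exit`; only the step taken before exiting depends on how far provisioning had progressed.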


# Action Items

## ZTP SERVICE
- The service needs to start immediately after boot
- If ZTP is not enabled, exit the service
- If ZTP is enabled and post_install is false, run updategraph to apply default configuration
- Acquire image and configuration script from HTTP/TFTP server
- Install the image using sonic-installer
- Reload to reimage the device
- If ZTP is enabled and post_install is true, apply configurations using updategraph
- Run post_validation script
- Exit ZTP
- Test to see if Ansible server is reachable for further configuration

Why is this here? Seems like it's outside the scope of ZTP (already exited).

## Updategraph
- If ZTP is enabled but post_install is false, apply default configs to the switch and exit
- If ZTP is enabled and post_install is true, continue to apply configs based on graph_url
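
The two rules above amount to a small decision function, sketched here in Python. The function name `updategraph_mode` and its return values are hypothetical, and the behaviour when ZTP is disabled (falling through to the existing graph_url path) is an assumption, since the design only specifies the two ZTP-enabled cases.

```python
def updategraph_mode(ztp_enabled: bool, post_install: bool) -> str:
    """Pick what updategraph should do for the given ZTP flags."""
    if ztp_enabled and not post_install:
        # First boot under ZTP: bring the switch up with the default config.
        return "default_config"
    # Image already installed, or ZTP disabled (assumed): use graph_url.
    return "graph_url"
```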

# Phases
## Phase 1
- Implement image download and install
- Implement image and config validation
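
As one piece of the Phase 1 validation work, the free-space check that precedes the image install could look like the Python sketch below. The function name, the default install path, and the `headroom` margin are assumptions for illustration; the design only says that remaining disk space is checked before installing.

```python
import shutil


def has_space_for_image(image_size: int, install_path: str = "/",
                        headroom: int = 64 * 1024 * 1024) -> bool:
    """Return True if install_path has room for the image plus headroom.

    image_size and headroom are in bytes; headroom is a hypothetical
    safety margin, not a value taken from the design.
    """
    free = shutil.disk_usage(install_path).free
    return free >= image_size + headroom
```

If the check fails, the flow falls into the error case where the switch cannot store the image.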
## Phase 2
- Post install script: downloaded after updategraph finishes
- Dependent on user
- Implement validation of management port
- Implement validation of neighbors
- Implement validation of config based on serial number
## Phase 3
- Command line utility
- ZTP interrupt process
- ZTP test plan

if [ -n "$new_acl_url" ]; then
    echo $new_acl_url > /tmp/dhcp_acl_url
fi