ZTP High Level Design #281

Open
wants to merge 13 commits into
base: master

Conversation

simone-dell

Introducing ZTP HLD for SONiC

10. ZTP service will enable updategraph
11. ZTP service will call for reboot to apply new SONIC image, with option ZTP-enabled –post_install
12. Upon reboot, if ZTP service flag $post_install is true, ZTP hands off to updategraph
13. The configuration script is received and applied by updategraph, returns to ZTP service

The name updategraph sounds very minigraph-like. What is the config file being downloaded: a shell script, a minigraph.xml file, or a config.json file?

Author

Historically, updategraph was used to deploy minigraph.xml files, but moving forward SONiC configuration will be done using a config.json file.

Would it be possible to consider loading a shell script and executing it rather than a config.json? The reason is the following: with FRR, the config no longer lives in config.json but in a separate frr.conf file, and the same goes for the SNMP config. Having a bash script gives the operator more control over the ZTP process and room to be creative rather than being forced into a static model. The shell script could, for example, replace j2 templates, perform certain daemon reloads, etc.
I do understand that, with the config checking and reporting that happens later in the process, a minimal config can be loaded and then an external agent could come in and finish the full device commissioning.
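
To illustrate the idea, here is a rough sketch of how ZTP could tell a script apart from a config.json after download, so that both models can coexist. This is not from the HLD or from updategraph; the function name `classify_payload` and the returned labels are hypothetical.

```python
import json


def classify_payload(data: bytes) -> str:
    """Classify a downloaded provisioning file (hypothetical helper).

    Returns "script" for files that start with a shebang, "config_json"
    for files that parse as a JSON object, and "unknown" otherwise.
    """
    if data.startswith(b"#!"):
        return "script"
    try:
        parsed = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "unknown"
    # config.json is a JSON object at the top level, not a list or scalar.
    return "config_json" if isinstance(parsed, dict) else "unknown"
```

With something like this, a "script" result would be executed by the operator's choice of shell, while a "config_json" result would go through the existing config loading path.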

Author

Thank you for your input, Michel. The tentative agenda for the next community meeting is ZTP, so we can bring up this point then and see what the community says.

+1 on what Michel said.

- ZTP service flag $post_config is false
3. The switch now enters ZTP mode and does the following,
- Obtains an IP address from the DHCP server
- Obtains the URL for the SONIC image, provisioning script, and post config validation script
Contributor

Please define "provisioning script" and "post config validation script"; both are unclear.

Author

Updated the doc with descriptions of these items.

3. The switch now enters ZTP mode and does the following,
- Obtains an IP address from the DHCP server
- Obtains the URL for the SONIC image, provisioning script, and post config validation script
4. At this step, the switch will have information to reach the HTTP / TFTP server information by DHCP option 66 or Option 240.
Contributor

What are option 66 and option 240? What information are you referring to? Please make it clear.

Author

These are private options for DHCP.

- Obtains an IP address from the DHCP server
- Obtains the URL for the SONIC image, provisioning script, and post config validation script
4. At this step, the switch will have information to reach the HTTP / TFTP server information by DHCP option 66 or Option 240.
5. ZTP service will relay to updategraph service
Contributor

Please define "relay". What do you mean by it?

Author

Updated the wording in the new doc.

- Updategraph handles all configuration during ZTP service
- ZTP should receive post config validation script, that collects switch info and sends it back to a remote location for processing
- ZTP status may be logged through syslog
- Interruption of ZTP service should disable the service gracefully
Contributor

Please clarify this requirement.

Author

Clarified in the document. If the ZTP service is interrupted at any stage, the ZTP status should be "aborted", with an explanation of the point at which ZTP was stopped.
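
As a rough sketch of what that could look like (the class name, field names, and stage names below are hypothetical and not taken from the document), the service could track the current stage and record it when an interruption occurs:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZtpStatus:
    """Hypothetical status record for the ZTP service."""
    state: str = "not_started"
    stage: str = ""
    detail: str = ""
    completed: List[str] = field(default_factory=list)

    def enter(self, stage: str) -> None:
        # Moving to a new stage means the previous one finished.
        if self.stage:
            self.completed.append(self.stage)
        self.state = "in_progress"
        self.stage = stage

    def abort(self, reason: str) -> None:
        # Keep the stage so the status explains where ZTP stopped.
        self.state = "aborted"
        self.detail = f"stopped during '{self.stage}': {reason}"
```

The `detail` string is what a status command or a syslog message could report after an interruption.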

- Set $enable to false
- Verify no errors were thrown during the service
- Output to syslog server and var/log/syslog

Contributor

Please clearly define the DHCP options ZTP is going to send to the server, and the DHCP options ZTP is going to receive from the server and use.

Author

Updated within document changes.

Author

Guohan, I've added a table in the document that specifies the private DHCP options given/received.

Need to allow build time option to compile with ZTP enabled

# Error Cases
- Case: Switch receives invalid URLs from DHCP server
Contributor

Which URL? Please clarify.

@madhupaluru

Simone, how can the ZTP process handle situations such as devices continuously reloading after the ZTP workflow?


# Requirements
- Build option to have ZTP enabled
- SNMP should be enabled during the ZTP process (achieved through leveraging updategraph)

Please explain why SNMP is important to this process - it's not clear to me where it's used here.

@@ -0,0 +1,166 @@
# Zero Touch Provisioning (ZTP)

Generally, I think many user ZTP flows will start from ONIE, not from a pre-installed SONiC image. I think this should be discussed a little. For instance, in an ONIE flow, the run-time image is usually installed by the ONIE installer, and not by the SONiC ZTP. So, in this flow, I think a ztp_image_url should be treated as optional, and lack of a valid one should not be treated as an error.
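
A minimal sketch of what "optional" could mean in practice: the steps ZTP runs are derived from whichever URLs were actually supplied, so a missing image URL skips the install instead of raising an error. The option names come from the DHCP option table in this PR; the function name and the step labels are hypothetical.

```python
from typing import Dict, List


def plan_ztp_steps(options: Dict[str, str]) -> List[str]:
    """Derive the ZTP steps from the URLs that were provided (hypothetical)."""
    steps = []
    # In an ONIE flow the image is already installed, so no URL is expected.
    if options.get("ztp_image_url"):
        steps.append("install_image")
    if options.get("minigraph_url"):
        steps.append("apply_config")
    if options.get("validation_url"):
        steps.append("run_validation_script")
    return steps
```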

|:-----------:|:-------------------|:-------------------------------------------------|
| 224 | snmp_community | usage implemented through updategraph |
| 225 | minigraph_url | usage implemented through updategraph |
| 226 | acl_url | usage implemented through updategraph |

Why are the snmp_community and acl_url broken out as separate options? Could they not be done as part of the minigraph_url delivered config?

| 225 | minigraph_url | usage implemented through updategraph |
| 226 | acl_url | usage implemented through updategraph |
| 227 | ztp_image_url | URL for the SONiC image for the switch to dowload |
| 228 | validation_url | URL for the validation script -- a script that runs post config and collects device info and sends it for processing to a remote location designated within the validation script |

I think we should change the name of this to "post_install_url" - there is no need to limit this to validation - it could do anything that the user chooses.
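
For reference, a rough sketch of how the option table above might be consumed on the switch side. The option codes and names are copied from the table as currently posted (so `validation_url` keeps its current name here); the function name and the input shape, a dict of option code to raw bytes, are assumptions and not how updategraph or the DHCP client hook actually passes them.

```python
from typing import Dict

# Codes and names as listed in the table under review.
ZTP_DHCP_OPTIONS = {
    224: "snmp_community",
    225: "minigraph_url",
    226: "acl_url",
    227: "ztp_image_url",
    228: "validation_url",
}


def parse_ztp_options(raw: Dict[int, bytes]) -> Dict[str, str]:
    """Map raw DHCP option payloads to named ZTP fields (hypothetical)."""
    parsed = {}
    for code, value in raw.items():
        name = ZTP_DHCP_OPTIONS.get(code)
        if name is None:
            continue  # Not a ZTP option; leave it to the DHCP client.
        parsed[name] = value.decode("ascii", errors="replace").strip()
    return parsed
```

A rename of option 228 would then be a one-line change in the mapping.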

4. At this step, the switch will have information to reach the HTTP / TFTP server through URLs given by DHCP server.
5. ZTP service will enable and start updategraph service
- Within updategraph, ztp_enabled is true, and post_install is false
- This triggers updategraph to configure the device with an default config

Please explain where the default config comes from. Where does updategraph get it, and how is it put into the build?

9. ZTP service will install image using sonic-installer
- ZTP service flag $post_install is true
10. ZTP service will enable updategraph
11. ZTP service will call for reboot to apply new SONIC image, with option ZTP-enabled –post_install

Can we avoid this re-boot? It doesn't seem necessary, especially in cases where the image install happens in ONIE.

# ZTP CLI
The ZTP CLI will allow the user to manually start, stop, and obtain status of the ZTP service.

Will show the user whether ZTD is enabled, the date of the last execution of the ZTP script, and the completion status of the ZTP service

Typo - ZTD

The ZTP status will show the user the current SONiC image, if the configuration was successful, and if the post-configuration script was run, as well as the time of the last successful completion of the ZTP service

config ztp enable
The user can re-enable ZTP after a failed state using the CLI. This puts the switch into “factory setting” and reloads with ZTP enabled, allowing for provisioning restart.

This is a nit (so feel free to ignore!), but is this usage really limited to failure cases? Are there not cases where the user wants to re-run ZTP for other reasons (e.g. re-do a configuration)?

- ZTP is enabled and post_install is true, apply configurations using updategraph
- Run post_validation script
- Exit ZTP
- Test to see if Ansible server is reachable for further configuration

Why is this here? Seems like it's outside the scope of ZTP (already exited).

### Build Option
Need to allow build time option to compile with ZTP enabled

# Error Cases

Should probably add some cases for network events, such as:

  • DHCP server does not respond
  • URL targets not reachable, or lost connection during load
  • etc
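
One possible way to handle these cases, sketched under assumptions: wrap each network step in a bounded retry with exponential backoff, and report a failure once the attempts are exhausted. The helper name, the default attempt count, and the delays are hypothetical and not part of the HLD.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a network step, retrying on OSError with doubling delays."""
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return action()
        except OSError as err:  # Covers timeouts and unreachable hosts.
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

The final `RuntimeError` is the point where ZTP would log the error case and mark the run as failed.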

7 participants