-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Independent DPU Upgrade HLD #1906
base: master
Are you sure you want to change the base?
Conversation
/azp run |
No pipelines are associated with this pull request. |
Hi @hdwhdw - are you looking for a Reviewer for this PR? |
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
@KrisNey-MSFT yes please. Thank you :) |
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
### 2. Scope | ||
|
||
This document describes the high-level design of the sequence to independently upgrade a SmartSwitch DPU with minimal impact to other DPUs and the NPU, through GNOI API. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly define dependencies and any ordering constraints to prevent unexpected failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Let me know if there are any other dependencies.
* 'System.SetPackage' | ||
* 'OS.Activate' | ||
* 'Containerz.Deploy' | ||
* Rollback: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a sentence for that. If Activate fail, we should SetPackage (install) the previous image and Activate again.
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
3. DPU and NPU image compatibility: The upgrade process assumes that the DPU and NPU images are compatible with each other. It is up to the client to ensure the compatibility of the images. | ||
4. Eliminating human intervention: The upgrade process may require human intervention to resolve issues that cannot be handled automatically, in particular, when both the upgrade process fails and the rollback process fails, the system may be left in an inconsistent state that requires manual intervention. | ||
|
||
### 6. Architecture Degn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Architecture Design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch.
* 'Containerz.Deploy' | ||
* Rollback: | ||
* Rollback the new SONiC image on the DPU. Client issues 'OS.Activate' with the old SONiC image. | ||
* Rollback the new offloaded container images on the NPU. Client issues 'Containerz.RemoveImage' with the old container images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why rollback new offloaded img will issue RemoveImage with old img?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to undo the step. RemoveImage is optional and mainly to save spaces.
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
#### Why I did it Supports system.SetPackage, see sonic-net/SONiC#1906 This PR also improve ease of use for client by not using (in this `System.SetPackage` specifically) the generic `-jsonin` flag where we pass a json string, in favor of a more readable and verifiable flags like `-package_filename` and `-package_url` . #### How I did it Client side code for system.SetPackage. #### How to verify it On physical switches: ``` admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage System SetPackage Error validating flags: missing -package_filename admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage -package_filename=filename System SetPackage Error validating flags: missing -package_version admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage -package_filename=filename -package_version=abc -package_activate=true -package_url=abc.com System SetPackage Error receiving response: rpc error: code = Unimplemented desc = ``` Server already have placeholder code so the unimplemented error is expected.
This PR contains a draft HLD for DPU Independent Upgrade.
What we did:
Supports Independent smartswitch DPU upgrade.
Why we did it:
Supports managing DPU SONiC version indepedently in smartswitch.
PRs and States: