[SmartSwitch] Extend reboot script for rebooting SmartSwitch #3566

Open: wants to merge 42 commits into base: master
@vvolam commented Oct 3, 2024

What I did

Extended the reboot script to support SmartSwitch, so that it can reboot either the entire SmartSwitch or a specific DPU.

How I did it

Implemented the changes according to the SmartSwitch reboot HLD: https://github.com/sonic-net/SONiC/blob/605c3a56ac2717dbbb638433e7bb13054fc05a31/doc/smart-switch/reboot/reboot-hld.md
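The change lets the reboot script act on either the whole SmartSwitch or a single DPU. The following is a minimal bash sketch of that target selection, not the PR's code: `plan_reboot` and the `reboot-dpu`/`reboot-switch` actions it prints are hypothetical, and it assumes that during a full reboot the DPUs are handled before the switch itself.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's implementation.
# plan_reboot DPU_NAME DPU...
#   DPU_NAME empty -> plan a full SmartSwitch reboot (every DPU, then the switch)
#   DPU_NAME set   -> plan a reboot of that one DPU, if it exists
plan_reboot() {
    local dpu_name=$1
    shift
    local all_dpus=("$@")   # DPU names present on this switch
    local dpu

    if [[ -z "$dpu_name" ]]; then
        # Assumed ordering: DPUs first, then the switch itself.
        for dpu in "${all_dpus[@]}"; do
            echo "reboot-dpu ${dpu}"
        done
        echo "reboot-switch"
        return 0
    fi

    # Single-DPU reboot: the requested DPU must be one of the known DPUs.
    for dpu in "${all_dpus[@]}"; do
        if [[ "$dpu" == "$dpu_name" ]]; then
            echo "reboot-dpu ${dpu_name}"
            return 0
        fi
    done
    echo "Error: unknown DPU '${dpu_name}'" >&2
    return 1
}
```

For example, `plan_reboot "" dpu0 dpu1` prints `reboot-dpu dpu0`, `reboot-dpu dpu1`, then `reboot-switch`, while `plan_reboot dpu1 dpu0 dpu1` prints only `reboot-dpu dpu1`.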

How to verify it

  • Verified the script on a non-SmartSwitch platform and found no regressions. The script also returns an error if any of the new SmartSwitch-specific parameters are passed by the user.
  • Partially verified on a SmartSwitch. Completing the testing requires an image with the gnmi container running.
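The first bullet notes that the new SmartSwitch-only parameters are rejected on other platforms. One way to express that check is sketched below; the `-d` option name, the `is_smartswitch` flag, and `validate_reboot_args` are assumptions for illustration, not the script's actual interface.

```bash
# Hypothetical sketch: reject SmartSwitch-only options on other platforms.
# validate_reboot_args IS_SMARTSWITCH [ARGS...]
#   IS_SMARTSWITCH is "true" or "false" (platform detection is assumed to happen elsewhere)
#   -d DPU_NAME    is an assumed SmartSwitch-only option naming the DPU to reboot
validate_reboot_args() {
    local is_smartswitch=$1
    shift
    local opt
    local OPTIND=1   # reset so the function can be called more than once

    while getopts ":d:" opt; do
        case "$opt" in
            d)
                if [[ "$is_smartswitch" != "true" ]]; then
                    echo "Error: -d is only supported on a SmartSwitch" >&2
                    return 1
                fi
                echo "target DPU: ${OPTARG}"
                ;;
            :)
                echo "Error: -${OPTARG} requires an argument" >&2
                return 1
                ;;
            \?)
                echo "Error: unknown option -${OPTARG}" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

Running `getopts` in silent mode (the leading `:` in the option string) lets the function print its own messages for missing arguments and unknown options instead of the shell's defaults.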

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@vvolam force-pushed the ss-reboot branch 2 times, most recently from 460146c to c72fbc0, on October 3, 2024 19:36
@vvolam force-pushed the ss-reboot branch 3 times, most recently from 8746356 to d6fc624, on October 23, 2024 20:33
Review threads on scripts/reboot (9, outdated and resolved)
@vvolam changed the title from "Extend reboot script for rebooting SmartSwitch" to "[SmartSwitch] Extend reboot script for rebooting SmartSwitch" on Nov 4, 2024
@vvolam marked this pull request as ready for review on November 4, 2024 19:39
Review thread on scripts/reboot_helper.py (outdated, resolved)
@mssonicbld (Collaborator)

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@KrisNey-MSFT

Hi @gpunathilell, may I ask what the next steps are for this one, please?

@gpunathilell (Contributor)

Hi @KrisNey-MSFT, testing is not done yet. We have to integrate both the PMON changes (some PRs are still pending merge and undergoing changes) and the Reboot HLD changes (some PRs have merge conflicts with the PMON-related PRs), so it takes time to build a test image and test it.

```bash
# Diff context for the review comment: excerpt from a shell function body in this PR
local dpu_ip=$1
local port=$2
local reboot_status
reboot_status=$(docker exec -i gnmi gnoi_client -target "${dpu_ip}:${port}" -logtostderr -insecure -notls -rpc RebootStatus 2>/dev/null)
```
Contributor review comment on this snippet:

After the changes regarding -notls, the communication works, but the command now fails with a different error:

```
panic: rpc error: code = Unknown desc = systemd: Invalid reboot method: 3

goroutine 1 [running]:
main.systemReboot({0xb2a918, 0xc0001e1da0}, {0xb273d8, 0xc000113ac0})
```
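The snippet under review issues a single RebootStatus RPC through `gnoi_client`. A caller usually has to repeat such a check until the DPU reports completion or a deadline passes. Below is a minimal polling sketch, assuming a hypothetical check command that exits 0 once the reboot is complete; in practice it would wrap the `gnoi_client ... -rpc RebootStatus` call, whose output format is not shown in this thread. `wait_for_reboot_status` is an illustrative name, not part of the PR.

```bash
# Hypothetical sketch: retry a status check until it succeeds or a timeout expires.
# wait_for_reboot_status TIMEOUT_SECS INTERVAL_SECS CHECK_CMD [ARGS...]
#   CHECK_CMD stands in for a wrapper around the gnoi_client RebootStatus call;
#   it is assumed to exit 0 once the DPU reports that the reboot is complete.
#   INTERVAL_SECS must be a positive integer, otherwise the loop never ends.
wait_for_reboot_status() {
    local timeout_secs=$1
    local interval_secs=$2
    shift 2
    local elapsed=0   # counts sleep time only; time spent in CHECK_CMD is ignored

    while (( elapsed < timeout_secs )); do
        if "$@"; then
            echo "reboot status OK after ${elapsed}s"
            return 0
        fi
        sleep "$interval_secs"
        elapsed=$(( elapsed + interval_secs ))
    done
    echo "Error: no successful reboot status within ${timeout_secs}s" >&2
    return 1
}
```

Passing the check as a command plus arguments keeps the polling logic independent of how the status is actually queried.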

@KrisNey-MSFT commented Jan 27, 2025 via email

Review threads on utilities_common/module.py (3, resolved)

@gpunathilell (Contributor)

Error seen with the latest changes:

```
2025-01-28 21:11:44 - Error: Failed to send reboot status command to DPU dpu1
/usr/local/bin/reboot_smartswitch_helper: line 126: [: null: integer expression expected
```
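The `[: null: integer expression expected` message is what bash's `[` (test) builtin reports when a non-numeric string such as `null` is used with an arithmetic operator like `-eq`. A common guard is to validate the value before comparing it. Below is a minimal sketch, assuming the status has already been extracted into a shell variable; `is_integer`, `status_is_done`, and the numeric "done" code are hypothetical and not taken from the PR.

```bash
# Hypothetical sketch: only compare a status value numerically if it is an integer.
# A value such as "null" is rejected up front instead of reaching
# `[ "$status" -eq ... ]`, which is what prints "integer expression expected".
is_integer() {
    [[ $1 =~ ^-?[0-9]+$ ]]
}

# status_is_done STATUS DONE_VALUE
#   returns 0 if STATUS equals DONE_VALUE, 1 if it differs, 2 if STATUS is not an integer
status_is_done() {
    local status=$1
    local done_value=$2   # assumed numeric code meaning "reboot complete"
    if ! is_integer "$status"; then
        echo "Error: unexpected reboot status '${status}'" >&2
        return 2
    fi
    [ "$status" -eq "$done_value" ]
}
```

The distinct return code 2 lets a caller tell a malformed response apart from a reboot that simply has not finished yet.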

@gpunathilell (Contributor)

There are also two issues that need to be addressed: the provisioning of the DPU and the configuration of the switch's GNMI container. Both configurations are required for the gnoi_client command to be executable from the switch.
GNMI command execution also fails with an EOF error, because the GNMI container is shut down during pre-shutdown. This has to be handled as described in the HLD (make sure the GNMI container is not shut down on the DPU).
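Because the reboot helper runs `gnoi_client` through `docker exec -i gnmi` on the switch, a cheap precondition is to confirm that the local `gnmi` container is running before sending any gNOI request. This sketch uses `docker inspect -f '{{.State.Running}}'`; the function names are hypothetical, and it does not address the DPU-side container shutdown, which per the comment must be handled as described in the HLD.

```bash
# Hypothetical sketch: fail fast if the local gnmi container is not running,
# rather than letting the later `docker exec -i gnmi gnoi_client ...` call fail.
gnmi_container_running() {
    local name=${1:-gnmi}
    local state
    # Prints "true" or "false"; fails if the container does not exist.
    state=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || return 1
    [[ "$state" == "true" ]]
}

require_gnmi() {
    if ! gnmi_container_running "$1"; then
        echo "Error: ${1:-gnmi} container is not running; cannot send gNOI commands" >&2
        return 1
    fi
}
```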

@KrisNey-MSFT

This depends on the PMON changes from Cisco and on a few filed issues.

Labels: None yet
Projects: None yet
7 participants