Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[config] Support for running config reload in background #1290

Open
wants to merge 129 commits into
base: master
Choose a base branch
from

Conversation

anand-kumar-subramanian
Copy link

Config reload enhancements

  • What I did
    Config reload enhancements

Support for running the config reload command in background mode. This is needed when we issue config reload in an SSH session and disconnect the SSH connection, it should continue the config reload without terminating it.
Exceptions are captured and logged.

  • How I did it

Added support for running the config reload command in background mode, so that even if the session timesout on an SSH session, the command will still run to completion.
Exceptions capture and logging on all the operations inside config reload

  • How to verify it
    config reload

  • Previous command output (if the output of a command-line utility has changed)

  • New command output (if the output of a command-line utility has changed)

Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments.

Also, just curious if this change is truly necessary. Have you tried simply running sudo config reload -y & to run in the background?

config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
@anand-kumar-subramanian
Copy link
Author

This is needed as we cannot ask the user the do that every time. Config reload command should automatically take care of it instead of asking the user to run the command in the background.

config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
config/main.py Show resolved Hide resolved
config/main.py Outdated Show resolved Hide resolved
@jleveque jleveque changed the title Config reload enhancements. Support for running the command in backgr… [config] Support for running config reload in background Dec 18, 2020
@lguohan
Copy link
Contributor

lguohan commented Dec 23, 2020

this is a behavioral change, can we put it as optional?

@ben-gale
Copy link
Collaborator

@lguohan, @jleveque - I discussed making this optional with the team. It could be done, but I question the merit of doing this. This change (PR) was done in response to a real issue in a real customer deployment. In that case they were leaving multiple switches in an unstable/indeterminate state on losing network connectivity after issuing a config reload command from a remote SSH session. Basically "config reload" was getting stopped after stopping a bunch of containers but before restarting them, and they were never restarting. A re-boot was required to get out of this situation. So, this is a real problem that needs a real solution. Users who are not seeing this problem wouldn't observe a change in behavior anyway. So, I would argue that making this optional is both undesirable (leaves a problem in place, adds complexity) and unnecessary. I think we should add this unconditionally,.

Thoughts?

@ben-gale
Copy link
Collaborator

@lguohan, @jleveque - your comments pls? Thx

@ben-gale
Copy link
Collaborator

ben-gale commented Jan 4, 2021

@lguohan, @jleveque - still looking for your comments pls.

anand-kumar-subramanian and others added 10 commits March 2, 2021 13:38
…on failures (sonic-net#1292)

To fix the issue sonic-net/sonic-buildimage#5972
warm-reboot with force flag ignores ASIC config checksum mismatch along with orchagent RESTARTCHECK failure.
This commit accounts for a use case when checksum-verification should be ignored but orchagent pause check should not be ignored.
The change is to add a new option in fast-reboot script to ignore ASIC checksum verification failures.
…#1295)

* Fix fast-reboot when NDP present
The IPv6 address delimiter ':' caused an exception

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Add a comment
…ic-net#1275)

* [config/console][consutil] Support enable/disable console switch
* Changed the key to aligned with feature table style

Signed-off-by: Jing Kan jika@microsoft.com
Fixed a traceback seen when show sflow interface is executed.
Updated the script to follow Python3 conventions.

Signed-off-by: Garrick He <garrick_he@dell.com>
… ASIC platform. (sonic-net#1248)

* Update Db object to include multi ASIC db clients.
* Updated pfcwd CLI commands to use decorator to pass Db object.
Depends on sonic-net/sonic-swss#1453

- What I did
Added support to query and clear headroom pool watermark counters
Added unit test for the headroom pool watermark counters

- How I did it
Modified watermarkstat script to query/clear headroom pool watermark counters
Added show and clear commands

- How to verify it
Send traffic such that it treks into the headroom pool and check for the headroom pool usage using the show command below
Set polling interval to 30s and issue clear commands mentioned below and verify that counters are cleared

- New command output (if the output of a command-line utility has changed)
Show commands
admin@sonic:~$ show headroom-pool watermark 
Headroom pool maximum occupancy:
                 Pool    Bytes
---------------------  -------
ingress_lossless_pool    12480

admin@sonic:~$ show headroom-pool persistent-watermark 
Headroom pool maximum occupancy:
                 Pool    Bytes
---------------------  -------
ingress_lossless_pool    12480

Clear commands
admin@sonic:~$ sudo sonic-clear headroom-pool watermark 
admin@sonic:~$ sudo sonic-clear headroom-pool persistent-watermark 

Signed-off-by: Neetha John <nejo@microsoft.com>
)

Signed-off-by: Danny Allen <daall@microsoft.com>
Handling of "show ip/v6 route xxxx" for non-multi-asic platforms to be handled by FRR completely instead of using the common code added to support multi-asic platform.
…net#1299)

The current ‘show interfaces counters’ command does not show any information regarding the BCAST/MCAST counters per interface. Also, no data regarding the different packet sizes Rx/Tx counts.

Depends on sonic-net/sonic-swss#1536

Added a detailed option for 'show interface counters' to display all these information.
This is a per interface command like show below

root@sonic:/home/admin# show interfaces counters detailed -h
Usage: show interfaces counters detailed [OPTIONS] <interface_name>

Show interface counters detailed

Options:
-p, --period TEXT Display statistics over a specified period (in seconds)
--verbose Enable verbose output
-?, -h, --help Show this message and exit.
root@sonic:/home/admin#

Sample Output:
root@sonic:/home/admin# show interfaces counters detailed Ethernet11
Packets Received 64 Octets..................... 77
Packets Received 65-127 Octets................. 6
Packets Received 128-255 Octets................ 0
Packets Received 256-511 Octets................ 3
Packets Received 512-1023 Octets............... 0
Packets Received 1024-1518 Octets.............. 0
Packets Received 1519-2047 Octets.............. 0
Packets Received 2048-4095 Octets.............. 0
Packets Received 4096-9216 Octets.............. 0
Packets Received 9217-16383 Octets............. 0

Total Packets Received Without Errors.......... 86
Unicast Packets Received....................... 79
Multicast Packets Received..................... 6
Broadcast Packets Received..................... 1

Jabbers Received............................... 0
Fragments Received............................. 0
Undersize Received............................. 0
Overruns Received.............................. 0

Packets Transmitted 64 Octets.................. 77
Packets Transmitted 65-127 Octets.............. 0
Packets Transmitted 128-255 Octets............. 0
Packets Transmitted 256-511 Octets............. 3,677
Packets Transmitted 512-1023 Octets............ 0
Packets Transmitted 1024-1518 Octets........... 0
Packets Transmitted 1519-2047 Octets........... 0
Packets Transmitted 2048-4095 Octets........... 0
Packets Transmitted 4096-9216 Octets........... 0
Packets Transmitted 9217-16383 Octets.......... 0

Total Packets Transmitted Successfully......... 3,754
Unicast Packets Transmitted.................... 80
Multicast Packets Transmitted.................. 3,674
Broadcast Packets Transmitted.................. 0
Time Since Counters Last Cleared............... None
root@sonic:/home/admin#


Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
jleveque and others added 26 commits March 2, 2021 14:13
…et#1446)

Now that sonic-net#1427 has merged, we should be able to successfully run all unit tests in a non-privileged slave container.
…mmands (sonic-net#1426)

- What I did
  Added CLI reference for buffer-pool watermark|persistent-watermark commands

- How I did it
Updated the Command-Reference.md file for following:
1. Added buffer_pool in "show" help
2. show description for buffer_pool watermark
3. show descritpion buffer_pool persistent-watermark
The DB backup during warmboot has started failing recently after the changes made to deprecate the usage of SonicDBConfig methods originally implemented by python in https://github.com/Azure/sonic-py-swsssdk/. The new implementation is based on hiredis C++ library.

How I did it: The centralize_database script still uses the Python APIs instead of C++, update the method names which are now defined in sonic-swss-common.
With the new changes, the warm-boot goes ahead without DB save errors.
As per latest update in DPB DOC, fixed this bug

previously we had string value in "breakout_modes" key so it was not matching the whole string, But after the update via, now "breakout_modes" contain a dictionary where key is the breakout_mode and value is the alias. So we can easily check whether the key is present or not.

Signed-off-by: Sangita Maity <samaity@linkedin.com>
Co-authored-by: Guohan Lu <lguohan@gmail.com>
…net#1444)

- Refactor the way swsscommon is used in decode-syseeprom to align with more modern approach
- Add unit tests for DB-related functionality of decode-syseeprom utility
- Align whitespace in tests/mock_tables/state_db.json
…o this VLAN. (sonic-net#1420)

- What I did
In vlan.py , function del_vlan added validation if there is no members assigned to this VLAN. If there are members - the execution of this command is failed with error on the screen.

- How I did it
Added validation of the VLAN_MEMBER_CFG_TABLE with the key VLAN_ID. If there are entries with this VLAN_ID - print error:
VLAN ID {} can not be removed. First remove all members assigned to this VLAN.
User should remove all members assigned to this VLAN, and after that he can delete vlan.

- How to verify it
sudo config vlan add 200
sudo config vlan member add 200 Ethernet0
sudo config vlan del 200
In this case - you will see error as there are members assigned to this VLAN

sudo config vlan member del 200 Ethernet0
sudo config vlan del 200
Will pass.

Signed-off-by: allas <allas@nvidia.com>
The key used is as read from APPL-DB, hence use it as such to get the value,
to help get the interfce name, so it can be used for filtering out.
- What I did
Support new Mellanox system SN4600

- How I did it
Add the system name to the buffer migration script even if the system is first added
…1335)

- What I did
Added support for tech support extension scripts.

- How I did it
It looks at /usr/bin/debug-dump for scripts, if it finds one it will execute them and save the output to dump/ folder.

- How to verify it
Write a simple scripts that outputs something and place it under /usr/bin/debug-dump/

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
Summary:
This PR provides the support for adding CLI commands for retrieving cable vendor name and part number information of the muxcable.
In particular these Cli commands are supported:
show muxcable cableinfo <port>

Approach
added the changes in sonic-utilities/show and sonic-utilities/config by changing the muxcable.py

What is the motivation for this PR?
To add the support for Cli for muxcable to be utilized for retrieving cable vendor name and part number information of the muxcable.

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
sonic-net#1416)

- Enhance `psushow -s` output, adding columns to display PSU Model, Serial, Voltage, Current and Power
- Add `-j/--json` option to display output in JSON format
- Update mock State DB PSU_INFO table to provide values which we can ensure floats are padded to two decimal places in tabular display mode
- Add `--json` option to `show platform psustatus` which in turn adds the `-j` flag to the underlying `psushow` call
- Add unit tests for `psushow` in psushow_test.py
- Add more unit tests for `show platform psustatus` and move to show_platform_test.py
#### What I did
The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100.

#### How I did it
In reboot script, if platform-specific reboot cause update script exists, run it
- What I did
To remove the list of hardcoded order-dependent lists of services to stop/restart/reset-failed.

- How I did it
Used sonic.target to stop/restart/reset-failed.

- How to verify it
Execute config reload and observe the services do restart.

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
What I did
Add a new reboot named as soft-reboot which can be performed by "kexec -e"

How I did it
Replace the platform reboot with "kexec -e" for the cold reboot case.

How to verify it
Verified the reboot on DUT and check the reboot-cause
@lgtm-com
Copy link

lgtm-com bot commented Mar 2, 2021

This pull request introduces 1 alert when merging bf91bc3 into e32b5ac - view on LGTM.com

new alerts:

  • 1 for Use of the return value of a procedure

@FuzailBrcm
Copy link
Contributor

retest this please

@ben-gale
Copy link
Collaborator

ben-gale commented Mar 5, 2021

@lguohan, @jleveque - PR has been updated to add the "-- background" option as discussed - pls review.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.