202012 #8075

Open
wants to merge 625 commits into
base: master


@liujian2014 liujian2014 commented Jul 6, 2021

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

renukamanavalan and others added 30 commits April 21, 2021 14:00
1) Dropped non-required IP update in admin.conf, as all masters use VIP only (#7288)
2) Don't clear VERSION during stop, as doing so would overwrite the new version that is pending.
3) subprocess: get the return value from the process and do not infer failure from the presence of data in stderr.
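
Item 3 can be illustrated with a minimal Python sketch. The helper name run_command and the two child commands are hypothetical and not taken from the actual change; the sketch only shows success being decided by the exit code rather than by whether stderr is empty.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and report success based on its exit code only."""
    proc = subprocess.run(args, capture_output=True, text=True)
    # Data on stderr is kept for logging but does not imply failure.
    return proc.returncode == 0, proc.stdout, proc.stderr


# Hypothetical child that warns on stderr yet exits with status 0.
warn_ok, _, warn_err = run_command(
    [sys.executable, "-c", "import sys; print('warning', file=sys.stderr)"]
)

# Hypothetical child that is silent on stderr yet exits with status 3.
fail_ok, _, fail_err = run_command([sys.executable, "-c", "raise SystemExit(3)"])
```

With this approach, the first call counts as a success despite the stderr output, and the second counts as a failure despite the empty stderr.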
…de (#7309)

#### Why I did it

- xcvrd crash was seen in latest 201811 images.
- On the Dell S6100, API 2.0 uses poll mode while API 1.0 was still using interrupt mode.

#### How I did it

- Modified get_transceiver_change_event in API 1.0 to use poll mode.
The dell_ich driver was removed as part of #7309, but it is needed for the watchdog tickle on the S6100 platform.
Set the hierarchical ECMP level to 2 instead of 3. Based on CS00011833367, the ECMP level must be set to 2.
This is already handled for TH2 platforms. The change is required only for TD3.

Co-authored-by: Ubuntu <prsunny@prince-vm.vzw1i4tqyeburcdz5lrgulxi2c.yx.internal.cloudapp.net>
New features and fixes in the new SDK/FW:

SN4600C | AN/LT support
SN2700 | AN/LT bug fixes
WJH | FID_MISS support

Signed-off-by: Kebo Liu <kebol@nvidia.com>
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
This is the SAI 4.3.3.5 code drop from BRCM to address 2 CSP cases and initial MMU changes.
Note that the MMU changes are the same as those of SAI 4.3.3.4-1 (#7341) but with the official patch.

- Case CS00012178716 [4.3] Polling SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES fails often on TH3
- Case CS00012159273 [4.3.3.3] [TD3][IPFWD] subnet broadcast flooding end with extra VLAN Tag if member port of VLAN interface deleted then added back to VLAN

Once this PR is merged, we will validate each of the above to ensure they are indeed fixed.

Preliminary tests look fine: BGP neighbors were all up with proper routes programmed, and interfaces are all up.
Manually ran the following test cases on TD3 DUT and all passed:

     ipfwd/test_dir_bcast.py
     fib/test_fib.py
     vxlan/test_vxlan_decap.py 
     decap/test_decap.py
     fdb/test_fdb.py
changed from python3 to python in supervisord.conf.
To fix innovium platform compilation and fix missing -lpython3.5 module
adf5ab58 [vstest/subintf] Add vs test case to validate processing sequence of APPL DB keys (#1663)
8a732726 [intfsorch] Create subport with the entry contains necessary attributes (#1650)
7ba813b2 [vstest/subintf] Update vs tests to validate physical port host interface vlan tag attribute (#1634)
ed32e333 [portsorch] Configure hostif tagging for subports (#1573)
b5209c43 Handle IPv6 and ECMP routes to be programmed to ASIC (#1711)
515cc1a7 [Dynamic buffer calc][Mellanox] Fix bug: buffer over subscription in buffer pool size calculation (#1706)
0ad524b2 [202012] Allowing the first time FEC and AN configuration to be pushed to SAI (#1710)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Shilong Liu <shilongliu@microsoft.com>
…7399)

Why I did it
Fix the boolean value case-sensitivity issue in Azure Pipelines.

When parameters are passed to a template, the values "true" and "false" run into a case-sensitivity issue, which is likely a type-casting problem.
To avoid it, we change true/false to yes/no.

Support overriding the job groups in the template, so that a PR build can use different build parameters and build only simple targets. For example, for broadcom, we only build target/sonic-broadcom.bin; the other images, such as swi, debug bin, etc., are not built.
Why I did it
Improve the version of the Pull Request build by changing the local branch name.

How I did it
Change the default branch name "merge" to [target_branch_name]-[pullrequestid].

How to verify it
For official build, the version is not changed.
For pull request build, the version is as below:
When submitting a new official build for broadcom or vs, an error message is shown saying that the job is not defined.
The cause is the default option "[]", which is not empty and is therefore used as the jobGroups parameter.
Improve the SONiC version by fixing the "azure pipeline build id" part:

<target branch name>-<pullrequest id>.<azure pipelines build id>-<merge commit id>
Example: master-7381.11668-43df5c87
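
The version format above can be sketched in Python as follows. The function name pr_build_version and its parameter names are hypothetical; the actual build scripts may compose the string differently, and this sketch assumes the merge commit id is already in its short form.

```python
def pr_build_version(target_branch, pull_request_id, build_id, merge_commit_id):
    """Compose <target branch name>-<pullrequest id>.<build id>-<merge commit id>."""
    # Assumption: merge_commit_id is passed in already abbreviated.
    return f"{target_branch}-{pull_request_id}.{build_id}-{merge_commit_id}"


# Reproduces the example given in the commit message.
version = pr_build_version("master", 7381, 11668, "43df5c87")
print(version)
```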
- Why I did it
Changes in the new release:
1. Fix 10G and 50G speeds in SAI XML to support all interface types
2. Enable SMAC=DMAC and SMAC MC in tunnel debug counter
3. Add tunnel statistics
4. Add isolation group API implementation
5. Fix ACL ANY debug counter to correctly track ACL drops
6. Add VXLAN source port hard coded range, controlled by K/V
7. FW dump me now feature
8. Add mlxtrace to saidump
9. Speed lane setting and AN control
10. Implement query stats API
11. VNI miss part of tunnel decap drop reason

- How I did it
Update the version number in SAI make file, update the mlnx-sai submodule pointer.

- How to verify it
Run full regression tests on Mellanox platforms

Signed-off-by: Dror Prital <drorp@nvidia.com>
… using pip3. (#7441)

[sonic-slave-stretch]: upgrade pyang version to 2.4.0.

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
#7453)

Why I did it
Add a platform filter to ignore platforms that are not ready to use.
The platforms centos-arm64 and marvell-armhf are not ready to use yet. We will add them when they are available.
PR# 7249 introduced a new bit of logic _after_ the point where the qemu based
build environment for ARM is removed. Hence the new logic fails when building
for ARM. Builds for AMD64 were not affected.

This commit moves the new logic introduced by PR# 7249 to just _before_ the
point where the qemu based build environment for ARM is removed. A comment is
added to reduce the likelihood of this sort of ARM build break from happening
again.
…s folder change (#7464)

Fix the version file generation failure caused by the artifacts folder change.
When switching to the same template for PR builds, official builds, and package version upgrades, a "target" folder was added to the artifacts folder, so the version upgrade task must be changed accordingly.
platform files for the new SKU D28C49S1
Why I did it
Support a read-only version of the vtysh command

How I did it
Check that the command starts with "show", and verify that the script contains only a single command.
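
A minimal Python sketch of the check described above, under stated assumptions: the function name is hypothetical, the actual change may be implemented in a different language, and treating a newline or ";" as the sign of a second command is an assumption made for illustration.

```python
def is_readonly_vtysh_command(command):
    """Return True only for a single command whose first word is "show"."""
    tokens = command.split()
    if not tokens or tokens[0] != "show":
        return False
    # Assumption: a newline or ";" would introduce a second command.
    return not any(sep in command.strip() for sep in ("\n", ";"))


allowed = is_readonly_vtysh_command("show ip bgp summary")
rejected_config = is_readonly_vtysh_command("configure terminal")
rejected_chain = is_readonly_vtysh_command("show version\nconfigure terminal")
```

Matching on the first whole word, rather than on a bare "show" prefix, is a design choice of this sketch, so that an input such as "showfoo" is not accepted.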
Fix the labeler workflow permission issue when merging from fork repo.
It impacts the labeler workflow to support auto-merge for package versions upgrade on 202012 branch. The current workaround is to add the label "automerge" on the PR sent by mssonicbld, then the automerge workflow will merge the PR.
- Why I did it
Upgrade hw-mgmt to 7.0100.2303

Bug fixes

1. Fan direction feature fix for fixed FAN system (using shell instead of binutils/strings)
2. Remove cpld 4th link on systems with only 3 CPLDs
3. hw-mgmt: thermal: Add hardcoded critical trip point. Follow-up after patch "Removing critical thermal zones to prevent unexpected software system shutdown".
4. Fix sensor attribute mapping to be label based instead of index based to allow common handling of voltage regulator names independently of hardware changes.
5. Update 'lm-sensors' custom configuration file. Relevant only for users utilizing sensors.conf files coming along with hw-management package.
6. For full feature list please follow https://github.com/Mellanox/hw-mgmt/blob/V.7.0010.2300_BR/debian/Release.txt

- How I did it
Update hw-mgmt pointer
Remove unused patches
Fix the existing patch to make sure it applies successfully

- How to verify it
Full platform regression on all Mellanox platforms
…7446)

The following error message is observed while the chassis object is being destroyed:

"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down"

The chassis tries to import deinitialize_sdk_handle while it is being destroyed, in order to release the sdk_handle.
However, importing another module during shutdown can cause this error because some of the fundamental infrastructure is no longer available.

This error occurs when a chassis object is created and then destroyed in the Python shell.

- How I did it
To fix it, record the deinitialize_sdk_handle function in the chassis object when sdk_handle is initialized, and call the recorded deinitialize handler when the chassis object is destroyed.
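
The approach can be sketched roughly as below. This is a simplified illustration, not the actual sonic_platform code: the module-level initialize_sdk_handle and deinitialize_sdk_handle functions are hypothetical stand-ins for the SDK helpers, and the attribute and method names are assumptions.

```python
released_handles = []


def initialize_sdk_handle():
    """Hypothetical stand-in for the real SDK open call."""
    return object()


def deinitialize_sdk_handle(handle):
    """Hypothetical stand-in for the real SDK close call."""
    released_handles.append(handle)


class Chassis:
    def __init__(self):
        self.sdk_handle = None
        self._deinitialize_sdk_handle = None

    def get_sdk_handle(self):
        if self.sdk_handle is None:
            self.sdk_handle = initialize_sdk_handle()
            # Record the release function now, while lookups still work,
            # so that __del__ never has to import anything.
            self._deinitialize_sdk_handle = deinitialize_sdk_handle
        return self.sdk_handle

    def __del__(self):
        # No import here: only the reference recorded earlier is used.
        if self.sdk_handle is not None and self._deinitialize_sdk_handle:
            self._deinitialize_sdk_handle(self.sdk_handle)
            self.sdk_handle = None
```

Because the handler is stored on the instance at initialization time, destroying the object during interpreter shutdown does not depend on the import machinery still being available.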

- How to verify it
Manually test.
Enable VXLAN src port range configuration via SAI profile
stephenxs and others added 15 commits June 29, 2021 22:49
Advance submodule head for sonic-swss on 202012

bb383be2 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
f949dfe9 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
def0a914 Fix config prompt question issue (sonic-net/sonic-swss#1799)
21f97506 [ci]: Merge azure pipelines from master to 202012 branch (sonic-net/sonic-swss#1764)
a83a2a42 [vstest]: add dvs_route fixture
849bdf9c [Mux] Add support for mux metrics to State DB (sonic-net/sonic-swss#1757)
386de717 [qosorch] Dot1p map list initialization fix (sonic-net/sonic-swss#1746)
f99abdca [sub intf] Port object reference count update (sonic-net/sonic-swss#1712)
4a00042d [vstest/nhg]: use dvs_route fixture to make test_nhg more robust

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
Why I did it
Platform pcie configuration file doesn't exist for x86_64-arista_7170_64c

How I did it
Generate pcie.yml

How to verify it
Started pcie daemon (pcied RUNNING pid 63, uptime 0:00:19)
Why I did it
Multiple builds failed in the 202012 branch.
The failures are caused by the inconsistent ordering of the package URLs returned by the command "apt-get download --print-urls".
Why I did it
Disable the nephos docker syncd rpc image temporarily
Update submodule for sonic-utilities to include the following PR:
[202012] [pfcwd] Fix the return code in invalid case (#1698)

Signed-off-by: Dror Prital <drorp@nvidia.com>
Removed the file hwsku.json from Mellanox-4600C-C64.
Co-authored-by: Madhan Babu <madhan@l-csi-0241l.mtl.labs.mlnx>
[flex-counters] [202012] Delay flex counters stats init for faster boot time (sonic-net/sonic-swss#1804)
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
- Changes and new features:

1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM).
2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE
3. Improved performance of "per-port-buffer" counters
4. Added support for Kernel 5.10

- Bug fix:
On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds

Signed-off-by: Dror Prital <drorp@nvidia.com>

ghost commented Jul 6, 2021

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

❌ liujian2014 sign now
You have signed the CLA already but the status is still pending? Let us recheck it.

@qiluo-msft
Collaborator

  1. This PR diff is so huge, could you double check?
  2. Please use a descriptive PR title
  3. Please provide description in the PR comment
  4. There are lots of conflicts

VenkatCisco and others added 9 commits July 7, 2021 09:40
#### Why I did it
The libpci library provides portable access to configuration registers of devices connected to the PCI bus.

#### How I did it
update dockers/docker-platform-monitor/Dockerfile.j2
#### Why I did it
ethtool can be used to query and change settings such as speed, auto-negotiation and checksum offload on many network devices, especially Ethernet devices.

#### How I did it
add package extension to docker-platform-monitor/Dockerfile.j2
dash doesn't support += operation to append to a variable's value. Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead

The below error message is seen when a reboot is issued.

[ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot

Signed-off-by: Dror Prital <drorp@nvidia.com>
* Added new SKU for SN4600C Platform: Mellanox-SN4600C-D48C40
Co-authored-by: Vivek Reddy Karri <vkarri@nvidia.com>