[Chassis] Update lossy profile to restrict buffer usage in congestion state #20132

Merged: 5 commits merged into sonic-net:master on Sep 11, 2024

@vmittal-msft vmittal-msft commented Sep 4, 2024

Why I did it

This change restricts lossy queue buffer usage when the system is in a congested state.

Work item tracking
  • Microsoft ADO (29315559):

How I did it

Updated the lossy profile alpha from 0 to -4 for 400g ports and -5 for 100g ports. This configuration is applied on the system port and uses the HWSKU port speed settings.
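To illustrate what the alpha change means, here is a minimal sketch of the classic dynamic-threshold calculation. It assumes, as is conventional for SONiC buffer profiles, that the configured value is a log2 exponent (alpha = 2^dynamic_th) and that a queue may grow up to alpha times the currently free shared buffer. The function names and the free-buffer figure are hypothetical, and the exact behavior on the chassis ASIC may differ.

```python
def alpha_from_dynamic_th(dynamic_th: int) -> float:
    """Convert the configured exponent to a scaling factor (assumed alpha = 2**dynamic_th)."""
    return 2.0 ** dynamic_th


def queue_limit_bytes(dynamic_th: int, free_buffer_bytes: int) -> int:
    """Maximum queue depth under the assumed rule: limit = alpha * free shared buffer."""
    return int(alpha_from_dynamic_th(dynamic_th) * free_buffer_bytes)


if __name__ == "__main__":
    free = 1_600_000  # hypothetical amount of free shared buffer, in bytes
    for label, dynamic_th in (("old", 0), ("400g", -4), ("100g", -5)):
        print(label, alpha_from_dynamic_th(dynamic_th), queue_limit_bytes(dynamic_th, free))
```

Under these assumptions, moving from 0 to -4 or -5 shrinks the share of free buffer a single lossy queue can claim from all of it to 1/16 or 1/32, which is how the change limits lossy usage during congestion.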

How to verify it

Verified by running the sonic-mgmt tests, which run OK.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@vmittal-msft vmittal-msft added the Chassis 🤖 Modular chassis support label Sep 4, 2024
@vmittal-msft
Contributor Author

@saksarav-nokia @kenneth-arista please help review.

@gechiang
Collaborator

gechiang commented Sep 5, 2024

@vmittal-msft, is this also applicable to the 202205 chassis? If so, please add the label "Chassis for 202205 branch". Also update your MSFT ADO with the proper branch request tags.

@vmittal-msft vmittal-msft changed the title [Chassis] Update lossy profile to restrict buffer usage in extreme congestion state [Chassis] Update lossy profile to restrict buffer usage in congestion state Sep 6, 2024
@vmittal-msft
Contributor Author

@BYGX-wcr

@BYGX-wcr
Contributor

BYGX-wcr commented Sep 7, 2024

Hi @judyjoseph, can you please review this? I have no experience in RDMA buffer tuning.

@vmittal-msft vmittal-msft self-assigned this Sep 7, 2024
@vmittal-msft vmittal-msft added Chassis for 202205 branch PRs needed for 202205 branch in msft repo and removed Request for 202205 Branch labels Sep 7, 2024
Contributor

@arlakshm arlakshm left a comment

Why is the alpha value different for 400g and 100g LCs? If a 400g port is configured at 100g speed, will it still use the 400g port alpha?

@vmittal-msft
Contributor Author

vmittal-msft commented Sep 10, 2024

The values differ so that each port speed gets a different queue limit: a 100g port gets a lower limit than a 400g port. If a 400g port is configured as 100g, the alpha is set based on 100g.
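The selection rule described in this reply can be sketched as a lookup keyed on the configured port speed rather than on the port's native capability. The table name, function name, and the use of Mbps as the speed unit are assumptions made for illustration; the actual change lives in the platform buffer configuration, not in Python.

```python
# Hypothetical lookup table: configured port speed (Mbps) -> lossy profile dynamic_th.
LOSSY_DYNAMIC_TH_BY_SPEED = {
    400_000: -4,  # 400g
    100_000: -5,  # 100g
}


def lossy_dynamic_th(configured_speed_mbps: int) -> int:
    """Return the dynamic_th for a port based on the speed it is configured at."""
    try:
        return LOSSY_DYNAMIC_TH_BY_SPEED[configured_speed_mbps]
    except KeyError:
        raise ValueError(f"no lossy dynamic_th defined for {configured_speed_mbps} Mbps")


if __name__ == "__main__":
    # A 400g-capable port that is configured at 100g is looked up by its configured speed.
    print(lossy_dynamic_th(100_000))
```

Keying on the configured speed means a 400g-capable port running at 100g receives the 100g value, matching the behavior stated above.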

arlakshm previously approved these changes Sep 11, 2024
@judyjoseph
Contributor

@vmittal-msft, please add a reference to the QoS tests run with these new configs, or to any sonic-mgmt PR containing test changes needed to support this.

@vmittal-msft
Contributor Author

Yes. I have asked the Nokia team to reference this PR when opening the sonic-mgmt PR.

@vmittal-msft
Contributor Author

@yxieca @rlhui @wangxin Can you please help merge?

@yxieca yxieca merged commit dbcfc28 into sonic-net:master Sep 11, 2024
23 checks passed
@BYGX-wcr BYGX-wcr added the Included in Chassis for 202205 Branch Indicate PR is already in MSFT repo 202205 branch label Sep 12, 2024
vvolam pushed a commit to vvolam/sonic-buildimage that referenced this pull request Sep 12, 2024
… state (sonic-net#20132)

arlakshm pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Sep 18, 2024
…ss_pkt count for lossy profile (#14585)

The dynamic_th (alpha) for the egress lossy profile changed from 0 to -4 (400g) and -5 (100g) port speed in PR sonic-net/sonic-buildimage#20132.
Corresponding changes were made in the J2C+ qos yaml for t2 (broadcom-dnx).
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 20, 2024
… state (sonic-net#20132)

@mssonicbld
Collaborator

Cherry-pick PR to 202405: #20318

hdwhdw pushed a commit to hdwhdw/sonic-mgmt that referenced this pull request Sep 20, 2024
…ss_pkt count for lossy profile (sonic-net#14585)

mssonicbld pushed a commit that referenced this pull request Sep 21, 2024
… state (#20132)

arista-hpandya pushed a commit to arista-hpandya/sonic-mgmt that referenced this pull request Oct 2, 2024
…ss_pkt count for lossy profile (sonic-net#14585)

vikshaw-Nokia pushed a commit to vikshaw-Nokia/sonic-mgmt that referenced this pull request Oct 23, 2024
…ss_pkt count for lossy profile (sonic-net#14585)

ansrajpu-git added a commit to ansrajpu-git/sonic-mgmt that referenced this pull request Nov 7, 2024
…ss_pkt count for lossy profile (sonic-net#14585)

bingwang-ms pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Nov 8, 2024
…hresholds (#15448)

* [QoS]qos_yaml j2C+ changes for new _vsq thresholds (#13069)

What is the motivation for this PR?
New MMU settings were introduced to enhance performance for RDMA traffic in production, so the qos_params need to be adjusted to match the configured buffer profiles.

However, the existing sonic-mgmt LossyQueueTest does not fairly verify the headroom buffer threshold for lossy traffic. With the new vsq profile settings, the XOFF FADT threshold per PG is much lower than the nominal headroom, so the headroom buffer is never fully used and pause frames are sent before the MAX headroom limit is reached.
Either the test case needs to be improved by adding more source ports, or a new test case should be added to verify lossy queue traffic at the PG level.

* [Chassis][Voq] Updating J2C+ qos yaml  for 400G and 100G profile _egress_pkt count for lossy profile (#14585)

The dynamic_th (alpha) for the egress lossy profile changed from 0 to -4 (400g) and -5 (100g) port speed in PR sonic-net/sonic-buildimage#20132.
Corresponding changes were made in the J2C+ qos yaml for t2 (broadcom-dnx).

* [Qos]qos_yaml updated for 400G
aidan-gallagher pushed a commit to aidan-gallagher/sonic-buildimage that referenced this pull request Nov 16, 2024
… state (sonic-net#20132)

Labels
  • Approved for 202405 Branch
  • Chassis for 202205 branch: PRs needed for 202205 branch in msft repo
  • Chassis 🤖: Modular chassis support
  • Included in Chassis for 202205 Branch: Indicate PR is already in MSFT repo 202205 branch
  • Included in 202405 Branch
  • Request for 202405 Branch
Projects
Archived in project
9 participants