Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged EVPN VxLAN MH HLD from Cisco and BCM #1702

Merged
merged 14 commits into from
Nov 12, 2024

Conversation

pbrisset
Copy link
Contributor

@pbrisset pbrisset commented May 21, 2024

This is the result of merging Cisco and BCM HLDs.

Cisco HLD
BCM HLD

@prvattem
Copy link
Collaborator

@adyeung please add me to the reviewers list?

@adyeung adyeung requested a review from prvattem May 31, 2024 23:31
#### 1.2.1.2 L2 Unicast forwarding in EVPN MH
The control-plane and forwarding-plane handling of MAC and local adjacency (ARP/ND) learnt on the Multihomed Ethernet-Segments significantly differs from the single homed Ethernet-Segments in the EVPN VxLAN network.

Each VTEP advertises EVPN Type-1 (per ES/EAD) route per local Ethernet-Segment. These VTEPs also advertises EVPN Type-2 (MAC/IP) route per locally learn adjacency (ARP/ND). Remote VTEPs perform EVPN route resolution based on matching ESI value carried as part of these routes. For each Ethernet-Segment, an unique L2 Next-hop group (NHG) is formed that contains the participating VTEPs. All processed EVPN Type-2 routes are pointing to that L2 NHG on matching ESI.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a section to capture new BGP EVPN routes(Type-1 and Type-4) being supported?

Copy link
Contributor Author

@pbrisset pbrisset Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Route-type 1 and 4 works as described in RFC7432. No need to add them here explicitely.

Copy link

@AntonButenkoGL AntonButenkoGL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @pbrisset

I represent a team working on the EVPN Multihoming implementation and we found some comments to this PR.

Can you, please, check the comments? We think they are crucial to have the implementation.

Thank you

Comment on lines 537 to 560
**EVPN_ETHERNET_SEGMENT**

```
;New table
;Specifies EVPN Ethernet Segment interface
;
; Status: stable
key = EVPN_ETHERNET_SEGMENT|"ifname"if_id
; field = value
esi = "AUTO" or es_id
; es_id is 10 byte colon (:) separated string when type = "TYPE_0_OPERATOR_CONFIGURED".
; Otherwise, esi value should be "AUTO" falling back on ESI type 1 or 3.
type = esi_type
; esi_type should be string with one of the below values:
; "TYPE_0_OPERATOR_CONFIGURED" for Type-0 ESI
; "TYPE_1_LACP_BASED" for Type-1 ESI
; "TYPE_3_MAC_BASED" for Type-3 ESI
ifname = "ifname"if_id
; if_id is the interface identifier. Same value as in the key.
; e.g., PortChannel1, where the ifname is PortChannel and the if_id is 1.
df_pref = 1*5DIGIT
; Designated-Forwarder election preference for this router in (1..65535) range.
; Default=32767.
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EVPN_ETHERNET_SEGMENT table does not map on existing FRR commands and is missing required functionality: mac and id for type-3 ESI. Those 2 items are both mandatory in FRR for type-3 ES to work. Also to our knowledge there is no type-1 ESI support in FRR right now or in any PR. So even if everything else is ready to resolve ESI through LACP, FRR is not ready at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont care about ESI value in SONIC. FRR using the Ethernet Segment info programs needed DB in SONIC such as SHL, DF/NDF, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AntonButenkoGL , you are right that FRR does not support Type-1 ESI at present. Type-1 ESI requires enhancement of FRR by passing the lacp partner mac through fpm. Type-1 ESI support can be brought into SONiC once FRR has it.

I have removed the mention of type-1 ESI from this hld in the next commit. However, this comment in the db description will remain as is. This convention of specifying AUTO for esi is borrowed from ocyang:
https://github.com/openconfig/public/blob/master/doc/evpn_use_cases.md#ethernet-segment

curl -X PATCH "https://SWICH_IP:9090/restconf/data/openconfig-network-instance:network-instances/network-instance=default/evpn/evpn-mh/config" -H "accept: */*" -H "Content-Type: application/yang-data+json" -d "{\"openconfig-network-instance:config\":{\"startup-delay\":150,\"mac-holdtime\":200,\"neigh-holdtime\":250}}"
```
<a id="4-Flow-Diagrams"></a>
# 4 Flow Diagrams

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Flow Diagrams section contains multiple distinctly different ways of to naming/implementing feature comparing to the rest of the document. We think these diagrams are outdated artifacts from #1638 requirements and contradict the new requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hummm. Can you be explicit?

Comment on lines 789 to 798
### 3.3.8 L2nhgorch
- New Orchagent class being introduced.
- Subscription to different tables
- APP DB - L2_NEXTHOP_GROUP, L2_NEXTHOP_GROUP_MEMBER
- CONFIG DB - EVPN_ES_INTERFACE.
- Subscription to L2_NEXTHOP_GROUP
- Creates/deletes the L2_ECMP_GROUP ASIC DB table.
- Interacts with PortsOrch to create a bridge-port object of type L2_ECMP_GROUP.
- Subscription to L2_NEXTHOP_GROUP_MEMBER
- Creates/deletes the L2_ECMP_GROUP_MEMBER ASIC DB tables entries on the L2_NEXTHOP_GROUP APP DB table updates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "DB Changes" and "Impact to existing modules" section conflicts with the L2_NEXTHOP_GROUP schema in APP_DB (the latter mentions a separate table for members). We propose that the one in "DB Changes" section is to be chosen as the correct approach and "L2nhgorch" subsection is to be rewritten accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text was updated. You may want to re-review this section.

doc/vxlan/EVPN/EVPN_VxLAN_Multihoming.md Show resolved Hide resolved
doc/vxlan/EVPN/EVPN_VxLAN_Multihoming.md Outdated Show resolved Hide resolved
doc/vxlan/EVPN/EVPN_VxLAN_Multihoming.md Outdated Show resolved Hide resolved
### 2.2.4 Linux Kernel Support
EVPN Multihoming feature requires L2 next-hop group (NHG) support in the Linux kernel that is available in v5.10.
EVPN Multihoming feature will not be supported in SONiC releases running 4.x kernel versions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to RFC 8365, Section 8.3.3

VXLAN-GPE encapsulation needs to be used between these PEs and the ingress PE needs to set the BUM Traffic Bit (B bit) to indicate that this is an ingress-replicated BUM traffic.

It might need an additional patch to support this in the v5.10??

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AntonButenkoGL @pbrisset Does the kernel v5.10 need patch for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SONiC is currently at Linux kernel v6.1.38.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the BUM Traffic Bit already supported by Linux Kernel v6.1.38?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gord1306 , this hld is for vxlan (RFC7432). Vxlan-gpe is still a draft (draft-ietf-nvo3-vxlan-gpe.) This hld can be extended in future to add vxlan-gpe support. Bum bit is not present in the non-gpe vxlan header. I hope this clarifies.

@gord1306
Copy link

In the current implementation, can all the functionalities already work on the VS (virtual machine) platform?

@AntonButenkoGL
Copy link

AntonButenkoGL commented Jul 12, 2024

In the current implementation, can all the functionalities already work on the VS (virtual machine) platform?

Yes, it was tested with VS platform.
Additionally, SAG and WarmReboot use cases testing is in progress.

And there is a work in progress to align the ESI types logic with the current requirements.

hasan-brcm and others added 3 commits July 29, 2024 15:35
Updated kernel, SAI, config, and design sections
Update EVPN_VxLAN_Multihoming.md
;Specifies EVPN Ethernet Segment interface
;
; Status: stable
key = EVPN_ETHERNET_SEGMENT|"ifname"if_id
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ifname"if_id looks weird, can we just have an ifname as key and in the description, it's already mentioned about if_id. Same suggestion for other cases like "Vlan"vlan_id as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkatmahalingam , this is the convention that has been followed. "ifname"if_id means, "Ethernet"+0,1,2.. or "PortChannel"+1,2,3.. Same goes with Vlan.

The Isolation group SAI object used for achieving Local Bias and Split Horizon Filtering in the MCLAG will be reused for EVPN MH. In EVPN MH an Isolation group for every Peer Multihoming VTEP is created as opposed to a single group for MCLAG. The Isolation group members are populated based on the EVPN_SPLIT_HORIZON_TABLE APP-DB entry. Traffic received from the Tunnel towards the peer multihoming VTEP will not be forwarded to the members of the Isolation group.

<a id="224-Linux-Kernel-Support"></a>
### 2.2.4 Linux Kernel Support
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can SONiC-VM work for MH feature any without limitations/restriction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkatmahalingam , yes, once the mentioned functionality, i.e. split-horizon and df-election is integrated with the Linux kernel, evpn MH will work on sonic-vm as well. The current Linux version in SONiC already has support for L2 nexthop groups.

Configuration handling of MCLAG should now be aware of EVPN Ethernet-Segment configurations and throw appropriate errors in order to achieve mutual-exclusion between EVPN Multihoming and MCLAG features.
OpenConfig evpn yang is imported into SONiC and ethernet-segment URIs are implemented.
OpenConfig LACP yang already has system-id-mac configuration and it will be implemented.
OpenConfig evpn yang will be extended for global non-standard parameter configurations (e.g. MAC/ARP hold-timer, etc.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont have openconfig support for EVPN feature as such in the SONiC community, are we planning to bring that support via mgmt-framework as part of EVPN MH effort?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@venkatmahalingam , yes, that's the plan.

@zhangyanzhao
Copy link
Collaborator

sonic community review recording on 7/30 https://zoom.us/rec/share/V2Xv_D2UOXdjLKJHazb1AV11JQxEsXS6275dtFiGA_QyGPgoVvWmL0lh9A5DJBfE.l-gvQEzam4TFRpXT

##### 1.2.1.1.1 Local-bias and split-horizon filtering
In multi-homed scenario, if the BUM traffic originates from a device attached to an EVPN Ethernet-Segment, local-bias procedure is applied and BUM traffic is flooded on all locally attached multi-homed and single-homed devices. The BUM traffic is also replicated to the remote VTEPs, and split-horizon filtering is applied such that traffic is not replicated to any of the devices that are attached to the shared EVPN Ethernet-Segment.

Local-bias procedure is described in EVPN VxLAN Overlay [RFC 8365](https://datatracker.ietf.org/doc/html/rfc8365). The behavior of local-bias is explained with the help of below diagram.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the VXLAN encapsulation type used for BUM traffic indication ?
  2. Is it going to be VXLAN-GPE tunnel type ?
  3. If yes, is there any extension planned for sai_tunnel_type_t ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helloanandhi , No Vxlan-gpe right now. This hld can be extended in future for gpe. There is no bum bit in original vxlan header.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hasan-brcm If VXLAN-GPE is not supported, does this mean that ingress replication cannot be used in the current implementation due to BUM traffic duplication that described in the https://datatracker.ietf.org/doc/html/rfc8365#section-8.3.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gord1306 , VXLAN-GPE is not a mandate as per RFC8365. It is just a suggestion to avoid (low probability) transient loop.

If it is desired to
avoid occurrence of such transient packet duplication (however low
probability that may be), then VXLAN-GPE encapsulation needs to be
used between these PEs and the ingress PE needs to set the BUM
Traffic Bit (B bit) to indicate that this is an ingress-
replicated BUM traffic.

@adyeung
Copy link
Collaborator

adyeung commented Oct 28, 2024

SAI spec opencomputeproject/SAI#2084

| -------- | ----------------------------------------- |
| (N)DF | (Non-)Designated Forwarder |
| IRB | Integrated Routing and Bridging |
| LVTEP | Logical VTEP |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove abbreviation of LVTEP as LVTEP terminology is not being used throghout this HLD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skumar041 , will be removed in next commit.

Upon access interface going operationally down, FdbOrch gets notified to flush the MAC addresses associated with the interface. Fdborch is extended to check if the interface is associated with an Ethernet-Segment. If the interface is an orphan port then flush associated MAC addresses. If there is an Ethernet Segment associated with the interface, FdbOrch fetches the corresponding L2 NHID and re-program all (local and remote) MAC addresses pointing to L2 NHID associated with the local ESI. The MAC addresses are programmed with aging disabled.

##### 1.2.1.3.6 Local Multihomed ES Link up
Upon access interface going operationally up, new support is added in the Orchagent to move the MAC from L2 NHID to the corresponding interface which is associated with the Ethernet Segment. MAC addresses are programmed with aging disabled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAC addresses should get programmed with aging enabled when Local Multihomed ES Link comes up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skumar041 , if the mac is being programmed locally (after the link comes up,) the only reason is that the mac is active on another VTEP - otherwise, the mac would have been removed from the evpn as none of the VTEPs have it.
In other words, the mac is in control of the remote VTEP, and therefore it should be programmed with ageing disabled on the local VTEP.
If the mac is programmed as ageing enabled, it means there is a local mac learn event - which may not happen after the link comes up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With EVPN control plane, learning / ageing never happen over remote VTEP.

@hasan-brcm
Copy link
Contributor

Hello @pbrisset

I represent a team working on the EVPN Multihoming implementation and we found some comments to this PR.

Can you, please, check the comments? We think they are crucial to have the implementation.

Thank you

Hello @AntonButenkoGL, thank you for reviewing the hld and posting comments! I see the responses to your comments now. Please confirm if the questions/concerns are addressed, and we are good to go.

@adyeung
Copy link
Collaborator

adyeung commented Nov 7, 2024

@prvattem @gord1306 @AntonButenkoGL @helloanandhi @mikemallin @skumar041 @venkatmahalingam @srj102 @eddyk-nvidia The HLD has been presented and reviewed at Routing WG and community weekly calls, if there is no further comments on the design, please signoff and mark approve to conclude the review

@gord1306
Copy link

@adyeung Sorry, I don't have the permission to approve. However, the reply messages above are ok with me.

@adyeung
Copy link
Collaborator

adyeung commented Nov 12, 2024

@adyeung Sorry, I don't have the permission to approve. However, the reply messages above are ok with me.

@gord1306 you can click files changed tab -> click review -> click approve

Addressed review comments.

(cherry picked from commit f9fe305)
Addressed review comments.

(cherry picked from commit 9368707)
Copy link
Contributor

@hasan-brcm hasan-brcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adyeung adyeung merged commit 5039fee into sonic-net:master Nov 12, 2024
1 check passed
kishorekunal01 added a commit to kishorekunal01/sonic-buildimage that referenced this pull request Nov 14, 2024
Kernel 6.1.94 vesrion.

Why I did it
Adding support bridge fdb nhid and sync libnl3 header file to Kernel 6.1.94
version
Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
kishorekunal01 added a commit to kishorekunal01/sonic-swss that referenced this pull request Nov 15, 2024
…om the Kernel

Why I did it
Managing NHID in Bridge FDB Updates and Handling NHG Updates from the Kernel in fdbsyncd

Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
kishorekunal01 added a commit to kishorekunal01/sonic-swss that referenced this pull request Nov 15, 2024
…om the Kernel

Why I did it
Managing NHID in Bridge FDB Updates and Handling NHG Updates from the Kernel in fdbsyncd

Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
@zhangyanzhao
Copy link
Collaborator

no code PR, move to backlog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: MovedToBacklog
Development

Successfully merging this pull request may close these issues.