Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VxLAN] Removing the VxLAN tunnel through CLI doesn't delete the tunnel from hardware #12063

Closed
dgsudharsan opened this issue Sep 13, 2022 · 6 comments
Assignees

Comments

@dgsudharsan
Copy link
Collaborator

Description

After configuring a static VxLAN and VNET, trying to remove vxlan using CLI will result in successfully removing it in the application but not in the hardware. The reason is the delete task doesn't call any SAI API. (https://github.com/sonic-net/sonic-swss/blob/fd0c585d34de63dc9cca57bb7d508cadfe1125d4/orchagent/vxlanorch.cpp#L1536)

Previously the delete task had the flow of removing the tunnels sonic-net/sonic-swss#931. However this flow was modified in the commit sonic-net/sonic-swss@095f481 due to which delete vxlan will not make any SAI calls

Steps to reproduce the issue:

  1. Create VxLAN using CLI
  2. Remove it using config vxlan del command

Describe the results you received:

Delete vxlan returns successfully without any errors but leaving the hardware tunnel intact

Describe the results you expected:

Delete should return success or show throw error if there are any references

Output of show version:

show version

SONiC Software Version: SONiC.detached.1836-83704d995
Distribution: Debian 11.4
Kernel: 5.10.0-12-2-amd64
Build commit: 83704d995
Build date: Sat Aug 27 05:10:38 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic02-005

Platform: x86_64-mlnx_msn3800-r0
HwSKU: ACS-MSN3800
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1925X00004
Model Number: MSN3800-CS2F
Hardware Revision: A2
Uptime: 21:14:04 up 14:38,  1 user,  load average: 0.51, 0.64, 0.76
Date: Tue 13 Sep 2022 21:14:04

Docker images:
REPOSITORY                    TAG                       IMAGE ID       SIZE
docker-orchagent              detached.1836-83704d995   fd11a401e814   478MB
docker-orchagent              latest                    fd11a401e814   478MB
docker-fpm-frr                detached.1836-83704d995   670c27cd3327   489MB
docker-fpm-frr                latest                    670c27cd3327   489MB
docker-teamd                  detached.1836-83704d995   a14ce103012f   459MB
docker-teamd                  latest                    a14ce103012f   459MB
docker-macsec                 latest                    b0fb0be14a81   461MB
docker-syncd-mlnx             detached.1836-83704d995   134dec23270f   983MB
docker-syncd-mlnx             latest                    134dec23270f   983MB
docker-platform-monitor       detached.1836-83704d995   35d03efa7752   990MB
docker-platform-monitor       latest                    35d03efa7752   990MB
docker-dhcp-relay             latest                    ad34a2fd418d   453MB
docker-sonic-telemetry        detached.1836-83704d995   77d9fc9c8899   524MB
docker-sonic-telemetry        latest                    77d9fc9c8899   524MB
docker-snmp                   detached.1836-83704d995   8425d6595df4   490MB
docker-snmp                   latest                    8425d6595df4   490MB
docker-lldp                   detached.1836-83704d995   574a29a5f79e   486MB
docker-lldp                   latest                    574a29a5f79e   486MB
docker-database               detached.1836-83704d995   9b1e1894c88c   443MB
docker-database               latest                    9b1e1894c88c   443MB
docker-mux                    detached.1836-83704d995   e62dacc96223   492MB
docker-mux                    latest                    e62dacc96223   492MB
docker-router-advertiser      detached.1836-83704d995   420abd7429e7   443MB
docker-router-advertiser      latest                    420abd7429e7   443MB
docker-nat                    detached.1836-83704d995   bb72b5146a26   434MB
docker-nat                    latest                    bb72b5146a26   434MB
docker-sflow                  detached.1836-83704d995   6959519de196   432MB
docker-sflow                  latest                    6959519de196   432MB
docker-sonic-mgmt-framework   detached.1836-83704d995   6e57b0d815ce   561MB
docker-sonic-mgmt-framework   latest                    6e57b0d815ce   561MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@zhangyanzhao zhangyanzhao added Triaged this issue has been triaged BRCM labels Sep 14, 2022
@zhangyanzhao
Copy link
Collaborator

Besides the bug fix, we should also add one test case for vxlan tunnel deletion

@srj102
Copy link
Contributor

srj102 commented Sep 15, 2022

@dgsudharsan
Please describe the exact sequence which leads to the Tunnel object creation in the SAI ?
As far as I see it happens only as part of the vlan-vni or vrf-vni map creation.
Similarly deletion should happen after the last mapping entry is removed.

@dgsudharsan
Copy link
Collaborator Author

@srj102 I see. In case of VNET deleting the VNET before VxLAN tunnel will clear the tunnel from the hardware. However the problem here I see is the CLI command to remove the VxLAN tunnel succeeds without throwing any dependency errors whether a tunnel map or VNET is associated with it. This needs to be fixed.

@srj102
Copy link
Contributor

srj102 commented Sep 26, 2022

@dgsudharsan the foll is the snippet for vxlan deletion.
Here the vlan vni mappings are handled.

vxlan_keys = db.cfgdb.get_keys("VXLAN_TUNNEL_MAP|*")
if not vxlan_keys:
  vxlan_count = 0
else:
  vxlan_count = len(vxlan_keys)

if(vxlan_count > 0):
    ctx.fail("Please delete all VLAN VNI mappings.")

w.r.t VNET,
If the user is going to add/remove vnet and vnet to vni mappings without using the CLI, wouldn't the user prefer the
same for the VXLAN_TUNNEL table as well ?
In which case how useful would it be to add this check in the "config vxlan del" CLI handler ?

Today vxlanmgr has checks for vlan vni mappings to take care of usecases without CLI, So I think the check can be put here.

doVxlanTunnelDeleteTask() ..

// If there are mappings still against this tunnel then hold on.
if (m_vxlanTunnelCache[vxlanTunnelName].vlan_vni_refcnt)
{
    SWSS_LOG_WARN("Tunnel %s deletion failed. Need to delete mapping entries", vxlanTunnelName.c_str());
    return false;
}

An additional check for m_vnetCache.size() can be added in vxlanmgr as part of doVxlanTunnelDeleteTask() and return false
if it is a non-zero size ?

@dgsudharsan
Copy link
Collaborator Author

@srj102 I believe handling in CLI should be good enough as CLI is the only flow for delete. I have raised a fix sonic-net/sonic-utilities#2404

@adyeung
Copy link
Collaborator

adyeung commented Oct 12, 2022

Closing this issue, #2404 already merged

@adyeung adyeung closed this as completed Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants