Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[subinterface] Invalid RIF admin state is set during vnet port subinterface configuration #15282

Closed
nazariig opened this issue May 31, 2023 · 1 comment · Fixed by sonic-net/sonic-swss#2806
Assignees
Labels
Bug 🐛 MSFT Triaged this issue has been triaged

Comments

@nazariig
Copy link
Collaborator

nazariig commented May 31, 2023

Description

The issue is observed on a Port Profile Init PR sonic-net/sonic-swss#2564 when optimizations are disabled - compilation takes place during CI docker vs build:

2023-05-30T10:09:01.9373597Z g++ -DHAVE_CONFIG_H -I. -I.. -I .. -I ../warmrestart -I  -g  -std=c++14 -Wall -fPIC -Wno-write-strings -I/usr/include/swss -I/usr/include/libnl3 -Werror -Wno-reorder -Wcast-align -Wcast-qual -Wconversion -Wdisabled-optimization -Wextra -Wfloat-equal -Wformat=2 -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wimport -Winit-self -Winvalid-pch -Wlong-long -Wmissing-field-initializers -Wmissing-format-attribute -Wno-aggregate-return -Wno-padded -Wno-switch-enum -Wno-unused-parameter -Wpacked -Wpointer-arith -Wredundant-decls -Wstack-protector -Wstrict-aliasing=3 -Wswitch -Wswitch-default -Wunreachable-code -Wunused -Wvariadic-macros -Wno-switch-default -Wno-long-long -Wno-redundant-decls -fprofile-arcs -ftest-coverage  -Wdate-time -D_FORTIFY_SOURCE=2  -g -O0 -c -o fpmsyncd-fpmsyncd.o `test -f 'fpmsyncd.cpp' || echo './'`fpmsyncd.cpp

The issue is not observed when optimizations are enabled - compilation takes place during local docker vs build

g++ -DHAVE_CONFIG_H -I. -I.. -I .. -I ../warmrestart -I  -g  -std=c++14 -Wall -fPIC -Wno-write-strings -I/usr/include/swss -I/usr/include/libnl3 -Werror -Wno-reorder -Wcast-align -Wcast-qual -Wconversion -Wdisabled-optimization -Wextra -Wfloat-equal -Wformat=2 -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wimport -Winit-self -Winvalid-pch -Wlong-long -Wmissing-field-initializers -Wmissing-format-attribute -Wno-aggregate-return -Wno-padded -Wno-switch-enum -Wno-unused-parameter -Wpacked -Wpointer-arith -Wredundant-decls -Wstack-protector -Wstrict-aliasing=3 -Wswitch -Wswitch-default -Wunreachable-code -Wunused -Wvariadic-macros -Wno-switch-default -Wno-long-long -Wno-redundant-decls  -Wdate-time -D_FORTIFY_SOURCE=2  -g -O2 -ffile-prefix-map=/sonic/src/sonic-swss=. -fstack-protector-strong -Wformat -Werror=format-security -c -o fpmsyncd-fpmsyncd.o `test -f 'fpmsyncd.cpp' || echo './'`fpmsyncd.cpp

Optimization are disabled when gcov is enabled:
https://github.com/sonic-net/sonic-swss/blob/master/debian/rules#L35

ifeq ($(ENABLE_GCOV), y)
	configure_opts += --enable-gcov CFLAGS="-g -O0" CXXFLAGS="-g -O0"
endif

CI log:

2023-05-30T14:21:14.7951819Z =================================== FAILURES ===================================
2023-05-30T14:21:14.7956746Z _________________ TestSubPortIntf.test_sub_port_intf_creation __________________
2023-05-30T14:21:14.7957189Z 
2023-05-30T14:21:14.7957682Z self = <test_sub_port_intf.TestSubPortIntf object at 0x7fd30e628250>
2023-05-30T14:21:14.7958206Z dvs = <conftest.DockerVirtualSwitch object at 0x7fd30e8f6f70>
2023-05-30T14:21:14.7958482Z 
2023-05-30T14:21:14.7959133Z     def test_sub_port_intf_creation(self, dvs):
2023-05-30T14:21:14.7963149Z >       self.connect_dbs(dvs)
2023-05-30T14:21:14.7963566Z 
2023-05-30T14:21:14.7964137Z test_sub_port_intf.py:586: 
2023-05-30T14:21:14.7964509Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2023-05-30T14:21:14.7964826Z test_sub_port_intf.py:86: in connect_dbs
2023-05-30T14:21:14.7965138Z     self.default_vrf_oid = self.get_default_vrf_oid()
2023-05-30T14:21:14.7965464Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2023-05-30T14:21:14.7965592Z 
2023-05-30T14:21:14.7965855Z self = <test_sub_port_intf.TestSubPortIntf object at 0x7fd30e628250>
2023-05-30T14:21:14.7966024Z 
2023-05-30T14:21:14.7966252Z     def get_default_vrf_oid(self):
2023-05-30T14:21:14.7966525Z         oids = self.get_oids(ASIC_VIRTUAL_ROUTER_TABLE)
2023-05-30T14:21:14.7966860Z >       assert len(oids) == 1, "Wrong # of default vrfs: %d, expected #: 1." % (len(oids))
2023-05-30T14:21:14.7967216Z E       AssertionError: Wrong # of default vrfs: 2, expected #: 1.
2023-05-30T14:21:14.7967477Z E       assert 2 == 1
2023-05-30T14:21:14.7967667Z E         -2
2023-05-30T14:21:14.7967870Z E         +1
2023-05-30T14:21:14.7967953Z 
2023-05-30T14:21:14.7968200Z test_sub_port_intf.py:393: AssertionError
2023-05-30T14:21:14.7970848Z ---------------------------- Captured stdout setup -----------------------------
2023-05-30T14:21:14.7971234Z remove extra link dummy
2023-05-30T14:21:14.7971739Z ---------------------------- Captured stderr setup -----------------------------
2023-05-30T14:21:14.7972110Z Exception ignored in: <function ApplDbValidator.__del__ at 0x7fd30ed389d0>
2023-05-30T14:21:14.7972436Z Traceback (most recent call last):
2023-05-30T14:21:14.7972741Z   File "/agent/_work/1/s/tests/conftest.py", line 168, in __del__
2023-05-30T14:21:14.7973063Z     neighbors = self.get_keys(self.NEIGH_TABLE)
2023-05-30T14:21:14.7973728Z   File "/agent/_work/1/s/tests/dvslib/dvs_database.py", line 122, in get_keys
2023-05-30T14:21:14.7974098Z     table = swsscommon.Table(self.db_connection, table_name)
2023-05-30T14:21:14.7974668Z   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2706, in __init__
2023-05-30T14:21:14.7975040Z     _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
2023-05-30T14:21:14.7975581Z RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
2023-05-30T14:21:14.7976131Z ----------------------------- Captured stdout call -----------------------------
2023-05-30T14:21:14.7976679Z -----rc=1 for cmd ['bash', '-c', 'ip link show Ethernet64 | grep Vrf0']-----
2023-05-30T14:21:14.7976850Z 
2023-05-30T14:21:14.7977104Z -----
2023-05-30T14:21:14.7977558Z -----rc=1 for cmd ['bash', '-c', 'ip link show Ethernet64 | grep master']-----
2023-05-30T14:21:14.7977728Z 
2023-05-30T14:21:14.7977989Z -----
2023-05-30T14:21:14.7978669Z -----rc=1 for cmd ['bash', '-c', 'ip link show PortChannel1 | grep Vrf0']-----
2023-05-30T14:21:14.7978821Z 
2023-05-30T14:21:14.7979082Z -----
2023-05-30T14:21:14.7979530Z -----rc=1 for cmd ['bash', '-c', 'ip link show PortChannel1 | grep master']-----
2023-05-30T14:21:14.7979705Z 
2023-05-30T14:21:14.7979955Z -----
2023-05-30T14:21:14.7980404Z -----rc=1 for cmd ['bash', '-c', 'ip link show Ethernet64 | grep Vnet1000']-----
2023-05-30T14:21:14.7980593Z 
2023-05-30T14:21:14.7980834Z -----
2023-05-30T14:21:14.7981488Z -----rc=1 for cmd ['bash', '-c', 'ip link show Ethernet64 | grep master']-----
2023-05-30T14:21:14.7981664Z 
2023-05-30T14:21:14.7981916Z -----

2023-05-30T14:21:14.8097090Z 	<class 'AssertionError'>
2023-05-30T14:21:14.8100687Z 	Expected field/value pairs not found: expected={'SAI_ROUTER_INTERFACE_ATTR_TYPE': 'SAI_ROUTER_INTERFACE_TYPE_SUB_PORT', 'SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID': '10', 'SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE': 'true', 'SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE': 'true', 'SAI_ROUTER_INTERFACE_ATTR_MTU': '9100', 'SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID': 'oid:0x300000000061e', 'SAI_ROUTER_INTERFACE_ATTR_PORT_ID': 'oid:0x1000000000012'}, received={'SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID': 'oid:0x300000000061e', 'SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS': '02:42:AC:11:00:02', 'SAI_ROUTER_INTERFACE_ATTR_TYPE': 'SAI_ROUTER_INTERFACE_TYPE_SUB_PORT', 'SAI_ROUTER_INTERFACE_ATTR_PORT_ID': 'oid:0x1000000000012', 'SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID': '10', 'SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE': 'false', 'SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE': 'false', 'SAI_ROUTER_INTERFACE_ATTR_MTU': '9100', 'SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID': '0'}, key="oid:0x6000000000627", table="ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE"
2023-05-30T14:21:14.8103306Z 	[<TracebackEntry /agent/_work/1/s/tests/test_sub_port_intf.py:594>, <TracebackEntry /agent/_work/1/s/tests/test_sub_port_intf.py:554>, <TracebackEntry /agent/_work/1/s/tests/test_sub_port_intf.py:427>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_database.py:239>]
2023-05-30T14:21:14.8104006Z test_sub_port_intf_creation failed; it passed 0 out of the required 1 times.

GDB log:

 637│
 638│             return true;
 639│         }
 640│         else
 641│         {
 642│             return false;
 643│         }
 644│     }
 645│
 646│     return true;
 647│ }
 648│
 649│ void IntfsOrch::doTask(Consumer &consumer)
 650│ {
 651├>    SWSS_LOG_ENTER();
 652│
 653│     if (!gPortsOrch->allPortsReady())
 654│     {
 655│         return;
 656│     }
 657│
 658│     string table_name = consumer.getTableName();
 659│
 660│     auto it = consumer.m_toSync.begin();
 661│     while (it != consumer.m_toSync.end())
 662│     {
 663│         KeyOpFieldsValuesTuple t = it->second;
 664│         vector<string> keys = tokenize(kfvKey(t), ':');
 665│         string alias(keys[0]);
/tmp/sonic-swss/orchagent/intfsorch.cpp
Thread 1 "orchagent" hit Breakpoint 1, IntfsOrch::doTask (this=0x563d33824000, consumer=...) at intfsorch.cpp:651
[?2004h[?2004l[?2004h(gdb) p consumer
$3 = (Consumer &) @0x563d3388dbf0: {<ConsumerBase> = {<Executor> = {<swss::Selectable> = {_vptr.Selectable = 0x563d30cad680 <vtable for Consumer+16>, m_priority = 0, m_last_used_time = {__d = {__r = 71835290
42520727}}}, m_selectable = 0x563d33886038, m_orch = 0x563d33824000, m_name = "INTF_TABLE"}, m_toSync = std::multimap with 1 element = {["Ethernet64.10"] = std::tuple containing = {
        [1] = "Ethernet64.10", [2] = "SET",
        [3] = std::vector of length 1, capacity 1 = {{first = "mtu",
            second = "9100"}}}}}, <No data fields>}

 505│             bool attrChanged = false;
 506│             if (mtu && port.m_mtu != mtu)
 507│             {
 508│                 port.m_mtu = mtu;
 509│                 attrChanged = true;
 510│
 511│                 setRouterIntfsMtu(port);
 512│             }
 513│
 514├>            if (port.m_admin_state_up != adminUp)
 515│             {
 516│                 port.m_admin_state_up = adminUp;
 517│                 attrChanged = true;
 518│
 519│                 setRouterIntfsAdminStatus(port);
 520│             }
 521│
 522│             if (attrChanged)
 523│             {
 524│                 gPortsOrch->setPort(alias, port);
 525│             }
 526│         }
 527│     }
 528│
 529│     if (!ip_prefix || m_syncdIntfses[alias].ip_addresses.count(*ip_prefix))
 530│     {
 531│         /* Request to create router interface, no prefix present or Duplicate entry */
 532│         return true;
 533│     }
/tmp/sonic-swss/orchagent/intfsorch.cpp
Thread 1 "orchagent" hit Breakpoint 1, IntfsOrch::setIntf (this=0x557e00828000, alias="Ethernet64.10", vrf_id=844424930133522, ip_prefix=0x0, adminUp=false, mtu=9100, loopbackAction="") at intfsorch.cpp:505
warning: Source file is more recent than executable.
[?2004h[?2004l[?2004h(gdb) p port.m_mtu
$1 = 9100
[?2004h(gdb) p mtu
$2 = 9100
[?2004h(gdb) p port.m_admin_state_up
$4 = true
[?2004h(gdb) p adminUp
$3 = false

The bug is caused by a logic in intfmgr which aims to reduce the redundant admin state configuration:

 464│
 465│ void IntfMgr::updateSubIntfAdminStatus(const string &alias, const string &admin)
 466│ {
 467│     string intf;
 468│     for (auto entry : m_subIntfList)
 469│     {
 470│         intf = entry.first;
 471│         subIntf subIf(intf);
 472│         if (subIf.parentIntf() == alias)
 473│         {
 474│             /*  Avoid duplicate interface admin UP event. */
 475│             string curr_admin = m_subIntfList[intf].currAdminStatus;
 476│             if (curr_admin == "up" && curr_admin == admin)
 477│             {
 478├>                continue;
 479│             }
 480│             std::vector<FieldValueTuple> fvVector;
 481│             string subintf_admin = setHostSubIntfAdminStatus(intf, m_subIntfList[intf].adminStatus, admin);
 482│             m_subIntfList[intf].currAdminStatus = subintf_admin;
 483│             FieldValueTuple fvTuple("admin_status", subintf_admin);
 484│             fvVector.push_back(fvTuple);
 485│             m_appIntfTableProducer.set(intf, fvVector);
 486│         }
 487│     }
 488│ }
 489│
 490│ std::string IntfMgr::setHostSubIntfAdminStatus(const string &alias, const string &admin_status, const string &parent_admin_status)
 491│ {
 492│     stringstream cmd;
/tmp/sonic-swss/cfgmgr/intfmgr.cpp

Because of that, intfsorch falls back to defaults and treats admin state configuration as disabled:

 687│             {
 688│                 //Synced local interface. Skip
 689│                 it = consumer.m_toSync.erase(it);
 690│                 continue;
 691│             }
 692│         }
 693│
 694│         const vector<FieldValueTuple>& data = kfvFieldsValues(t);
 695│         string vrf_name = "", vnet_name = "", nat_zone = "";
 696│         MacAddress mac;
 697│
 698│         uint32_t mtu = 0;
 699│         bool adminUp;
 700├>        bool adminStateChanged = false;
 701│         uint32_t nat_zone_id = 0;
 702│         string proxy_arp = "";
 703│         string inband_type = "";
 704│         bool mpls = false;
 705│         string vlan = "";
 706│         string loopbackAction = "";
 707│
 708│         for (auto idx : data)
 709│         {
 710│             const auto &field = fvField(idx);
 711│             const auto &value = fvValue(idx);
 712│             if (field == "vrf_name")
 713│             {
 714│                 vrf_name = value;
 715│             }
/tmp/sonic-swss/orchagent/intfsorch.cpp

Steps to reproduce the issue:

  1. Get Port Profile Init changes [ppi]: Implement port bulk comparison logic sonic-swss#2564
  2. Build docker vs including swss with disabled optimizations
  3. Run sonic-swss/tests/test_sub_port_intf.py

Describe the results you received:

root@adcf4e1b64f0:/# redis-cli -n 1 HGETALL "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000061b"
 1) "SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID"
 2) "oid:0x3000000000612"
 3) "SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS"
 4) "02:42:AC:11:00:02"
 5) "SAI_ROUTER_INTERFACE_ATTR_TYPE"
 6) "SAI_ROUTER_INTERFACE_TYPE_SUB_PORT"
 7) "SAI_ROUTER_INTERFACE_ATTR_PORT_ID"
 8) "oid:0x1000000000012"
 9) "SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID"
10) "10"
11) "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE"
12) "false"
13) "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE"
14) "false"
15) "SAI_ROUTER_INTERFACE_ATTR_MTU"
16) "9100"
17) "SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID"
18) "0"

Describe the results you expected:

root@adcf4e1b64f0:/# redis-cli -n 1 HGETALL "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000061b"
 1) "SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID"
 2) "oid:0x3000000000612"
 3) "SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS"
 4) "02:42:AC:11:00:02"
 5) "SAI_ROUTER_INTERFACE_ATTR_TYPE"
 6) "SAI_ROUTER_INTERFACE_TYPE_SUB_PORT"
 7) "SAI_ROUTER_INTERFACE_ATTR_PORT_ID"
 8) "oid:0x1000000000012"
 9) "SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID"
10) "10"
11) "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE"
12) "true"
13) "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE"
14) "true"
15) "SAI_ROUTER_INTERFACE_ATTR_MTU"
16) "9100"
17) "SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID"
18) "0"

Output of show version:

  • N/A

Output of show techsupport:

  • N/A

Additional information you deem important (e.g. issue happens only occasionally):

@judyjoseph
Copy link
Contributor

This is good to be understood on why this is seen with optimizations disabled for GCOV. @neethajohn @wendani to take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 MSFT Triaged this issue has been triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants