Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FDB] Fix fbdorch to properly handle syncd FDB FLUSH Notif #2254

Merged
merged 26 commits into from
May 18, 2022

Conversation

vivekrnv
Copy link
Contributor

@vivekrnv vivekrnv commented May 4, 2022

What I did

Fixes sonic-net/sonic-buildimage#10605

Why I did it

  1. Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
  2. This inability of handling APPL_DB notif is affecting warm-restart
  3. FDB Entry from State DB is not removed.
  4. OA doesn't have the logic to handle consolidate flush notif coming from syncd
  5. FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification

How I verified it

Checked this config and verified the remove vlan passed though

1. config vlan add 40
2. config vlan member add 40 Ethernet0
On the other side. 
3. sudo ip link add link enp4s0f0 name enp4s0f0.40 type vlan id 40
4. sudo ip link set enp4s0f0.40 up
5. Make sure a fdb entry is learnt. 

root@sonic:/home/admin# fdbshow
  No.    Vlan  MacAddress         Port       Type
-----  ------  -----------------  ---------  -------
    1      40  7C:FE:90:12:22:EC  Ethernet0  Dynamic

dump state fdb all --db STATE_DB
{
    "Vlan40:7c:fe:90:12:22:ec": {
        "STATE_DB": {
            "keys": [
                {
                    "FDB_TABLE|Vlan40:7c:fe:90:12:22:ec": {
                        "port": "Ethernet0",
                        "type": "dynamic"
                    }
                }
            ],
            "tables_not_found": []
        }
    }
}

7. config vlan member del 40 Ethernet0
8. config vlan del 40
9. dump state fdb all --db STATE_DB
{}

Unit Tests:

vkarri@4897dcba242c:/sonic/src/sonic-swss/tests/mock_tests$ ./tests --gtest_filter=FdbOrchTest*
Running main() from /usr/src/gtest/src/gtest_main.cc
Note: Google Test filter = FdbOrchTest*
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from FdbOrchTest
[ RUN      ] FdbOrchTest.ConsolidatedFlushVlanandPort
[       OK ] FdbOrchTest.ConsolidatedFlushVlanandPort (14 ms)
[ RUN      ] FdbOrchTest.ConsolidatedFlushAll
[       OK ] FdbOrchTest.ConsolidatedFlushAll (15 ms)
[ RUN      ] FdbOrchTest.ConsolidatedFlushVlan
[       OK ] FdbOrchTest.ConsolidatedFlushVlan (14 ms)
[ RUN      ] FdbOrchTest.ConsolidatedFlushPort
[       OK ] FdbOrchTest.ConsolidatedFlushPort (15 ms)
[ RUN      ] FdbOrchTest.ConsolidatedFlushVlanandPortBridgeportDeleted
[       OK ] FdbOrchTest.ConsolidatedFlushVlanandPortBridgeportDeleted (14 ms)
[ RUN      ] FdbOrchTest.NonConsolidatedFlushVlanandPort
[       OK ] FdbOrchTest.NonConsolidatedFlushVlanandPort (15 ms)
[----------] 6 tests from FdbOrchTest (87 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test case ran. (87 ms total)
[  PASSED  ] 6 tests.

Details if related

vivekrnv and others added 15 commits April 19, 2022 21:30
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv requested a review from prsunny as a code owner May 4, 2022 08:39
@liat-grozovik
Copy link
Collaborator

@vivekreddynv on which branch this fix should go? master only?

dgsudharsan
dgsudharsan previously approved these changes May 4, 2022
@vivekrnv
Copy link
Contributor Author

vivekrnv commented May 7, 2022

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

dgsudharsan
dgsudharsan previously approved these changes May 7, 2022
@vivekrnv
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).


/*
Handles the SAI_FDB_EVENT_FLUSHED notification recieved from syncd
Ref: https://github.com/opencomputeproject/SAI/blob/master/inc/saifdb.h#L223
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pleaser remove reference to code. This will not be useful when the header file gets changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled

if (entry->bv_id &&
!m_portsOrch->getPort(entry->bv_id, vlan))
{
SWSS_LOG_ERROR("FdbOrch notification type %d: Failed to locate vlan port from bv_id 0x%" PRIx64, type, entry->bv_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a NOTICE since the flow is in notification. The sonic-mgmt tests is now strict on what scenarios the ERR logs are being generated. There are currently some cases in fdb notif flow which is generating ERR which we are planning to revisit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled

@@ -493,86 +594,21 @@ void FdbOrch::update(sai_fdb_event_t type,
}
case SAI_FDB_EVENT_FLUSHED:

SWSS_LOG_INFO("FDB Flush event received: [ %s , 0x%" PRIx64 " ], \
bridge port ID: 0x%" PRIx64 ".",
SWSS_LOG_NOTICE("FDB Flush event received: [ %s , 0x%" PRIx64 " ], \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change it to INFO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled

const string& fdb_type,
const MacAddress& mac)
{
MacAddress flush_mac(CONSOLIDATED_FLUSH_MAC);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default assignment should be 0. I don't think we should define a new variable. You can give a comment that flush event shall have zero mac

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled

@@ -973,17 +1009,22 @@ void FdbOrch::doTask(NotificationConsumer& consumer)
for (uint32_t i = 0; i < count; ++i)
{
sai_object_id_t oid = SAI_NULL_OBJECT_ID;
sai_int32_t fdb_entry_type = -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest assign to 0.

for (auto itr = m_entries.begin(); itr != m_entries.end();)
{
auto curr = itr++;
if (curr->second.type.find(fdb_type) != std::string::npos)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is redundant. afaik, the fdb_type is always populated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These entries might be static as well, so i think it's required to check the fdb_type before clearing them

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you are not checking for static vs dynamic here. my point is, one or other is always true and never be npos, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"find" returns the index of first occurence, so this logic holds true, npos indicates a mismatch. I've used specifically thus because the type stored in internal cache can be of static, dynamic or dynamic_local. whereas the type received from the flush will either be static or dynamic.

Thus i've used this instead of a simple equality check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be i'm missing something. what do we miss with the following?

            if (curr->second.type.find(fdb_type) != std::string::npos)
            {
                if (curr->first.mac == mac || mac == flush_mac)
                {
                    clearFdbEntry(curr->first.mac, curr->first.bv_id, fdb_type, curr->first.port_name);
                }
            }

to


                if (curr->first.mac == mac || mac == flush_mac)
                {
                    clearFdbEntry(curr->first.mac, curr->first.bv_id, fdb_type, curr->first.port_name);
                }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. In the flush, can user specify fdb_type? If not, is it correct that flush notification would be for all types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link User can specify the flush type. Looks like there can also be a SAI_FDB_FLUSH_ENTRY_TYPE_ALL along with SAI_FDB_FLUSH_ENTRY_TYPE_STATIC/SAI_FDB_FLUSH_ENTRY_TYPE_DYNAMIC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update to handle the SAI_FDB_FLUSH_ENTRY_TYPE_ALL case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry, the flush notif will have either STATIC or DYNAMIC, so "all" case need not be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Updated, remove reading the SAI_FDB_ENTRY_ATTR_TYPE attribute and cleared only non-static entries

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
vivekrnv and others added 2 commits May 16, 2022 20:07
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

dgsudharsan
dgsudharsan previously approved these changes May 17, 2022
vivekrnv added 2 commits May 18, 2022 01:01
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@prsunny prsunny merged commit bbbd5f4 into sonic-net:master May 18, 2022
@judyjoseph
Copy link
Contributor

@vivekreddynv Please raise a separate PR for 202111 as there are conflicts

vivekrnv added a commit to vivekrnv/sonic-swss that referenced this pull request Jul 29, 2022
…#2254)

Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
…#2254)

Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
prsunny pushed a commit that referenced this pull request Aug 16, 2022
…2401)

* [FDB] Fix fbdorch to properly handle syncd FDB FLUSH Notif (#2254)

Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
vivekrnv added a commit to vivekrnv/sonic-swss that referenced this pull request Aug 30, 2022
…onic-net#2401)

* [FDB] Fix fbdorch to properly handle syncd FDB FLUSH Notif (sonic-net#2254)

Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FDB] [OA] [Warm-reboot] Cannot remove vlan since FDB Flush per port per vlan doesn't clear FDB entries
6 participants