Skip to content

Commit

Permalink
swss: Fixing race condition for rif counters (sonic-net#2488)
Browse files Browse the repository at this point in the history
The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue sonic-net#2193 the cleanup for stale rif counters was added.
But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
So as a fix all such cleanup has been moved to syncd.

- What I did
Fix for sonic-net#11621

As a past fix which aimed at removing stale rif counters (sonic-net#2199) , there is a chance of race condition and it leads to lua script reporting error.
To handle this , the rif counters cleanup code(handled in cleanUpRifFromCounterDb) is now called from syncd ( removeCounter ) to avoid such race condition.

- Why I did it

The operations in Orchagent and syncd is not synchronous, so while Orchagent deletes the rif counters from Counters Db, the syncd could still access it. In race conditions the lua script trying to fetch rif counters will have errors syslog for such access as it was already deleted by orchagent. The cleanup code is removed from orchagent is added in syncd - it will make sure no such race condition would get hit.

- How I verified it

Followed the steps in (sonic-net#11621) :

Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB
Remove RIF
Repeat the steps multiple times and check if any error syslog is seen (No error syslog is seen)
Also checked cleanup for rif counters.

After RIF creation derived info of oid for RIF from "COUNTERS_RIF_NAME_MAP"
127) "Vlan100"
128) "oid:0x6000000000aa5"

Checked all the tabled in COUNTER_DB which has same OID in keys
127.0.0.1:6379[2]> keys 6000000000aa5
1) "RATES:oid:0x6000000000aa5:RIF"
2) "COUNTERS:oid:0x6000000000aa5"
3) "RATES:oid:0x6000000000aa5"
127.0.0.1:6379[2]>

Deleted the RIF by removing the ip on the intf.

Checked COUNTER_DB again with same OID if there are stale entries or not. No stale entries exist now.
127.0.0.1:6379[2]> keys 6000000000aa5
(empty array)
127.0.0.1:6379[2]>

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
  • Loading branch information
sumanbrcm authored Nov 8, 2022
1 parent 724f914 commit ab0e474
Show file tree
Hide file tree
Showing 2 changed files with 0 additions and 39 deletions.
35 changes: 0 additions & 35 deletions orchagent/intfsorch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1277,7 +1277,6 @@ bool IntfsOrch::removeRouterIntfs(Port &port)

const auto id = sai_serialize_object_id(port.m_rif_id);
removeRifFromFlexCounter(id, port.m_alias);
cleanUpRifFromCounterDb(id, port.m_alias);

sai_status_t status = sai_router_intfs_api->remove_router_interface(port.m_rif_id);
if (status != SAI_STATUS_SUCCESS)
Expand Down Expand Up @@ -1500,45 +1499,11 @@ void IntfsOrch::removeRifFromFlexCounter(const string &id, const string &name)
SWSS_LOG_DEBUG("Unregistered interface %s from Flex counter", name.c_str());
}

/*
TODO A race condition can exist when swss removes the counter from COUNTERS DB
and at the same time syncd is inserting a new entry in COUNTERS DB. Therefore
all the rif counters cleanup code should move to syncd
*/
void IntfsOrch::cleanUpRifFromCounterDb(const string &id, const string &name)
{
SWSS_LOG_ENTER();
string counter_key = getRifCounterTableKey(id);
string rate_key = getRifRateTableKey(id);
string rate_init_key = getRifRateInitTableKey(id);
m_counter_db->del(counter_key);
m_counter_db->del(rate_key);
m_counter_db->del(rate_init_key);
SWSS_LOG_NOTICE("CleanUp interface %s oid %s from counter db", name.c_str(),id.c_str());
}

string IntfsOrch::getRifFlexCounterTableKey(string key)
{
return string(RIF_STAT_COUNTER_FLEX_COUNTER_GROUP) + ":" + key;
}

string IntfsOrch::getRifCounterTableKey(string key)
{
return "COUNTERS:" + key;
}

string IntfsOrch::getRifRateTableKey(string key)
{
return "RATES:" + key;
}

string IntfsOrch::getRifRateInitTableKey(string key)
{
return "RATES:" + key + ":RIF";
}



void IntfsOrch::generateInterfaceMap()
{
m_updateMapsTimer->start();
Expand Down
4 changes: 0 additions & 4 deletions orchagent/intfsorch.h
Original file line number Diff line number Diff line change
Expand Up @@ -93,10 +93,6 @@ class IntfsOrch : public Orch
unique_ptr<ProducerTable> m_flexCounterGroupTable;

std::string getRifFlexCounterTableKey(std::string s);
std::string getRifCounterTableKey(std::string s);
std::string getRifRateTableKey(std::string s);
std::string getRifRateInitTableKey(std::string s);
void cleanUpRifFromCounterDb(const string &id, const string &name);

bool addRouterIntfs(sai_object_id_t vrf_id, Port &port, string loopbackAction);
bool removeRouterIntfs(Port &port);
Expand Down

0 comments on commit ab0e474

Please sign in to comment.