[fpmsyncd] Bug Fix #12625 for Upgrade from 201911 to 202205 is failing #2544
base: master
fpmsyncd crash (#12625)
- What I did: On warm-reboot, ignore the assert if a new DB field is detected in the new build.
- How I did it: Ignore the assert when a new field is detected during warm-reboot.
- How to verify it: Verified by warm-reboot with a new software version that adds a new field to ROUTE_TABLE.
Signed-off-by: nikhil.kelapure@broadcom.com
re test please
retest please
/azpw run Azure.sonic-swss
/azpw run Azure.sonic-swss
/AzurePipelines run Azure.sonic-swss
Azure Pipelines successfully started running 1 pipeline(s).
/azpw run Azure.sonic-swss
/AzurePipelines run Azure.sonic-swss
Azure Pipelines successfully started running 1 pipeline(s).
@vaibhavhd @prsunny please help review the fix.
warmrestart/warmRestartHelper.cpp
Outdated
@@ -280,6 +280,12 @@ bool WarmStartHelper::compareAllFV(const std::vector<FieldValueTuple> &v1,
     for (auto &v2fv : v2)
     {
         auto v1Iter = v1Map.find(v2fv.first);
#if 0
Let's not have dead code checked in.
Removed the dead code.
/*
 * New field added for the refresh entry
 * hence return 'no match'
 */
Ignoring all new attributes sounds dangerous to me. Do we have a way to only ignore extra attributes for the objects that we have analyzed and assessed? @kcudnik any suggestions?
@nkelapur please take a look at this. To be unblocked, can we add a specific skip on the assert condition for the extra attribute (weight) that caused the issue?
I agree that ignoring all the extra attributes can bring us more unwarranted issues. A specific ignore helps us control exactly what is allowed or denied functionally.
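A minimal sketch of the "specific skip" idea suggested above, assuming a field/value pair as a stand-in for `FieldValueTuple`. The allowlist `kAllowedNewFields`, the `CompareResult` enum, and the function name `compareWithAllowlist` are hypothetical and not part of this PR or of swss:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the swss FieldValueTuple (assumed here to be a field/value string pair).
using FieldValueTuple = std::pair<std::string, std::string>;

// Hypothetical allowlist: fields that may legitimately appear only in the refreshed entry.
static const std::set<std::string> kAllowedNewFields = {"weight"};

enum class CompareResult { Same, Different, UnexpectedField };

// Compares the restored entry (v1) against the refreshed entry (v2).
// A field missing from v1 is tolerated only if it is on the allowlist.
CompareResult compareWithAllowlist(const std::vector<FieldValueTuple> &v1,
                                   const std::vector<FieldValueTuple> &v2)
{
    std::map<std::string, std::string> v1Map(v1.begin(), v1.end());
    bool different = false;

    for (const auto &v2fv : v2)
    {
        auto v1Iter = v1Map.find(v2fv.first);
        if (v1Iter == v1Map.end())
        {
            if (kAllowedNewFields.count(v2fv.first) == 0)
            {
                // Unknown extra field: the caller decides whether to assert.
                return CompareResult::UnexpectedField;
            }
            different = true;  // Assessed extra field: entry must be rewritten.
            continue;
        }
        if (v1Iter->second != v2fv.second)
        {
            different = true;
        }
    }
    return different ? CompareResult::Different : CompareResult::Same;
}
```

With this shape, only fields that have been analyzed are let through, at the cost of one allowlist update per new field.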
@vaibhavhd I don't completely agree with this change.
1> this warm-restart helper class is mainly used in fpmsyncd. Not used anywhere else.
2> We are in full control of what is going into app-db, in this case for route entries.
I think the whole point of the assert was the earlier design assumption that the fields before and after warm-restart would be the same.
However, that design no longer applies, since the number of fields differs before and after a warm-upgrade. In this case the general design should be that this API returns "true", so that after warm-reboot reconciliation the new entry gets written to the app-db. This change implements that.
If we check for a particular field name, e.g. "weight" in this case, the code becomes non-modular and difficult to maintain.
Later, when more new fields are added, e.g. next-hop-group, another check has to be added for that field name.
Also, if we plan to use this for something other than route entries, there will be table-specific fields that we might need to skip.
Please note: we are not ignoring all the new attributes. This API returns true if it finds a non-matching attribute, meaning that the entry is different and needs to be set in the app-db at reconciliation. The app-db is then updated with the new entry, including the extra new attribute, and that triggers OA to take further action.
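The behavior described above can be sketched roughly as follows. This is a simplified illustration, not the actual `WarmStartHelper::compareAllFV` code: the `FieldValueTuple` alias and the function name `entryDiffers` are assumptions made for the example.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the swss FieldValueTuple (assumed here to be a field/value string pair).
using FieldValueTuple = std::pair<std::string, std::string>;

// Returns true when the refreshed entry (v2) differs from the restored entry (v1),
// i.e. when the entry should be rewritten to the app-db at reconciliation.
bool entryDiffers(const std::vector<FieldValueTuple> &v1,
                  const std::vector<FieldValueTuple> &v2)
{
    std::unordered_map<std::string, std::string> v1Map(v1.begin(), v1.end());

    for (const auto &v2fv : v2)
    {
        auto v1Iter = v1Map.find(v2fv.first);
        if (v1Iter == v1Map.end())
        {
            // Field exists only in the refreshed entry: report "no match"
            // instead of asserting, so the new entry gets written.
            return true;
        }
        if (v1Iter->second != v2fv.second)
        {
            return true;  // Same field, different value.
        }
    }
    return false;  // Every field in v2 matched v1.
}
```

Under this sketch, a route restored with `nexthop` and `ifname` and refreshed with an additional `weight` field compares as different, so it would be rewritten rather than aborting the process.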
@nkelapur I believe this issue should not be fixed in SONiC. When a new attribute is needed, SAI/SDK should add it during warm recovery; with SONiC adding it through db_migrator, we have a complete loop.
@yxieca Completely agree with this. If db_migrator takes care of adding the new attribute, then we don't need to remove this assert.
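To illustrate why a migration step would make the assert safe again, here is a conceptual sketch in C++ over an in-memory table. It is not the actual db_migrator tool: the `Table` and `Entry` types, the function `addMissingField`, and the default value passed in are all hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory view of a table: entry key -> (field -> value).
using Entry = std::map<std::string, std::string>;
using Table = std::map<std::string, Entry>;

// Adds `field` with `defaultValue` to every entry that lacks it.
// Returns the number of entries that were updated.
std::size_t addMissingField(Table &table,
                            const std::string &field,
                            const std::string &defaultValue)
{
    std::size_t updated = 0;
    for (auto &kv : table)
    {
        // emplace() leaves an existing field untouched and reports whether it inserted.
        if (kv.second.emplace(field, defaultValue).second)
        {
            ++updated;
        }
    }
    return updated;
}
```

Once every restored entry carries the new field, the field sets before and after the upgrade match again, which is the assumption the original assert relies on.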
No, we decided to pursue the db_migrator approach, so we should close this PR. However, we have been unable to reproduce the issue lately. We suspect it might have been caused by the service unmasking issue that was introduced and fixed recently; the timing matches.
Fixes sonic-net/sonic-buildimage#12625