Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle FDB meta data together with FDB data in fdborch to avoid race … #441

Closed
wants to merge 1 commit into from

Conversation

jipanyang
Copy link
Contributor

…condition

Signed-off-by: Jipan Yang jipan.yang@alibaba-inc.com

Move FDB meta data data processing to fdborch. This a first step try to solve issue #440

…condition

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
}
}

void handle_meta_fdb_event(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where this is called ?

@kcudnik
Copy link
Collaborator

kcudnik commented Apr 23, 2019

can you give scenario where you get race condition ?

@lguohan
Copy link
Contributor

lguohan commented Apr 24, 2019

issue #440

@kcudnik
Copy link
Collaborator

kcudnik commented Apr 24, 2019

you moved some functionality to handle_meta_fdb_event, but this function is not called anywhere, basically you remove meta_ call which is required, and issues in OA are caused because OA is listening on redis notifications for fdb, and not handling them in SAI FDB callback ! https://github.com/Azure/sonic-swss/blob/master/orchagent/notifications.cpp#L8

which is totally wrong !, "concurency" in the DB can be solved other way! OA should use SAI callback to handle fdb notifications, this is only valid way to handle them

@lguohan
Copy link
Contributor

lguohan commented Jun 4, 2019

@jipanyang , can you check the comments here?

@jipanyang
Copy link
Contributor Author

handle_meta_fdb_event() is used in sonic-net/sonic-swss#824

@jipanyang
Copy link
Contributor Author

you moved some functionality to handle_meta_fdb_event, but this function is not called anywhere, basically you remove meta_ call which is required, and issues in OA are caused because OA is listening on redis notifications for fdb, and not handling them in SAI FDB callback ! https://github.com/Azure/sonic-swss/blob/master/orchagent/notifications.cpp#L8

which is totally wrong !, "concurency" in the DB can be solved other way! OA should use SAI callback to handle fdb notifications, this is only valid way to handle them

@kcudnik Not sure if I got your point. The current ntf_thread() is also a thread in OA, both fdborch in the main orchagent thread and ntf_thread operate on fdb meta data, there is race condition between them.

@kcudnik
Copy link
Collaborator

kcudnik commented Jun 4, 2019

there always will be race condition on 2 indepentent processes. interprocess synchronization could be used here.
fdb notification should be handled by using SAI notification callback, not by .listening to db directly.

you can use lock on ntf_callback from SAI when it will be used, but this does not guarantee anything, since more notifications can be in the redis queue waiting to be consumed by OA
because notifications are asynchronous the same issue also exist's when you would use vendor SAI directly instead of sairedis SAI. in vendor case, they jsut don't care about any reference count, bhats why we need, and this lead to finding a log of vendor bugs in the past, and prevent users writing OA to make simple mistakes like removing objects which is still in use

and again, you moved some functionality and not using using which opens possible race condition

@jipanyang
Copy link
Contributor Author

@kcudnik I think you are talking about the notification callback in syncd process. We are not touching that part.

@kcudnik
Copy link
Collaborator

kcudnik commented Jun 5, 2019

no, im talking about notification callback in OA, take a look here:
https://github.com/Azure/sonic-swss/blob/master/orchagent/notifications.cpp#L8

i already pointed this in my previous comment

@wendani
Copy link
Contributor

wendani commented Jun 5, 2019

retest this please

Copy link
Collaborator

@kcudnik kcudnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address commments

@jipanyang
Copy link
Contributor Author

no, im talking about notification callback in OA, take a look here:
https://github.com/Azure/sonic-swss/blob/master/orchagent/notifications.cpp#L8

i already pointed this in my previous comment

The comment is correct. While the change in meta_fdb_event_snoop_oid() simply conflicts with what is stated in the comment, in orchangent, now you have two threads handling the FDB related meta data reference. Please check #440 .

fdb is asynchronous. Having them processed in one thread in orchagent doesn't prevent possible pending notification in syncd or the db channel, but that should not be the excuse for introducing further race condition at OA module.

@kcudnik
Copy link
Collaborator

kcudnik commented Jun 6, 2019

i already mentioned the same issue, it does not matter whether those pending notifications are in syncd or in internal vendors sai, always is issue of having them come ant any time.
please fix OA using correct notifications procedure, and please don't break existing stuff since it will create more issues

@kcudnik
Copy link
Collaborator

kcudnik commented Jul 6, 2021

closing this as obsolete, code drifted to cpp objects instead of global functions, if there is need to still address this, please reopen

@kcudnik kcudnik closed this Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants