Introduce replaydb feature #398
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
pkg/frr/frr.go (outdated)
@@ -218,6 +322,12 @@ func handlevrf(objectData *eventbus.ObjectData) {
		comp.CompStatus = common.ComponentStatusError
	}
	log.Printf("%+v\n", comp)

	// Checking the timer to decide if we need to replay or not
	if comp.Timer > replayThreshold {
We perform this check for each device type; can we have only one check?
What exactly do you mean by "only one check"? Do you mean creating a common function and calling it from the functions for both device types?
Yes.
I can take a look at that.
Done
I think we can make many improvements here, and we can do them in the experiment branch that I have created, for two reasons:
- The code snippet you posted above is common to all the modules, not just FRR. Changing it touches every component and exceeds the scope of this PR, so let's do it properly on the experiment branch.
- The replayThreshold variable is decided by each module itself, which is why we cannot easily pass it into the comp structure. Doing so needs a cleverer approach, which means more thought and changes that affect other parts of the code outside this scope. So let's do it on the experiment branch.
I have fixed the rest of your comments.
> This code snippet that you have posted above is common in all the modules and not just FRR. So changes in that code will touch all the components and exceeds the scope of this PR so let's do it better on the experiment branch to do it properly
You just added it in this PR; why do you need to rework the rest?
> The replayThreshold variable is decided by each module itself and that is why we cannot pass it to the comp structure that easily.
It is the corresponding handle*() call that creates comp, so couldn't it pass the threshold at construction?
I just don't want us to keep adding debt. If it doesn't require reworking other components, I'd prefer it to be fixed in place.
This change looks pretty straightforward.
> You just added it in this PR, why do you need to rework the rest?
I didn't add this code snippet in this PR; it was already there:
	comp.CompStatus = common.ComponentStatusSuccess
	comp.Timer = 0
} else {
	if comp.Timer == 0 {
		comp.Timer = 2 * time.Second
	} else {
		comp.Timer *= 2
	}
	comp.CompStatus = common.ComponentStatusError
}
Are you talking about this comp here?
opi-evpn-bridge/pkg/frr/frr.go, line 163 in c9baa3e:
	var comp common.Component
This is not the same comp object that we use to call the CheckReplayThreshold function.
The object that we use to call this function is created here:
opi-evpn-bridge/pkg/frr/frr.go, line 199 in c9baa3e:
	comp = svi.Status.Components[i]
and it is a different object. So even if we add the replay threshold to the object on line 163, it will not be there on line 199, because we call the function with a different object.
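Assuming `comp` is a value of type `common.Component` rather than a pointer, the behaviour described above follows from Go's value semantics: assigning a slice element to a struct variable copies the whole struct, replacing any field set earlier, and later changes to the copy do not reach the slice. The trimmed-down `Component` type and its `ReplayThreshold` field below are hypothetical.

```go
package main

import "fmt"

// Component is a hypothetical, trimmed-down stand-in for common.Component.
type Component struct {
	Name            string
	ReplayThreshold int // seconds; hypothetical field
}

func main() {
	components := []Component{{Name: "frr"}}

	var comp Component
	comp.ReplayThreshold = 64 // set early, analogous to line 163

	// Assigning a slice element copies the whole struct into comp, so the
	// threshold set above is overwritten, analogous to line 199.
	comp = components[0]
	fmt.Println(comp.ReplayThreshold) // 0

	// Modifying the copy does not change the element stored in the slice.
	comp.ReplayThreshold = 64
	fmt.Println(components[0].ReplayThreshold) // 0
}
```

For a threshold to survive this path, it would have to be stored on the slice elements themselves (or the check would need to receive it as an argument).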
That's why I am saying that we need to find a better, smarter way to handle this in the refactoring. Also, the modules in general have duplicate code that could be written better, so let's handle that in the refactoring as well. WDYT?
Comments addressed.
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
7a53660 to 71c21eb
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
4b50963 to 68d99a3
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
@artek-koltun I have addressed your comments. Can you please review again?
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
73dae37 to 5c16199
Signed-off-by: Dimitrios Markou <dimitrios.markou@ericsson.com>
The debt is growing
To move evpn-gw-bridge onto a more resilient path, we have implemented the replaydb feature.
When a module (e.g. the FRR module) encounters a permanent error, it stops retrying and the replay procedure kicks in. The replay procedure cleans up all the system configuration (e.g. in the FRR daemon) related to the failed module and replays the relevant objects from the database to bring the system configuration back on track.
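The replay flow described above can be sketched as: clean up the failed module's configuration first, then reapply each stored object in order. `Object`, `Module`, `replay` and the `fakeFRR` test double are all hypothetical names for illustration, not the PR's actual types; stopping at the first failed object is an assumed design choice, and the real procedure may order, filter or retry objects differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical record stored in the database (e.g. a VRF or SVI).
type Object struct {
	Kind string
	Name string
}

// Module is a hypothetical interface for a module that supports replay.
type Module interface {
	// CleanUp wipes all system configuration related to the module.
	CleanUp() error
	// Apply reapplies one stored object.
	Apply(obj Object) error
}

// replay cleans up the failed module's configuration and then reapplies
// every relevant object loaded from the database, stopping at the first error.
func replay(m Module, db []Object) error {
	if err := m.CleanUp(); err != nil {
		return fmt.Errorf("cleanup failed: %w", err)
	}
	for _, obj := range db {
		if err := m.Apply(obj); err != nil {
			return fmt.Errorf("replay of %s %s failed: %w", obj.Kind, obj.Name, err)
		}
	}
	return nil
}

// fakeFRR records what was applied, standing in for the FRR daemon.
type fakeFRR struct {
	applied []string
	cleaned bool
}

func (f *fakeFRR) CleanUp() error {
	f.cleaned = true
	f.applied = nil // drop stale configuration
	return nil
}

func (f *fakeFRR) Apply(obj Object) error {
	if obj.Name == "" {
		return errors.New("empty name")
	}
	f.applied = append(f.applied, obj.Kind+"/"+obj.Name)
	return nil
}

func main() {
	frr := &fakeFRR{applied: []string{"stale/config"}}
	db := []Object{{Kind: "vrf", Name: "blue"}, {Kind: "svi", Name: "blue-vlan10"}}

	if err := replay(frr, db); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(frr.cleaned, frr.applied) // true [vrf/blue svi/blue-vlan10]
}
```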
Currently, the replay feature is supported only for the FRR module and the FRR daemon. Support for all modules will be added in the future.