"Routing Warm Reboot with Dynamic Reconciliation Signal" document #313

Open
wants to merge 2 commits into
base: master

heidinet2007
Contributor

This document briefly describes how to add a dynamic EOIU signal that indicates the start of fpmsyncd-to-AppDB reconciliation.

@msftclas

msftclas commented Dec 18, 2018

CLA assistant check
All CLA requirements met.

Contributor

@jipanyang jipanyang left a comment


Could you add a test plan as well?

@jipanyang
Contributor

While walking through the FRR code with the changes proposed in this design document, I noticed that:

  1. The EOIU mark from BGP is only generated when bgp update-delay is configured. This is a prerequisite, but it is missing from the design.
  2. All the proposed changes are embedded in the core route-processing logic of bgp/zebra, and quite a few special cases are needed here and there. The risk is relatively high, and staying compatible with FRR upstream will be a big issue.

@jipanyang
Contributor

I would recommend a different approach to optimizing the timing of route reconciliation in fpmsyncd on warm-reboot/warm-restart.

  1. Similar to restore_neighbors.py for neighsyncd, start an eoiu_reconciliation.py script in the bgp docker.

  2. The script checks BGP neighbor state through the CLI periodically (every 1 second).
    It looks for an explicit EOR or an implicit EOR (a keepalive received after the session is established) in the JSON output of show ip bgp neighbors A.B.C.D json.

  3. Once the script has collected all the needed EORs, it sets an EOIU flag in stateDB.

  4. fpmsyncd could hold for a few seconds (2 to 5) after getting the flag before starting route reconciliation.
    2 to 5 seconds should be enough for all routes to be synced from bgp to fpmsyncd. If not, the system is probably already in a bad state.

  5. If for any reason the script fails to set the EOIU flag in stateDB, the existing warm_restart bgp_timer will kick in later.

This approach may add a few seconds of delay compared with the FRR-embedded EOIU solution, but it is simpler and less risky. In most cases it also shortens the reconciliation delay by an order of magnitude compared with the fixed warm_restart bgp_timer.
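
To make the proposal concrete, here is a rough sketch of what the polling loop in eoiu_reconciliation.py might look like. It is only an illustration under stated assumptions: the JSON field names (`bgpState`, `gracefulRestartInfo.endOfRibRecv`, `messageStats.keepalivesRecv`) are my reading of the neighbor output and should be checked against the FRR version in use, and `fetch` and `set_flag` are hypothetical callables standing in for the CLI query and the stateDB write. The 120-second default timeout is also just a placeholder.

```python
import time
from typing import Callable, Dict


def neighbor_has_eor(info: dict) -> bool:
    """Return True if one neighbor shows an explicit or implicit EOR.

    Field names are assumptions about the neighbor JSON output.
    """
    if info.get("bgpState") != "Established":
        return False
    # Explicit EOR: assumed to be reported per address family as booleans.
    eor_recv = info.get("gracefulRestartInfo", {}).get("endOfRibRecv", {})
    if any(eor_recv.values()):
        return True
    # Implicit EOR: approximated here as any keepalive counted while the
    # session is Established; a stricter check may be needed in practice.
    return info.get("messageStats", {}).get("keepalivesRecv", 0) > 0


def all_neighbors_converged(neighbors: Dict[str, dict]) -> bool:
    """True only when at least one neighbor exists and all have sent EOR."""
    return bool(neighbors) and all(
        neighbor_has_eor(n) for n in neighbors.values()
    )


def wait_for_eoiu(
    fetch: Callable[[], Dict[str, dict]],  # hypothetical: returns neighbor JSON
    set_flag: Callable[[], None],          # hypothetical: writes EOIU to stateDB
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every neighbor has sent EOR, then set the EOIU flag.

    Returns False on timeout without setting the flag, so the existing
    warm_restart bgp_timer remains the fallback.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if all_neighbors_converged(fetch()):
            set_flag()
            return True
        sleep(interval)
    return False
```

Keeping the CLI call and the stateDB write behind injected callables means the EOR detection can be unit-tested without a running bgp docker. The 2 to 5 second hold in step 4 would stay in fpmsyncd and is not part of this script.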

@yxieca yxieca force-pushed the master branch 2 times, most recently from 8498931 to 8837dc2 Compare April 15, 2022 16:51
3 participants