Performance Issues with Routinator Containers #42

Open

mark-hgb opened this issue May 13, 2024 · 6 comments

@mark-hgb

I am running a 40-AS Mini-Internet here: four regions with 10 ASes each, consisting of 2 tier-1 and 2 stub ASes (fully configured) plus 6 tier-2 ASes (managed by the students). So a "classic" setup, I would suppose. All config files are attached:

40_as_config.zip

Currently the whole intra-domain part is done, the eBGP sessions are configured and running, and the business relationships and IXPs are set up too. The connection matrix shows full connectivity; some paths are still invalid due to route leaks caused by mishandled business relationships. The RPKI part has not been done yet.

Now I have run into some serious trouble. I am observing heavy and rising load on the virtual machine (VM) running the Mini-Internet. The VM has 16 CPU cores, all of them are at between 95 and 100 % load, the load average is between 55 and 70, and memory consumption is at around 44 GB of 64 GB in total. Here is the current output of htop on this VM:

(screenshot: minet_load)
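For anyone who wants to reproduce this measurement, here is a minimal sketch that snapshots per-container CPU usage and prints the heaviest containers. It assumes the Mini-Internet containers are managed by Docker and that the Docker CLI is available on the VM; the cut-off of ten containers is arbitrary.

```python
#!/usr/bin/env python3
"""Snapshot per-container CPU usage and print the heaviest containers.

A minimal sketch, assuming the containers run under Docker and the
Docker CLI is available on the VM.
"""
import subprocess

def top_containers(n=10):
    # One-shot snapshot; --no-stream avoids the interactive refresh loop.
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.CPUPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        name, cpu = line.rsplit(maxsplit=1)
        rows.append((float(cpu.rstrip("%")), name))
    for cpu, name in sorted(rows, reverse=True)[:n]:
        print(f"{cpu:6.1f}%  {name}")

if __name__ == "__main__":
    top_containers()
```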

Some deeper analysis shows that a big part of this heavy load seems to originate from the routinator processes in the 40 ASes (look at the TIME column in the ps output):

(screenshot: routinator_load)
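To quantify this without screenshots, the following sketch sums up the accumulated CPU time per routinator process on the VM. It only assumes that the string "routinator" appears in the command line of the relevant processes, as in the ps output above.

```python
#!/usr/bin/env python3
"""List routinator processes on the VM, sorted by accumulated CPU time."""
import subprocess

def cputime_to_seconds(value):
    # ps prints TIME as [[dd-]hh:]mm:ss
    days, _, rest = value.partition("-") if "-" in value else ("0", "", value)
    parts = [int(p) for p in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return int(days) * 86400 + h * 3600 + m * 60 + s

out = subprocess.run(
    ["ps", "-eo", "pid,time,args"],
    capture_output=True, text=True, check=True,
).stdout
rows = []
for line in out.splitlines()[1:]:          # skip the header line
    pid, cputime, args = line.split(maxsplit=2)
    if "routinator" in args:
        rows.append((cputime_to_seconds(cputime), pid, args))
for secs, pid, args in sorted(rows, reverse=True):
    print(f"{secs:>8}s  pid={pid}  {args[:80]}")
```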

The processes with the most CPU time are the ones in the fully configured tier-1 and stub ASes. Looking at one of the affected containers (group 12, tier-1, routinator running on the host at router GRZ), ps shows the following:

(screenshot: g12_routinator)

Using strace on one of the routinator processes on the VM shows that, if I read it correctly, the routinator process is spawning a lot of new child processes "doing things". I have attached a file with the strace output:

g18_grz_host_routinator_trace.txt
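For what it is worth, Routinator launches the system rsync binary as a child process to fetch rsync-based repositories, so many short-lived children in the trace would be consistent with that. Below is a small sketch that summarises the attached trace by counting process-creation syscalls and tallying which binaries get exec'ed; the file name is simply the attachment from this comment.

```python
#!/usr/bin/env python3
"""Count process-creation syscalls in an strace log and tally exec'ed binaries."""
import re
from collections import Counter

TRACE = "g18_grz_host_routinator_trace.txt"   # attachment from this issue

spawn_calls = Counter()
execed = Counter()
with open(TRACE) as fh:
    for line in fh:
        m = re.search(r"\b(clone|clone3|fork|vfork|execve)\(", line)
        if not m:
            continue
        spawn_calls[m.group(1)] += 1
        if m.group(1) == "execve":
            target = re.search(r'execve\("([^"]+)"', line)
            if target:
                execed[target.group(1)] += 1

print("process-creation syscalls:", dict(spawn_calls))
print("most frequently exec'ed binaries:")
for path, count in execed.most_common(10):
    print(f"  {count:5d}  {path}")
```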

Any ideas what is going wrong here?

Thanks for your help in advance!
Markus

@mark-hgb
Author

Since the performance problems were becoming pressing, I fixed them for now by restarting and reconfiguring the routinator containers. At the moment the load is back to the expected level.

I did some further testing by recording pcaps on the routinator and krill containers. The rsync connections between the routinator and the krill containers generate around 7-8 MB (!) of traffic in about one minute.
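In case someone wants to compare numbers, here is a minimal sketch that sums the rsync bytes in such a capture using scapy. The TCP port 873 filter (the default rsync daemon port) and the pcap file name are assumptions on my side; adjust them to match the actual capture.

```python
#!/usr/bin/env python3
"""Sum up rsync traffic in a pcap recorded on the routinator container."""
from scapy.all import rdpcap, TCP

RSYNC_PORT = 873                      # default rsync daemon port (assumed)
PCAP = "routinator_capture.pcap"      # hypothetical file name

total = 0
packets = rdpcap(PCAP)
for pkt in packets:
    if pkt.haslayer(TCP) and RSYNC_PORT in (pkt[TCP].sport, pkt[TCP].dport):
        total += len(pkt)

duration = float(packets[-1].time - packets[0].time) if len(packets) > 1 else 0.0
print(f"rsync bytes: {total} ({total / 2**20:.1f} MiB) over {duration:.0f}s")
```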

@NotSpecial
Contributor

Hey, thanks for investigating this.

Do you know if this traffic between Routinator and Krill always happens, or was that a one-off thing?

@mark-hgb
Author

Hi Alex,
thanks for getting back! It happened in mid-May and I solved it by restarting all the routinator containers (as mentioned above). It happened again around the start of June but then disappeared without me doing anything. And it is happening again at the moment. How can I assist in diagnosing the root cause?
Best regards
Markus

@NotSpecial
Contributor

Unfortunately I am not very familiar with routinator. @KTrel, do you have any idea what could be causing these overheads?

@mark-hgb, do you have any idea whether some specific update commands or anything else might be prompting the overheads, or whether it's just regular routinator operations?

@mark-hgb
Author

I am sorry, but I am not able to shed any more light on this. It all started even before the students had configured sync sessions with the routinator containers on their BGP routers. So at that point the routinator containers were only configured with an IP address and a gateway, and were therefore able to communicate and sync with the krill host. With that in mind, I would say it all happened during regular routinator operations.

@NotSpecial
Contributor

Still, thank you for the report. We'll see if we can find anything.
