Performance Issues with Routinator Containers #42
Comments
Since the performance problems were becoming a real nuisance, I fixed them for now by restarting and reconfiguring the routinator containers. At the moment the load seems to be as expected. I did some further testing by recording pcaps on the routinator and the krill container. The rsync connections between the routinator and the krill containers generate around 7-8 MB (!) of traffic in about one minute.
Hey, thanks for investigating this. Do you know if this traffic between Routinator and Krill always happens, or was that a one-off thing?
Hi Alex,
I am sorry, but I am not able to shed some more light on this. It all started even before sync sessions with the routinator containers were configured by the students on their BGP routers. So the state of the routinator containers was that they were configured with an IP address and a gateway and therefore were able to communicate and sync with the krill host. With that in mind I would say it all happened during regular routinator operations.
Still, thank you for the report. We'll see if we can find anything.
I am using a 40 AS Mini-Internet here, four regions with 10 ASes each, 2 tier-1, 2 stub (fully configured) and 6 tier-2 ASes (managed by the students). So a "classic" setup I would suppose. Find all config files attached:
40_as_config.zip
Currently the whole intra-domain stuff is done, eBGP sessions are configured and running, and business relationships and IXPs are set up too. The connection matrix shows full connectivity; some paths are still invalid due to route leaks caused by mishandled business relationships. The RPKI stuff is not done at the moment.
Now I have run into some serious trouble. I observe heavy and rising load on the virtual machine (VM) running the Mini-Internet. The VM has 16 CPU cores, all of them at between 95 and 100 % load, and the load average is between 55 and 70. Memory consumption is at around 44 GB of 64 GB in total. Here's the current output of htop on this VM:
Some deeper analysis shows that a big part of this heavy load seems to originate from the routinator processes in the 40 ASes (look at the TIME column in the ps output):
The ones with the most CPU time are the ones in the fully configured tier-1 and stub ASes. Looking at one of the affected containers (group 12, tier-1, routinator running on the host at router GRZ) shows the following for ps:
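For anyone who wants to compare the TIME column across many processes, here is a minimal Python sketch that ranks matching processes by cumulative CPU time. It assumes output in the shape of `ps -eo pid,time,comm`; the sample rows are made up for illustration and are not taken from this VM, and the helper names `cpu_seconds` and `rank_by_cpu` are hypothetical.

```python
def cpu_seconds(time_field: str) -> int:
    """Convert a ps TIME value ([DD-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in time_field.split(":")]
    while len(parts) < 3:  # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def rank_by_cpu(ps_output: str, name: str = "routinator"):
    """Return (pid, seconds) pairs for matching commands, busiest first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, time_field, comm = line.split(None, 2)
        if name in comm:
            rows.append((int(pid), cpu_seconds(time_field)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample in the format of `ps -eo pid,time,comm` (not real data).
SAMPLE = """\
  PID     TIME COMMAND
  101 02:10:05 routinator
  202 00:00:07 bash
  303 1-00:00:01 routinator
"""
ranked = rank_by_cpu(SAMPLE)
```

On a real host the text could come from `subprocess.run(["ps", "-eo", "pid,time,comm"], capture_output=True, text=True).stdout` instead of the sample string.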
Running strace on one of the routinator processes on the VM shows that, if I am right, the routinator process is spawning a lot of new processes "doing things". I attach a file with the strace output here:
g18_grz_host_routinator_trace.txt
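To check the impression that many processes are being spawned, one could count the system calls in such a trace. The following is only a rough Python sketch: it assumes the usual strace line layout (an optional `[pid N]` or bare PID prefix followed by `name(`), the sample lines are invented for illustration and are not from the attached file, and `count_syscalls` is a hypothetical helper name.

```python
import re
from collections import Counter

# Optional "[pid 1234] " or "1234 " prefix, then the syscall name and "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(trace_text: str) -> Counter:
    """Count how often each syscall name appears in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:  # signal and exit lines ("--- ...", "+++ ...") are skipped
            counts[match.group(1)] += 1
    return counts


# Invented sample lines in strace style (not from the attached trace).
SAMPLE = """\
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 4242
[pid  4242] execve("/usr/bin/rsync", ["rsync"], 0x7ffc) = 0
[pid  4242] read(3, "", 4096) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 4243
--- SIGCHLD {si_signo=SIGCHLD} ---
"""
counts = count_syscalls(SAMPLE)
# Calls that typically indicate a new process being created or started.
spawn_calls = {name: counts[name] for name in ("clone", "fork", "vfork", "execve")}
```

A high share of `clone` and `execve` calls relative to the rest would support the reading that the process spends its time launching child processes.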
Any ideas what is going wrong here?
Thanks for your help in advance!
Markus