Significant slow down when executed in quick succession #34

Closed
tncks0121 opened this issue Aug 20, 2017 · 12 comments

@tncks0121

tncks0121 commented Aug 20, 2017

I am reopening issue #29 because I am really concerned about it. On my server, the delay is about 0.4-0.5 s, which is significant in contests where the usual time limit is 1 second.

Before I saw that issue, I had decided to use a different sandbox just for C/C++ (while keeping isolate for the other languages, since they are slow anyway), but it would be better for me to use isolate for all languages.

I'm not sure whether this is a kernel issue. We checked the time of successive isolate executions on two machines on Linode, using this bash script:

sumtime=0
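# run 10 isolate cycles, timing only the --run step; box ids cycle through i mod 100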
for ((i=1; i<=10; i++)); do
   isolate --init --box-id=$(echo $i % 100 | bc) >/dev/null
   /usr/bin/time -f %e -o /tmp/memotime isolate --run -b $(echo $i % 100 | bc) -- /bin/echo 1>/dev/null 2>/dev/null
   isolate --cleanup -b $(echo $i % 100 | bc) >/dev/null
   echo iteration $i : $(cat /tmp/memotime)
   sumtime=$(echo $sumtime + $(cat /tmp/memotime) | bc)
done
echo total time = $sumtime

Machine 1: working badly

root@ubuntu:~# uname -a
Linux ubuntu 4.9.36-x86_64-linode85 #1 SMP Thu Jul 6 15:31:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# bash test.sh
iteration 1 : 0.06
iteration 2 : 0.49
iteration 3 : 0.50
iteration 4 : 0.52
iteration 5 : 0.47
iteration 6 : 0.45
iteration 7 : 0.48
iteration 8 : 0.48
iteration 9 : 0.43
iteration 10 : 0.46
total time = 4.34

Machine 2: working nicely

root@ubuntu:~# uname -a
Linux ubuntu 4.9.36-x86_64-linode85 #1 SMP Thu Jul 6 15:31:23 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# bash test.sh
iteration 1 : 0.01
iteration 2 : 0.04
iteration 3 : 0.06
iteration 4 : 0.04
iteration 5 : 0.06
iteration 6 : 0.06
iteration 7 : 0.04
iteration 8 : 0.07
iteration 9 : 0.06
iteration 10 : 0.06
total time = .50

I also ran the script in a Docker container (on my Mac): working badly

root@8b518e8841a9:~# uname -a
Linux 8b518e8841a9 4.9.36-moby #1 SMP Wed Jul 12 15:29:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@8b518e8841a9:~# bash test/m.sh
iteration 1 : 0.08
iteration 2 : 0.97
iteration 3 : 0.90
iteration 4 : 0.91
iteration 5 : 0.89
iteration 6 : 0.91
iteration 7 : 0.87
iteration 8 : 0.94
iteration 9 : 0.96
iteration 10 : 0.99
total time = 8.42

Maybe the difference is just Linode's fault, but I think it is notable.

@hermanzdosilovic
Contributor

Hey @tncks0121, why do you care about the run time of isolate itself? Isn't only the run time of the user's program important? 😕

@tncks0121
Author

@hermanzdosilovic, yes, that is correct, but suppose you are running a contest where each submission needs at least 0.4 × 100 = 40 seconds of pure sandbox overhead (say, 100 test cases at ~0.4 s each), even when the solution itself is trivially fast. In that situation the queue grows very long very quickly unless you have a lot of workers.
Also, I was planning to use isolate not only for cms but for judging in general (where users can see how far judging has progressed), so speed was important.

@hermanzdosilovic
Contributor

hermanzdosilovic commented Aug 21, 2017 via email

@bblackham
Contributor

Hi! I can't reproduce this on any of my machines, VMs or physical.

Can you try the following:

  • strace the isolate --run command by prefixing it with sudo strace -u $USER -ttTfo /tmp/strace-$(echo $i % 100 | bc) to generate one strace log per invocation, and see whether any particular syscalls in each invocation take the majority of the time (a sketch of the wrapped loop is at the end of this comment).
  • Assuming what is suggested by this comment is true, namely that wait() or exit() is the culprit, the strace will hopefully show that. The next debugging step would be to use oprofile over maybe 100 runs (with kernel symbols enabled) and see if there are any obvious hotspots in the kernel.

Or if you are able to provide a shell to a machine that can reproduce this, I'd be happy to investigate. (It probably requires root to diagnose though).
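
For concreteness, the loop body from the test script above with the strace prefix applied might look roughly like this (a sketch; the $box helper variable and the output path are just illustrative):

for ((i=1; i<=10; i++)); do
   box=$(echo $i % 100 | bc)
   isolate --init --box-id=$box >/dev/null
   # one log per invocation: -tt timestamps, -T per-syscall time, -f follow children, -o output file
   sudo strace -u $USER -ttTfo /tmp/strace-$box isolate --run -b $box -- /bin/echo >/dev/null 2>&1
   isolate --cleanup -b $box >/dev/null
done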

@tncks0121
Author

@bblackham, I did what you suggested, and it seems this is the bottleneck. Unfortunately I don't know how to read these files, so I'm not sure what the problem is:

16020 06:55:34.505601 clone( <unfinished ...>
16037 06:55:34.864662 getpid( <unfinished ...>
16020 06:55:34.864715 <... clone resumed> child_stack=0x7ffe16ee62b0, flags=CLONE_NEWNS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 16037 <0.359109>
16037 06:55:34.864733 <... getpid resumed> ) = 1 <0.000027>

strace-1.txt
strace-2.txt

@bblackham
Contributor

@tncks0121, that clone() definitely looks to be the culprit. No user code is executing between the start of the clone() and its completion. This points at the Linux kernel and something about cloning a task into a new namespace. Are you able to run operf on the affected machines? I'm not certain that operf will work in certain types of VM (I think it requires direct hardware access to the MSRs).

That sort of latency would have to be some kind of network traffic (maybe there is a small amount of buffering, which is why the first run is okay?), a lot of memory zeroing (like, gigabytes, which would be strange), evicting something to swap, or dropping some caches. Perhaps it is triggering a call out to a really slow userspace helper. If operf doesn't provide any information, I don't know how to diagnose this further without being able to reproduce it locally. Can you help me reproduce it locally, or provide a shell to somewhere it is reproducible?
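
If it helps, a rough sketch of profiling many back-to-back runs system-wide with operf might look like this (the --vmlinux path is distro-dependent and only an assumption; drop it if you don't have an uncompressed kernel image with symbols, at the cost of kernel symbol resolution):

# start system-wide profiling in the background (requires root)
operf --system-wide --vmlinux /usr/lib/debug/boot/vmlinux-$(uname -r) &
OPERF_PID=$!
# exercise the slow path ~100 times
for ((i=1; i<=100; i++)); do
   isolate --init --box-id=0 >/dev/null
   isolate --run -b 0 -- /bin/echo >/dev/null 2>&1
   isolate --cleanup -b 0 >/dev/null
done
# stop the profiler and show the hottest symbols
kill -INT $OPERF_PID; wait $OPERF_PID
opreport --symbols | head -n 30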

@tncks0121
Author

@bblackham, I tried, but it doesn't seem to give any useful information. Maybe I did it the wrong way, since I don't know much about operf.

root@ubuntu:~# operf  ./isolate --run -- /bin/echo 1>/dev/null 2>/dev/null
root@ubuntu:~# opreport
Using /root/oprofile_data/samples/ for samples directory.


WARNING! Some of the events were throttled. Throttling occurs when
the initial sample rate is too high, causing an excessive number of
interrupts.  Decrease the sampling frequency. Check the directory
/root/oprofile_data/samples/current/stats/throttled
for the throttled event names.


WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log for details.
CPU: Intel Broadwell microarchitecture, speed 2299.99 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
        7 53.8462 echo
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	        7 100.000 kallsyms
        6 46.1538 isolate
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	        6 100.000 kallsyms

Anyway, I'll try to find a way to reproduce this on a fresh machine (both machines I tested are already in production use).

@SemaiCZE

I think the absolute execution speed depends on the machine; only the first one or two iterations are significantly faster (about 6×). This can be seen on both the "good" and the "bad" machine: the good one starts at 0.01 and then settles around 0.06, while the bad one starts at 0.06 and then settles around 0.48, which is roughly the same relative slowdown in both cases.

My measurement on CentOS VPS:

iteration 1 : 0.00
iteration 2 : 0.00
iteration 3 : 0.06
iteration 4 : 0.06
iteration 5 : 0.06
iteration 6 : 0.05
iteration 7 : 0.07
iteration 8 : 0.06
iteration 9 : 0.06
iteration 10 : 0.06
total time = .48

@bblackham
Contributor

Right, I can reproduce it under Docker here. The killer is creating a separate network namespace. If you pass --share-net to isolate --run, you should see that it runs significantly faster (but you won't have any network isolation). I suspect it is the same kernel issue reported here and here: the netns cleanup operations are batched and very expensive. The patch in the latter link made it into Linux 4.12, so you might see some speed-up with a newer kernel version.
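
In terms of the test script above, that just means adding the flag to the run line, e.g. (a sketch; box id 0 is arbitrary):

box=0
isolate --init --box-id=$box >/dev/null
# --share-net keeps the box in the parent's network namespace,
# avoiding the expensive netns setup/teardown (but giving up network isolation)
isolate --run --share-net -b $box -- /bin/echo >/dev/null 2>&1
isolate --cleanup -b $box >/dev/null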

@stefano-maggiolo

I confirm that --share-net makes the timing difference disappear completely on my laptop.

@bblackham
Contributor

bblackham commented Aug 21, 2017

Thanks for confirming @stefano-maggiolo. Some extra data points:

I never saw it in my earlier VM tests because I was running a single-CPU VM (where the RCU slowdown issues never occur). On a dual-CPU VM, with no iptables modules loaded, there is no issue. But as soon as I run iptables -t nat -L, the various NAT connection-tracking modules get loaded, and these cause a major slowdown. If I rmmod nf_conntrack (after removing the modules that depend on it), the slowdown disappears again.

For @tncks0121 and anyone else affected, try blacklisting nf_conntrack (add blacklist nf_conntrack to some file in /etc/modprobe.d/ and reboot); a sketch is below. That should be all there is to it, assuming your iptables firewall doesn't require the connection-tracking modules. If it's still a problem, please paste the output of cat /proc/modules and uname -r.
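
Something along these lines (the file name under /etc/modprobe.d/ is arbitrary):

# stop nf_conntrack from being auto-loaded; the file name is arbitrary
echo "blacklist nf_conntrack" > /etc/modprobe.d/blacklist-nf_conntrack.conf
reboot
# after the reboot, confirm the module is gone and note the kernel version
grep conntrack /proc/modules
uname -r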

@bblackham
Contributor

Closing this issue as I believe it is definitely a Linux kernel bug and there's nothing isolate (or any sandbox that uses Linux network namespaces for network isolation) can do about it. A potential workaround is given in my previous comment (blacklisting nf_conntrack).

If this workaround solves the issue for you, please confirm here for posterity. Thanks!
