This repository has been archived by the owner on May 12, 2021. It is now read-only.

make -j fails on arm64 with clock skew error #1293

Closed
umarcor opened this issue Feb 28, 2019 · 8 comments

Comments

@umarcor

umarcor commented Feb 28, 2019

Coming from #1245:

[@umarcor]
BTW, do you think that the clock skew issue mentioned there is worth a separate issue here?

[@Pennyzct]
Hi @umarcor, could you please share specific details about the clock skew bug? ;) We once encountered a related issue in the early days; see the discussion here.

[@umarcor]
The issue is related to building either ghdl/ghdl or DynamoRIO/dynamorio by running either make -j or make -j$(nproc). Lots of messages as the following are shown and the build fails:

  • Clock skew detected. Your build may be incomplete.
  • Warning: File <file> has modification time 0.008 s in the future

If I execute make instead, or if I limit nproc through --cpuset-cpus so that a single core is used, builds are successful. So this does not seem to be a problem with time between the container and the host, but between multiple tasks being executed on different cores (all of them in the container/VM).
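
For reference, make prints these warnings when a file's modification time is later than the time it reads from the clock. A minimal sketch for listing such files inside the container, assuming mktemp and find are available; the function name find_future_files is hypothetical and not part of any tool mentioned in this thread:

```shell
# List regular files under a directory whose modification time is later
# than "now", using a freshly created temp file as the time reference.
find_future_files() {
    dir=${1:-.}
    ref=$(mktemp) || return 1        # stamped with the current time
    find "$dir" -type f -newer "$ref"
    rm -f "$ref"
}
```

Running find_future_files . in the build directory right after a failed build would show which outputs carry a timestamp ahead of the clock. The reference file lives in the default temp directory, so the comparison assumes that location is stamped by the same clock the build sees.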

[@Pennyzct]
The flag --cpuset-cpus isn't applied for kata containers on arm64, since the underlying technique is based on cpu hotplug via acpi, which isn't applicable on arm.
And based on default qemu flag, -smp 1,cores=1,threads=1,sockets=1,maxcpus=123, we only provide one vcpu for container/VM on arm. so if you are running make -j$(nproc) in kata container/VM, that's quite the same thing as make -j1 .
when you switch to runc, does above errors still occur?



Hi~@umarcor so sorry for the delay.

I'm so sorry now. I was sidetracked for a few days.

The flag --cpuset-cpus isn't applied for kata containers on arm64, since the underlying technique is based on cpu hotplug via acpi, which isn't applicable on arm.

Fortunately (for me), this seems not to be the case here. See:

~$ docker run --rm -it  --runtime=runc alpine nproc
8
~$ docker run --rm -it  --runtime=runc --cpuset-cpus=0 alpine nproc
1
~$ docker run --rm -it  --runtime=runc --cpuset-cpus=0-3 alpine nproc
4
~$ docker run --rm -it  --runtime=kata-runtime alpine nproc
8
~$ docker run --rm -it  --runtime=kata-runtime --cpuset-cpus=0 alpine nproc
1
~$ docker run --rm -it  --runtime=kata-runtime --cpuset-cpus=0-3 alpine nproc
4
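
The six checks above can be scripted as a loop. This is only a sketch: the function name nproc_matrix and the DOCKER override variable are hypothetical, and -it is dropped because no terminal is needed when the output is only printed.

```shell
# Print the CPU count seen inside a container for each runtime/cpuset pair.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands
# instead of running them.
nproc_matrix() {
    for runtime in runc kata-runtime; do
        for cpus in "" 0 0-3; do
            printf '%s cpuset=%s: ' "$runtime" "${cpus:-all}"
            # ${cpus:+...} adds the flag only when cpus is non-empty
            ${DOCKER:-docker} run --rm --runtime="$runtime" \
                ${cpus:+--cpuset-cpus="$cpus"} alpine nproc
        done
    done
}
```

Calling DOCKER=echo nproc_matrix prints the six command lines without starting any container.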

And based on default qemu flag, -smp 1,cores=1,threads=1,sockets=1,maxcpus=123, we only provide one vcpu for container/VM on arm. so if you are running make -j$(nproc) in kata container/VM, that's quite the same thing as make -j1 .

As shown above, it does make a difference. With make, make -j1 or make + --cpuset-cpus=x (where x is a single integer between 0 and 7), the build is successful. With make -jx (where x>1) or make + --cpuset-cpus=x (where x is a comma separated list of integers between 0 and 7, or a range in the same range; e.g. 4-6), the build fails.

when you switch to runc, does above errors still occur?

No, they do not. But trying to reproduce it for you is driving me mad. Initially, I stumbled upon this issue while trying to docker build on a machine where I didn't know that kata-runtime was the default. This case fails consistently. See log: kata.log (search for 'skew' and 'future'). Then I changed default-runtime in /etc/docker/daemon.json to runc, restarted docker (systemctl restart docker) and ran docker build again. This is consistently successful.

However, I tried to get the same result with docker run, and I could not. The build is successful with either docker run --runtime=runc or docker run --runtime=kata-runtime, independently of which is the default in /etc/docker/daemon.json. Maybe you have already fixed something for docker run but it is not being used in docker build?
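
Since the docker build result depends on the daemon's default runtime, it helps to confirm which one is configured before each attempt. A minimal sketch, assuming the default-runtime key sits on a single line of the config file; the function name default_runtime is hypothetical, and the sed expression is a line-based match rather than a JSON parser:

```shell
# Print the "default-runtime" value from a Docker daemon config file.
# Prints nothing if the key is absent.
default_runtime() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        "${1:-/etc/docker/daemon.json}"
}
```

On a running daemon, docker info --format '{{.DefaultRuntime}}' should report the value that is actually in effect after the restart.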

@hejianet

Coming from #1245:

[...]

~$ docker run --rm -it  --runtime=runc alpine nproc
8
~$ docker run --rm -it  --runtime=runc --cpuset-cpus=0 alpine nproc
1
~$ docker run --rm -it  --runtime=runc --cpuset-cpus=0-3 alpine nproc
4
~$ docker run --rm -it  --runtime=kata-runtime alpine nproc
8

Could you please run docker run --rm -it --runtime=kata-runtime alpine sh ?
And then please open another terminal, run ps -efa |grep qemu

It looks strange to me that the result of kata-runtime is the same as runc.

~$ docker run --rm -it --runtime=kata-runtime --cpuset-cpus=0 alpine nproc
1
~$ docker run --rm -it --runtime=kata-runtime --cpuset-cpus=0-3 alpine nproc
4


> And based on default qemu flag, `-smp 1,cores=1,threads=1,sockets=1,maxcpus=123`, we only provide one vcpu for container/VM on arm. so if you are running `make -j$(nproc)` in kata container/VM, that's quite the same thing as `make -j1` .

As shown above, it does make a difference. With `make`, `make -j1` or `make` + `--cpuset-cpus=x` (where x is a single integer between 0 and 7), the build is successful. With `make -jx` (where `x>1`) or `make` + `--cpuset-cpus=x` (where x is a comma separated list of integers between 0 and 7, or a range in the same range; e.g. 4-6), the build fails.

> when you switch to `runc`, does above errors still occur?

No, they do not. But trying to reproduce it for you is driving me mad. Initially, I stumbled upon this issue while trying to `docker build` on a machine where I didn't know that kata-runtime was the default. This case fails consistently. See log: [kata.log](https://github.com/kata-containers/runtime/files/2912950/kata.log) (search for 'skew' and 'future'). Then I changed `default-runtime` in `/etc/docker/daemon.json` to `runc`, restarted docker (`systemctl restart docker`) and ran `docker build` again. This is consistently successful.

However, I tried to get the same result with `docker run`, and I could not. The build is successful with either `docker run --runtime=runc` or `docker run --runtime=kata-runtime`, independently of which is the default in `/etc/docker/daemon.json`. Maybe you have already fixed something for `docker run` but it is not being used in `docker build`?

Could you please collect the results of kata-collect-data.sh ?

@hejianet

I tested building DynamoRIO/dynamorio with "make -j8" in kata 1.5.0. Not reproducible.

@umarcor
Author

umarcor commented Feb 28, 2019

Could you please run docker run --rm -it --runtime=kata-runtime alpine sh ?
And then please open another terminal, run ps -efa |grep qemu

It looks strange to me that the result of kata-runtime is the same as runc.

I think that you expect me to test whether kata-runtime is being used or the option is being ignored. It is being used. If I run htop in one terminal and then start a container in a different one, I can see completely different behaviour with --runtime=runc and --runtime=kata-runtime. This is the output of ps:

# Container running with --runtime=kata-runtime
~$ ps -efa | grep qemu
root     14148 14105  4 05:16 ?        00:00:02 /snap/kata-containers/33/usr/bin/qemu-system-aarch64 -name sandbox-1979b152565b19ceb34faf9b2c391fe0050403fb22ed9b663c936d933440041a -uuid 16f0fc2a-dc9f-475e-8533-e8f3ba5f2093 -machine virt,usb=off,accel=kvm,gic-version=host -cpu host -qmp unix:/run/vc/vm/1979b152565b19ceb34faf9b2c391fe0050403fb22ed9b663c936d933440041a/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=32129M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/1979b152565b19ceb34faf9b2c391fe0050403fb22ed9b663c936d933440041a/console.sock,server,nowait -device virtio-scsi-pci,id=scsi0,disable-modern=false,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/1979b152565b19ceb34faf9b2c391fe0050403fb22ed9b663c936d933440041a/kata.sock,server,nowait -device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared,romfile= -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/1979b152565b19ceb34faf9b2c391fe0050403fb22ed9b663c936d933440041a,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3:4:5:6:7:8:9:10,fds=11:12:13:14:15:16:17:18 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,disable-modern=false,mq=on,vectors=18,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /usr/share/kata-containers/kata-vmlinuz-4.14.67.container -initrd /snap/kata-containers/33/usr/share/kata-containers/kata-containers-initrd.img -append console=hvc0 console=hvc1 iommu.passthrough=0 quiet panic=1 nr_cpus=8 -smp 8,cores=1,threads=1,sockets=8,maxcpus=
# Stop the container and start another one with --runtime=runc
~$ ps -efa | grep qemu
umarcor     14354 13999  0 05:19 pts/1    00:00:00 grep qemu
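
The vCPU count handed to the guest can be pulled out of that long command line instead of reading it by eye. A small sketch, assuming the count is the first comma-separated field after -smp, as in the output above; the function name qemu_smp is hypothetical:

```shell
# Read process command lines on stdin and print the vCPU count that
# follows each "-smp" option (the part before the first comma).
qemu_smp() {
    tr -s ' ' '\n' |
        awk 'prev == "-smp" { split($0, a, ","); print a[1] } { prev = $0 }'
}
```

For example, ps -efa | grep '[q]emu' | qemu_smp prints one number per running qemu process; the bracket in the pattern keeps grep from matching its own command line.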

Could you please collect the results of kata-collect-data.sh ?

As commented in #1245, kata-collect-data.sh is not found on the device. How/where can I get it from?

@umarcor
Author

umarcor commented Feb 28, 2019

I tested building DynamoRIO/dynamorio with "make -j8" in kata 1.5.0. Not reproducible.

Did you try it inside a container started with docker run? If so, the error won't show, as commented above. Please try building either of the following Dockerfiles with the default runtime set to kata-runtime, since docker build does not support --runtime:

FROM arm64v8/ubuntu:bionic

RUN apt-get update -qq \
  && DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends \
      ca-certificates \
      clang-6.0 \
      gcc \
      git \
      gnat \
      llvm-6.0-dev \
      make \
      zlib1g-dev \
  && apt-get autoclean && apt-get clean && apt-get -y autoremove \
  && update-ca-certificates

RUN git clone https://github.com/ghdl/ghdl && cd ghdl \
 && ./dist/travis/build.sh -b llvm-6.0 -p ghdl-llvm

RUN apt update \
 && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends \
   cmake \
   gcc \
   g++-6 \
 && apt autoclean && apt clean && apt -y autoremove \
 && rm /usr/bin/gcc \
 && ln -s /usr/bin/gcc-6 /usr/bin/gcc \
 && ln -s /usr/bin/g++-6 /usr/bin/g++ \
 && git clone https://github.com/DynamoRIO/dynamorio /opt/dynamorio && cd /opt/dynamorio \
 && mkdir build && cd build \
 && cmake .. && make -j
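
Note that make -j with no number places no limit on the number of parallel jobs, while make -j$(nproc) bounds them by the CPU count. A sketch of a bounded variant follows; the function name make_jobs_flag and the default cap of 8 are assumptions for illustration, and this does not address the clock skew itself:

```shell
# Print a bounded -j flag for make: the available CPU count, capped at a
# maximum (default 8). Falls back to 1 if nproc is unavailable.
make_jobs_flag() {
    max=${1:-8}
    n=$(nproc 2>/dev/null || echo 1)
    if [ "$n" -gt "$max" ]; then
        n=$max
    fi
    printf '%s\n' "-j$n"
}
```

In the Dockerfile this would read make "$(make_jobs_flag)" in place of make -j, which keeps the number of jobs the same across runs while comparing runtimes.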

@hejianet

@umarcor Hi, the "-smp 8" in the qemu command line indicates you might be using an old kata version. The latest kata uses -smp 1 instead.
Older kata versions on arm64 had a bug where the RTC driver was missing, which @Pennyzct mentioned, and I thought it might be messing up the timebase on arm64.

To answer your question:
I ran the "make -j8" in kata container instead of host.

IMO, you can first try kata 1.5.0 to build your dynamorio.

@umarcor
Author

umarcor commented Feb 28, 2019

@umarcor Hi, the "-smp 8" in the qemu command line indicates you might be using an old kata version. The latest kata uses -smp 1 instead.
Older kata versions on arm64 had a bug where the RTC driver was missing, which @Pennyzct mentioned, and I thought it might be messing up the timebase on arm64.

I think this is not the case:

~$ kata-containers.runtime --version
kata-runtime  : 1.5.0
   commit   : 5f7fcd773089a615b776862f92217e987f06df0a-dirty
   OCI specs: 1.0.1-dev

Unless you mean that the latest stable release is old for this issue, and I should be building kata from sources.

To answer your question:
I ran the "make -j8" in kata container instead of host.

But it is not a matter of container versus host:

  • docker run --runtime=runc. OK.
  • docker run --runtime=kata-runtime. OK.
  • docker build (default runtime runc). OK.
  • docker build (default runtime kata-runtime). FAIL.

@grahamwhaley
Contributor

/cc @mcastelino who mentioned clock skew the other day - not sure if that is applicable here or not.

@umarcor
Author

umarcor commented Mar 15, 2019

ping @mcastelino
