Service periodically restarted #1402
Comments
The hash seems to "flip" between values in etcd, and the unit file keeps changing layout.
@eduardBM Can you provide your etcd configuration?
@kayrus Here is the requested information. Thanks.
@eduardBM How did you submit this unit?
fleet currently uses go-systemd @ cf3cdf77462baaad163ad2d5d1984b9c1b493701 (line 45 in 81d6c4a): coreos/go-systemd@cf3cdf7

First thing to check is whether that version of go-systemd guarantees stable serialisation of units. (There have been a number of changes since then, e.g. coreos/go-systemd@6654289 and coreos/go-systemd@3130945, but those won't be relevant until #1375 lands.)
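For readers following along, here is a minimal illustration of why unstable serialisation produces exactly the symptom reported. This is a sketch, not fleet's actual code: the `options` map and `serializeUnstable` are invented names, but the mechanism is real Go behavior. If unit options live in a Go map and are serialised by ranging over it, the byte stream, and therefore its SHA-1, can differ from run to run, because Go deliberately randomizes map iteration order.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// options mimics one section of a unit held as a Go map; the values
// are taken from the app.service example later in this thread.
var options = map[string]string{
	"Type":           "simple",
	"ExecStart":      "/bin/bash -c 'while true; do echo Hello, World; sleep 1; done'",
	"Restart":        "on-failure",
	"RestartSec":     "5",
	"TimeoutStopSec": "60",
}

// serializeUnstable ranges over the map directly. Go randomizes map
// iteration order, so separate runs emit the options in different
// orders and produce different byte streams.
func serializeUnstable(m map[string]string) []byte {
	var b bytes.Buffer
	for k, v := range m {
		fmt.Fprintf(&b, "%s=%s\n", k, v)
	}
	return b.Bytes()
}

func main() {
	// Run this program a few times: the digest "flips" between a
	// handful of values, just like the unit hashes in this issue.
	fmt.Printf("%x\n", sha1.Sum(serializeUnstable(options)))
}
```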
@eduardBM Can you reproduce this behavior with the pure fleetctl CLI?
@kayrus Yes, I was able to reproduce it with the CLI command. The unit hash changes every couple of seconds.
Here is how I reproduced the issue on an Ubuntu 16.04 VM:

```
$ apt-get install build-essential golang-go python-dev pipexec fleet etcd -y
$ dpkg -i fleet_0.11.5+dfsg-1_amd64.deb etcd_2.2.3+dfsg-1_amd64.deb
$ git clone https://github.com/cnelson/python-fleet
$ cd python-fleet && python setup.py install
$ systemctl start fleet
$ fleetctl list-machines --full
MACHINE                                 IP              METADATA
22e41ca31da38ea956ade49269567301        192.168.122.147 -
$ cat app.service
[Unit]
Description=simple app
[Service]
Type=simple
ExecStart=/bin/bash -c 'while true; do echo Hello, World; sleep 1; done'
Restart=on-failure
RestartSec=5
TimeoutStopSec=60
[Install]
WantedBy=multi-user.target
[X-Fleet]
MachineID=22e41ca31da38ea956ade49269567301
$ fleetctl start app
Unit app.service inactive
Unit app.service launched on 22e41ca3.../192.168.122.147
$ fleetctl start app
WARNING: Unit app.service in registry differs from local unit file app.service
$ fleetctl start app
WARNING: Unit app.service in registry differs from local unit file app.service
$ fleetctl start app
$ fleetctl start app
$ fleetctl start app
WARNING: Unit app.service in registry differs from local unit file app.service
$ sha1sum app.service
fb30d997678396b3938b82ffcef1b5ac4d214e0a  app.service
$ etcdctl ls --recursive /_coreos.com/fleet/unit
/_coreos.com/fleet/unit/39e247ee21fae23bf0f1a907d33f249d6f90d27f
```

And the normal behavior should be:

```
$ sha1sum app.service
fb30d997678396b3938b82ffcef1b5ac4d214e0a  app.service
$ etcdctl ls --recursive /_coreos.com/fleet/unit
/_coreos.com/fleet/unit/fb30d997678396b3938b82ffcef1b5ac4d214e0a
$ etcdctl get /_coreos.com/fleet/job/app.service/object | sed -r 's/.*\[([,0-9]+)\].*/\1/g' | tr ',' '\n' | xargs -I{} printf "%x" {} && echo
fb30d997678396b3938b82ffcef1b5ac4d214e0a
```
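As the transcripts show, in the healthy case the registry key matches the SHA-1 of the unit file, and the job object stores that same hash as a JSON array of decimal byte values, which the sed/tr/printf pipeline above converts back to hex. Here is a small Go sketch of both checks; the file name `app.service` and the byte array are taken from the transcript, and the array literal is simply the hash fb30d9... rewritten in decimal:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
)

func main() {
	// Hash the local unit file, mirroring `sha1sum app.service`.
	data, err := ioutil.ReadFile("app.service")
	if err != nil {
		log.Fatal(err)
	}
	sum := sha1.Sum(data)
	fmt.Println(hex.EncodeToString(sum[:]))

	// Decode the hash as it appears inside the job object: a JSON
	// array of decimal byte values (fb30d9... from the transcript).
	raw := []byte(`[251,48,217,151,103,131,150,179,147,139,130,255,206,241,181,172,77,33,78,10]`)
	var vals []int
	if err := json.Unmarshal(raw, &vals); err != nil {
		log.Fatal(err)
	}
	b := make([]byte, len(vals))
	for i, v := range vals {
		b[i] = byte(v)
	}
	fmt.Println(hex.EncodeToString(b)) // fb30d997678396b3938b82ffcef1b5ac4d214e0a
}
```

One detail worth noting: hex.EncodeToString zero-pads every byte, whereas the shell pipeline's `printf "%x"` would drop a leading zero on any byte below 0x10, so the Go version is the safer way to compare.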
stderr("ERROR: %s", err)
stderr("WARNING: %s != %s", luf.Hash(), suf.Hash())
stderr("WARNING: Unit %s in registry differs from local unit file %s", su.Name, loc)
return
} And here is the result: fleetctl start app.service
ERROR: %!s(<nil>)
WARNING: 55651717f8a83c2815ff428b328bf26000135988 != 28887a7ede907207d9ea8a8e8dbda1d9d98b1e8a
WARNING: Unit app.service in registry differs from local unit file app.service
root@ubuntu1:~# fleetctl start app.service
ERROR: %!s(<nil>)
WARNING: 28887a7ede907207d9ea8a8e8dbda1d9d98b1e8a != 28887a7ede907207d9ea8a8e8dbda1d9d98b1e8a
WARNING: Unit app.service in registry differs from local unit file app.service
root@ubuntu1:~# fleetctl start app.service
ERROR: %!s(<nil>)
WARNING: 39e247ee21fae23bf0f1a907d33f249d6f90d27f != 28887a7ede907207d9ea8a8e8dbda1d9d98b1e8a
WARNING: Unit app.service in registry differs from local unit file app.service
root@ubuntu1:~# fleetctl start app.service
ERROR: %!s(<nil>)
WARNING: 55651717f8a83c2815ff428b328bf26000135988 != 55651717f8a83c2815ff428b328bf26000135988
WARNING: Unit app.service in registry differs from local unit file app.service
```

Sometimes when you submit a new unit, fleet just returns this:

```
$ fleetctl start app
2016/02/11 11:50:54 WARN fleetctl.go:801: Error retrieving Unit(app.service) from Registry: googleapi: got HTTP response code 500 with body: {"error":{"code":500,"message":""}}
```

It looks like the desired units structure has a different order on every loop. This bug is present only in the Ubuntu Xenial build; if you build fleet in a docker environment (for instance with the build-docker script), it does not occur.
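One aside on the debug output above: the `ERROR: %!s(<nil>)` lines are not a second bug. The guard only enters that branch when err == nil, and Go's fmt package renders a nil value passed to %s as `%!s(<nil>)`. A two-line demonstration:

```go
package main

import "fmt"

func main() {
	var err error                  // nil, as in the debug branch above
	fmt.Printf("ERROR: %s\n", err) // prints: ERROR: %!s(<nil>)
}
```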
Now that go-systemd has been bumped in fleet master, could you please check whether this issue persists?
Oh, I think this wasn't entirely clear from the previous discussion (at least not to me): the problem manifests only with the Ubuntu packaging for the (yet unreleased) "xenial" distribution (i.e. 16.04). A "manual" build (using docker) works fine, even on Ubuntu xenial. Considering that, should we still consider it part of the release milestone?...
@antrik Is there a new package I can test?
@eduardBM No, there is no new package; just an update in the upstream master branch. Also, it turns out this is actually unlikely to change anything... BTW, is this issue known to the Ubuntu packagers? It seems to me that in this instance they might actually be more qualified to investigate it...
@antrik I'm not sure whether this issue is known to the Ubuntu packagers. How do I get their attention?
@eduardBM We still don't know exactly what causes this bug (code in fleet, Go dependencies, or something else), but you are free to report a bug here: https://bugs.launchpad.net/ubuntu/+source/fleet/+filebug
@eduardBM You can use the "reportbug" command line tool, or the web interface kayrus linked if you prefer. Either way, be sure to include a link to this discussion :-) As for building it manually, this should help if you need a working version urgently. (Use the "build-docker" script for a known working result.) Otherwise you might want to wait for feedback from the Ubuntu packagers...
I've just compiled and tested this fleet package: the bug does not occur. The default go-systemd package in Ubuntu Xenial is at v3. It looks like go-systemd should be bumped.
@eduardBM go-systemd v4 has a fix for the serialization order: coreos/go-systemd@3130945
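For reference, the essence of such a serialization-order fix, sketched against the toy example earlier in this thread (an illustration, not go-systemd's actual code): sort the option names before writing, so the byte stream and its hash are deterministic.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
	"sort"
)

// serializeStable emits a map-backed section deterministically by
// sorting the option names first; the resulting SHA-1 is identical
// on every run, so the registry hash no longer "flips".
func serializeStable(m map[string]string) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b bytes.Buffer
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, m[k])
	}
	return b.Bytes()
}

func main() {
	m := map[string]string{"Type": "simple", "Restart": "on-failure", "RestartSec": "5"}
	fmt.Printf("%x\n", sha1.Sum(serializeStable(m))) // stable across runs
}
```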
The bump request was reported here: https://bugs.launchpad.net/ubuntu/+source/golang-github-coreos-go-systemd/+bug/1568902

Consider the matter closed.
@kayrus Thanks for your help; I'll keep an eye on the other bug report.
Hi,

I see something similar to #1366 on my setup. Log:

```
Jan 14 14:39:24 o11n203 fleetd[4494]: DEBUG reconcile.go:257: Desired hash "5672ec4bdaefa511f7617e25c40a50fd6814bde1" differs to current hash 08e8f21aa9788333ec99d7eed7ef9754a33fb724 of Job(ovs-snmp@10.130.11.203.service) - unloading
Jan 14 14:39:24 o11n203 fleetd[4494]: DEBUG reconcile.go:321: AgentReconciler attempting tasks [{UnloadUnit unit loaded but hash differs to expected %!s(*job.Unit=&{ovs-snmp@10.130.11.203.service {map[Unit:map[After:[ovs-watcher-framework.service] Description:[ovs snmp server] Requires:[ovs-watcher-framework.service]] Service:map[ExecStart:[/usr/bin/python2 /opt/OpenvStorage/ovs/extensions/snmp/ovssnmpserver.py --port 161] Restart:[on-failure] RestartSec:[5] TimeoutStopSec:[60] Type:[simple] Environment:[PYTHONPATH=/opt/OpenvStorage] WorkingDirectory:[/opt/OpenvStorage]] Install:map[WantedBy:[multi-user.target]] X-Fleet:map[MachineID:[93aae5bf7736d103238cb3b0569655ae]]] [0xc82086ef60 0xc82086efc0 0xc82086f020 0xc82086f080 0xc82086f0e0 0xc82086f140 0xc82086f1a0 0xc82086f200 0xc82086f260 0xc82086f2c0 0xc82086f320 0xc82086f3b0]} })} {UnloadUnit unit loaded but hash differs to expected %!s(*job.Unit=&{ovs-support-agent@10.130.11.203.service {map[Unit:map[Description:[Open vStorage support agent]] Service:map[Type:[simple] Environment:[PYTHONPATH=/opt/OpenvStorage] ExecStart:[/usr/bin/python2 /opt/OpenvStorage/ovs/extensions/support/agent.py] Restart:[on-failure] TimeoutStopSec:[3600]] Install:map[WantedBy:[multi-user.target]] X-Fleet:map[MachineID:[93aae5bf7736d103238cb3b0569655ae]]] [0xc82018b770 0xc82018b7d0 0xc82018b830 0xc82018b890 0xc82018b8f0 0xc82018b950 0xc82018b9b0 0xc82018ba40]} })} {LoadUnit unit scheduled here but not loaded %!s(*job.Unit=&{ovs-snmp@10.130.11.203.service {map[Unit:map[Description:[ovs snmp server] Requires:[ovs-watcher-framework.service] After:[ovs-watcher-framework.service]] Service:map[Environment:[PYTHONPATH=/opt/OpenvStorage] WorkingDirectory:[/opt/OpenvStorage] ExecStart:[/usr/bin/python2 /opt/OpenvStorage/ovs/extensions/snmp/ovssnmpserver.py --port 161] Restart:[on-failure] RestartSec:[5] TimeoutStopSec:[60] Type:[simple]] Install:map[WantedBy:[multi-user.target]] X-Fleet:map[MachineID:[93aae5bf7736d103238cb3b0569655ae]]] [0xc82086ef60 0xc82086efc0 0xc82086f020 0xc82086f080 0xc82086f0e0 0xc82086f140 0xc82086f1a0 0xc8208
Jan 14 14:39:24 o11n203 fleetd[4494]: INFO manager.go:138: Triggered systemd unit ovs-snmp@10.130.11.203.service stop: job=165352
Jan 14 14:39:24 o11n203 fleetd[4494]: INFO manager.go:259: Removing systemd unit ovs-snmp@10.130.11.203.service
```

I looked at #720, but I don't see a way to fix this.

Any ideas?