$ etcdctl member list
4c32b6c5bcf8cfd4: name=78b98c5dfe468973b93e098a5a5829ea peerURLs=http://192.168.122.188:2380 clientURLs=http://192.168.122.188:2379
92b755ba67be7f8b: name=0903835ba13faf7f62cf5d8ddc7812de peerURLs=http://192.168.122.166:2380 clientURLs=http://192.168.122.166:2379
cafc731b5b471889: name=f8f4ec94d4193d2f160e8bd22ed360be peerURLs=http://192.168.122.135:2380 clientURLs=http://192.168.122.135:2379
ping time between etcd members: ~0.469 ms
fleetd configuration:
# /etc/systemd/system/fleet.service.d/10-extra-options.conf
# I used this to avoid etcd registry timeouts.
[Service]
Environment=FLEET_ETCD_REQUEST_TIMEOUT=2

# /etc/systemd/system/fleet.service.d/10-opt-bin.conf
[Service]
ExecStart=
ExecStart=/opt/bin/fleetd

# /run/systemd/system/fleet.service.d/20-cloudinit.conf
[Service]
Environment="FLEET_METADATA=hostname=coreos1"
Environment="FLEET_PUBLIC_IP=coreos1"
Template unit file hello@.service:
[Unit]
Description=My Service

[Service]
TimeoutStartSec=0
ExecStart=/bin/sh -c "trap 'exit 0' INT TERM; while true; do echo Hello World %i; sleep 10; done"
Steps to reproduce:
$ fleetctl submit hello@.service
Unit hello@.service inactive
$ time fleetctl start hello@{1..10}.service
Unit hello@1.service inactive
Unit hello@2.service inactive
Unit hello@3.service inactive on 0903835b.../192.168.122.166
Unit hello@4.service inactive
Unit hello@5.service inactive on 0903835b.../192.168.122.166
Unit hello@6.service inactive
Unit hello@7.service inactive
Unit hello@8.service inactive
Unit hello@9.service inactive
Unit hello@10.service inactive on 0903835b.../192.168.122.166
Unit hello@1.service launched on 0903835b.../192.168.122.166
Unit hello@3.service launched on f8f4ec94.../192.168.122.135
Unit hello@2.service launched on 78b98c5d.../192.168.122.188
Unit hello@4.service launched on 0903835b.../192.168.122.166
Unit hello@6.service launched on 78b98c5d.../192.168.122.188
Unit hello@7.service launched on 78b98c5d.../192.168.122.188
Unit hello@5.service launched on f8f4ec94.../192.168.122.135
Unit hello@8.service launched on f8f4ec94.../192.168.122.135
Unit hello@9.service launched on 0903835b.../192.168.122.166
Unit hello@10.service launched on 0903835b.../192.168.122.166
real 0m26.688s
user 0m0.057s
sys 0m0.007s
26 seconds to schedule 10 units? Is this by design: first submit all units, then start them? I thought fleetctl should simply submit each unit with its desired state, rather than submit the unit and then change its desired state.
Then... why do we see additional info for some units, like Unit hello@3.service inactive on 0903835b.../192.168.122.166?
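(To illustrate what I mean by "submit with desired state": the fleet HTTP API can create a unit that already carries its desired state in a single request. The socket path, endpoint, and field names below are recalled from the fleet v1 API docs rather than verified on this cluster, so treat this as a sketch:)
$ curl --unix-socket /var/run/fleet.sock -X PUT \
    -H "Content-Type: application/json" \
    -d '{"desiredState": "launched", "options": [{"section": "Service", "name": "ExecStart", "value": "/bin/sh -c \"while true; do echo Hello World 1; sleep 10; done\""}]}' \
    http://localhost/fleet/v1/units/hello1.service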
Another example, without a template:
$ for i in {1..10}; do sed "s/%i/$i/g" hello@.service > hello$i.service; done
$ time fleetctl start hello{1..10}.service
Unit hello1.service inactive
Unit hello2.service inactive
Unit hello3.service inactive
Unit hello4.service inactive
Unit hello5.service inactive
Unit hello6.service inactive
Unit hello7.service inactive
Unit hello8.service inactive
Unit hello9.service inactive
Unit hello10.service inactive
Unit hello2.service launched
Unit hello3.service launched
Unit hello4.service launched on 78b98c5d.../192.168.122.188
Unit hello5.service launched on 0903835b.../192.168.122.166
Unit hello1.service launched on 0903835b.../192.168.122.166
Unit hello7.service launched on f8f4ec94.../192.168.122.135
Unit hello10.service launched on 78b98c5d.../192.168.122.188
Unit hello6.service launched on 0903835b.../192.168.122.166
Unit hello8.service launched on 0903835b.../192.168.122.166
Unit hello9.service launched on f8f4ec94.../192.168.122.135
real 0m29.444s
user 0m0.050s
sys 0m0.011s
We can see that there is no difference.
Here is the script, which submits units directly into the etcd registry with their desired state and doesn't wait for each etcd response:
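(The original units_submiter.sh did not survive in this copy of the report, so the following is only a rough sketch of what such a script can look like. The key layout under /_coreos.com/fleet/job/ is my assumption about fleet's etcd v2 registry and may differ between fleet versions:)
#!/bin/bash
# Write each unit and its desired state straight into the registry,
# backgrounding every write so the loop never waits for an etcd response.
for i in {1..10}; do
  unit="hello$i.service"
  body=$(sed "s/%i/$i/g" hello@.service)
  etcdctl set "/_coreos.com/fleet/job/$unit/object" "$body" > /dev/null &
  etcdctl set "/_coreos.com/fleet/job/$unit/target-state" launched > /dev/null &
done
wait   # collect the background writes before exiting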
$ time ./units_submiter.sh
... ... ...
... ... ...
... ... ...
real 0m11.895s
user 0m0.279s
sys 0m0.151s
This example shows how fleet's performance could be improved by up to three times when the etcd storage is slow.
I suggest using something like a "parallel connections" limit: create N parallel tasks to submit units (Go channels should help here). In addition, we should make sure that fleetctl reuses a single TCP connection to etcd for all queries.
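As a rough client-side approximation of that idea (not the in-daemon change itself), the same units can be started from several fleetctl processes at once; inside fleet this would correspond to a pool of worker goroutines fed from a channel:
# Start the ten example units with at most 4 fleetctl invocations in flight.
# --no-block registers the desired state and returns instead of waiting for
# each unit to actually reach it.
$ printf 'hello@%d.service\n' {1..10} | xargs -n 1 -P 4 fleetctl start --no-block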
P.S. Just in case, here is a command to remove all unit files: fleetctl list-unit-files --no-legend | awk '{print "/usr/bin/fleetctl destroy "$1}' | sh -s
This report is most likely accurate; we have already been seeing this from time to time. Yes, it can be slow, especially when etcd is not able to respond immediately.
Though I would say that's just how it is; it's more of a design issue.
The only alternative would be to reduce the dependency on etcd as much as possible, for example by turning on enable_grpc and using etcd3.
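(For reference, a drop-in along these lines should enable that, assuming fleet's usual mapping of config options to FLEET_* environment variables; the file name is arbitrary:)
# /etc/systemd/system/fleet.service.d/40-grpc.conf
[Service]
Environment=FLEET_ENABLE_GRPC=true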
So I'll close this issue. Though thanks for the precise report!
Probably relates to this issue and to PRs #1344 and #1433.
When the etcd data directory is located on slow storage (i.e. an HDD), fleetctl is really slow; when you move the etcd storage onto tmpfs, it works really fast.
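(A rough way to try the tmpfs experiment on a disposable test cluster; the etcd2.service unit name, the ETCD_DATA_DIR override, and the mount point are assumptions about a stock CoreOS setup:)
$ sudo systemctl stop etcd2
$ sudo mkdir -p /media/etcd-ram
$ sudo mount -t tmpfs -o size=256m tmpfs /media/etcd-ram
$ sudo mkdir -p /etc/systemd/system/etcd2.service.d
$ cat <<'EOF' | sudo tee /etc/systemd/system/etcd2.service.d/40-tmpfs.conf
[Service]
Environment=ETCD_DATA_DIR=/media/etcd-ram
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl start etcd2
Note that the member comes back with an empty data dir, so it has to be removed and re-added to the cluster, and everything in tmpfs is lost on reboot.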
Environment:
CoreOS: 955.0.0
Each CoreOS node:
  etcd2: 2.2.5
  fleet: latest master
  512 MB RAM
  1 CPU
Cluster of 3 etcd member nodes and three fleet machines.
/cc @jonboulle @tixxdz