
lease: do lease pile-up reduction in the background #9699

Closed
wants to merge 1 commit

Conversation

@mgates mgates commented May 4, 2018

This moves lease pile-up reduction into a goroutine. This prevents timeouts
when the lessor is locked for a long time (mostly when there are a lot of
leases).

We had a problem where, when we had a lot of leases (~100k) and a leader election happened, the lessor was locked for a long time, causing timeouts when we tried to grant a lease. This change locks the lessor in batches, which allows leases to be created while the leader is initializing the lessor. It still won't start expiring leases until it has finished rewriting all of them, though.
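
For illustration, here is a minimal standalone sketch of the batching idea. The lessor and lease types, the promote helper, and the batch size are simplified stand-ins, not the actual patch:

```
package main

import (
	"fmt"
	"sync"
	"time"
)

// Simplified stand-ins for the real types in lease/lessor.go.
type lease struct{ expiry time.Time }

type lessor struct {
	mu     sync.RWMutex
	leases map[int64]*lease
}

// promote rewrites every lease expiry, but only holds the write lock for one
// batch at a time, so concurrent Grant-style writers are never blocked long.
func (le *lessor) promote(extend time.Duration, batch int) {
	// Snapshot the lease IDs under a short read lock.
	le.mu.RLock()
	ids := make([]int64, 0, len(le.leases))
	for id := range le.leases {
		ids = append(ids, id)
	}
	le.mu.RUnlock()

	// Rewrite expiries batch by batch, releasing the lock between batches.
	for start := 0; start < len(ids); start += batch {
		end := start + batch
		if end > len(ids) {
			end = len(ids)
		}
		le.mu.Lock()
		for _, id := range ids[start:end] {
			if l, ok := le.leases[id]; ok {
				l.expiry = time.Now().Add(extend)
			}
		}
		le.mu.Unlock()
	}
}

func main() {
	le := &lessor{leases: map[int64]*lease{}}
	for i := int64(0); i < 100000; i++ {
		le.leases[i] = &lease{}
	}
	// Run in the background, as the PR does, so new leases can be granted
	// while the rewrite is still in progress.
	done := make(chan struct{})
	go func() { le.promote(time.Minute, 1000); close(done) }()
	<-done
	le.mu.RLock()
	fmt.Println("rewrote", len(le.leases), "leases")
	le.mu.RUnlock()
}
```

The point is that the write lock is only ever held for one batch, so a concurrent grant can slip in between batches instead of waiting for the whole rewrite.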

Before:

```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:

```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```

@mgates
Author

mgates commented May 4, 2018

I'll have some more realistic test examples on Monday.

@cosgroveb
Contributor

This addresses #9496

```
	return ls
}

func (le *lessor) Promote(extend time.Duration) {
	le.mu.Lock()
	defer le.mu.Unlock()
	le.Ready = false
```
Contributor

Needs lock around le.Ready?

```
	le.Ready = false
	go func() {
		le.mu.RLock()
		le.demotec = make(chan struct{})
```
Contributor

le.demotec needs to be protected by the write lock.
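
For context, the snippet quoted above assigns le.demotec while holding only the read lock. A minimal standalone sketch of the pattern the reviewer is asking for, with made-up types and method bodies rather than the merged code:

```
package main

import (
	"fmt"
	"sync"
)

// Made-up minimal types; only the locking pattern is the point here.
type lessor struct {
	mu      sync.RWMutex
	demotec chan struct{}
}

func (le *lessor) promote() {
	go func() {
		// Publish demotec under the write lock...
		le.mu.Lock()
		le.demotec = make(chan struct{})
		le.mu.Unlock()

		// ...then hold only the read lock for the longer-running work.
		le.mu.RLock()
		defer le.mu.RUnlock()
		// rewrite lease expiries here
	}()
}

func (le *lessor) isPrimary() bool {
	le.mu.RLock()
	defer le.mu.RUnlock()
	return le.demotec != nil
}

func main() {
	le := &lessor{}
	le.promote()
	fmt.Println(le.isPrimary()) // may print false if the goroutine hasn't run yet
}
```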

@gyuho
Contributor

gyuho commented May 9, 2018

Have you benchmarked with real-world workloads?

@mgates
Author

mgates commented May 16, 2018

Sorry for the benchmarking delay - still trying to find the time.

@mgates
Author

mgates commented May 18, 2018

Still busy, but I blocked out some time at the end of next week to get it done. Sorry for the delay.

@mgates
Author

mgates commented May 25, 2018

Ok - so I'm not sure what the best way to demonstrate this in a real-worldish way is, but here's what I did:

  1. set up the test cluster with goreman start
  2. put 1.9 million leases into etcd
  3. Ran this snippet, which tries to grant a new lease every half second and kills the leader after 10 seconds:

```
while true; do ETCDCTL_API=3 etcdctl --command-timeout 0.2s --endpoints "http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379" lease grant  60  > /dev/null ; echo `date` $? ; sleep 0.5; done & sleep 10; pkill -f  -- "[-]-listen-peer-urls $(ETCDCTL_API=2 etcdctl  member list  | grep isLeader=true | cut -f 3 -d "="| cut -f 1 -d " ")"
```

On the branch it had 1 timeout exceeded error, but on master it had 60 over the 61 seconds it took before it recovered.

Sorry it took so long - if these changes look plausible to you, we can try to test them against our live, large workloads.

@mgates mgates force-pushed the break_up_lease_promotion branch 2 times, most recently from 10bf546 to 628570e on May 25, 2018 at 21:18
@mgates
Author

mgates commented Jun 4, 2018

Hey folks - just wanted to check in about this - does it seem ok, at least in theory? If so, happy to work more on it.

lease/lessor.go Outdated
```
@@ -329,66 +332,84 @@ func (le *lessor) unsafeLeases() []*Lease {
	for _, l := range le.leaseMap {
		leases = append(leases, l)
	}
	sort.Sort(leasesByExpiry(leases))
```
Contributor

this change can be moved to a separate PR?

@xiang90
Contributor

xiang90 commented Jun 4, 2018

@mgates

Can you:

  1. describe what problem this PR solves (in which cases it can happen and how this PR helps)
  2. move the sorting change into another PR

I can understand this PR as it is, but I guess it is easier for other people to review if you do what I suggested.

@mgates
Author

mgates commented Jun 4, 2018

Sure.
We had a problem where, when we had a lot of leases (~100k) and a leader election happened, the lessor was locked for a long time, causing timeouts when we tried to grant a lease. This change locks the lessor in batches, which allows leases to be created while the leader is initializing the lessor. It still won't start expiring leases until it has finished rewriting all of them, though.

mgates pushed a commit to mgates/etcd that referenced this pull request Jun 6, 2018
Because the leases were sorted inside UnsafeLeases() the lessor mutex
ended up being locked while the whole map was sorted. This pulls the
sorting outside of the lock, per feedback on
etcd-io#9699
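
The gist of that follow-up commit, as a standalone sketch with simplified types: copy the leases while holding the mutex, then sort after releasing it. The real code in lease/lessor.go sorts with leasesByExpiry, as in the hunk quoted earlier, rather than sort.Slice:

```
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Simplified stand-ins for the real types in lease/lessor.go.
type Lease struct {
	ID     int64
	expiry time.Time
}

type lessor struct {
	mu       sync.RWMutex
	leaseMap map[int64]*Lease
}

// unsafeLeases only copies the map; the caller must hold le.mu.
func (le *lessor) unsafeLeases() []*Lease {
	leases := make([]*Lease, 0, len(le.leaseMap))
	for _, l := range le.leaseMap {
		leases = append(leases, l)
	}
	return leases
}

// Leases releases the lock before the O(n log n) sort, so the mutex is only
// held for the cheap copy.
func (le *lessor) Leases() []*Lease {
	le.mu.RLock()
	ls := le.unsafeLeases()
	le.mu.RUnlock()

	sort.Slice(ls, func(i, j int) bool { return ls[i].expiry.Before(ls[j].expiry) })
	return ls
}

func main() {
	now := time.Now()
	le := &lessor{leaseMap: map[int64]*Lease{
		1: {ID: 1, expiry: now.Add(2 * time.Minute)},
		2: {ID: 2, expiry: now.Add(1 * time.Minute)},
	}}
	for _, l := range le.Leases() {
		fmt.Println(l.ID) // 2 then 1, soonest expiry first
	}
}
```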
@xiang90
Contributor

xiang90 commented Jun 6, 2018

We had a problem where, when we had a lot of leases (~100k) and a leader election happened, the lessor was locked for a long time, causing timeouts when we tried to grant a lease. This change locks the lessor in batches, which allows leases to be created while the leader is initializing the lessor. It still won't start expiring leases until it has finished rewriting all of them, though.

can you move this to the PR description and also put this into the commit message?

Thanks.

@mgates
Author

mgates commented Jun 6, 2018

@xiang90 done, thanks a lot.

@mgates mgates force-pushed the break_up_lease_promotion branch 2 times, most recently from 591a377 to 4c331ff on June 6, 2018 at 21:04
lease/lessor.go Outdated
```
@@ -144,6 +144,9 @@ type lessor struct {
	stopC chan struct{}
	// doneC is a channel whose closure indicates that the lessor is stopped.
	doneC chan struct{}

	// when the lease pile-up reduction is done this is true
```
Contributor

mention when this will be set to false?

@xiang90
Contributor

xiang90 commented Jun 6, 2018

@mgates What happens when a lease is revoked while the goroutine created in Promote is still running? Is there a risk that the copy of the lease will be re-added to the heap?

@mgates
Author

mgates commented Jun 7, 2018

That's a good question - I'm pretty sure that it would get updated, but since the revoke method deletes it from the lease map and deletes the appropriate keys, that wouldn't actually be a problem, and the lease would get cleaned up eventually. It will get added to the heap, but the heap is already full of revoked leases, which get ignored if they aren't in the map.

We'll write a test to confirm, though.

@mgates
Author

mgates commented Jun 7, 2018

Hey there, we looked into this more, and it doesn't seem possible to write an automated test of this behavior. We did some manual work, added some sleeps, and feel good about the behavior: the lease TTL will get refreshed and possibly rewritten for the pile-up avoidance, it'll get added to the heap, and it will be ignored once the expiry checker gets to it, because it isn't in the lease map.
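
A standalone sketch of why a stale heap entry is harmless; the types and names here are hypothetical stand-ins for the LeaseWithTime heap items seen elsewhere in this thread. The expiry checker simply skips any ID that is no longer in the lease map:

```
package main

import (
	"container/heap"
	"fmt"
)

// Hypothetical minimal stand-ins for the lease heap and lease map.
type item struct {
	id         int64
	expiration int64
}

type leaseQueue []*item

func (q leaseQueue) Len() int            { return len(q) }
func (q leaseQueue) Less(i, j int) bool  { return q[i].expiration < q[j].expiration }
func (q leaseQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *leaseQueue) Push(x interface{}) { *q = append(*q, x.(*item)) }
func (q *leaseQueue) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// popExpired pops due items but only reports IDs still present in leaseMap;
// a revoked lease that was re-pushed by the background goroutine is skipped.
func popExpired(q *leaseQueue, leaseMap map[int64]bool, now int64) []int64 {
	var expired []int64
	for q.Len() > 0 && (*q)[0].expiration <= now {
		it := heap.Pop(q).(*item)
		if !leaseMap[it.id] {
			continue // not in the map any more: already revoked, ignore
		}
		expired = append(expired, it.id)
	}
	return expired
}

func main() {
	q := &leaseQueue{}
	heap.Push(q, &item{id: 1, expiration: 5})
	heap.Push(q, &item{id: 2, expiration: 3}) // pretend lease 2 was revoked
	leaseMap := map[int64]bool{1: true}
	fmt.Println(popExpired(q, leaseMap, 10)) // [1]
}
```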

@cyc115 cyc115 left a comment

minor nits

lease/lessor.go Outdated
```
@@ -144,6 +144,11 @@ type lessor struct {
	stopC chan struct{}
	// doneC is a channel whose closure indicates that the lessor is stopped.
	doneC chan struct{}

	// this is false when promotion is starting and gets
```

nit: The comment should start with Ready, the variable name

lease/lessor.go Outdated
```
		item := &LeaseWithTime{id: l.ID, expiration: l.expiry.UnixNano()}
		heap.Push(&le.leaseHeap, item)
	}
	// adjust expiries in case of overlap

```

nit: extra blank line

@mgates mgates force-pushed the break_up_lease_promotion branch 2 times, most recently from 9e56e57 to a4cfc61 on June 12, 2018 at 21:44
This moves lease pile-up reduction into a goroutine which mostly operates on a copy of
the lease list, to avoid locking. This prevents timeouts when the lessor is locked
for a long time (when there are a lot of leases, mostly). This should solve etcd-io#9496.

We had a problem where, when we had a lot of leases (~100k) and a
leader election happened, the lessor was locked for a long time, causing
timeouts when we tried to grant a lease. This change locks the lessor in
batches, which allows leases to be created while the leader is
initializing the lessor. It still won't start expiring leases until it
has finished rewriting all of them, though.

Before:
```
BenchmarkLessorPromote1-16                        500000 4036 ns/op
BenchmarkLessorPromote10-16                       500000 3932 ns/op
BenchmarkLessorPromote100-16                      500000 3954 ns/op
BenchmarkLessorPromote1000-16                     300000 3906 ns/op
BenchmarkLessorPromote10000-16                    300000 4639 ns/op
BenchmarkLessorPromote100000-16                      100 27216481 ns/op
BenchmarkLessorPromote1000000-16                     100 325164684 ns/op
```

After:
```
BenchmarkLessorPromote1-16                        500000 3769 ns/op
BenchmarkLessorPromote10-16                       500000 3835 ns/op
BenchmarkLessorPromote100-16                      500000 3829 ns/op
BenchmarkLessorPromote1000-16                     500000 3665 ns/op
BenchmarkLessorPromote10000-16                    500000 3800 ns/op
BenchmarkLessorPromote100000-16                   300000 4114 ns/op
BenchmarkLessorPromote1000000-16                  300000 5143 ns/op
```
@gyuho gyuho self-assigned this Jun 19, 2018
@gyuho gyuho added this to the etcd-v3.4 milestone Jun 19, 2018
jcalvert pushed a commit to jcalvert/etcd that referenced this pull request Jul 2, 2018
Because the leases were sorted inside UnsafeLeases() the lessor mutex
ended up being locked while the whole map was sorted. This pulls the
sorting outside of the lock, per feedback on
etcd-io#9699
@wenjiaswe
Contributor

cc @jingyih

@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Aug 5, 2019
@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
