Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The test TestLeaseGrantAndList isn't stable #13855

Closed
ahrtr opened this issue Mar 30, 2022 · 9 comments
Closed

The test TestLeaseGrantAndList isn't stable #13855

ahrtr opened this issue Mar 30, 2022 · 9 comments

Comments

@ahrtr
Copy link
Member

ahrtr commented Mar 30, 2022

Sometimes it passes, but sometimes it fails. Use commands below to reproduce the issue.

$ cd tests/common
$ go test --tags integration -run TestLeaseGrantAndList

Error log:

    ......
    logger.go:130: 2022-03-31T02:54:21.444+0800	INFO	grpc	[[balancer] base.baseBalancer: handle SubConn state change: 0xc002fb7590, CONNECTING]
    logger.go:130: 2022-03-31T02:54:21.444+0800	INFO	grpc	[[core] Channel Connectivity change to CONNECTING]
    logger.go:130: 2022-03-31T02:54:21.451+0800	INFO	grpc	[[core] Subchannel Connectivity change to READY]
    logger.go:130: 2022-03-31T02:54:21.451+0800	INFO	grpc	[[balancer] base.baseBalancer: handle SubConn state change: 0xc002fb7590, READY]
    logger.go:130: 2022-03-31T02:54:21.451+0800	INFO	grpc	[[roundrobin] roundrobinPicker: Build called with info: {map[0xc002fb7590:{{unix:localhost:m-1 localhost <nil> 0 <nil>}}]}]
    logger.go:130: 2022-03-31T02:54:21.451+0800	INFO	grpc	[[core] Channel Connectivity change to READY]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[core] Channel Connectivity change to SHUTDOWN]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[core] Subchannel Connectivity change to SHUTDOWN]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[core] Channel Connectivity change to SHUTDOWN]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[core] Subchannel Connectivity change to SHUTDOWN]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"]
    logger.go:130: 2022-03-31T02:54:21.566+0800	INFO	grpc	[[transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"]
    logger.go:130: 2022-03-31T02:54:21.567+0800	INFO	grpc	[[transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"]
    logger.go:130: 2022-03-31T02:54:21.567+0800	INFO	grpc	[[transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"]
    logger.go:130: 2022-03-31T02:54:21.567+0800	INFO	grpc	[[transport] transport: error closing conn during Close: tls: failed to send closeNotify alert (but connection was closed anyway): write unix localhost:m-1->: write: broken pipe]
    logger.go:130: 2022-03-31T02:54:21.567+0800	INFO	grpc	[[transport] transport: error closing conn during Close: tls: failed to send closeNotify alert (but connection was closed anyway): write unix localhost:m-1->: write: broken pipe]
FAIL
exit status 1
FAIL	go.etcd.io/etcd/tests/v3/common	17.077s

cc @endocrimes @kkkkun @serathius

@endocrimes
Copy link
Contributor

Yeah it looks like the client doesn't use a stable etcd server, so sometimes hits one that hasn't got the latest state. I think I need to update it with some kind of Eventually check.

@lavacat
Copy link

lavacat commented Apr 7, 2022

I'll give this a shot
/assign

@lavacat
Copy link

lavacat commented Apr 13, 2022

I think the problem is that we can't rely on revision to stop polling. On server side we create leases list and then get Rev.
But there is no synchronization, so Leases can have older data, while Rev is newer.

s.lessor.Leases()
....
s.KV().Rev()

I think there are 2 solutions:

  1. Use Len(leases)
  2. Make an extra call to refresh leases after Rev condition is met.

I've submitted PR with 1

#13940

@serathius
Copy link
Member

I think this was fixed in #13868

@lavacat
Copy link

lavacat commented Apr 13, 2022

@serathius I was able to reproduce it with #13868

cd tests/common
go test --tags integration -run TestLeaseGrantAndList

        lease_test.go:157:
            	Error Trace:	lease_test.go:157
            	            				execute.go:29
            	            				asm_amd64.s:1581
            	Error:      	elements differ

            	            	extra elements in list A:
            	            	([]interface {}) (len=1) {
            	            	 (clientv3.LeaseID) 5660321200314889477
            	            	}


            	            	listA:
            	            	([]clientv3.LeaseID) (len=3) {
            	            	 (clientv3.LeaseID) 6237626377548445188,
            	            	 (clientv3.LeaseID) 5013210228857097476,
            	            	 (clientv3.LeaseID) 5660321200314889477
            	            	}


            	            	listB:
            	            	([]clientv3.LeaseID) (len=2) {
            	            	 (clientv3.LeaseID) 6237626377548445188,
            	            	 (clientv3.LeaseID) 5013210228857097476
            	            	}
            	Test:       	TestLeaseGrantAndList/PeerTLS/many_leases

@serathius
Copy link
Member

Interesting as in my case it doesn't fail even with 50 runs, when without #13868 it reproduces within couple of runs.

@lavacat
Copy link

lavacat commented Apr 13, 2022

IMO the core issue is getting revision at the end of LeaseLeases function
https://github.com/etcd-io/etcd/blob/main/server/etcdserver/v3_server.go#L399

It's not guaranteed that lss list actually reflects latest Revision. Maybe a better fix is to move newHeader() to the start of LeaseLeases.

@lavacat
Copy link

lavacat commented Apr 15, 2022

@endocrimes since you are the author of #13868, can you check my fix #13940 ?

@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 31, 2022
@ahrtr ahrtr closed this as completed Jul 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants