TestAllocator is flakey #1050

Closed
pooneh-m opened this issue Sep 10, 2019 · 4 comments
Labels: area/tests, help wanted, kind/bug
Milestone: 1.1.0

Comments

@pooneh-m
Contributor

TestAllocator is sometimes failing with the "tls: bad certificate" error:

...
time="2019-09-09 23:00:41.910" level=info msg="failing http request" error="Post https://34.83.149.154:443/v1alpha1/gameserverallocation: remote error: tls: bad certificate"
--- FAIL: TestAllocator (338.99s)
    allocator_test.go:93: 
        	Error Trace:	allocator_test.go:93
        	Error:      	Received unexpected error:
        	            	timed out waiting for the condition
        	Test:       	TestAllocator
    allocator_test.go:94: 
        	Error Trace:	allocator_test.go:94
        	Error:      	Http test failed
        	Test:       	TestAllocator
FAIL

To investigate, I saved the server's TLS certificate from the allocator-tls secret:

$ kubectl get secrets allocator-tls -n agones-system -o jsonpath='{.data.tls\.crt}' | base64 -d >> pub

Then I used it with openssl and could reproduce a "bad certificate" error:

$ openssl s_client -CApath pub -connect 34.83.149.154:443
CONNECTED(00000003)
depth=0
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0
verify error:num=21:unable to verify the first certificate
verify return:1
140282440995072:error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:../ssl/record/rec_layer_s3.c:1536:SSL alert number 42
---
Certificate chain
 0 s:
   i:/CN=allocation-ca
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIC4jCCAcqgAwIBAgIRAPc4FFlr/JwQFQZmpfWaKw0wDQYJKoZIhvcNAQELBQAw
GDEWMBQGA1UEAxMNYWxsb2NhdGlvbi1jYTAeFw0xOTA5MDkyMjU0MjNaFw0yOTA5
MDYyMjU0MjNaMAAwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCzboxQ
N+0Z0eUyHqQZLL4fOYqqcDj+fpmV/4dgsDNfMUmvs7s03QOHp+P3b0AfnFrxI1sN
ZD1GEU37KBNT083TCneAQAoMHrwkHhG2ca5c8bDWxC7yEW92VkBm/9VU9yAps0ZT
ytkqxO7ljxh2S5bFwe/b+1RaJ1gH2wydbPL0b7v3jul49XFcZbzu9RVJdpXlBtW/
b5bs8AL4RijQIajGkhuYOj1aQBrNBFYJozVPEh+gcBfSAAVjt9acA0wSDTeaOi2f
oRz0BlvTF/UtECMc4v8cgB8Q/LUFY4im4R1qO5jQ9putdBrE8Ge3b59CoD/eVU0U
gQ1CK5yUb2mCRsThAgMBAAGjPzA9MA4GA1UdDwEB/wQEAwIFoDAdBgNVHSUEFjAU
BggrBgEFBQcDAQYIKwYBBQUHAwIwDAYDVR0TAQH/BAIwADANBgkqhkiG9w0BAQsF
AAOCAQEAhNwiAu/9oOzKGNuiOnWazm8vOsdw6aWdV45kCJ58N5fUyrQYleXxq+Mu
rLA0dyu2wJP5hhiuaUh1DOPYYAltzHa/ibTlTfmmVMqwt3vMRpYH1t7A6j5N9U0f
flCs4Q8KFShlfkKNdneW4nsYdUEIXCSKrb77q5QVOD/pTg1P/p6BYOwtlUEbMDEZ
6bBI6yVj+QdamuikjCGCtS93MvcFo/SuSrLf6fVG03xKNhOZY3tSnI/hvFMFh3qb
sqQAE76TtMhdozeH29jqJO13ddsizoN3wCO0AxG3D35+AdIwYin/Mf1kKZ+4LHU9
AZngZfJgizbbShC/EjY2Mj/DsRiGyw==
-----END CERTIFICATE-----
subject=
issuer=/CN=allocation-ca
---
Acceptable client certificate CA names
/CN=allocation-ca
Client Certificate Types: RSA sign, ECDSA sign

The CN is incorrect (the subject is empty), but the test is not expected to fail when this flag is set:
https://github.com/googleforgames/agones/blob/master/test/e2e/allocator_test.go#L135

Three steps I would recommend to improve the testing and reduce flakiness:

  1. Issue a TLS certificate with the correct CN in the test and restart the allocator service pods before running the tests.
  2. Issue a client certificate and store it as a secret signed by allocator-client-tls instead of reusing the allocator-tls certificate.
  3. The initial CN on the certificate should not be set.
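
To make steps 1 and 2 concrete, here is a minimal Go sketch (not the Agones test code) that issues a server certificate carrying the allocator IP as a SAN and a client certificate from a separate client CA, then checks which CA trusts which certificate. The helper names (`pair`, `mint`, `trusts`, `demo`) and the common names `agones-allocator`, `allocation-client-ca` and `e2e-test-client` are assumptions made up for illustration; only `allocation-ca` and the IP address come from the output above. Note that recent Go versions match host names against SANs rather than the CN, so the sketch sets the IP SAN as well as the CN.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// pair holds a certificate and its private key (hypothetical helper type).
type pair struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// mint creates a short-lived certificate for cn. With ca == nil the result is
// a self-signed CA; otherwise it is a leaf for the given usage signed by ca.
func mint(cn string, ip net.IP, usage x509.ExtKeyUsage, ca *pair) (*pair, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	parent, signKey := tmpl, key
	if ca == nil {
		tmpl.IsCA, tmpl.BasicConstraintsValid = true, true
		tmpl.KeyUsage = x509.KeyUsageCertSign
	} else {
		parent, signKey = ca.cert, ca.key
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{usage}
		if ip != nil {
			tmpl.IPAddresses = []net.IP{ip} // IP SAN used for host verification
		}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, signKey)
	if err != nil {
		return nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return &pair{cert: cert, key: key}, err
}

// trusts reports whether leaf chains to ca for the given usage and, when host
// is non-empty, whether the leaf is valid for that host name or IP address.
func trusts(ca, leaf *pair, usage x509.ExtKeyUsage, host string) bool {
	pool := x509.NewCertPool()
	pool.AddCert(ca.cert)
	_, err := leaf.cert.Verify(x509.VerifyOptions{
		Roots: pool, DNSName: host, KeyUsages: []x509.ExtKeyUsage{usage},
	})
	return err == nil
}

// demo issues one server and one client certificate from separate CAs and
// reports which verifications succeed.
func demo() (serverOK, clientOK, crossOK bool, err error) {
	serverCA, err := mint("allocation-ca", nil, x509.ExtKeyUsageAny, nil)
	if err != nil {
		return
	}
	clientCA, err := mint("allocation-client-ca", nil, x509.ExtKeyUsageAny, nil)
	if err != nil {
		return
	}
	ip := net.ParseIP("34.83.149.154")
	server, err := mint("agones-allocator", ip, x509.ExtKeyUsageServerAuth, serverCA)
	if err != nil {
		return
	}
	client, err := mint("e2e-test-client", nil, x509.ExtKeyUsageClientAuth, clientCA)
	if err != nil {
		return
	}
	serverOK = trusts(serverCA, server, x509.ExtKeyUsageServerAuth, "34.83.149.154")
	clientOK = trusts(clientCA, client, x509.ExtKeyUsageClientAuth, "")
	crossOK = trusts(serverCA, client, x509.ExtKeyUsageClientAuth, "")
	return
}

func main() {
	serverOK, clientOK, crossOK, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("server cert valid for IP: %v\nclient cert trusted by client CA: %v\nclient cert trusted by server CA: %v\n", serverOK, clientOK, crossOK)
}
```

Keeping the client CA separate means a server certificate rotation does not silently invalidate the client side of the test, which is the point of step 2.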
@pooneh-m pooneh-m added the kind/bug These are bugs. label Sep 10, 2019
@pooneh-m
Contributor Author

In the Stackdriver logs, there are TLS handshake errors indicating that the client certificate was signed by an unknown CA:

{
 insertId: "1sdmzpgg1chmglp"  
 labels: {…}  
 logName: "projects/agones-images/logs/agones-allocator"  
 receiveTimestamp: "2019-09-10T01:49:05.215887010Z"  
 resource: {…}  
 severity: "ERROR"  
 textPayload: "2019/09/10 01:48:58 http: TLS handshake error from 10.138.0.27:57932: tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "allocation-ca")
"  
 timestamp: "2019-09-10T01:48:58.753225399Z"  
}
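
One way to get an "unknown authority" error that still names a candidate CA is for the CA to be regenerated under the same name after the client certificate was issued. Whether that is what happened on this cluster is an assumption; the Go sketch below only shows the mechanism. The helper names `makeCert` and `staleClientRejected` and the CN `e2e-test-client` are hypothetical, and the sketch uses ECDSA keys for speed, so the wording of the inner error would differ from the RSA message in the log.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// makeCert creates a certificate for cn. With a nil parent it is a self-signed
// CA; otherwise it is a client leaf signed by parent using parentKey.
func makeCert(cn string, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	signKey := parentKey
	if parent == nil {
		tmpl.IsCA, tmpl.BasicConstraintsValid = true, true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, signKey = tmpl, key
	} else {
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, signKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// staleClientRejected issues a client certificate from one "allocation-ca",
// then verifies it against a freshly generated CA with the same name. It
// reports whether verification fails with x509.UnknownAuthorityError.
func staleClientRejected() (bool, error) {
	oldCA, oldKey, err := makeCert("allocation-ca", nil, nil)
	if err != nil {
		return false, err
	}
	client, _, err := makeCert("e2e-test-client", oldCA, oldKey)
	if err != nil {
		return false, err
	}
	newCA, _, err := makeCert("allocation-ca", nil, nil) // same name, new key
	if err != nil {
		return false, err
	}
	pool := x509.NewCertPool()
	pool.AddCert(newCA)
	_, verr := client.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	var unknown x509.UnknownAuthorityError
	return errors.As(verr, &unknown), nil
}

func main() {
	rejected, err := staleClientRejected()
	if err != nil {
		panic(err)
	}
	fmt.Println("stale client certificate rejected as unknown authority:", rejected)
}
```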

I investigated further, and the certificates seem to be correctly signed and valid:

$ kubectl get secrets allocator-tls -n agones-system -o jsonpath='{.data.tls\.crt}' | base64 -d >> tls.crt
$ kubectl get secrets allocator-tls-ca -n agones-system -o jsonpath='{.data.tls-ca\.crt}' | base64 -d >> tls-ca.crt
$ openssl verify -CAfile tls-ca.crt tls.crt
tls.crt: OK
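
For anyone who wants the same check from Go rather than openssl, the sketch below roughly mirrors `openssl verify -CAfile tls-ca.crt tls.crt` using the standard library. `verifyPEM` is a hypothetical helper, not part of Agones, and it assumes the two files were written by the kubectl commands above. It checks chain validity only (any extended key usage, no host name), so it is not a substitute for the verification the allocator performs during the handshake.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// verifyPEM checks that the first certificate in certPEM chains to one of the
// CA certificates in caPEM. It returns nil when the chain verifies.
func verifyPEM(caPEM, certPEM []byte) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("no CA certificates found in CA PEM data")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("no certificate found in PEM data")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse certificate: %w", err)
	}
	_, err = cert.Verify(x509.VerifyOptions{
		Roots: roots,
		// Accept any extended key usage, since only the chain is being checked.
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

func main() {
	caPEM, err := os.ReadFile("tls-ca.crt")
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	certPEM, err := os.ReadFile("tls.crt")
	if err != nil {
		fmt.Println("read cert:", err)
		return
	}
	if err := verifyPEM(caPEM, certPEM); err != nil {
		fmt.Println("tls.crt: verification failed:", err)
		return
	}
	fmt.Println("tls.crt: OK")
}
```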

@markmandel
Member

I thought I had fixed this, but it still shows up occasionally 😞

@markmandel markmandel added the help wanted We would love help on these issues. Please come help us! label Sep 25, 2019
@pooneh-m
Contributor Author

pooneh-m commented Sep 26, 2019

Oops. My change #1077 is overriding the certificates. Tests will start failing without this change. I fixed the cluster.

@pooneh-m
Contributor Author

pooneh-m commented Oct 1, 2019

This issue is not happening anymore. Closing it.

@pooneh-m pooneh-m closed this as completed Oct 1, 2019
@markmandel markmandel added this to the 1.1.0 milestone Oct 22, 2019
@markmandel markmandel added the area/tests Unit tests, e2e tests, anything to make sure things don't break label Oct 22, 2019