This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Complement flake(?) TestPartialStateJoin/* due to complement hostname:port re-use #13975

Closed
Tracked by #14030
DMRobertson opened this issue Sep 30, 2022 · 2 comments · Fixed by matrix-org/complement#486 or matrix-org/complement#491
Labels
  • A-Device-List-Tracking: Telling clients about other devices. Often related to E2EE.
  • A-Federated-Join: joins over federation generally suck
  • A-Testing: Issues related to testing in complement, synapse, etc
  • T-Task: Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
  • Z-Dev-Wishlist: Makes developers' lives better, but doesn't have direct user impact

Comments

@DMRobertson
Contributor

DMRobertson commented Sep 30, 2022

  • TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes
    https://github.com/matrix-org/synapse/actions/runs/3154756275/jobs/5132733882

    federation_room_join_partial_state_test.go:1763: MustSendTransaction: contents=[123 34 101 114 114 99 111 100 101 34 58 34 77 95 85 78 65 85 84 72 79 82 73 90 69 68 34 44 34 101 114 114 111 114 34 58 34 73 110 118 97 108 105 100 32 115 105 103 110 97 116 117 114 101 32 102 111 114 32 115 101 114 118 101 114 32 104 111 115 116 46 100 111 99 107 101 114 46 105 110 116 101 114 110 97 108 58 52 52 54 57 55 32 119 105 116 104 32 107 101 121 32 101 100 50 53 53 49 57 58 99 111 109 112 108 101 109 101 110 116 58 32 85 110 97 98 108 101 32 116 111 32 118 101 114 105 102 121 32 115 105 103 110 97 116 117 114 101 32 102 111 114 32 104 111 115 116 46 100 111 99 107 101 114 46 105 110 116 101 114 110 97 108 58 52 52 54 57 55 58 32 60 99 108 97 115 115 32 39 110 97 99 108 46 101 120 99 101 112 116 105 111 110 115 46 66 97 100 83 105 103 110 97 116 117 114 101 69 114 114 111 114 39 62 32 83 105 103 110 97 116 117 114 101 32 119 97 115 32 102 111 114 103 101 100 32 111 114 32 99 111 114 114 117 112 116 34 125] msg=Failed to PUT JSON (hostname "hs1" path "/_matrix/federation/v1/send/complement-879472265") code=401 wrapped=M_UNAUTHORIZED: Invalid signature for server host.docker.internal:44697 with key ed25519:complement: Unable to verify signature for host.docker.internal:44697: <class 'nacl.exceptions.BadSignatureError'> Signature was forged or corrupt
    
  • TestPartialStateJoin/CanReceiveEventsWithHalfMissingGrandparentsDuringPartialStateJoin
    https://github.com/matrix-org/synapse/actions/runs/3182363321/jobs/5188328245

        client.go:604: [CSAPI] POST hs1/_matrix/client/v3/register => 200 OK (26.184393ms)
        client.go:604: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (28.239677ms)
        client.go:604: [CSAPI] GET hs1/_matrix/client/v3/capabilities => 200 OK (3.172875ms)
        server.go:165: Creating room !0:host.docker.internal:37495 with version 9
        federation_room_join_partial_state_test.go:2705: Registered state_ids handler for event $GzPAiiM7ueWOyydmKHb4tyYhwmXXS0iOBtzWJZDYxkE
        federation_room_join_partial_state_test.go:2746: Registered /state handler for event $GzPAiiM7ueWOyydmKHb4tyYhwmXXS0iOBtzWJZDYxkE
        client.go:602: [CSAPI] POST hs1/_matrix/client/v3/join/!0:host.docker.internal:37495 => error: net/http: request canceled (30.000242265s)
        federation_room_join_partial_state_test.go:2617: CSAPI.DoFunc response returned error: Post "http://localhost:49288/_matrix/client/v3/join/%210:host.docker.internal:37495?server_name=host.docker.internal%3A37495": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
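
The `contents=[123 34 101 ...]` dump in the first failure is a Go byte slice printed as decimal values, which is hard to read. A minimal sketch of how such a dump can be turned back into text follows; `decodeByteDump` is a hypothetical helper written for illustration and is not part of Complement. Only the opening bytes of the dump are used here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeByteDump (hypothetical helper) converts the default Go formatting of
// a []byte, e.g. "[123 34 101]", back into the text those bytes encode.
func decodeByteDump(dump string) (string, error) {
	trimmed := strings.Trim(strings.TrimSpace(dump), "[]")
	fields := strings.Fields(trimmed)
	out := make([]byte, 0, len(fields))
	for _, f := range fields {
		// Each field must be a decimal number that fits in one byte.
		n, err := strconv.ParseUint(f, 10, 8)
		if err != nil {
			return "", fmt.Errorf("bad byte %q: %w", f, err)
		}
		out = append(out, byte(n))
	}
	return string(out), nil
}

func main() {
	// Opening bytes of the contents=[...] dump from the failing run.
	prefix := "[123 34 101 114 114 99 111 100 101 34 58 34 77 95 85 78 65 85 84 72 79 82 73 90 69 68 34]"
	text, err := decodeByteDump(prefix)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // prints {"errcode":"M_UNAUTHORIZED"
}
```

Decoding the full dump the same way should yield the JSON error body whose message already appears in the `wrapped=` part of the log line.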
    
DMRobertson added the A-Device-List-Tracking, T-Task, Z-Dev-Wishlist and A-Testing labels on Sep 30, 2022
@squahtx
Contributor

squahtx commented Sep 30, 2022

It's failing with
federation_room_join_partial_state_test.go:1763: MustSendTransaction: contents=[123 34 101 114 114 99 111 100 101 34 58 34 77 95 85 78 65 85 84 72 79 82 73 90 69 68 34 44 34 101 114 114 111 114 34 58 34 73 110 118 97 108 105 100 32 115 105 103 110 97 116 117 114 101 32 102 111 114 32 115 101 114 118 101 114 32 104 111 115 116 46 100 111 99 107 101 114 46 105 110 116 101 114 110 97 108 58 52 52 54 57 55 32 119 105 116 104 32 107 101 121 32 101 100 50 53 53 49 57 58 99 111 109 112 108 101 109 101 110 116 58 32 85 110 97 98 108 101 32 116 111 32 118 101 114 105 102 121 32 115 105 103 110 97 116 117 114 101 32 102 111 114 32 104 111 115 116 46 100 111 99 107 101 114 46 105 110 116 101 114 110 97 108 58 52 52 54 57 55 58 32 60 99 108 97 115 115 32 39 110 97 99 108 46 101 120 99 101 112 116 105 111 110 115 46 66 97 100 83 105 103 110 97 116 117 114 101 69 114 114 111 114 39 62 32 83 105 103 110 97 116 117 114 101 32 119 97 115 32 102 111 114 103 101 100 32 111 114 32 99 111 114 114 117 112 116 34 125] msg=Failed to PUT JSON (hostname "hs1" path "/_matrix/federation/v1/send/complement-879472265") code=401 wrapped=M_UNAUTHORIZED: Invalid signature for server host.docker.internal:44697 with key ed25519:complement: Unable to verify signature for host.docker.internal:44697: <class 'nacl.exceptions.BadSignatureError'> Signature was forged or corrupt

which might be due to the way we re-use the same Synapse deployment but spin Complement homeservers up and down across multiple tests. Each time we spin a Complement homeserver up, it probably ends up with a different signing key. When a new Complement server reuses the hostname and port of a previous one, Synapse presumably still associates that server name with the old key, so I'd expect signature checks, and hence the tests, to fail.
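
The suspected failure mode can be reproduced in isolation. The sketch below is an assumption-laden model, not Synapse's or Complement's actual code: `keyCache` is a hypothetical stand-in for a long-lived homeserver that remembers the first verify key it saw for each server name, and the two key pairs stand in for two short-lived Complement servers that happen to share `host.docker.internal:44697`.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// keyCache (hypothetical) maps a server name to the first verify key seen for it.
type keyCache map[string]ed25519.PublicKey

// verify checks sig against the cached key for serverName. If nothing is
// cached yet, it trusts and stores the key the server currently advertises.
func (c keyCache) verify(serverName string, advertised ed25519.PublicKey, msg, sig []byte) bool {
	key, ok := c[serverName]
	if !ok {
		key = advertised
		c[serverName] = key
	}
	return ed25519.Verify(key, msg, sig)
}

// simulateReuse signs the same payload with two different key pairs under one
// server name and reports whether each signature was accepted.
func simulateReuse() (firstOK, secondOK bool, err error) {
	const serverName = "host.docker.internal:44697"
	cache := keyCache{}
	msg := []byte(`{"pdus":[]}`)

	// First test server: its key gets cached, so its signature verifies.
	pub1, priv1, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	firstOK = cache.verify(serverName, pub1, msg, ed25519.Sign(priv1, msg))

	// Second test server: same name, fresh key, stale cache entry.
	pub2, priv2, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	secondOK = cache.verify(serverName, pub2, msg, ed25519.Sign(priv2, msg))
	return firstOK, secondOK, nil
}

func main() {
	first, second, err := simulateReuse()
	if err != nil {
		panic(err)
	}
	fmt.Println("first server accepted:", first)
	fmt.Println("second server accepted:", second)
}
```

Under these assumptions the second server's transaction is rejected even though it is correctly signed with its own key, which would match the `BadSignatureError` in the log.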

squahtx added the A-Federated-Join label on Sep 30, 2022
squahtx self-assigned this on Sep 30, 2022
squahtx changed the title from "Complement flake(?) Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes" to "Complement flake(?) TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes" on Oct 3, 2022
squahtx changed the title from "Complement flake(?) TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes" to "Complement flake(?) TestPartialStateJoin/* due to complement hostname:port re-use" on Oct 4, 2022
@squahtx
Contributor

squahtx commented Oct 4, 2022

matrix-org/complement#486 may not be sufficient, since it only handles homeserver keys.
We still end up re-using room IDs across tests, which is going to break things.
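
To illustrate the room ID concern: the log above shows a room named `!0:host.docker.internal:37495`, which suggests IDs built from a per-server counter plus the server name. The sketch below assumes exactly that scheme; `fakeServer` and `makeRoomID` are hypothetical names for illustration, not Complement's real types.

```go
package main

import "fmt"

// fakeServer (hypothetical) models a test homeserver that numbers its rooms
// from zero, as the "!0:host.docker.internal:37495" log line suggests.
type fakeServer struct {
	name     string
	nextRoom int
}

// makeRoomID returns the next room ID in the assumed "!<counter>:<name>" form.
func (s *fakeServer) makeRoomID() string {
	id := fmt.Sprintf("!%d:%s", s.nextRoom, s.name)
	s.nextRoom++
	return id
}

func main() {
	// Two servers from different tests that were handed the same port.
	earlier := &fakeServer{name: "host.docker.internal:37495"}
	later := &fakeServer{name: "host.docker.internal:37495"}

	a, b := earlier.makeRoomID(), later.makeRoomID()
	fmt.Println(a, b, "collide:", a == b)
}
```

If the long-lived Synapse already knows the earlier room under that ID, a join to the later room would be treated as the same room, regardless of which signing key is in use.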

DMRobertson pushed a commit to matrix-org/complement that referenced this issue Feb 14, 2023
* Test that Synapse will purge a room during resync

* Update tests/federation_room_join_partial_state_test.go

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Don't request /members at the end of the test

And ensure we don't flake in the style of
matrix-org/synapse#13975

* Repeat last /sync query to cope with worker races

* Ignore leave events sent to us after purge

---------

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2 participants