Fix re-entrant `GetOrHandshake` issues #1044

nbrownus · 2023-12-17T03:23:38Z

@wadey hit a deadlock on v1.8.0 and was able to pull the stack trace. HandshakeManager.GetOrHandshake can be re-entered via HandshakeManager.StartHandshake through a call to hm.lightHouse.QueryServer(). Aside from the double read lock on the main hostmap not being great, the ConnectionManager goroutine had fired between the first and second calls to HandshakeManager.GetOrHandshake and was waiting on a write lock for the main hostmap, which blocks any future read locks, including the nested one.
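To illustrate the lock ordering described above, here is a minimal sketch that is not taken from the Nebula code. It assumes Go 1.18 or newer for `sync.RWMutex.TryRLock`, and the function name `nestedReadBlocked` is hypothetical. It holds one read lock, lets a writer queue behind it, and then probes with `TryRLock` rather than `RLock` so the program reports the condition without actually deadlocking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadBlocked reports whether a second read lock would block once a
// writer has queued behind the first read lock. Hypothetical helper, for
// illustration only.
func nestedReadBlocked() bool {
	var mu sync.RWMutex

	mu.RLock() // outer read lock, standing in for the first GetOrHandshake call

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // standing in for ConnectionManager wanting the write lock
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer has announced itself. From then on a real
	// mu.RLock() here would wait for the writer, and the writer waits for us.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if mu.TryRLock() {
			mu.RUnlock() // writer not queued yet, release the probe and retry
			time.Sleep(time.Millisecond)
			continue
		}
		blocked = true
		break
	}

	mu.RUnlock() // release the outer read lock so the writer can finish
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested read lock would block:", nestedReadBlocked())
}
```

Replacing the `TryRLock` probe with a plain `RLock` reproduces the hang: the nested reader waits for the writer, and the writer waits for the outer reader.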

This is fixed by adding a channel and handling the actual lighthouse queries in a goroutine. It should also speed up the hot path when many handshakes are occurring.
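The shape of that fix can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `queryQueue`, `newQueryQueue`, and the `sent` slice are hypothetical names, and the real worker would send a lighthouse query where the sketch only records the address.

```go
package main

import (
	"fmt"
	"sync"
)

// queryQueue is a hypothetical stand-in for the lighthouse query path.
type queryQueue struct {
	ch   chan uint32
	wg   sync.WaitGroup
	mu   sync.Mutex
	sent []uint32 // records what the worker processed, for illustration
}

func newQueryQueue(size int) *queryQueue {
	q := &queryQueue{ch: make(chan uint32, size)}
	q.wg.Add(1)
	go q.worker()
	return q
}

// worker drains the channel on its own goroutine, so whatever the query
// needs to lock is never acquired on the caller's stack.
func (q *queryQueue) worker() {
	defer q.wg.Done()
	for ip := range q.ch {
		q.mu.Lock()
		q.sent = append(q.sent, ip) // real code would send the query here
		q.mu.Unlock()
	}
}

// QueryServer only enqueues; it does no query work and cannot re-enter the caller.
func (q *queryQueue) QueryServer(ip uint32) {
	q.ch <- ip
}

// Close stops the worker and returns everything it processed.
func (q *queryQueue) Close() []uint32 {
	close(q.ch)
	q.wg.Wait()
	return q.sent
}

func main() {
	q := newQueryQueue(4)
	q.QueryServer(1)
	q.QueryServer(2)
	fmt.Println(q.Close())
}
```

Because the caller only performs a channel send, the handshake path returns quickly, which is where the hot-path speedup would come from.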

There is also a case in ConnectionManager where testing a tunnel that uses a relay results in a double read lock.

This is fixed by turning the test packet into a traffic decision result and handling it outside of the read lock.
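A minimal sketch of that "decide under the lock, act after releasing it" pattern, under stated assumptions: the types `decision` and `hostMap`, the `hosts` field, and the functions `decide` and `check` are all hypothetical and only mirror the idea, not the ConnectionManager code.

```go
package main

import (
	"fmt"
	"sync"
)

type decision int

const (
	doNothing decision = iota
	sendTestPacket
)

// hostMap is a hypothetical stand-in for the main hostmap.
type hostMap struct {
	sync.RWMutex
	hosts map[uint32]bool // vpnIp -> tunnel needs testing (illustrative only)
}

// decide inspects state under the read lock and returns what should happen.
// It performs no I/O and calls nothing that could take the lock again.
func (hm *hostMap) decide(ip uint32) decision {
	hm.RLock()
	defer hm.RUnlock()
	if hm.hosts[ip] {
		return sendTestPacket
	}
	return doNothing
}

// check acts on the decision only after the read lock has been released,
// so send is free to read-lock the hostmap itself (e.g. for a relay lookup).
func check(hm *hostMap, ip uint32, send func(uint32)) decision {
	d := hm.decide(ip)
	if d == sendTestPacket {
		send(ip)
	}
	return d
}

func main() {
	hm := &hostMap{hosts: map[uint32]bool{1: true}}
	send := func(ip uint32) {
		hm.RLock() // not nested: decide has already released its lock
		defer hm.RUnlock()
		fmt.Println("sending test packet to", ip)
	}
	check(hm, 1, send)
	check(hm, 2, send)
}
```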

My primary concern is handling the QueryServer writes on a nonblocking buffered channel. I think we will want to block when full, but I am leaving it nonblocking for now for review.
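For readers less familiar with the two options being weighed, here is a small sketch of the difference. The helper names `tryEnqueue` and `enqueue` are hypothetical and not from the PR.

```go
package main

import "fmt"

// tryEnqueue is the nonblocking variant: when the buffer is full the request
// is dropped and the caller is told so.
func tryEnqueue(ch chan<- uint32, ip uint32) bool {
	select {
	case ch <- ip:
		return true
	default:
		return false
	}
}

// enqueue is the blocking variant: when the buffer is full the caller waits
// until the consumer makes room, so no request is lost.
func enqueue(ch chan<- uint32, ip uint32) {
	ch <- ip
}

func main() {
	ch := make(chan uint32, 1)

	fmt.Println(tryEnqueue(ch, 1)) // true: the buffer had room
	fmt.Println(tryEnqueue(ch, 2)) // false: the buffer is full, request dropped

	<-ch           // the consumer drains one entry
	enqueue(ch, 3) // returns immediately because there is room again
	fmt.Println(<-ch) // 3
}
```

The nonblocking form never stalls the caller but silently loses queries under load unless the drop is logged. The blocking form applies backpressure, at the cost of stalling the caller whenever the consumer falls behind.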

wadey

approved with a small comment

wadey · 2023-12-18T20:19:35Z

lighthouse.go

+		if lh.l.Level >= logrus.DebugLevel {
+			lh.l.WithField("vpnIp", ip).Debug("Lighthouse query buffer was full, dropping request")
+		}


I wonder if this should be higher than debug, since without debug logs on it would be hard to tell this is happening and that you need to increase the buffer.

I think it might be better to just make this a blocking write to a buffered channel

lighthouse.go

brad-defined

I might have misread it, but I think the implementation may block when the channel is full.

Dont hold a read lock on the main hostmap when starting a handshake

7e846ce

salesforce-cla bot added the cla:signed label Dec 17, 2023

nbrownus added 2 commits December 17, 2023 09:43

Sending a test packet could trigger a relay lookup and a double hostmap read lock

2fd061b

Break the synchronous call in QueryServer to avoid double read locking

40dab57

nbrownus changed the title ~~Dont hold a read lock on the main hostmap when starting a handshake~~ Fix re-entrant GetOrHandshake issues Dec 18, 2023

wadey previously approved these changes Dec 18, 2023

Blocking write to a buffered channel instead

605218a

nbrownus dismissed wadey’s stale review via 605218a December 18, 2023 21:41

wadey previously approved these changes Dec 18, 2023

wadey added this to the v1.8.1 milestone Dec 18, 2023

brad-defined reviewed Dec 19, 2023

lighthouse.go

brad-defined requested changes Dec 19, 2023

Add an early exit for lighthouse ips

7df928a

nbrownus dismissed wadey’s stale review via 7df928a December 19, 2023 16:44

brad-defined approved these changes Dec 19, 2023

wadey approved these changes Dec 19, 2023

nbrownus merged commit 072edd5 into master Dec 19, 2023
7 checks passed

nbrownus deleted the reentrant-getorhandshake branch December 19, 2023 17:58

wadey self-assigned this Apr 11, 2024

wadey mentioned this pull request Apr 11, 2024

avoid deadlock in lighthouse queryWorker #1112

Merged
