This repository has been archived by the owner on Jun 20, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be right to loop through the policy templates and pick the matching one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure about that. It looks like weave only creates policies with one template but a
range
would look a little neater.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comment #3666 (comment) From my understanding of the logs shared in #3666 it ended up creating two connections and creating corresponding two policies. Eventually one connections is deleted (
connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11)
) which triggered deleting xfrm policy which as side effect was deleting the unintended policy as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see weave does a policy update instead of creating a new policy so there are never two policies for a given src/dst pair. Trying to create two such policies on the terminal fails for me:
As far as I can tell
XfrmPolicyDel
andXfrmPolicyGet
can only delete/get a single policy at a time.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some debug statements to make sure of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I checked the code. You are right. Its just update.
From what I see root cause is
connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11)
so there are two connections gets established between peers. Resulting in two xfrm policies attempted to created/updated. Eventually one version of policy (corresponding to a connection) overwrites the other. When redundant connection is shutdown corresponding xfrm policy is getting destroyed.It seems to me its still possible we may end up with no xfrm policy. This can happen if we have xfrm policy in place corresponds to connection that is getting shutdown, in which case policy gets deleted due to matching spi.
Will check in case of multiple connection which of the connection (or order) is dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
XfrmPolicyUpdate
callsxfrmPolicyAddOrUpdate
internally so I doubt we could end up without a policy. I'm pretty sure the case where the first connection is completely shut down when the second connection is created was the working case before my changes.