Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidates WAL entry for static role if password policy has changed #56

Merged
merged 2 commits into from
Mar 14, 2023

Conversation

austingebauer
Copy link
Member

Overview

WAL entries created by this secrets engine are the result of partial failure when trying to rotate a password for a static role. If the rotation partially fails, the system will continuously try to roll forward by reusing the password in the WAL entry. This behavior creates a problem when password constraints on the LDAP server are changed.

To give users an option to fix this, this change invalidates WAL entries for static roles when the configuration password_policy has changed. This causes the system to generate a new password instead of reusing the WAL's password that no longer meets the constraints of the LDAP server.

Note that this only works when the named password policy has changed (say "my-policy-1" to "my-policy-2"). It doesn't solve the problem where the contents of the named password policy have changed (say "my-policy-1" has a character set or length change). We don't have a good way to detect that a password policy has a new revision, so solving this for content changes is a larger effort that will involve code changes on the Vault side.

I verified that the test I wrote fails prior to the change in this PR:

$ go test -run=TestPasswordPolicyModificationInvalidatesWAL
2023-03-08T08:59:40.620-0800 [INFO]  initializing database rotation queue
2023-03-08T08:59:40.620-0800 [INFO]  populating role rotation queue
2023-03-08T08:59:40.620-0800 [DEBUG] no WAL entries found
2023-03-08T08:59:40.620-0800 [INFO]  starting periodic ticker
2023-03-08T08:59:40.620-0800 [DEBUG] wrote WAL: role=hashicorp WAL ID=1cdaf653-a445-5a89-c734-bcbd92cb26ff
2023-03-08T08:59:40.620-0800 [DEBUG] deleted WAL: WAL ID=1cdaf653-a445-5a89-c734-bcbd92cb26ff
2023-03-08T08:59:40.621-0800 [DEBUG] wrote WAL: role=hashicorp WAL ID=484e20cd-72ac-d9ae-ff4c-f37f8f34de55
2023-03-08T08:59:40.621-0800 [WARN]  unable to rotate credentials in rotate-role: error="forced error"
2023-03-08T08:59:40.621-0800 [DEBUG] deleted WAL: WAL ID=484e20cd-72ac-d9ae-ff4c-f37f8f34de55
--- FAIL: TestPasswordPolicyModificationInvalidatesWAL (0.00s)
    rotation_test.go:167: expected TestPolicy2Password, got TestPolicy1Password
2023-03-08T08:59:40.621-0800 [INFO]  stopping periodic ticker
FAIL
exit status 1
FAIL    github.com/hashicorp/vault-plugin-secrets-openldap      0.220s

Contributor Checklist

  • Add relevant docs to upstream Vault repository, or sufficient reasoning why docs won’t be added yet
  • Add output for any tests not ran in CI to the PR description (eg, acceptance tests)
  • Backwards compatible

@austingebauer austingebauer requested a review from a team March 8, 2023 17:38
@austingebauer
Copy link
Member Author

One way to have an automated solution for the password policy content change would be to discard the WAL entry after we've tried to apply it for some period of time (break circuit?). In that case, we'd regenerate the password with the current password policy contents. This would require no code change on the Vault side.

@robmonte
Copy link
Member

robmonte commented Mar 9, 2023

Can you explain the failure versus the WAL entry a little more?

Is a change to the password constraints the thing causing the rotation to fail initially? Or is this a case where there's a coincidental situation: the password constraints just happen to change right around the time an unrelated rotation issue occurs which causes a WAL entry to be written?

Or are both wrong?

@fairclothjm
Copy link
Contributor

@austingebauer

break circuit

That sounds like it could work. In the case where the regenerated password also is not applied due to a failure would we bail or will the system continuously try to roll forward?

@austingebauer
Copy link
Member Author

@robmonte - Yep, a change to the password constraint is the thing causing the rotation to fail initially. It could also be a coincidental situation, but that would be much more rare and hard to reproduce.

@austingebauer
Copy link
Member Author

austingebauer commented Mar 14, 2023

@fairclothjm - I was thinking the system would continuously try to roll forward until the new WAL entry (with the newly generated password) itself hits the threshold. So all WAL entries would be subject to this threshold.

@austingebauer
Copy link
Member Author

This PR alone gives customers an immediate workaround, so I'm going to merge this without the changes discussed about setting a threshold for WAL retries. I will create an additional task to address the password policy content changes via that approach.

@austingebauer austingebauer merged commit 812ad22 into main Mar 14, 2023
@austingebauer austingebauer deleted the fix-static-role-password-policy-change branch March 14, 2023 15:47
@robmonte
Copy link
Member

robmonte commented Mar 14, 2023

@austingebauer Thanks for answering. I know you already merged but I agree with this solution for an immediate workaround.

Regarding a longer-term automated fix - is there a reason we must store the new password at the time of the WAL entry being written? When a static role rotation WAL entry gets processed could it simply always generate a new password at that moment using the current password policy? Maybe that would circumvent this issue altogether, if possible.

@austingebauer
Copy link
Member Author

austingebauer commented Mar 15, 2023

@robmonte - I considered the idea of simply generating a new password each rotation attempt as well. Afaict, the same password is reused so that Vault storage will eventually contain the same password that was applied to the external system before things partially failed. The reuse of the same password in a roll-forward fashion is very engrained into the code and WAL handling at this point, so I'm hesitant to change it.

austingebauer added a commit that referenced this pull request Mar 17, 2023
…56)

* Invalidates WAL entry for static role if password policy has changed

* improve test output for failures
austingebauer added a commit that referenced this pull request Mar 20, 2023
…56)

* Invalidates WAL entry for static role if password policy has changed

* improve test output for failures
austingebauer added a commit that referenced this pull request Mar 20, 2023
…licy has changed (#59)

* Invalidates WAL entry for static role if password policy has changed (#56)

* Invalidates WAL entry for static role if password policy has changed

* improve test output for failures

* Fixes decoding of prior WAL entries with missing password_policy field (#57)
austingebauer added a commit that referenced this pull request Mar 20, 2023
…licy has changed (#58)

* Invalidates WAL entry for static role if password policy has changed (#56)

* Invalidates WAL entry for static role if password policy has changed

* improve test output for failures

* Fixes decoding of prior WAL entries with missing password_policy field (#57)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants