pgwire,security: auto-upgrade bcrypt hashes to SCRAM-SHA-256 (scram 10/10) #74850
Conversation
(force-pushed from 70aca27 to 2ba828b)
(force-pushed from 1362b23 to 9c44b2c)
@ajwerner can you double-check what I did to sql/users.go and see if that follows the idea you had
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @knz, and @rafiss)
pkg/sql/user.go, line 681 at r3 (raw file):
return DescsTxn(
	ctx, execCfg,
	func(ctx context.Context, txn *kv.Txn, d *descs.Collection) error {
I quarrel only with this choice of wrapping, and that is a nit. Logically looks fine to me.
The below wrapping will lose you one level of indentation
return DescsTxn(ctx, execCfg, func(
ctx context.Context, txn *kv.Txn, d *descs.Collection,
) error {
Code quote:
return DescsTxn(
ctx, execCfg,
func(ctx context.Context, txn *kv.Txn, d *descs.Collection) error {
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @rafiss)
pkg/sql/user.go, line 681 at r3 (raw file):
Previously, ajwerner wrote…
I quarrel only with this choice of wrapping, and that is a nit. Logically looks fine to me.
The below wrapping will lose you one level of indentation
return DescsTxn(ctx, execCfg, func(
	ctx context.Context, txn *kv.Txn, d *descs.Collection,
) error {
Thanks, done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @knz)
-- commits, line 11 at r5:
what's the plan for migrating passwords for users that are used very infrequently? i.e., if a user goes years without logging in? it seems we can't do a long-running migration during the cluster upgrade, since we also want the auto-upgrade decision to respect the cluster setting
pkg/security/password.go, line 852 at r5 (raw file):
// if the new cost is high and the operation takes a long time, so
// the operator knows what's up.
log.Infof(ctx, "hash conversion: computing a SCRAM hash with iteration count %d (from bcrypt cost %d)",
	scramIterCount, bcryptCost)
should this be in the USER_ADMIN logging channel since it's a "Change to authentication credentials"?
pkg/security/password.go, line 864 at r5 (raw file):
// - it checks that we're able to re-parse what we just generated,
//   which is a safety mechanism to ensure the result is valid.
// - it converts the raw bytes to the PasswordHash interface, which
it doesn't seem like `PasswordHash` is used in the return types
pkg/sql/user.go, line 678 at r5 (raw file):
runFn := getUserInfoRunFn(execCfg, username, "set-hash-timeout")
return runFn(ctx, func(ctx context.Context) error {
does this mean that the `updateUserPasswordHash` step will happen with a separate timeout from the authentication flow? doesn't that mean that a login can now take twice as long as the configured timeout? should updating the hash be done in a background thread?
We can't do an automatic background migration because we need the cleartext password to upgrade. There's no way to do this automatically without a successful cleartext login. Our recommendation in docs will be that operators who care about migrating everyone need to re-issue credentials to everyone.
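To make that constraint concrete, here is a hedged sketch of the decision the login path has to make. All identifier names here are illustrative, not the actual ones in pkg/sql/user.go: the conversion needs the cleartext password, so it can only be considered right after a successful cleartext authentication, and it is gated on the stored hash method, the configured target method, and the new cluster setting.

```go
package main

import "fmt"

// upgradePlan captures the auto-upgrade decision made after a
// successful cleartext password authentication. All names are
// illustrative; the real logic lives in pkg/security and pkg/sql.
type upgradePlan struct {
	storedMethod   string // hash method of the stored credential
	targetMethod   string // server.user_login.password_encryption
	upgradeEnabled bool   // server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled
}

// shouldConvert reports whether the stored hash should be re-encoded
// as SCRAM-SHA-256. Because the conversion requires the cleartext
// password, it can only run right after a cleartext login succeeds.
func (p upgradePlan) shouldConvert() bool {
	return p.storedMethod == "crdb-bcrypt" &&
		p.targetMethod == "scram-sha-256" &&
		p.upgradeEnabled
}

func main() {
	p := upgradePlan{
		storedMethod:   "crdb-bcrypt",
		targetMethod:   "scram-sha-256",
		upgradeEnabled: true,
	}
	fmt.Println(p.shouldConvert()) // a bcrypt hash converts on this login
}
```

A user who never logs in again never hits this code path, which is why the docs recommendation above is to re-issue credentials when full migration matters.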
(force-pushed from fdff4d7 to 7ca14c9)
(force-pushed from 8f1ae00 to 55b9a2c)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)
pkg/security/password.go, line 852 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
should this be in the USER_ADMIN logging channel since it's a "Change to authentication credentials"?
I was thinking this was just a debugging message, for when things could be unusually slow. The final event is logged to USER_ADMIN by the function in the `sql` package, only after the UPDATE has completed successfully.
Added an explanatory comment. What do you think?
pkg/security/password.go, line 864 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
it doesn't seem like `PasswordHash` is used in the return types
Good call. This comment was stale. Fixed.
Our recommendation in docs will be that operators who care about migrating everyone need to re-issue credentials to everyone.
sounds good
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)
pkg/security/password.go, line 852 at r5 (raw file):
Previously, knz (kena) wrote…
I was thinking this was just a debugging message, for when things could be unusually slow. The final event is logged to USER_ADMIN by the function in the `sql` package, only after the UPDATE has completed successfully.
Added an explanatory comment. What do you think?
got it, sounds great
pkg/sql/user.go, line 678 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
does this mean that the `updateUserPasswordHash` step will happen with a separate timeout from the authentication flow? doesn't that mean that a login can now take twice as long as the configured timeout? should updating the hash be done in a background thread?
ping on this question
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)
pkg/sql/user.go, line 678 at r5 (raw file):
oh I hadn't seen this.
Yes, the current implementation has this property, which is also highlighted in the release note:
This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.
When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.
We could do this in a "background thread", although there are two separate concerns to solve:
- we should avoid storms of concurrent goroutines. If we start doing asynchronous stuff for this, we'll need a queue / semaphore.
- this then also raises complicated questions: what to do if the same username starts multiple sessions concurrently? With the current code, most will pick up the change (thanks to the table version bump) after the first one has logged in, and the UPDATE statement won't even be run. If we spawn the work asynchronously, the detection will not be immediate and we might end up running many UPDATEs concurrently.
(NB: this last point is not a correctness issue, thanks to the WHERE clause. We're simply talking about load here.)
How do you feel about this? Do you think that a one-off latency bump, with a workaround to issue ALTER ROLE WITH PASSWORD out-of-band to avoid it, is unacceptable?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)
pkg/sql/user.go, line 678 at r5 (raw file):
Previously, knz (kena) wrote…
oh I hadn't seen this.
Yes, the current implementation has this property, which is also highlighted in the release note:
This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.
When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.
We could do this in a "background thread", although there are two separate concerns to solve:
- we should avoid storms of concurrent goroutines. If we start doing asynchronous stuff for this, we'll need a queue / semaphore.
- this then also raises complicated questions: what to do if the same username starts multiple sessions concurrently? With the current code, most will pick up the change (thanks to the table version bump) after the first one has logged in, and the UPDATE statement won't even be run. If we spawn the work asynchronously, the detection will not be immediate and we might end up running many UPDATEs concurrently.
(NB: this last point is not a correctness issue, thanks to the WHERE clause. We're simply talking about load here.) How do you feel about this? Do you think that a one-off latency bump, with a workaround to issue ALTER ROLE WITH PASSWORD out-of-band to avoid it, is unacceptable?
good point on the one-off latency bump -- with that i think the 2x increase sounds fine
NB: I'm holding off merging this PR until all the PRs in the sequence are ready (otherwise, the entire chain will be auto-closed by GitHub).
(force-pushed from 1fbbceb to 5b6c237)
Release note (security update): In order to promote a transition to SCRAM-SHA-256 for password authentication, CockroachDB now automatically attempts to convert stored password hashes to SCRAM-SHA-256 after a cleartext password authentication succeeds, if the target hash method configured via `server.user_login.password_encryption` is `scram-sha-256`. This auto-conversion can happen either during SQL logins or HTTP logins that use passwords, whichever happens first. When it happens, a structured event of type `password_hash_converted` is logged to the SESSIONS channel.

The PBKDF2 iteration count on the hash is chosen to preserve the latency of client logins, so that it remains similar to the latency incurred from the starting bcrypt cost. (For example, the default configuration of bcrypt cost 10 is converted to a SCRAM iteration count of 119680.) This choice, however, lowers the cost of brute-forcing passwords for an attacker with access to the encoded password hashes, if they have access to ASICs or GPUs, by a factor of ~10. For example, if it would previously cost them $1M to brute-force a `crdb-bcrypt` hash, it would now cost them "just" $100k to brute-force the `scram-sha-256` hash that results from this conversion.

If an operator wishes to compensate for this, three options are available:

1. They can set up their infrastructure such that only passwords with high entropy can be used. This can be achieved by e.g. disabling the ability of end-users to select their own passwords and auto-generating passwords for them, or enforcing entropy checks during password selection. This way, the entropy of the password itself compensates for the lower hash complexity.
2. They can manually select a higher SCRAM iteration count. This can be done either by pre-computing SCRAM hashes client-side and providing the precomputed hash using ALTER USER WITH PASSWORD, or by adjusting the cluster setting `server.user_login.password_hashes.default_cost.scram_sha_256` and asking CockroachDB to recompute the hash.
3. They can disable the auto-conversion of `crdb-bcrypt` hashes to `scram-sha-256` altogether, using the new cluster setting `server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled`. This approach is discouraged, however, as it removes the other security protections offered by SCRAM authentication.

NB: the conversion only happens if the target method configured via `server.user_login.password_encryption` is `scram-sha-256`, because the goal of the conversion is to move clusters towards that configuration. This acknowledges cases where the operator had a legitimate reason to select another value, as well as future versions of CockroachDB that may use a new authentication method.

Release note (sql change): After a cluster upgrade, the first time a SQL client logs in using password authentication, the password will be converted to a new format (`scram-sha-256`) if it was previously encoded with `crdb-bcrypt`. This conversion will increase the latency of that initial login by a factor of ~2x, but it will be reduced again after the conversion completes. When login latency is a concern, operators are invited to perform the password conversion ahead of time, by computing new SCRAM hashes for the clients via ALTER USER/ROLE WITH PASSWORD. This conversion can also be disabled via the new cluster setting `server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled`.
(PR peeled away from #74301; previous PR in series #74849)
Epic CRDB-5349