Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgwire,security: auto-upgrade bcrypt hashes to SCRAM-SHA-256 (scram 10/10) #74850

Merged
merged 1 commit into from
Jan 21, 2022

Conversation

knz
Copy link
Contributor

@knz knz commented Jan 14, 2022

(PR peeled away from #74301; previous PR in series #74849)
Epic CRDB-5349

Release note (security update): In order to promote a transition
to SCRAM-SHA-256 for password authentication, CockroachDB now
automatically attempts to convert stored password hashes
after a cleartext password authentication
succeeds to SCRAM-SHA-256, if the target hash method configured via
server.user_login.password_encryption is scram-sha-256.

This auto-conversion can happen either during SQL logins
or HTTP logins that use passwords, whichever happens first.

When it happens, a structured event of type password_hash_converted
is logged to the SESSIONS channel.

The PKBDF2 iteration count on the hash is chosen in order to preserve
the latency of client logins, to remain similar to the latency
incurred from the starting bcrypt cost.
(For example, the default configuration of bcrypt cost 10 is converted
to a SCRAM iteration count of 119680.)

This choice, however, lowers the cost of bruteforcing passwords for an
attacker with access to the encoded password hashes, if they have
access to ASICs or GPUs, by a factor of ~10. For example, if it would
previously cost them $1M to bruteforce a crdb-bcrypt hash, it would
now cost them "just" $100k to bruteforce the scram-sha-256 hash that
results from this conversion.

If an operator wishes to compensate for this, three options are
available:

  1. they can set up their infrastructure such that only passwords with
    a high entropy can be used. This can be achieved by e.g. disabling
    the ability of end-users to select their own passwords and
    auto-generating passwords for them, or enforce some entropy checks
    during password selection. This way, the entropy of the password
    itself compensates for the lower hash complexity.

  2. they can manually select a higher SCRAM iteration count. This can
    be done either by pre-computing SCRAM hashes client-side and providing
    the precomputed hash using ALTER USER WITH PASSWORD; or adjusting the
    cluster setting
    server.user_login.password_hashes.default_cost.scram_sha_256 and
    asking CockroachDB to recompute the hash.

  3. they can disable the auto-conversion of crdb-bcrypt hashes to
    scram-sha-256 altogether, using the new cluster setting
    server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled.
    This approach is discouraged however, as it removes the other
    security protections offered by SCRAM authentication.

NB: the conversion also only happens if the target configured method
via server.user_login.password_encryption is scram-sha-256,
because the goal of the conversion is to move clusters towards that
configuration. This acknowledges cases when the operator had a
legitimate reason to select another value and future versions of of
CockroachDB that may use a new authentication method.

Release note (sql change): After a cluster upgrade, the first time a
SQL client logs in using password authentication, the password will be
converted to a new format (scram-sha-256) if it was encoded with
crdb-bcrypt previously.

This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.

When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.

This conversion can also be disabled via the new cluster setting
server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz changed the title wip10 pgwire,security: auto-upgrade bcrypt hashes to SCRAM-SHA-256 Jan 14, 2022
@knz knz requested review from rafiss, bdarnell and a team January 14, 2022 11:00
@knz knz marked this pull request as ready for review January 14, 2022 11:01
@knz knz requested a review from a team as a January 14, 2022 11:01
@knz knz requested a review from a team as a January 14, 2022 11:13
@knz knz force-pushed the 20211228-scram10 branch 2 times, most recently from 70aca27 to 2ba828b Compare January 14, 2022 11:15
@knz knz changed the title pgwire,security: auto-upgrade bcrypt hashes to SCRAM-SHA-256 pgwire,security: auto-upgrade bcrypt hashes to SCRAM-SHA-256 (scram 10/10) Jan 14, 2022
@knz knz force-pushed the 20211228-scram9 branch 2 times, most recently from 1362b23 to 9c44b2c Compare January 14, 2022 19:18
@knz
Copy link
Contributor Author

knz commented Jan 14, 2022

@ajwerner can you double check what I did to sql/users.go and see if that follows the idea you had

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @knz, and @rafiss)


pkg/sql/user.go, line 681 at r3 (raw file):

		return DescsTxn(
			ctx, execCfg,
			func(ctx context.Context, txn *kv.Txn, d *descs.Collection) error {

I quarrel only with this choice of wrapping, and that is a nit. Logically looks fine to me.

The below wrapping will lose you one level of indentation

		return DescsTxn(ctx, execCfg, func(
			ctx context.Context, txn *kv.Txn, d *descs.Collection,
		) error {

Code quote:

		return DescsTxn(
			ctx, execCfg,
			func(ctx context.Context, txn *kv.Txn, d *descs.Collection) error {

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @rafiss)


pkg/sql/user.go, line 681 at r3 (raw file):

Previously, ajwerner wrote…

I quarrel only with this choice of wrapping, and that is a nit. Logically looks fine to me.

The below wrapping will lose you one level of indentation

		return DescsTxn(ctx, execCfg, func(
			ctx context.Context, txn *kv.Txn, d *descs.Collection,
		) error {

Thanks, done.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, and @knz)


-- commits, line 11 at r5:
what's the plan for migrating passwords for users that are used very infrequently? i.e., if a user goes years without logging in? it seems we can't do a long-running migration during the cluster upgrade, since we also want the auto-upgrade decision to respect the cluster setting


pkg/security/password.go, line 852 at r5 (raw file):

		// if the new cost is high and the operation takes a long time, so
		// the operator knows what's up.
		log.Infof(ctx, "hash conversion: computing a SCRAM hash with iteration count %d (from bcrypt cost %d)", scramIterCount, bcryptCost)

should this be in the USER_ADMIN logging channel since it's a "Change to authentication credentials"?


pkg/security/password.go, line 864 at r5 (raw file):

	// - it checks that we're able to re-parse what we just generated,
	//   which is a safety mechanism to ensure the result is valid.
	// - it converts the raw bytes to the PasswordHash interface, which

it doesn't seem like PasswordHash is used in the return types


pkg/sql/user.go, line 678 at r5 (raw file):

	runFn := getUserInfoRunFn(execCfg, username, "set-hash-timeout")

	return runFn(ctx, func(ctx context.Context) error {

does this mean that the updateuserPasswordHash step will happen with a separate timeout than the authentication flow? doesn't that mean that a login can now take twice as long as the configured timeout? should updating the hash be done in a background thread?

@knz
Copy link
Contributor Author

knz commented Jan 17, 2022

what's the plan for migrating passwords for users that are used very infrequently? i.e., if a user goes years without logging in? it seems we can't do a long-running migration during the cluster upgrade

We can't do an automatic background migration because we need the cleartext password to upgrade. There's no way to do this automatically without a successful cleartext login.

Our recommendation in docs will be that operators who care about migrating everyone need to re-issue credentials to everyone.

@knz knz force-pushed the 20211228-scram9 branch 2 times, most recently from fdff4d7 to 7ca14c9 Compare January 19, 2022 14:10
@knz knz force-pushed the 20211228-scram10 branch 2 times, most recently from 8f1ae00 to 55b9a2c Compare January 19, 2022 14:14
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)


pkg/security/password.go, line 852 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

should this be in the USER_ADMIN logging channel since it's a "Change to authentication credentials"?

I was thinking this was just a debugging message, for when things could be unusually slow. The final event is logged to USER_ADMIN by the function in the sql package, only after the UPDATE has completed successfully.
Added an explanatory comment. What do you think?


pkg/security/password.go, line 864 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

it doesn't seem like PasswordHash is used in the return types

Good call. This comment was stale. Fixed.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our recommendation in docs will be that operators who care about migrating everyone need to re-issue credentials to everyone.

sounds good

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)


pkg/security/password.go, line 852 at r5 (raw file):

Previously, knz (kena) wrote…

I was thinking this was just a debugging message, for when things could be unusually slow. The final event is logged to USER_ADMIN by the function in the sql package, only after the UPDATE has completed successfully.
Added an explanatory comment. What do you think?

got it, sounds great


pkg/sql/user.go, line 678 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

does this mean that the updateuserPasswordHash step will happen with a separate timeout than the authentication flow? doesn't that mean that a login can now take twice as long as the configured timeout? should updating the hash be done in a background thread?

ping on this question

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)


pkg/sql/user.go, line 678 at r5 (raw file):
oh I hadn't seen this.

Yes, the current implementation has this property, which is also highlighted in the release note:

This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.
When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.

We could do this in a "background thread", although there are two separate concerns to solve:

  • we should avoid storms of concurrent goroutines. If we start doing asynchronous stuff for this, we'll need a queue / semaphore.
  • this then also asks complicated questions: what to do if the same username starts multiple sessions concurrently? With the current code, most will pick up the change (thx to table version bump) after the first one has logged in and the UPDATE statement won't even be run. If we spawn the work asynchronously, the detection will not be immediate and we might end up running many UPDATEs concurrently.
    (NB: this last point is not a correctness issue, thanks to the WHERE clause. We're simply talking about load here.

How do you feel about this? Do you think that a one-off latency bump, with a workaround to issue ALTER ROLE WITH PASSWORD out-of-band to avoid it, is unacceptable?

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @rafiss)


pkg/sql/user.go, line 678 at r5 (raw file):

Previously, knz (kena) wrote…

oh I hadn't seen this.

Yes, the current implementation has this property, which is also highlighted in the release note:

This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.
When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.

We could do this in a "background thread", although there are two separate concerns to solve:

  • we should avoid storms of concurrent goroutines. If we start doing asynchronous stuff for this, we'll need a queue / semaphore.
  • this then also asks complicated questions: what to do if the same username starts multiple sessions concurrently? With the current code, most will pick up the change (thx to table version bump) after the first one has logged in and the UPDATE statement won't even be run. If we spawn the work asynchronously, the detection will not be immediate and we might end up running many UPDATEs concurrently.
    (NB: this last point is not a correctness issue, thanks to the WHERE clause. We're simply talking about load here.

How do you feel about this? Do you think that a one-off latency bump, with a workaround to issue ALTER ROLE WITH PASSWORD out-of-band to avoid it, is unacceptable?

good point on the one-off latency bump -- with that i think the 2x increase sounds fine

@knz
Copy link
Contributor Author

knz commented Jan 20, 2022

NB: I'm holding off merging this PR until all the PRs in the sequence are ready (otherwise, the entire chain will be auto-closed by github).

Release note (security update): In order to promote a transition
to SCRAM-SHA-256 for password authentication, CockroachDB now
automatically attempts to convert stored password hashes
after a cleartext password authentication
succeeds to SCRAM-SHA-256, if the target hash method configured via
`server.user_login.password_encryption` is `scram-sha-256`.

This auto-conversion can happen either during SQL logins
or HTTP logins that use passwords, whichever happens first.

When it happens, a structured event of type `password_hash_converted`
is logged to the SESSIONS channel.

The PKBDF2 iteration count on the hash is chosen in order to preserve
the latency of client logins, to remain similar to the latency
incurred from the starting bcrypt cost.
(For example, the default configuration of bcrypt cost 10 is converted
to a SCRAM iteration count of 119680.)

This choice, however, lowers the cost of bruteforcing passwords for an
attacker with access to the encoded password hashes, if they have
access to ASICs or GPUs, by a factor of ~10. For example, if it would
previously cost them $1M to bruteforce a `crdb-bcrypt` hash, it would
now cost them "just" $100k to bruteforce the `scram-sha-256` hash that
results from this conversion.

If an operator wishes to compensate for this, three options are
available:

1. they can set up their infrastructure such that only passwords with
   a high entropy can be used. This can be achieved by e.g. disabling
   the ability of end-users to select their own passwords and
   auto-generating passwords for them, or enforce some entropy checks
   during password selection. This way, the entropy of the password
   itself compensates for the lower hash complexity.

2. they can manually select a higher SCRAM iteration count.  This can
   be done either by pre-computing SCRAM hashes client-side and providing
   the precomputed hash using ALTER USER WITH PASSWORD; or adjusting the
   cluster setting
   `server.user_login.password_hashes.default_cost.scram_sha_256` and
   asking CockroachDB to recompute the hash.

3. they can disable the auto-conversion of `crdb-bcrypt` hashes to
   `scram-sha-256` altogether, using the new cluster setting
   `server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled`.
   This approach is discouraged however, as it removes the other
   security protections offered by SCRAM authentication.

NB: the conversion also only happens if the target configured method
via `server.user_login.password_encryption` is `scram-sha-256`,
because the goal of the conversion is to move clusters towards that
configuration. This acknowledges cases when the operator had a
legitimate reason to select another value and future versions of of
CockroachDB that may use a new authentication method.

Release note (sql change): After a cluster upgrade, the first time a
SQL client logs in using password authentication, the password will be
converted to a new format (`scram-sha-256`) if it was encoded with
`crdb-bcrypt` previously.

This conversion will increase the latency of that initial login by a
factor of ~2x, but it will be reduced again after the conversion
completes.

When login latency is a concern, operators are invited to perform the
password conversion ahead of time, by computing new SCRAM hashes for
the clients via ALTER USER/ROLE WITH PASSWORD.

This conversion can also be disabled via the new cluster setting
`server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled`.
Base automatically changed from 20211228-scram9 to master January 21, 2022 09:12
@craig craig bot merged commit 03556f9 into master Jan 21, 2022
@craig craig bot deleted the 20211228-scram10 branch January 21, 2022 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants