Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent Secure Settings #40416

Merged
merged 57 commits into from
Jun 29, 2019

Conversation

albertzaharovits
Copy link
Contributor

Introduces a new ConsistentSecureSettingsValidatorService service that exposes a single public method, namely allSecureSettingsConsistent . The method returns true if the local node's secure settings (inside the keystore) are equal to the master's, and false otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for all SecureSetting instances that are flagged with the newly introduced Property.Consistent. It is worth highlighting that the allSecureSettingsConsistent is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.

Notes on implementation:

Hashes published in the cluster state by the master, against which the other nodes cross check, are salted and the salt (by definition) is unique. Consequently, for the other nodes to have a chance to validate the hash, they must retain the original values. Caching those values in plaintext across their lifetime is perilous, so instead the value that gets cached is another hash, but this one is unsalted. The unsalted hash is an obfuscation of secret values that still preserves the required bijective function property for the values that need to be checked for equally across the cluster.

Does not yet contain any tests. I would like to get a preliminary approval of the approach before adding the tests and making it ready for review proper.

Limitations:

  • this is not designed for dynamic secure settings (hashes are fixed and computed at node start-up) but in theory the service can be retrofitted and ConsistentSecureSettingsValidatorService can be injected in the reload handler to acknowledge a dynamic change.

  • It is not possible to wait for the settings to become consistent because, right now, the node has to be restarted to change a consistent secure setting.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@jaymode jaymode self-requested a review March 25, 2019 18:17
Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this seems like it will work. I think what is missing is how a consumer of a secure setting will use this. Currently I only see a global consistent or not consistent; do you really want to fail all features if a single setting is inconsistent?

Regarding the lack of dealing with reloaded secure settings; is that going to be addressed as part of completion of this feature?

@albertzaharovits
Copy link
Contributor Author

Thanks for reviewing @jaymode ! I will ping again when I have added tests and refine the client interface.

The idea behind this seems like it will work.

This is an important validation for me!

Currently I only see a global consistent or not consistent; do you really want to fail all features if a single setting is inconsistent?

No, Indeed the API should allow for checking consistency for a subset of the settings! I will add this.

Regarding the lack of dealing with reloaded secure settings; is that going to be addressed as part of completion of this feature?

When I started working on it I was NOT considering this. This is because right now we consider all secure settings final, unless otherwise pointed out in the docs, and until the need arose I was reluctant to do any work. I will re-evaluate this as it progresses, with this approach it does not sound complicated to add it.

@albertzaharovits albertzaharovits mentioned this pull request Apr 16, 2019
3 tasks
@rjernst rjernst dismissed their stale review April 16, 2019 22:44

did not mean to click approve yet!

@albertzaharovits
Copy link
Contributor Author

@rjernst @jaymode may I ask for another pass over this one? Apologies for the the long time interval and the incurred cache thrashing!

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. I have just a few more comments, mostly minor.

*/
public byte[] getSecretDigest(Settings settings) {
final SecureSettings secureSettings = settings.getSecureSettings();
if (secureSettings == null || false == secureSettings.getSettingNames().contains(getKey())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we be looking up a digest for a setting that doesn't exist?

@@ -175,13 +175,29 @@ private void registerSetting(Setting<?> setting) {
if (existingSetting != null) {
throw new IllegalArgumentException("Cannot register setting [" + setting.getKey() + "] twice");
}
if (setting.isConsistent()) {
if (setting instanceof Setting.AffixSetting<?>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are only going to support this setting property for secure settings, we should clarify that in the javadocs. But why couldn't this work for normal settings as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not thought about consistent ordinary node settings at all. My reasoning is that we have a better way of actually enforcing consistency with cluster settings for all the non-secure node settings.
If we think we need a form of detection of inconsistency of other node settings, I would think of making use of the cluster scoped settings infra, to publish the actual values, rather than hashes as is the case here. In that case we might automatically achieve consistency, without admin intervention.
Can we leave this for another PR? I have updated the javadocs, there were just a few places that alluded to this aspect.

public void testAllConsistentSuccess() throws Exception {
internalCluster().getInstances(Environment.class).forEach(environment -> {
ClusterService clusterService = internalCluster().getInstance(ClusterService.class);
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, Collections.emptyList()).areAllConsistent());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all seem like unit tests. Why couldn't they be in the Tests class instead of this IT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they're legit integration tests IMO.
They couple the hashes publication with the cluster formation, also starting new nodes. They are end to end tests, where we publish from one node and verify on another.
I am a bit proud of them. The unit tests are in ConsistentSettingsServiceTests where we intercept the cluster state and do checks on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like you are actually using the nodes really, except to do the creation of ctor args. I would expect an integ test to check the node as it is, but all these assertions construct a service that should be running inside the node. They look like unit tests where you want to tweak the state in eg clusterService through mocks, which is much lighter weight than spinning up even in memory nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do not tweak the cluster state. They look odd because there is no API for this functionality yet, so I have to construct the ConsistentSettingsService from the node's ClusterService as an API would need to do it. The ConsistentSettingsService ctr args are important to be on different nodes (the tests iterates over all of them).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean about different nodes. The ctor args are always going to come from the local node. I don't think there is any reason all of the state that would be provided by different nodes couldn't be mocked. This would make the tests much faster, and easier to debug (no inter node communication).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed on another channel and agreed that integration tests should be much thinner.

@rjernst I have pared down ConsistentSettingsIT; it contains a test for the success validation and another test for the failed validation, when a new node joins with different settings. The failed validation scenarios (of which there are two) are randomized, because all cases should be covered in ConsistentSettingsServiceTests.

assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, Collections.emptyList()).areAllConsistent());
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService,
Collections.singletonList(DUMMY_STRING_CONSISTENT_SETTING)).areAllConsistent());
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these assert true/false are very opaque. Could we have a helper assertion method that checks the consistency, and dumps the service state if it fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by opaque you mean hard to troubleshoot when they go awry, then there should be WARN logs on the node that failed the verification, that show the setting name and the hash value. I think those are enough? Do you see a need to publish the hashes in the junit output too? I think these are pretty useless anyway...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have been working on this, I'm sure you know exactly where to look in the logs. But some random dev that sees one of these tests fail has no idea when a true/false assertion fails with zero context in the test failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added messages to assertions which also prints the hashes map.

@albertzaharovits
Copy link
Contributor Author

Hi Ryan, can you please take another look. I have simplified the integration tests.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok for now, as long as we follow up with figuring out a consistent approach to whether regular settings can be consistent (maybe they shouldn't be allowed, but then we need to make sure documentation and appropriate errors occur).

} else if (setting instanceof SecureSetting<?>) {
secureSettingConsumer.accept((SecureSetting<?>) setting);
} else {
assert false : "Unrecognized consistent secure setting [" + setting.getKey() + "]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserts aren't always enabled in production systems. This should probably be a normal exception, maybe an IllegalStateException? Same for above on asserting consistency? Doesn't have to be a normal exception for all assert cases, just something to think about whether it is a reachable path, which would be a silent error if asserts are disabled.

@albertzaharovits albertzaharovits merged commit cdfc986 into elastic:master Jun 29, 2019
@albertzaharovits albertzaharovits deleted the consistent_settings branch June 29, 2019 12:17
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 29, 2019
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus
- for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
albertzaharovits added a commit that referenced this pull request Jun 29, 2019
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 30, 2019
albertzaharovits added a commit that referenced this pull request Jun 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants