-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent Secure Settings #40416
Consistent Secure Settings #40416
Conversation
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this seems like it will work. I think what is missing is how a consumer of a secure setting will use this. Currently I only see a global consistent or not consistent; do you really want to fail all features if a single setting is inconsistent?
Regarding the lack of dealing with reloaded secure settings; is that going to be addressed as part of completion of this feature?
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
Thanks for reviewing @jaymode ! I will ping again when I have added tests and refine the client interface.
This is an important validation for me!
No, Indeed the API should allow for checking consistency for a subset of the settings! I will add this.
When I started working on it I was NOT considering this. This is because right now we consider all secure settings final, unless otherwise pointed out in the docs, and until the need arose I was reluctant to do any work. I will re-evaluate this as it progresses, with this approach it does not sound complicated to add it. |
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/common/settings/ConsistentSecureSettingsValidatorService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. I have just a few more comments, mostly minor.
...tools/keystore-cli/src/test/java/org/elasticsearch/common/settings/KeyStoreWrapperTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/settings/SecureSetting.java
Outdated
Show resolved
Hide resolved
*/ | ||
public byte[] getSecretDigest(Settings settings) { | ||
final SecureSettings secureSettings = settings.getSecureSettings(); | ||
if (secureSettings == null || false == secureSettings.getSettingNames().contains(getKey())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we be looking up a digest for a setting that doesn't exist?
@@ -175,13 +175,29 @@ private void registerSetting(Setting<?> setting) { | |||
if (existingSetting != null) { | |||
throw new IllegalArgumentException("Cannot register setting [" + setting.getKey() + "] twice"); | |||
} | |||
if (setting.isConsistent()) { | |||
if (setting instanceof Setting.AffixSetting<?>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are only going to support this setting property for secure settings, we should clarify that in the javadocs. But why couldn't this work for normal settings as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not thought about consistent ordinary node settings at all. My reasoning is that we have a better way of actually enforcing consistency with cluster settings for all the non-secure node settings.
If we think we need a form of detection of inconsistency of other node settings, I would think of making use of the cluster scoped settings infra, to publish the actual values, rather than hashes as is the case here. In that case we might automatically achieve consistency, without admin intervention.
Can we leave this for another PR? I have updated the javadocs, there were just a few places that alluded to this aspect.
server/src/test/java/org/elasticsearch/common/settings/ConsistentSettingsIT.java
Outdated
Show resolved
Hide resolved
public void testAllConsistentSuccess() throws Exception { | ||
internalCluster().getInstances(Environment.class).forEach(environment -> { | ||
ClusterService clusterService = internalCluster().getInstance(ClusterService.class); | ||
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, Collections.emptyList()).areAllConsistent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all seem like unit tests. Why couldn't they be in the Tests
class instead of this IT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they're legit integration tests IMO.
They couple the hashes publication with the cluster formation, also starting new nodes. They are end to end tests, where we publish from one node and verify on another.
I am a bit proud of them. The unit tests are in ConsistentSettingsServiceTests
where we intercept the cluster state and do checks on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like you are actually using the nodes really, except to do the creation of ctor args. I would expect an integ test to check the node as it is, but all these assertions construct a service that should be running inside the node. They look like unit tests where you want to tweak the state in eg clusterService through mocks, which is much lighter weight than spinning up even in memory nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do not tweak the cluster state. They look odd because there is no API for this functionality yet, so I have to construct the ConsistentSettingsService
from the node's ClusterService
as an API would need to do it. The ConsistentSettingsService
ctr args are important to be on different nodes (the tests iterates over all of them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean about different nodes. The ctor args are always going to come from the local node. I don't think there is any reason all of the state that would be provided by different nodes couldn't be mocked. This would make the tests much faster, and easier to debug (no inter node communication).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed on another channel and agreed that integration tests should be much thinner.
@rjernst I have pared down ConsistentSettingsIT
; it contains a test for the success validation and another test for the failed validation, when a new node joins with different settings. The failed validation scenarios (of which there are two) are randomized, because all cases should be covered in ConsistentSettingsServiceTests
.
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, Collections.emptyList()).areAllConsistent()); | ||
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, | ||
Collections.singletonList(DUMMY_STRING_CONSISTENT_SETTING)).areAllConsistent()); | ||
assertTrue(new ConsistentSettingsService(environment.settings(), clusterService, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these assert true/false are very opaque. Could we have a helper assertion method that checks the consistency, and dumps the service state if it fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by opaque you mean hard to troubleshoot when they go awry, then there should be WARN logs on the node that failed the verification, that show the setting name and the hash value. I think those are enough? Do you see a need to publish the hashes in the junit output too? I think these are pretty useless anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have been working on this, I'm sure you know exactly where to look in the logs. But some random dev that sees one of these tests fail has no idea when a true/false assertion fails with zero context in the test failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added messages to assertions which also prints the hashes map.
server/src/test/java/org/elasticsearch/common/settings/SettingsModuleTests.java
Outdated
Show resolved
Hide resolved
Hi Ryan, can you please take another look. I have simplified the integration tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok for now, as long as we follow up with figuring out a consistent approach to whether regular settings can be consistent (maybe they shouldn't be allowed, but then we need to make sure documentation and appropriate errors occur).
} else if (setting instanceof SecureSetting<?>) { | ||
secureSettingConsumer.accept((SecureSetting<?>) setting); | ||
} else { | ||
assert false : "Unrecognized consistent secure setting [" + setting.getKey() + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts aren't always enabled in production systems. This should probably be a normal exception, maybe an IllegalStateException? Same for above on asserting consistency? Doesn't have to be a normal exception for all assert cases, just something to think about whether it is a reachable path, which would be a silent error if asserts are disabled.
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes a single public method, namely `allSecureSettingsConsistent`. The method returns `true` if the local node's secure settings (inside the keystore) are equal to the master's, and `false` otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for all `SecureSetting` instances that are flagged with the newly introduced `Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent` is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes a single public method, namely `allSecureSettingsConsistent`. The method returns `true` if the local node's secure settings (inside the keystore) are equal to the master's, and `false` otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for all `SecureSetting` instances that are flagged with the newly introduced `Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent` is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.
Introduces a new
ConsistentSecureSettingsValidatorService
service that exposes a single public method, namelyallSecureSettingsConsistent
. The method returnstrue
if the local node's secure settings (inside the keystore) are equal to the master's, andfalse
otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for allSecureSetting
instances that are flagged with the newly introducedProperty.Consistent
. It is worth highlighting that theallSecureSettingsConsistent
is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.Notes on implementation:
Hashes published in the cluster state by the master, against which the other nodes cross check, are salted and the salt (by definition) is unique. Consequently, for the other nodes to have a chance to validate the hash, they must retain the original values. Caching those values in plaintext across their lifetime is perilous, so instead the value that gets cached is another hash, but this one is unsalted. The unsalted hash is an obfuscation of secret values that still preserves the required bijective function property for the values that need to be checked for equally across the cluster.
Does not yet contain any tests. I would like to get a preliminary approval of the approach before adding the tests and making it ready for review proper.
Limitations:
this is not designed for dynamic secure settings (hashes are fixed and computed at node start-up) but in theory the service can be retrofitted and
ConsistentSecureSettingsValidatorService
can be injected in the reload handler to acknowledge a dynamic change.It is not possible to wait for the settings to become consistent because, right now, the node has to be restarted to change a consistent secure setting.