-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-11132. Update the OLDER_CLIENT_REQUEST validations to check against bucket types. #6919
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sadanand48 Thanks for working over this, can have one minor fix to have clarity over validation usages
...manager/src/main/java/org/apache/hadoop/ozone/om/request/validation/ValidationCondition.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
Obviously they don't otherwise the problem would have been caught. |
Sure @adoroszlai let me add a unit test |
@adoroszlai , I have added a unit test for the specific case for which the issue was caught i.e getFileStatus, we could maybe add a comprehensive test in a follow up jira? |
Thanks for adding the unit test. Created HDDS-11134 for covering this with acceptance test. If you would like to add further unit or integration test, please feel free to create another task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is correct. Lines like this should still be working correctly. I think something else is wrong and this change is masking the problem.
Ideally all the compat requirements would be specified in the annotation but I think there were some limitations there/we needed to get something out quick so we did it this way to start and never came back to it. @fapifta might remember, but its been a few years 😄
@@ -460,7 +460,7 @@ public static OMRequest handleCreateBucketWithBucketLayoutDuringPreFinalize( | |||
* they do not understand. | |||
*/ | |||
@RequestFeatureValidator( | |||
conditions = ValidationCondition.OLDER_CLIENT_REQUESTS, | |||
conditions = ValidationCondition.OLDER_BUCKET_LAYOUT_CLIENT_REQUESTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different than what is happening on lines 469-470?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With Ozone 1.4.0, client supports bucket layout. The feature validator is wrongly rejecting the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rejection does not happen in the annotation. The annotation determines when the method should be called. In this case this method is called when we get an older client request. That's why the method to check the condition is called shouldApply. The body of the method determines whether to reject or modify the request. The body of the method has a more specific check for specific client version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have multiple such validations, this method takes care but other methods don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation determines when the method should be called. In this case this method is called when we get an older client request. That's why the method to check the condition is called shouldApply. The body of the method determines whether to reject or modify the request.
It's very confusing for both the annotation and the method to have conditions. If the method checks the client request anyway, it can be called regardless of client version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I also suggested a refactor, but in a follow up. The refactor done here is not good because it requires updates to two enums for each client layout version.
Ideally all the compat requirements would be specified in the annotation
Yes this part worked but if you check, lines like [these] ( Line 663 in 4d29b6c
these it would fail. We need proper coverage of tests to detect if all API's are compatible but based on what I checked , the current change must take care of most cases. |
Simple acceptance test: adoroszlai@8664d79, reproduces the problem on |
Ok so it looks like the conditional in the body was missed for some requests. That is the proper fix. Ideally all the params should be specified in the annotation but that is an API change that should be its own issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sadanand48 thanks for the patch. The changes look good to me.
@@ -460,7 +460,7 @@ public static OMRequest handleCreateBucketWithBucketLayoutDuringPreFinalize( | |||
* they do not understand. | |||
*/ | |||
@RequestFeatureValidator( | |||
conditions = ValidationCondition.OLDER_CLIENT_REQUESTS, | |||
conditions = ValidationCondition.OLDER_BUCKET_LAYOUT_CLIENT_REQUESTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/ | ||
OLDER_CLIENT_REQUESTS { | ||
OLDER_BUCKET_LAYOUT_CLIENT_REQUESTS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@errose28 This fixes the issue.
@swamirishi and I had a quick call to go over the changes. We agreed that overall the proper way to do these validations would be something like this:
The current PR implementation is not good because for each client version added, you also need to add a new condition in a different enum. For this PR however, we should avoid refactoring and just restore correct functionality. This means making all the methods work the same way. So methods like disallowLookupKeyWithBucketLayout in this change would just add a condition like the other methods are using:
I'm in favor of the refactor to prevent future errors, but I think it should be its own change. |
In my defense, prior to this change, the enum OLDER_CLIENT_REQUESTS was only used in the context of bucket layout based validations so the rename seemed convenient and easy. However I agree that we can add the additional condition in the all methods uniformly to maintain consistent behaviour. |
Thanks @errose28 for the explanations. With the refactor (which I agree should be done in follow-up), condition will be moved from the validator method to the annotation. Until then, I would suggest removing the version condition from usages of the annotation as part of this fix. (So only keep it in the method body.) Consider introducing a new version The annotation is great for handling conditions unrelated to the validator's logic, like "not yet finalized" state. As soon as the framework can handle client versions (without the 2-enum problem you mentioned), validators will no longer have to check client version and thus the "unrelated condition" statement will still be true. |
If I'm understanding correctly, this would look something like this:
We can do that. It does run more code in the common case where client and server are the same version. There is a reflection call on this code path that the annotation condition is able to bypass but I'm not sure if that's a legitimate concern for performance or not. We do have metrics there fwiw. I'm not opposed to removing the |
Sounds like a legitimate concern. By using |
@adoroszlai To retain existing behaviour , we not only need to remove the version condition from usages of the annotation but also modify the all methods using the annotation to add a check like this. Currently all methods don't have this check.
I think this was intended when the original change was made in most methods where the additional condition is not present. I think we would need a proper refactor with the conditions included in all the methods using the annotation in a separate change. |
I need to revise that: it should not be a new OLDER_CLIENT_REQUESTS {
public boolean shouldApply(OMRequest req, ValidationContext ctx) {
return req.getVersion() < ClientVersion.CURRENT_VERSION || isTesting();
}
} |
I understand that. Adding the condition in the method body is the fix, as suggested by @errose28. Changing the condition of the annotation (removing it (as I originally suggested) or disabling it in tests (as refined my thinking)) would help validate the fix, and prevent similar issues in the future. |
Adding another condition would not be sufficient. @adoroszlai do you mean we add another condition in the condition list? |
@errose28, @adoroszlai I was giving this another thought and seemed to wonder why the addition of new enums is a bad choice compared to making multiple modifications(if conditions in all Request classes) as the other PR does. In one approach we add multiple enums for different versions that need validations and define methods that use the annotations to perform specific validations for that version and in the other we club all validations under a single method that is invoked for all older versions. Why is one better than the other? Is it for performance reasons? Isn't the former approach a cleaner way to isolate version based validations? |
This is implemented in adoroszlai@c625cdd and sure enough, basic acceptance tests are failing with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi everyone, sorry for being late to the party, but let me add try to add to the context of this change.
The RequestFeatureValidator annotation is designed to mark a method that validates or rewrites a request/response, and either does corrective actions on the request/response to ensure compatibility, or returns a failure response in case the client can not handle the returned value again in case the client is incompatible with the server.
The basic idea behind this is this:
- if an old client sends a request, we may change the request with some desired values to conform with the server's new requirements and process the request afterwards
- if an old client would get a response that it will not be able to process correctly, then we can fail early by sending back a failure response, or we can rewrite the response so that the old client can process it properly
- if a newer client sends a request, it can make it conform with an old server on its own.
Just a sidenote:
There were some additional thoughts put into cases where the a request is initiating a request to an other server and incorporates the result to its own results, like when the flow is client->OM->SCM->OM->client e.g. block locations in a getFileInfo call, but nothing else was implemented outside OM's request handling from the generic framework. There are some additional checks added to .*ClientSideTranslatorPB classes or within SCM's protocol servers but those were/should have been shortcuts only until this framework is done, but we could not get back to finish it so far.
Now that said, during the implementation we came up with an expensive initialization of a structure where we have a 3 layered map structure in ValidatorRegistry, within which we can look up the validators that needs to be applied to the given validation condition, then the given request type, then the request processing phase.
Validation condition in this sense is there to separate two cases one is important from the perspective of upgrades, the other one is important from the perspective of compatibility.
New features should be disabled in case the cluster is not finalized yet, in this case what we usually want to do, is to provide a meaningful error message and reject the request, but there might be scenarios where we just want to fall back to the old behaviour until the cluster is finalized, and for that we rewrite parts of the requests (there is no such use case yet afaik).
New features should not be present in responses to old clients, and in some cases they need to change requests from old clients in order to be processed properly with that ensuring compatibility with old clients.
Now let's say we want to process several hundred thousands of requests per second, and we want to hook this check into the very beginning of request processing or at the very end for response rewrites, we needed a really quick way to figure out what to apply. This is why we have the 3 layer map, from which we can get the actual list of request feature processors that apply to the given request and nothing else, and then we can call these or skip the whole check if there isn't any applicable processors.
The lookup first considers if we are in a pre-finalized state or if we are talking to an older client, if none of this is true, then skips the whole lookup.
Then as a next step the lookup queries the appropriate map based on if we are in a pre-finalized state or if an older client sent the request, within that the layer 2 map defined for the request type, and within that the list of processors for the actual processing phase pre or post.
Note that pre or post processors have a different role and a different signature, preprocessors can rewrite a request, post processors can rewrite a response.
Also note that in a pre-finalized state we do not return the processors for the case when an older client sends the request, and we can do this because in the pre-finalized state the new features are disabled.
Ok, now as I have set the floor, let's look at the problems we have.
The biggest problem is that we have added new client versions without thinking about what validators have to be added, so with that HDDS-10983 added a custom check in the request processing code for the newly added version. This is most likely on me, as the request feature validator framework should be tribal knowledge, and we even miss a very generic documentation for it.
The next problem we have is that we were not mindful enough when we added the first validations, and there isn't any checks for the actual version of the client in the current validators, or in the way how we determine which validations to run.
And finally we also do not finished this framework and postponed a good couple of things, instead we applied shortcuts here and there to solve the handling of the changes regarding bucket layouts due to time constraints.
Let's see now the current patch that tries to address the most burning problem, namely that we broke functionality by just adding a new client version.
Given the original intent of the OLDER_CLIENT_REQUESTS ValidationCondition, that covers the whole problem space, now with this change, we do not cover the case when we have a server that expects ClientVersion.EC_REPLICA_INDEX_REQUIRED_IN_BLOCK_REQUEST and a client that has ClientVersion.BUCKET_LAYOUT_SUPPORT so even if we write a method and annotate it with the RequestFeatureValidator annotation that will not be found and applied with the code in the PR.
The question may arise why this is not a problem with a server expecting ClientVersion.BUCKET_LAYOUT_SUPPORT and a client which has ClientVersion.ERASURE_CODING_SUPPORT or ClientVersion.VERSION_HANDLES_UNKNOWN_DN_PORTS. This is not a problem in general as the 1.2 Ozone release did not have ClientVersion at all, and we introduced ClientVersion in 1.3, where we already had the version up at BUCKET_LAYOUT_SUPPORT.
With all that said, I agree with @errose28 that this patch is not the one we are looking for. It goes against the overall idea and desing of the framework, and also it introduces an other problem.
What possible solutions we have?
One way is to add a version check to all the methods that are annotated with the RequestFeatureValidator annotation, and we ensure that these are applied only to the case where the ClientVersion is not before the CURRENT_VERSION, but before the BUCKET_LAYOUT_SUPPORT.
An other way is to introduce a 4th layer in the multi layered map that we use to get the applicable validators, and introduce a new mapping based on the maximum version for which the validators is applicable, and then when the ValidatorRegistry provides the list of applicable validators, it has to concatenate the list of provided validators for all the versions that are in between the CURRENT_VERSION, and the version the client has.
I tend to believe we should go down the second route where we extend the framework to handle the versions properly, but that seems to be a bit more involved compared to put if statements to the existing validator methods for now, and defer the generic solution when we pick up and finish the work on the request feature validation framework. So I am fine with both ways, I am also open to other ideas, but I think it is clear why @errose28 and me are refraining to approve changing the ValidationCondition itself.
It is too late for me to re-read what I wrote down so far, I hope it is understandable, in case of something is not understandable, please do not hesitate to ask, and apologies for any mistakes that makes understanding and reading this comment, also sorry for the long comment I felt that it is important that we are on the same page regarding the intents and internals of the whole framework.
Two additional thoughts for a proper fix:
|
@fapifta I have raised a patch for the same #6932
|
Can we close this PR? I guess the direction we would take is more in the direction outlined in HDDS-11149 (#6932), but please correct me if I am wrong and the two ideas are still competing. |
What changes were proposed in this pull request?
With the introduction of a new client version in https://issues.apache.org/jira/browse/HDDS-10983, the latest version is 4.
The OLDER_CLIENT_REQUEST validation checked against the latest versions for access to newer non-legacy buckets but since the addition of a newer version all client versions prior to 4 becomes an older client and blocks access to OBS/FSO buckets. This should be fixed to check against version 3.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11132
How was this patch tested?
Added unit test