-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exponential algorithm in BaseJsonValidator.hasAdjacentKeywordInEvaluationPath #1091
Comments
Note: If |
This is trying to optimize the unevaluatedItems and unevaluatedProperties to not collect annotations if it is not required. This generally works since the schemas references are cached by default but doesn't in this case as it's not being cached. I'm open to suggestions as I'm not really sure how to go about this without storing a bunch of state which might defeat the purpose of setting the cache to false. One possibility is to just not call this when cacheRefs is false and to always generate annotations for contains, items, prefixItems, additionalProperties, patternProperties and properties. |
The thing is, caching references does not help, because resolving the reference to its definition is not what is expensive here. Each individual validator instance that cares about these two keywords calls this in a block like the one I quoted above for the A general solution would probably involve caching information at the |
The caching results in the same validator instance being used which should result in the I'm not really sure that caching in the The issue with the second proposal is that I'm not sure if there's an easy way to determine when keywords aren't being used due to how the schema gets loaded. |
Repeating the test with
It still has to climb all the way back up the tree for each distinct evaluation path, and the branching of
That does not appear to be true, given the output given above.
I'll defer to your judgment. I'm very new to this library and more throwing out ideas that I could see working and do not mean to suggeste that I've thoroughly explored the idea and know for certain that it will work.
This is why I suggest that if it cannot be detected automatically, then it could still be possible to disable them in the same way that cached refs are. If the schema actually does contain the keyword, then its validator could detect the misconfiguration and trigger a failure. The other keywords would detect the setting and bypass the search. |
Does it help if you change the private boolean collectAnnotations() {
return !this.validationContext.getConfig().isCacheRefs() || hasUnevaluatedPropertiesValidator();
} |
With this configuration:
It drops to 1:09. Since I also saw the |
I'm a little hesitant on adding a flag that disables the correct evaluation of unevaluatedProperties/unevaluatedItems. One avenue I'm looking at is that by right I would only be interested in adjacent keywords to that particular instance location so it shouldn't have to traverse all the way to the top to figure that out but I seem to be having some trouble getting the condition right without failing some existing tests. Apologies as it might take a while to get a fix for this. |
I would expect the biggest problem to be knowing just how far up you have to go to know the answer for that. For example, if there is an
No need to apologise. My use case is very specialized to the structure of the ADF schema linked above, and I know what features it needs, so I created a greatly simplified poor man's schema checker that... well... cheats. It can process this in 78ms, because it knows that we use a That said, we do have the need for JSON schema validation in other parts of our products, and this library is my preferred answer for those cases, so anything that looks like it could possibly lead to a denial-of-service down the road is a bit scary... |
I shall add a fix which should make the evaluation of the adjacent keywords better. It sounds like you already have a heuristic using If you are open to upgrading your schema to at least Draft 7 you can consider using You can refer to the following example
{
"allOf": [
{
"if": {
"properties": {
"foo": { "const": "aaa" }
},
"required": ["foo"]
},
"then": { "$ref": "#/$defs/foo-aaa" }
},
{
"if": {
"properties": {
"foo": { "const": "bbb" }
},
"required": ["foo"]
},
"then": { "$ref": "#/$defs/foo-bbb" }
}
]
} If you want to use |
Yes, I can confirm that it has a similar impact to the hack I attempted above, dropping the validation time for my example doc down to 28 sec. It still won't work for my specific use case, but it should definitely help others.
I don't control the schema that we are using, so I could only offer that as a suggestion to the team that maintains it. They try very hard to avoid breaking changes in the schema's evolution, but it won't hurt to make sure that they know the option exists. |
The algorithm in
BaseJsonValidator.hasAdjacentKeywordInEvaluationPath
exhibits exponential time complexity. I haven't been able to come up with a trivial testcase to draw this out, but encountered it while validating a test document, which I will attach.To reproduce:
schema
test directory asissueOome.json
)Expected behaviour:
Actual behaviour:
cachedRefs(false)
, takes ~1 min 30 sec on my machineAnalysis:
This method walks back up through the schema to try to find where certain keywords are used. This boolean result is cached locally by the validator, but it is not remembered at the schema level, so in a complex schema the same search is performed at each level of the tree. To illustrate, I altered this method locally to print out what it was doing:
I kicked off the validation and stopped it a few second in. This is a small snippet of what it was doing at the time:
I haven't dug into the code in detail here, but the call points make an attempt to cache the boolean result for the specific keyword they are asked about, such as in
ItemsValidator
:But this caching is not visible to
hasAdjacentKeywordInEvaluationPath
, so any node that misses has to climb back up the entire tree.It looks like this is all associated with implementing the
unevaluatedItems
andunevaluatedProperties
keywords, which I am not even using. Perhaps these could be detected when the schema is first parsed and handled more actively by noting subtrees that contain them up front instead of searching for them constantly or this search could be given a method to use as a callback when it has to scan back up through the schema so that the cached result is accessible to it.The text was updated successfully, but these errors were encountered: