Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patternProperties interferes with properties #183

Closed
duckontheweb opened this issue Apr 8, 2021 · 16 comments
Closed

patternProperties interferes with properties #183

duckontheweb opened this issue Apr 8, 2021 · 16 comments

Comments

@duckontheweb
Copy link
Contributor

duckontheweb commented Apr 8, 2021

It seems like the presence of "patternProperties" interferes with validation that should happen in "properties".

This replit shows an example of an instance that should fail validation (and does fail in jsonschemavalidator.net). If you comment out the "patternProperties" portion of the schema, it fails as expected.

I don't have a good enough understanding of the jsonschema internals, but it seems like this could be related to changes from #173 .

UPDATE: I updated the replit and the jsonschemavalidator link to provide a more minimal example.

@duckontheweb
Copy link
Contributor Author

I'm happy to take a crack at a PR, but I'm a complete Rust novice, so it might be a little ugly at first...

@duckontheweb
Copy link
Contributor Author

In the process of trying to debug this with different schema combinations I also uncovered a case where trying to compile a valid schema leads to a SchemaError.

Here is the schema:

{
  "type": "object",
  "properties": {
    "eo:cloud_cover": {
      "type": "number"
    }
  },
  "patternProperties": {
    "^(?!eo:)": {}
  }
}

Failing example here

@Stranger6667
Copy link
Owner

Hi @duckontheweb !

Thank you for an awesome bug report :) I'd be happy to review a PR, sure. Otherwise, I think that I'll have some time over the weekend to check why the bug is happening

@duckontheweb
Copy link
Contributor Author

The CompilationError happens because regex does not support look-around assertions (rust-lang/regex#127), which is not a bug in jsonschema.

However, it seems like this is also the cause of the original issue. If I change the "patternProperties" regex to use the plain character set equivalent of that negative look-behind ("^([^e]|e($|[^o])).*") then things work as expected (example here).

It seems like the real bug is that compilation should have returned an error in the original example.

@duckontheweb
Copy link
Contributor Author

duckontheweb commented Apr 9, 2021

The CompilationError happens because regex does not support look-around assertions (rust-lang/regex#127), which is not a bug in jsonschema.

One possible solution to this would be to use fancy_regex in place of regex, but I'm not sure what the performance implications would be there...

@duckontheweb
Copy link
Contributor Author

One possible solution to this would be to use fancy_regex in place of regex, but I'm not sure what the performance implications would be there...

I looked into this a bit more, and it's far from trivial to swap these out. Many of the fancy-regex methods like Regex.is_match return a Result instead of a simple primitive value (probably as a result of their use of backtracking VMs), which means the signature of most of the jsonschema functions and methods would have to change as well.

@duckontheweb
Copy link
Contributor Author

@Stranger6667 It looks like the reason that this compiles successfully even though it contains an unsupported look-behind assertion is with the logic introduced in #173. If "additionalProperties" is false, then this match arm means that "patternProperties" is not compiled.

However, the following example is a valid case where "additionalProperties" is false and we still want to compile and use the schema from "patternProperties":

{
    "type": "object",
    "properties": {
        "eo:cloud_cover": {
            "type": "number",
        },
    },
    "patternProperties": {
        "^(?!eo:)": {}
    },
    "additionalProperties": false
}

This pattern is used throughout the STAC extension schemas to basically say "allow additional properties as long as they don't start with this prefix".

Let me know if you want me to open a separate issue for this so that it's not buried in this one.

@Stranger6667
Copy link
Owner

Hi @duckontheweb

Thank you for pinging me :) I will take a deeper look tomorrow

@Stranger6667
Copy link
Owner

Stranger6667 commented May 4, 2021

Oh, it is definitely a bug. With the current regex library, it should fail to compile indeed. I will create a new issue & dig why it is happening. That pattern should be compiled in those "combined" validators.

@Stranger6667
Copy link
Owner

I looked into this a bit more, and it's far from trivial to swap these out. Many of the fancy-regex methods like Regex.is_match return a Result instead of a simple primitive value (probably as a result of their use of backtracking VMs), which means the signature of most of the jsonschema functions and methods would have to change as well.

Probably, in the validate method the Err scenario can be transformed to ValidationError variant and to false in is_valid. I'll check that

@Stranger6667
Copy link
Owner

The Err variant is returned only when the backtrack limit is reached. The default value is 1M which seems like ok for the common use-case. From this library point of view, it will be nice to prevent some DDoS vectors - I'll think more about it, but at the moment, it seems like one similar to ValidationError::UnknownReferenceScheme or ValidationError::FileNotFound, i.e., something that we can't validate in runtime due to the input specifics.

@duckontheweb
Copy link
Contributor Author

Probably, in the validate method the Err scenario can be transformed to ValidationError variant and to false in is_valid. I'll check that

The Regex::new method also returns a Result rather than guaranteeing success, and I think there are a handful of other methods where this is the case, too. I have a fork where I've started to work on this. I'll let you know if I get it to a good point.

@Stranger6667
Copy link
Owner

The Regex::new method also returns a Result rather than guaranteeing success, and I think there are a handful of other methods where this is the case, too

The Regex::new case is already handled (may require to update types to fancy-regex specific) - during compilation, it is converted to CompilationError::SchemaError via the From<regex::Error> impl. During validation Regex::new may happen inside the $ref validator, which is handled by From<CompilationError> that converts to ValidationError::Schema. Other than that jsonschema doesn't use anything else from regex

@Stranger6667
Copy link
Owner

At first glance, I expect that there should be not much hassle with replacing regex with fancy_regex and will be happy to review a PR if you'd like to work on it :)

However, if I missed anything there, I believe that those things should not be hard to solve.

@duckontheweb
Copy link
Contributor Author

Actually, it looks like the biggest challenge is going to be the lack of replace methods. There is a suggested workaround here, so I'll give that a go.

@Stranger6667
Copy link
Owner

Released in 0.9.0! :) Thank you for opening this issue and implementing this improvement :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants