Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Support custom resolvers for external references #246

Closed
tamasfe opened this issue Jul 10, 2021 · 7 comments
Closed

Proposal: Support custom resolvers for external references #246

tamasfe opened this issue Jul 10, 2021 · 7 comments

Comments

@tamasfe
Copy link
Contributor

tamasfe commented Jul 10, 2021

Hi, I'm planning to use this library in my project, which is mostly used a constrained WASM environment.

This means that the filesystem, and http requests might not be directly available.

Is supporting a custom external resolver possible?

I propose the following:

Turn Resolver into a trait that could be used to resolve schemas in an unknown environment.

I'm thinking of something like this:

// The resolver that is used for external references.
//
// Internal references such as `#/definitions` and JSON pointers would still be handled internally.
trait Resolver<'resolver> {
    // Allow a custom error type with additional constraints if needed.
    type Error: std::error::Error;

    // The parent schema is provided so that the resolver can resolve relative refs
    // based on the schema `$id` or `id`.
    //
    // `&str` would be used instead of `Url` to support relative values like `./schema`.
    //
    // Returning a `Cow` would allow the resolver to cache schemas, however I'm not sure
    // whose responsibility caching should be.
    fn resolve(
        &self,
        schema: &Value,
        reference: &str,
    ) -> Result<Cow<'resolver, Value>, Self::Error>;
}

Then allow compilations with a custom resolver while providing a default one:

impl<'a> JSONSchema<'a> {
    pub fn compile_with(schema: &'a Value, resolver: impl Resolver<'a>) -> Result<JSONSchema<'a>, ValidationError<'a>> {
        // ...
    }

    pub fn compile(schema: &'a Value) -> Result<JSONSchema<'a>, ValidationError<'a>> {
        Self::compile_with(schema, DefaultResolver::new())
    }
}

Additionally an async version could be supported in a similar way which could help with resolution of many references. This would also be needed to support reqwest in WASM contexts.

trait AsyncResolver<'resolver> {
    type Error: std::error::Error;

    type ResolveFuture: Future<Output = Result<Cow<'resolver, Value>, Self::Error>> + 'resolver;

    fn resolve(
        &mut self,
        schema: &Value,
        reference: &str,
    ) -> Self::ResolveFuture;
}

impl<'a> JSONSchema<'a> {
    pub async fn compile_async_with(schema: &'a Value, resolver: impl AsyncResolver<'a>) -> Result<JSONSchema<'a>, ValidationError<'a>> {
        // ...
    }

    pub async fn compile_async(schema: &'a Value) -> Result<JSONSchema<'a>, ValidationError<'a>> {
        Self::compile_async_with(schema, DefaultAsyncResolver::new())
    }
}

There are several questions to resolve (such as custom error types and schema caching, and probably more).

Are you open to the idea?

@Stranger6667
Copy link
Owner

Hi @tamasfe

Is supporting a custom external resolver possible?

Not at the moment, but I was thinking of implementing it one day (my goal was to be as extendable as the Python jsonschema package).

Are you open to the idea?

I am delighted to see this proposal! Thank you so much for bringing this up :) I am going to look deeper into details next week, but currently, I feel positive about this :)

Here are my thoughts on the details:

It might be better to reuse the existing builder pattern and set resolver there:

let compiled = JSONSchema::options()
    .with_resolver(resolver)
    .compile(&schema)

So, the default resolver will be chosen if nothing is provided to the builder.

On the async part - probably the schema compilation itself won't be much different from what it is now (sync) as the $ref keyword is not evaluated at this time, mostly due to possible circular references. It is present in meta-schemas that are used during compilation, but they have only local references. Though, once this limitation is solved, I'd be happy to have compilation async as well.

So, it might be validate_async instead;

Let me know what do you think :)

@tamasfe
Copy link
Contributor Author

tamasfe commented Jul 10, 2021

I'm glad you're up for it! :)

I would very much prefer bringing resolution forward to the compilation phase, in a lot of cases it is done only once, while validations can easily be part of hot loops. (this is also how Ajv works)

Circular references are indeed a blocker for this, I will look into the library a bit deeper and figure out ways to handle them cleanly.

My argument against making the Resolver a part of CompilationOptions is that if we do that we either have to carry the generic type around, or we have to box it. This is even more problematic if we support both Resolver and AsyncResolver (although we can combine these into a single trait).

With the API I'm proposing, the compilation/validation would look like this:

let mut resolver = MyResolver::default();

let schema_validator = JSONSchema::compile_async_with(&schema, &mut resolver).await?;
let another_schema_validator = JSONSchema::compile_async_with(&another_schema, &mut resolver).await?;

'hot: loop {
    // No async here, should be as fast as possible.
    schema_validator.validate(&some_input)?;
    another_schema_validator.validate(&another_input)?;
}

@Stranger6667
Copy link
Owner

I would very much prefer bringing resolution forward to the compilation phase, in a lot of cases it is done only once, while validations can easily be part of hot loops. (this is also how Ajv works)

Indeed, it will be very beneficial! In this case, I hope we can get rid of RwLock inside RefValidator, which will remove the overhead implied by read(), and as you said, there will be no ref resolution at this point at all.

My argument against making the Resolver a part of CompilationOptions is that if we do that we either have to carry the generic type around, or we have to box it. This is even more problematic if we support both Resolver and AsyncResolver (although we can combine these into a single trait).

But it still carried by JSONSchema, right? So it is generic over Resolver. Are there some implications for the end-users? Or is it more on the internal implementation complexity/boilerplate?

With the API I'm proposing, the compilation/validation would look like this:

My concern is that it introduces the second way for configuring JSONSchema - besides options. It would be nice to keep the API surface smaller.

Though, at the moment, for me, it is hard to compare those approaches, probably some more implementation details will be helpful.

On the validate method. You're right - it seems like we can move all reference resolving (including custom vocabularies like #243) to the compilation phase.

@alexjg
Copy link
Contributor

alexjg commented Jul 12, 2021

I'm not so sure that you can move all reference resolving to the compilation phase. In particular the data access vocabulary requires looking up a value somewhere else in the document currently being validated, this value can contain references so that means you would need to be able to do resolution at runtime. Obviously this is something that would have performance implications and is not something you would want to allow in general, so maybe it would be worth distinguishing in the type system in some way.

@tamasfe
Copy link
Contributor Author

tamasfe commented Jul 12, 2021

Are there some implications for the end-users? Or is it more on the internal implementation complexity/boilerplate?

I'm just concerned about the boilerplate, no breaking API change is involved with any changes I have in mind.

Introducing generic types for CompilationContext might involve a lot of boilerplate. However this can be avoided if it is turned into a trait and we just write impl CompilationContext for all compile functions, this way we wouldn't have to carry every type around.

It would be nice to keep the API surface smaller.
Though, at the moment, for me, it is hard to compare those approaches, probably some more implementation details will be helpful.

After looking more into it, I believe what you suggest would be better, we'd have to make CompilationContext generic anyway, and making the options generic on top of that isn't a huge step in complexity.

I'm not so sure that you can move all reference resolving to the compilation phase.

That's unfortunate, then we have to box the resolver anyway, even worse require it to be Clone, so that each validator can have its own instance of it. We can still make everything generic over it in the compilation phase though.

so maybe it would be worth distinguishing in the type system in some way

I don't think that it's necessary, as I believe it wouldn't be different from a validator (currently isn't), if no resolution is needed, a validator is simply not made for it.

@JeremyRubin
Copy link

Another note on this is that because we pull in some sort of clock somewhere, we can't validate without a reference to a "now" function. In WASM we may not want time to be defined for determinism -- I was able to work around this by making a no_mangle now function, but that seems like a hack...

Stranger6667 pushed a commit that referenced this issue Jan 31, 2022
* feat: resolver

* chore: rustfmt

* chore: even more rustfmt

* chore: relax anyhow version requirement

* chore: restore url dependency requirement

* fix(resolver): jsonschema/src/resolver.rs typo

Co-authored-by: Dmitry Dygalo <Stranger6667@users.noreply.github.com>

* fix(resolver): python binding

* feat(resolver): include original ref string, docs

* feat(resolver): reqwest feature compile error

* chore(resolver): changelog

* fix(*): clippy

* fix(resolver): doctest example

* feat(resolver): reqwest feature compile error

* chore(resolver): changelog

* fix(*): clippy

* fix(resolver): doctest example
@Stranger6667
Copy link
Owner

Resolved via #341

Will be a part of the new 0.15.0 release. Thanks, @tamasfe for implementing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants