Code generation #184

Closed
chriskapp opened this issue Dec 10, 2016 · 14 comments

@chriskapp

Hi, I am currently writing a library which can generate simple classes based on a JSON Schema. Besides validation, the library can also read JSON data and use the generated classes to build an object tree, similar to JAXB, where it is possible to generate classes from an XSD and vice versa.

For the generation, the combination keywords allOf, anyOf and oneOf are difficult to handle. To be exact, anyOf and oneOf are rather easy since they always result in a single schema, but with allOf there are some problems. Since the subschemas must all be applied independently, the result is multiple types which cannot be merged easily. I am curious whether/how other devs have solved this problem.

I am aware that at the moment JSON Schema is mostly about validation, but I would like to bring some attention to the code generation process. Maybe we could simplify or clarify this process.

In the OAI spec repository there is also an interesting issue with some thoughts regarding this topic:
OAI/OpenAPI-Specification#741 (comment)

@eskoviak

I would be interested in seeing this project.

@handrews
Contributor

This is one of many reasons that I avoid generating static classes from JSON Schema and instead use generic data types with behavior driven by the data + the schema at runtime rather than schema-only at code generation time.

This, of course, is easier and more practical in some languages and runtime environments than others.
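
For illustration only, here is a minimal sketch (not taken from any particular library) of what that generic approach could look like in PHP: a container that keeps the decoded JSON and its schema side by side instead of baking the schema into a generated class.

<?php

// Hypothetical sketch of the "generic data + schema at runtime" approach:
// instead of generating one class per schema, wrap the decoded JSON in a
// generic container that carries its schema along with it.
class SchemaBackedRecord
{
    public function __construct(
        private array $data,
        private array $schema
    ) {
    }

    public function get(string $property)
    {
        return $this->data[$property] ?? null;
    }

    // Returns the subschema describing a property, so behavior (validation,
    // coercion, UI hints, ...) can be driven by the schema at runtime.
    public function schemaFor(string $property): ?array
    {
        return $this->schema['properties'][$property] ?? null;
    }
}

// Usage: both arguments come from json_decode($json, true).
$record = new SchemaBackedRecord(
    ['firstName' => 'Ada', 'age' => 36],
    ['type' => 'object', 'properties' => ['age' => ['type' => 'integer']]]
);
var_dump($record->get('age'), $record->schemaFor('age'));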

If I were to dig into generating strongly statically typed code from a schema, I'd probably work on using schema algebra to condense the schemas down. This is not trivial, but many situations aren't too hard:

{
    "type": "object",
    "allOf": [
        {
            "properties": {
                "foo": {"type": "number", "maximum": 42},
                "bar": {"type": "array", "maxItems": 100}
            },
            "additionalProperties": {"type": "string"}
        },
        {
            "properties": {
                "bar": {"type": "array", "items": {"type": "boolean"}},
                "biz": {"pattern": "[a-z]*\\.[a-z]*"}
            },
            "additionalProperties": {"maxLength": 50}
        }
    ]
}

can be transformed into:

{
    "type": "object",
    "properties": {
        "foo": {
            "type": "number",
            "maximum": 42,
            "maxLength": 50
        },
        "bar": {
            "type": "array",
            "items": {"type": "boolean"},
            "maxItems": 100
        },
        "biz": {
            "type": "string",
            "pattern": "[a-z]*\\.[a-z]*"
        }
    },
    "additionalProperties": {
        "type": "string",
        "maxLength": 50
    }
}

And while the schema for "foo" looks a bit nonsensical, it actually works just fine: "maxLength" only applies to strings, so since "foo" is required to be a number, the "maxLength" is a no-op. If I ever get around to implementing this sort of thing in general, it would also recognize and drop no-op keywords, and (up to a point) recognize impossible schemas and replace them with false.
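
To make the idea a bit more concrete, here is a very rough sketch of what such a condensing step could look like (an illustration only, not a real implementation; it assumes schemas decoded into associative arrays via json_decode($json, true) and only handles a handful of keywords):

<?php

// Rough sketch only: naively condense "allOf" into the parent schema.
// Most keywords are not handled; each would need its own merge rule.
function condenseAllOf(array $schema): array
{
    if (!isset($schema['allOf'])) {
        return $schema;
    }

    $merged = $schema;
    unset($merged['allOf']);

    foreach ($schema['allOf'] as $sub) {
        $sub = condenseAllOf($sub); // flatten nested allOf first

        foreach ($sub as $keyword => $value) {
            if (!array_key_exists($keyword, $merged)) {
                $merged[$keyword] = $value;
            } elseif ($keyword === 'properties') {
                // merge property maps; overlapping properties are merged recursively
                foreach ($value as $name => $propSchema) {
                    $merged['properties'][$name] = isset($merged['properties'][$name])
                        ? condenseAllOf(['allOf' => [$merged['properties'][$name], $propSchema]])
                        : $propSchema;
                }
            } elseif (in_array($keyword, ['maximum', 'maxLength', 'maxItems'], true)) {
                $merged[$keyword] = min($merged[$keyword], $value); // most restrictive wins
            } elseif (in_array($keyword, ['minimum', 'minLength', 'minItems'], true)) {
                $merged[$keyword] = max($merged[$keyword], $value);
            } elseif ($keyword === 'required') {
                $merged[$keyword] = array_values(array_unique(array_merge($merged[$keyword], $value)));
            }
            // "additionalProperties", "items", "type" conflicts, "oneOf", etc.
            // are deliberately left out of this sketch.
        }
    }

    return $merged;
}

A real implementation would need a merge rule for every pair of keywords that can conflict, including the interaction between "additionalProperties" and "properties" shown above, which is exactly where it stops being easy.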

Anyway, while there is no perfect solution here ("oneOf", which is a combination of "allOf", "anyOf", and "not", is particularly challenging to reduce algebraically), there are some things that could be done.

Ultimately, I'd prefer that a code generation implementation put its own limitations on the JSON Schema it accepts. The boolean combination keywords are very useful in many applications, so I'd be reluctant to weaken them just to support code generation.

Another option would be to consider an extension vocabulary to help with code generation, similar to the proposed JSON UI Schema (#67) or the documentation-oriented vocabulary (#136).

@chriskapp
Author

I like the idea of merging the allOf subschemas into one schema. One problem I see is the case where allOf contains mixed subschema types, e.g.:

{
    "type": "object",
    "allOf": [
        {
            "type": "string"
        },
        {
            "properties": {
                "bar": {"type": "array", "items": {"type": "boolean"}},
                "biz": {"pattern": "[a-z]*\\.[a-z]*"}
            }
        }
    ]
}

Since these kinds of schemas are logically pointless anyway, maybe we should restrict them in the spec, so that all subschemas inside allOf must be of the same type, and this type should be defined in the parent schema, e.g.:

{
    "type": "object",
    "allOf": [
        {
            "properties": {
                "foo": {"type": "string"}
            }
        },
        {
            "properties": {
                "bar": {"type": "string"}
            }
        }
    ]
}

or

{
    "type": "string",
    "allOf": [
        {
            "minLength": 8
        },
        {
            "maxLength": 16
        }
    ]
}

I think this would clarify the usage and limit it to meaningful combinations. It would also make it easier for parsers to handle those types.

@handrews
Contributor

Since these kinds of schemas are logically pointless anyway, maybe we should restrict them in the spec

It turns out that allowing seemingly logically pointless schemas is a very important feature. The example I gave shows one reason: the way the "additionalProperties" combinations work means that if "additionalProperties" has a type, then everything it overlaps with must be of that type. But if you don't give "additionalProperties" a type, only something like "maxLength": 50, then it's harmless to combine that with a property of a different type (like the "foo" property in the example).

This is a lot easier to work with than meticulously trying to get everything to line up in all cases. The resulting schemas aren't really nonsensical; they just have some superfluous keywords. It is occasionally even useful to have schema combinations produce impossible schemas (although it's generally better to make that intent clear by setting some subschema to false).

@handrews
Contributor

Also note that the schema algebra approach is how #119 would work. That makes it very complex to implement, which is why I'm not much of a fan despite having come up with the idea, but it does show some relatively complex examples of possible transformations.

@chriskapp
Author

So the idea was that we only need to set a type when using allOf to indicate the "result" schema type of the allOf combination, since there can be only one type: if there are multiple types, allOf would always fail. There is also the case where there are no types at all; that would then not be possible. But it would restrict those "always false" schemas like:

{
    "allOf": [
        { "type": "object" },
        { "type": "string" }
    ]
}

but something like this would be possible since it results in a single type:

{
    "allOf": [
        { "type": "object" },
        { "maxLength": 50 }
    ]
}

But this is of course only needed for my case, where I want to create an object structure based on the schema and need to know the type. For validation this is not an issue, so I could also simply require this keyword in the implementation. This would then be a limitation of my implementation, but since the use case is not only validation it is probably OK. But I think this would be a good addition in general.

@handrews
Contributor

This would then be a limitation of my implementation, but since the use case is not only validation it is probably OK. But I think this would be a good addition in general.

I'm sorry, I seem to be missing something. What, exactly, would be a good addition in general?

@chriskapp
Author

To state in the spec that all subschemas in "allOf" must be of the same type. Put the other way around, you could also say that the subschemas inside "allOf" must not have different "type" keywords.

@handrews
Contributor

To state in the spec that all subschemas in "allOf" must be of the same type. Put the other way around, you could also say that the subschemas inside "allOf" must not have different "type" keywords.

While I understand the appeal of this, one of the key design principles of JSON Schema validation is that it is context-free: each schema can be evaluated independently from any parent, child, or sibling schemas.

For implementors, it is much better to keep that property than to start requiring cross-schema constraints to be enforced. You can also end up with complex nested allOf/anyOf/oneOf/not where it can be very challenging to figure out how to apply such a constraint. The simplest version of this is something like:

{
    "allOf": [
        {"oneOf": [{"type": "number"}, {"type": "string"}]},
        {"oneOf": [{"type": "number"}, {"type": "string"}]}
    ]
}

That's a bit contrived and it's not hard to see that there are two situations in which the criteria you propose are satisfied, but fully determining this in all situations is really non-trivial.

The place to implement this is a linter (I have vague plans to implement one although don't hold me to it). There are a lot of things in JSON Schema that are probably mistakes if done, but are not worth the added difficulty of enforcing them in the spec. A linter (which can be as thorough or superficial as needed, and can be turned off for weird situations where something strange is actually beneficial) is a flexible way to prevent buggy schemas without burdening the primary specifications.
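
To make that concrete, a lint rule for the specific mistake discussed above could be as small as this (a hypothetical sketch, not an existing linter; the schema is assumed to be an associative array from json_decode):

<?php

// Hypothetical lint check: warn when direct "allOf" subschemas declare
// disjoint "type" keywords, which makes the schema unsatisfiable.
function lintAllOfTypes(array $schema, string $path = '#'): array
{
    $warnings = [];

    if (isset($schema['allOf'])) {
        $types = [];
        foreach ($schema['allOf'] as $i => $sub) {
            if (isset($sub['type'])) {
                $types[$i] = (array) $sub['type'];
            }
        }
        if (count($types) > 1 && array_intersect(...array_values($types)) === []) {
            $warnings[] = "$path/allOf: subschemas declare disjoint types; nothing can validate";
        }
    }

    // recurse into property subschemas only, for brevity
    foreach (($schema['properties'] ?? []) as $name => $sub) {
        $warnings = array_merge($warnings, lintAllOfTypes($sub, "$path/properties/$name"));
    }

    return $warnings;
}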

@awwright
Member

awwright commented Dec 13, 2016

@k42b3 That's interesting, I would think the problem would be the opposite ("allOf" should be the easiest). Keywords, and keywords inside an "allOf" keyword, always reduce the range of valid values. So "allOf" is just a way to use a keyword multiple times.

Consider the schema:

{
    "allOf": [
        { "type": ["string", "number"] },
        { "type": ["string", "null"] }
    ]
}

If this were valid, it would mean the same thing:

{
    "type": ["string", "number"],
    "type": ["string", "null"]
}

Which is itself just the same as saying {"type": "string"}.
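
In code terms, that reduction is just an intersection of the allowed types (illustrative PHP only, not from any implementation):

<?php

// Illustrative only: intersect two "type" keywords, treating a plain string
// value as a single-element list, as in the reduction described above.
function intersectType($a, $b): array
{
    return array_values(array_intersect((array) $a, (array) $b));
}

// intersecting ["string", "number"] with ["string", "null"] leaves ["string"]
print_r(intersectType(['string', 'number'], ['string', 'null']));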

I would think anyOf or oneOf would be harder to represent in code, because you have to add logic to say "if that one is valid, then this can't be valid..."

Can you link to your library/implementation for me to take a look at?

@chriskapp
Author

Hi, of course you can take a look at the library here:
https://github.com/apioo/psx-schema#usage

To explain a bit more why "allOf" is difficult to handle in my case: I want to generate a static class from a JSON Schema, e.g. if we have this schema:

{
	"title": "Example Schema",
	"type": "object",
	"properties": {
		"firstName": {
			"type": "string"
		},
		"lastName": {
			"type": "string"
		},
		"age": {
			"description": "Age in years",
			"type": "integer",
			"minimum": 0
		}
	},
	"required": ["firstName", "lastName"]
}

I want to generate a class which looks similar to this:

class Example_Schema
{
	protected $firstName;
	protected $lastName;
	protected $age;
	
	// getter/setter
}

The library also adds some annotations to be able to recreate the JSON Schema from the class. The library should then be able to create objects from those classes and fill them with the provided JSON data. And there is the problem with "allOf": if I parse every subschema independently, I potentially have multiple object instances for a single property which must somehow be merged into one value. Because of this I like the idea of merging all subschemas, since then I would have only one schema which also results in one value. The "allOf" handling is in this class:
https://github.com/apioo/psx-schema/blob/master/src/SchemaTraverser.php#L355

I have not found the perfect solution yet, but maybe you have some other ideas on how to handle this case.

@Relequestual
Member

Relequestual commented Jan 5, 2017

I think your problem is that you want to use JSON Schema for something it is not designed for, and want the features that make it difficult for your purposes changed.

If "allOf" is difficult or in some cases impossible to use with your library, don't support it.

I'm closing this issue because JSON Schema shouldn't be primarily concerned with code generation.


What I mean is: due to the limited developer resources available on this project, we aren't able to dedicate the required time to solve this issue right now. Maybe we can look at it again in the future.

@chriskapp
Author

Hi, no problem. I am currently happy with the implementation, so there is no need to rush.

But I think code generation from JSON Schema is, in general, an interesting topic. In the future I would really love to have tooling to automatically generate a complete client from an OAI spec, similar to SOAP, where e.g. the Apache CXF project can generate a complete Java client through wsdl2java. That only works because it is possible to generate a Java class from an XSD. I am pretty sure there are many other devs who would really like such tooling, but it will only be possible if we can generate proper data structures from the schema. So if there is more time I would really like to improve this area.

@handrews
Contributor

handrews commented Jan 6, 2017

@k42b3 one thing you might consider is trying out some custom keywords to help you disambiguate problems like these. If you find some that work, you can propose them as an extension vocabulary. Code generation isn't necessary for validation to work, but it is definitely an interesting use case for JSON Schema.
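
For example, a generator-oriented keyword (the name "x-generator-class" here is purely hypothetical, not part of any spec or proposal) could carry the hint a code generator needs while being ignored by validators:

{
    "type": "object",
    "x-generator-class": "Example",
    "allOf": [
        { "properties": { "foo": { "type": "string" } } },
        { "properties": { "bar": { "type": "string" } } }
    ]
}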
