Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended local ref support #72

Merged
merged 8 commits into from
Apr 12, 2024
Merged

Extended local ref support #72

merged 8 commits into from
Apr 12, 2024

Conversation

yannham
Copy link
Contributor

@yannham yannham commented Apr 9, 2024

This pull request implements support local references to not only top-level definitions, but also properties, that is a JSON pointer of the form #/properties/foo/properties/bar/.../properties/qux.

Motivation

Currently, json-schema-to-nickel is only able to handle local references to top-level definitions, that is references of the form "#/definitions/xxx. Beside simply supporting more schemas, the motivation here is that [json-schema-ref-parser](https://github.com/APIDevTools/json-schema-ref-parser) can help users circumvent the fact that json-schema-to-nickel currently doesn't support external refs by bundling the external refs directly into the schemas. However, this tool most often produces references of the second form, that is #/properties/foo/properties/bar`, so it can't be used yet as a pre-processing step to make json-schema-to-nickel able to handle external references.

Design

Definitions aren't that special, so we extend the idea of definitions which is to introduce special variables (the "environment") handling the conversion of properties as contracts and predicates to be used from anywhere within the schema.

However, there are a few differences.

Code bloat

Definitions are usually, by nature, supposed to be used in the schema. Many schema don't have definitions at all, and those who do most of the time actually use them. Also, definitions should make for a small part of the final schema.

This is different for properties, which are the substance of most JSON schemas. If we blindly make them all available as contracts and predicates in the environment, we will thus potentially duplicate each property 3 times, whether it's actually useful or not: once as a contract in the final schema, and twice in the environment, as a predicate and a contract. It also means we have to hold those two versions (contracts & predicates) somewhere in memory throughout the conversion.

We assume that the happy path is that schemas don't have any local references (or few), and want to optimize for that case. Thus, it would be better to only introduce in the environment the properties that are actually referenced somewhere.

Contracts are already there

Definitions aren't part of the final schema per se, but are only in the environment. However, properties are always part of the final schema, at least their contract version. Thus, we don't need to duplicate the contract version in the environment: we can just use a simple and direct recursive access.

In consequence, we only include the predicate version of reference properties in the environment, to reduce further the code bloat of the generated contract.

Solution

The main idea is to pass additional state around during the conversion of the root schema. This state (RefsUsage) maintains 3 sets:

  1. The top-level definitions used as a predicate
  2. The top-level definitions used as a contract
  3. The properties used as a predicate

We can't rely on the From/TryFrom trait and introduce similar bespoke traits for the conversion from schemas to Predicate or Contract.

Then, at the end of the conversion, we build the environment from the usage registered during the conversion. Building correctly the definition part is iterative: indeed, when constructing the definitions, we convert new schemas that were unseen until now (the definitions), which can themselves reference new definitions. We thus iterate until we don't see any new definitions anymore.

Property predicates are stored in the environment as a flat record: a path #/properties/foo/properties/bar will be included as an entry "foo/bar" = value in <env>.<prop_preds>.

The final environment is put in special variable, with mangled names (to avoid unwanted clashes with properties of the schema).

Follow-up

  • This PR only supports paths of the form properties/x/properties/y, but it shouldn't be too hard to support more general JSON pointers, for example I've seen #/allOf/0 be generated by json-schema-ref-parser as well. This is left for future work given the size of the current PR
  • While all the tests pass, the new feature isn't tested per se (schema with local references to properties). For the same reason (PR size), this is left for a follow-up PR.

Previously, json-schema-to-nickel was only able to handle local
references to top-level definitions, that is references of the form
"#/definitions/xxx`. This commit is a first step toward handling
references to local properties as well, of the form
`#/properties/foo/properties/bar/properties/baz`.

Doing so, we also rework the way top-level definitions are handled as
well, for better uniformity and to reduce the size of generated
contracts (we use to include all definitions by default, while this
change pushes toward recording which ones are actually referenced, and
keep only those).
@yannham yannham force-pushed the feat/extended-local-ref-support branch from a495dc9 to 1973582 Compare April 10, 2024 08:41
yannham added 2 commits April 10, 2024 10:47
Carry the reference state (recording reference usages) around during
contract and predicate conversion from JSON schema components. This is
achieved by introduced specialized traits similar to `From` and
`TryFrom` but taking an additional `&mut RefsUsage` argument.
Simply conversion traits by removing a type parameter (morally going
from a `From`-like trait to a `Into`-like trait where the target type is
fixed).
@yannham yannham force-pushed the feat/extended-local-ref-support branch from 1973582 to c5e2f79 Compare April 10, 2024 09:18
Finally put the pieces together to handle local references to
properties. This commit update the generation of the environment (the
special variables which hold the definitions and the properties
referenced throughout the JSON schema). Do iterative resolution for
definitions, whose conversions might add new ref usages, and hence new
definitions to convert.

Fix tests, don't start mangled names with `___` (for now, as Nickel
didn't parse them correctly - the bug has been fixed upstream, but
hasn't landed yet in a stable version), and put the predicate lib under
a mangled name as well.
@yannham yannham force-pushed the feat/extended-local-ref-support branch from 5636e90 to 53e490f Compare April 11, 2024 09:57
@yannham yannham marked this pull request as ready for review April 11, 2024 09:57
yannham added 2 commits April 11, 2024 12:18
Fix typo in comments, remove dead comments, etc.
@yannham yannham requested review from vkleen and Radvendii April 11, 2024 12:48
@yannham yannham mentioned this pull request Apr 11, 2024
src/contracts.rs Outdated
InstanceType::Integer => Contract::from(static_access("std", ["number", "Integer"])),
}
}
pub trait AsCtrThruPred {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that AsContractThroughPredicate feels very Java, but AsCtrThruPred doesn't exactly spark joy either. I think with @Radvendii we got used to calling contracts that resort to going through the predicate "predicate contracts". Maybe AsPredicateContract would be a reasonable name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a fine balance and it's hard to find. I think Rust in general favor shorter names with missing letters. I think I (and you, and we in general) favor longer, more explicit names. However I think it got a bit out of hand in e.g. the Nickel typechecker (yes, I'm thinking of you, GenericUnifRecordRowsIteratorItem), and I came to think that having abbreviation over a certain limit is ok as long as the meaning of the abbreviation is absolutely non ambiguous. A bit like comments: comments are good, but they need to have some entropy, and paraphrasing the code is usually just noise.

Anyway, I think the PredicateContract is a good alternative. Another would be EagerContract. We haven't used the latter yet though, so maybe PredicateContract is better, although it might be a tad confusing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end I went with AsContractPredicate.

src/definitions.rs Outdated Show resolved Hide resolved
src/definitions.rs Outdated Show resolved Hide resolved
src/definitions.rs Outdated Show resolved Hide resolved
src/definitions.rs Outdated Show resolved Hide resolved
src/definitions.rs Outdated Show resolved Hide resolved
src/definitions.rs Outdated Show resolved Hide resolved
yannham and others added 2 commits April 12, 2024 10:06
Co-authored-by: Viktor Kleen <viktor.kleen@tweag.io>
Get rid of the MANGLED suffix for several constants, which isn't very
important to understand their purpose. Rename `AsCtrThruPred` to
`AsPredicateContract` for better readability. Fix cargo doc warnings
(mostly invalid references).
@yannham yannham enabled auto-merge April 12, 2024 08:15
@yannham yannham added this pull request to the merge queue Apr 12, 2024
Merged via the queue into main with commit 6043e48 Apr 12, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants