-
-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json ADR #670
Json ADR #670
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
You might want to also add an for this in the mdBook's |
Is it possible to tag each "entry" with its source file/line? In modelator, we currently do some ad-hoc parsing to fetch the names of the TLA+ assertions representing tests: https://github.com/informalsystems/modelator/blob/99f656fa8b3cf46a2aa0b6513e4e140d1778c4bd/modelator/src/module/tla/mod.rs#L70-L98 |
Can we add a version number to the format. So the user code will be able to see that they have to deal with an incompatible version? |
In principle yes, that shouldn't be too difficult to add, but you'd have to keep in mind that the "location" field would have to be optional, because these JSON representations are intended to bidirectionally connect Apalache with 3rd-party applications, which should be able to produce JSON files, without necessarily starting from a TLA spec. That said, output generated by Apalache could certainly include source information.
Yes. |
This issue is related: #656. The change should be relatively small. I think we should really go for simple formal parameters. This will make both json and our code easier to read. |
This one is also related: #634. I wonder, whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. Several comments to improve the text. Since this ADR will be most likely used by all external tooling, let's add clarifications.
docs/src/adr/005adr-json.md
Outdated
|
||
## 3. Serializing typed entities | ||
|
||
Serializations of typed entities will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively). For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the untyped objects? To make the format regular, I propose to always add the field "type". In case of the untyped encoding, it will be type: "none"
. It looks stupid, but easier to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about typed vs untyped, everything but TlaModule
is typed, in the sense that it has some type tag (possibly Untyped
). The language should probably be clearer and say "tagged", not "typed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We should be also clear about how we represent untyped expressions
docs/src/adr/005adr-json.md
Outdated
|
||
Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization: | ||
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed). | ||
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, you are mentioning Untyped
. So we have decide the fate of the field type
in the untyped case.
docs/src/adr/005adr-json.md
Outdated
Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization: | ||
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed). | ||
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`. | ||
For the above reasons, we choose to instead serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a bit more structure to this text? For instance, describe the general rules in Section "1. General rules" and the rules for Int in Section "2. Rules for Int". Then it will be clear that Int is a special case, and the rest follows the general principle.
docs/src/adr/005adr-json.md
Outdated
} | ||
``` | ||
|
||
## 4. Serialization rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add examples for all kinds of declarations? This should not be hard. This will help the reader to see the format without running the tool. Perhaps, you would want to use the simplified format for the parameters, which I mentioned in a comment to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That kind of documentation will surely be useful for readers, but I don't think manually generating it for the ADR is a very good use of time.
Once it's implemented, we can just automatically generate it for the docs (and even ensure that the docs stay up to date).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be even better
Yeah, an optional location field makes sense. We have to decide, whether we want it structured or just text, that is, either I thought of adding IDs to the expressions and collecting the source locations in a separate file. This may be a better solution in the long run, if we want to import/export additional data (e.g., annotations). Or perhaps, we could add a separate section in the json file that contains additional data like source info, annotations, etc. |
Both would work for modelator but indeed I prefer the latter as it is explicit. |
This also seems to me like something we should do automatically once the serialization in place and put in the documentation, rather than putting it in the ADR. |
My motivation here is that the names we have right now are quite ad-hoc. If people start integrating with our code via json, it will be much hard to change the names. Of course, I can just go over the list and pick the names I like :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?
} | ||
``` | ||
if no source information is available (e.g. for expressions generated purely by Apalache). | ||
Serializations generated by Apalache are always guaranteed to contain a `source` field entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I would be a bit more careful here. We should prepare the user that they may face "UNKNOWN" in rare cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, I very explicitly write "or [UNKNOWN] if no source information is available." The next line asserts that a source
field exists, not that the contents of said field are not "UNKNOWN".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about this sentence:
Serializations generated by Apalache are always guaranteed to contain a
source
field entry.
Sometimes, we are missing source information. So "always guaranteed" is way to strong.
docs/src/adr/005adr-json.md
Outdated
``` | ||
|
||
## 5. Version information | ||
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`. | |
JSON serializations of `TlaModule`s contain the field `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`. |
docs/src/adr/005adr-json.md
Outdated
"kind": _ , | ||
"value": _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the underscores? Is this standard JSON semantics? Shall we use the syntax for comments like /* the kind */
and /* the value /*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underscores are, at least to me, very obvious meta-language placeholders. No less formal than /* */
, since JSON doesn't support comments anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I surely understand what you mean by _
, but we are writing an ADR. So we should not assume too much about the reader. I have not seen the underscore notation outside of OCaml, Scala, and maybe Perl.
|
||
While it might seem more natural to encode the entire `TlaValue` as a JSON primitive, without the added object layer, we would have a much tougher time deserializing. | ||
We would need a) a sensible encoding of `BigInt` values, which are not valid integers, and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed). | ||
We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the `type` field of the `ValEx` expression is not guaranteed to contain a hint, as it could be `Untyped` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could cite RFC7159 to justify why we cannot just use unbounded numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's about the capabilities of JSON numbers, it has more to do with the unnecessary complexity of interpreting those JSON values as the correct Scala type, which we can avoid pretty much entirely by just having a kind
field hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for not using JSON number is that they have a bounded range, something like [-(2**53)+1, (2**53)-1]
, which I am sure a few people know about.
docs/src/adr/005adr-json.md
Outdated
args: [jsonOf1, jsonOf1] | ||
"kind": "OperEx", | ||
"type": "Int", | ||
"oper": "(+)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"oper": "(+)", | |
"oper": "+", |
(+)
is a different operator in TLA+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but that is the value of TlaArithOper.plus.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is an ADR and we have #726, let's change it to "PLUS"
right away.
extends JsonEncoder[T] { | ||
val kindFieldName: String = "kind" | ||
val typeFieldName: String = "type" | ||
import TlaToJson._ | ||
|
||
implicit def liftString: String => T = factory.fromStr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason for using implicit conversions here? AFAIK, this may stop working in Scala 3. In general, it's hard to refactor code that is using implicits, as we have seen recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, being able to write s -> "abc"
instead of s -> factory.fromStr("abc")
in like 20 spots. This is far from the kind of implicits used in (un)typed predefs, since it's declared in the same spot as it is used, so it's more of a convenience shorthand.
{ | ||
"kind": "TlaModule", | ||
"name": "MyModule", | ||
"version": "1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should introduce a different root object for the json format. We may want to add meta information to it later. So I would suggest adding another object on top of TlaModule
, e.g.:
{
"name": "ApalacheIR",
"version": "1.0",
"modules": [{
"kind": "TlaModule",
"name": "MyModule",
...
}]
}
Co-authored-by: Igor Konnov <igor@informal.systems>
In what way do you think we are not? |
That was a suggestion to quickly check RFC 7159 :) |
I think the CI failure here is due to a bintray outage: https://status.bintray.com/ I'll look into reducing the dependency chain in the meanwhile, but I'll make the mac-os integration tests optional for the time being. |
CI Failure should be fixed, just need to rebase/merge on top of the trunk. |
Is there much to do, in order to get it merged? We might get some integration blocked on this JSON export/import. |
Closes #610