-
-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json ADR #670
Json ADR #670
Changes from 3 commits
944de9f
e0393f8
9cbb990
c2b5d11
4a3f699
db7ed14
8673ede
21525b4
3500ed4
10bf178
1840df4
785f9c5
bd6ce9c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
# ADR-005: JSON Serialization Format | ||
|
||
| author | revision | | ||
| ------------ | --------:| | ||
| Jure Kukovec | 1 | | ||
|
||
This ADR documents our decision on serializing the Apalache internal representation (IR) as JSON. | ||
The purpose of introducing such a serialization is to expose the internal representation in a standardized format, which can be used for persistent storage, or for analysis by third-party tools in the future. | ||
|
||
## 1. Serializable classes | ||
|
||
The following classes are serializable: | ||
1. TLA+ expressions `TlaEx` and subclasses thereof: | ||
1. Named expressions `NameEx` | ||
1. Literal values `ValEx` for the following literals: | ||
1. Integers `TlaInt` | ||
1. Strings `TlaStr` | ||
1. Booleans `TlaBool` | ||
1. Decimals `TlaDecimal` | ||
1. Operator expressions `OperEx` | ||
1. LET-IN expressions `LetInEx` | ||
1. TLA+ declarations `TlaDecl` and subclasses thereof: | ||
1. Variable declarations `TlaVarDecl` | ||
1. Constant declarations `TlaConstDecl` | ||
1. Operator declarations `TlaOperDecl` | ||
1. Assumption declarations `TlaAssumeDecl` | ||
1. Theorem declarations `TlaTheoremDecl` | ||
1. TLA+ modules `TlaModule` | ||
|
||
## 2. Disambiguation field | ||
|
||
Every serialization will contain a disambiguation field, named `kind`. This field holds the name of the class being serialized. For example, the serialization of a `NameEx` will have the shape | ||
``` | ||
{ | ||
kind: "NameEx" | ||
... | ||
} | ||
``` | ||
|
||
## 3. Serializing typed entities | ||
|
||
Serializations of typed entities will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively). For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows: | ||
``` | ||
{ | ||
kind: "NameEx" | ||
type: "Int" | ||
... | ||
} | ||
``` | ||
|
||
## 4. Serialization rules | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add examples for all kinds of declarations? This should not be hard. This will help the reader to see the format without running the tool. Perhaps, you would want to use the simplified format for the parameters, which I mentioned in a comment to this PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That kind of documentation will surely be useful for readers, but I don't think manually generating it for the ADR is a very good use of time. Once it's implemented, we can just automatically generate it for the docs (and even ensure that the docs stay up to date). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be even better |
||
|
||
The goal of the serialization is for the JSON structure to mimic the internal representation as closely as possible, for ease of deserialization. | ||
Concretely, whenever a class declares a field `fld: T`, its serialization also contains a field named `fld`, containing the serialization of the field value. | ||
For example, if `TlaConstDecl` declares a `name: String` field, its JSON serialization will have a `name` field as well, containing the name string. | ||
|
||
If a class field has the type `Traversable[T]`, for some `T`, the corresponding JSON entry is a list containing serializations of the individual arguments. For example, `OperEx` is variadic and declares `args: TlaEx*`, so its serialization has an `args` field containing a (possibly empty) list. | ||
|
||
As a running example, take the expression `1 + 1`, represented with the correct type annotations as | ||
``` | ||
OperEx( | ||
oper = TlaArithOper.plus, | ||
args = Seq( | ||
ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) ), | ||
ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) ) | ||
) | ||
)( typeTag = Typed( IntT1() ) ) | ||
``` | ||
|
||
Since both sub-expressions, the literals `1`, are identical, their serializations are equal as well: | ||
``` | ||
{ | ||
kind: "ValEx", | ||
type: "Int", | ||
value: { | ||
kind: "TlaInt", | ||
value: 1 | ||
} | ||
} | ||
``` | ||
|
||
Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization: | ||
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed). | ||
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here, you are mentioning |
||
For the above reasons, we choose to instead serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a bit more structure to this text? For instance, describe the general rules in Section "1. General rules" and the rules for Int in Section "2. Rules for Int". Then it will be clear that Int is a special case, and the rest follows the general principle. |
||
``` | ||
{ | ||
kind: _ | ||
value: _ | ||
} | ||
``` | ||
|
||
The `value` field depends on the kind of `TlaValue`: | ||
1. For `TlaStr`: a JSON string | ||
1. For `TlaBool`: a JSON Boolean | ||
1. For `TlaInt(bigIntValue)`: | ||
1. If `bigIntValue.isValidInt`: a JSON number | ||
2. Otherwise: `{ bigInt: bigIntValue.toString() }` | ||
1. For `TlaDecimal(decValue)`: a JSON string `decValue.toString` | ||
|
||
Take `jsonOf1` to be the serialization of `ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) )` shown above. The serialization of `1 + 1` is then equal to | ||
``` | ||
{ | ||
kind: "OperEx", | ||
type: "Int", | ||
oper: "(+)", | ||
args: [jsonOf1, jsonOf1] | ||
} | ||
``` | ||
In general, for any given `oper: TlaOper` of `OperEx`, the value of the `oper` field in the serialization equals `oper.name`. | ||
|
||
## 5. Implementation | ||
|
||
The implementation of the serialization can be found in the class | ||
`at.forsyte.apalache.io.json.TlaToJson` of the module `tla-import`, see [TlaToJson][]. | ||
|
||
[ADR-002]: https://apalache.informal.systems/docs/adr/002adr-types.html | ||
|
||
[ADR-004]: https://apalache.informal.systems/docs/adr/004adr-annotations.html | ||
|
||
[TlaToJson]: https://github.com/informalsystems/apalache/blob/unstable/tla-import/src/main/scala/at/forsyte/apalache/io/json/TlaToJson.scala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the untyped objects? To make the format regular, I propose to always add the field "type". In case of the untyped encoding, it will be
type: "none"
. It looks stupid, but easier to parse.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about typed vs untyped, everything but
TlaModule
is typed, in the sense that it has some type tag (possiblyUntyped
). The language should probably be clearer and say "tagged", not "typed".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We should be also clear about how we represent untyped expressions