Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json ADR #670

Merged
merged 13 commits into from
Apr 19, 2021
119 changes: 96 additions & 23 deletions docs/src/adr/005adr-json.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,23 +32,93 @@ The following classes are serializable:
Every serialization will contain a disambiguation field, named `kind`. This field holds the name of the class being serialized. For example, the serialization of a `NameEx` will have the shape
```
{
kind: "NameEx"
"kind": "NameEx"
...
}
```

## 3. Serializing typed entities
## 3. Serializing tagged entities

Serializations of typed entities will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively). For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows:
Serializations of entities annotated with a `TypeTag` will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively), if the tag is `Typed`, or `Untyped` otherwise. For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows:
```
{
kind: "NameEx"
type: "Int"
"kind": "ValEx",
"type": "Int"
...
}
```
in the typed encoding, or
```
{
"kind": "ValEx",
"type": "Untyped"
...
}
```
in the untyped encoding.

## 4. Serialization rules
## 4. Source information
Entities in the internal representation are usually annotated with source information, of the form `{filename}:{startLine}:{startColumn}-{endLine}:{endColumn}`, relating them to a file range in the provided specification (from which they may have been transformed as part of preprocessing).
JSON encodings may, but are not required to, contain a `source` providing this information, of the following shape:
```
{
"source": {
"filename" : <FILENAME>,
"from" : {
"line" : <STARTLINE>,
"column" : <STARTCOLUMN>
},
"to" : {
"line" : <ENDLINE>,
"column" : <ENDCOLUMN>
}
}
}
```
or
```
{
"source": "UNKNOWN"
}
```
if no source information is available (e.g. for expressions generated purely by Apalache).
Serializations generated by Apalache are always guaranteed to contain a `source` field entry.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I would be a bit more careful here. We should prepare the user that they may face "UNKNOWN" in rare cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I very explicitly write "or [UNKNOWN] if no source information is available." The next line asserts that a source field exists, not that the contents of said field are not "UNKNOWN".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about this sentence:

Serializations generated by Apalache are always guaranteed to contain a source field entry.

Sometimes, we are missing source information. So "always guaranteed" is way to strong.


Example:
```
{
"kind" : "NameEx",
"type" : "Int",
"name" : "myName",
"source": {
"filename" : "MyModule.tla",
"from" : {
"line" : 3,
"column" : 5
},
"to" : {
"line" : 3,
"column" : 10
}
}
}
```

## 5. Version information
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
JSON serializations of `TlaModule`s contain the field `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.

This document defines JSON Version 1.0. If and when a different JSON version is defined, this document will be updated accordingly.
Apalache may refuse to import, or trigger warnings for, JSON objects with obsolete versions of the encoding in the future.
Example:
```
{
"kind": "TlaModule",
"name": "MyModule",
"version": "1.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should introduce a different root object for the json format. We may want to add meta information to it later. So I would suggest adding another object on top of TlaModule, e.g.:

{
  "name": "ApalacheIR",
  "version": "1.0",
  "modules": [{
    "kind": "TlaModule",
    "name": "MyModule",
    ...
  }]
}

...
}
```

## 6. General serialization rules

The goal of the serialization is for the JSON structure to mimic the internal representation as closely as possible, for ease of deserialization.
Concretely, whenever a class declares a field `fld: T`, its serialization also contains a field named `fld`, containing the serialization of the field value.
Expand All @@ -70,23 +140,20 @@ OperEx(
Since both sub-expressions, the literals `1`, are identical, their serializations are equal as well:
```
{
kind: "ValEx",
type: "Int",
value: {
kind: "TlaInt",
value: 1
"kind": "ValEx",
"type": "Int",
"value": {
"kind": "TlaInt",
"value": 1
}
}
```

Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization:
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`.
For the above reasons, we choose to instead serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape
Observe that we choose to serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape
```
{
kind: _
value: _
"kind": _ ,
"value": _
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the underscores? Is this standard JSON semantics? Shall we use the syntax for comments like /* the kind */ and /* the value /*.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underscores are, at least to me, very obvious meta-language placeholders. No less formal than /* */, since JSON doesn't support comments anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I surely understand what you mean by _, but we are writing an ADR. So we should not assume too much about the reader. I have not seen the underscore notation outside of OCaml, Scala, and maybe Perl.

}
```

Expand All @@ -95,21 +162,27 @@ The `value` field depends on the kind of `TlaValue`:
1. For `TlaBool`: a JSON Boolean
1. For `TlaInt(bigIntValue)`:
1. If `bigIntValue.isValidInt`: a JSON number
2. Otherwise: `{ bigInt: bigIntValue.toString() }`
2. Otherwise: `{ "bigInt: bigIntValue.toString() }`
Kukovec marked this conversation as resolved.
Show resolved Hide resolved
1. For `TlaDecimal(decValue)`: a JSON string `decValue.toString`

The reason for the non-uniform treatment of integers, is that Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold.

While it might seem more natural to encode the entire `TlaValue` as a JSON primitive, without the added object layer, we would have a much tougher time deserializing.
We would need a) a sensible encoding of `BigInt` values, which are not valid integers, and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the `type` field of the `ValEx` expression is not guaranteed to contain a hint, as it could be `Untyped`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could cite RFC7159 to justify why we cannot just use unbounded numbers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's about the capabilities of JSON numbers, it has more to do with the unnecessary complexity of interpreting those JSON values as the correct Scala type, which we can avoid pretty much entirely by just having a kind field hint.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for not using JSON number is that they have a bounded range, something like [-(2**53)+1, (2**53)-1], which I am sure a few people know about.


Take `jsonOf1` to be the serialization of `ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) )` shown above. The serialization of `1 + 1` is then equal to
```
{
kind: "OperEx",
type: "Int",
oper: "(+)",
args: [jsonOf1, jsonOf1]
"kind": "OperEx",
"type": "Int",
"oper": "(+)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"oper": "(+)",
"oper": "+",

(+) is a different operator in TLA+.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but that is the value of TlaArithOper.plus.name

Copy link
Collaborator

@konnov konnov Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is an ADR and we have #726, let's change it to "PLUS" right away.

"args": [jsonOf1, jsonOf1]
}
```
In general, for any given `oper: TlaOper` of `OperEx`, the value of the `oper` field in the serialization equals `oper.name`.

## 5. Implementation
## 7. Implementation

The implementation of the serialization can be found in the class
`at.forsyte.apalache.io.json.TlaToJson` of the module `tla-import`, see [TlaToJson][].
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@ package at.forsyte.apalache.io.json
*/
trait JsonRepresentation {
def toString: String
def getFieldOpt(fieldName: String): Option[this.type]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
package at.forsyte.apalache.io.json

/**
* JSON serialization versions need not be tied to Apalache versions
* (Apalache versions, obtained from at.forsyte.apalache.tla.tooling.Version
* cannot be read here anyway, due to tooling -> import dependencies)
*/
object JsonVersion {
val major = 1
val minor = 0
def current: String = s"$major.$minor"
}
Loading