
Json ADR #670

Merged
merged 13 commits into unstable from jk/jsonADR
Apr 19, 2021

Conversation

Kukovec
Collaborator

@Kukovec Kukovec commented Mar 22, 2021

Closes #610

@Kukovec Kukovec requested a review from konnov March 22, 2021 14:30
Contributor

@shonfeder shonfeder left a comment


LGTM

@shonfeder
Contributor

You might also want to add an entry for this in the mdBook's `SUMMARY.md`, as this is probably of general interest.

@vitorenesduarte
Contributor

Is it possible to tag each "entry" with its source file/line?

In modelator, we currently do some ad-hoc parsing to fetch the names of the TLA+ assertions representing tests: https://github.com/informalsystems/modelator/blob/99f656fa8b3cf46a2aa0b6513e4e140d1778c4bd/modelator/src/module/tla/mod.rs#L70-L98
Ideally, we should be using Apalache for this, and having the source file on each "entry" would make that process more reliable (i.e. not requiring another hack to check which file the assertion comes from).

@konnov
Collaborator

konnov commented Mar 23, 2021

Can we add a version number to the format, so that user code can tell when it is dealing with an incompatible version?

@Kukovec
Collaborator Author

Kukovec commented Mar 23, 2021

Is it possible to tag each "entry" with its source file/line?

In modelator, we currently do some ad-hoc parsing to fetch the names of the TLA+ assertions representing tests: https://github.com/informalsystems/modelator/blob/99f656fa8b3cf46a2aa0b6513e4e140d1778c4bd/modelator/src/module/tla/mod.rs#L70-L98
Ideally, we should be using Apalache for this, and having the source file on each "entry" would make that process more reliable (i.e. not requiring another hack to check which file the assertion comes from).

In principle yes, that shouldn't be too difficult to add, but you'd have to keep in mind that the "location" field would have to be optional, because these JSON representations are intended to bidirectionally connect Apalache with 3rd-party applications, which should be able to produce JSON files, without necessarily starting from a TLA spec. That said, output generated by Apalache could certainly include source information.
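An optional field of this sort is straightforward to sketch. In this hypothetical serializer, the field names are illustrative only: a third-party tool producing JSON without a TLA+ spec simply omits `source`, while Apalache-generated output can fill it in.

```python
import json

# Hypothetical sketch of an optional "source" field; the field names
# here are illustrative, not the ones fixed by the ADR.
def serialize_expr(kind, type_, source=None):
    entry = {"kind": kind, "type": type_}
    if source is not None:  # present only when the expression came from a spec
        entry["source"] = source
    return json.dumps(entry)

# A tool-generated expression carries no source information:
tool_made = json.loads(serialize_expr("ValEx", "Int"))
# An Apalache-generated one can point back at the spec:
from_spec = json.loads(serialize_expr("ValEx", "Int", source="MySpec.tla"))
```

Deserializers then treat a missing `source` as "no origin known", which keeps the format usable in both directions.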

Can we add a version number to the format, so that user code can tell when it is dealing with an incompatible version?

Yes.

@konnov
Collaborator

konnov commented Mar 25, 2021

This issue is related: #656. The change should be relatively small. I think we should really go for simple formal parameters. This will make both json and our code easier to read.

@konnov
Collaborator

konnov commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

Collaborator

@konnov konnov left a comment


I like it. Several comments to improve the text. Since this ADR will most likely be used by all external tooling, let's add clarifications.


## 3. Serializing typed entities

Serializations of typed entities will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively). For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows:
Collaborator

What about the untyped objects? To make the format regular, I propose to always add the field "type". In case of the untyped encoding, it will be type: "none". It looks stupid, but easier to parse.

Collaborator Author

This is not about typed vs untyped: everything but `TlaModule` is typed, in the sense that it has some type tag (possibly `Untyped`). The language should probably be clearer and say "tagged", not "typed".

Collaborator

Yes. We should also be clear about how we represent untyped expressions.


Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization:
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`.
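The deserialization problem described above can be illustrated with a short sketch. The `kind`/`value` field names mirror the object shape discussed in this ADR; treat the exact names as illustrative.

```python
import json

# A TLA+ integer wider than 2**53 cannot survive a round trip through a
# consumer that reads JSON numbers as IEEE 754 doubles:
big = 2**80 + 1
lossy = float(big)            # what a double-based JSON reader would see
assert int(lossy) != big      # the value is silently changed

# Wrapping the value in an object with an explicit kind keeps
# deserialization trivial: dispatch on "kind", then parse the string.
encoded = json.dumps({"kind": "TlaInt", "value": str(big)})
decoded = json.loads(encoded)
restored = int(decoded["value"])  # exact, regardless of magnitude
```

The string-in-an-object form trades verbosity for an unambiguous decoding step.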
Collaborator

Here you are mentioning `Untyped`. So we have to decide the fate of the `type` field in the untyped case.

Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization:
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`.
For the above reasons, we choose to instead serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape
Collaborator

Can you add a bit more structure to this text? For instance, describe the general rules in Section "1. General rules" and the rules for Int in Section "2. Rules for Int". Then it will be clear that Int is a special case, and the rest follows the general principle.

}
```

## 4. Serialization rules
Collaborator

Can you add examples for all kinds of declarations? This should not be hard. This will help the reader to see the format without running the tool. Perhaps, you would want to use the simplified format for the parameters, which I mentioned in a comment to this PR.

Contributor

That kind of documentation will surely be useful for readers, but I don't think manually generating it for the ADR is a very good use of time.

Once it's implemented, we can just automatically generate it for the docs (and even ensure that the docs stay up to date).

Collaborator

That would be even better

@konnov
Collaborator

konnov commented Mar 25, 2021

In principle yes, that shouldn't be too difficult to add, but you'd have to keep in mind that the "location" field would have to be optional, because these JSON representations are intended to bidirectionally connect Apalache with 3rd-party applications, which should be able to produce JSON files, without necessarily starting from a TLA spec. That said, output generated by Apalache could certainly include source information.

Yeah, an optional location field makes sense. We have to decide whether we want it structured or just text, that is, either `MySpec.tla:10:1-12:10` or `{ filename: "MySpec.tla", from: { line: 10, col: 1 }, to: { line: 12, col: 10 } }`. I guess @vitorenesduarte would prefer the latter?

I thought of adding IDs to the expressions and collecting the source locations in a separate file. This may be a better solution in the long run, if we want to import/export additional data (e.g., annotations). Or perhaps, we could add a separate section in the json file that contains additional data like source info, annotations, etc.
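The two candidate location encodings can be sketched side by side. The field names follow the example in the comment above and are hypothetical until the ADR fixes them.

```python
# Textual form: compact, but consumers must parse it themselves.
def location_as_text(filename, from_line, from_col, to_line, to_col):
    return f"{filename}:{from_line}:{from_col}-{to_line}:{to_col}"

# Structured form: explicit, easier for tools like modelator to consume.
def location_as_object(filename, from_line, from_col, to_line, to_col):
    return {
        "filename": filename,
        "from": {"line": from_line, "col": from_col},
        "to": {"line": to_line, "col": to_col},
    }
```

The structured form also avoids ambiguity when file names themselves contain `:` or `-`.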

@vitorenesduarte
Contributor

Yeah, an optional location field makes sense. We have to decide whether we want it structured or just text, that is, either `MySpec.tla:10:1-12:10` or `{ filename: "MySpec.tla", from: { line: 10, col: 1 }, to: { line: 12, col: 10 } }`. I guess @vitorenesduarte would prefer the latter?

Both would work for modelator but indeed I prefer the latter as it is explicit.

@konnov
Collaborator

konnov commented Mar 25, 2021

For the table of operators, we don't care about the removed ones: #687, #686, #685, #684

@shonfeder
Contributor

shonfeder commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

This also seems to me like something we should do automatically once the serialization is in place, and put in the documentation rather than in the ADR.

@konnov
Collaborator

konnov commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

This also seems to me like something we should do automatically once the serialization is in place, and put in the documentation rather than in the ADR.

My motivation here is that the names we have right now are quite ad hoc. If people start integrating with our code via JSON, it will be much harder to change the names. Of course, I can just go over the list and pick the names I like :)

Collaborator

@konnov konnov left a comment


I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

}
```
if no source information is available (e.g. for expressions generated purely by Apalache).
Serializations generated by Apalache are always guaranteed to contain a `source` field entry.
Collaborator

Well, I would be a bit more careful here. We should prepare the user that they may face "UNKNOWN" in rare cases.

Collaborator Author

I mean, I very explicitly write "or [UNKNOWN] if no source information is available." The next line asserts that a source field exists, not that the contents of said field are not "UNKNOWN".

Collaborator

I am concerned about this sentence:

Serializations generated by Apalache are always guaranteed to contain a source field entry.

Sometimes, we are missing source information. So "always guaranteed" is way too strong.

```

## 5. Version information
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
Collaborator

Suggested change
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
JSON serializations of `TlaModule`s contain the field `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
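On the consumer side, a version field like this enables a simple compatibility check. The policy sketched here (accept any minor version, reject a different major version) is an assumption for illustration, not something the ADR prescribes.

```python
# Hypothetical consumer-side check against the "{major}.{minor}" string.
SUPPORTED_MAJOR = 1  # the encoding major version this consumer understands

def is_compatible(version: str) -> bool:
    major, _minor = version.split(".")
    return int(major) == SUPPORTED_MAJOR
```

A consumer would call `is_compatible(module["version"])` before attempting to deserialize, and fail fast with a clear error otherwise.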

Comment on lines 155 to 156
"kind": _ ,
"value": _
Collaborator

What are the underscores? Is this standard JSON semantics? Shall we use comment-like syntax, such as `/* the kind */` and `/* the value */`?

Collaborator Author

The underscores are, at least to me, very obvious meta-language placeholders. No less formal than /* */, since JSON doesn't support comments anyway.

Collaborator

Well, I surely understand what you mean by _, but we are writing an ADR. So we should not assume too much about the reader. I have not seen the underscore notation outside of OCaml, Scala, and maybe Perl.

docs/src/adr/005adr-json.md (outdated, resolved)

While it might seem more natural to encode the entire `TlaValue` as a JSON primitive, without the added object layer, we would have a much tougher time deserializing.
We would need a) a sensible encoding of `BigInt` values, which are not valid integers, and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the `type` field of the `ValEx` expression is not guaranteed to contain a hint, as it could be `Untyped`
Collaborator

You could cite RFC7159 to justify why we cannot just use unbounded numbers.

Collaborator Author

I don't think it's about the capabilities of JSON numbers; it has more to do with the unnecessary complexity of interpreting those JSON values as the correct Scala type, which we can avoid almost entirely by just having a `kind` field hint.

Collaborator

The reason for not using JSON numbers is that they have a bounded interoperable range, something like `[-(2**53)+1, (2**53)-1]`, which I am sure few people know about.
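The bound can be checked directly in any language whose floats are IEEE 754 doubles; this minimal sketch shows the first integers at which a double-based JSON reader stops distinguishing values.

```python
# RFC 7159 notes that numbers are interoperable only when they fit an
# IEEE 754 double, i.e. integers in [-(2**53) + 1, 2**53 - 1]. Just
# outside that range, distinct integers collapse to the same double:
a, b = 2**53, 2**53 + 1
assert a != b                  # distinct as mathematical integers
assert float(a) == float(b)    # indistinguishable to a double-based parser
```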

args: [jsonOf1, jsonOf1]
"kind": "OperEx",
"type": "Int",
"oper": "(+)",
Collaborator

Suggested change
"oper": "(+)",
"oper": "+",

(+) is a different operator in TLA+.

Collaborator Author

Perhaps, but that is the value of `TlaArithOper.plus.name`.

Collaborator

@konnov konnov Apr 12, 2021


Since it is an ADR and we have #726, let's change it to "PLUS" right away.

extends JsonEncoder[T] {
val kindFieldName: String = "kind"
val typeFieldName: String = "type"
import TlaToJson._

implicit def liftString: String => T = factory.fromStr
Collaborator

Is there a good reason for using implicit conversions here? AFAIK, this may stop working in Scala 3. In general, it's hard to refactor code that is using implicits, as we have seen recently.

Collaborator Author

Yes: being able to write `s -> "abc"` instead of `s -> factory.fromStr("abc")` in about 20 spots. This is far from the kind of implicits used in the (un)typed predefs, since it's declared in the same spot where it is used, so it's more of a convenience shorthand.

{
"kind": "TlaModule",
"name": "MyModule",
"version": "1.0"
Collaborator

I think we should introduce a different root object for the json format. We may want to add meta information to it later. So I would suggest adding another object on top of TlaModule, e.g.:

{
  "name": "ApalacheIR",
  "version": "1.0",
  "modules": [{
    "kind": "TlaModule",
    "name": "MyModule",
    ...
  }]
}

Co-authored-by: Igor Konnov <igor@informal.systems>
@Kukovec
Collaborator Author

Kukovec commented Apr 11, 2021

I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

In what way do you think we are not?

@konnov
Collaborator

konnov commented Apr 12, 2021

I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

In what way do you think we are not?

That was a suggestion to quickly check RFC 7159 :)

@shonfeder
Contributor

I think the CI failure here is due to a bintray outage: https://status.bintray.com/

I'll look into reducing the dependency chain in the meanwhile, but I'll make the mac-os integration tests optional for the time being.

@shonfeder
Contributor

The CI failure should be fixed; we just need to rebase/merge on top of the trunk.

@konnov
Collaborator

konnov commented Apr 19, 2021

Is there much to do in order to get it merged? We might get some integration blocked on this JSON export/import.

@Kukovec Kukovec merged commit 9e12b92 into unstable Apr 19, 2021
@Kukovec Kukovec deleted the jk/jsonADR branch April 19, 2021 13:58

Successfully merging this pull request may close these issues.

[FEATURE] ADR for new JSON format
4 participants