
Json ADR #670

Merged
merged 13 commits into unstable from jk/jsonADR
Apr 19, 2021

Conversation

Kukovec
Collaborator

@Kukovec Kukovec commented Mar 22, 2021

Closes #610

@Kukovec Kukovec requested a review from konnov March 22, 2021 14:30
Contributor

@shonfeder shonfeder left a comment


LGTM

@shonfeder
Contributor

You might also want to add an entry for this in the mdBook's `SUMMARY.md`, as this is probably of general interest.

@vitorenesduarte
Contributor

Is it possible to tag each "entry" with its source file/line?

In modelator, we currently do some ad-hoc parsing to fetch the names of the TLA+ assertions representing tests: https://github.com/informalsystems/modelator/blob/99f656fa8b3cf46a2aa0b6513e4e140d1778c4bd/modelator/src/module/tla/mod.rs#L70-L98
Ideally, we should be using Apalache for this, and having the source file on each "entry" would make that process more reliable (i.e. not requiring another hack to check which file the assertion comes from).

@konnov
Collaborator

konnov commented Mar 23, 2021

Can we add a version number to the format, so that user code can tell when it is dealing with an incompatible version?

@Kukovec
Collaborator Author

Kukovec commented Mar 23, 2021

Is it possible to tag each "entry" with its source file/line?

In modelator, we currently do some ad-hoc parsing to fetch the names of the TLA+ assertions representing tests: https://github.com/informalsystems/modelator/blob/99f656fa8b3cf46a2aa0b6513e4e140d1778c4bd/modelator/src/module/tla/mod.rs#L70-L98
Ideally, we should be using Apalache for this, and having the source file on each "entry" would make that process more reliable (i.e. not requiring another hack to check which file the assertion comes from).

In principle yes, that shouldn't be too difficult to add, but you'd have to keep in mind that the "location" field would have to be optional, because these JSON representations are intended to bidirectionally connect Apalache with 3rd-party applications, which should be able to produce JSON files, without necessarily starting from a TLA spec. That said, output generated by Apalache could certainly include source information.
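An optional field of this sort is straightforward to sketch. In this hypothetical serializer, the field names are illustrative only: a third-party tool producing JSON without a TLA+ spec simply omits `source`, while Apalache-generated output can fill it in.

```python
import json

# Hypothetical sketch of an optional "source" field; the field names
# here are illustrative, not the ones fixed by the ADR.
def serialize_expr(kind, type_, source=None):
    entry = {"kind": kind, "type": type_}
    if source is not None:  # present only when the expression came from a spec
        entry["source"] = source
    return json.dumps(entry)

# A tool-generated expression carries no source information:
tool_made = json.loads(serialize_expr("ValEx", "Int"))
# An Apalache-generated one can point back at the spec:
from_spec = json.loads(serialize_expr("ValEx", "Int", source="MySpec.tla"))
```

Deserializers then treat a missing `source` as "no origin known", which keeps the format usable in both directions.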

Can we add a version number to the format, so that user code can tell when it is dealing with an incompatible version?

Yes.

@konnov
Collaborator

konnov commented Mar 25, 2021

This issue is related: #656. The change should be relatively small. I think we should really go for simple formal parameters. This will make both json and our code easier to read.

@konnov
Collaborator

konnov commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

Collaborator

@konnov konnov left a comment


I like it. Several comments to improve the text. Since this ADR will most likely be used by all external tooling, let's add clarifications.


## 3. Serializing typed entities

Serializations of typed entities will have an additional field named `type`, containing the type of the expression (see [ADR-002][], [ADR-004][] for a description of our type system and the syntax for types-as-string-annotations respectively). For example, the integer literal `1` is represented by a `ValEx`, which has type `Int` and is serialized as follows:
Collaborator

What about the untyped objects? To make the format regular, I propose to always add the field "type". In case of the untyped encoding, it will be type: "none". It looks stupid, but easier to parse.

Collaborator Author

This is not about typed vs untyped: everything but `TlaModule` is typed, in the sense that it has some type tag (possibly `Untyped`). The language should probably be clearer and say "tagged", not "typed".

Collaborator

Yes. We should also be clear about how we represent untyped expressions.


Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization:
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`.
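The deserialization problem described above can be illustrated with a short sketch. The `kind`/`value` field names mirror the object shape discussed in this ADR; treat the exact names as illustrative.

```python
import json

# A TLA+ integer wider than 2**53 cannot survive a round trip through a
# consumer that reads JSON numbers as IEEE 754 doubles:
big = 2**80 + 1
lossy = float(big)            # what a double-based JSON reader would see
assert int(lossy) != big      # the value is silently changed

# Wrapping the value in an object with an explicit kind keeps
# deserialization trivial: dispatch on "kind", then parse the string.
encoded = json.dumps({"kind": "TlaInt", "value": str(big)})
decoded = json.loads(encoded)
restored = int(decoded["value"])  # exact, regardless of magnitude
```

The string-in-an-object form trades verbosity for an unambiguous decoding step.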
Collaborator

Here you are mentioning `Untyped`. So we have to decide the fate of the `type` field in the untyped case.

Note that in the above example, the outer `value` field contains a JSON serialization of `TlaInt(1)`, and not just the JSON literal `1`. This is because of deserialization:
Apalache encodes its TLA+ integers as `BigInt`, which means that it permits values, for which `.isValidInt` does not hold. If we chose to encode values directly as JSON numbers, we would need a) a sensible encoding of `BigInt` values, which are not valid integers and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
Alternatively, we could encode all values as strings, which would be similarly indistinguishable when deserializing. Importantly, the `type` field is not guaranteed to contain a hint, as it could be `Untyped`.
For the above reasons, we choose to instead serialize `TlaValue` as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape
Collaborator

Can you add a bit more structure to this text? For instance, describe the general rules in Section "1. General rules" and the rules for Int in Section "2. Rules for Int". Then it will be clear that Int is a special case, and the rest follows the general principle.

}
```

## 4. Serialization rules
Collaborator

Can you add examples for all kinds of declarations? This should not be hard. This will help the reader to see the format without running the tool. Perhaps, you would want to use the simplified format for the parameters, which I mentioned in a comment to this PR.

Contributor

That kind of documentation will surely be useful for readers, but I don't think manually generating it for the ADR is a very good use of time.

Once it's implemented, we can just automatically generate it for the docs (and even ensure that the docs stay up to date).

Collaborator

That would be even better

@konnov
Collaborator

konnov commented Mar 25, 2021

In principle yes, that shouldn't be too difficult to add, but you'd have to keep in mind that the "location" field would have to be optional, because these JSON representations are intended to bidirectionally connect Apalache with 3rd-party applications, which should be able to produce JSON files, without necessarily starting from a TLA spec. That said, output generated by Apalache could certainly include source information.

Yeah, an optional location field makes sense. We have to decide whether we want it structured or just text, that is, either `MySpec.tla:10:1-12:10` or `{ filename: "MySpec.tla", from: { line: 10, col: 1 }, to: { line: 12, col: 10 } }`. I guess @vitorenesduarte would prefer the latter?

I thought of adding IDs to the expressions and collecting the source locations in a separate file. This may be a better solution in the long run, if we want to import/export additional data (e.g., annotations). Or perhaps, we could add a separate section in the json file that contains additional data like source info, annotations, etc.
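The two candidate location encodings can be sketched side by side. The field names follow the example in the comment above and are hypothetical until the ADR fixes them.

```python
# Textual form: compact, but consumers must parse it themselves.
def location_as_text(filename, from_line, from_col, to_line, to_col):
    return f"{filename}:{from_line}:{from_col}-{to_line}:{to_col}"

# Structured form: explicit, easier for tools like modelator to consume.
def location_as_object(filename, from_line, from_col, to_line, to_col):
    return {
        "filename": filename,
        "from": {"line": from_line, "col": from_col},
        "to": {"line": to_line, "col": to_col},
    }
```

The structured form also avoids ambiguity when file names themselves contain `:` or `-`.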

@vitorenesduarte
Contributor

Yeah, an optional location field makes sense. We have to decide whether we want it structured or just text, that is, either `MySpec.tla:10:1-12:10` or `{ filename: "MySpec.tla", from: { line: 10, col: 1 }, to: { line: 12, col: 10 } }`. I guess @vitorenesduarte would prefer the latter?

Both would work for modelator but indeed I prefer the latter as it is explicit.

@konnov
Collaborator

konnov commented Mar 25, 2021

For the table of operators, we don't care about the removed ones: #687, #686, #685, #684

@shonfeder
Contributor

shonfeder commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

This also seems to me like something we should do automatically once the serialization is in place, and put in the documentation rather than in the ADR.

@konnov
Collaborator

konnov commented Mar 25, 2021

This one is also related: #634. I wonder whether we should include the list of operator names in this ADR? Otherwise, the readers will have to figure out the operator names themselves.

This also seems to me like something we should do automatically once the serialization is in place, and put in the documentation rather than in the ADR.

My motivation here is that the names we have right now are quite ad hoc. If people start integrating with our code via JSON, it will be much harder to change the names. Of course, I can just go over the list and pick the names I like :)

Collaborator

@konnov konnov left a comment


I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

}
```
if no source information is available (e.g. for expressions generated purely by Apalache).
Serializations generated by Apalache are always guaranteed to contain a `source` field entry.
Collaborator

Well, I would be a bit more careful here. We should prepare the user that they may face "UNKNOWN" in rare cases.

Collaborator Author

I mean, I very explicitly write "or [UNKNOWN] if no source information is available." The next line asserts that a source field exists, not that the contents of said field are not "UNKNOWN".

Collaborator

I am concerned about this sentence:

Serializations generated by Apalache are always guaranteed to contain a source field entry.

Sometimes, we are missing source information. So "always guaranteed" is way too strong.

```

## 5. Version information
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
Collaborator

Suggested change
JSON serializations of `TlaModule`s contain a filed named `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
JSON serializations of `TlaModule`s contain the field `version`, holding a string representation of the current JSON encoding version, shaped `{major}.{minor}`.
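On the consumer side, a version field like this enables a simple compatibility check. The policy sketched here (accept any minor version, reject a different major version) is an assumption for illustration, not something the ADR prescribes.

```python
# Hypothetical consumer-side check against the "{major}.{minor}" string.
SUPPORTED_MAJOR = 1  # the encoding major version this consumer understands

def is_compatible(version: str) -> bool:
    major, _minor = version.split(".")
    return int(major) == SUPPORTED_MAJOR
```

A consumer would call `is_compatible(module["version"])` before attempting to deserialize, and fail fast with a clear error otherwise.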

Comment on lines 155 to 156
"kind": _ ,
"value": _
Collaborator

What are the underscores? Is this standard JSON semantics? Shall we use comment-like syntax, such as `/* the kind */` and `/* the value */`?

Collaborator Author

The underscores are, at least to me, very obvious meta-language placeholders. No less formal than /* */, since JSON doesn't support comments anyway.

Collaborator

Well, I surely understand what you mean by _, but we are writing an ADR. So we should not assume too much about the reader. I have not seen the underscore notation outside of OCaml, Scala, and maybe Perl.

docs/src/adr/005adr-json.md (outdated, resolved)

While it might seem more natural to encode the entire `TlaValue` as a JSON primitive, without the added object layer, we would have a much tougher time deserializing.
We would need a) a sensible encoding of `BigInt` values, which are not valid integers, and b) a way to distinguish both variants of `BigInt`, as well as decimals, when deserializing (since JSON is not typed).
We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the `type` field of the `ValEx` expression is not guaranteed to contain a hint, as it could be `Untyped`
Collaborator

You could cite RFC7159 to justify why we cannot just use unbounded numbers.

Collaborator Author

I don't think it's about the capabilities of JSON numbers; it has more to do with the unnecessary complexity of interpreting those JSON values as the correct Scala type, which we can avoid almost entirely by just having a `kind` field hint.

Collaborator

The reason for not using JSON numbers is that they have a bounded interoperable range, something like `[-(2**53)+1, (2**53)-1]`, which I am sure few people know about.
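The bound can be checked directly in any language whose floats are IEEE 754 doubles; this minimal sketch shows the first integers at which a double-based JSON reader stops distinguishing values.

```python
# RFC 7159 notes that numbers are interoperable only when they fit an
# IEEE 754 double, i.e. integers in [-(2**53) + 1, 2**53 - 1]. Just
# outside that range, distinct integers collapse to the same double:
a, b = 2**53, 2**53 + 1
assert a != b                  # distinct as mathematical integers
assert float(a) == float(b)    # indistinguishable to a double-based parser
```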

args: [jsonOf1, jsonOf1]
"kind": "OperEx",
"type": "Int",
"oper": "(+)",
Collaborator

Suggested change
"oper": "(+)",
"oper": "+",

(+) is a different operator in TLA+.

Collaborator Author

Perhaps, but that is the value of `TlaArithOper.plus.name`.

Collaborator

@konnov konnov Apr 12, 2021


Since it is an ADR and we have #726, let's change it to "PLUS" right away.

extends JsonEncoder[T] {
val kindFieldName: String = "kind"
val typeFieldName: String = "type"
import TlaToJson._

implicit def liftString: String => T = factory.fromStr
Collaborator

Is there a good reason for using implicit conversions here? AFAIK, this may stop working in Scala 3. In general, it's hard to refactor code that is using implicits, as we have seen recently.

Collaborator Author

Yes: being able to write `s -> "abc"` instead of `s -> factory.fromStr("abc")` in about 20 spots. This is far from the kind of implicits used in the (un)typed predefs, since it's declared in the same spot where it is used, so it's more of a convenience shorthand.

{
"kind": "TlaModule",
"name": "MyModule",
"version": "1.0"
Collaborator

I think we should introduce a different root object for the json format. We may want to add meta information to it later. So I would suggest adding another object on top of TlaModule, e.g.:

{
  "name": "ApalacheIR",
  "version": "1.0",
  "modules": [{
    "kind": "TlaModule",
    "name": "MyModule",
    ...
  }]
}

Co-authored-by: Igor Konnov <igor@informal.systems>
@Kukovec
Collaborator Author

Kukovec commented Apr 11, 2021

I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

In what way do you think we are not?

@konnov
Collaborator

konnov commented Apr 12, 2021

I like it a lot, but have a few concerns with the new version. By the way, are we consistent with https://tools.ietf.org/html/rfc7159 ?

In what way do you think we are not?

That was a suggestion to quickly check RFC 7159 :)

@shonfeder
Contributor

I think the CI failure here is due to a bintray outage: https://status.bintray.com/

I'll look into reducing the dependency chain in the meanwhile, but I'll make the mac-os integration tests optional for the time being.

@shonfeder
Contributor

The CI failure should be fixed; we just need to rebase/merge on top of the trunk.

@konnov
Collaborator

konnov commented Apr 19, 2021

Is there much to do in order to get it merged? We might get some integration blocked on this JSON export/import.

@Kukovec Kukovec merged commit 9e12b92 into unstable Apr 19, 2021
@Kukovec Kukovec deleted the jk/jsonADR branch April 19, 2021 13:58

Successfully merging this pull request may close these issues.

[FEATURE] ADR for new JSON format
4 participants