Value and QueryResolution serialization with serde #2493
First of all, this is brilliant, and solves a lot of problems I'm currently dealing with in the shared library.
Thinking about this is still a potentially useful exercise, though. As the popularity of the shared library grows and demands on the data become more intensive, it will probably become suboptimal to transfer data across the shared library interface via string serialization and deserialization, and at that point the C-style API that you discussed initially is going to become more and more useful, along with a zero-copy pathway that allows for a shared memory space between the runtimes. I agree that it may be a bit premature to start thinking about deserialization now, but it might be good to at least start thinking about it.
(force-pushed from ce4e8ed to e227b99)
I just generated the integration tests file from the current JSON output. I think it's fine, but there are so many cases that I couldn't check them all manually. The major change is that there is now a layer of indirection for the bindings.
Two notes, otherwise this looks good.
A small nit that is also fine to just ignore, otherwise this looks good to me.
Ok, so with that I think this is ready to merge @mthom.
This is correct!

```json
{
  "bindings": { "Var": "binding" },
  "residual": [ "clpz:(X in 1..5) would go here" ]
}
```

If you really want to get the residual goals, see the Serialization format section.
Yeah, I thought about that in my original proposal. @Skgland pointed in this direction of using JSON types where possible, which I think is very nice, but it does indeed make it harder to use by not having a uniform representation for all terms. This can be easily changed, and it would be best to do it in this PR, so I would like to hear your opinions. Let's review our options:

### 1: Using JSON types where possible

```
// 10
10

// Use:
// switch typeof(term) {
//   int => ...,
//   obj => if term.has_field(...)
//            ...
//          else if term.has_field(...)
//            ...
//          end,
// }
```

This is what I currently do in this PR, falling back to externally tagged when there isn't a JSON representation, and doing compound terms untagged.
### 2: Externally tagged

```
// 10
{ "integer": 10 }

// a(1,2)
{ "compound": { "functor": "a", "args": [ { "integer": 1 }, { "integer": 2 } ] } }

// Use:
// if term.has_field("integer")
//   ...
// else if term.has_field("compound")
//   ...
// end
```

This is closer to my initial proposal (it had the same fallbacks as above), and it is the serde default for enums.
### 2.1: Untagged compound terms

```
// a(1,2)
{ "functor": "a", "args": [ { "integer": 1 }, { "integer": 2 } ] }
```

This gets rid of a layer of indirection with no downsides (we can use the presence of the fields to disambiguate).

### 3: Adjacently tagged

```
// 10
{ "type": "integer", "value": 10 }

// a(1,2)
{
  "type": "compound",
  "value": {
    "functor": "a",
    "args": [
      { "type": "integer", "val": 1 },
      { "type": "integer", "val": 2 }
    ]
  }
}

// Use:
// switch term["type"] {
//   "integer" => ...,
//   "compound" => ...,
//   ...
// }
```

This is basically what @jjtolton proposed (plus internally tagged compound terms, see below). This is also the JSON equivalent of a clean representation in Prolog, so that's nice.
### 3.1: Internally tagged compound terms

```
// a(1,2)
{
  "type": "compound",
  "functor": "a",
  "args": [
    { "type": "integer", "val": 1 },
    { "type": "integer", "val": 2 }
  ]
}
```

This only really makes sense for compound terms, but it gets rid of a layer of indirection with no downsides, so we should do this if we go with adjacent tagging for the rest.

### Conclusion

I think I would like either full JSON types or full adjacent/internal tagging, and I'm leaning more toward the latter. External tagging seems like the worst of both worlds. Opinions? (A serde sketch of all four tagging styles follows this comment.) See also
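As a concrete reference for the options above, here is a minimal serde sketch; the enum names and field types are illustrative, not the PR's actual types:

```rust
use serde::Serialize;

// Option 1: #[serde(untagged)] emits plain JSON types with no tag at all.
#[derive(Serialize)]
#[serde(untagged)]
enum Untagged {
    Integer(i64),                                      // 10 -> 10
    Compound { functor: String, args: Vec<Untagged> }, // a(1) -> {"functor":"a","args":[1]}
}

// Option 2: externally tagged, serde's default for enums.
#[derive(Serialize)]
#[serde(rename_all = "lowercase")]
enum External {
    Integer(i64),                                      // 10 -> {"integer":10}
    Compound { functor: String, args: Vec<External> },
}

// Option 3: adjacently tagged.
#[derive(Serialize)]
#[serde(tag = "type", content = "value", rename_all = "lowercase")]
enum Adjacent {
    Integer(i64),                                      // 10 -> {"type":"integer","value":10}
    Compound { functor: String, args: Vec<Adjacent> },
}

// Option 3.1: internally tagged; the tag is merged into the variant's own
// object, so it only works for struct-like variants such as Compound.
#[derive(Serialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum Internal {
    Compound { functor: String, args: Vec<Internal> }, // a(1,2) -> {"type":"compound","functor":"a","args":[...]}
}
```

For example, `serde_json::to_string(&Adjacent::Integer(10))` yields `{"type":"integer","value":10}`.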
What about a mixture of 1 and 3.1? Using JSON types where they make sense and having an internal type tag where we need to disambiguate.
Pros:
Cons:
Could you possibly provide an example? 😅 I have heard what you said repeated before, but I haven't figured out how to do it or seen an example, and that would be clear enough for my tests.
Would there ever be a type other than "compound" in internally tagged? 🤔 Because the hashmap by itself would signify compound.

Edit: Oh, after reading about internally tagged, this seems like it would be very prone to breaking client libraries. Any time we change the internal representation, the output would change. I don't see any reason why clients should need to care about the Rust implementation of the Prolog code, unless it was made to align very closely with Prolog terminology.
How would you differentiate an atom from a string? How would you deal with arbitrary-precision integers? Those are places where I think we need an object to represent them.

```prolog
?- call_residue_vars(X in 1..10, ResVars), copy_term(ResVars, ResVars, ResGoals).
   ResVars = [X], ResGoals = [clpz:(X in 1..10)], clpz:(X in 1..10).
```

In this case `ResGoals` contains the residual goals.
This may be a good way to do it; I also like it.
`Value` currently differentiates between

Internally tagged would be

We will probably need to implement the serialization manually anyway instead of deriving it, so how it is internally represented doesn't necessarily match the JSON serialization.
Sorry, I hadn't read the documentation on what "internally tagged" means. I understand now. I personally think that if we do it right, from the consumer perspective, internal and external tagging should appear identical. There should be no implementation-specific terminology related to Scryer internals unless it is synonymous with a Prolog data concept. So internal/external I think both accomplish the same thing, from what I'm seeing, if we do it right.
I think this is a fine compromise. I would be very curious whether the overhead of serialization or the overhead of branching conditionals would be better for performance. One thing to keep in mind is that this branching conditional would have to be called recursively over nested data structures (such as lists), so even if it were a list of all integers, it would have to be processed as

```python
# pseudocode
def process_term(term):
    if isinstance(term, int):
        ...
    if isinstance(term, dict):
        ...
    if isinstance(term, list):
        for t in term:  # recurse into each element
            process_term(t)
```

because the consumer would not necessarily know a priori that the list was composed entirely of integers or functors. For dynamically typed languages, this would not be much of an issue. I'm not sure how annoying this would be for statically typed languages to recursively process mixed-type collections. So I think two good rules should be:
I would expect that they would need to handle mixed lists anyway, as neither Prolog nor JSON enforces the list content to be homogeneous.
Opened #2505 with a work-in-progress deserialize impl based on the current state of this PR.
Given that this is for interop with other languages, simplifying common Prolog constructs into a form that is "pre-assembled" to match common language constructs should make it easier to work with compared to a canonical form.
Agreed. I think a pre-assembled literal list is much better than `{"type": "functor", "functor": ".", "args": [{"type": "variable", "variable": "_A"}, {"type": "functor", "functor": ".", "args": [...` and also more ideal than `{"type": "internal.foo.something", "functor": "f", "args": [...]}`
With "type" |
I would venture that anything I say related to Prolog is closer to babbling than meaningful speech at this point. 😁 Hopefully the grownups can be in charge of proper terminology!
In order to make sure that we avoid any pitfalls with JSON: numbers aren't well defined in the JSON specification, and basically every implementation that consumes JSON can interpret them as it wishes. They are often just double-precision floats (without `NaN` or `Infinity`).

Basically, JSON numbers are safe to use only when you don't care whether a value is an integer or a float.
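To make the pitfall concrete, here is a small Rust illustration (not from the PR) of why integers above 2^53 can't survive a round-trip through a double, which is how many JSON consumers store every number:

```rust
fn main() {
    // 2^53 + 1 is the first integer an IEEE 754 double cannot represent.
    let n: i64 = 9_007_199_254_740_993;
    let through_double = n as f64; // what a double-only JSON parser would keep
    assert_ne!(through_double as i64, n); // rounds to 9_007_199_254_740_992
    println!("{} -> {}", n, through_double as i64);
}
```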
Maybe this means that we should always represent an integer as a string.
Prolog also silently switches between integers and floats, which makes this distinction kind of artificial. I don't know what the best way out of this is. Maybe we can ignore that distinction, or use some type hints.
This is effectively the way Python handles large floats. Clojure will often handle floats as fractions unless specifically coerced, so while unconventional, we could return floats as integer ratios rather than floating point. Ironically, the more we explore and refine the JSON API, the more I am convinced that @bakaq's original suggestion of a C API makes more sense 🤣 On the other hand, the exploration of the serialization is reaping many benefits. I think the tagging taxonomy by itself will be a very valuable educational tool for folks curious about learning Prolog.
On the other hand, if we represent anything as a string besides a string, we have to bag and tag it (wrap it in an object with a `"type"` field).
Yeah, that would mean integers would always be tagged, even if they are small:

```
// 10
{ "type": "integer", "val": "10" }

// "10"
"10"
```

This has the benefit that the integer representation would be uniform, so there would be no need for the complicated "integer if small, string if big" fallback (this would also apply to rational components), but it also means that we always shift the burden of parsing the integer to the user, which is a bit unfortunate.
I would assume the common case to be small integers that fit into a language's normal integer types (for Rust, i8/u8 up to i128/u128), which would be easier if they were encoded as an int.
I suppose it's worth asking what the intended purpose is -- is this for performance, adoption, or something else? From a performance perspective, JSON is probably the wrong approach anyway; an equivalent C API would probably be superior. From an adoption standpoint, I feel like a consistent API would probably make the most sense, and objects have the added benefit of being future-extensible. Also, is it worth making the JSON API congruent with the C API? Now, my assumption is that in a C API, when you do something like,
If there is any type coercion required, we could always wrap something that was

Anyway, I don't feel particularly strongly about it, except to say that I think for the serialization we should probably be leaning toward adoption over performance, and that the JSON API should probably be as similar as we can make it to the eventual C API. I'm fine with a combination of primitive types and tagged types where required for disambiguation.
I would rather expect the return type to be a tagged union or a pointer to a tagged union, so there is no need for an extra method to get the type, i.e.:

```c
struct Value {
    enum {
        Int,
        Float,
        String,
        Compound,
        /* ... */
    } tag;
    union {
        int64_t i;
        double d;
        char *s;
        struct {
            char *functor;
            size_t argc;
            struct Value *argv;
        } c;
        /* ... */
    } value;
};
```

or the Rust equivalent

```rust
use core::ffi::c_char;

#[repr(C, u8)]
enum Value {
    Int(i64),
    Float(f64),
    String(*const c_char),
    Compound { functor: *const c_char, argc: usize, argv: *const Value },
    // ...
}
```
There are actually some reasons not to do this. We may want to change the internal representation of
I am open to it, but without some kind of type-dispatch integer I would be unsure how to write client code to traverse or serialize the answer or a leaf answer. There might be a technique I haven't seen before, though.
Mh, I see. Part of that, I think, would already be solved by passing a pointer to the struct rather than the struct itself. But if we define a complete type in C, the C code could depend on the size and alignment of the struct and store it directly on the stack or inside other types. To prevent that, we need an incomplete type in C, but we can't define a struct with a known tag and union field that is also incomplete. I.e., we don't have a `#[non_exhaustive]` equivalent in C, especially because we would need to know the offsets from the start of the struct to two fields, and they can't both be at the start of the struct. So the offset of at least one of the fields depends on either the size or the alignment of the union type, which might change if we add more variants. Yeah, looks like we need to go with functions for this to work as a dynamic library (see the sketches below).
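A minimal Rust-side sketch of that function-based approach (all names here are hypothetical, not the actual FFI): the C header would only forward-declare `struct scryer_term;` as an incomplete type, so its layout never becomes part of the ABI, and all access goes through accessor functions:

```rust
// Layout deliberately private to Rust: not #[repr(C)], so it can change
// between releases without breaking C callers, which only hold pointers.
pub enum Term {
    Int(i64),
    Float(f64),
}

// The kind tag is a stable, C-compatible enum.
#[repr(C)]
pub enum ScryerTermKind {
    Int,
    Float,
}

#[no_mangle]
pub extern "C" fn scryer_get_term_kind(t: *const Term) -> ScryerTermKind {
    match unsafe { &*t } {
        Term::Int(_) => ScryerTermKind::Int,
        Term::Float(_) => ScryerTermKind::Float,
    }
}

#[no_mangle]
pub extern "C" fn scryer_term_as_int(t: *const Term) -> i64 {
    match unsafe { &*t } {
        Term::Int(n) => *n,
        _ => 0, // a real API would report an error instead
    }
}
```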
I think the idea is something like

```c
SCRYER_Machine *m = scryer_machine_new();
SCRYER_QueryState *q = scryer_query(m, "A = 'test'.");
SCRYER_QueryAnswer *a = scryer_next_answer(q);
switch (scryer_answer_kind(a)) {
    case SCRYER_ANSWER_TRUE:
        ...
    case SCRYER_ANSWER_FALSE:
        ...
    case SCRYER_ANSWER_MATCHES: {
        SCRYER_AnswerMatches *am = scryer_get_answer_matches(a);
        SCRYER_AnswerBindings *b = scryer_get_answer_matches_bindings(am);
        size_t bindings_count = scryer_get_answer_matches_bindings_count(am);
        for (size_t i = 0; i < bindings_count; i++) {
            SCRYER_Term *t = scryer_get_bound_term(b, i);
            switch (scryer_get_term_kind(t)) {
                case SCRYER_TERMKIND_INT:
                    ...
                case SCRYER_TERMKIND_FLOAT:
                    ...
                case SCRYER_TERMKIND_RATIONAL:
                    ...
                case SCRYER_TERMKIND_STRING:
                    ...
                case SCRYER_TERMKIND_ATOM: {
                    char *name = scryer_get_atom_term_string(t);
                    ...
                }
            }
        }
        ...
    }
}
```
Ok, perfect. Yes, I think that's very reasonable.
Continued in #2707.
This PR implements `serde::Serialize` for `Value`, `QueryMatch`, `QueryResolution` and `QueryResolutionLine`, giving them all access to a good JSON serialization. This is built on top of the improved representation and parsing of terms of #2475. For example, the result of calling `Machine::run_query()` on `A = a(1,2.5), B = [asdf, "fdsa", C, _, true, false] ; Rat is 3 rdiv 7.` can be serialized to JSON.

I chose to represent the bindings like this because it's much more convenient for end users than having to dig into lists of `{ "functor": "=", "args": [...] }`, but there are also `From` implementations from all of these types to `Value`, so you could interpret the whole result of a query as a Prolog term if you really want. Also, notice that there is a level of indirection for the bindings; this is so that we could easily extend this to represent things like residual goals in a future API.

Some details: integers that don't fit in an `i64` (or in a `u64` in the case of the denominator of a rational) are represented as strings, so the query `Int is 10^100, Rat is 10^100 rdiv 7` serializes those values as strings.

This doesn't provide implementations for `Deserialize`. They are much harder to implement, and I don't think they would be very useful with the interface we have now.