Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value heap walk in lib_machine #2475

Merged
merged 7 commits into from
Aug 19, 2024
Merged

Value heap walk in lib_machine #2475

merged 7 commits into from
Aug 19, 2024

Conversation

bakaq
Copy link
Contributor

@bakaq bakaq commented Aug 12, 2024

This improves the parsing of terms in lib_machine. The current interface can't differentiate strings, variables and atoms, just crashes on composite terms, and is very brittle as it's implemented as parsing a string (that isn't even in canonical form, so operators). With this PR it now does a direct heap walk to parse the term. Here's an test demonstrating what it can represent:

let mut machine = Machine::new_lib();

let query = "X = a(\"asdf\", [42, 3.14, asdf, a, [a,b|_], Z]).".into();

let result = machine.run_query(query);

let expected = Value::Structure(
    // Composite term
    "a".into(),
    vec![
        Value::String("asdf".into()), // String
        Value::List(vec![
            Value::Integer(42.into()),  // Fixnum
            Value::Float(3.14.into()),  // Float
            Value::Atom("asdf".into()), // Atom
            Value::Atom("a".into()),    // Char
            Value::Structure(
                // Partial string
                ".".into(),
                vec![
                    Value::Atom("a".into()),
                    Value::Structure(
                        ".".into(),
                        vec![
                            Value::Atom("b".into()),
                            Value::Var("_A".into()), // Anonymous variable
                        ],
                    ),
                ],
            ),
            Value::Var("Z".into()), // Named variable
        ]),
    ],
);

assert_eq!(
    result,
    Ok(QueryResolution::Matches(vec![QueryMatch::from(
        btreemap! {
            "X" => expected,
        }
    )]))
);

There is also support for rational numbers and arbitrary precision integers. The only thing I didn't do yet is the handling of PStrLoc (which I'm not sure is relevant at this stage).

I #[ignore]d the "integration tests" (why are they in an unit test???) of lib_machine because they assume the old interface where everything is a string and integers are floats. Also, they depend on the current JSON encoding which I'm also planning to improve soon to something like described in #2465 (comment). After that I will then migrate the integration tests to the new format and re-enable them.

Notice that unlike #2472, this is a breaking change (that's why the integration tests needed to be disabled), but I think a very sensible one (not being able to handle composite1 terms is a bit unforgivable). Changing the JSON encoding will also be a breaking change, and if we decide to really take this opportunity to break things I think we could improve the interface a lot.

Footnotes

  1. I just realized I used "composite terms" everywhere but I was trying to say "compound terms". I'm not even sure if there's a meaningful difference, but if there is, I meant "compound terms". I'm not changing all that now.

@bakaq
Copy link
Contributor Author

bakaq commented Aug 12, 2024

I implemented support for rational numbers and arbitrary precision integers. Now the only questions are if it is useful to differentiate distinct anonymous variables (the toplevel does this, so I guess it is), and if it's necessary to handle PStrLoc (I still wasn't able to make a query that makes this interface stumble on that HeapCellValueTag).

@jjtolton
Copy link

jjtolton commented Aug 12, 2024

I was messing around a bit with the shared library and enabled this println! -- you might be way ahead of me on this, but this area of lib_machine appears to be the threshold where residual goals are getting trashed.

    with ScryerMachine("""
    :- use_module(library('$toplevel')).
    :- use_module(library(clpz)).
    """) as wam:
        print(wam.eval("'$toplevel':expand_goal((3 #>= X), user, Term0)."))

results in

Result: Term0 = integer(X)->3>=X;_646920=3,clpz:clpz_geq(_646920,X)
thread '<unnamed>' panicked at src/machine/lib_machine.rs:209:49:
Couldn't convert Houtput to Value: ()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

src/machine/lib_machine.rs:209:49

but works fine in the repl:

?- '$toplevel':expand_goal((3 #>= X), user, Term0).
%@    Term0 = (integer(X)->3>=X;_A=3,clpz:clpz_geq(_A,X)).

will cross reference with @Skgland -- not sure who needs to know this info.

Not sure if it's relevant here or if I should tuck it under a new issue.

@bakaq
Copy link
Contributor Author

bakaq commented Aug 12, 2024

[...] you might be way ahead of me on this, but this area of lib_machine appears to be the threshold where residual goals are getting trashed.

Yes, it's precisely this part of the code that I improved in this PR. Currently in master it parses a string, but I actually walk the heap so it's more accurate. Your example helped me catch a bug (thanks!), but currently in this branch:

use scryer_prolog::machine::Machine;

fn main() {
    let mut machine = Machine::new_lib();

    machine.load_module_string("facts", "
        :- use_module(library('$toplevel')).
        :- use_module(library(clpz)).
    ".into());

    let result = machine.run_query("'$toplevel':expand_goal((3 #>= X), user, Term0).".into());

    println!("{:?}", result);
}

Prints this:

Ok(Matches([QueryMatch { bindings: {"Term0": Structure(";", [Structure("->", [Structure("integer", [Var("X")]), Structure(">=", [Integer(3), Var("X")])]), Structure(",", [Structure("=", [AnonVar, Integer(3)]), Structure(":", [Atom("clpz"), Structure("clpz_geq", [AnonVar, Var("X")])])])])} }]))

Which is an accurate representation of the answer you get in the toplevel (apart from differentiating anonymous variables like the _A in the toplevel, I will work on that tomorrow).

Also, my code doesn't deal with residual goals yet. I think that is very different and harder to do than what I'm doing here.

@jjtolton
Copy link

[...] you might be way ahead of me on this, but this area of lib_machine appears to be the threshold where residual goals are getting trashed.

Yes, it's precisely this part of the code that I improved in this PR. Currently in master it parses a string, but I actually walk the heap so it's more accurate. Your example helped me catch a bug (thanks!), but currently in this branch:

use scryer_prolog::machine::Machine;



fn main() {

    let mut machine = Machine::new_lib();



    machine.load_module_string("facts", "

        :- use_module(library('$toplevel')).

        :- use_module(library(clpz)).

    ".into());



    let result = machine.run_query("'$toplevel':expand_goal((3 #>= X), user, Term0).".into());



    println!("{:?}", result);

}

Prints this:


Ok(Matches([QueryMatch { bindings: {"Term0": Structure(";", [Structure("->", [Structure("integer", [Var("X")]), Structure(">=", [Integer(3), Var("X")])]), Structure(",", [Structure("=", [AnonVar, Integer(3)]), Structure(":", [Atom("clpz"), Structure("clpz_geq", [AnonVar, Var("X")])])])])} }]))

Which is an accurate representation of the answer you get in the toplevel (apart from differentiating anonymous variables like the _A in the toplevel, I will work on that tomorrow).

Also, my code doesn't deal with residual goals yet. I think that is very different and harder to do than what I'm doing here.

I am still trying to determine if residual goals are even a semantic concept in the rust codebase or if they are given life in the prolog part of the codebase. The more I poke around in the toplevel and loader modules I find it very interesting how much access the prolog code has to the actual machine internals. I've never seen that in a language before, it's fascinating! As far as I can tell, the WAM does nothing except for load/parse the input until directed otherwise by the prolog code.

@bakaq
Copy link
Contributor Author

bakaq commented Aug 13, 2024

Residual goals are actually a (library defined) projection of variable attributes from the attributed variables interface (search that page for "Displaying attributes"). In Scryer, attributed variables are definitely implemented in Rust from the code I've seen, but I'm not sure how they actually work and I feel it will be quite complicated to get the residual goals that we see in the toplevel from the attributes even if I manage to find how to get them (hopefully I'm wrong about this).

The more I poke around in the toplevel and loader modules I find it very interesting how much access the prolog code has to the actual machine internals.

It's the Scryer philosophy of doing as much as possible in Prolog. This dogfooding encourages us to improve Prolog itself (both the implementation and programming techniques) instead of relying on crutches for functionality and performance.

@jjtolton
Copy link

jjtolton commented Aug 13, 2024

I haven't figured out how to setup a test for step through debugging where you could run something like

:- use_module(library(dif)).
?- dif(A, B).

but if you could, I would imagine you could step through in the heap printer and see what's going on.

Residual goals are actually a (library defined) projection of variable attributes from the attributed variables interface (search that page for "Displaying attributes"). In Scryer, attributed variables are definitely implemented in Rust from the code I've seen, but I'm not sure how they actually work and I feel it will be quite complicated to get the residual goals that we see in the toplevel from the attributes even if I manage to find how to get them (hopefully I'm wrong about this).

⚡ ⚡ !!! I've seen these sprinkled around -- HeapCellValueTag::AttrVar. I didn't know what they were before, but I'm willing to bet those are the residual goals, e.g.! I imagine stepping through that would provide a pretty solid understanding of how those are serialized.

@bakaq
Copy link
Contributor Author

bakaq commented Aug 13, 2024

[...] I would imagine you could step through in the heap printer and see what's going on.

That's the thing, it's not the responsibility of the heap printer to portray residual goals in the toplevel, this is done manually in Prolog with lower level primitives (for Prolog, for Rust it's higher level) such as copy_term/3. See for example:

write_eqs_and_read_input(B, VarList, AttrVars) :-
gather_query_vars(VarList, OrigVars),
% one layer of depth added for (=/2) functor
'$term_variables_under_max_depth'(OrigVars, 22, Vars0),
'$project_atts':project_attributes(Vars0, AttrVars),
% Need to copy all the visible Vars here so that they appear
% properly in AttrGoals, even the non-attributed. Need to also
% copy all the attributed variables here so that anonymous
% attributed variables also appear properly in AttrGoals.
copy_term([Vars0, AttrVars], [Vars0, AttrVars], AttrGoals),

@bakaq
Copy link
Contributor Author

bakaq commented Aug 15, 2024

There is now support for differentiating anonymous variables! I did it the same way it's done in the toplevel, naming them progressively _A, _B, etc, in each query answer separately, skipping any of those names if there is an actual named variable with that name in the query. I can't be certain this matches the toplevel behavior in all cases, but I think it's good enough. My implementation of that certainly isn't very good and can probably be optimized a lot, and ideally this would use the same system that the toplevel uses (but I think this is implemented in Prolog, so it's kind of hard in this case).

@mthom, this is ready for review!

@bakaq
Copy link
Contributor Author

bakaq commented Aug 17, 2024

Converting to draft for now so that I can get the order of bindings consistent as mentioned in #2465 (comment).

@bakaq bakaq marked this pull request as ready for review August 19, 2024 17:30
@bakaq
Copy link
Contributor Author

bakaq commented Aug 19, 2024

This is now ready for review again @mthom.

@mthom mthom merged commit 5c39e2a into mthom:master Aug 19, 2024
13 checks passed
@bakaq bakaq deleted the value_heap_walk branch August 19, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants