Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

Closed
dcnorris opened this issue Jan 7, 2023 · 20 comments
Closed

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

dcnorris opened this issue Jan 7, 2023 · 20 comments

Comments

@dcnorris
Copy link

dcnorris commented Jan 7, 2023

I have forked ekaf/wordnet-prolog and made a few changes needed to get a validity check to run in Scryer.

Whereas swipl does [wn_valid]. in about 13 seconds, I find that scryer-prolog ("v0.9.1-51-gfd191285") takes 1 hour. All but 24 seconds of this is spent in goal check_keys/0:

check_keys:-
  writeln('Searching for ambiguous sense keys'),
  findall(X, multikey(X), L),
  list_to_set(L,S),
  member(K,S),
  format("Ambiguous key: ~w\n",[K]),
  findall(I, sk(I,_,K), IL),
  list_to_set(IL,IS),
  member(I,IS),
  g(I,G),
  format("    ->~w (~w)\n",[I,G]),
  false.
check_keys:-
  ok,
  nl.
@UWN
Copy link

UWN commented Jan 8, 2023

Would you mind reducing the size of your finding? For example, you could introduce false after some goal and compare that failure slice with both systems. It could be the first findall/3 alone that makes problems. Also, are you sure that you used the same apply/2 definition? If you can, directly rewrite any goal for apply/2 with a fixed list to the corresponding call/N. These apply/2 goals are really so 1970s

@UWN
Copy link

UWN commented Jan 8, 2023

... or just compare ?- multikey(X), false.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

Wherever its 2nd arg was a fixed-length list, I had replaced apply/2 with call/N. (This is advised in SWI's documentation of apply/2.) But I think all the WordNet predicates are of arity known at 'compile time', so it should be straightforward to eliminate the remaining invocations of apply/2 by putting these facts objectively into the program.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

Indeed, time((multikey(X), false)) is already taking quite awhile. Will confirm its final time an hour from now.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

It's all in multikey/1, then:

?- time((multikey(X), false)).
   % CPU time: 3716.558s
   false.

SWI by comparison:

?- time((multikey(X), false)). % Using swipl
% 414,620 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 5328386 Lips)
false.

Here is multikey/1 for quick reference:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

I am sold on failure slicing!

@triska
Copy link
Contributor

triska commented Jan 8, 2023

This seems like an issue with indexing. Specifically, we have:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

where sk/3 consists of hundreds of thousands of plain facts.

In the highlighted goal above, i.e., sk(J, _, K), J is a variable and K is instantiated. So, a Prolog system with good indexing can use K to quickly detect which facts apply in this case.

Scryer Prolog currently only indexes on the first instantiated argument that occurs in any clause of a predicate, so in this case, only the first argument of sk/3 is used for indexing. This leads to quadratic overhead multikey/1, because for each fact, all facts have to be checked.

Such an issue can be worked around "manually", by adding (or generating) auxiliary facts, say sk_3/3, where the arguments are reordered to benefit from the engine's argument indexing if they are instantiated at the time of the call.

Ideally of course, the system would automatically generate suitable indices on demand, depending on which arguments are instantiated at the time of the call, so that the fitting clauses can later be found much more quickly.

Related to #192.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

This is most helpful, thank you @triska! I just now naively tried

sk_3(K,I,J) :- sk(I,J,K).

multikey(K):-
  sk_3(K,I,_),
  sk_3(K,J,_),
  I\=J.

but of course this does not speed anything up because it is the underlying facts that get indexed. Is there any more declarative way to 'request' indexing than writing permuted facts one-by-one?

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

For the time being, maybe the best I can contribute back to the original repo via PR would be increased ISO standard compliance? I'll see what I can do to get this code running with gprolog.

@triska
Copy link
Contributor

triska commented Jan 8, 2023

@dcnorris: Improving ISO compliance for better portability would definitely help already, if only to make such comparisons between different systems easier.

Regarding the auxiliary facts specifically useful for Scryer Prolog and other systems with limited built-in indexing, you may be able to use Scryer Prolog's term_expansion/2 mechanism to automatically generate, for each defined fact, also its corresponding alternative definition with a different argument order.

For example, try:

:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

sk(X, Y, Z) :- sk_1(X, Y, Z).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(100001740,1,'entity%1:03:00::').
sk(100001930,1,'physical_entity%1:03:00::').
sk(100002137,1,'abstraction%1:03:00::').
sk(100002137,2,'abstract_entity%1:03:00::').
sk(100002452,1,'thing%1:03:00::').
sk(100002684,1,'object%1:03:00::').
sk(100002684,2,'physical_object%1:03:00::').
...

In this way, you automatically get, for each sk/3 fact that occurs in the file, the corresponding sk_1/3 and sk_3/3 facts.

@triska
Copy link
Contributor

triska commented Jan 8, 2023

And building on that, you can extend sk/3 to dynamically switch between the different variants, for example (in the case above):

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

In this way, all your code can use sk/3, and when Scryer Prolog's built-in indexing improves, you simply remove these manual workarounds.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

@pmoura
Copy link
Contributor

pmoura commented Jan 8, 2023

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

From the GNU Prolog documentation:

The GNU Prolog compiler (section 4.4) automatically calls expand_term/2 on each Term1 read in. However, in the current release, only DCG transformation are done by the compiler (i.e. term_expansion/2 cannot be used). To use term_expansion/2, it is necessary to call expand_term/2 explicitly.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

Too bad about these gprolog limitations. In fact, using gprolog required increasing 3 resource limits (MAX_ATOM, GLOBALSZ and TRAILSZ) and its computation took 1.4 hours. But still I trust it's a good guide to ISO compliance?

@UWN
Copy link

UWN commented Jan 8, 2023

... gprolog ... But still I trust it's a good guide to ISO compliance?

Syntax is excellent. Currently the best. Weak points are

  • expression evaluation (neither unbounded integers nor consistent overflow checking as required)
  • some non-conforming strict mode (5.1 e). As an example, there is no way to turn off the clpfd extension. In Scryer just don't import clpz.
  • clpfd cannot be considered a valid extension in the sense of 5.5, in particular because of missing overflow checking (or missing instantiation errors, if you take it the other way round). SICStus shows how this is done with representation errors. In Scryer, this is not an issue due to unbounded integers for (finite) domain bounds.

@pmoura
Copy link
Contributor

pmoura commented Jan 8, 2023

Too bad about these gprolog limitations.

You can use Logtalk's portable term-expansion mechanism to easily expand a Prolog file into another Prolog file. See:

https://logtalk3.readthedocs.io/en/latest/libraries/hook_objects.html#outputting-term-expansion-results-to-a-stream-or-a-file

This solution will work with all Logtalk supported backend Prolog systems, including Scryer Prolog and GNU Prolog.

P.S. I maintain a Prolog packs registry for WordNet at https://github.com/LogtalkDotOrg/wordnet It includes a pack manifest file for the project you forked.

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

Cowabunga:

% Appended the following to 'scryer.pl'
:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

Result:

?- [wn_valid].
output/wn_valid.pl-Output-3.1
Semantic Relations: [at,cs,ent,hyp,ins,mm,mp,ms,sim,vgp]
... etc. etc. ...
   % CPU time: 30.251s
   true.
?- 

@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

With the above application of term_expansion/2, I can close this issue.

@dcnorris dcnorris closed this as completed Jan 8, 2023
@dcnorris
Copy link
Author

dcnorris commented Jan 8, 2023

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

From the GNU Prolog documentation:

The GNU Prolog compiler (section 4.4) automatically calls expand_term/2 on each Term1 read in. However, in the current release, only DCG transformation are done by the compiler (i.e. term_expansion/2 cannot be used). To use term_expansion/2, it is necessary to call expand_term/2 explicitly.

@pmoura I can confirm that @triska's solution indeed does not work in 'gprolog.pl' as it does in 'scryer.pl.' But may I expect that after loading all the facts, I could simply asserta/1 a clause that invokes expand_term/1?

@triska
Copy link
Contributor

triska commented Jan 8, 2023

May I suggest to use the GNU Prolog repository to file and discuss GNU Prolog-specific issues:

https://github.com/didoudiaz/gprolog ?

@didoudiaz
Copy link

didoudiaz commented Jun 22, 2023

I have added a support for term_expansion/2 in gprolog which works with the above proposed workaround. However, here is an alternative solution based on findall/setof and keysort (which are generally efficiently implemented). It avoids to duplication of the large sk/3 database (207272 facts) as with the term_expansion/2 solution.

To try it: simply replace in wn_valid.pl the definitons of multikey/1 and checkeys/0 by the following code:

multikey(K, IL) :-
        findall(K-I, sk(I, _, K), L1),
        keysort(L1, L2),
        setof(I, member(K-I, L2), IL),
        length(IL, N),
        N > 1.

check_keys:-
        write('Searching for ambiguous sense keys'), nl,
        multikey(K, IL),
        format('Ambiguous key: ~w\n',[K]),
        member(I,IL),
        g(I,G),
        format('    ->~w (~w)\n',[I,G]),
        false.
check_keys:-
        ok,
        nl.

@dcnorris I had to do several changes in the original code from Eric Kafe (too much SWI-oriented). I also implemented the changes for the `term_expansion/2' solution and for the above solution. I can share them with you if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants