Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial string leak #1714

Open
UWN opened this issue Jan 26, 2023 · 11 comments
Open

Partial string leak #1714

UWN opened this issue Jan 26, 2023 · 11 comments

Comments

@UWN
Copy link

UWN commented Jan 26, 2023

p([a,b,c,d|Nihil]) :-
    Nihil = [].

q([A,B,C,D|Nihil]) :-
    Nihil = [],
    A = a,
    B = b,
    C = c,
    D = d.

loop(Xs) :-
    l(Xs),
    loop(Xs).

l([]).
l([_|Es]) :-
    l(Es).

?- q(L), loop(L).
   loops.  % good, as expected! I'm watching this with top
?- p(L), loop(L).
memory allocation of 4194304 bytes failed

Aborted (core dumped)
~/scryer$ ulimit -v
200000

Simply going through the term (i.e. partial list) should not require any extra memory.

*/

@UWN
Copy link
Author

UWN commented Mar 1, 2023

Is there hope that this gets fixed soon? That is, prior to GC?

@UWN
Copy link
Author

UWN commented Jul 1, 2023

Current minimal limit to run this:

ulimit -v 250000

@UWN
Copy link
Author

UWN commented Dec 22, 2023

Please note that just implementing this would be also a big step.

@bakaq
Copy link
Contributor

bakaq commented Dec 24, 2023

Is there hope that this gets fixed soon? That is, prior to GC?

As I understand reading through the code, this will only happen with GC, because partial strings are allocated in the atom table. One thing that could be done is to allocate partial strings in another place. An "partial string table" would have the same problem, but maybe they can be allocated in the heap (and therefore avoid this problem completely)? I'm not sure, can you think of any cases where a partial string can't be allocated in the heap?

@bakaq
Copy link
Contributor

bakaq commented Dec 24, 2023

One place that comes to mind is when using something like bb_put/2, like I did in the implementation of phrase_from_stream/2. In that case the partial string should not be backtracked, so it needs to be stored somewhere "permanent", like the atom table. If this is the only case (or one of the few cases) where this happens, we could allocate partial strings in the heap by default and move them to the atom table when needed for something like this. This would solve the problem in this issue as far as I understand, but phrase_from_stream/2 would still leak until GC is implemented.

@bakaq
Copy link
Contributor

bakaq commented Dec 24, 2023

I think, however, that the copying from heap to the atom table could be very expensive, and that would make things such as phrase_from_stream/2 artificially bad. In the long term I think having partial strings be garbage collected seems like the best of all worlds, so maybe it's not worth the effort to solve this for the short term. Or maybe this indicates that this "workaround" should be implemented in a way that is easy to undo when we have GC.

@UWN
Copy link
Author

UWN commented Dec 24, 2023

The atom table is currently used. But it should not be used. See #24

@bakaq
Copy link
Contributor

bakaq commented Dec 28, 2023

From what I understood of #24, the partial strings are supposed to be allocated in the heap then? But what about my concerns about cases where the string shouldn't be backtracked? Am I misunderstanding how the heap works?

@bakaq
Copy link
Contributor

bakaq commented Dec 28, 2023

Ok, going through the code it seems I'm not misunderstanding how the heap works. I think I now know what a Ball does though, which is doing that "copying to somewhere more permanent" thing so that stuff like bb_put/2 can work (please correct me if I'm wrong, because Ball is the most confusing type in this whole codebase for me).

@mthom
Copy link
Owner

mthom commented Dec 29, 2023

yes, that's all Ball is, a container for a heap term whose references to other heap cells have to be aligned to its absolute position in the heap.

@UWN
Copy link
Author

UWN commented Dec 29, 2023

The idea is that partial strings are a compact replacement for list of characters. That is, a program with partial strings should at best compress terms by a factor of 16 (or 24, would '.'/2 be represented by a structure); and in the worst case (a partial string of length one) they use as much space as a term without that optimization. But there is never a case where that representation uses more space (1). That is, it is always better to replace a term '.'(Nonnull_character, Any) by a partial string. With this compact representation it makes sense to represent also the character representation of atoms with partial strings (as long as they only contain nonnull characters).


1 There is one case where the representation may be less efficient: A dynamically allocated partial string that ends within the next word. That is the 0 is the only character in the next word. This can only happen if the partial string was initially longer, and the prefix up to the last character has become garbage. Thus GC should take care of this in the same manner as GC should that care of an unused list of characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants