Partial strings: Padding (almost) never needed #385
Comments
Do you mean that, for example, if we have the string

+--------+
|a0000000|
+--------+
|  []/0  |
+--------+

we can "align" it so that only the terminating

+--------+
|......a0|
+--------+
|  []/0  |
+--------+

So, this would free the first 6 bytes in the cell (and at most it frees 7 bytes for a string). Is this what you mean? What would be the advantage of this, can we do anything with these cells? It seems we could store another very short (complete) string there, is this the intention?
It is rather that processing such a more compact string becomes a tiny bit more efficient.
As I understand it, as soon as
Just space reduction (and thus cachelines and the like). As a worst case example consider
... or as another worst case consider
So we have:

+--------+
|.......a|
+--------+
|00000000|
+--------+
|   Xs   |
+--------+

Is this right? One of my questions is still: Why do we need the padding at all in this case, and in fact in any case? I.e., what about (

+--------+
|.......a|
+--------+
|0.......|
+--------+
|   Xs   |
+--------+

Is there any reason to even care about the bytes that follow after

My other question concerns the

+--------+
|abcdefg0|
+--------+
|   Xs   |
+--------+

An unfortunate representation would be for example:

+--------+
|....abcd|
+--------+
|efg00000|
+--------+
|   Xs   |
+--------+

However, what exactly is Scryer Prolog expected to do now about strings to avoid padding? The above example seems to indicate that we should start with strings "early" in a cell, so that they do not "reach into" yet another cell at the end. On the other hand, we can also avoid padding by "pushing" strings to the right, so that for example instead of representing

+--------+
|abc00000|
+--------+
|  []/0  |
+--------+

we push it to the right and store it as:

+--------+
|....abc0|
+--------+
|  []/0  |
+--------+

But is there any advantage in preventing the padding here? It seems we cannot do anything with the "available" bytes in front of the string, so it seems the best strategy is to always start a string at the "start" of a cell, because this will make the most use of the available space in the cell. Is there an example where another strategy yields fewer occupied cells in total?
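To make the cell counting in the comment above concrete, here is a minimal Rust sketch. It assumes 8-byte cells as drawn in the diagrams and counts only the cells that hold string bytes (not the tail cell such as `[]/0` or `Xs`). The names `CELL_BYTES` and `cells_needed` are hypothetical and not taken from the Scryer Prolog sources.

```rust
/// Size of one heap cell in bytes (assumption: 8, as in the diagrams).
const CELL_BYTES: usize = 8;

/// Number of cells occupied by a string of `len` bytes plus its
/// terminating 0 byte, when the first string byte is written at
/// byte position `offset` within the first cell.
fn cells_needed(offset: usize, len: usize) -> usize {
    assert!(offset < CELL_BYTES, "offset must lie inside the first cell");
    // Bytes spanned: skipped prefix + string bytes + one terminating 0.
    let spanned = offset + len + 1;
    // Round up to whole cells.
    (spanned + CELL_BYTES - 1) / CELL_BYTES
}
```

Under these assumptions, `cells_needed(0, 7)` is 1 for "abcdefg" started at the beginning of a cell, while `cells_needed(4, 7)` is 2 for the "unfortunate" layout. For "abc", both `cells_needed(0, 3)` and `cells_needed(4, 3)` are 1. Since the result never decreases as `offset` grows, starting at offset 0 is never worse in this model.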
Attempting to answer my own question, it seems that yes, we can indeed do something with the available space at the start, but indeed only if we do not "pad" at all, but stop after encountering the first

+--------+
|xyz0abc0|
+--------+
|  []/0  |
+--------+

I somehow doubt this is the intention here though?
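The packed cell in the comment above can be read back if a reader stops at the first 0 byte instead of assuming padding up to the cell boundary. A minimal Rust sketch, assuming strings are addressed by byte offset; `read_until_nul` is a hypothetical helper, not a Scryer Prolog function:

```rust
/// Returns the string bytes starting at `start`, up to but not including
/// the first 0 byte. Returns `None` if `start` is out of range or no
/// terminating 0 byte follows.
fn read_until_nul(bytes: &[u8], start: usize) -> Option<&[u8]> {
    // Slice from `start` to the end; `None` if `start` is past the end.
    let rest = bytes.get(start..)?;
    // Position of the first terminator relative to `start`.
    let end = rest.iter().position(|&b| b == 0)?;
    Some(&rest[..end])
}
```

With the cell `*b"xyz\0abc\0"`, reading at offset 0 yields `xyz` and reading at offset 4 yields `abc`, so one cell carries two complete short strings. This sketch says nothing about how GC would track such shared cells.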
I never thought about that. Think of it:
So two cells now represent what would be otherwise 8. Factor of 4! Nice, but GC might become much more difficult.
When you start writing a partial string without knowing the actual length. For longer strings the padding might be a small overhead compared to computing the length first. (deleted text)
I summarize the issue as I understand it:
The reason for (2) is subtle and was so far not explained in this issue. To see the advantage, suppose we store the string "abcdefgh" as:

+--------+
|abcdefgh|
+--------+
|0.......|
+--------+
|  []/0  |
+--------+

It is better to store it instead "pushed to the right", as:

+--------+
|.......a|
+--------+
|bcdefgh0|
+--------+
|  []/0  |
+--------+

This also takes 3 cells. But now we have an advantage: If the string prefix

At least that's my understanding of this issue. Is this the intended advantage?
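The "pushed to the right" layout from the comment above can be sketched in Rust as follows. This assumes 8-byte cells and a string whose length is known up front. `CELL_BYTES`, `FILL` and `push_right` are hypothetical names, and `b'.'` merely stands in for the "don't care" bytes drawn as `.` in the diagrams.

```rust
/// Size of one heap cell in bytes (assumption: 8, as in the diagrams).
const CELL_BYTES: usize = 8;
/// Placeholder for the unused leading bytes, drawn as '.' in the diagrams.
const FILL: u8 = b'.';

/// Lays out `s` so that its terminating 0 byte is the last byte of the
/// last cell; any unused bytes end up in front of the string.
fn push_right(s: &[u8]) -> Vec<u8> {
    let used = s.len() + 1; // string bytes plus the terminating 0
    let cells = (used + CELL_BYTES - 1) / CELL_BYTES; // round up
    let pad = cells * CELL_BYTES - used; // leading unused bytes (0..=7)
    let mut out = vec![FILL; pad];
    out.extend_from_slice(s);
    out.push(0);
    out
}
```

For "abcdefgh" this produces the two string cells `.......a` and `bcdefgh0` shown above. For "a" it leaves 6 leading bytes unused, and at most 7 bytes are ever unused (the empty string).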
Yes. But filling it with 0s makes sense, because this helps to detect invalid partial string pointers. That is, a partial string pointer that points to a 0 is invalid. Of course, that should never happen, but it is as useful for debugging as the Rust messages are.
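The debugging check described above could look roughly like this in Rust. It is a sketch under the assumption that partial string pointers are byte offsets into a byte buffer and that unused bytes are zero-filled; `points_at_string_byte` is a hypothetical name, not part of Scryer Prolog.

```rust
/// Returns true only if `offset` is in range and refers to a non-zero
/// byte. With zero-filled unused bytes, a pointer into the padding or
/// at a terminator is rejected.
fn points_at_string_byte(heap: &[u8], offset: usize) -> bool {
    matches!(heap.get(offset), Some(&b) if b != 0)
}
```

For the zero-filled, right-pushed cell `*b"\0\0\0\0\0\0a\0"`, offsets 0 through 5 and offset 7 are rejected, and only offset 6 (the `a`) is accepted.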
Whenever a partial string is created, its length is most often known. Thus there is no need for padding after the terminating 0-byte.
Except maybe for #251