Internal representation of atoms #908
Comments
Originally, I introduced the optional value to pass operator information along to the printer, so that operator terms could be printed correctly. As unlikely as this may sound, the same problem (that of supporting dynamic operator changes) occurred to me a few hours after I opened the
I also expanded If If
The values of the
This new design allows operator definitions to change dynamically without our needing to update the value of each
I think it would take only a single hash lookup to find the configured precedence of an operator. This is likely negligible compared to the time it takes to actually write a term to the terminal or to a file, and it seems better to keep operators separate from the internal data representation: operators are only syntax, needed only to read and write terms, and should not change the internal representation in any way. If the operator state is encoded together with atoms, there will be a lot of duplication when operators change, even though the terms are semantically identical and should be merged to allow faster tests with

Internally, identical terms should always be shared, to enable fast checks and to use memory efficiently. Ideally, as much as possible can then be reduced internally to comparing pointers (to cells).

In general, I think it is important to aim for a design that conceptually allows compound terms with unbounded arity, or that can at least feasibly be extended in this way later. The reason is that such terms allow O(1) access to individual arguments (using
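A minimal sketch of the separation argued for above, assuming atoms are plain table indices and operators live in their own hash table keyed by atom and fixity. All names here (`AtomIdx`, `Fixity`, `OpSpec`, `OpTable`) are hypothetical and not taken from the Scryer Prolog sources; the point is only that the printer can find the current precedence with one hash lookup, and that changing an operator never touches any term on the heap.

```rust
use std::collections::HashMap;

/// Hypothetical atom handle: an index into an atom table.
/// It carries no operator information at all.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct AtomIdx(pub u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Fixity {
    Prefix,
    Infix,
    Postfix,
}

/// The seven operator specifiers of standard Prolog.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum OpSpec {
    Xfx,
    Xfy,
    Yfx,
    Fy,
    Fx,
    Xf,
    Yf,
}

impl OpSpec {
    pub fn fixity(self) -> Fixity {
        match self {
            OpSpec::Xfx | OpSpec::Xfy | OpSpec::Yfx => Fixity::Infix,
            OpSpec::Fy | OpSpec::Fx => Fixity::Prefix,
            OpSpec::Xf | OpSpec::Yf => Fixity::Postfix,
        }
    }
}

/// Operator table kept apart from the atom table and the heap.
#[derive(Default)]
pub struct OpTable {
    ops: HashMap<(AtomIdx, Fixity), (u16, OpSpec)>,
}

impl OpTable {
    /// Mirrors op/3: priority 0 removes the definition.
    pub fn op(&mut self, prec: u16, spec: OpSpec, name: AtomIdx) {
        let key = (name, spec.fixity());
        if prec == 0 {
            self.ops.remove(&key);
        } else {
            self.ops.insert(key, (prec, spec));
        }
    }

    /// The single hash lookup the printer would perform.
    pub fn lookup(&self, name: AtomIdx, fixity: Fixity) -> Option<(u16, OpSpec)> {
        self.ops.get(&(name, fixity)).copied()
    }
}
```

With this layout, redefining an operator only changes one table entry; existing terms that mention the atom stay bit-for-bit identical.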
I'm not sure how we'd go about supporting compound terms with unbounded arity. Term arity is closely connected to the (necessarily finite) number of registers in the WAM design. We would have to go beyond the WAM.

If the atom of an arity-1 term is defined as both a prefix and a postfix operator, how do we know which was originally intended when printing the term back? Is it okay to get that wrong? (I always assumed the answer to be no, but perhaps it's yes?) I think this was another of the original considerations that motivated

EDIT: I've reviewed the standard and convinced myself that it's ok to print a principal functor using whatever operator information is current. I'm going to go ahead with your proposed alternative. Operator information will not be encoded in the atom. The 10-bit arity encoding for functors will be retained for now, though. We can consider how to implement unbounded functor arities in the future.
As to the prefix/postfix question, see conformity_testing#201. It makes sense to be consistent and stick to one or the other, but if you insist, also

It is good to remove operator information from the atom table. It is so rarely used: there are only a couple of operators, but many more atoms whose entries would be needlessly bloated. Further, making operators module-local might be a future consideration. Some implementations do this, such as SWI, YAP, and Eclipse.

As for the representation of functors, a small one (for arity 1..4 or so) and a very large one might be best, both with O(1) access, indeed. (This would not complicate unification.)
@UWN @triska I assume the functor cell would be one of the component cells of the functor it extends? Since a functor cell never occurs as a component of a functor elsewhere in the WAM, it would be obvious that it extends the arguments.
I do not understand this statement: A functor occurs as a component for example in nested terms such as

Also, I think it is important to distinguish between the arity of terms on the heap and the arity of predicates, which must be compiled. I think the arity of a term on the heap can easily be much higher than 1024, can it not? This is because a cell that stores a term needs to be tagged as a compound term (using a few bits), and the rest of the cell is simply a pointer to the functor table that stores the functor name and the arity of the term, and that arity can be as high as you want. So, it should be absolutely no problem to create a huge term on the heap, for example with

Regarding the number of predicate arguments: As I understand it, @UWN outlines essentially a source transformation that does not need any changes to the WAM. Instead of invoking a predicate with a huge number of arguments, we invoke an auxiliary predicate with a smaller number of arguments, where the final argument is a compound term (with high arity) that stores the arguments for which we do not have enough registers. This can always be implemented at a later stage.

Note that the maximum arity of a term on the heap is the much more important factor in practice, because a term with many arguments can be used as an efficient array (using
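The heap layout described in this comment can be sketched roughly as follows, assuming a compound-term cell that only holds an index into a functor table, with the name and arity stored in that table and the arguments laid out contiguously after the header cell. The types `Functor`, `Cell`, and `Heap` are hypothetical simplifications, not the project's actual data structures, and this toy functor table does not deduplicate entries.

```rust
/// Hypothetical functor table entry: name and arity live here,
/// so the arity is not limited by the bits available in a heap cell.
pub struct Functor {
    pub name: String,
    pub arity: usize,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cell {
    Int(i64),
    /// Header of a compound term: an index into the functor table.
    /// The arguments occupy the cells immediately after the header.
    Str { functor: usize },
}

#[derive(Default)]
pub struct Heap {
    pub functors: Vec<Functor>,
    pub cells: Vec<Cell>,
}

impl Heap {
    /// Allocates name(0, ..., 0) with `arity` arguments and returns
    /// the heap address of the header cell.
    pub fn new_compound(&mut self, name: &str, arity: usize) -> usize {
        let functor = self.functors.len();
        self.functors.push(Functor { name: name.to_string(), arity });
        let addr = self.cells.len();
        self.cells.push(Cell::Str { functor });
        self.cells.extend(std::iter::repeat(Cell::Int(0)).take(arity));
        addr
    }

    /// Address of the n-th argument (1-based), found by plain arithmetic.
    fn arg_addr(&self, addr: usize, n: usize) -> Option<usize> {
        match self.cells.get(addr)? {
            Cell::Str { functor } if n >= 1 && n <= self.functors[*functor].arity => {
                Some(addr + n)
            }
            _ => None,
        }
    }

    /// arg/3-style read access in O(1).
    pub fn arg(&self, addr: usize, n: usize) -> Option<Cell> {
        self.arg_addr(addr, n).map(|a| self.cells[a])
    }

    /// Overwrites the n-th argument; returns false if out of range.
    pub fn set_arg(&mut self, addr: usize, n: usize, value: Cell) -> bool {
        match self.arg_addr(addr, n) {
            Some(a) => {
                self.cells[a] = value;
                true
            }
            None => false,
        }
    }
}
```

In this sketch a term with 100,000 arguments is no harder to create or index than one with three, which is what makes such terms usable as arrays.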
I'll clarify by stepping through the heap representation of
The functor cell

I took Ulrich's answer to mean that we could limit the arity of functor cells to 4 or 5 bits, and extend the arity if necessary by appending a specialized extender cell specifying the number of remaining cells. The remaining arity would be encoded as a 57-bit unsigned integer (deducting 6 bits for the cell tag and 1 bit for marking). Indexing would still be O(1), of course.
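To make the bit budget concrete, here is a small sketch of how such an extender cell could be packed into a 64-bit word. Only the sizes (6 tag bits, 1 mark bit, 57 bits for the remaining arity) come from the comment above; the field order (tag in the lowest bits, then the mark bit, then the payload) and the tag value `TAG_EXTENDER` are assumptions made purely for illustration.

```rust
const TAG_BITS: u32 = 6;
const MARK_BITS: u32 = 1;
/// 64 - 6 - 1 = 57 bits remain for the arity.
const PAYLOAD_BITS: u32 = 64 - TAG_BITS - MARK_BITS;
const PAYLOAD_MAX: u64 = (1u64 << PAYLOAD_BITS) - 1;
const TAG_MASK: u64 = (1u64 << TAG_BITS) - 1;

/// Hypothetical tag value for extender cells (any unused 6-bit value would do).
const TAG_EXTENDER: u64 = 0b10_1010;

/// Packs the remaining arity into an extender cell with the mark bit cleared.
/// Returns None if the value does not fit into 57 bits.
pub fn encode_extender(remaining_arity: u64) -> Option<u64> {
    if remaining_arity > PAYLOAD_MAX {
        return None;
    }
    Some((remaining_arity << (TAG_BITS + MARK_BITS)) | TAG_EXTENDER)
}

/// Extracts the remaining arity, or None if the cell is not an extender cell.
pub fn decode_extender(cell: u64) -> Option<u64> {
    if cell & TAG_MASK != TAG_EXTENDER {
        return None;
    }
    Some(cell >> (TAG_BITS + MARK_BITS))
}
```

Decoding is a mask and a shift, so reading the extended arity does not change the O(1) cost of locating an argument.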
Yes, something of that kind. What is important in any case is that these big structures do not need to be supported by WAM instructions at all: functor/3 and arg/3 are good enough for them (and univ). So very few places need to know about them in detail.
@UWN: Could you please briefly expand on how the heap should be organized to enable sharing of terms? From what I remember, a good organization would start with the "permanent" atoms at the bottom of the heap, so that strings can share the atom names by simply pointing to the name of the atom. Is there anything else to note about the heap organization that should be taken into account? For example,
These are two different issues.

1mo, the handling of atom indices. Currently #992 suggests that there are still two different kinds of

2do, the actual overall heap organization. Here, some (gradual) improvements are possible. Those improvements can be realized without much consequence for most of the code. So it might be of interest to consider the following organization. Starting with the "oldest".
Such an organization would also help to speed up many operations. For example testing for
This representation is most excellently improved in the
In the WAM, an atom is a pointer (index) to an entry in the atom table.

However, in `machine_indices.rs`, an atom seems to be represented as `Atom(ClauseName, Option<SharedOpDesc>)` on the heap.

My question is: Why is `SharedOpDesc` involved here? The operators are only used when parsing and emitting a term, and can also be changed dynamically. They should play no role when internally representing an atom.

With great interest, I see from Robbepop/modular-bitfield#64 that a new heap representation is now being considered, where we have:

Also here, the question arises: Why are `prec` and `spec` stored together with the atom? What does `m` indicate?

Ideally, the name of an atom is internally stored in the same way as strings (#24), i.e., using raw bytes, in the atom table. If this is the case, then `atom_chars/2` can be implemented in O(1) to obtain the atom name as a list of characters.
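As a rough illustration of the atom table idea in this issue, the sketch below interns atom names so that an atom on the heap is only a small index and its name is stored once as raw UTF-8 bytes. The names `Atom` and `AtomTable` are hypothetical and chosen for this example only; they are not the definitions in `machine_indices.rs`, and the sketch does not cover how strings on the heap would point into the table.

```rust
use std::collections::HashMap;

/// Hypothetical atom: just an index into the atom table,
/// with no operator information attached.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Atom(pub u32);

#[derive(Default)]
pub struct AtomTable {
    /// Atom names as UTF-8 data, addressed by atom index.
    names: Vec<Box<str>>,
    /// Reverse map used only when interning. For simplicity this toy
    /// version stores each name a second time as the map key.
    index: HashMap<Box<str>, Atom>,
}

impl AtomTable {
    /// Returns the existing atom for `name`, or creates a new one.
    pub fn intern(&mut self, name: &str) -> Atom {
        if let Some(&atom) = self.index.get(name) {
            return atom;
        }
        let atom = Atom(self.names.len() as u32);
        self.names.push(name.into());
        self.index.insert(name.into(), atom);
        atom
    }

    /// O(1) access to the name of an atom.
    pub fn name(&self, atom: Atom) -> &str {
        &self.names[atom.0 as usize]
    }
}
```

Because equal names always intern to the same index, comparing two atoms is a single integer comparison, and the name is available as a byte slice without copying.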