# Add `TaggedRef{T}` / `TaggedPtr{T}` / `ImmediateOrRef{T}` (#47735)
I have not checked since it's GPL source code, but I'd be very surprised if GMP (which we use for
What's the advantage over a bitfield or a primitive type to encode these? How would transparent decay from a
E.g. However, I am not suggesting that There is a reason I am proposing
You are mistaken; no, it does not. This is one of multiple reasons why we don't use Julia's

Besides, even if GMP did something of this kind, it would not help as much as a "native" tagged pointer type, as long as Julia invokes generic functions like

Note: this technique is not something I made up. E.g. LISP implementations have used this for decades. Several computer algebra systems I work on or am familiar with use this technique (FLINT, GAP, Singular, ...). I know that Ruby and Java implementations use variations of this. Certainly many more.

(One thing, though: I just checked Wikipedia, and they use "Tagged pointer" slightly differently than what I do here. Perhaps a different name would be better, e.g.
No. There are tons of applications where "most of the time" you have fairly small numbers, but "sometimes" they can get big, and you can't easily anticipate when which happens. Naive approaches like "first do all computations with

Besides, even in code that "knows" big values will appear, having optimizations for small values can have a tremendous payoff.
I am sure tagged pointers could be used for MPFR; but I am not interested in those myself, so I won't spend time debating this :-).
How would you use a bitfield or primitive type to encode a polynomial over a number field or over an arbitrary commutative ring?
Realistically, I am pretty sure
Gotcha, I misunderstood your proposal then. I was under the impression this should be used for all things
Ah, I see :/ That's surprising.
I'm not suggesting you did, I'm certainly aware of existing work featuring these sorts of things :) I'm mostly aware of the Linux kernel using a similar technique with the upper bits of pointers for page management. I just understood your proposal as a deeper modification than it truly is, that's all. I'm no stranger to the challenges a
My mistake, I only thought of polynomials with binary coefficients. The idea was to encode the existence of any given exponent in the polynomial as a bit in the bitfield (or a primitive type of the appropriate size, if we're in modular arithmetic), but that is of course exactly what large polynomials would use a

Pure conjecture though, feel free to disregard - you've probably already explored that :) Out of curiosity, is having a Julia wrapper type implementing this without special GC support not an option? As far as I'm aware, you're pretty much allowed to do whatever you want with

Regardless, I think this would be more appropriate as part of an effort for custom allocators - from what I've read in discussions in other PRs, your use case mostly deals with using such tagged pointers created outside of Julia, correct? I personally think it would make sense to have a custom allocator responsible for memory management of these objects, but I'm admittedly out of my depth in terms of the feasibility of having that in Julia (and/or it may just be me wanting that). My gut also says this shouldn't really need special GC support and it should work with a
I like it. I think you can assume at least 3 tag bits.

Also, for 32 bit, I assume we would also make this object 64-bit sized, to be most useful?
Most likely that means it was not phrased as well as it should have been :-/. So it's good you asked, and hopefully things are a bit clearer now :-). Please keep asking.
Ah, very nice, and interesting. Well, I think it'll be difficult to match performance even for just, say, addition, in pure Julia -- one really wants access to machine instructions like add-with-carry. (Perhaps there is a way to get those using some advanced LLVM magic? That'd be quite cool.) But it definitely would be quite interesting to try how far one can get!
That's a major part, yes, but it's more than that (talking specifically about large integer handling here): on the fast path (where the value is immediate), often a few fast CPU instructions can handle everything. E.g. addition of two "immediate integers" can be done like this, if we use the fact that we have two bits for tagging and assume bit 0 is 1 and bit 1 is 0 (using pseudo-assembler):
Typically multivariate polynomials are indeed stored sparsely; for those the "tagged pointer" / "immediate value" optimization is not that useful. The idea of using bitmasks for exponent vectors is indeed being used in the best code bases for multivariate polynomials, in various ways, e.g. also to speed up divisibility tests. But univariates often are stored densely. And also for matrices there are situations where sparse storage is not useful (the matrix is "too dense"), yet many entries in the matrix are 0 or 1 (but of course *as polynomials*), and here it can be beneficial if these values are stored "directly".
Exactly: I want to benefit from the GC, and be able to do these things with pure Julia code, too. Indeed, right now I import tons of data types from C libraries, use Ptr for these libraries, and even use tagging techniques there, too. But this is limited in what it can do; and it all comes with a price as well: we need to use finalizers extensively to manage the memory, and those are not free either. Since Julia's BigInt uses
No, not really. Well, perhaps that, too; but for me it's mostly about thinking forward and coming up with ways to replace some of that native / external code with pure Julia code. But I keep coming back to the lack of tagged pointers / immediate values. Without those, all kinds of optimizations are off the table. And in some cases I am not talking about something small. Take this admittedly somewhat silly and artificial example (I have real examples, but they wouldn't make sense without explaining a lot of theory first):

```julia
function foo(s::BigInt, x::BigInt)
    for i in 1:x
        s += i
    end
    return s
end

function foo2(s::BigInt, x::BigInt)
    for i in 1:x
        GMP.MPZ.add!(s, s, i)
    end
    return s
end

x = big(2)^20
@time foo(big(0), x)
@time foo(big(2)^60, x)
@time foo(big(2)^160, x)
@time foo2(big(0), x)
@time foo2(big(2)^60, x)
@time foo2(big(2)^160, x)
```

Output:
Now compare this to a similar program in GAP, an interpreted language which uses an "immediate integer optimization":

```
foo := function(s, x)
  local i;
  for i in [1..x] do
    s := s + i;
  od;
  return s;
end;

x := 2^20;
foo(0, 2^20); time;
foo(2^60, 2^20); time;
foo(2^160, 2^20); time;
```

results in the following (the given timings are in milliseconds):
So this blows Julia out of the water, being 100 times faster, even when compared to Julia code which supposedly uses in-place addition. Admittedly, it is still that much faster when it can't resort to immediate integers for all values (the last example starts with an accumulator value of

Also, I was a bit unfair: you may very well argue "but you added small integers, so you should have used
But this is only because I knew beforehand that the values were small. In general code I don't have that luxury and must assume that, in the worst case, all values are "big".
Well, any existing implementation I know of has dedicated support in the GC, because the GC needs to know what is a pointer and what is not. I've thought long and hard about this in the past two years, and I really don't see how to achieve this without support by the Julia kernel -- though it's very well possible I missed something, and I'd be happy to learn about a concrete way to do it differently. We already got "foreign type" support added to the Julia kernel, of course, but those don't seem sufficient to me -- inside a foreign object I can do whatever I want, including storing tagged pointers. But to get to the foreign object, Julia already had to dereference a pointer. (Another problem is the lack of support for serializing foreign objects, see issue #46214.)
You're welcome! Always happy to look like a fool when it makes things clearer for the next person :)
Well, my naive version gets within 2x performance of GMP - I sadly don't have GAP installed, so I can't test that on my machine :/ I just pushed my local version to github, so a comparison of this specific benchmark on your machine would be welcome :)
so it seems like you've found the performance problem in my multiplication - thank you! Though to be fair, I have not optimized the library too extensively, it's just a "let me try to play with this" kind of deal :) Seems like I really do need to take a look at this hot loop with MCAnalyzer.jl, to figure out how to schedule the additions better and use a better

Regardless, now that my misunderstanding is cleared up, this seems like a good idea 👍
I agree with @vtjnash that this is a nice API proposal and fills a big GAP for us. Making it UInt64 everywhere seems appealing since 64-bit integers aren't that terrible on 32-bit systems and otherwise it's hard to spare the bits there. I'm thinking about the other use for tagged pointers, which would be a string type that is effectively
String7, but yes
Right, our pointers aren't that fat (yet).
I want this! Having short strings encoded as immediate values would be huge.
I like this name better. Another possibility:

Or it may be generalized to

But seeing this, it's so close conceptually to
Viewing it as a storage optimization for

Here's another speculative use case I was thinking about yesterday, before I saw this issue: I'm experimenting with adding a single new field to
This would allow us to store more detailed source location information and have it loaded lazily without overly bloating
Yeah, I am wondering if we want it instead as field semantics, similar to
I like this idea. Then we get to express it in the code as a

On the other hand... what if fields with a
Could we tie this approach into an efficient
Not sure whether an efficient
performed the same as raw
There's already an issue about how that version of
That's a bit of a different case than I'm thinking of, since introducing the
Fields with
I feel like there's some low level overlap between this and the discussion at #48883 - both cases seek to extend the Julia object model so that the GC can distinguish bits from pointers based on data which doesn't overlap with the bits part. One could think of
Another thought: bitpacking things into pointers is definitely not portable across architectures (AVR for one doesn't require pointers to be even, as far as I know - you can happily load single bytes by incrementing a pointer by
This is bumping up against the limits of my knowledge, so forgive me if this is a dumb question. I thought we use something similar for marking pointers in the garbage collector, so how does Julia work at all on such platforms?
It doesn't; they are not supported :) Maybe they will be some day in the far future. Even if they were, they wouldn't use our existing GC for memory management anyway. Just thought it would be good to mention that here, so someone looking through old issues at a later date has a breadcrumb of why a specific feature doesn't work in some settings (IF that ever comes to be).
Bitpacking into Julia references (the constraint on them being part of the Julia object model and not naked pointers is important here) is fine on all architectures. It is simply a constraint imposed by the Julia object model that must be respected by all implementations of the runtime.
Do Julia references have a defined size/model to work with? The issue I can imagine is that if we assume the lower e.g. 4 bits are free for usage in GC, we'd throw away about a quarter of the addressable space on AVR, since pointers there are only 16 bits wide.
Julia won't support a 16-bit arch like AVR. I don't think discussing that any further is helpful.
One optimization I miss in Julia compared to other GC-based language implementations are "tagged pointers".

## The problem

Consider the `BigInt` type: it is a lot slower than `Int` and takes up more memory; computing something like `x * y` where `x = big(0); y = big(1);` requires dereferencing pointers, multiple loads from this indirect data, then an allocation and more writes. In contrast, when `x, y` are `Int`, 1-2 machine instructions suffice on most architectures.

## Tagged pointers / tagged references

GC languages that work a lot with multi-precision integers often compensate for this using tagged pointers: the idea is to store small instances of the type inside the pointer itself. To distinguish between a proper pointer and a fake pointer containing data, a "tag" is added to the pointer. E.g. on modern architectures, pointers usually must be even, if not a multiple of a machine word; so divisible by 8 on a 64-bit architecture. (Well, at least pointers to objects; but in general it also tends to be preferable to use such aligned pointers for performance reasons.) The idea then is to use bit 0 (the LSB) to indicate a tagged value: if it is 1 (thus the "pointer" has an odd value), we assume this is not an actual pointer, but rather that it contains data. That way one can store up to 63 resp. 31 bits of "immediate" data.
There are many other applications for this besides multi-precision integers:

- `BigFloat`
- `Rational`: can store numerator and denominator directly if they are small, else a pointer to the full object

## A possible Julia user interface to tagged pointers
My suggestion would be to add a new parametric type `TaggedRef{T}` (requiring some kernel support, see below). Functionally I naively imagine it would behave similarly to a `Union{Ref{T}, UInt}`, e.g. for purposes of type inference.

Methods that should exist (likely with different/better names -- I mostly care about the functionality that should be there, so that I can sketch example usage), where `tr isa TaggedRef{T}`:

- `istagged(tr)`: return true if the ref is tagged, else false
- `tr[]`: acts like `[]` for a normal `Ref{T}` if it is not tagged; if it is tagged, throw an exception (as with a ref that is not assigned)
- `getrawvalue(tr)`: returns the raw data underlying `tr`, including the tag bit (perhaps it should throw if `tr` is an actual pointer? though I am not sure that's needed; most likely it is faster if it doesn't do that)
- `setrawvalue(tr, val::UInt)`: set the raw value. Possibly should enforce that the tag bit is set, by using `val | 1`

## An example

Using a `TaggedRef` one can now implement fast(er) bigints. Say like this:
## Thoughts on implementation

Some kernel support for this would be needed (else I wouldn't post this here but just implement it).

- special-case `jl_tagged_ref_type` in the GC loop and in there, use the low bit to decide whether or not to queue the referenced object. Of course adding yet another top-level special case to that check isn't great