More efficient object finalising implementation #1638
We recently switched from the C standard library assert to our custom pony_assert. I've left comments on the asserts that need to be changed.
src/libponyrt/mem/heap.c
p = chunk->m + (bit << HEAP_MINBITS);

// run finaliser
assert((*(pony_type_t**)p)->final != NULL);
This should be pony_assert.
@Praetonus thanks for catching these. I should have caught that but lost track of it due to copy/paste from the Sendence ponyc fork.
I'll fix these and the CI failures.
src/libponyrt/mem/heap.c
p = chunk->m + (bit << HEAP_MINBITS);

// run finaliser
assert((*(pony_type_t**)p)->final != NULL);
This should be pony_assert.
5683f1d to 01ee0f2
I think this should be split into two PRs, especially since the changes don't seem to have a strict dependency relationship to each other (for example, the finaliser change doesn't seem to strictly depend on having more customizable benchmark facilities, even though it did help you to confirm the approach). I'm also inclined to say the
@jemc I'm okay with splitting this PR into two PRs. I only made the
I modeled the If the Regardless, I'll split this PR and/or open the RFC once other folks have had a chance to weigh in (no point in opening another PR if the decision is to go through an RFC first). |
Yeah, I agree that sounds to me like a good topic for an RFC discussion.
01ee0f2 to 387f086
@jemc I've updated this PR to remove the |
@Praetonus @jemc Are there any other comments, questions or changes for this PR before it's merged? I'll try and get the |
I'm not sure about the benchmark added to |
@Praetonus I understand your concern and agree that there should be another place for the benchmarks. However, the only other Pony based benchmarks in the repo are part of Any suggestions for where to put this benchmark since it's not testing a specific |
Maybe it could be implemented in C and added to |
387f086 to e2ae104
@Praetonus I've updated the PR to remove the changes to I'll work on creating a benchmark in C that exercises the same functionality to add into |
Sorry to be a pain, @dipinhora , but I think another set of changes is needed. In However, whether a finaliser is present or not is known statically when the memory allocation occurs, so we don't need to pay this cost. If we separate out functions that set the |
src/libponyrt/mem/heap.c
@@ -225,6 +315,10 @@ void* ponyint_heap_alloc_small(pony_actor_t* actor, heap_t* heap,
m = chunk->m + (bit << HEAP_MINBITS);
chunk->slots = slots;

// note that a finaliser needs to run
if(has_finaliser)
This branch impacts all allocations adversely.
src/libponyrt/mem/heap.c
@@ -237,6 +331,12 @@ void* ponyint_heap_alloc_small(pony_actor_t* actor, heap_t* heap,
n->m = (char*) POOL_ALLOC(block_t);
n->size = sizeclass;

// note that a finaliser needs to run
if(has_finaliser)
And this one too.
src/libponyrt/mem/heap.c
@@ -265,6 +366,12 @@ void* ponyint_heap_alloc_large(pony_actor_t* actor, heap_t* heap, size_t size)
chunk->slots = 0;
chunk->shallow = 0;

// note that a finaliser needs to run
if(has_finaliser)
And this one for large allocations.
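The concern in these review comments can be sketched in isolation. The code below is only an illustration under simplifying assumptions: `demo_chunk_t`, `demo_take_slot`, `demo_alloc` and `demo_alloc_final` are hypothetical names, not the libponyrt API, and the chunk is reduced to two 32-bit bitmaps. It shows the general idea of giving the finaliser case its own entry point so the common allocation path never tests a `has_finaliser` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified chunk: 32 fixed-size slots tracked by bitmaps. */
typedef struct demo_chunk_t
{
  uint32_t slots;      /* bit set = slot is free */
  uint32_t finalisers; /* bit set = slot holds an object with a finaliser */
} demo_chunk_t;

/* Claim the lowest free slot; returns its index, or -1 if the chunk is full. */
static int demo_take_slot(demo_chunk_t* chunk)
{
  assert(chunk != NULL);

  for(int bit = 0; bit < 32; bit++)
  {
    uint32_t mask = UINT32_C(1) << bit;

    if(chunk->slots & mask)
    {
      chunk->slots &= ~mask;
      return bit;
    }
  }

  return -1;
}

/* Common path: no finaliser bookkeeping and no has_finaliser test. */
int demo_alloc(demo_chunk_t* chunk)
{
  return demo_take_slot(chunk);
}

/* Finaliser path: the caller picks this entry point when it already knows
 * that the type has a finaliser, so no runtime flag is needed. */
int demo_alloc_final(demo_chunk_t* chunk)
{
  int bit = demo_take_slot(chunk);

  if(bit >= 0)
    chunk->finalisers |= UINT32_C(1) << bit;

  return bit;
}
```

The remaining check in `demo_alloc_final` only guards against a full chunk; it is paid by finaliser allocations alone, which is the trade-off the review asks for.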
e2ae104 to 373daae
@sylvanc No problem at all and good catch on the extra/unnecessary branch. I didn't even notice it. 8*/ I've eliminated the branch by converting the Please let me know if you find anything else that needs fixing.
4ccafba to 362d1d0
Ah I see what you did there :) But what I meant was, just like how in |
97b1943 to 21d37aa
Thanks to Sylvan and Andy Turley for their ideas.

* The old finaliser implementation used the object hashmap to keep track of finalisers that needed to be run. This was not ideal because while the hashmap provides constant time operations, the constant time was still much larger than the time for a normal no finaliser allocation. Additionally, keeping track of finalisers in the object hashmap meant that every object with a finaliser would be added to the object hashmap even if it was only transient and never sent to another actor. This, once again, was different from normal allocations where the objects wouldn't be added to the hashmap until they were sent to another actor. The benchmark using ponybench showed that objects with finalisers were about 1 order of magnitude slower than objects without finalisers due to the overhead of using the object hashmap for tracking them.
* The new finaliser implementation keeps the finaliser information in the chunk where the memory was allocated from. This is exactly the same as how non-finaliser allocations are tracked except for the additional work to keep track of the finaliser. The resulting benefit is that objects with finalisers will only get added to the object hashmap under the same circumstances as objects without finalisers. This gives us an increase in performance by 1 order of magnitude so that now objects with finalisers have the same allocation performance as objects without finalisers.
* Keep a finaliser bitmap in chunk_t instead of an array of finaliser pointers. Run the finaliser from the pony_type_t->final_fn instead of storing/using the function passed in to pony_alloc_final.
* Add pony_alloc_small_final and pony_alloc_large_final functions to avoid having to go through a branch and another function call to allocate memory with a finaliser.
* Update compiler to call the appropriate one of pony_alloc_small_final or pony_alloc_large_final instead of pony_alloc_final.
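As a rough illustration of the bitmap approach described in this commit message, here is a minimal sketch of a sweep that runs finalisers straight from a per-chunk bitmap. Everything in it is an assumption made for the example: `fin_type_t`, `fin_obj_t`, `fin_chunk_t`, `fin_count_final` and `fin_sweep` are hypothetical names, the chunk holds a fixed array of 32 objects, and the `marked` bitmap stands in for the result of the marking phase. It is not the libponyrt code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIN_SLOTS 32

/* Hypothetical type descriptor: the finaliser lives here, not in the heap. */
typedef struct fin_type_t
{
  void (*final)(void* obj); /* NULL for types without a finaliser */
} fin_type_t;

/* Hypothetical object whose first field points at its type descriptor. */
typedef struct fin_obj_t
{
  const fin_type_t* type;
  int* counter; /* used by the example finaliser below */
} fin_obj_t;

typedef struct fin_chunk_t
{
  fin_obj_t objs[FIN_SLOTS];
  uint32_t slots;      /* bit set = slot is free */
  uint32_t finalisers; /* bit set = finaliser still has to run */
} fin_chunk_t;

/* Example finaliser: counts how many times it was invoked. */
void fin_count_final(void* obj)
{
  fin_obj_t* self = (fin_obj_t*)obj;
  (*self->counter)++;
}

/* Sweep one chunk. `marked` has a bit set for every slot that is still
 * reachable. Returns the number of finalisers that were run. */
size_t fin_sweep(fin_chunk_t* chunk, uint32_t marked)
{
  uint32_t dead = ~chunk->slots & ~marked;     /* allocated but unreachable */
  uint32_t pending = dead & chunk->finalisers; /* dead and needs finalising */
  size_t ran = 0;

  for(int bit = 0; bit < FIN_SLOTS; bit++)
  {
    if(pending & (UINT32_C(1) << bit))
    {
      fin_obj_t* p = &chunk->objs[bit];

      /* The finaliser is looked up through the object's type descriptor. */
      assert(p->type->final != NULL);
      p->type->final(p);
      ran++;
    }
  }

  chunk->finalisers &= ~pending; /* these finalisers have now run */
  chunk->slots |= dead;          /* dead slots become free again */
  return ran;
}
```

Because the bookkeeping is two bitwise operations per chunk, an object with a finaliser costs almost the same to track as one without, which is the property the commit message attributes to the new design.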
373daae to c574c26
@sylvanc changes made to add functions Also, one minor note, there is still a tiny overhead with the new implementation of alloc (without a finaliser) as compared to the old code because the new code needs to set |
@sylvanc Any additional suggestions or comments before this can be merged?
Nice! Thanks @dipinhora for bearing with all the little changes on this one.
…segfault (#4522)

A long time ago, some rando named "Dipin Hora" decided to be clever and rework how objects with finalisers are stored/garbage collected in actor heaps (see: #1638). Unfortunately for us all, he wasn't nearly as clever as he thought and introduced a use after free bug that only occurs when finalisers have logic to reference other objects that might have already been freed and reused during the same garbage collection process.

Luckily for us, a smarterer rando who also happens to be named "Dipin Hora" has come along to save the day. This commit adds tests to reproduce the bug and reworks the actor heap garbage collection logic so that this issue can no longer occur, by making sure that:

* finalisers can no longer re-use memory that might be freed earlier in the garbage collection process
* heap chunks are only freed/destroyed after all finalisers have been run to ensure all cross object references in finalisers remain valid at the time the finalisers are run
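The second guarantee in that commit message (free chunks only after every finaliser has run) can be illustrated with a two-pass sweep. This is a sketch under heavy simplification, not the actual fix: `gc_obj_t`, `gc_chunk_t`, `gc_read_peer_final` and `gc_sweep` are hypothetical names, and each chunk owns a single `malloc`'d object. It does not model the first guarantee about memory re-use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct gc_obj_t gc_obj_t;

struct gc_obj_t
{
  void (*final)(gc_obj_t* self); /* NULL if the object has no finaliser */
  gc_obj_t* peer;                /* another object the finaliser may read */
  int value;
  int* out;                      /* where the finaliser records what it saw */
};

typedef struct gc_chunk_t
{
  gc_obj_t* obj;           /* one heap-allocated object per chunk, for brevity */
  int dead;                /* non-zero if the object was not marked */
  struct gc_chunk_t* next;
} gc_chunk_t;

/* Example finaliser: reads a field of another, possibly also dead, object. */
void gc_read_peer_final(gc_obj_t* self)
{
  *self->out = self->peer->value;
}

/* Two-pass sweep over a chunk list. Returns the number of chunks released. */
size_t gc_sweep(gc_chunk_t* list)
{
  size_t released = 0;

  /* Pass 1: run every pending finaliser while all memory is still valid. */
  for(gc_chunk_t* c = list; c != NULL; c = c->next)
  {
    if(c->dead && (c->obj != NULL) && (c->obj->final != NULL))
      c->obj->final(c->obj);
  }

  /* Pass 2: only now release the memory of the dead chunks. */
  for(gc_chunk_t* c = list; c != NULL; c = c->next)
  {
    if(c->dead && (c->obj != NULL))
    {
      free(c->obj);
      c->obj = NULL;
      released++;
    }
  }

  return released;
}
```

A single-pass version that finalised and freed each chunk in turn would free the peer's chunk first whenever it appears earlier in the list, and the later finaliser would then read freed memory. Splitting the work into two passes removes that ordering dependency.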
This PR includes 2 changes
Ponybench arguments + finaliser benchmarks
This commit enhances ponybench to use options to take optional command line arguments to change autobench_time and autobench_max_ops dynamically at runtime.
This commit also adds benchmarks for objects with and without finalisers to minimal-cates/finalisers.
More efficient object finalising implementation
Thanks to @sylvanc and @aturley for their ideas.
* The old finaliser implementation used the object hashmap to keep
track of finalisers that needed to be run. This was not ideal
because while the hashmap provides constant time operations,
the constant time was still much larger than the time for a normal
no finaliser allocation. Additionally, keeping track of finalisers
in the object hashmap meant that every object with a finaliser
would be added to the object hashmap even if it was only transient
and never sent to another actor. This, once again, was different
from normal allocations where the objects wouldn't be added to
the hashmap until they were sent to another actor. The benchmark
using ponybench showed that objects with finalisers were about
1 order of magnitude slower than objects without finalisers due
to the overhead of using the object hashmap for tracking them.
* The new finaliser implementation keeps the finaliser information
in the chunk where the memory was allocated from. This is exactly
the same as how non-finaliser allocations are tracked except for
the additional work to keep track of the finaliser. The resulting
benefit is that objects with finalisers will only get added to the
object hashmap under the same circumstances as objects without
finalisers. This gives us an increase in performance by 1 order of
magnitude so that now objects with finalisers have the same allocation
performance as objects without finalisers.
* Keep a finaliser bitmap in chunk_t instead of an array of finaliser
pointers. Run the finaliser from the pony_type_t->final_fn instead
of storing/using the function passed in to pony_alloc_final.
* Add pony_alloc_small_final and pony_alloc_large_final functions
to avoid having to go through a branch and another function call
to allocate memory with a finaliser.
* Update compiler to call the appropriate one of pony_alloc_small_final
or pony_alloc_large_final instead of pony_alloc_final.
Future work:
update pony_alloc, pony_alloc_small, pony_alloc_large to take
a boolean as to whether a finaliser exists for the type or not.
This would also require changes to the compiler to generate the
appropriate boolean true/false for when a finaliser exists or not.
Benchmarks before:
Benchmarks after:
Please let me know if there are any questions or concerns or if any of the changes need to go through an RFC.
Also, as always, feedback/suggestions for improvements welcome.