Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

callback support for checktime timer expiry #7803

Merged
merged 4 commits into from
Aug 30, 2019

Conversation

spoonincode
Copy link
Contributor

Change Description

When a transaction's checktimer timer expires, call a user specified function. This can allow potential future wasm runtimes to be notified of expiry in a very low overhead manner (they don't need to poll expiry).

This PR tries to KISS and just adds support for a single callback by storing two void*s in the platform_timer. Since the callback could be called within a signal handling context I didn't want to complicate manners with std::functions or anything that required resizing to support more than one.

Usage could be something like this:

super_fast_wasm_runtime::execute_action(apply_context& context, ...) {
   context.trx_context.transaction_timer.set_expiry_callback([](void* user) {
      //this code could be called from signal handling context!
      super_fast_wasm_runtime* self = (super_fast_wasm_runtime*)user;
      self->stop_wasm_execution();
   }, this);
   context.trx_context.checktime(); //catch any expiration that might have occurred before setting up callback

   //when wasm code is done, disable callback
   auto reset_expiry_cb = fc::make_scoped_exit([&tt=context.trx_context.transaction_timer](){tt.set_expiry_callback(nullptr, nullptr);});

   run_wasm();
}

You will notice what technically amounts to a very slight race condition here. If the timer expires at the very same time that the callback is attempting to be disabled (by calling set_expiry_callback(nullptr, nullptr)) it's conceivable that the callback will still be called even though it has been disabled at that point.

It's not clear what the best way is to solve that problem. The expiry code being run from a signal handling context ties my hands on many typical solutions. Any thoughts on this are welcome.

However, this issue isn't a problem for potential new wasm runtime given the current state of single threaded execution -- slight late execution of the callback even after disabling it doesn't cause a problem (as long it isn't so late that it is called after the next transaction's wasm starts executing, which would be absurd in practice). In fact, the only reason potential new wasm runtime needs to disable the callback at all is just to make platform_timer's assert() happy that no one is trying to pile up more than one callback.

Consensus Changes

  • Consensus Changes

API Changes

  • API Changes

Documentation Additions

  • Documentation Additions

result in an exception. Setting to nullptr is allowed as a "no callback" hint */
void set_expiry_callback(void(*func)(void*), void* user) {
EOS_ASSERT(!(func && _expiration_callback), misc_exception, "Setting a platform_timer callback when one already exists");
_expiration_callback = func;
Copy link
Contributor

@b1bart b1bart Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that there is more than just the one "Race" that you mentioned in the PR here. Given FuncT1 DataT1 FuncT2 DataT2 (where one pair may be nullptr) Depending on how the compiler orders these instructions on both set_expiry_callback and call_expiration_callback you can at least get any of the following cases:

if (FuncT1) FuncT1(DataT1);
if (FuncT1) FuncT2(DataT1)
if (FuncT1) FuncT1(DataT2)
if (FuncT1) FuncT2(DataT2)
if (FuncT2) FuncT2(DataT1)
if (FuncT2) FuncT2(DataT2)

I think I understand in that in the current use case where T2 is practically going to be nullptr that boils down to:
if (FuncT1) FuncT1(DataT1);
if (FuncT1) nullptr(DataT1)
if (FuncT1) FuncT1(nullptr)
if (FuncT1) nullptr(nullptr)
if (nullptr)...
if (nullptr)...

so, 2 out of 6 of those are crashes, 1 out of 6 calls FuncT1 in an unexpected way. 1 out of 6 calls FuncT1 "spuriously" (the noted race in the PR) and the remaining 2 operate as expected.

We can improve this dramatically if we can guarantee that the pair is atomically set and queried. Does the signal handling context allow us to use atomic primitives like CAS?

Or can we double buffer the function+data pair with some sort of data barrier and then "swap" the active buffer with a single index update?

@spoonincode
Copy link
Contributor Author

Good points. We could probably mitigate much of the reordering by simply asking the compiler not to. But, since we're dealing with two pointers here there will still be the possibility of mixing something up.

CAS should be safe here but I'm curious what construct you have in mind. Anything that spins, like a mutex, could result in a deadlock if the signal is delivered to the same thread that is trying to modify the pointers. This is potentially solvable by delivering the timer signal to a dedicated thread. But doing that is an abstraction nightmare -- in POSIX threads inherit the signal mask of the creating thread. This means we'd have to mask off the timer signal early in main() before net threads, chain threads, etc get created.

Another possibility here is to do an atomic read/write of 16 bytes probably via a CAS of 16 bytes. It's not clear to be how easily this is done in gcc/clang. For example, on x86_64 this is the CMPXCHG16B instruction. But on ARM8??

I tried compiling the following

   std::cout << std::atomic<__int128>::is_always_lock_free << std::endl;

This returns 1 and indeed if the -mcx16 option is used CMPXCHG16B is used on x86_64 when loading/storing to an atomic this way. However on ARM8 this is returning 0 for me and I'm not sure if the instruction set supports 128-bit atomics.

@b1bart
Copy link
Contributor

b1bart commented Aug 27, 2019

yeah, I'm not sure what the CAS construct would be that is safe. I think the safe "baseline" implementation may be double buffering as it should be pretty easy (maybe slow) to guarantee atomic updates of the pair and to put a memory barrier in on platforms that need it (so that the updates to the func and data precede the update to the index).

If we want to get fancy on platforms that have (pointer-bits * 2) bit atomic writes then, I'm okay with it. Having a dirt simple fallback is worthwhile.

@spoonincode
Copy link
Contributor Author

I took a stab at "double buffering"; which can be simplified because the code never lets you overwrite the callback with a different callback. This code still probably gets an F in a concurrency textbook but I believe it no longer has the defect where only 1 of the 2 variables may be set or weirdness due to reordering.

I had to use __sync constructs because std::atomic<> was always resolving to a function call for me 😠

@@ -32,11 +33,17 @@ struct platform_timer {
fc::fwd<impl,fwd_size> my;

void call_expiration_callback() {
if(_expiration_callback)
_expiration_callback(_expiration_callback_data);
if(__atomic_load_n(&_callback_enabled, __ATOMIC_CONSUME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt we'd ever see this as problematic given our use case but, as these are fences, not locks there is no guarantee that between line 36 and 37 and 38, we don't see different states of both _expiration_callback and _expiration_callback_data. Consider that if a thread "pauses" after each line and another calls the appropriate permutations of set_expiry_callback we could get the same set of problem cases as before and still have _callback_enabled reset by line 39.

I think something like a 4 stage atomic enumerated int (unset, writing, set, reading) with:

  • set_expiry_callback spinlock until it gets a definitive "set" and fails or can CAS an "unset" for a "writing" and upates the values with a memory fence atomic write to "set" at the end.
  • a specialization clear_expiry_callback spinlock until it gets a definitive "unset" and fails or can CAS a "set" for an "unset"
  • and a call_expiry_callback that doesnt spinlock it opportunistically tries once to CAS a "set" for a "reading" and then pull out the values, atomically write back to "set" with a fence and call the callback. Any other result from the CAS is treated as "unset".

reduces our possible damage to

  • the race between "releasing" the spinlock in call_expiry_callback and the consistent but perhaps stale callback being called. We could alleviate this by calling the callback holding the spin lock but that may very well deadlock.
  • the race between two call_expiry_callback invocations that cause one to erroneously fail as it treats a conflicting "reading" value as an "unset" and therefore does nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no guarantee that between line 36 and 37 and 38 ..

Yeah it's a variation of an ABA problem; but it would require someone to do set_callback(nullptr), set_callback(0x...) back to back right? Something that is pretty far fetched for our use case (it would effectively be an action ending and starting without anything in between).

I have no qualms about hardening this further; as the complexity increases it starts to smell more like something that should be improved with a 128-bit CAS for at least x86_64 though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something that should be improved with a 128-bit CAS for at least x86_64 though

As long as we have a base implementation that is safe, I support platforms dabbling with faster versions (which should also be safe) where available. In this case we just need 128bit atomic load/store right? It would effectively become last-write-wins but with the concession that we are only trying to guarantee consistency of callback/data pointers thats "safe"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, just need an atomic 128bit load/store to meet the same safety criteria it seems. Which surprisingly (to me) on x86_64 only means CMPXCHG16B -- from what I can from docs even wide SSE/AVX load/stores are not guaranteed to be atomic.

anyways, it's worth benchmarking the impact of the base impl before optimizing this immediately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once again, I've failed at PR review, you were right to need a 128 bit CAS as the result of the comparison allows you to maintain the expectation that you cannot overwrite an existing callback (that throws)

@@ -17,11 +17,12 @@ struct platform_timer {

/* Sets a callback for when timer expires. Be aware this could might fire from a signal handling context and/or
on any particular thread. Only a single callback can be registered at once; trying to register more will
result in an exception. Setting to nullptr is allowed as a "no callback" hint */
result in an exception. Setting to nullptr disables any current set callback */
void set_expiry_callback(void(*func)(void*), void* user) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to remain consistent with the name of the member variable and the call_expiration_callback method this should probably be renamed to set_expiration_callback

bool _callback_enabled = false;
static_assert(__atomic_always_lock_free(sizeof(_callback_enabled), 0), "eosio requires an always lock free bool type");
void(*_expiration_callback)(void*);
void* _expiration_callback_data;
};
Copy link
Contributor

@heifner heifner Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a "carries a dependency" relationship here between _callback_enabled and _expiration_callback so __ATOMIC_CONSUME will not work correctly. Did you mean __ATOMIC_ACQUIRE ? With atomic acquire this should work just fine and the second __atomic_load_n is not needed.

This https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html indicates __ATOMIC_CONSUME is implemented as __ATOMIC_ACQUIRE anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/atomic/memory_order

The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged. (since C++17)

Copy link
Contributor

@b1bart b1bart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I sent you down a rabbit hole and overcomplicated things with the 4-state suggestion. I apologize. It seems like a simple 2-state lock would be simpler to reason about and likely safer.

I like that the "read" in call_expiration_callback is optimistic and non-blocking/spinning.

the "write" in set_expiration_callback should probably just spin until it can CAS the state from "unlocked" to "locked", do all of the checks and possibly manipulation, store the state back to "unlocked" and then throw any errors it would have thrown in the middle. We get no benefit from the read/write paradigm I suggested and its muddied the waters significantly.

I plead guilty to the crimes of over-engineering and solutioning in PRs ☹️

callback_state current_state;
while((current_state = _callback_state.load(std::memory_order_acquire)) == READING) {} //wait for call_expiration_callback() to be done with it
if(func)
EOS_ASSERT(current_state == UNSET, misc_exception, "Setting a platform_timer callback when one already exists");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assert will sometimes fire and sometimes not, such that you cannot make assumptions in the remaining code that _callback_state is eitherSET or func is nullptr as the _callback_state can change between this line and later lines where it is modified.

return;

//prepare for loading in new values by moving from UNSET->WRITING
while(!atomic_compare_exchange_strong(&_callback_state, &current_state, WRITING))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble working out the control flow in here. There are ideally two expectations here: we are "clearing" the callback and the _callback_state is SET or we are "setting" the callback and _callback_state is UNSET`

After the first iteration of the loop, we change our expected _callback_state to UNSET even if we are going to clear. The code after is safe but wont this spinlock spin until something else clears the callback if it misses its first time through the loop (say if a callback fired at a bad time)?

@spoonincode
Copy link
Contributor Author

simpler to reason about and likely safer

At this point it's probably easier & safer to just use the threaded approach. Since we could safely use a real lock it would allow solving all problems (even prevent late firing of the callback).

@spoonincode
Copy link
Contributor Author

Unfortunately using a dedicated thread for signal handling causes high variation in timer quality. Probably due to a dormant thread needing to be completely woke up and scheduled when the timer expires. No go here.

@spoonincode spoonincode merged commit 53e19b9 into develop Aug 30, 2019
@spoonincode spoonincode deleted the checktime_timer_callback branch August 30, 2019 14:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants