Skip to content

Commit

Permalink
i#1369: Use synch flush callback to enable drcachesim tracing. (#4491)
Browse files Browse the repository at this point in the history
DR translates a fault in the code cache to a fault at the corresponding application address. This is done using ilist reconstruction for the fragment where the fault occurred.

But, this does not work as expected when the DR client changes instrumentation during execution; currently, drcachesim does this to enable tracing after -trace_after_instrs. The reconstructed basic block gets the new instrumentation whereas the one in code cache has the old one. This causes issues during fault handling.

In the current drcachesim case, it appears as though a meta-instr has faulted because the reconstructed ilist has a meta-instr at the code cache fault pc. This issue may manifest differently if the basic block with the new instrumentation is smaller than the old one (unlike the drcachesim 'meta-instr faulted' case) and the faulting address lies beyond the end of the new instrumented basic block. We may see an ASSERT_NOT_REACHED due to the ilist walk ending before the faulting code cache pc was found in the reconstructed ilist.

In the existing code, drcachesim attempts to avoid this by flushing old fragments using dr_unlink_flush_region after it switches to the tracing instrumentation. However, due to the flush being asynch, there's a race and the flush does not complete in time.

This PR adds support for a callback in the synchronous dr_flush_region API. The callback is executed after the flush but before the threads are resumed.

Using the dr_flush_region callback to change drcachesim instrumentation ensures that old instrumentation is not applied after the flush and the new one is not applied before.

Fixes: #1369
  • Loading branch information
abhinav92003 authored Oct 30, 2020
1 parent 7a5d3e1 commit 893c06c
Show file tree
Hide file tree
Showing 13 changed files with 155 additions and 66 deletions.
12 changes: 7 additions & 5 deletions api/docs/bt.dox
Original file line number Diff line number Diff line change
Expand Up @@ -981,15 +981,17 @@ dr_insert_cbr_instrumentation()
\subsection sec_adaptive Dynamic Instrumentation

DynamoRIO allows a client to dynamically adjust its instrumentation
by providing routines to flush all cached fragments corresponding to
an application code region:
by providing a routine to flush all cached fragments corresponding to
an application code region and register (or unregister) instrumentation
event callbacks:

\code
dr_flush_region()
dr_unlink_flush_region()
dr_delay_flush_region()
dr_flush_region_ex()
\endcode

The client should provide a callback to this routine, that unregisters
old instrumentation event callbacks, and registers new ones.

In order to directly modify the instrumentation on a particular fragment
(as opposed to replacing instrumentation on all copies of fragments
corresponding to particular application code), DynamoRIO also supports
Expand Down
3 changes: 3 additions & 0 deletions api/docs/release.dox
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,9 @@ Further non-compatibility-affecting changes include:
- Added opnd_create_immed_double(), opnd_get_immed_double() and
opnd_is_immed_double() to enable the creation and handling of double
precision floating-point operands.
- Added dr_flush_region_ex API that accepts a callback to be executed after synch
flush but before the threads are resumed. The existing dr_flush_region API
is modified to invoke dr_flush_region_ex with a NULL callback.

**************************************************
<hr>
Expand Down
57 changes: 39 additions & 18 deletions clients/drcachesim/tracer/tracer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1316,13 +1316,14 @@ event_kernel_xfer(void *drcontext, const dr_kernel_xfer_info_t *info)
*/

static uint64 instr_count;
static volatile bool tracing_enabled;
static void *enable_tracing_lock;
static bool tracing_enabled;
static volatile bool tracing_scheduled;
static void *schedule_tracing_lock;

#ifdef X86_64
# define DELAYED_CHECK_INLINED 1
#else
// XXX: we don't have the inlining implemented yet.
// XXX i#4487: we don't have the inlining implemented yet.
#endif

static dr_emit_flags_t
Expand All @@ -1345,7 +1346,7 @@ enable_delay_instrumentation()
if (!drmgr_register_bb_instrumentation_event(
event_delay_bb_analysis, event_delay_app_instruction, &memtrace_pri))
DR_ASSERT(false);
enable_tracing_lock = dr_mutex_create();
schedule_tracing_lock = dr_mutex_create();
}

static void
Expand All @@ -1358,8 +1359,8 @@ disable_delay_instrumentation()
static void
exit_delay_instrumentation()
{
if (enable_tracing_lock != NULL)
dr_mutex_destroy(enable_tracing_lock);
if (schedule_tracing_lock != NULL)
dr_mutex_destroy(schedule_tracing_lock);
#ifdef DELAYED_CHECK_INLINED
drx_exit();
#endif
Expand All @@ -1379,28 +1380,46 @@ enable_tracing_instrumentation()
}

static void
hit_instr_count_threshold()
change_instrumentation_callback(void *unused_user_data)
{
NOTIFY(0, "Hit delay threshold: enabling tracing.\n");
disable_delay_instrumentation();
enable_tracing_instrumentation();
}

static void
hit_instr_count_threshold(app_pc next_pc)
{
bool do_flush = false;
dr_mutex_lock(enable_tracing_lock);
if (!tracing_enabled) { // Already came here?
NOTIFY(0, "Hit delay threshold: enabling tracing.\n");
disable_delay_instrumentation();
enable_tracing_instrumentation();
dr_mutex_lock(schedule_tracing_lock);
if (!tracing_scheduled) {
do_flush = true;
tracing_scheduled = true;
}
dr_mutex_unlock(enable_tracing_lock);
if (do_flush && !dr_unlink_flush_region(NULL, ~0UL))
dr_mutex_unlock(schedule_tracing_lock);

if (do_flush) {
if (!dr_flush_region_ex(NULL, ~0UL, change_instrumentation_callback,
NULL /*user_data*/))
DR_ASSERT(false);

dr_mcontext_t mcontext;
mcontext.size = sizeof(mcontext);
mcontext.flags = DR_MC_ALL;
dr_get_mcontext(dr_get_current_drcontext(), &mcontext);
mcontext.pc = next_pc;
dr_redirect_execution(&mcontext);
DR_ASSERT(false);
}
}

#ifndef DELAYED_CHECK_INLINED
static void
check_instr_count_threshold(uint incby)
check_instr_count_threshold(uint incby, app_pc next_pc)
{
instr_count += incby;
if (instr_count > op_trace_after_instrs.get_value())
hit_instr_count_threshold();
hit_instr_count_threshold(next_pc);
}
#endif

Expand Down Expand Up @@ -1450,7 +1469,8 @@ event_delay_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t
}
MINSERT(bb, instr, INSTR_CREATE_jcc(drcontext, OP_jl, opnd_create_instr(skip_call)));
dr_insert_clean_call(drcontext, bb, instr, (void *)hit_instr_count_threshold,
false /*fpstate */, 0);
false /*fpstate */, 1,
OPND_CREATE_INTPTR((ptr_uint_t)instr_get_app_pc(instr)));
MINSERT(bb, instr, skip_call);
if (scratch != DR_REG_NULL) {
if (drreg_unreserve_register(drcontext, bb, instr, scratch) != DRREG_SUCCESS)
Expand All @@ -1463,7 +1483,8 @@ event_delay_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t
// XXX: drx_insert_counter_update doesn't support 64-bit, and there's no
// XINST_CREATE_load_8bytes. For now we pay the cost of a clean call every time.
dr_insert_clean_call(drcontext, bb, instr, (void *)check_instr_count_threshold,
false /*fpstate */, 1, OPND_CREATE_INT32(num_instrs));
false /*fpstate */, 2, OPND_CREATE_INT32(num_instrs),
OPND_CREATE_INTPTR((ptr_uint_t)instr_get_app_pc(instr)));
#endif
return DR_EMIT_DEFAULT;
}
Expand Down
4 changes: 3 additions & 1 deletion core/dispatch.c
Original file line number Diff line number Diff line change
Expand Up @@ -902,7 +902,9 @@ dispatch_enter_dynamorio(dcontext_t *dcontext)
app_pc begin = (app_pc)dcontext->local_state->spill_space.r2;
app_pc end = (app_pc)dcontext->local_state->spill_space.r3;
dcontext->next_tag = (app_pc)dcontext->local_state->spill_space.r4;
flush_fragments_from_region(dcontext, begin, end - begin, true);
flush_fragments_from_region(dcontext, begin, end - begin, true,
NULL /*flush_completion_callback*/,
NULL /*user_data*/);
}
#endif

Expand Down
19 changes: 14 additions & 5 deletions core/fragment.c
Original file line number Diff line number Diff line change
Expand Up @@ -5654,13 +5654,15 @@ process_client_flush_requests(dcontext_t *dcontext, dcontext_t *alloc_dcontext,
/* FIXME - for implementation simplicity we do a synch-all flush so
* that we can inform the client right away, it might be nice to use
* the more performant regular flush when possible. */
flush_fragments_from_region(dcontext, iter->start, iter->size,
true /*force synchall*/);
flush_fragments_from_region(
dcontext, iter->start, iter->size, true /*force synchall*/,
NULL /*flush_completion_callback*/, NULL /*user_data*/);
(*iter->flush_callback)(iter->flush_id);
} else {
/* do a regular flush */
flush_fragments_from_region(dcontext, iter->start, iter->size,
false /*don't force synchall*/);
flush_fragments_from_region(
dcontext, iter->start, iter->size, false /*don't force synchall*/,
NULL /*flush_completion_callback*/, NULL /*user_data*/);
}
}
HEAP_TYPE_FREE(alloc_dcontext, iter, client_flush_req_t, ACCT_CLIENT,
Expand Down Expand Up @@ -6877,10 +6879,13 @@ flush_fragments_and_remove_region(dcontext_t *dcontext, app_pc base, size_t size

/* Flushes fragments from the region without any changes to the exec list.
* Does not free futures and caller can't be holding the initexit lock.
* Invokes the given callback after flushing and before resuming threads.
* FIXME - add argument parameters (free futures etc.) as needed. */
void
flush_fragments_from_region(dcontext_t *dcontext, app_pc base, size_t size,
bool force_synchall)
bool force_synchall,
void (*flush_completion_callback)(void *user_data),
void *user_data)
{
/* we pass false to flush_fragments_in_region_start() below for owning the initexit
* lock */
Expand All @@ -6891,6 +6896,10 @@ flush_fragments_from_region(dcontext_t *dcontext, app_pc base, size_t size,
flush_fragments_in_region_start(dcontext, base, size, false /*don't own initexit*/,
false /*don't free futures*/, false /*exec valid*/,
force_synchall _IF_DGCDIAG(NULL));
if (flush_completion_callback != NULL) {
(*flush_completion_callback)(user_data);
}

flush_fragments_in_region_finish(dcontext, false);
}

Expand Down
4 changes: 3 additions & 1 deletion core/fragment.h
Original file line number Diff line number Diff line change
Expand Up @@ -1141,7 +1141,9 @@ flush_fragments_and_remove_region(dcontext_t *dcontext, app_pc base, size_t size

void
flush_fragments_from_region(dcontext_t *dcontext, app_pc base, size_t size,
bool force_synchall);
bool force_synchall,
void (*flush_completion_callback)(void *user_data),
void *user_data);

void
flush_fragments_custom_list(dcontext_t *dcontext, fragment_t *list,
Expand Down
31 changes: 24 additions & 7 deletions core/lib/instrument.c
Original file line number Diff line number Diff line change
Expand Up @@ -7171,10 +7171,12 @@ DR_API
/* Flush all fragments that contain code from the region [start, start+size).
* Uses a synchall flush to guarantee that no execution occurs out of the fragments
* flushed once this returns. Requires caller to be holding no locks (dr or client) and
* to be !couldbelinking (xref PR 199115, 227619). Caller must use
* to be !couldbelinking (xref PR 199115, 227619). Invokes the given callback after the
* flush completes and before threads are resumed. Caller must use
* dr_redirect_execution() to return to the cache. */
bool
dr_flush_region(app_pc start, size_t size)
dr_flush_region_ex(app_pc start, size_t size,
void (*flush_completion_callback)(void *user_data), void *user_data)
{
dcontext_t *dcontext = get_thread_private_dcontext();
CLIENT_ASSERT(!standalone_library, "API not supported in standalone mode");
Expand All @@ -7198,20 +7200,34 @@ dr_flush_region(app_pc start, size_t size)
"dr_flush_region: caller owns a client "
"lock or was called from an event callback that doesn't support "
"calling this routine; see header file for restrictions.");
CLIENT_ASSERT(size != 0, "dr_flush_region: 0 is invalid size for flush");
CLIENT_ASSERT(size != 0, "dr_flush_region_ex: 0 is invalid size for flush");

/* release build check of requirements, as many as possible at least */
if (size == 0 || is_couldbelinking(dcontext))
if (size == 0 || is_couldbelinking(dcontext)) {
(*flush_completion_callback)(user_data);
return false;
}

if (!executable_vm_area_executed_from(start, start + size))
if (!executable_vm_area_executed_from(start, start + size)) {
(*flush_completion_callback)(user_data);
return true;
}

flush_fragments_from_region(dcontext, start, size, true /*force synchall*/);
flush_fragments_from_region(dcontext, start, size, true /*force synchall*/,
flush_completion_callback, user_data);

return true;
}

DR_API
/* Equivalent to dr_flush_region_ex, without the callback. */
bool
dr_flush_region(app_pc start, size_t size)
{
return dr_flush_region_ex(start, size, NULL /*flush_completion_callback*/,
NULL /*user_data*/);
}

DR_API
/* Flush all fragments that contain code from the region [start, start+size).
* Uses an unlink flush which guarantees that no thread will enter a fragment that was
Expand Down Expand Up @@ -7259,7 +7275,8 @@ dr_unlink_flush_region(app_pc start, size_t size)
if (!executable_vm_area_executed_from(start, start + size))
return true;

flush_fragments_from_region(dcontext, start, size, false /*don't force synchall*/);
flush_fragments_from_region(dcontext, start, size, false /*don't force synchall*/,
NULL /*flush_completion_callback*/, NULL /*user_data*/);

return true;
}
Expand Down
9 changes: 9 additions & 0 deletions core/lib/instrument_api.h
Original file line number Diff line number Diff line change
Expand Up @@ -6228,10 +6228,19 @@ DR_API
* \note Use \p size == 1 to flush fragments containing the instruction at address
* \p start. A flush of \p size == 0 is not allowed.
*
* \note Use flush_completion_callback to specify logic to be executed after the flush
* and before the threads are resumed. Use NULL if not needed.
*
* \note As currently implemented, dr_delay_flush_region() with no completion callback
* routine specified can be substantially more performant.
*/
bool
dr_flush_region_ex(app_pc start, size_t size,
void (*flush_completion_callback)(void *user_data), void *user_data);

DR_API
/** Equivalent to dr_flush_region_ex(start, size, NULL). */
bool
dr_flush_region(app_pc start, size_t size);

/* FIXME - get rid of the no locks requirement by making event callbacks !couldbelinking
Expand Down
4 changes: 3 additions & 1 deletion core/unix/os.c
Original file line number Diff line number Diff line change
Expand Up @@ -7655,7 +7655,9 @@ pre_system_call(dcontext_t *dcontext)
* use synch to ensure other threads see the
* new code.
*/
false /*don't force synchall*/);
false /*don't force synchall*/,
NULL /*flush_completion_callback*/,
NULL /*user_data*/);
break;
}
# endif /* ARM */
Expand Down
Loading

0 comments on commit 893c06c

Please sign in to comment.