Perfmap C-API and JIT integration
Summary:
With the perf trampoline writing to the perf-map files, we want a C-API that unifies writes to those files and avoids corruption from simultaneous writes. We are trying to upstream the API in python/cpython#103546. More details about the motivation are in the PR.

In addition to introducing the C-API, we also change the JIT to use it.

Reviewed By: czardoz

Differential Revision: D45421966

fbshipit-source-id: d270cc753a245f93cbfe3d723d0880595fef45f2
Gamal Sallam authored and facebook-github-bot committed May 19, 2023
1 parent a88f497 commit c475629
Showing 12 changed files with 359 additions and 119 deletions.
3 changes: 3 additions & 0 deletions Cinder/module/known-core-python-exported-symbols
@@ -1985,6 +1985,9 @@ _PyUnicode_XStrip
_PyUnion_Type
_Py_union_type_or
Py_UniversalNewlineFgets
PyUnstable_PerfMapState_Fini
PyUnstable_PerfMapState_Init
PyUnstable_WritePerfMapEntry
_Py_UTF8_Edit_Cost
_Py_VaBuildStack
_Py_VaBuildStack_SizeT
39 changes: 39 additions & 0 deletions Doc/c-api/perfmaps.rst
@@ -0,0 +1,39 @@


.. highlight:: c

.. _perfmaps:

Support for Perf Maps
----------------------

On supported platforms (as of this writing, only Linux), the runtime can take
advantage of *perf map files* to make Python functions visible to an external
profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_).
A running process may create a file in the ``/tmp`` directory, which contains entries
that can map a section of executable code to a name. This interface is described in the
`documentation of the Linux Perf tool <https://git.kernel.org/pub/scm/linux/
kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt>`_.

In Python, these helper APIs can be used by libraries and features that rely
on generating machine code on the fly.

.. c:function:: int PyUnstable_PerfMapState_Init(void)

   Open the ``/tmp/perf-$pid.map`` file, unless it's already opened, and create
   a lock to ensure thread-safe writes to the file (provided the writes are
   done through :c:func:`PyUnstable_WritePerfMapEntry`). Normally, there's no need
   to call this explicitly, and it is safe to directly use :c:func:`PyUnstable_WritePerfMapEntry`
   in your code. If the state isn't already initialized, it will be created on
   the first call.

.. c:function:: int PyUnstable_WritePerfMapEntry(const void *code_addr, unsigned int code_size, const char *entry_name)

   Write a single entry to the ``/tmp/perf-$pid.map`` file. This function is
   thread safe. Here is what an example entry looks like::

      # address size name
      0x7f3529fcf759 b py::bar:/run/t.py

   Extensions are encouraged to directly call this API when needed, instead of
   separately initializing the state by calling :c:func:`PyUnstable_PerfMapState_Init`.

.. c:function:: void PyUnstable_PerfMapState_Fini(void)

   Close the perf map file, which was opened in :c:func:`PyUnstable_PerfMapState_Init`. This
   API is called by the runtime itself, during interpreter shut-down. In general,
   there shouldn't be a reason to explicitly call this, except to handle specific
   scenarios such as forking.
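The "initialize on first write, serialize every write behind one lock" behavior documented above can be sketched in Python. This illustrates the pattern only and is not the C implementation: the `PerfMapWriter` class, its method names, and the caller-supplied path are hypothetical. The line layout (hex address, hex size, name) is an assumption taken from the format string the JIT used before this change.

```python
import os
import tempfile
import threading


class PerfMapWriter:
    """Hypothetical Python analogue of the perf map state: a file plus a lock."""

    def __init__(self, path):
        self._path = path
        self._file = None
        self._lock = threading.Lock()

    def write_entry(self, code_addr, code_size, entry_name):
        # Hold the lock for the whole operation so that lazy initialization
        # and the write itself cannot interleave with another thread.
        with self._lock:
            if self._file is None:
                # First call: open the file, as an explicit "init" would.
                self._file = open(self._path, "a")
            # Assumed layout: hex address, hex size, name, one entry per line.
            self._file.write(f"{code_addr:x} {code_size:x} {entry_name}\n")
            self._file.flush()

    def fini(self):
        # Close the file if it was ever opened; safe to call more than once.
        with self._lock:
            if self._file is not None:
                self._file.close()
                self._file = None


def demo():
    """Write eight entries from eight threads and return the file's lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "perf-demo.map")
        writer = PerfMapWriter(path)
        threads = [
            threading.Thread(
                target=writer.write_entry, args=(0x1000 + i, 0x10, f"py::f{i}")
            )
            for i in range(8)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        writer.fini()
        with open(path) as f:
            return f.read().splitlines()
```

Because each entry is written and flushed while the lock is held, the eight lines may appear in any order but no line is split by another writer's output.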
3 changes: 2 additions & 1 deletion Doc/howto/perf_profiling.rst
@@ -23,7 +23,8 @@ functions to appear in the output of the ``perf`` profiler. When this mode is
enabled, the interpreter will interpose a small piece of code compiled on the
fly before the execution of every Python function and it will teach ``perf`` the
relationship between this piece of code and the associated Python function using
`perf map files`_.
`perf map files`_. If you're an extension author interested in having your extension
write to the perf map files, refer to :doc:`the C-API <../c-api/perfmaps>`.

.. warning::

2 changes: 1 addition & 1 deletion Include/internal/pycore_ceval.h
@@ -50,7 +50,7 @@ typedef struct {
void (*write_state)(void* state, const void *code_addr,
unsigned int code_size, PyCodeObject* code);
// Callback to free the trampoline state
int (*free_state)(void* state);
void (*free_state)(void);
} _PyPerf_Callbacks;

extern int _PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *);
13 changes: 13 additions & 0 deletions Include/osmodule.h
@@ -11,6 +11,19 @@ extern "C" {
PyAPI_FUNC(PyObject *) PyOS_FSPath(PyObject *path);
#endif

#if !defined(Py_LIMITED_API)
typedef struct {
    FILE* perf_map;
    PyThread_type_lock map_lock;
} PerfMapState;

PyAPI_FUNC(int) PyUnstable_PerfMapState_Init(void);

PyAPI_FUNC(int) PyUnstable_WritePerfMapEntry(const void *code_addr, unsigned int code_size, const char *entry_name);

PyAPI_FUNC(void) PyUnstable_PerfMapState_Fini(void);
#endif

#ifdef __cplusplus
}
#endif
188 changes: 154 additions & 34 deletions Jit/perf_jitdump.cpp
@@ -6,6 +6,7 @@
#include "Jit/pyjit.h"
#include "Jit/threaded_compile.h"
#include "Jit/util.h"
#include "pycore_ceval.h"

#include <elf.h>
#include <fcntl.h>
@@ -19,6 +20,10 @@
#include <cstdio>
#include <cstring>
#include <ctime>
#include <iostream>
#include <regex>
#include <sstream>
#include <tuple>

#ifdef __x86_64__
// Use the cheaper rdtsc by default. If you disable this for some reason, or
@@ -242,6 +247,40 @@ void initFiles() {
inited = true;
}

// Parses a JIT entry and returns a tuple containing the
// code address, code size, and entry name. An example of an entry is:
// 7fa873c00148 360 __CINDER_JIT:__main__:foo2
std::tuple<const void*, unsigned int, const char*> parseJitEntry(
    const char* entry) {
  std::string_view entry_view = entry;
  size_t space_pos_1 = entry_view.find(' ');

  // Extract the hexadecimal code address
  const char* code_addr_str = entry_view.substr(0, space_pos_1).data();
  unsigned long long code_addr_val = 0;
  std::from_chars(
      code_addr_str, code_addr_str + space_pos_1, code_addr_val, 16);
  const void* code_addr = reinterpret_cast<const void*>(code_addr_val);

  // Find the second space character
  size_t space_pos_2 = entry_view.find(' ', space_pos_1 + 1);

  // Extract the hexadecimal code size
  const char* code_size_str =
      entry_view.substr(space_pos_1 + 1, space_pos_2).data();
  uint32_t code_size;
  std::from_chars(
      code_size_str,
      code_size_str + (space_pos_2 - space_pos_1 - 1),
      code_size,
      16);

  // Extract the entry name
  const char* entry_name = entry_view.substr(space_pos_2 + 1).data();

  return std::make_tuple(code_addr, code_size, entry_name);
}
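`parseJitEntry` above does not check whether either space was found, nor does it inspect the result of `std::from_chars`. As a point of comparison, here is a small Python sketch of the same split-and-parse step that rejects malformed lines. The function name `parse_perf_map_entry` and the choice to raise `ValueError` are assumptions made for illustration, not part of this commit.

```python
def parse_perf_map_entry(line):
    """Split '<hex addr> <hex size> <name>' into (int, int, str).

    Hypothetical helper: raises ValueError if a field is missing or if the
    address or size is not a valid base-16 integer.
    """
    # Split on the first two spaces only, so the name may itself contain spaces.
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) != 3 or not parts[2]:
        raise ValueError(f"malformed perf map entry: {line!r}")
    addr_text, size_text, name = parts
    # int(text, 16) raises ValueError when the text is not a base-16 integer.
    return int(addr_text, 16), int(size_text, 16), name


def try_parse(line):
    """Return the parsed tuple, or None when the line is malformed."""
    try:
        return parse_perf_map_entry(line)
    except ValueError:
        return None
```

With a checked parser like this, a caller can skip and log a bad line instead of forwarding an unparsed address or size to the writer.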

// Copy the contents of from_name to to_name. Returns a std::FILE* at the end
// of to_name on success, or nullptr on failure.
std::FILE* copyFile(const std::string& from_name, const std::string& to_name) {
@@ -277,6 +316,74 @@ std::FILE* copyFile(const std::string& from_name, const std::string& to_name) {
}
}

// Copy the contents of the parent perf map file to the child perf map file.
// Returns 1 on success and 0 on failure.
int copyJitFile(const std::string& parent_filename) {
  auto parent_file = std::fopen(parent_filename.c_str(), "r");
  if (parent_file == nullptr) {
    JIT_LOG(
        "Couldn't open %s for reading (%s)",
        parent_filename,
        string_error(errno));
    return 0;
  }

  char buf[1024];
  while (std::fgets(buf, sizeof(buf), parent_file) != nullptr) {
    buf[strcspn(buf, "\n")] = '\0';
    auto jit_entry = parseJitEntry(buf);
    try {
      PyUnstable_WritePerfMapEntry(
          std::get<0>(jit_entry),
          std::get<1>(jit_entry),
          std::get<2>(jit_entry));
    } catch (const std::invalid_argument& e) {
      JIT_LOG("Error: Invalid JIT entry: %s \n", buf);
    }
  }
  std::fclose(parent_file);
  return 1;
}

// Copy the JIT entries from the parent perf map file to the child perf map
// file. This is used when perf-trampoline is enabled, as the perf map file
// will also include trampoline entries. We only want to copy the JIT entries.
// Returns 1 on success, and 0 on failure.
int copyJitEntries(const std::string& parent_filename) {
  auto parent_file = std::fopen(parent_filename.c_str(), "r");
  if (parent_file == nullptr) {
    JIT_LOG(
        "Couldn't open %s for reading (%s)",
        parent_filename,
        string_error(errno));
    return 0;
  }

  char buf[1024];
  while (std::fgets(buf, sizeof(buf), parent_file) != nullptr) {
    if (std::strstr(buf, "__CINDER_") != nullptr) {
      buf[strcspn(buf, "\n")] = '\0';
      auto jit_entry = parseJitEntry(buf);
      try {
        PyUnstable_WritePerfMapEntry(
            std::get<0>(jit_entry),
            std::get<1>(jit_entry),
            std::get<2>(jit_entry));
      } catch (const std::invalid_argument& e) {
        JIT_LOG("Error: Invalid JIT entry: %s \n", buf);
      }
    }
  }
  std::fclose(parent_file);
  return 1;
}
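The filtering step in `copyJitEntries` keeps only lines containing the `__CINDER_` marker, so that trampoline entries in the parent's file are not copied into the child's. A minimal Python sketch of that selection follows. The helper name `select_jit_lines` is hypothetical, and the second sample line (taken from the documentation example) merely stands in for a non-JIT entry.

```python
import io


def select_jit_lines(lines, marker="__CINDER_"):
    """Yield newline-stripped lines that contain the JIT marker.

    Hypothetical helper mirroring the substring check in copyJitEntries.
    """
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


# A parent perf map holding one JIT entry and one non-JIT entry.
parent_map = io.StringIO(
    "7fa873c00148 360 __CINDER_JIT:__main__:foo2\n"
    "7f3529fcf759 b py::bar:/run/t.py\n"
)
jit_only = list(select_jit_lines(parent_map))
```

Only the first line survives the filter. In the C++ code each surviving line is then parsed and rewritten through `PyUnstable_WritePerfMapEntry`.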

bool isPerfTrampolineActive() {
  PyThreadState* tstate = PyThreadState_GET();
  return tstate->interp->eval_frame &&
      tstate->interp->eval_frame != _PyEval_EvalFrameDefault;
}

// Copy the perf pid map from the parent process into a new file for this child
// process.
void copyFileInfo(FileInfo& info) {
@@ -290,33 +397,53 @@ void copyFileInfo(FileInfo& info) {
fmt::format(fmt::runtime(info.filename_format), getpid());
info = {};

unlink(child_filename.c_str());

if (_PyJIT_IsEnabled()) {
// The JIT is still enabled: copy the file to allow for more compilation in
// this process.
if (auto new_pid_map = copyFile(parent_filename, child_filename)) {
info.filename = child_filename;
info.file = new_pid_map;
if (parent_filename.starts_with("/tmp/perf-") &&
parent_filename.ends_with(".map") && isPerfTrampolineActive()) {
if (!copyJitEntries(parent_filename)) {
JIT_LOG(
"Failed to copy JIT entries from %s to %s",
parent_filename,
child_filename);
}
} else {
// The JIT has been disabled: hard link the file to save disk space. Don't
// open it in this process, to avoid messing with the parent's file.
if (::link(parent_filename.c_str(), child_filename.c_str()) != 0) {
} else if (
parent_filename.starts_with("/tmp/perf-") &&
parent_filename.ends_with(".map") && _PyJIT_IsEnabled()) {
// The JIT is still enabled: copy the file to allow for more compilation
// in this process.
if (!copyJitFile(parent_filename)) {
JIT_LOG(
"Failed to link %s to %s: %s",
child_filename,
"Failed to copy perf map file from %s to %s",
parent_filename,
string_error(errno));
child_filename);
}
} else {
unlink(child_filename.c_str());
if (_PyJIT_IsEnabled()) {
// The JIT is still enabled: copy the file to allow for more compilation
// in this process.
if (auto new_pid_map = copyFile(parent_filename, child_filename)) {
info.filename = child_filename;
info.file = new_pid_map;
}
} else {
// Poke the file's atime to keep tmpwatch at bay.
std::FILE* file = std::fopen(parent_filename.c_str(), "r");
if (file != nullptr) {
std::fclose(file);
// The JIT has been disabled: hard link the file to save disk space. Don't
// open it in this process, to avoid messing with the parent's file.
if (::link(parent_filename.c_str(), child_filename.c_str()) != 0) {
JIT_LOG(
"Failed to link %s to %s: %s",
child_filename,
parent_filename,
string_error(errno));
} else {
// Poke the file's atime to keep tmpwatch at bay.
std::FILE* file = std::fopen(parent_filename.c_str(), "r");
if (file != nullptr) {
std::fclose(file);
}
}
info.file = nullptr;
info.filename = "";
}
info.file = nullptr;
info.filename = "";
}
}
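After this change, `copyFileInfo` chooses between four behaviors in a forked child. The decision can be summarized as a pure function. This is a reading of the branches above rather than code from the commit: the function name and the returned labels are hypothetical.

```python
def choose_fork_strategy(parent_filename, trampoline_active, jit_enabled):
    """Return a label for what the child process does with the parent's file.

    Hypothetical summary of the branch order in copyFileInfo.
    """
    is_perf_map = parent_filename.startswith("/tmp/perf-") and parent_filename.endswith(
        ".map"
    )
    if is_perf_map and trampoline_active:
        # Perf map shared with the trampoline: re-write only the JIT entries.
        return "copy_jit_entries"
    if is_perf_map and jit_enabled:
        # Perf map owned by the JIT alone: re-write every entry via the C-API.
        return "copy_all_entries"
    if jit_enabled:
        # Any other file (or no perf-map match): plain file copy, as before.
        return "copy_file"
    # JIT disabled: hard link the parent's file to save disk space.
    return "hard_link"
```

Note the ordering: the trampoline check comes first, so when the trampoline is active only the JIT entries are copied, whether or not the JIT is still enabled.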

@@ -353,19 +480,12 @@ void registerFunction(

initFiles();

if (auto file = g_pid_map.file) {
for (auto& section_and_size : code_sections) {
void* code = section_and_size.first;
std::size_t size = section_and_size.second;
fmt::print(
file,
"{:x} {:x} {}:{}\n",
reinterpret_cast<uintptr_t>(code),
size,
prefix,
name);
std::fflush(file);
}
for (auto& section_and_size : code_sections) {
void* code = section_and_size.first;
std::size_t size = section_and_size.second;
auto jit_entry = prefix + ":" + name;
PyUnstable_WritePerfMapEntry(
static_cast<const void*>(code), size, jit_entry.c_str());
}

if (auto file = g_jitdump_file.file) {
19 changes: 19 additions & 0 deletions Lib/test/test_perfmaps.py
@@ -0,0 +1,19 @@
import os
import sys
import unittest

from _testinternalcapi import perf_map_state_teardown, write_perf_map_entry

if sys.platform != 'linux':
    raise unittest.SkipTest('Linux only')


class TestPerfMapWriting(unittest.TestCase):
    def test_write_perf_map_entry(self):
        self.assertEqual(write_perf_map_entry(0x1234, 5678, "entry1"), 0)
        self.assertEqual(write_perf_map_entry(0x2345, 6789, "entry2"), 0)
        with open(f"/tmp/perf-{os.getpid()}.map") as f:
            perf_file_contents = f.read()
            self.assertIn("1234 162e entry1", perf_file_contents)
            self.assertIn("2345 1a85 entry2", perf_file_contents)
        perf_map_state_teardown()
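The expected substrings in the test are the hexadecimal renderings of its arguments: the sizes 5678 and 6789 are passed in decimal and appear as `162e` and `1a85`. The sketch below reproduces those strings, assuming each field is written as lowercase hex without a prefix. The helper name `format_perf_map_entry` is hypothetical.

```python
def format_perf_map_entry(code_addr, code_size, entry_name):
    """Render one perf map line body as '<hex addr> <hex size> <name>'.

    Hypothetical helper; assumes lowercase hex with no '0x' prefix.
    """
    return f"{code_addr:x} {code_size:x} {entry_name}"


# Same arguments as the two calls in the test.
first = format_perf_map_entry(0x1234, 5678, "entry1")
second = format_perf_map_entry(0x2345, 6789, "entry2")
```

Here `first` is `"1234 162e entry1"` and `second` is `"2345 1a85 entry2"`, matching the strings passed to `assertIn`.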
@@ -0,0 +1,5 @@
Introduced :c:func:`PyUnstable_WritePerfMapEntry`, :c:func:`PyUnstable_PerfMapState_Init` and
:c:func:`PyUnstable_PerfMapState_Fini`. These allow extension modules (JIT compilers in
particular) to write to perf-map files in a thread-safe manner. The
:doc:`../howto/perf_profiling` also uses these APIs to write
entries in the perf-map file.
26 changes: 26 additions & 0 deletions Modules/_testinternalcapi.c
@@ -410,6 +410,30 @@ test_gc_visit_objects(PyObject *Py_UNUSED(self), PyObject *Py_UNUSED(ignored)) {
Py_RETURN_NONE;
}

static PyObject *
write_perf_map_entry(PyObject *self, PyObject *args)
{
    const void *code_addr;
    unsigned int code_size;
    const char *entry_name;

    if (!PyArg_ParseTuple(args, "KIs", &code_addr, &code_size, &entry_name))
        return NULL;

    int ret = PyUnstable_WritePerfMapEntry(code_addr, code_size, entry_name);
    if (ret == -1) {
        PyErr_SetString(PyExc_OSError, "Failed to write performance map entry");
        return NULL;
    }
    return Py_BuildValue("i", ret);
}

static PyObject *
perf_map_state_teardown(PyObject *Py_UNUSED(self), PyObject *Py_UNUSED(ignored))
{
    PyUnstable_PerfMapState_Fini();
    Py_RETURN_NONE;
}

// These are used in native calling tests, ensure the compiler
// doesn't hide or remove these symbols
@@ -438,6 +462,8 @@ static PyMethodDef TestMethods[] = {
{"test_atomic_funcs", test_atomic_funcs, METH_NOARGS},
{"test_edit_cost", test_edit_cost, METH_NOARGS},
{"test_gc_visit_objects", test_gc_visit_objects, METH_NOARGS},
{"write_perf_map_entry", write_perf_map_entry, METH_VARARGS},
{"perf_map_state_teardown", perf_map_state_teardown, METH_NOARGS},
{NULL, NULL} /* sentinel */
};
