-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add read/write instrumentation with LTTng #898
Add read/write instrumentation with LTTng #898
Conversation
d1073ae
to
3037262
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @christophebedard thanks for contributing this first attempt at introducing proper tracing into the Cyclone code base. I have a problem with basing everything exclusively on LTTng, as I really want to (also) be able to use dtrace
on macOS, for example.
That means probe definitions need to be a bit differently. This in itself is going to be tricky, so I would really like to see a PoC for first, or at least a sketch of one 😁.
Based on what you noted in ros2/rmw_cyclonedds#324 (comment) concerning semaphores, an argument could be made that LTTng tracepoints would necessarily end up in less interesting places and fewer in number and that consequently it wouldn't be worth the effort. In that case, arguably, LTTng tracepoints may not be a good choice for Cyclone to begin with.
That restriction does strike me as odd, as I haven't seen it in the LTTng documentation and instead found a remark that it uses a lock-free procedure for adding to the trace. I don't see how the two fit together.
Some of the comments are perhaps a bit stream-of-consciousness ... my apologies for that.
|
||
#ifdef DDS_HAS_LTTNG_TRACING | ||
dds_guid_t guid; | ||
(void)dds_get_guid(reader, &guid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dds_get_guid
can fail if there is a concurrent call attempting to delete the new reader. Worse, the reader may have been deleted at this point, because dds_entity_init_complete
on line 607 makes the new reader accessible via its handle and unpins it as well. That is way too early (I am sure there is a reason, so I need to look into it before making any changes).
However, if you were to try calling dds_get_guid
prior to dds_entity_init_complete
it would fail because it would still be "pending" ... It probably makes sense to introduce a function that takes an entity pointer and returns the corresponding dds_guid_t
(analogous to the inner workings of dds_get_guid
) and then refactor dds_get_guid
to use that function.
I think it would be better to trace not (just) the address of the reader (rd
) but (also) its handle (reader
), because the handle is what is used consistently in the API. Tracing the address is primarily interesting for the purpose of analyzing Cyclone itself.
From a ROS 2 perspective, I can see that tracing just the topic name would be sufficient, but for more general DDS usage, the partition names might be rather useful, too. Based on the QoS matching rules one could argue that all QoS are important, but I don't know of anyone who deliberately relies on incompatible QoS's in a system, and I don't think this type of instrumentation is appropriate for debugging incorrect QoS.
For analyzing the reader side of Cyclone itself, having rd->m_rd
and rd->m_rhc
available would also be useful, as those are involved in handling data at a lower level in the stack (esp. the latter would be useful, it'd allow you to trace data going into and coming out of the reader history cache).
However, adding all that information to the trace brings it perilously close to the problems of the current tracing to a text file. There is always something else to add, and in the end you are tracing far too much, most of it completely useless for the problem at hand. This is incidentally one reason why I very much prefer dtrace/bpftrace-style tooling: then a script can extract all the necessary information from just the reader pointer.
|
||
#ifdef DDS_HAS_LTTNG_TRACING | ||
dds_guid_t guid; | ||
(void)dds_get_guid(writer, &guid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment at dds_create_reader
: here the call to dds_entity_init_complete
wasn't done too early, but still there is no guarantee that the writer
is still accessible nor that wr
is still a valid pointer.
(There is also the possibility that in between line 448 (perhaps 452 because deleting a writer does require locking the publisher for removing it from its table of children) and this tracepoint, the writer is deleted and a new one is created that just happens to be at the same address. If you're really out of luck you might end up with an ambiguous trace ...)
@@ -333,6 +334,8 @@ dds_return_t dds_write_impl (dds_writer *wr, const void * data, dds_time_t tstam | |||
if (data == NULL) | |||
return DDS_RETCODE_BAD_PARAMETER; | |||
|
|||
TRACEPOINT(write, (const void *)wr, data, tstamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd trace the writer's handle, not its address (see comment at dds_create_reader
).
There's also dds_writecdr_impl
which is used by rmw_publish_serialized_message
.
@@ -128,6 +129,7 @@ static dds_return_t dds_read_impl (bool take, dds_entity_t reader_or_condition, | |||
} | |||
dds_entity_unpin (entity); | |||
thread_state_asleep (ts1); | |||
TRACEPOINT(read, (const void *)rd, (const void *)*buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd probably want to instrument dds_readcdr_impl
as well, it is used by rmw_take_serialized_message
.
* SPDX-License-Identifier: EPL-2.0 OR BSD-3-Clause | ||
*/ | ||
|
||
// Provide fake header guard for cpplint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, read enough of the docs and header files of LTTng to understand this comment. It would probably be better to explain that LTTng tracepoints expect multiple inclusion of this file and point to https://lttng.org/docs/v2.13/#doc-tpp-header for a bit of a backgrounder.
I also think it would make sense to not disable LLTng API version-0 compatibility and use the LTTNG_UST_
prefixed names instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, read enough of the docs and header files of LTTng to understand this comment. It would probably be better to explain that LTTng tracepoints expect multiple inclusion of this file and point to lttng.org/docs/v2.13/#doc-tpp-header for a bit of a backgrounder.
I don't think Cyclone DDS uses cpplint, so it might not be needed here if linters don't complain. I'll look into it and remove this part if it's not needed.
I also think it would make sense to not disable LLTng API version-0 compatibility and use the
LTTNG_UST_
prefixed names instead.
I think it's the other way around, no? Version 0 doesn't use those prefixes. Besides, Ubuntu 20.04 is on LTTng 2.11.
#include <stdint.h> | ||
#include <stdbool.h> | ||
|
||
#define DDS_GID_STORAGE_SIZE 16u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be possible to include dds/dds.h
and use dds_guid_t
and its sizeof
directly, rather than defining its size here and passing it as an array of uint8_t
? Not that its definition will change 🙂
The other thing is that it would definitely by better to consistently use GUID
rather than introduce GID
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be possible to include
dds/dds.h
and usedds_guid_t
and itssizeof
directly, rather than defining its size here and passing it as an array ofuint8_t
? Not that its definition will change
Sure I'll try.
The other thing is that it would definitely by better to consistently use
GUID
rather than introduceGID
here.
Sure I'll change it. Honestly I couldn't figure out which term was the correct one. It seems like there are many different terms for this 😅
@@ -645,6 +646,12 @@ static dds_entity_t dds_create_reader_int (dds_entity_t participant_or_subscribe | |||
dds_topic_allow_set_qos (tp); | |||
dds_topic_unpin (tp); | |||
dds_subscriber_unlock (sub); | |||
|
|||
#ifdef DDS_HAS_LTTNG_TRACING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to make it in, we'll have to find a better way to specify trace points and to hide all the conditional stuff inside the TRACEPOINT
macro (or whatever it ends being).
Looking at USDT as well, I think an attempt at supporting both is worth a try. For example, the TRACEPOINT
use here could be mapped to DTRACE_PROBE3(TRACEPOINT_PROVIDER, (const void *)rd, rd->m_topic->m_name, guid.v)
. It may not always be this straightforward, of course, and no doubt there'll be cases where one makes sense and the other doesn't.
As usual, Sun did a rather nice job, and as a consequence dtrace
has separate probe definitions from which you generate C (and you get type information via CTF
that is completely unrelated to that abbreviation's use by LTTng
). That does mean the probe definitions currently in tracing_lttng.h
will need equivalent tracepoints defined in .d
files.
Getting all that to play nicely and without a lot of duplication is going to take some work. Requiring a work of art at this stage is, I think, detrimental to progress, but a proof-of-concept (even a fragile one) supporting both with a single source and some preprocessing (in Python/Perl/Haskell/CLang-derived compiler/whatever) and with the ifdef
s pushed into header files is something I would really like to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely can't promise I'll do all of that, but I can try to get a proof of concept for LTTng + "native" probes support at some point.
@eboasson sorry for the late reply!
I think you can use LTTng UST on macOS, so you can build the SDT probes with LTTng and you can use dtrace. Limitations aside, do you just want to avoid having to use LTTng?
I can try something.
I'm not sure I understand. The restriction is for SDT probes only. Why would LTTng tracepoint necessarily end up in less interesting places?
AFAIU LTTng doesn't support SDT probe arguments simply because it's not currently implemented. It might be the case for semaphore support as well; I'm not sure. |
This restores the directories used before the MinGW support was added. For some reason, in the CI the VS and Xcode builds do store the output in directories not suffixed with Release or Debug, but this is not usually the case. As a consequence, on a typical build with those build systems, this test failed. Signed-off-by: Erik Boasson <eb@ilities.com>
This fixes incompatibilities with bison 3.8: * returning user-defined values from yyparse is not supported, this works aruond that by storing a more meaningful return value in the parser state object; * the token name table generated by %token-table is deprecated and has been for quite some time, and it seems that now the accompanying mapping to internal token numbers is gone as well, this uses a hack to work around that. A follow-up fix is needed to remove the use of the token name table entirely. The hack here is attractive only in that it provides a quick solution for the failing CI runs on macOS. Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
The analyzer doesn't do a good job if there are no SAL2 annotations, but those annotations are only supported by Visual Studio, the documentation is very incomplete and it is (nearly) impossible to get them right without actually compiling the code with Visual Studio. The latter is what it makes it really impossible to use them. That makes dealing with false positives from the Visual Studio analzyer far more trouble than it is worth (especially considering we're also using the clang and gcc analyzers and Coverity). Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Dennis Potman <dennis.potman@adlinktech.com>
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
The length returned by snprintf does not includes the ending null. So the test must be changed to ensure that if the final length if sadly 64, the library can be found all the time. Also add little comments. Signed-off-by: José Bollo <jose.bollo@iot.bzh>
A message is emitted to stderr when either the plugin can not be loaded or the symbol 'generate' can't be located in the loaded plugin. Logging to sdterr is in wait of some better integration to some other logging system. Signed-off-by: José Bollo <jose.bollo@iot.bzh>
Signed-off-by: Erik Boasson <eb@ilities.com>
- The parser was not proceeding correctly in the case of the state being IDL_SCAN_ANNOTATION_APPL_SCOPED_NAME Signed-off-by: Martijn Reicher <martijn.reicher@adlinktech.com>
- The parser was setting the node mask to IDL_ANY in stead of the correct mask for bool, char and string literals Signed-off-by: Martijn Reicher <martijn.reicher@adlinktech.com>
Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
Signed-off-by: Dennis Potman <dennis.potman@adlinktech.com>
Case label destructors deleted the const-expr as opposed to unreferencing it causing a use-after-free if the same enum was used in more than one union. Fixes eclipse-cyclonedds#946. Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
Destructors are not executed for untyped midrule actions because Bison does not know which destructor to run. annotation_appl nodes were not cleaned up if the annotation_appl could not be successfully finalized. Remove use of midrule actions in annotation_appl as some platforms still ship with Bison 3.0.4. Fixes eclipse-cyclonedds#950. Signed-off-by: Jeroen Koekkoek <jeroen@koekkoek.nl>
…g 'C' as a language platform-specific macros are irrelevant for idlc, and this is a blocker for compiling mac universal binaries.
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
There is a maximum nesting depth for the configuration that is determined by the internal configuration tables. Simply aborting the interpretation of the XML input on the first unrecognized element item would guarantee never exceeding the maximum nesting depth. As an error handling strategy this is less user-friendly than ignoring some of the input, attempting to interpret the remainder and reporting any further issues. This commit makes it abort the interpretation as soon as the maximum nesting depth is reached, in all other cases it continues and reports as many issues with the configuration as it can. Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
- This fixes eclipse-cyclonedds#968 - Add a byte swap to the participant GUID prefix Signed-off-by: Martijn Reicher <martijn.reicher@adlinktech.com>
CMAKE_SOURCE_DIR always points to the top-level CMake project, so if CycloneDDS was included as a gitmodule or pulled in using CMake FetchContent, the CMake step would fail with errors like ``` CMake Error at cyclonedds/src/tools/idlc/CMakeLists.txt:52 (file): file failed to open for reading (No such file or directory): /home/robin/dds-project/src/ddsrt/include/getopt.h.in CMake Error: File /home/robin/dds-project/src/ddsrt/src/getopt.c does not exist. CMake Error at cyclonedds/src/tools/idlc/CMakeLists.txt:61 (configure_file): configure_file Problem configuring file CMake Error at cyclonedds/src/tools/idlc/CMakeLists.txt:95 (include): include could not find load file: /home/robin/dds-project/cmake/Modules/Generate.cmake ``` This commit replaces CMAKE_SOURCE_DIR with <PROJECT_NAME>_SOURCE_DIR, a variable that always points to the source dir of the named project. See: https://cmake.org/cmake/help/v3.21/variable/CMAKE_SOURCE_DIR.html https://cmake.org/cmake/help/v3.21/variable/PROJECT-NAME_SOURCE_DIR.html Signed-off-by: Robin Lindén <_@robinlinden.eu>
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
gcc 11 warns that there is a conversion from 'long int' to 'uint32_t' when assigning the value of PTHREAD_STACK_MIN to tattr.stackSize. It's pedantically correct; tattr.stackSize is a uint32_t, and PTHREAD_STACK_MIN is either an integer (if it is hard-coded), or the result of calling sysconf(), which always returns a long. In either case it is a signed value, so we should explicitly cast it to uint32_t here. This quiets the warning. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
This makes idlc show an error message in case the maximum number of instructions is reached. Because of 16-bits offset fields in the instructions, the maximum number of instructions that can be used is limited to 65k. Signed-off-by: Dennis Potman <dennis.potman@adlinktech.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
This suppresses deprecation warnings from OpenSSL 3.0. It is not a proper fix, but the OpenSSL documentation on how to deal with the deprecations is such that it will take some time to fix this. Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Christophe Bedard <bedard.christophe@gmail.com>
c2be68f
to
21d447c
Compare
While I would actually like to merge this PR it's been left too long and has become unreviewable with the many conflicts and highly diverged branches. @christophebedard would you like to rebase your changes onto the current master or should I close this and defer LTTng to the TODO backlog? |
Thanks for the follow up. We can close this for now. |
This adds tracing instrumentation using LTTng for:
When a writer or reader entity is created, we note:
When we write or read, we note:
This allows us to:
Also, while it has to be separate from the ROS 2 instrumentation, these tracepoints can easily be enabled by the tracing-related launch/CLI utilities in https://gitlab.com/ros-tracing/ros2_tracing
See the discussion here: ros2/rmw_cyclonedds#324 (comment). As mentioned at the bottom of that PR, LTTng has only partial support for tracing SDT probes, but it can generate them alongside its own tracepoints if it is built from source and configured with
--with-sdt
.Note: this PR targets
master
, but I'd like to get it cherry-picked (if the PR is accepted) to theiceoryx
branch so that ROS 2 can use it (since it currently targets theiceoryx
branch).Signed-off-by: Christophe Bedard bedard.christophe@gmail.com