Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix factory map leak (#131) #179

Closed
wants to merge 1 commit into from

Conversation

y-okumura-isp
Copy link

Fixes #131.

Sorry that my message is very long.
And as we dont't know the historical reasons well, I'm sorry if there are mistakes.

where leak happens

It looks 'new_factory' is not deleted when factoryMap is deleted.

// class_loader_core.hpp
template<typename Derived, typename Base>
void registerPlugin(const std::string & class_name, const std::string & base_class_name)
{
  // this "new_factory" is not deleted
  impl::AbstractMetaObject<Base> * new_factory =
    new impl::MetaObject<Derived, Base>(class_name, base_class_name);

  FactoryMap & factoryMap = getFactoryMapForBaseClass<Base>();
  factoryMap[class_name] = new_factory;
}

Additionally, factoryMap is finally a typedef of std::map and is stored in static BaseToFactoryMapMap instance which is also std::map.

// class_loader_core.hpp
typedef std::map<ClassName, impl::AbstractMetaObjectBase *> FactoryMap;
typedef std::map<BaseClassName, FactoryMap> BaseToFactoryMapMap;



// // class_loader_core.cpp
BaseToFactoryMapMap & getGlobalPluginBaseToFactoryMapMap()
{
  static BaseToFactoryMapMap instance;
  return instance;
}

FactoryMap & getFactoryMapForBaseClass(const std::string & typeid_base_class_name)
{
  BaseToFactoryMapMap & factoryMapMap = getGlobalPluginBaseToFactoryMapMap();
  std::string base_class_name = typeid_base_class_name;
  if (factoryMapMap.find(base_class_name) == factoryMapMap.end()) {
    factoryMapMap[base_class_name] = FactoryMap();
  }

  return factoryMapMap[base_class_name];
}

It looks new_factory is not deleted when instance is out of scope.
We tried to use a custom deleter(please see our draft PR), then we got deleting object of polymorphic class type ‘class_loader::impl::AbstractMetaObjectBase’ which has non-virtual destructor might cause undefined behavior warning. This may be because the AbstractMetaObjectBase destructor is not virtual.
We found "virtual" was kept in PR53, but we didn't catch well why this should not be virtual.

About AbstractMetaObjectBase non-virtual destructor

new_factory above is a variable of impl::AbstractMetaObject, which has vtable but has non-virtual distructor.
This looks to invoke undefined behavior according to https://en.cppreference.com/w/cpp/language/destructor "Virtual destructors".
And this is what the compiler warning says.

Here is the AbstractMetaObjectsBase code:

// meta_object.hpp
class CLASS_LOADER_PUBLIC AbstractMetaObjectBase
{
public:
  /**
   * @brief Constructor for the class
   */
  AbstractMetaObjectBase(
    const std::string & class_name,
    const std::string & base_class_name,
    const std::string & typeid_base_class_name = "UNSET");
  /**
   * @brief Destructor for the class. THIS MUST NOT BE VIRTUAL AND OVERRIDDEN BY
   * TEMPLATE SUBCLASSES, OTHERWISE THEY WILL PULL IN A REDUNDANT METAOBJECT
   * DESTRUCTOR OUTSIDE OF libclass_loader WITHIN THE PLUGIN LIBRARY! T
   */
  ~AbstractMetaObjectBase();

protected:
  /**
   * This is needed to make base class polymorphic (i.e. have a vtable)
   */
  virtual void dummyMethod() {}

  AbstractMetaObjectBaseImpl * impl_;
}

In the destructor comment, we can see THIS MUST NOT BE VIRTUAL AND OVERIDDEN BY TEMPLATE SUBCLASSES.

We found this comment was added at da86427.
The commit message is Fixed bug with redundant destructor definition being pulled into plugin library for metaobjects instead of being contained with libclass_loader.so.
Ths problem looks MetaObject destructor is placed in both libsome_plugin.so and libclass_loader.so.

As there are three related classes and two runtime libraries(plugin and libclass_loader), let me explain the details.

We describe AbstractMetaObjectBase and its children structures as below in descendants to ancestor order:

// (1) final template class
MetaObject<
       Derived,
       Base>

// (2) intermediate template classs
AbstractMetaObject<Base>

// (3) Base class, it is not a template class
AbstractMetaObjectBase

When we implement a plugin library, we use CLASS_LOADER_REGISTER_CLASS macro in the plugin .cpp file.
This macro calls registerPlugin and then invokes new impl::MetaObject<Derived, Base>.
So (1) and (2) are instantiated in plugin .cpp files, thus object codes are in libsome_plugin.so.
On the other hand, as meta_object.cpp is compiled only in libclass_loader.so, object code of (3) is included only in libclass_loader.so.

As template classes are instantiated in plugin .cpp, the reasonable situation may be the following:

  • ~MetaObject and ~AbstractMetaObject are only in plugin library(libsome_plugin.so) and not in libclass_loader.so
    • the latter may not happen because the template classes are not instantiated
  • ~AbstractMetaObjectBase is only in libclass_loader.so and not in libsome_plugin.so
    • As we don't use class_loader/src/meta_object.cpp to compile libsome_plugin.so, this also looks always true.

compare the latest code and our code

We built the latest code as build-asan and our patched code as build-asan2.

Here are the destructor symbols defined in libclass_loader.so.
~AbstractMetaObject and ~MetaObject is not defined.

## the latest code
$ nm build-asan/class_loader/libclass_loader.so | c++filt | grep -e ":~.*MetaObject"
000000000005f28a T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f28a T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005feee W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()
000000000005feee W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()


## our modified code
$ nm build-asan2/class_loader/libclass_loader.so | c++filt | grep -e ":~.*MetaObject"
000000000005f434 T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f2fc T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f2fc T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005ff90 W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()
000000000005ff90 W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()

And here are destrucotr symbols in libplugin.so. We choose composition/libclient_component.so.
Only our version has ~AbstractMetaObjectBase and ~MetaObject but they are only in libclient_component.so and not in libclass_loader.so.
Note we can find ~AbstractMetaObjectBase() but it's undefined symbol so it is not also redundant.

## the latest code
$ nm build-asan/composition/libclient_component.so | c++filt | grep -e ":~.*MetaObject"


## our modified code
$ nm build-asan2/composition/libclient_component.so | c++filt | grep -e ":~.*MetaObject"
00000000000729aa W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000072958 W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000072958 W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000061a2e W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
00000000000619dc W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
00000000000619dc W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
                 U class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()

Test result

By this modification, we can resolve ASAN error.

For example, in composition/test_linktime_composition, we got the following errors.

$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py  | grep leak
test_linktime_composition (test_linktime_composition__rmw_cyclonedds_cpp_Debug.TestComposition)
Test process' output against expectations. ... ok

----------------------------------------------------------------------
Ran 1 test in 2.051s

OK
[test_linktime_composition-1] ==23173==ERROR: LeakSanitizer: detected memory leaks
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 506 byte(s) in 12 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] SUMMARY: AddressSanitizer: 1210 byte(s) leaked in 24 allocation(s).

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

With our modification, there are no error.

$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py
[INFO] [launch]: Default logging verbosity is set to INFO
test_linktime_composition (test_linktime_composition__rmw_cyclonedds_cpp_Debug.TestComposition)
Test process' output against expectations. ... [INFO] [test_linktime_composition-1]: process started with pid [23243]
[test_linktime_composition-1] [INFO] [1611135411.660105883] [linktime_composition]: Load library 
[test_linktime_composition-1] [INFO] [1611135411.660261196] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Client>
[test_linktime_composition-1] [INFO] [1611135411.681911205] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Listener>
[test_linktime_composition-1] [INFO] [1611135411.691607930] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Server>
[test_linktime_composition-1] [INFO] [1611135411.701023683] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Talker>
[test_linktime_composition-1] [INFO] [1611135412.710482236] [talker]: Publishing: 'Hello World: 1'
[test_linktime_composition-1] [INFO] [1611135412.710937423] [listener]: I heard: [Hello World: 1]
[test_linktime_composition-1] [INFO] [1611135413.682717590] [Server]: Incoming request: [a: 2, b: 3]
[test_linktime_composition-1] [INFO] [1611135413.683428550] [Client]: Got result: [5]
ok

----------------------------------------------------------------------
Ran 1 test in 2.043s

OK
[INFO] [test_linktime_composition-1]: sending signal 'SIGINT' to process[test_linktime_composition-1]
[test_linktime_composition-1] [INFO] [1611135413.686109959] [rclcpp]: signal_handler(signal_value=2)
[INFO] [test_linktime_composition-1]: process has finished cleanly [pid 23243]

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

One one more topic: dummyMethod

We also found AbstractMetaObjectBase::dummyMethod is in both libplugin.so and libclass_loader.so.
This may be because implemantation of dummyMethod is in header file.
Is it better to move it to .cpp file?

$ nm build-asan/composition/libclient_component.so | c++filt | grep -e dummyMethod
000000000005b758 W class_loader::impl::AbstractMetaObjectBase::dummyMethod()

$  nm build-asan/class_loader/libclass_loader.so | c++filt | grep -e dummyMethod
000000000005fe1e W class_loader::impl::AbstractMetaObjectBase::dummyMethod()

Signed-off-by: y-okumura <y-okumura@isp.co.jp>
Copy link
Contributor

@hidmic hidmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE: dummy method. It appears to no longer be necessary if the destructor is virtual.


static std::shared_ptr<BaseToFactoryMapMap> instance;

// TODO: need lock?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@y-okumura-isp perhaps, static initialization is not thread-safe either.

static std::shared_ptr<BaseToFactoryMapMap> instance;

// TODO: need lock?
// TODO: we may use unique_ptr but use shared_ptr for PoC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@y-okumura-isp std::unique_ptr sounds best. Also, the custom deleter is fine, using std::unique_ptr<MetaObject> values would be best.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment.
I intended to make changes minimum. But it looks better to hold smart pointer directory.

I have one confirmation.

std::unique_ptr values would be best.

What you say is to change

typedef std::map<ClassName, impl::AbstractMetaObjectBase *> FactoryMap;

to

typedef std::map<ClassName, std::unique_ptr<impl::AbstractMetaObjectBase>> FactoryMap;

Right?

The pointer FactoryMap[class] is used in createInstance, so we could shared_ptr.

typedef std::map<ClassName, std::shared_ptr<impl::AbstractMetaObjectBase>> FactoryMap;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right?

Correct.

The pointer FactoryMap[class] is used in createInstance, so we could shared_ptr.

I presume you mean

factory = dynamic_cast<impl::AbstractMetaObject<Base> *>(factoryMap[derived_class_name]);
. If so, I see no need for shared ownership. Just grab the raw pointer momentarily.

@@ -69,7 +69,7 @@ class CLASS_LOADER_PUBLIC AbstractMetaObjectBase
* TEMPLATE SUBCLASSES, OTHERWISE THEY WILL PULL IN A REDUNDANT METAOBJECT
* DESTRUCTOR OUTSIDE OF libclass_loader WITHIN THE PLUGIN LIBRARY! T
*/
~AbstractMetaObjectBase();
virtual ~AbstractMetaObjectBase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@y-okumura-isp meta: the comment is rather imprecise. Lacking virtual indeed breaks polymorphic destruction.

@hidmic
Copy link
Contributor

hidmic commented Jan 20, 2021

Also, it's been many years but I'll ask. @mirzashah do you recall any of this?

@y-okumura-isp
Copy link
Author

Is it better to update PR?

@hidmic
Copy link
Contributor

hidmic commented Jan 28, 2021

Is it better to update PR?

Please do. Also, check the linter issues reported in the Rpr job.

@y-okumura-isp
Copy link
Author

Hmm, it looks more complicated than I initially thought.

I tried to use a unique pointer and found the following.

Pointers in FatoryMap are not deleted directly but inserted into graveyard and finally purged.
https://github.com/ros/class_loader/blob/melodic-devel/src/class_loader_core.cpp#L374

Additionally, according to the comment class_loader_core.cpp:399,
the global factory map map might have multiple pointers to the same address.
https://github.com/ros/class_loader/blob/melodic-devel/src/class_loader_core.cpp#L399

I wonder if it is the right way to delete pointers in the destructor.
It's going to take a little longer.

@y-okumura-isp
Copy link
Author

I cancel this PR.
As we discussed above, We have two problems:

(1) leaks in BaseToFactoryMapMap. I found Graveyard also has the same leak.
(2) ~AbstractMetaObjectBase should be virtual (as AbstractMetaObjectBase has vtable).

We may fix (1) by a custom deleter or smart pointers.
But I couldn't address (2) because:

  • By adding virtual to ~AbstractMetaObjectBase, MetaObject should have a destructor.
  • The destructor binary code lies in plugin.so.
  • As BaseToFactoryMapMap and Graveyard are static variables, they are finalized at process termination. At that time, plugin.so is already unloaded. Thus we cannot call MetaObject destructor, and SEGV happens. Note that
    • MetaObject in BaseToFactoryMapMap is moved to Graveyard when the plugin library is unloaded.
    • MetaObject in Graveyard is purged at a certain timing, for example, when the plugin is loaded again.
    • so, leaks happen when MetaObject in BaseToFactoryMapMap or Graveyard remains until process exists.

Additionally, I initially referred to the leaks of composition/test_linktime_composition, but it may be an another problem.
This test is about libraries linked by the linker (rather than dlopen), as in https://github.com/ros2/demos/blob/master/composition/src/linktime_composition.cpp#L40.
But the class_loader looks to be designed to be used with dlopen. When we run test_linktime_composition, class_loader alerts as the following. And this is why MetaObject remains in BaseToFactoryMapMap.

# I set console_bridge log level = DEBUG
$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py

class_loader.impl: ALERT!!! A metaobject (i.e. factory) exists for desired class, but has no owner. This implies that the library containing the class was dlopen()ed by means other than through the class_loader interfac e. This can happen if you build plugin libraries that contain more than just plugins (i.e. normal code your app links against) -- that intrinsically will trigger a dlopen() prior to main(). You should isolate your plugins into their own library, otherwise it will not be possible to shutdown the library!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants