heap-use-after-free in ExampleCluster #174

Closed
tmadlener opened this issue Jan 28, 2021 · 6 comments · Fixed by #514

Comments

@tmadlener
Collaborator

As of 29dfbdd there is at least one heap-use-after-free problem in our tests, and I think this also affects the generated code. It seems to have gone unnoticed until now because, for some reason, builds with gcc do not seem to suffer from any runtime problems, whereas builds with clang do and occasionally lead to segmentation faults.

After instrumenting the core podio library with AddressSanitizer, running tests/write points to an attempt to release an already destroyed Obj again. I have attached the complete output below. I have named the ExampleCluster in the issue title since this is the first instance where the problem occurs, but it doesn't have to be the only one: I am not yet sure whether this problem only affects types with relations, or whether it is a more general problem that affects instances that have not been created via Collection::create but instead constructed directly and then added to a collection. I also do not know why gcc builds seem to be less affected by this at runtime than clang builds.

address_sanitize_podio.txt

@tmadlener
Collaborator Author

This seems to be a more fundamental problem that is connected with clearing collections. A minimal reproducer is:

#include "datamodel/ExampleClusterCollection.h"

int main () {
  auto clusters = ExampleClusterCollection();
  auto cluster = clusters.create();
  clusters.clear(); // remove this and everything works
}

@tmadlener
Collaborator Author

This is a very fundamental problem that, as far as I can tell, happens every time we clear a collection in the same scope in which we have created or obtained objects from it.


The underlying problem is that the call to clusters.clear() already destroys the ExampleClusterObj that is held by the cluster. Consequently, by the time cluster goes out of scope and calls its destructor, we are more or less trying to destroy the same ExampleClusterObj again:

ExampleCluster::~ExampleCluster() {
  if (m_obj) m_obj->release();
}
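
For reference, a minimal sketch of what the generated clear() roughly does (member names like m_entries are simplified here, this is not the exact generated code): it deletes the owned Obj instances outright, without informing any handles that still point to them.

void ExampleClusterCollection::clear() {
  // sketch: the collection owns the Obj instances and deletes them directly
  for (auto* obj : m_entries) {
    delete obj;            // the ExampleClusterObj held by `cluster` dies here
  }
  m_entries.clear();
  // any ExampleCluster handle created from this collection still holds its
  // old m_obj pointer, which now dangles
}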

We would usually get a double-free here, but since podio::ObjBase::release checks the index first:

/// checks whether the object is "untracked" by a collection;
/// if yes, decrease the reference count and delete itself if the count reaches 0
int release() {
  if (id.index != podio::ObjectID::untracked) { return 1; };
  if (--ref_counter == 0) {
    delete this;
  }
  return 0;
};

we get a use-after-free. The reason why we do not run into a double-free more often is that id.index would first have to be -1, which is unlikely when reading from already freed memory, and additionally ref_counter would have to be 0, which is again unlikely. So the chances of actually calling delete this a second time are extremely small.

Since the cluster no longer has any connection to the collection, there is no way to "inform" it that m_obj has already been destroyed by the collection.

@tmadlener
Collaborator Author

Coming back to this once more. I think this is even more fundamental than I previously thought. Up until now I was under the impression that this would not be a problem in our "usual workflows", because objects are (almost) never created in the same scope in which the collections are cleared.

However, the issue goes deeper than that. For example, the following minimal showcase has the same heap-use-after-free problem:

#include "datamodel/ExampleWithOneRelationCollection.h"
#include "datamodel/ExampleClusterCollection.h"

int main() {
  auto clusters = ExampleClusterCollection();
  auto oneRelations = ExampleWithOneRelationCollection();

  // introduce a new scope to get around the originally described problem
  {
    auto cluster = clusters.create();
    auto rel = oneRelations.create();
    rel.cluster(cluster);
  }

  clusters.clear(); // clear this second and everything works
  oneRelations.clear();
}

In this case ~ExampleWithOneRelationObj looks like this (m_cluster is a ConstExampleCluster*):

~ExampleWithOneRelationObj() {
  if (m_cluster) delete m_cluster;
}

Hence, it calls the destructor of ConstExampleCluster, which in turn calls release, where the heap-use-after-free described above happens, because the corresponding ObjBase has already been removed by the call to clusters.clear(). Switching the order in which we call clear on the collections would fix the problem here, but that is not really a viable solution in general.
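
To spell out the call chain in this second reproducer (a sketch of the assumed sequence, not actual code):

clusters.clear();      // deletes the ExampleClusterObj; the ConstExampleCluster
                       // stored in the relation's Obj now points to freed memory
oneRelations.clear();  // deletes the ExampleWithOneRelationObj, whose destructor
                       // runs `delete m_cluster;`
// ~ConstExampleCluster() then calls m_obj->release() on the already freed
// ExampleClusterObj, which is the reported heap-use-after-free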

@hegner
Collaborator

hegner commented Sep 15, 2021

OK. I think we have to go back to the drawing board here and think through all possible corner cases. What seems a possible conclusion is that the ref counting part and the data itself may need to be separated more clearly, which would potentially make things incredibly complicated.
We should have a discussion meeting on that.

@gaede
Contributor

gaede commented Sep 16, 2021

Following up on our discussion yesterday: could the whole issue be resolved by introducing a new type of object that is not yet managed (owned) by a collection?
These handle types would manage the underlying storage (own the obj and thereby the data) and delete it when going out of scope, unless they have been moved into a collection. Normal types could then only be created by the collection, which has ownership from the beginning.
Some pseudo code:

auto clucol = event.get<ClusterCollection>("clusters");
for (i : range) {
  auto clu = clucol.create();
  clu.setEnergy(42.);
}

// some candidate clusters:
for (j : range) {
  ClusterCand clu;   // a cluster candidate
  clu.setEnergy(j * 4);

  if (clu.energy() > 42.) {
    clucol.emplace_back(clu);
    // ownership and memory handling moved to collection for these clusters
  }
}

// clusters not passed to collection go out of scope and are deleted

This would greatly simplify the memory handling. Am I missing something?
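
To make the idea a bit more concrete, here is a self-contained sketch of what such a candidate type could look like; ClusterCand, ClusterObj and emplace_back are hypothetical names and not existing podio API:

#include <vector>

struct ClusterObj { double energy{0.}; };

// candidate handle: owns its Obj until it is handed over to a collection
class ClusterCand {
public:
  ClusterCand() : m_obj(new ClusterObj()) {}
  ClusterCand(const ClusterCand&) = delete;             // copies omitted in this sketch
  ClusterCand& operator=(const ClusterCand&) = delete;
  ~ClusterCand() {
    if (m_owned) delete m_obj;   // only delete if never moved into a collection
  }
  void setEnergy(double e) { m_obj->energy = e; }
  double energy() const { return m_obj->energy; }

private:
  friend class ClusterCollection;
  ClusterObj* m_obj{nullptr};
  bool m_owned{true};
};

class ClusterCollection {
public:
  void emplace_back(ClusterCand& cand) {
    m_entries.push_back(cand.m_obj);
    cand.m_owned = false;        // ownership transferred to the collection
  }
  ~ClusterCollection() {
    for (auto* obj : m_entries) delete obj;
  }

private:
  std::vector<ClusterObj*> m_entries;
};

Handles handed out by create() would keep working as today; only the candidate type carries the extra ownership flag.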

@tmadlener
Collaborator Author

Another possibility could be to introduce a "semi smart pointer" that is used instead of the raw Obj* in the user classes. This would give us a handle in the destructor to deal with the different cases. The basic layout would look like this:

template<typename T>
class maybe_shared_ptr {
  // c'tors, d'tors, operator->, get()
private:
  T* ptr;
  struct ControlBlock {
    std::atomic<unsigned> count{1};
    std::atomic<bool> owned{true};
  };
  ControlBlock* ctrl_block{nullptr};
};

There are three different states:

  • When handed out by a collection, only the ptr will be populated and the ctrl_block will remain nullptr. The destructor checks for this and is then a no-op.
  • When created as a free-standing (user) object, the first construction creates the ctrl_block and assumes ownership (the defaults above). Copies simply get the same ptr and ctrl_block but increase the reference count by one. Similarly, destroying a user object decreases the reference count by one, until it reaches 0 and both the ptr and the ctrl_block are deleted.
  • When created as a free-standing (user) object but then added to a collection, everything happens as in the previous case (i.e. the reference counting remains unchanged). The major difference is that at the time the object is added to the collection, the owned flag is flipped. This changes the behavior of the destructor: it only destroys the ctrl_block once the ref count hits zero and leaves the ptr untouched (as that will be destroyed by the collection).

Basically, what would change with respect to the current situation is that the ref-count mainly controls the lifetime of the control block, and only conditionally that of the "managed" pointer. I am not yet entirely sure whether this covers all edge cases, but I think it could. One of the drawbacks is that the user-facing classes would double in size (essentially from one pointer to two pointers). However, I think that could be acceptable if it indeed solves all our problems.
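
A hedged sketch of how the destructor could implement the three states described above (illustrative only; the member names follow the skeleton, everything else is an assumption):

#include <atomic>

template<typename T>
class maybe_shared_ptr {
  struct ControlBlock {
    std::atomic<unsigned> count{1};
    std::atomic<bool> owned{true};
  };

public:
  // state 1: handed out by a collection -> no control block
  explicit maybe_shared_ptr(T* p) : ptr(p) {}

  // state 2: free-standing user object -> control block owns the pointer
  static maybe_shared_ptr make_owned(T* p) {
    maybe_shared_ptr sp(p);
    sp.ctrl_block = new ControlBlock();
    return sp;
  }

  maybe_shared_ptr(const maybe_shared_ptr& other)
      : ptr(other.ptr), ctrl_block(other.ctrl_block) {
    if (ctrl_block) ++ctrl_block->count;
  }
  maybe_shared_ptr& operator=(const maybe_shared_ptr&) = delete; // omitted in this sketch

  // state 3: the object was added to a collection -> the collection now owns ptr
  void collection_takes_ownership() {
    if (ctrl_block) ctrl_block->owned = false;
  }

  ~maybe_shared_ptr() {
    if (!ctrl_block) return;               // collection handle: no-op
    if (--ctrl_block->count == 0) {
      if (ctrl_block->owned) delete ptr;   // still free-standing: delete the Obj
      delete ctrl_block;                   // the control block is always ours
    }
  }

  T* get() const { return ptr; }
  T* operator->() const { return ptr; }

private:
  T* ptr{nullptr};
  ControlBlock* ctrl_block{nullptr};
};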
