Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine-only example crash #6724

Closed
LeXXik opened this issue Jun 19, 2024 · 14 comments · Fixed by #6902
Closed

Engine-only example crash #6724

LeXXik opened this issue Jun 19, 2024 · 14 comments · Fixed by #6902
Labels

Comments

@LeXXik
Copy link
Contributor

LeXXik commented Jun 19, 2024

Originally reported in #6723

Needs an investigation, if the error is caused by something else.

Repro:

  • open Vehicle example
  • click the code refresh button

image

@kungfooman
Copy link
Collaborator

Funny, did a bunch of tests and this is what happened so far:

  1. It either just works
  2. Same bug as you have
  3. Tires go crazy (either stuck in each other or stuck in ground):

image

@kungfooman
Copy link
Collaborator

I'm in the process of adding steering wheel support to the Vehicle script in an RTI session and stumbled over this again after some dozen reloads:

image

(if someone wants to add conditional breakpoints for debugging)

@kungfooman
Copy link
Collaborator

Investigating a bit more, I hope I found the main problem, still testing:

In the ammo wasm at 0x01FE20 you can use this conditional breakpoint: stack[0].value === 0 || stack[1].value === 0 || stack[2].value === 0. This will trigger whenever this happens for some debugging directly at that place.

With more RTI debugging I figured that the first occurance of NaN happens here:

image

And this gave me an idea for a solution which seems to fix this, but I have to test a bit more... basically this contains the problem:

    // Add wheels to the vehicle
    var wheelAxle = new Ammo.btVector3(-1, 0, 0);
    var wheelDirection = new Ammo.btVector3(0, -1, 0);
    var connectionPoint = new Ammo.btVector3(0, 0, 0);

    this.wheels.forEach(function (wheelEntity) {
        var wheelScript = wheelEntity.script.vehicleWheel;

        var frictionSlip = wheelScript.frictionSlip;
        var isFront = wheelScript.isFront;
        var radius = wheelScript.radius;
        var rollInfluence = wheelScript.rollInfluence;
        var suspensionCompression = wheelScript.suspensionCompression;
        var suspensionDamping = wheelScript.suspensionDamping;
        var suspensionRestLength = wheelScript.suspensionRestLength;
        var suspensionStiffness = wheelScript.suspensionStiffness;

        var wheelPos = wheelEntity.getLocalPosition();
        connectionPoint.setValue(wheelPos.x, wheelPos.y, wheelPos.z);
        var wheelInfo = vehicle.addWheel(connectionPoint, wheelDirection, wheelAxle, suspensionRestLength, radius, tuning, isFront);

        wheelInfo.set_m_suspensionStiffness(suspensionStiffness);
        wheelInfo.set_m_wheelsDampingRelaxation(suspensionDamping);
        wheelInfo.set_m_wheelsDampingCompression(suspensionCompression);
        wheelInfo.set_m_frictionSlip(frictionSlip);
        wheelInfo.set_m_rollInfluence(rollInfluence);
    }, this);

    Ammo.destroy(wheelAxle);
    Ammo.destroy(wheelDirection);
    Ammo.destroy(connectionPoint);

We are malloc'ing these three Ammo vectors, put the same into four calls of vehicle.addWheel and then free them while they are still linked into the vehicle. So based on malloc randomness the values get overwritten with something else, causing both the NaN issue and null function or function signature mismatch.

Possible solution?

Free those vectors later and don't reuse them for each wheel.

@LeXXik
Copy link
Contributor Author

LeXXik commented Aug 29, 2024

Great investigation! Looks like you are correct. Those vectors are stored as reference, and not get copied to each wheel:

https://github.com/bulletphysics/bullet3/blob/e9c461b0ace140d5c73972760781d94b7b5eee53/src/BulletDynamics/Vehicle/btRaycastVehicle.cpp#L59-L65

@kungfooman
Copy link
Collaborator

kungfooman commented Aug 29, 2024

Nice, thank you for the quick check in actual source!

So I guess we could just make an array, add these three vectors for each wheel and just toFree.forEach(Ammo.destroy) or something (at the place where we destroy the vehicle aswell). Would you like to work the details out for a PR?

Hmm, edit:

https://github.com/bulletphysics/bullet3/blob/e9c461b0ace140d5c73972760781d94b7b5eee53/src/LinearMath/btVector3.h#L147-L154

https://github.com/bulletphysics/bullet3/blob/e9c461b0ace140d5c73972760781d94b7b5eee53/src/BulletDynamics/Vehicle/btWheelInfo.h#L19C8-L34

Didn't do C++ for a while, but it seems like the vector references are actually copied based on that assignment operator? But once I don't free those vectors after the loop, it works anyway... so IDK why there is a problem here, but the solution somehow speaks for itself that this must be the issue.

@LeXXik
Copy link
Contributor Author

LeXXik commented Aug 29, 2024

Ah, right, it has the operator overload. Hmm.

@kungfooman
Copy link
Collaborator

Just printed the vectors:

image

Via:

    // Add wheels to the vehicle
    var wheelAxle = new Ammo.btVector3(-1, 0, 0);
    var wheelDirection = new Ammo.btVector3(0, -1, 0);
    var connectionPoint = new Ammo.btVector3(0, 0, 0);
    console.log('wheelAxle.kB', wheelAxle.kB);
    console.log('wheelDirection.kB', wheelDirection.kB);
    console.log('connectionPoint.kB', connectionPoint.kB);

And then testing the pointers from vehicle, which should be the same pointers, if not copied by value (aka only a pointer/reference copy):

image

And they seem to be all independent... quite a strange mystery 👻

@LeXXik
Copy link
Contributor Author

LeXXik commented Aug 29, 2024

Thought the pointer was showing indication that something doesn't get cleaned up, but it was not the case. At least not indicative.

this.collisionConfiguration = new Ammo.btDefaultCollisionConfiguration();

If I hard reset Ammo by forcing it to initialize again, instead of re-using old instance, then the issue goes away. Refreshing browser or hitting reset button give the same result.

static getInstance(moduleName, callback) {
const module = Impl.getModule(moduleName);
if (module.instance) {
callback(module.instance);
} else {
module.callbacks.push(callback);
if (module.config) {
// config has been provided, kick off module initialize
Impl.initialize(moduleName, module);
}
}
}

    static getInstance(moduleName, callback) {
        const module = Impl.getModule(moduleName);

        // if (module.instance) {
        //     callback(module.instance);
        // } else {
            module.callbacks.push(callback);
            module.initializing = false;
            if (module.config) {
                // config has been provided, kick off module initialize
                Impl.initialize(moduleName, module);
            }
        // }
    }

@LeXXik
Copy link
Contributor Author

LeXXik commented Aug 29, 2024

I think the proper fix would be to instantiate a new Ammo instance on reset, as the engine keeps creating new objects in the same Wasm instance, until we eventually run out of memory. I guess something gets referenced to an old object or something.

@kungfooman
Copy link
Collaborator

Doing a full Ammo instance reset just covers that we have memory leaks that may show up in other scenarios aswell? For example spawning/destroying many cars in a GTA style game. I would like a solution that goes to the root of the problem here 🙈

@LeXXik
Copy link
Contributor Author

LeXXik commented Aug 30, 2024

That should be fine. I mean creating multiple cars should be no issue here.

I think the problem comes from the fact that when we click reset button, the engine creates a duplicate dynamics world, dispatcher, overlapping pairs cache, etc. without destroying old ones, and something internally gets referred to either duplicate or old system. Clicking reset button would then create yet another set of those. If we don't destroy the old Wasm instance altogether, then we should clear the current instance before creating them again.

I think the fact that it doesn't crash if you don't destroy vectors is just a lucky circumstance.

@kungfooman
Copy link
Collaborator

I think the problem comes from the fact that when we click reset button, the engine creates a duplicate dynamics world, dispatcher, overlapping pairs cache, etc. without destroying old ones, and something internally gets referred to either duplicate or old system.

They are created here:

this.collisionConfiguration = new Ammo.btDefaultCollisionConfiguration();
this.dispatcher = new Ammo.btCollisionDispatcher(this.collisionConfiguration);
this.overlappingPairCache = new Ammo.btDbvtBroadphase();
this.solver = new Ammo.btSequentialImpulseConstraintSolver();
this.dynamicsWorld = new Ammo.btDiscreteDynamicsWorld(this.dispatcher, this.overlappingPairCache, this.solver, this.collisionConfiguration);

And deleted here on refresh:

destroy() {
super.destroy();
this.app.systems.off('update', this.onUpdate, this);
if (typeof Ammo !== 'undefined') {
Ammo.destroy(this.dynamicsWorld);
Ammo.destroy(this.solver);
Ammo.destroy(this.overlappingPairCache);
Ammo.destroy(this.dispatcher);
Ammo.destroy(this.collisionConfiguration);
this.dynamicsWorld = null;
this.solver = null;
this.overlappingPairCache = null;
this.dispatcher = null;
this.collisionConfiguration = null;
}
}
}

Since we call this._app.destroy(); for a refresh which then snowballs to everything else:

destroy() {
MiniStats.destroy();
if (this._app && this._app.graphicsDevice) {
this._app.destroy();
}
this.destroyHandlers.forEach(destroy => destroy());
this.ready = false;
}

But memory leaks seem obvious here:

_ammoVec1 = new Ammo.btVector3();
_ammoQuat = new Ammo.btQuaternion();
_ammoTransform = new Ammo.btTransform();

static onLibraryLoaded() {
// Lazily create shared variable
if (typeof Ammo !== 'undefined') {
_ammoTransform = new Ammo.btTransform();
_ammoVec1 = new Ammo.btVector3();
_ammoVec2 = new Ammo.btVector3();
_ammoQuat = new Ammo.btQuaternion();
}
}

ammoRayStart = new Ammo.btVector3();
ammoRayEnd = new Ammo.btVector3();

For starters... at least I don't see any corresponding Ammo.destroy for these.

My gut feeling is we reuse old pointers from e.g. those and that corrupts the memory state - we just have to make sure we free properly what we allocate and the errors should just disappear.

@kungfooman
Copy link
Collaborator

kungfooman commented Sep 1, 2024

So I rewrote literally every new Ammo.btWhatever and every Ammo.destroy to meticulously keep track of every Ammo object:

src/framework/components/ammo.js:

const trackedObjects = [];
let createCount = 0;
let deleteCount = 0;
function ammoTrack(object) {
  trackedObjects.push(object);
  object.deleteCount = 0;
  object.stack = new Error().stack; // Track *where* it was created
  createCount++;
  return object;
}
function ammoPurge() {
  for (const o of trackedObjects) {
    if (o.deleteCount === 0) {
      console.log("Purge", o);
      ammoDestroy(o);
    }
  }
}
function ammoOverview() {
  let a = 0;
  let b = 0;
  for (const o of trackedObjects) {
    if (o.deleteCount === 0) {
      a++;
    } else {
      b++;
    }
  }
  console.log(`ammoOverview notDeleted=${a} deleted=${b} createCount=${createCount} deleteCount=${deleteCount}`);
}
function ammoDestroy(o) {
  deleteCount++;
  o.deleteCount++;
  if (o.deleteCount > 1) {
    // Should be deleted only once...
    debugger;
  }
  try {
    Ammo.destroy(o);
  } catch (e) {
    console.warn("ammoDestroy> whatever");
  }
}
export {ammoTrack, ammoPurge, ammoOverview, ammoDestroy, trackedObjects};

After a refresh I call pc.ammoOverview() and every refresh added 64 Ammo objects that were never destroyed (o.destroyCount === 0). Okay, but which? Easy to look into the logged stack for each object:

image

Clear thing, rigidbody objects aren't freed.

Remember that 10x5 wall in the Vehicle example? Those get never freed it seems.

Usually I can refresh the example 9x.

If I turn the 10x5 wall into a 20x5 wall I can refresh 7x.
If I turn the 10x5 wall into a 30x5 wall I can refresh 2x.
If I turn the 10x5 wall into a 40x5 wall I can refresh 1x.

I counted 64 unfreed objects, but the wall in itself is of course only 50 rigid bodies... but that's a start that should be fixed.

Edit: Oops, this kind of measuring didn't account for the body.getMotionState() JS object which needs to be deleted separately, so that was a wrong positive. Need to figure out the other cases which aren't deleted.

@willeastcott
Copy link
Contributor

This is great detective work. I'm guessing the rigid bodies are not destroyed because the hierarchy is never explicitly destroyed? If it was explicitly destroyed, the components wouldn't be removed from all entities, and when a RigidBodyComponent is removed from an entity, the Ammo body is destroyed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants