Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic due to assertion chunk_id_removed.is_some() #7992

Closed
grtlr opened this issue Nov 4, 2024 · 5 comments · Fixed by #8020
Closed

Panic due to assertion chunk_id_removed.is_some() #7992

grtlr opened this issue Nov 4, 2024 · 5 comments · Fixed by #8020
Assignees
Labels
🪳 bug Something isn't working 💣 crash crash, deadlock/freeze, do-no-start ⛃ re_datastore affects the datastore itself

Comments

@grtlr
Copy link
Contributor

grtlr commented Nov 4, 2024

Describe the bug

I'm currently working on the graph primitives and I'm encountering a panic when using log_static multiple times. Although, of course, there should only ever be a single call to log_static for a given entity, having multiple calls should not cause Rerun to crash.

To Reproduce
Steps to reproduce the behavior:

I'm on branch #7500, and I'm using the following code:

use napkin::force::SimulationBuilder;
use rerun::GraphNodes;

const NUM_NODES: usize = 10;

fn main() -> anyhow::Result<()> {
    let rec = rerun::RecordingStreamBuilder::new("napkin_disjoint").spawn()?;

    let nodes = (0..NUM_NODES)
        .map(|i| (format!("node{}", i), [0.0, 0.0]))
        .collect::<Vec<_>>();

    let builder = SimulationBuilder::default();
    let mut simulation = builder
        .build(nodes.clone())
        .add_force_collide("collide".into(), Default::default());

    simulation.step_with(
        10.into(),
        || {
            let nodes = nodes.iter().map(|(key, _)| key.clone()).collect::<Vec<_>>();
            println!("Step!");
            rec.log_static("/nodes", &GraphNodes::new(nodes)).unwrap();
        },
        || {
            let nodes = nodes.iter().map(|(key, _)| key.clone()).collect::<Vec<_>>();
            println!("Finish!");
            rec.log_static("/nodes", &GraphNodes::new(nodes)).unwrap();
        },
    );

    Ok(())
}

If it helps, I can also share the complete example, including the unreleased napkin crate.

Removing the _static makes the panic go away. Unortunately, I was not able to create a self-contained example yet, but I'm happy to try some more if it helps.

Expected behavior

Rerun should not crash but rather emit an error / warning.

Backtrace

thread 'main' panicked at 'assertion failed: chunk_id_removed.is_some()'
re_chunk_store/src/writes.rs:145
stack backtrace:
   7: core::panicking::panic_fmt
             at core/src/panicking.rs:72:14
   8: core::panicking::panic
             at core/src/panicking.rs:146:5
   9: re_chunk_store::writes::<impl re_chunk_store::store::ChunkStore>::insert_chunk
             at re_chunk_store/src/writes.rs:145:25
  10: re_entity_db::entity_db::EntityDb::add_chunk
             at re_entity_db/src/entity_db.rs:367:28
  11: re_entity_db::entity_db::EntityDb::add
             at re_entity_db/src/entity_db.rs:353:17
  12: re_viewer::app::App::receive_messages
             at re_viewer/src/app.rs:1160:19
  13: <re_viewer::app::App as eframe::epi::App>::update
             at re_viewer/src/app.rs:1718:9
  14: eframe::native::epi_integration::EpiIntegration::update::{{closure}}
             at eframe-0.29.1/src/native/epi_integration.rs:283:17
      egui::context::Context::run
             at egui-0.29.1/src/context.rs:815:13
  15: eframe::native::epi_integration::EpiIntegration::update
             at eframe-0.29.1/src/native/epi_integration.rs:276:27
  16: eframe::native::wgpu_integration::WgpuWinitRunning::run_ui_and_paint
             at eframe-0.29.1/src/native/wgpu_integration.rs:603:27
      <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::run_ui_and_paint
             at eframe-0.29.1/src/native/wgpu_integration.rs:372:13
  17: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event::{{closure}}
             at eframe-0.29.1/src/native/run.rs:283:21
      eframe::native::event_loop_context::with_event_loop_context
             at eframe-0.29.1/src/native/event_loop_context.rs:53:5
      <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
             at eframe-0.29.1/src/native/run.rs:280:9
  18: winit::event_loop::dispatch_event_for_app
             at winit-0.30.5/src/event_loop.rs:642:52
      winit::event_loop::EventLoop<T>::run_app::{{closure}}
             at winit-0.30.5/src/event_loop.rs:265:49
  19: winit::platform_impl::macos::event_loop::map_user_event::{{closure}}
             at winit-0.30.5/src/platform_impl/macos/event_loop.rs:174:22
  20: <alloc::boxed::Box<F,A> as core::ops::function::FnMut<Args>>::call_mut
             at alloc/src/boxed.rs:2029:9
      winit::platform_impl::macos::event_handler::EventHandler::handle_event
             at winit-0.30.5/src/platform_impl/macos/event_handler.rs:125:17
  21: winit::platform_impl::macos::app_state::ApplicationDelegate::handle_event
             at winit-0.30.5/src/platform_impl/macos/app_state.rs:303:9
  22: winit::platform_impl::macos::app_state::ApplicationDelegate::cleared
             at winit-0.30.5/src/platform_impl/macos/app_state.rs:365:13
  23: winit::platform_impl::macos::observer::control_flow_end_handler::{{closure}}
             at winit-0.30.5/src/platform_impl/macos/observer.rs:84:21
      winit::platform_impl::macos::observer::control_flow_handler::{{closure}}
             at winit-0.30.5/src/platform_impl/macos/observer.rs:46:9
      std::panicking::try::do_call
             at std/src/panicking.rs:559:40
      std::panicking::try
             at std/src/panicking.rs:523:19
  24: std::panic::catch_unwind
             at std/src/panic.rs:149:14
      winit::platform_impl::macos::event_loop::stop_app_on_panic
             at winit-0.30.5/src/platform_impl/macos/event_loop.rs:435:11
      winit::platform_impl::macos::observer::control_flow_handler
             at winit-0.30.5/src/platform_impl/macos/observer.rs:44:5
      winit::platform_impl::macos::observer::control_flow_end_handler
             at winit-0.30.5/src/platform_impl/macos/observer.rs:79:9
  25: <unknown>
  26: <unknown>
  27: <unknown>
  28: <unknown>
  29: <unknown>
  30: <unknown>
  31: <unknown>
  32: <unknown>
  33: <unknown>
  34: <unknown>
  35: winit::platform_impl::macos::event_loop::EventLoop<T>::run_on_demand::{{closure}}::{{closure}}
             at winit-0.30.5/src/platform_impl/macos/event_loop.rs:306:26
      objc2::rc::autorelease::autoreleasepool
             at objc2-0.5.2/src/rc/autorelease.rs:438:15
      winit::platform_impl::macos::event_loop::EventLoop<T>::run_on_demand::{{closure}}
             at winit-0.30.5/src/platform_impl/macos/event_loop.rs:292:13
      winit::platform_impl::macos::event_handler::EventHandler::set
             at winit-0.30.5/src/platform_impl/macos/event_handler.rs:98:9
  36: winit::platform_impl::macos::app_state::ApplicationDelegate::set_event_handler
             at winit-0.30.5/src/platform_impl/macos/app_state.rs:172:9
      winit::platform_impl::macos::event_loop::EventLoop<T>::run_on_demand
             at winit-0.30.5/src/platform_impl/macos/event_loop.rs:291:9
  37: <winit::event_loop::EventLoop<T> as winit::platform::run_on_demand::EventLoopExtRunOnDemand>::run_on_demand
             at winit-0.30.5/src/platform/run_on_demand.rs:89:9
      winit::platform::run_on_demand::EventLoopExtRunOnDemand::run_app_on_demand
             at winit-0.30.5/src/platform/run_on_demand.rs:75:9
      eframe::native::run::run_and_return
             at eframe-0.29.1/src/native/run.rs:300:16
      eframe::native::run::run_wgpu::{{closure}}
             at eframe-0.29.1/src/native/run.rs:357:13
      eframe::native::run::with_event_loop::{{closure}}
             at eframe-0.29.1/src/native/run.rs:52:12
      std::thread::local::LocalKey<T>::try_with
             at std/src/thread/local.rs:286:12
      std::thread::local::LocalKey<T>::with
             at std/src/thread/local.rs:262:9
      eframe::native::run::with_event_loop
             at eframe-0.29.1/src/native/run.rs:42:16
      eframe::native::run::run_wgpu
             at eframe-0.29.1/src/native/run.rs:355:16
  38: eframe::run_native
             at eframe-0.29.1/src/lib.rs:265:13

Troubleshooting Rerun: https://www.rerun.io/docs/getting-started/troubleshooting 
Report bugs: https://github.com/rerun-io/rerun/issues

Desktop (please complete the following information):

  • OS: macOS Ventura 13.7 (Intel)

Rerun version
#7500.

Additional context

@grtlr grtlr added 🪳 bug Something isn't working 👀 needs triage This issue needs to be triaged by the Rerun team labels Nov 4, 2024
@grtlr grtlr changed the title Panic due to chunk_id_removed.is_some() Panic due to assertion chunk_id_removed.is_some() Nov 4, 2024
@jleibs jleibs added this to the 0.20 - Maps, H.264, Undo milestone Nov 4, 2024
@abey79 abey79 added ⛃ re_datastore affects the datastore itself and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Nov 4, 2024
@grtlr
Copy link
Contributor Author

grtlr commented Nov 4, 2024

The following .rrd file can be used to reproduce the issue, but most likely will require #7500 to reproduce as well.

test.rrd.zip

The crash does not happen on the following (web) versions:

The crash does happen on:

@emilk emilk added the 💣 crash crash, deadlock/freeze, do-no-start label Nov 5, 2024
@teh-cmc
Copy link
Member

teh-cmc commented Nov 5, 2024

I'll have a look but I'm removing the 0.20 milestone blocker in the meantime: these are just my usual too-paranoid-until-proven-wrong debug assertions, they have no impact on release builds whatsoever.

The crash does not happen on the following (web) versions:
* https://rerun.io/viewer/version/0.19.0
* https://rerun.io/viewer/version/nightly

That's because these are release builds.

@teh-cmc teh-cmc removed this from the 0.20 - Maps, H.264, Undo milestone Nov 5, 2024
@grtlr
Copy link
Contributor Author

grtlr commented Nov 5, 2024

Sounds good—thank you! I have encountered the assertion again in another context, again when logging a second time to an entity that has been statically logged before.

The use case of (statically) logging two different archetypes to the same entity came up during #7500, where we need to log nodes and edges.

@teh-cmc
Copy link
Member

teh-cmc commented Nov 5, 2024

Statically logging to the same entity multiple times is perfectly legitimate btw, and even something we encourage in some scenarios (e.g. soft real-time telemetry).

Feel free to comment those debug_assertions in your branch to make your life simpler in the meantime.

@teh-cmc teh-cmc self-assigned this Nov 6, 2024
@teh-cmc
Copy link
Member

teh-cmc commented Nov 6, 2024

Repro:

use std::f32::consts::TAU;

use rerun::demo_util::color_spiral;

const NUM_POINTS: usize = 100;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rec = rerun::RecordingStreamBuilder::new("rerun_example_dna_abacus").connect_tcp()?;

    rec.set_time_seconds("stable_time", 0f64);

    for _ in 0..100 {
        let (points1, _colors1) = color_spiral(NUM_POINTS, 2.0, 0.02, 0.0, 0.1);
        let (points2, colors2) = color_spiral(NUM_POINTS, 2.0, 0.02, TAU * 0.5, 0.1);
        rec.log_static("/nodes", &rerun::Points3D::new(points1))?;
        rec.log_static(
            "/nodes",
            &rerun::Points3D::new(points2).with_colors(colors2),
        )?;
    }

    for _ in 0..100 {
        let (points1, colors1) = color_spiral(NUM_POINTS, 2.0, 0.02, 0.0, 0.1);
        let (points2, _colors2) = color_spiral(NUM_POINTS, 2.0, 0.02, TAU * 0.5, 0.1);
        rec.log_static(
            "/nodes",
            &rerun::Points3D::new(points1).with_colors(colors1),
        )?;
        rec.log_static("/nodes", &rerun::Points3D::new(points2))?;
    }

    Ok(())
}
  1. Start a viewer in debug:
cargo r -p rerun-cli --no-default-features --features native_viewer,nasm
  1. Run the code above in release (might need to do it a handful of times, it's timing dependent)

teh-cmc added a commit that referenced this issue Nov 7, 2024
…ks (#8020)

The assertion was correct 🥳🎈.

There was a real memleak in there when dealing with partially dangling
static chunks (simplifying a bit: those are static chunks being
overwritten for some components but not all, which is a very niche use
case, thus why we've never hit it until now).

* Fixes #7992 

cc @grtlr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪳 bug Something isn't working 💣 crash crash, deadlock/freeze, do-no-start ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants