Dataframe queries 3: dense range #7341

Merged on Sep 4, 2024 (4 commits)

teh-cmc (Member) commented Sep 3, 2024

Implements the dense range dataframe APIs.

Examples:

```
cargo r --all-features -p re_dataframe --example range -- /tmp/data.rrd /helix/structure/scaffolding/beads
cargo r --all-features -p re_dataframe --example range -- /tmp/data.rrd /helix/structure/scaffolding/beads /helix/structure/scaffolding/**
```

```rust
use itertools::Itertools as _;

use re_chunk_store::{
    ChunkStore, ChunkStoreConfig, ComponentColumnDescriptor, RangeQueryExpression, Timeline,
    VersionPolicy,
};
use re_dataframe::QueryEngine;
use re_log_types::{ResolvedTimeRange, StoreKind};

fn main() -> anyhow::Result<()> {
    let args = std::env::args().collect_vec();

    // Fetch a required positional argument, or print usage and exit.
    let get_arg = |i| {
        let Some(value) = args.get(i) else {
            eprintln!(
                "Usage: {} <path_to_rrd_with_position3ds> <entity_path_pov> [entity_path_expr]",
                args.first().map_or("$BIN", |s| s.as_str())
            );
            std::process::exit(1);
        };
        value
    };

    let path_to_rrd = get_arg(1);
    let entity_path_pov = get_arg(2).as_str();
    // The entity path expression is optional and defaults to everything.
    let entity_path_expr = args.get(3).map_or("/**", |s| s.as_str());

    // Load every store contained in the .rrd file.
    let stores = ChunkStore::from_rrd_filepath(
        &ChunkStoreConfig::DEFAULT,
        path_to_rrd,
        VersionPolicy::Warn,
    )?;

    for (store_id, store) in &stores {
        // Only query recordings; skip any other kind of store.
        if store_id.kind != StoreKind::Recording {
            continue;
        }

        let cache = re_dataframe::external::re_query::Caches::new(store);
        let engine = QueryEngine {
            store,
            cache: &cache,
        };

        // Range query on the `log_tick` timeline, with the `Position3D`
        // component at `entity_path_pov` as the point of view (hardcoded here).
        let query = RangeQueryExpression {
            entity_path_expr: entity_path_expr.into(),
            timeline: Timeline::log_tick(),
            time_range: ResolvedTimeRange::new(0, 30),
            pov: ComponentColumnDescriptor::new::<re_types::components::Position3D>(
                entity_path_pov.into(),
            ),
        };

        // Print the query, then every batch it yields.
        let query_handle = engine.range(&query, None /* columns */);
        eprintln!("{query}:");
        for batch in query_handle.into_iter() {
            eprintln!("{batch}");
        }
    }

    Ok(())
}
```

Dataframe APIs PR series:

Checklist

  • I have read and agree to the Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • I have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

teh-cmc added the ⛃ re_datastore, 🔍 re_query, do-not-merge, and include in changelog labels Sep 3, 2024
teh-cmc force-pushed the cmc/dataframe_queries_2_latestat branch from dd81b55 to b568403 September 3, 2024 11:14
teh-cmc force-pushed the cmc/dataframe_queries_4_range branch from 77382d4 to b0cb111 September 3, 2024 11:15
teh-cmc marked this pull request as ready for review September 3, 2024 11:46
teh-cmc (Member, Author) commented Sep 3, 2024

@rerun-bot full-check

github-actions bot commented Sep 3, 2024

Started a full build: https://github.com/rerun-io/rerun/actions/runs/10682478883

teh-cmc force-pushed the cmc/dataframe_queries_4_range branch from 5795664 to 4056357 September 3, 2024 12:32
teh-cmc force-pushed the cmc/dataframe_queries_4_range branch from 4056357 to eda1e78 September 3, 2024 14:56
Comment on lines 50 to 52
pov: ComponentColumnDescriptor::new::<re_types::components::Position3D>(
entity_path_pov.into(),
),
Member Author

I'm still leaving that specific part hardcoded, because this is an example, not the end-all-be-all CLI query tool.

It does show why using a full-blown ComponentColumnDescriptor as the PoV is a massive pain, though. To be improved later on.

teh-cmc force-pushed the cmc/dataframe_queries_4_range branch from eda1e78 to cb253be September 3, 2024 15:03
crates/store/re_dataframe/src/range.rs (resolved thread)
Comment on lines +151 to +154
// TODO(cmc): There are more efficient, albeit infinitely more complicated ways to do this.
// Let's first implement all features (multi-PoV, pagination, timestamp streaming, etc) and
// see if this ever becomes an issue before going down this road.
Member

Makes sense. We'll always need this as a worst-case fallback path, but I can definitely imagine this being replaced with a series of checks that determine whether particular happy-path join operations are applicable with more efficient implementations for those cases.
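
The trade-off discussed in this thread can be illustrated with a small, self-contained sketch. It is not the code in `range.rs`: the column layout, the exact-timestamp join semantics, and every name below (`join_fallback`, `same_index`, `join_fast`, `join`) are assumptions made for the example. It shows a general fallback join, one cheap check for a happy path (both columns share the same timestamps), and a dispatcher that picks between them.

```rust
/// Hypothetical joined row: (timestamp, PoV value, joined value if any).
type Row = (i64, f32, Option<f32>);

/// Worst-case fallback: for each PoV timestamp, binary-search the other column.
/// Assumes both columns are sorted by timestamp, with unique timestamps.
fn join_fallback(pov: &[(i64, f32)], other: &[(i64, f32)]) -> Vec<Row> {
    pov.iter()
        .map(|&(t, v)| {
            let joined = other
                .binary_search_by_key(&t, |&(ot, _)| ot)
                .ok()
                .map(|i| other[i].1);
            (t, v, joined)
        })
        .collect()
}

/// Happy-path check: both columns carry exactly the same timestamps.
fn same_index(a: &[(i64, f32)], b: &[(i64, f32)]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.0 == y.0)
}

/// Happy path: the columns are already aligned, so a plain zip is enough.
fn join_fast(pov: &[(i64, f32)], other: &[(i64, f32)]) -> Vec<Row> {
    pov.iter()
        .zip(other)
        .map(|(&(t, v), &(_, o))| (t, v, Some(o)))
        .collect()
}

/// Dispatcher: use the cheaper path when its precondition holds,
/// otherwise fall back to the general implementation.
fn join(pov: &[(i64, f32)], other: &[(i64, f32)]) -> Vec<Row> {
    if same_index(pov, other) {
        join_fast(pov, other)
    } else {
        join_fallback(pov, other)
    }
}
```

Whenever the happy-path check passes, both paths must return identical rows; that property is what makes the fast path a pure optimization and keeps the fallback usable as a reference implementation in tests.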

crates/store/re_dataframe/src/range.rs (outdated, resolved thread)
teh-cmc added a commit that referenced this pull request Sep 4, 2024
All the boilerplate for the new `re_dataframe`.

Also introduces all the new types:
* `QueryExpression`, `LatestAtQueryExpression`, `RangeQueryExpression`
* `QueryHandle`, `LatestAtQueryHandle` (unimplemented),
`RangeQueryHandle` (unimplemented)
* `ColumnDescriptor`, `ControlColumnDescriptor`, `TimeColumnDescriptor`,
`ComponentColumnDescriptor`

No actual code logic, just definitions.

* Part of #7284 

---

Dataframe APIs PR series:
- #7338
- #7339
- #7340
- #7341
- #7345
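
The commit message above lists the descriptor type names but not their definitions. One plausible way such a family fits together is a sum type over the three descriptor kinds. The sketch below is an assumption for illustration only: the enum layout, the placeholder fields, and the `label` helper are hypothetical and should not be read as the actual `re_chunk_store` definitions.

```rust
// Placeholder payloads: the real descriptors may carry more (and different) fields.
#[derive(Debug, Clone, PartialEq)]
struct ControlColumnDescriptor {
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct TimeColumnDescriptor {
    timeline: String,
}

#[derive(Debug, Clone, PartialEq)]
struct ComponentColumnDescriptor {
    entity_path: String,
    component_name: String,
}

/// Hypothetical umbrella type: a column is exactly one of the three kinds.
#[derive(Debug, Clone, PartialEq)]
enum ColumnDescriptor {
    Control(ControlColumnDescriptor),
    Time(TimeColumnDescriptor),
    Component(ComponentColumnDescriptor),
}

impl ColumnDescriptor {
    /// Human-readable column label (hypothetical helper).
    fn label(&self) -> String {
        match self {
            Self::Control(c) => format!("control:{}", c.name),
            Self::Time(t) => format!("time:{}", t.timeline),
            Self::Component(c) => format!("{}:{}", c.entity_path, c.component_name),
        }
    }
}
```

With a shape like this, code that walks a schema can match on the variant once and handle each kind of column explicitly.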
teh-cmc added a commit that referenced this pull request Sep 4, 2024
The schema resolution logic.

* Part of #7284 

---

Dataframe APIs PR series:
- #7338
- #7339
- #7340
- #7341
- #7345
teh-cmc force-pushed the cmc/dataframe_queries_2_latestat branch from cdd842e to 175e63d September 4, 2024 08:22
teh-cmc added a commit that referenced this pull request Sep 4, 2024
Implements the latest-at dataframe API.

Examples:
```
cargo r --all-features -p re_dataframe --example latest_at -- /tmp/helix.rrd
cargo r --all-features -p re_dataframe --example latest_at -- /tmp/helix.rrd /helix/structure/scaffolding/**
```

```rust
use itertools::Itertools as _;

use re_chunk::{TimeInt, Timeline};
use re_chunk_store::{ChunkStore, ChunkStoreConfig, LatestAtQueryExpression, VersionPolicy};
use re_dataframe::QueryEngine;
use re_log_types::StoreKind;

fn main() -> anyhow::Result<()> {
    let args = std::env::args().collect_vec();

    let get_arg = |i| {
        let Some(value) = args.get(i) else {
            eprintln!(
                "Usage: {} <path_to_rrd> <entity_path_expr>",
                args.first().map_or("$BIN", |s| s.as_str())
            );
            std::process::exit(1);
        };
        value
    };

    let path_to_rrd = get_arg(1);
    let entity_path_expr = args.get(2).map_or("/**", |s| s.as_str());

    let stores = ChunkStore::from_rrd_filepath(
        &ChunkStoreConfig::DEFAULT,
        path_to_rrd,
        VersionPolicy::Warn,
    )?;

    for (store_id, store) in &stores {
        if store_id.kind != StoreKind::Recording {
            continue;
        }

        let cache = re_dataframe::external::re_query::Caches::new(store);
        let engine = QueryEngine {
            store,
            cache: &cache,
        };

        let query = LatestAtQueryExpression {
            entity_path_expr: entity_path_expr.into(),
            timeline: Timeline::log_time(),
            at: TimeInt::MAX,
        };

        let query_handle = engine.latest_at(&query, None /* columns */);
        let batch = query_handle.get();

        eprintln!("{query}:\n{batch}");
    }

    Ok(())
}
```

* Part of #7284 

---

Dataframe APIs PR series:
- #7338
- #7339
- #7340
- #7341
- #7345
Base automatically changed from cmc/dataframe_queries_2_latestat to main September 4, 2024 08:25
teh-cmc force-pushed the cmc/dataframe_queries_4_range branch from 67df21f to b1d1c51 September 4, 2024 08:28
teh-cmc removed the do-not-merge label Sep 4, 2024
teh-cmc merged commit 9a994a0 into main Sep 4, 2024
27 of 29 checks passed
teh-cmc deleted the cmc/dataframe_queries_4_range branch September 4, 2024 08:29
teh-cmc added a commit that referenced this pull request Sep 4, 2024
Implements the paginated dense range dataframe APIs.

If there's no off-by-one anywhere in there, I will eat my hat.
Getting this in the hands of people is the highest prio though, I'll add
tests later.


![image](https://github.com/user-attachments/assets/e865ba62-21db-41c1-9899-35a0e7aea134)

![image](https://github.com/user-attachments/assets/32934ba8-2673-401a-aafc-409dfbe9b2c5)


* Fixes #7284 

---

Dataframe APIs PR series:
- #7338
- #7339
- #7340
- #7341
- #7345
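
The off-by-one concern in the commit message above is typical of offset/length pagination. As a minimal sketch, assuming pages are addressed by a row offset and a length over an in-memory slice (the `page` and `num_pages` helpers are hypothetical and not part of `re_dataframe`), clamping both ends keeps the boundary cases well-defined:

```rust
/// Returns the rows in `[offset, offset + len)`, clamped to what is available.
/// An offset at or past the end yields an empty page instead of panicking.
fn page<T>(rows: &[T], offset: usize, len: usize) -> &[T] {
    let start = offset.min(rows.len());
    // `saturating_add` avoids overflow when `len` is very large.
    let end = start.saturating_add(len).min(rows.len());
    &rows[start..end]
}

/// Number of pages needed to cover `num_rows` rows (ceiling division).
/// A page length of zero is treated as "no pages".
fn num_pages(num_rows: usize, page_len: usize) -> usize {
    if page_len == 0 {
        return 0;
    }
    num_rows / page_len + usize::from(num_rows % page_len != 0)
}
```

The cases worth testing first are the last (possibly partial) page, an offset exactly equal to the row count, and a zero-length page, since those are where inclusive and exclusive bounds tend to get mixed up.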
Labels
⛃ re_datastore, 🔍 re_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New ChunkStore APIs to facilitate data access
3 participants