re_datastore: get should give you a row, not a [row] #590

Merged: teh-cmc merged 8 commits into main from cmc/datastore/get_a_single_row on Dec 18, 2022

@teh-cmc teh-cmc commented Dec 18, 2022

This does what the title says: get now gives you a row, not a [row].

It also kills usage of Array::slice(), which turns out to be very slow.
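To make the "row, not a [row]" distinction concrete, here is a minimal sketch. It is not the re_datastore API: `Column`, `get_wrapped` and `get_row` are hypothetical names, and plain `Vec`s stand in for Arrow arrays. It only illustrates the shape of the change, namely handing back the row itself instead of a one-element list that the caller has to unwrap again.

```rust
/// Hypothetical toy column: every row holds a variable number of values.
/// (Assumption for illustration only; the real store works on Arrow arrays.)
struct Column {
    rows: Vec<Vec<f32>>,
}

impl Column {
    /// "[row]" shape: the row comes back wrapped in a one-element list.
    /// This costs an extra copy and forces every caller to unwrap it.
    fn get_wrapped(&self, row_idx: usize) -> Option<Vec<Vec<f32>>> {
        self.rows.get(row_idx).map(|row| vec![row.clone()])
    }

    /// "row" shape: the row itself is returned, borrowed from the column,
    /// with no wrapping list and no copy.
    fn get_row(&self, row_idx: usize) -> Option<&[f32]> {
        self.rows.get(row_idx).map(|row| row.as_slice())
    }
}
```

With the second shape, a caller that only ever wanted one row reads it directly from the returned slice instead of indexing into a list of length one first.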

datastore/batch/rects/insert
                        time:   [906.63 µs 907.25 µs 908.18 µs]
                        thrpt:  [11.011 Melem/s 11.022 Melem/s 11.030 Melem/s]
                 change:
                        time:   [-1.3064% -1.1775% -1.0614%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0728% +1.1916% +1.3237%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

datastore/batch/rects/query
                        time:   [417.19 ns 417.87 ns 418.88 ns]
                        thrpt:  [238.73 Melem/s 239.31 Melem/s 239.70 Melem/s]
                 change:
                        time:   [-60.284% -60.229% -60.179%] (p = 0.00 < 0.05)
                        thrpt:  [+151.12% +151.44% +151.79%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
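
For reading the numbers above: the time change and the throughput change are two views of the same measurement, since throughput is elements divided by time. The sketch below (helper names are made up for illustration) shows the arithmetic. A 60.229% drop in time corresponds to roughly a 151.44% gain in throughput, and multiplying the median time by the median throughput suggests about 10,000 elements per insert iteration and about 100 per query iteration. Those element counts are inferred from the reported figures, not taken from the benchmark source.

```rust
/// Relative throughput change implied by a relative time change.
/// Both are fractions: -0.60229 means "60.229% less time".
/// Throughput scales with 1 / time, hence 1 / (1 + time_change) - 1.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Elements processed per iteration, given the time per iteration in
/// seconds and the throughput in elements per second.
fn elements_per_iter(seconds: f64, elems_per_sec: f64) -> f64 {
    seconds * elems_per_sec
}
```

For example, `throughput_change(-0.60229)` is about 1.5144, matching the reported +151.44% for the query benchmark, and `throughput_change(-0.011775)` is about 0.0119, matching the +1.19% for the insert benchmark.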

@teh-cmc teh-cmc force-pushed the cmc/datastore/get_a_single_row branch from cb0d833 to b7e5fd5 Compare December 18, 2022 15:06
teh-cmc added a commit that referenced this pull request Dec 18, 2022
@teh-cmc teh-cmc merged commit 16a835c into main Dec 18, 2022
@teh-cmc teh-cmc deleted the cmc/datastore/get_a_single_row branch December 18, 2022 18:03
teh-cmc added a commit that referenced this pull request Dec 19, 2022
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* addressing PR comments, I hope

* missed a couple

* addressed PR comments
teh-cmc added a commit that referenced this pull request Dec 22, 2022
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* more fixes
teh-cmc added a commit that referenced this pull request Jan 2, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polar range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillon conflict somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* addressing PR comments

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* implement PoV-less, always-yield lower-level API + adapt higher-level one

* addressing PR comments

* self and not-so-self reviews
teh-cmc added a commit that referenced this pull request Jan 2, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polar range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillon conflict somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* having some casual fun with dataframes :>

* now with components!

* just some experiments im not too found of... keeping around just in case

* Revert "just some experiments im not too found of... keeping around just in case"

This reverts commit 15f6487.

* playing around with insert_id-as-data... which turns out to be quite helpful

* going into store_insert_ids for real

* full impl

* add example

* self-review

* reviewable

* addressing PR comments

* is that readable

* Revert "is that readable"

This reverts commit f802ff5.

* standalone examples for all dataframe APIs

* burn all todos

* update doc: you can do that now!

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* addressed PR comments
teh-cmc added a commit that referenced this pull request Jan 2, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polar range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillon conflict somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* addressing PR comments

* sanity checking cluster components

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* implement PoV-less, always-yield lower-level API + adapt higher-level one

* addressing PR comments

* self and not-so-self reviews

* always-on cluster key, it's fiiiine
teh-cmc added a commit that referenced this pull request Jan 2, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polar range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillon conflict somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* addressing PR comments

* first impl

* ported simple_query() to simple_range

* doc and such

* added e2e example for range queries

* self-review

* support for new EntityView

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* mimicking range_components' new merging rules

* implement PoV-less, always-yield lower-level API + adapt higher-level one

* addressing PR comments

* ported to new low-level APIs

* xxx

* addressed PR comments

* self and not-so-self reviews

* the future is quite literally here
teh-cmc added a commit that referenced this pull request Jan 3, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polar range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillon conflict somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* addressing PR comments

* first impl

* ported simple_query() to simple_range

* doc and such

* added e2e example for range queries

* self-review

* support for new EntityView

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* mimicking range_components' new merging rules

* Demonstrating how insanely slow the obvious solution is

datastore/insert/batch/rects/insert
            time:   [387.54 µs 387.98 µs 388.52 µs]
            thrpt:  [25.739 Melem/s 25.775 Melem/s 25.804 Melem/s]
     change:
            time:   [+227.27% +227.92% +228.56%] (p = 0.00 < 0.05)
            thrpt:  [-69.564% -69.505% -69.444%]
            Performance has regressed.

* it'd be a tiny bit better with some kind of splats...

datastore/insert/batch/rects/insert
            time:   [284.35 µs 284.55 µs 284.86 µs]
            thrpt:  [35.105 Melem/s 35.144 Melem/s 35.167 Melem/s]
     change:
            time:   [+137.45% +138.08% +138.52%] (p = 0.00 < 0.05)
            thrpt:  [-58.075% -57.997% -57.885%]
            Performance has regressed.

* and now with MsgId being a full fledged component

datastore/insert/batch/rects/insert
            time:   [180.84 µs 184.42 µs 188.96 µs]
            thrpt:  [52.920 Melem/s 54.225 Melem/s 55.296 Melem/s]
     change:
            time:   [+1.0072% +2.1236% +3.3206%] (p = 0.00 < 0.05)
            thrpt:  [-3.2139% -2.0795% -0.9972%]

* stuff

* bruh

* implement PoV-less, always-yield lower-level API + adapt higher-level one

* addressing PR comments

* ported to new low-level APIs

* xxx

* addressed PR comments

* self and not-so-self reviews

* the future is quite literally here

* some comments at the very least

* msgid standing on its own

* bruh
teh-cmc added a commit that referenced this pull request Jan 3, 2023
* get is supposed to return a row, not a [row]

* unwrap note

* the bench too

* self review

* doc test also

* and re_query ofc!

* slicing is _very_ slow, don't do it if you don't have to

* no more col_arrays in re_query

* there's actually no need for concatenating at all

* incrementally compute and cache bucket sizes

* cleaning up and documenting existing limitations

* introducing bucket retirement

* issue ref

* some more doc stuff

* self-review

* polars/fmt should always be there for tests

* streamlining batch support

* take list header into account

* it's fine

* self-review

* just something i want to keep around for later

* (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow!

* merge cmc/datastore/get_a_single_row (#590)

* no more col_arrays in re_query

* introducing the notion of clustering key, thankfully breaking all tests by design

* making good use of that shiny new Instance component

* merge cmc/datastore/get_rid_of_copies (#584)

* missed one

* introducing arrow_util with is_dense_array()

* finding the clustering comp of the row... or creating it!

* rebasin'

* post rebase clean up

* addressing PR comments, I hope

* ensure that clustering components are properly sorted, failing the existing test suite

* build_instances now generate sorted ids, thus greenlighting the test suite

* missed a couple

* addressed PR comments

* going for the ArrayExt route

* completing the quadrifecta of checks

* the unavoidable typed error revolution is on its way, and it's early

* where we're going we don't need polars

* update everything for the new APIs

* error for unsupported clustering key types

* clean up and actually testing our error paths

* move those nasty internal tests into their own dirty corner

* finally some high-level tests in here

* i happen to like where this is going

* shuffling things

* demonstrating that implicit instances are somehow broken

* fully working implicit clustering keys, but demonstrating a sorting issue somewhere

* there is still something weird going on tho

* latest_at behaving as one would expect

* automatically cache generated cluster instances

* time to clean up en masse

* still want to put some stress on the bucketing

* make ArrayExt::is_dense a little more friendly, just in case...

* TimeType::format_range

* independent latest_at query and using appropriate types everywhere

* re_query: use polars/fmt in tests

* re_query: remove implicit instances

* fixing the u32 vs u64 instance drama

* really starting to like how this looks

* cluster-aware polars helpers :>

* cleanin up tests

* continuing cleanup and doc

* updating visuals for this brave new world

* docs

* self-review

* bruh

* bruh...

* ...

* outdated comment

* no reason to search for it multiple times

* polars_helpers => polars_util for consistency's sake

* addressing PR comments and a couple other things

* xxx

* post-merge fixes

* TimeInt should be nohash

* high-level polars range tools + making first half of range impl pass

* implement the streaming half

* finally defeated all demons

* still passes?

* it looks like we've made it out alive

* polars util: join however you wish

* fixed formatting

* point2d's PoVs working as expected

* passing full ranges

* docs and such part 1, the semantics are hell

* fixing the filtering mess in tests

* me stoopid

* polars docs

* addressing the clones

* xxx

* missed a gazillion conflicts somehow

* polars util spring cleaning

* do indicate and demonstrate that range_components is _not_ a real streaming join

* fixed some comments

* bruh

* screw it, going for the real deal: full streaming joins

* YES FINALLY SEMANTICS I ACTUALLY LIKE

* yep yep i like this

* I hereby declare myself _satisfied_

* initiating the great cleanup

* add notes for upcoming terminology pr

* bringing IndexRowNr into the mix and slowly starting to fix terminology mess

* improving range_components ergonomics

* putting it all in self-reviewable state

* self-review

* add bench

* xxx

* addressing PR comments

* first impl

* ported simple_query() to simple_range

* doc and such

* added e2e example for range queries

* self-review

* support for new EntityView

* demonstrating nasty edge-case with streaming-joins

* update streaming-join merging rules to fix said edge case

* mimicking range_components' new merging rules

* implement timepoints-as-components and insert them automatically

* derive proper ObjectType TextEntry component

* introducing the TextEntry component

* re_viewer now supports both legacy and arrow data sources

* added pure rust example for text entries

* plugging everything on the python side

* self-review

* bruh

* it's decent, but we won't ever be able to deduplicate that way :(

* Revert "it's decent, but we won't ever be able to deduplicate that way :("

This reverts commit f0f4e00.

* Demonstrating how insanely slow the obvious solution is

datastore/insert/batch/rects/insert
            time:   [387.54 µs 387.98 µs 388.52 µs]
            thrpt:  [25.739 Melem/s 25.775 Melem/s 25.804 Melem/s]
     change:
            time:   [+227.27% +227.92% +228.56%] (p = 0.00 < 0.05)
            thrpt:  [-69.564% -69.505% -69.444%]
            Performance has regressed.

* it'd be a tiny bit better with some kind of splats...

datastore/insert/batch/rects/insert
            time:   [284.35 µs 284.55 µs 284.86 µs]
            thrpt:  [35.105 Melem/s 35.144 Melem/s 35.167 Melem/s]
     change:
            time:   [+137.45% +138.08% +138.52%] (p = 0.00 < 0.05)
            thrpt:  [-58.075% -57.997% -57.885%]
            Performance has regressed.

* and now with MsgId being a full fledged component

datastore/insert/batch/rects/insert
            time:   [180.84 µs 184.42 µs 188.96 µs]
            thrpt:  [52.920 Melem/s 54.225 Melem/s 55.296 Melem/s]
     change:
            time:   [+1.0072% +2.1236% +3.3206%] (p = 0.00 < 0.05)
            thrpt:  [-3.2139% -2.0795% -0.9972%]

* stuff

* bruh

* storing msg metadata

* Revert "implement timepoints-as-components and insert them automatically"

This reverts commit e6a6fd5.

* I guess it's decent

* implement PoV-less, always-yield lower-level API + adapt higher-level one

* addressing PR comments

* ported to new low-level APIs

* xxx

* addressed PR comments

* self and not-so-self reviews

* the future is quite literally here

* add MsgBundle helpers

* remove hardwired ref to Instance

* post-merge fixes

* cleanup

* guess that isn't very useful anymore

* using new msgbundle helpers

* updating my non-sensical msgbundle helpers

* explaining the example

* woops

* addressed PR comments
2 participants