re_datastore: `get` should give you a row, not a [row] #590
Merged
teh-cmc force-pushed the cmc/datastore/get_a_single_row branch from cb0d833 to b7e5fd5 on December 18, 2022 15:06
teh-cmc added a commit that referenced this pull request on Dec 18, 2022
teh-cmc added a commit that referenced this pull request on Dec 18, 2022
emilk approved these changes on Dec 18, 2022
teh-cmc added a commit that referenced this pull request on Dec 19, 2022
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * addressing PR comments, I hope * missed a couple * addressed PR comments
teh-cmc added a commit that referenced this pull request on Dec 22, 2022
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * more fixes
teh-cmc added a commit that referenced this pull request on Jan 2, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * addressing PR comments * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * implement PoV-less, always-yield lower-level API + adapt higher-level one * addressing PR comments * self and not-so-self reviews
teh-cmc added a commit that referenced this pull request on Jan 2, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * having some casual fun with dataframes :> * now with components! * just some experiments im not too found of... keeping around just in case * Revert "just some experiments im not too found of... keeping around just in case" This reverts commit 15f6487. * playing around with insert_id-as-data... which turns out to be quite helpful * going into store_insert_ids for real * full impl * add example * self-review * reviewable * addressing PR comments * is that readable * Revert "is that readable" This reverts commit f802ff5. * standalone examples for all dataframe APIs * burn all todos * update doc: you can do that now! * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * addressed PR comments
teh-cmc added a commit that referenced this pull request on Jan 2, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * addressing PR comments * sanity checking cluster components * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * implement PoV-less, always-yield lower-level API + adapt higher-level one * addressing PR comments * self and not-so-self reviews * always-on cluster key, it's fiiiine
teh-cmc added a commit that referenced this pull request on Jan 2, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * addressing PR comments * first impl * ported simple_query() to simple_range * doc and such * added e2e example for range queries * self-review * support for new EntityView * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * mimicking range_components' new merging rules * implement PoV-less, always-yield lower-level API + adapt higher-level one * addressing PR comments * ported to new low-level APIs * xxx * addressed PR comments * self and not-so-self reviews * the future is quite literally here
teh-cmc added a commit that referenced this pull request on Jan 3, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * addressing PR comments * first impl * ported simple_query() to simple_range * doc and such * added e2e example for range queries * self-review * support for new EntityView * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * mimicking range_components' new merging rules * Demonstrating how insanely slow the obvious solution is datastore/insert/batch/rects/insert time: [387.54 µs 387.98 µs 388.52 µs] thrpt: [25.739 Melem/s 25.775 Melem/s 25.804 Melem/s] change: time: [+227.27% +227.92% +228.56%] (p = 0.00 < 0.05) thrpt: [-69.564% -69.505% -69.444%] Performance has regressed. * it'd be a tiny bit better with some kind of splats... datastore/insert/batch/rects/insert time: [284.35 µs 284.55 µs 284.86 µs] thrpt: [35.105 Melem/s 35.144 Melem/s 35.167 Melem/s] change: time: [+137.45% +138.08% +138.52%] (p = 0.00 < 0.05) thrpt: [-58.075% -57.997% -57.885%] Performance has regressed. 
* and now with MsgId being a full fledged component datastore/insert/batch/rects/insert time: [180.84 µs 184.42 µs 188.96 µs] thrpt: [52.920 Melem/s 54.225 Melem/s 55.296 Melem/s] change: time: [+1.0072% +2.1236% +3.3206%] (p = 0.00 < 0.05) thrpt: [-3.2139% -2.0795% -0.9972%] * stuff * bruh * implement PoV-less, always-yield lower-level API + adapt higher-level one * addressing PR comments * ported to new low-level APIs * xxx * addressed PR comments * self and not-so-self reviews * the future is quite literally here * some comments at the very least * msgid standing on its own * bruh
teh-cmc added a commit that referenced this pull request on Jan 3, 2023
* get is supposed to return a row, not a [row] * unwrap note * the bench too * self review * doc test also * and re_query ofc! * slicing is _very_ slow, don't do it if you don't have to * no more col_arrays in re_query * there's actually no need for concatenating at all * incrementally compute and cache bucket sizes * cleaning up and documenting existing limitations * introducing bucket retirement * issue ref * some more doc stuff * self-review * polars/fmt should always be there for tests * streamlining batch support * take list header into account * it's fine * self-review * just something i want to keep around for later * (un)wrapping lists is a bit slow... and slicing them is _extremely_ slow! * merge cmc/datastore/get_a_single_row (#590) * no more col_arrays in re_query * introducing the notion of clustering key, thankfully breaking all tests by design * making good use of that shiny new Instance component * merge cmc/datastore/get_rid_of_copies (#584) * missed one * introducing arrow_util with is_dense_array() * finding the clustering comp of the row... or creating it! 
* rebasin' * post rebase clean up * addressing PR comments, I hope * ensure that clustering components are properly sorted, failing the existing test suite * build_instances now generate sorted ids, thus greenlighting the test suite * missed a couple * addressed PR comments * going for the ArrayExt route * completing the quadrifecta of checks * the unavoidable typed error revolution is on its way, and it's early * where we're going we don't need polars * update everything for the new APIs * error for unsupported clustering key types * clean up and actually testing our error paths * move those nasty internal tests into their own dirty corner * finally some high-level tests in here * i happen to like where this is going * shuffling things * demonstrating that implicit instances are somehow broken * fully working implicit clustering keys, but demonstrating a sorting issue somewhere * there is still something weird going on tho * latest_at behaving as one would expect * automatically cache generated cluster instances * time to clean up en masse * still want to put some stress on the bucketing * make ArrayExt::is_dense a little more friendly, just in case... * TimeType::format_range * independent latest_at query and using appropriate types everywhere * re_query: use polars/fmt in tests * re_query: remove implicit instances * fixing the u32 vs u64 instance drama * really starting to like how this looks * cluster-aware polars helpers :> * cleanin up tests * continuing cleanup and doc * updating visuals for this brave new world * docs * self-review * bruh * bruh... * ... * outdated comment * no reason to search for it multiple times * polars_helpers => polars_util for consistency's sake * addressing PR comments and a couple other things * xxx * post-merge fixes * TimeInt should be nohash * high-level polar range tools + making first half of range impl pass * implement the streaming half * finally defeated all demons * still passes? 
* it looks like we've made it out alive * polars util: join however you wish * fixed formatting * point2d's PoVs working as expected * passing full ranges * docs and such part 1, the semantics are hell * fixing the filtering mess in tests * me stoopid * polars docs * addressing the clones * xxx * missed a gazillon conflict somehow * polars util spring cleaning * do indicate and demonstrate that range_components is _not_ a real streaming join * fixed some comments * bruh * screw it, going for the real deal: full streaming joins * YESgit sgit s FINALLY SEMANTICS I ACTUALLY LIKE * yep yep i like this * I hereby declare myself _satisfied_ * initiating the great cleanup * add notes for upcoming terminology pr * bringing IndexRowNr into the mix and slowly starting to fix terminology mess * improving range_components ergonomics * putting it all in self-reviewable state * self-review * add bench * xxx * addressing PR comments * first impl * ported simple_query() to simple_range * doc and such * added e2e example for range queries * self-review * support for new EntityView * demonstrating nasty edge-case with streaming-joins * update streaming-join merging rules to fix said edge case * mimicking range_components' new merging rules * implement timepoints-as-components and insert them automatically * derive proper ObjectType TextEntry component * introducing the TextEntry component * re_viewer now supports both legacy and arrow data sources * added pure rust example for text entries * plugging everything on the python side * self-review * bruh * it's decent, but we won't ever be able to deduplicate that way :( * Revert "it's decent, but we won't ever be able to deduplicate that way :(" This reverts commit f0f4e00. 
* Demonstrating how insanely slow the obvious solution is datastore/insert/batch/rects/insert time: [387.54 µs 387.98 µs 388.52 µs] thrpt: [25.739 Melem/s 25.775 Melem/s 25.804 Melem/s] change: time: [+227.27% +227.92% +228.56%] (p = 0.00 < 0.05) thrpt: [-69.564% -69.505% -69.444%] Performance has regressed. * it'd be a tiny bit better with some kind of splats... datastore/insert/batch/rects/insert time: [284.35 µs 284.55 µs 284.86 µs] thrpt: [35.105 Melem/s 35.144 Melem/s 35.167 Melem/s] change: time: [+137.45% +138.08% +138.52%] (p = 0.00 < 0.05) thrpt: [-58.075% -57.997% -57.885%] Performance has regressed. * and now with MsgId being a full fledged component datastore/insert/batch/rects/insert time: [180.84 µs 184.42 µs 188.96 µs] thrpt: [52.920 Melem/s 54.225 Melem/s 55.296 Melem/s] change: time: [+1.0072% +2.1236% +3.3206%] (p = 0.00 < 0.05) thrpt: [-3.2139% -2.0795% -0.9972%] * stuff * bruh * storing msg metadata * Revert "implement timepoints-as-components and insert them automatically" This reverts commit e6a6fd5. * I guess it's decent * implement PoV-less, always-yield lower-level API + adapt higher-level one * addressing PR comments * ported to new low-level APIs * xxx * addressed PR comments * self and not-so-self reviews * the future is quite literally here * add MsgBundle helpers * remove hardwired ref to Instance * post-merge fixes * cleanup * guess that isnt very useful anymore * using new msgbundle helpers * updating my non-sensical msgbundle helpers * explaining the example * woops * addressed PR comments
This does what it says it does... but it also kills usage of `Array::slice()`, which turns out to be very slow.
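To make the "row, not a [row]" distinction concrete, here is a minimal sketch using a toy list column. Everything in it (`ListColumn`, `get_wrapped`, `get`) is hypothetical and written for illustration only; it is not the re_datastore or arrow API, and it does not reproduce the `Array::slice()` cost mentioned above. It only contrasts the two return shapes: a one-row list that the caller has to unwrap, versus the row's values handed back directly.

```rust
/// Toy stand-in for a list-typed column: each row is a run of values.
/// Hypothetical type for illustration; not the re_datastore or arrow API.
struct ListColumn {
    /// `offsets[i]..offsets[i + 1]` delimits row `i` inside `values`.
    offsets: Vec<usize>,
    values: Vec<f32>,
}

impl ListColumn {
    /// Builds a column by laying the rows out back to back.
    fn from_rows(rows: &[&[f32]]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for row in rows {
            values.extend_from_slice(row);
            offsets.push(values.len());
        }
        Self { offsets, values }
    }

    fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// The `[row]` shape: a fresh one-row list column that the caller
    /// still has to unwrap before reaching the data.
    fn get_wrapped(&self, row: usize) -> Option<ListColumn> {
        let values = self.get(row)?.to_vec();
        Some(ListColumn {
            offsets: vec![0, values.len()],
            values,
        })
    }

    /// The `row` shape: the row's values, borrowed directly.
    /// Returns `None` when `row` is out of bounds.
    fn get(&self, row: usize) -> Option<&[f32]> {
        let start = *self.offsets.get(row)?;
        let end = *self.offsets.get(row.checked_add(1)?)?;
        self.values.get(start..end)
    }
}
```

With this sketch, a caller of `get` works on the values immediately, whereas a caller of `get_wrapped` first has to ask the returned column for its row 0.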