feat: Implement the `SortBy` stream adapter #43

Hywan · 2024-01-20T20:08:16Z

SortBy is a VectorDiff stream adapter that presents a sorted view of the underlying ObservableVector items. A demonstration:

use eyeball_im::{ObservableVector, VectorDiff};
use eyeball_im_util::vector::VectorObserverExt;
use imbl::vector;
use std::cmp::Ordering;
use stream_assert::{assert_closed, assert_next_eq, assert_pending};

// A comparison function that is used to sort our
// `ObservableVector` values.
fn cmp<T>(left: &T, right: &T) -> Ordering
where
    T: Ord,
{
    left.cmp(right)
}

// Our vector.
let mut ob = ObservableVector::<char>::new();
let (values, mut sub) = ob.subscribe().sort_by(&cmp);
//                                              ^^^
//                                              | our comparison function

assert!(values.is_empty());
assert_pending!(sub);

// Append multiple unsorted values.
ob.append(vector!['d', 'b', 'e']);
// We get a `VectorDiff::Append` with sorted values!
assert_next_eq!(sub, VectorDiff::Append { values: vector!['b', 'd', 'e'] });

// Let's recap what we have. `ob` is our `ObservableVector`,
// `sub` is the “sorted view”/“sorted stream” of `ob`:
// | `ob`  | d b e |
// | `sub` | b d e |

// Append other multiple values.
ob.append(vector!['f', 'g', 'a', 'c']);
// We get three `VectorDiff`s!
assert_next_eq!(sub, VectorDiff::PushFront { value: 'a' });
assert_next_eq!(sub, VectorDiff::Insert { index: 2, value: 'c' });
assert_next_eq!(sub, VectorDiff::Append { values: vector!['f', 'g'] });

// Let's recap what we have:
// | `ob`  | d b e f g a c |
// | `sub` | a b c d e f g |
//           ^   ^     ^^^
//           |   |     |
//           |   |     with `VectorDiff::Append { .. }`
//           |   with `VectorDiff::Insert { index: 2, .. }`
//           with `VectorDiff::PushFront { .. }`

// Technically, `SortBy` emits `VectorDiff`s that mimic a sorted `Vector`.

drop(ob);
assert_closed!(sub);

The code above demonstrates how a VectorDiff::Append is handled, but every other VectorDiff variant is handled too, and each one produces its own VectorDiffs. I've tried to generate VectorDiffs that are as correct as possible with regard to the semantics the user might expect. For example, if a VectorDiff::Insert { index: 42, … } is received but the value ends up at the (sorted) index 0, a VectorDiff::PushFront is emitted instead of a VectorDiff::Insert { index: 0, … }.
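To make the variant selection concrete, here is a minimal sketch. The `Diff` enum and `diff_for_insert` are hypothetical names used only for illustration; they are not part of `eyeball_im`. Only the `PushFront` case is described in this PR; emitting `PushBack` at the end of the sorted view is an assumption.

```rust
/// Hypothetical, simplified stand-in for the insertion-related variants of
/// `VectorDiff` (the real enum lives in `eyeball_im`).
#[derive(Debug, PartialEq)]
enum Diff<T> {
    PushFront { value: T },
    PushBack { value: T },
    Insert { index: usize, value: T },
}

/// Pick the most specific diff for inserting `value` at `sorted_index` in a
/// sorted view that currently holds `len` items.
fn diff_for_insert<T>(sorted_index: usize, len: usize, value: T) -> Diff<T> {
    if sorted_index == 0 {
        // Described above: index 0 becomes a `PushFront`.
        Diff::PushFront { value }
    } else if sorted_index >= len {
        // Assumption: the end of the view becomes a `PushBack`.
        Diff::PushBack { value }
    } else {
        Diff::Insert { index: sorted_index, value }
    }
}
```

The index received from the unsorted input (42 in the example above) plays no role here: only the position in the sorted view decides which variant is emitted.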

The code is extensively documented, which I hope helps to understand how this is implemented. It's pretty basic actually. The only difficulty is that it breaks your brain because you have to constantly ask yourself: “Is this index for the sorted or the unsorted position?” I've tried to use consistent and clear naming for variables.
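One way to keep the two kinds of index apart is to store each value next to its unsorted index, so that the position inside the buffer is the sorted index. The sketch below assumes such a layout; `sorted_with_unsorted_indices` and `sorted_index_of` are hypothetical helpers, and the actual data structure used in `sort.rs` may differ.

```rust
use std::cmp::Ordering;

/// Position of a value in the *input* (unsorted) vector.
type UnsortedIndex = usize;

/// Build a sorted buffer that remembers, for every value, where it sits in
/// the unsorted input. The position inside the returned `Vec` is the sorted
/// index.
fn sorted_with_unsorted_indices<T, F>(input: &[T], compare: F) -> Vec<(UnsortedIndex, T)>
where
    T: Clone,
    F: Fn(&T, &T) -> Ordering,
{
    let mut buffer: Vec<(UnsortedIndex, T)> = input.iter().cloned().enumerate().collect();
    // `sort_by` is stable, so equal values keep their input order.
    buffer.sort_by(|(_, left), (_, right)| compare(left, right));
    buffer
}

/// Translate an unsorted index into the matching sorted index, if any.
fn sorted_index_of<T>(
    buffer: &[(UnsortedIndex, T)],
    unsorted_index: UnsortedIndex,
) -> Option<usize> {
    buffer.iter().position(|(index, _)| *index == unsorted_index)
}
```

For the input `d b e`, the buffer holds `(1, 'b'), (0, 'd'), (2, 'e')`: the unsorted index 0 (`d`) maps to the sorted index 1.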

The motivation behind this patch is to replace `arrayvec` by `smallvec`. Why? Because `arrayvec` is an array-backed vector: once its full capacity is reached, it cannot grow any more. That was fine up until now because it was only used by the `Limit` stream adapter, which needs only 2 values in a buffer that is typed by an `ArrayVec`. But for other stream adapters, it's a limitation. What if a stream adapter needs a buffer of more than 2 values? Ideally, we want to preserve the stack-allocated property for performance reasons (cache locality, reduced allocator traffic and so on). However, we also want to be able to fall back to the heap for larger allocations. `smallvec` provides this feature: it does what `arrayvec` does, with the addition of a heap-allocated fallback when it's needed.
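The "inline first, heap on overflow" behaviour can be illustrated with a small hand-rolled buffer. This is only a sketch of the idea with a hypothetical `SmallBuf` type; it is not how `smallvec` is implemented, and real code would use `SmallVec<[T; 2]>` directly.

```rust
/// Hypothetical buffer that stores up to 2 items inline and moves to a
/// heap-allocated `Vec` once a third item is pushed.
#[derive(Debug)]
enum SmallBuf<T> {
    Inline { items: [Option<T>; 2], len: usize },
    Heap(Vec<T>),
}

impl<T> SmallBuf<T> {
    fn new() -> Self {
        SmallBuf::Inline { items: [None, None], len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            // There is still room inline.
            SmallBuf::Inline { items, len } if *len < items.len() => {
                items[*len] = Some(value);
                *len += 1;
            }
            // Inline storage is full: move everything to the heap.
            SmallBuf::Inline { items, .. } => {
                let mut heap: Vec<T> = items.iter_mut().filter_map(Option::take).collect();
                heap.push(value);
                *self = SmallBuf::Heap(heap);
            }
            // Already on the heap: a regular `Vec` push.
            SmallBuf::Heap(heap) => heap.push(value),
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallBuf::Inline { len, .. } => *len,
            SmallBuf::Heap(heap) => heap.len(),
        }
    }

    /// `true` once the data has moved to the heap.
    fn spilled(&self) -> bool {
        matches!(self, SmallBuf::Heap(_))
    }
}
```

A fixed-capacity buffer in the style of `ArrayVec` would have to reject the third `push`, whereas this buffer accepts it at the cost of one allocation.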

The `VectorDiffContainerStreamLimitBuf` type alias is renamed to `VectorDiffContainerStreamBuffer`.

jplatte

A few general comments before I dive into the implementation:

- I don't like that Limit now uses SmallVec. I'd prefer to rename the associated Buffer type back to LimitBuf and introduce a separate SortBuf that's either a SmallVec or just a regular Vec for the non-batching subscriber. If you consider the SmallVec usage important I don't mind the extra dependency, but Limit should continue using ArrayVec IMO. In that case I have a slight preference for using SmallVec<[T; 4]> over SmallVec<[T; 2]> but that's just my personal feeling about what might be a reasonable cut-off for stack vs. heap, feel free to ignore it.
- The name sort is used in std and other libraries to sort based on the element type's Ord implementation. The closure-taking form is called sort_by, so the same name should be used here. We can keep sort for the module name though and reuse most of the implementation for a Sort adapter that does not take a sorting function (and SortByKey that takes a key function) later.

jplatte

Rough first review

jplatte · 2024-01-21T10:18:10Z

eyeball-im-util/src/vector/sort.rs

+    VectorDiffContainerStreamElement,
+};
+
+type UnsortedIndex = usize;


In the Filter adapter, the "index of the element in the input vector" is called the original index. Would you mind using the same name here?

Hywan · 2024-01-22

I don't believe the naming should be identical in all stream adapters. The semantics take priority over consistency in some cases, and I reckon this is one of them. What matters, to me, is that we are facing an index that represents the unsorted position of the value. Whether or not it is the original index isn't really important. Thoughts?

jplatte · 2024-01-22

Yes, the semantics are what's important definitely. I still think it's a clearer name. The input / original list could already be sorted ;)

I'd also be okay with input / output index naming (for all adapters that re-order or filter items), how do you feel about that option?

Hywan · 2024-01-22

Do you want all stream adapters to use the same terminology?

eyeball-im-util/src/vector/sort.rs

`Sort` is `VectorDiff` stream adapter that presents a sorted view of the underlying `ObservableVector` items.

```rust
use eyeball_im::{ObservableVector, VectorDiff};
use eyeball_im_util::vector::VectorObserverExt;
use imbl::vector;
use std::cmp::Ordering;
use stream_assert::{assert_closed, assert_next_eq, assert_pending};

// A comparison function that is used to sort our
// `ObservableVector` values.
fn cmp<T>(left: &T, right: &T) -> Ordering
where
    T: Ord,
{
    left.cmp(right)
}

// Our vector.
let mut ob = ObservableVector::<char>::new();
let (values, mut sub) = ob.subscribe().sort(cmp);
//                                          ^^^
//                                          | our comparison function

assert!(values.is_empty());
assert_pending!(sub);

// Append multiple unsorted values.
ob.append(vector!['d', 'b', 'e']);
// We get a `VectorDiff::Append` with sorted values!
assert_next_eq!(sub, VectorDiff::Append { values: vector!['b', 'd', 'e'] });

// Let's recap what we have. `ob` is our `ObservableVector`,
// `sub` is the “sorted view”/“sorted stream” of `ob`:
// | `ob`  | d b e |
// | `sub` | b d e |

// Append other multiple values.
ob.append(vector!['f', 'g', 'a', 'c']);
// We get three `VectorDiff`s!
assert_next_eq!(sub, VectorDiff::PushFront { value: 'a' });
assert_next_eq!(sub, VectorDiff::Insert { index: 2, value: 'c' });
assert_next_eq!(sub, VectorDiff::Append { values: vector!['f', 'g'] });

// Let's recap what we have:
// | `ob`  | d b e f g a c |
// | `sub` | a b c d e f g |
//           ^   ^     ^^^
//           |   |     |
//           |   |     with `VectorDiff::Append { .. }`
//           |   with `VectorDiff::Insert { index: 2, .. }`
//           with `VectorDiff::PushFront { .. }`

// Technically, `Sort` emits `VectorDiff` that mimics a sorted `Vector`.

drop(ob);
assert_closed!(sub);
```

Hywan · 2024-01-22T10:57:39Z

I've written a long comment but it's been deleted… Damn… I'll try to summarize 😛 .

> I don't like that Limit now uses SmallVec. I'd prefer to rename the associated Buffer type back to LimitBuf and introduce a separate SortBuf that's either a SmallVec or just a regular Vec for the non-batching subscriber. If you consider the SmallVec usage important I don't mind the extra dependency, but Limit should continue using ArrayVec IMO. In that case I have a slight preference for using SmallVec<[T; 4]> over SmallVec<[T; 2]> but that's just my personal feeling about what might be a reasonable cut-off for stack vs. heap, feel free to ignore it.

So. SmallVec does what ArrayVec does except that it moves to the heap if more space is required.

Inside Limit, we are sure that at most 2 VectorDiffs will be generated, hence SmallVec<[T; 2]> provides the same guarantees as ArrayVec<T, 2>.

Inside SortBy, for most operations (like insert, remove, push_front, push_back and so on), we are sure that the number of generated VectorDiffs is at most 1. There are 2 exceptions:

- set generates at most 2 VectorDiffs,
- append generates at best 1 VectorDiff, but it can be many.

If we increase the capacity of SmallVec from 2 to 4, we will simply waste stack space most of the time.
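As an illustration of why set can need 2 VectorDiffs, here is a sketch under the assumption that a value whose sorted position changes is reported as a removal followed by an insertion. The `Diff` enum and `diffs_for_set` are hypothetical; the comment above only states the "at most 2" bound, not which variants are emitted.

```rust
/// Hypothetical, simplified stand-in for a few `VectorDiff` variants.
#[derive(Debug, PartialEq)]
enum Diff<T> {
    Set { index: usize, value: T },
    Remove { index: usize },
    Insert { index: usize, value: T },
}

/// Replace the value at `old_sorted_index` in an already sorted `Vec` and
/// return the diffs a subscriber would need to stay in sync.
fn diffs_for_set<T: Ord + Clone>(
    sorted: &mut Vec<T>,
    old_sorted_index: usize,
    new_value: T,
) -> Vec<Diff<T>> {
    sorted.remove(old_sorted_index);
    // First position whose value is greater than `new_value`.
    let new_sorted_index = sorted.partition_point(|value| *value <= new_value);
    sorted.insert(new_sorted_index, new_value.clone());

    if new_sorted_index == old_sorted_index {
        // Same position: a single diff is enough.
        vec![Diff::Set { index: old_sorted_index, value: new_value }]
    } else {
        // The value moved: 2 diffs, which fits a 2-item inline buffer.
        vec![
            Diff::Remove { index: old_sorted_index },
            Diff::Insert { index: new_sorted_index, value: new_value },
        ]
    }
}
```

In this sketch the `Insert` index is computed after the removal, so a subscriber can apply both diffs in order.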

We can make N in SmallVec<[T; N]> configurable through our trait system if the need arises (it's still not super clear how to make this ergonomic, but it is possible).

To conclude, I don't see the benefit of using ArrayVec over SmallVec for Limit, and I believe that SmallVec<[T; 2]> is fine for the majority of use cases in SortBy.

Hywan · 2024-01-23T07:45:59Z

Just a note that I’m improving the main “find the position” algorithm to use a binary search. It’s coming soon.

jplatte · 2024-01-26T20:14:21Z

> So. SmallVec does what ArrayVec does except that it moves to the heap if more space is required.

I know. I like the simplicity of a fixed capacity where nothing more is necessary. I still prefer keeping arrayvec for the Limit adapter. The reasoning for using SmallVec<[T; 2]> instead of something else like SmallVec<[T; 4]> for SortBy makes sense to me.

jplatte

I'll take care of the ArrayVec / SmallVec difference separately from this PR.

jplatte · 2024-02-16T21:26:09Z

Released as part of eyeball-im-util 0.5.3.

Hywan added 2 commits January 11, 2024 22:12

chore: Rename VectorDiffContainerStreamLimitBuf.

d0173f6

This type alias is renamed to `VectorDiffContainerStreamBuffer`.

Hywan force-pushed the feat-sort branch from b86e737 to f47a353 Compare January 20, 2024 20:10

Hywan requested a review from jplatte January 20, 2024 20:23

jplatte reviewed Jan 21, 2024

Hywan force-pushed the feat-sort branch from f47a353 to 4970ec0 Compare January 21, 2024 19:12

Hywan added 4 commits January 22, 2024 08:29

chore: Rename Sort to SortBy.

c51b10d

chore: Remove #[project = SortProj].

e8a51b6

test: Remove a temporary method.

15e77e5

chore: Format.

c729d94

Hywan changed the title ~~feat: Implement the Sort stream adapter~~ feat: Implement the SortBy stream adapter Jan 22, 2024

chore: Elide some lifetimes.

d2692ae

Hywan mentioned this pull request Jan 29, 2024

feat(base) Client-side sorting, prelude: Implement Client::rooms_stream matrix-org/matrix-rust-sdk#3068

Merged

feat: Use binary search for better performance.

c08bac6

This patch uses a binary search when the position of a value must be found. Finding a new value based on the `compare` function can be costly, hence the use of a binary search.
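A minimal sketch of the idea, using the standard library's `binary_search_by` on a sorted slice. `find_insert_position` is a hypothetical helper, not the function from the patch, and the real code works on the adapter's own buffer.

```rust
use std::cmp::Ordering;

/// Find the sorted index at which `value` should be inserted into `sorted`,
/// calling `compare` O(log n) times instead of O(n).
fn find_insert_position<T, F>(sorted: &[T], value: &T, compare: F) -> usize
where
    F: Fn(&T, &T) -> Ordering,
{
    match sorted.binary_search_by(|probe| compare(probe, value)) {
        // An equal value already exists: insert next to it.
        Ok(index) => index,
        // No equal value: `index` is where `value` keeps the order intact.
        Err(index) => index,
    }
}
```

With the sorted view `a b d e` from the demonstration, looking up `c` yields index 2, which matches the emitted `VectorDiff::Insert { index: 2, .. }`.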

jplatte approved these changes Jan 31, 2024

jplatte merged commit 5829029 into jplatte:main Jan 31, 2024
6 checks passed

Hywan mentioned this pull request Jun 21, 2024

feat(ui): Client-side sorting in RoomList matrix-org/matrix-rust-sdk#3585

Merged

4 tasks
