feat: Implement the SortBy stream adapter #43

Merged: 9 commits merged on Jan 31, 2024

Collaborator

@Hywan Hywan commented Jan 20, 2024

Sort is a VectorDiff stream adapter that presents a sorted view of the underlying ObservableVector items. A demonstration:

```rust
use eyeball_im::{ObservableVector, VectorDiff};
use eyeball_im_util::vector::VectorObserverExt;
use imbl::vector;
use std::cmp::Ordering;
use stream_assert::{assert_closed, assert_next_eq, assert_pending};

// A comparison function that is used to sort our
// `ObservableVector` values.
fn cmp<T>(left: &T, right: &T) -> Ordering
where
    T: Ord,
{
    left.cmp(right)
}

// Our vector.
let mut ob = ObservableVector::<char>::new();
let (values, mut sub) = ob.subscribe().sort_by(&cmp);
//                                              ^^^
//                                              | our comparison function

assert!(values.is_empty());
assert_pending!(sub);

// Append multiple unsorted values.
ob.append(vector!['d', 'b', 'e']);
// We get a `VectorDiff::Append` with sorted values!
assert_next_eq!(sub, VectorDiff::Append { values: vector!['b', 'd', 'e'] });

// Let's recap what we have. `ob` is our `ObservableVector`,
// `sub` is the “sorted view”/“sorted stream” of `ob`:
// | `ob`  | d b e |
// | `sub` | b d e |

// Append other multiple values.
ob.append(vector!['f', 'g', 'a', 'c']);
// We get three `VectorDiff`s!
assert_next_eq!(sub, VectorDiff::PushFront { value: 'a' });
assert_next_eq!(sub, VectorDiff::Insert { index: 2, value: 'c' });
assert_next_eq!(sub, VectorDiff::Append { values: vector!['f', 'g'] });

// Let's recap what we have:
// | `ob`  | d b e f g a c |
// | `sub` | a b c d e f g |
//           ^   ^     ^^^
//           |   |     |
//           |   |     with `VectorDiff::Append { .. }`
//           |   with `VectorDiff::Insert { index: 2, .. }`
//           with `VectorDiff::PushFront { .. }`

// Technically, `Sort` emits `VectorDiff`s that mimic a sorted `Vector`.

drop(ob);
assert_closed!(sub);
```

The code above demonstrates how a VectorDiff::Append is handled, but every other VectorDiff variant is handled too, and each may produce different VectorDiffs. I've tried to generate the VectorDiffs that best match the semantics the user might expect. For example, if a VectorDiff::Insert { index: 42, … } is received but the value ends up at the (sorted) index 0, a VectorDiff::PushFront is emitted instead of a VectorDiff::Insert { index: 0, … }.
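To make that last rule concrete, here is a minimal sketch of how a sorted position could be mapped to the most specific diff. It is not the code from this PR: `Diff` is a local stand-in for a few `VectorDiff` variants, `insert_sorted` is a hypothetical helper, and emitting `PushBack` for the last position is an assumption made by analogy with the `PushFront` case.

```rust
/// Local, simplified stand-in for a few `VectorDiff` variants (hypothetical).
#[derive(Debug, PartialEq)]
enum Diff<T> {
    PushFront { value: T },
    PushBack { value: T },
    Insert { index: usize, value: T },
}

/// Inserts `value` into the already sorted `sorted` buffer and returns the
/// diff that a sorted view would need to apply to stay in sync.
fn insert_sorted<T: Ord + Clone>(sorted: &mut Vec<T>, value: T) -> Diff<T> {
    // First position whose element is strictly greater than `value`.
    let index = sorted.partition_point(|existing| existing <= &value);
    sorted.insert(index, value.clone());

    if index == 0 {
        // Sorted index 0: prefer `PushFront` over `Insert { index: 0, .. }`.
        Diff::PushFront { value }
    } else if index == sorted.len() - 1 {
        // Assumption: the symmetric case is reported as `PushBack`.
        Diff::PushBack { value }
    } else {
        Diff::Insert { index, value }
    }
}
```

Starting from `['b', 'd', 'e']`, inserting `'a'` yields `PushFront` and then inserting `'c'` yields `Insert { index: 2, .. }`, which matches the diffs shown in the demonstration.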

The code is extensively documented, which I hope helps to understand the implementation. It's pretty basic actually. The only difficulty is that it breaks your brain because you have to constantly ask yourself: “Is this index for the sorted or the unsorted position?” I've tried to use consistent and clear naming for variables.
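The two index spaces can be illustrated with a small sketch. It assumes a buffer of `(UnsortedIndex, value)` pairs kept in sorted order; only the `UnsortedIndex` alias appears in the reviewed diff, while the buffer layout and the `remove_by_unsorted_index` helper are hypothetical.

```rust
/// Position of a value in the input (unsorted) vector.
type UnsortedIndex = usize;

/// Removes the item that sits at `unsorted_index` in the input vector.
///
/// `buffer` is ordered by value, so a position in `buffer` is a *sorted*
/// index, while the first field of each pair is the *unsorted* index.
/// Returns the sorted index of the removed item, if any.
fn remove_by_unsorted_index<T>(
    buffer: &mut Vec<(UnsortedIndex, T)>,
    unsorted_index: UnsortedIndex,
) -> Option<usize> {
    // Translate the unsorted index into a sorted index.
    let sorted_index = buffer
        .iter()
        .position(|(index, _)| *index == unsorted_index)?;
    buffer.remove(sorted_index);

    // Items that came after the removed one in the input shift left by one.
    for (index, _) in buffer.iter_mut() {
        if *index > unsorted_index {
            *index -= 1;
        }
    }

    Some(sorted_index)
}
```

With the input `d b e`, the buffer is `[(1, 'b'), (0, 'd'), (2, 'e')]`. Removing unsorted index 0 (`'d'`) returns sorted index 1 and leaves `[(0, 'b'), (1, 'e')]`.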

The motivation behind this patch is to replace `arrayvec` with
`smallvec`. Why? Because `arrayvec` is an array-backed vector: once its
full capacity is reached, it cannot grow any more. That was fine up
until now because it was only used by the `Limit` stream adapter, which
needs only 2 values in a buffer typed as an `ArrayVec`. But for other
stream adapters, it's a limitation. What if a stream adapter needs a
buffer of more than 2 values?

Ideally, we want to preserve the stack-allocated property for obvious
performance reasons (cache locality, reducing allocator traffic and so
on). However, we also want to be able to fall back to the heap for
larger allocations.

`smallvec` provides this feature. Hence, it does what `arrayvec` does,
with the addition of heap-allocated fallback when it's needed.
This type alias is renamed to `VectorDiffContainerStreamBuffer`.
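The "inline first, heap as a fallback" behaviour described above can be sketched with the standard library only. This is a toy illustration, not how the `smallvec` crate is implemented, and `SmallBuf` with its methods is a hypothetical type.

```rust
/// Toy buffer that stores up to `N` items inline and spills to a
/// heap-allocated `Vec` once that capacity is exceeded (hypothetical type).
enum SmallBuf<T, const N: usize> {
    Inline { items: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallBuf<T, N> {
    fn new() -> Self {
        SmallBuf::Inline { items: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            SmallBuf::Inline { items, len } => {
                if *len < N {
                    // There is still room in the inline storage.
                    items[*len] = Some(value);
                    *len += 1;
                } else {
                    // Inline storage is full: move everything to the heap.
                    let mut spilled: Vec<T> =
                        items.iter_mut().filter_map(|slot| slot.take()).collect();
                    spilled.push(value);
                    *self = SmallBuf::Heap(spilled);
                }
            }
            SmallBuf::Heap(items) => items.push(value),
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallBuf::Inline { len, .. } => *len,
            SmallBuf::Heap(items) => items.len(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallBuf::Inline { .. })
    }
}
```

A `SmallBuf::<u8, 2>` stays inline for the first two pushes and moves to the heap on the third, whereas a fixed-capacity buffer would have to reject that third item.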
Owner

@jplatte jplatte left a comment

A few general comments before I dive into the implementation:

  • I don't like that Limit now uses SmallVec. I'd prefer to rename the associated Buffer type back to LimitBuf and introduce a separate SortBuf that's either a SmallVec or just a regular Vec for the non-batching subscriber. If you consider the SmallVec usage important I don't mind the extra dependency, but Limit should continue using ArrayVec IMO. In that case I have a slight preference for using SmallVec<[T; 4]> over SmallVec<[T; 2]> but that's just my personal feeling about what might be a reasonable cut-off for stack vs. heap, feel free to ignore it.
  • The name sort is used in std and other libraries to sort based on the element type's Ord implementation. The closure-taking form is called sort_by, so the same name should be used here. We can keep sort for the module name though and reuse most of the implementation for a Sort adapter that does not take a sorting function (and SortByKey that takes a key function) later.
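For reference, the naming convention mentioned in the second point looks like this on standard slices. The `sort_family` function is only a hypothetical wrapper used to group the three calls.

```rust
/// Shows the three standard slice sorting methods side by side.
fn sort_family() -> (Vec<char>, Vec<char>, Vec<&'static str>) {
    // `sort` relies on the element type's `Ord` implementation.
    let mut by_ord = vec!['d', 'b', 'e'];
    by_ord.sort();

    // `sort_by` takes a comparison function (here: reverse order).
    let mut by_cmp = vec!['d', 'b', 'e'];
    by_cmp.sort_by(|left, right| right.cmp(left));

    // `sort_by_key` takes a key extraction function (here: string length).
    let mut by_key = vec!["ccc", "a", "bb"];
    by_key.sort_by_key(|word| word.len());

    (by_ord, by_cmp, by_key)
}
```

The adapter in this PR takes a comparison function, which corresponds to the `sort_by` form.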

Owner

@jplatte jplatte left a comment

Rough first review

VectorDiffContainerStreamElement,
};

type UnsortedIndex = usize;
Owner

In the Filter adapter, the "index of the element in the input vector" is called the original index. Would you mind using the same name here?

Collaborator Author

I don't believe the naming should be identical in all stream adapters. Semantics take priority over consistency in some cases, and I reckon this is one of them. What matters, to me, is that we are facing an index that represents the unsorted position of the value. Whether or not it is the original index isn't really important. Thoughts?

Owner

Yes, the semantics are what's important definitely. I still think it's a clearer name. The input / original list could already be sorted ;)

I'd also be okay with input / output index naming (for all adapters that re-order or filter items), how do you feel about that option?

Collaborator Author

Do you want all stream adapters to use the same terminology?

`Sort` is a `VectorDiff` stream adapter that presents a sorted view of the
underlying `ObservableVector` items.

```rust
use eyeball_im::{ObservableVector, VectorDiff};
use eyeball_im_util::vector::VectorObserverExt;
use imbl::vector;
use std::cmp::Ordering;
use stream_assert::{assert_closed, assert_next_eq, assert_pending};

// A comparison function that is used to sort our
// `ObservableVector` values.
fn cmp<T>(left: &T, right: &T) -> Ordering
where
    T: Ord,
{
    left.cmp(right)
}

// Our vector.
let mut ob = ObservableVector::<char>::new();
let (values, mut sub) = ob.subscribe().sort(cmp);
//                                          ^^^
//                                          | our comparison function

assert!(values.is_empty());
assert_pending!(sub);

// Append multiple unsorted values.
ob.append(vector!['d', 'b', 'e']);
// We get a `VectorDiff::Append` with sorted values!
assert_next_eq!(sub, VectorDiff::Append { values: vector!['b', 'd', 'e'] });

// Let's recap what we have. `ob` is our `ObservableVector`,
// `sub` is the “sorted view”/“sorted stream” of `ob`:
// | `ob`  | d b e |
// | `sub` | b d e |

// Append other multiple values.
ob.append(vector!['f', 'g', 'a', 'c']);
// We get three `VectorDiff`s!
assert_next_eq!(sub, VectorDiff::PushFront { value: 'a' });
assert_next_eq!(sub, VectorDiff::Insert { index: 2, value: 'c' });
assert_next_eq!(sub, VectorDiff::Append { values: vector!['f', 'g'] });

// Let's recap what we have:
// | `ob`  | d b e f g a c |
// | `sub` | a b c d e f g |
//           ^   ^     ^^^
//           |   |     |
//           |   |     with `VectorDiff::Append { .. }`
//           |   with `VectorDiff::Insert { index: 2, .. }`
//           with `VectorDiff::PushFront { .. }`

// Technically, `Sort` emits `VectorDiff` that mimics a sorted `Vector`.

drop(ob);
assert_closed!(sub);
```
@Hywan Hywan changed the title from “feat: Implement the Sort stream adapter” to “feat: Implement the SortBy stream adapter” on Jan 22, 2024
Collaborator Author

Hywan commented Jan 22, 2024

I've written a long comment but it's been deleted… Damn… I'll try to summarize 😛 .

> • I don't like that Limit now uses SmallVec. I'd prefer to rename the associated Buffer type back to LimitBuf and introduce a separate SortBuf that's either a SmallVec or just a regular Vec for the non-batching subscriber. If you consider the SmallVec usage important I don't mind the extra dependency, but Limit should continue using ArrayVec IMO. In that case I have a slight preference for using SmallVec<[T; 4]> over SmallVec<[T; 2]> but that's just my personal feeling about what might be a reasonable cut-off for stack vs. heap, feel free to ignore it.

So. SmallVec does what ArrayVec does except that it moves to the heap if more space is required.

Inside Limit, we are sure that at most 2 VectorDiff will be generated, hence SmallVec<[T; 2]> provides the same guarantees as ArrayVec<T, 2>.

Inside SortBy, for most operations (like insert, remove, push_front, push_back and so on), we are sure that the number of generated VectorDiff is at most 1. There are 2 exceptions:

  • set generates at most 2 VectorDiff,
  • append generates at best 1 VectorDiff, but it can be many.

If we increase the capacity of SmallVec from 2 to 4, we will simply waste stack space most of the time.
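As an illustration of why set can need 2 diffs, here is a minimal sketch, assuming the adapter keeps a sorted buffer and already knows the sorted position of the replaced value. `Diff` is a local stand-in for `VectorDiff`, `set_sorted` is a hypothetical helper, and the diffs the real adapter emits may differ (for example, it might prefer `PushFront` over `Insert { index: 0, .. }`).

```rust
/// Local, simplified stand-in for a few `VectorDiff` variants (hypothetical).
#[derive(Debug, PartialEq)]
enum Diff<T> {
    Set { index: usize, value: T },
    Remove { index: usize },
    Insert { index: usize, value: T },
}

/// Replaces the item at sorted position `old_index` with `value` and returns
/// the diffs a sorted view must apply: one if the position is unchanged,
/// two otherwise.
fn set_sorted<T: Ord + Clone>(sorted: &mut Vec<T>, old_index: usize, value: T) -> Vec<Diff<T>> {
    sorted.remove(old_index);
    // Sorted position of the new value once the old one is gone.
    let new_index = sorted.partition_point(|existing| existing <= &value);
    sorted.insert(new_index, value.clone());

    if new_index == old_index {
        // The new value keeps the same sorted position: a single `Set`.
        vec![Diff::Set { index: old_index, value }]
    } else {
        // The value moves: remove it from the old position, insert at the new one.
        vec![Diff::Remove { index: old_index }, Diff::Insert { index: new_index, value }]
    }
}
```

In both branches the result holds at most 2 diffs, so an inline capacity of 2 is enough for this operation.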

We can make N in SmallVec<[T; N]> configurable through our trait system if the need arises (it's still not super clear how to make this ergonomic, but it is possible).

To conclude, I don't see the benefit of using ArrayVec over SmallVec for Limit, and I believe that SmallVec<[T; 2]> is fine for the majority of use cases in SortBy.

Collaborator Author

Hywan commented Jan 23, 2024

Just a note that I’m improving the main “find the position” algorithm to use a binary search. It’s coming soon.

Owner

jplatte commented Jan 26, 2024

> So. SmallVec does what ArrayVec does except that it moves to the heap if more space is required.

I know. I like the simplicity of a fixed capacity where nothing more is necessary. I still prefer keeping arrayvec for the Limit adapter. The reasoning for using SmallVec<[T; 2]> instead of something else like SmallVec<[T; 4]> for SortBy makes sense to me.

This patch uses a binary search when the position of a value must be
found. Finding the position of a new value with the `compare` function
can be costly, because a linear scan calls it once per item; a binary
search keeps the number of comparisons logarithmic.
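A minimal sketch of such a lookup, assuming the buffer is kept sorted according to the same comparison function. `find_insert_position` is a hypothetical helper built on the standard `binary_search_by`; the search written in this PR may be structured differently.

```rust
use std::cmp::Ordering;

/// Returns the position at which `value` can be inserted into `sorted`
/// while keeping it ordered according to `compare`.
///
/// `sorted` must already be ordered according to `compare`.
fn find_insert_position<T, F>(sorted: &[T], value: &T, compare: F) -> usize
where
    F: Fn(&T, &T) -> Ordering,
{
    // `binary_search_by` calls the closure O(log n) times. `Ok` means an
    // equal item was found at that index, `Err` carries the insertion point.
    match sorted.binary_search_by(|existing| compare(existing, value)) {
        Ok(index) | Err(index) => index,
    }
}
```

For the sorted buffer `['b', 'd', 'e']`, looking up `'c'` returns 1 and looking up `'a'` returns 0.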
Owner

@jplatte jplatte left a comment

I'll take care of the ArrayVec / SmallVec difference separately from this PR.

@jplatte jplatte merged commit 5829029 into jplatte:main Jan 31, 2024
6 checks passed
Owner

jplatte commented Feb 16, 2024

Released as part of eyeball-im-util 0.5.3.
