Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a Compare trait in the cmp mod #527

Open
tisonkun opened this issue Jan 26, 2025 · 20 comments
Open

Introduce a Compare trait in the cmp mod #527

tisonkun opened this issue Jan 26, 2025 · 20 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@tisonkun
Copy link

tisonkun commented Jan 26, 2025

Proposal

Problem statement

We have core::cmp::Reverse to wrap a type for changing the ordering logic of a type T. However, under certain generic context, it would introduce unnecessary burden over the input type T where T can be qualified instead.

Introducing a Compare trait can help in such a situation that qualified the type T as is, and accepting a new type C: Compare, acting as a comparator, to switch among different comparison functions.

Motivating examples or use cases

We have functions like:

fn do_register_extremum<T, C>(registry: &mut FuncRegistry, name: &'static str)
where
    T: ConcreteType,
    C: Compare<T::Datum, T::Datum> + Default,
{ ... }

and use the compare traits:

do_register_extremum::<T, Natural<_>>(registry, "min");
do_register_extremum::<T, Rev<Natural<_>>>(registry, "max");

If using core::cmp::Reverse, it would require complex method forwarding to let Reverse<T> implement everything T does.

Take a more complex case, we make a PartialCompare trait based on the Compare trait

/// A trait like [`Compare`] but only requires partial comparison ability.
pub trait PartialCompare<L: ?Sized, R: ?Sized = L> {
    fn compare(&self, l: &L, r: &R) -> Option<Ordering>;

    fn compares_lt(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Less))
    }

    fn compares_le(&self, l: &L, r: &R) -> bool {
        matches!(
            self.compare(l, r),
            Some(Ordering::Less) | Some(Ordering::Equal)
        )
    }

    fn compares_gt(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Greater))
    }

    fn compares_ge(&self, l: &L, r: &R) -> bool {
        matches!(
            self.compare(l, r),
            Some(Ordering::Greater) | Some(Ordering::Equal)
        )
    }

    fn compares_eq(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Equal))
    }

    fn compares_ne(&self, l: &L, r: &R) -> bool {
        !matches!(self.compare(l, r), Some(Ordering::Equal))
    }
}

impl<L, R, C: Compare<L, R>> PartialCompare<L, R> for C {
    fn compare(&self, l: &L, r: &R) -> Option<Ordering> {
        Some(self.compare(l, r))
    }
}

and use it as:

    register_common_comparison::<IntType, IntType, Natural<_>>(registry);
    register_common_comparison::<UIntType, UIntType, Natural<_>>(registry);
    register_common_comparison::<FloatType, FloatType, Natural<_>>(registry);
    register_common_comparison::<BinaryType, BinaryType, Borrowing<Natural<[u8]>, _, _>>(registry);
    register_common_comparison::<StringType, StringType, Borrowing<Natural<str>, _, _>>(registry);
    register_common_comparison::<BooleanType, BooleanType, Natural<_>>(registry);
    register_common_comparison::<TimestampType, TimestampType, Natural<_>>(registry);
    register_common_comparison::<IntervalType, IntervalType, Natural<_>>(registry);
    register_common_comparison::<VariantType, VariantType, VariantCompare>(registry);

    register_common_comparison::<IntType, UIntType, NumericCompare>(registry);
    register_common_comparison::<UIntType, IntType, NumericCompare>(registry);
    register_common_comparison::<IntType, FloatType, NumericCompare>(registry);
    register_common_comparison::<UIntType, FloatType, NumericCompare>(registry);
    register_common_comparison::<FloatType, IntType, NumericCompare>(registry);
    register_common_comparison::<FloatType, UIntType, NumericCompare>(registry);

fn register_common_comparison<T, U, C>(registry: &mut FuncRegistry)
where
    T: ConcreteType,
    U: ConcreteType,
    for<'a> C: PartialCompare<T::DatumRef<'a>, U::DatumRef<'a>> + Default + Copy,
{ .. }

You can imagine if we have to wrap all the types here with Reverse or similar approaches, it would be quite challenging to forward all the traits and make the type system happy.

Solution sketch

Bringing Compare and PartialCompare into the core lib would help in this case, where the compare crate is linked above, and the PartialCompare implementation is inlined above.

I'd first submit this issue to see if this is the way to go, further implement details can go as follow-ups.

Add four traits:

pub trait PartialEqual<L: ?Sized, R: ?Sized = L> {
  fn eq(&self, l: &L, r: &R) -> bool;
  fn ne(&self, l: &L, r: &R) -> bool { .. }
}

pub trait Equal<T: ?Sized>: PartialEqual<T, T> {}

pub trait PartialCompare<L: ?Sized, R: ?Sized = L>: PartialEqual<L, R> {
  fn partial_cmp(&self, l: &L, r: &R) -> Option<Ordering>;
  fn lt(&self, l: &L, r: &R) -> bool { .. }
  fn le(&self, l: &L, r: &R) -> bool { .. }
  fn gt(&self, l: &L, r: &R) -> bool { .. }
  fn ge(&self, l: &L, r: &R) -> bool { .. }
}

pub trait Compare<T: ?Sized>: Equal<T> {
  fn cmp(&self, l: &T, r: &T) -> Ordering;
  fn max(&self, l: T, r: T) -> bool { .. }
  fn min(&self, l: T, r: T) -> bool { .. }
  fn clamp(&self, v: T, l: T, r: T) -> bool { .. }
}

Alternatives

Just use the compare crate or make a new crates.io crate.

However, the compare crate isn't updates for more than nine years, and somehow the core lib has cmp mod that already provides some functionality to support this use case. I wonder if it's suitable as a supplement to the core lib.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@tisonkun tisonkun added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jan 26, 2025
@tisonkun
Copy link
Author

cc @Amanieu as a libs-team member and ever touch the compare crate.

@the8472
Copy link
Member

the8472 commented Jan 26, 2025

Would it fit into existing collections like BTreeMap?

@tisonkun
Copy link
Author

@the8472 Using these Compare/PartialCompare trait in collections would introduce more topics to discuss.

I found a previous discussion in IRLO. It seems to be no blocker for adding a new generic parameter like the extension of allocator-api. But I'd break it down into a few steps:

  1. Determinate whether patching existing collections is possible (so far, it is)
  2. Initiate a proposal for the patch, for both implementation and compatibility.
  3. Implement it.

This is, however, not a blocker for introducing the Compare/PartialCompare trait as a core lib API for every Rust user, as they're proposed to be public APIs.

@the8472
Copy link
Member

the8472 commented Jan 26, 2025

This is, however, not a blocker for introducing the Compare/PartialCompare trait as a core lib API

Implementing is not a blocker, checking whether the design is compatible would be imo, since it would be quite unfortunate to add a mechanism to the standard library which doesn't integrate with other parts of the library.

@tisonkun
Copy link
Author

tisonkun commented Jan 26, 2025

@the8472 IIUC the integration is possible.

You can check binary-heap-plus linked above as an example. And the allocator-api show a way for evolution the generic of existing collections. HashMap also has an S type parameter that can be an analogy to BTreeMap's to-be-added Compare type parameter.

@programmerjake
Copy link
Member

finally a way to compare things in standard algorithms where you need extra state outside of the items you're comparing!

@programmerjake
Copy link
Member

I think also having a CompareEq trait would be useful where you don't have ordering, but just eq/ne, like PartialEq/Eq but with state.

@tisonkun
Copy link
Author

tisonkun commented Jan 28, 2025

@programmerjake Yeah. Then if we follow the Hash/Hasher/BuildeHasher analogy, we may have:

  • Compare/BuildCompare
  • PartialCompare/BuildPartialCompare
  • CompareEq/BuildCompareEq
  • PartialEq (UPDATE: I noticed that Eq and PartialEq have the same method and signature)

I'd like to confirm that you're agree on the general direction, and ask if you have better names suggested.

@programmerjake
Copy link
Member

@programmerjake Yeah. Then if we follow the Hash/Hasher/BuildeHasher analogy, we may have:

  • Compare/BuildCompare
  • PartialCompare/BuildPartialCompare
  • CompareEq/BuildCompareEq
  • PartialEq (UPDATE: I noticed that Eq and PartialEq have the same method and signature)

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

I don't think we need the Build* traits -- hasher needs them because each hasher impl almost always doesn't know which types it's being called on so needs to accumulate state over multiple calls and finally return a hash, whereas Compare* doesn't have that problem since it's only called once for each comparison and it knows which types it's comparing. Default is sufficient for creating a new Compare* instance when you don't want to pass a custom-constructed one in.

@the8472
Copy link
Member

the8472 commented Jan 28, 2025

If we add CustomEq we'd also need externalized hashing because a Map that uses custom key comparison can also need custom key hashing.

@tisonkun
Copy link
Author

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

Well. Then maybe for naming:

  • Compare
  • ComparePartial
  • CompareEq
  • ComparePartialEq

Default is sufficient for creating a new Compare* instance when you don't want to pass a custom-constructed one in.

Yes. This is the trait bound I currently use. I saw your comment mention "with state" so wonder whether we need a builder. And Default should be a optional bound if we'd support with_comparator(c) to pass a struct.

because a Map that uses custom key comparison can also need custom key hashing

I think this is dedicated to HashMap that requires K: Eq + Hash? What do you mean by "externalized hashing" in the context of comparator (this issue)?

@programmerjake
Copy link
Member

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

Well. Then maybe for naming:

  • Compare
  • ComparePartial
  • CompareEq
  • ComparePartialEq

looks good!

because a Map that uses custom key comparison can also need custom key hashing

I think this is dedicated to HashMap that requires K: Eq + Hash? What do you mean by "externalized hashing" in the context of comparator (this issue)?

Imagine you have a HashSet<String> except instead of using PartialEq, you supply CaseInsensitive: CompareEq, now you have a problem because "a" and "A" have different hashes but compare equal. having something like CompareEq but for hashing fixes that problem, so you'd have CaseInsensitive: BikeshedHash + CompareEq in HashSet<String, CaseInsensitive>

@programmerjake
Copy link
Member

since hashes don't easily combine and it's usually better to pass all input parts into a hashing algorithm and extract a hash at the end, maybe have an API like so:

pub trait BikeshedHash<T: ?Sized> {
    type Hasher<'a>: 'a
    where
        Self: 'a;
    fn build_hasher(&self) -> Self::Hasher<'_>;
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &T);
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64;
}

this allows recursive calls of hash, e.g.:

struct StringPair(String, String);
struct CaseInsensitive(DefaultHasher);
impl BikeshedHash<StringPair> for CaseInsensitive {
    type Hasher<'a> = RandomState;
    fn build_hasher(&self) -> Self::Hasher<'_> {
        self.0.build_hasher()
    }
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &StringPair) {
        self.hash(hasher, &value.0);
        self.hash(hasher, &value.1);
    }
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64 {
        hasher.finish()
    }
}

impl BikeshedHash<String> for CaseInsensitive {
    type Hasher<'a> = RandomState;
    fn build_hasher(&self) -> Self::Hasher<'_> {
        self.0.build_hasher()
    }
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &String) {
        value.len().hash(hasher);
        for b in value.bytes() {
            let b = b.to_ascii_uppercase(); // I'm lazy, so skip unicode
            b.hash(hasher);
        }
    }
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64 {
        hasher.finish()
    }
}

@Amanieu
Copy link
Member

Amanieu commented Jan 28, 2025

An alternative approach would be a separate type along the lines of hashbrown's HashTable which gives you full control over hashing and equality comparisons. An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection but can instead be passed in by reference only when it is needed. This allows it to be shared between multiple collections.

@programmerjake
Copy link
Member

well, if only the completely external to the container approach were in std for all the other things that use comparison, e.g. BTreeMap/Set, BinaryHeap, etc.

@tisonkun
Copy link
Author

I've updated the description "Solution sketch" for adding four traits (analogy to PartialEq/Eq/PartialOrd/Ord).

For collections like BinaryHeap and BTreeMap, we can add a new state field and its type parameter, revoke the trait bound of key in insert/remove methods, and update the constructor with comparator (default to natural order + K: Ord).

An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection

I wonder how we pass the customized Compare logic if we don't introduce a Compare trait. Or it's an implementation method with the Compare trait.

a problem because "a" and "A" have different hashes but compare equal

For user-defined type, we already require well-defined implementation. That said, if we wrap str in a new type and implement hash unaligned with equal, it would behave wrongly.

@tisonkun
Copy link
Author

For HashMap, I may need some time to work a clear solution. But it seems the internal HashMap(HashTable) can have a comparator as state to wrap the equivalent implementation?

@tisonkun
Copy link
Author

tisonkun commented Feb 3, 2025

An alternative approach would be a separate type along the lines of hashbrown's HashTable which gives you full control over hashing and equality comparisons. An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection but can instead be passed in by reference only when it is needed. This allows it to be shared between multiple collections.

@Amanieu I looked into this solution today and noticed that we don't expose HashTable in the std lib (apache/arrow-rs#6035 (comment)). Is there a plan to do so?

Anyway, having such a solution with the current status to HashSet/HashMap may unblock the concern "whether the design is compatible".

@Amanieu
Copy link
Member

Amanieu commented Feb 4, 2025

@Amanieu I looked into this solution today and noticed that we don't expose HashTable in the std lib (apache/arrow-rs#6035 (comment)). Is there a plan to do so?

Not at the moment: the recommendation is for people who need HashTable to use it directly from the hashbrown crate. This is fine since HashMap/HashSet are sufficient for most use cases.

@the8472
Copy link
Member

the8472 commented Feb 4, 2025

We discussed this during today's libs-API meeting. We're sympathetic to making custom comparisons more ergonomic, but to make a decision we need sketches of the potential broader API designs (including alternatives and how ergonomic they are) and how this would fit into collections and other APIs that use comparisons.

A lot of the discussion in the meeting echoed comments from this thread. If we introduce a new trait in std it should also be usable by and with other std types and APIs. And a major potential user of that would be collections, sorting and iterators or collection contents.
For collections then there is a tradeoff is whether it's important for ergonomics to include the Compare in collections themselves or whether lower-level collection interfaces that take an externally supplied Compare on each operation to reduce the memory footprint when there are many instances and Compare isn't zero-sized.

Compare would likely also need need convenience APIs similar to the various *_by_key methods we have and which binary-heap-plus uses too. Or combinators like then from the compare crate. Such things also should be part of an API outline.


Now writing this comment I notice we didn't actually spend much time on discussing your motivation.

You say

You can imagine if we have to wrap all the types here with Reverse or similar approaches, it would be quite challenging to forward all the traits and make the type system happy.

Bringing Compare and PartialCompare into the core lib would help in this case, where the compare crate is linked above, and the PartialCompare implementation is inlined above.

Can you explain how just having the traits in core without further integration into the rest of the standard library would improve the problems you're facing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants