Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IterableDataProvider::supported_locales is a misnomer #4872

Closed
sffc opened this issue May 6, 2024 · 15 comments
Closed

IterableDataProvider::supported_locales is a misnomer #4872

sffc opened this issue May 6, 2024 · 15 comments
Assignees
Labels
2.0-breaking Changes that are breaking API changes C-data-infra Component: provider, datagen, fallback, adapters

Comments

@sffc
Copy link
Member

sffc commented May 6, 2024

Now that we have a more well-established notion of a "supported locale" (#58), we should rename this trait function.

I suggest: iter_locales, and maybe make it return an iterator (if not too hard).

@sffc sffc added C-data-infra Component: provider, datagen, fallback, adapters 2.0-breaking Changes that are breaking API changes labels May 6, 2024
@sffc sffc added this to the ICU4X 2.0 milestone May 6, 2024
@robertbastian
Copy link
Member

I believe this should be

fn supported_requests(&self) -> Result<HashSet<DataRequest>, DataError>

This is because after moving key attributes to the request, supported-locales is not enough to enumerate all supported requests. And if we're using the "requests" phrasing, this will not clash with the notion of supported locales. Also, for performance, I'd like this to return a HashSet instead of an impl Iterator, like we already do internally in datagen.

@sffc
Copy link
Member Author

sffc commented May 31, 2024

I see you implemented this in #4981

I'm still a little worried about calling it "supported", since the set of supported requests is not finite, but it is not as bad as "supported locales". Calling it "iter" takes away any notion of the semantics of the return value.

@robertbastian
Copy link
Member

I made the change from locales to requests, as the PR requires that, and from Vec to HashSet, as that gets rid of a hash-set-to-vec conversion in icu_datagen. I have not made a final decision on the name.

@sffc
Copy link
Member Author

sffc commented May 31, 2024

Yes, those two changes are good. I would say to use BTreeSet since it is in alloc but we can't do that easily because we don't have Ord, and HashSet is probably going to be a bit faster anyway since there could be hundreds or thousands of entries. Just need to bikeshed the name.

@sffc
Copy link
Member Author

sffc commented May 31, 2024

As part of fixing this, we should make the value of the HashSet be something that can implement Borrow to a fully borrowed type so that lookups can be cheaper. #4981 (comment)

@sffc
Copy link
Member Author

sffc commented May 31, 2024

I suggested up there that icu_provider should export a helper struct called something like LocaleWithAttributes that has all the impls we need.

Such a struct could be a field of DataRequest, but I am not proposing that.

@sffc
Copy link
Member Author

sffc commented May 31, 2024

^ If we use this name, the function can be called iter_locales_with_attributes. Problem solved.

@robertbastian
Copy link
Member

Such a struct could be a field of DataRequest, but I am not proposing that.

No, because we use the fact that locale and attributes can be independently borrowed in fallback.

@sffc
Copy link
Member Author

sffc commented May 31, 2024

It would be another public exhaustive struct so things could still be borrowed separately

@robertbastian
Copy link
Member

A struct over two borrowed values cannot be borrowed as a struct over the owned values. The pair we're currently using is basically a struct, so the problem is not solved by having a struct.

Re name, what about allowed_requests?

@sffc
Copy link
Member Author

sffc commented Jun 3, 2024

Maybe just make them both cows?

pub struct LocaleAndAttributes<'a, 'b> {
    pub locale: Cow<'a, DataLocale>, // or LanguageIdentifier
    pub key_attributes: Cow<'b, KeyAttributes>, // or KeyAttributes<'b>
}

I'd rather have the indirection than all the extra cloning.

Re name, what about allowed_requests?

Maybe but I'm not sure "allowed" is much better than "supported". They both make it sound like these are the only requests you can pass in, but you can pass in any request.

@robertbastian
Copy link
Member

Yes I'm using Cows in #4995. This API can stay the same.

@robertbastian
Copy link
Member

robertbastian commented Jun 20, 2024

Preliminary discussion:

struct DataIdentityOwned { pub locale: DataLocale, pub marker_attributes: DataMarkerAttributes }
struct DataIdentityBorrowed<'a> { pub locale: &'a DataLocale, pub marker_attributes: &'a DataMarkerAttributes }

struct DataRequest<'a> { id: DataIdentityBorrowed<'a>, metadata: DataRequestMetadata }

/// A [`DataProvider`] that can iterate over all supported [`DataLocale`] for a certain marker.
///
/// Implementing this trait means that a data provider knows all of the data it can successfully
/// return from a load request.
pub trait IterableDataProvider<M: DataMarker>: DataProvider<M> {
    /// Returns a list of supported `DataRequests`, in owned form.
    fn iter_requests(&self) -> Result<BTreeSet<DataIdentityOwned>, DataError>;
}
pub trait Bikeshed<M: DataMarker>: DataProvider<M> { 
    fn can_load(&self, req: DataRequest) -> Result<bool, DataError>;
}
  • check_req is internal to DatagenProvider, so supports_request does not need to be part of the iter trait. The iter trait's invariant should be that it returns the tight set of requests, which is what check_req internally does

@sffc
Copy link
Member Author

sffc commented Jul 19, 2024

@robertbastian What is left on this issue?

@robertbastian
Copy link
Member

Nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes C-data-infra Component: provider, datagen, fallback, adapters
Projects
Status: Done
Development

No branches or pull requests

2 participants