Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DataPayload constructible from &'static M::Yokeable #3467

Merged
merged 8 commits into from
Jun 7, 2023

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented May 31, 2023

Part of #3020

This adds M::Yokeable: ZeroFrom<'static, M::Yokeable> bounds to DataPayload::map_project and DataPayload::try_map_project. This is a breaking change, however ZeroFrom is implemented for all data structs (through #[data_struct] or convention).

@jira-pull-request-webhook

This comment was marked as spam.

@jira-pull-request-webhook

This comment was marked as spam.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approach seems close to what we discussed.

Am hoping we can potentially reduce this down to Yoke<&'static M, C> with some tricks. I think it's possible by making the cart type more complex, and having with_mut() extract the cart object

pub struct DataPayload<M: DataMarker>(pub(crate) DataPayloadInner<M>);

pub(crate) enum DataPayloadInner<M: DataMarker> {
Yoke(Yoke<M::Yokeable, Option<Cart>>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: worried about stack size increase, and worried about runtime cost

Admittedly I do not know if there is a way around this, I just want to register the concern. It may be insurmountable and we can decide to do this anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this does become easy to cfg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime cost: solved by using as_borrowed, and the fact that we don't need to ZeroFrom everything.

Stack size: it seems to be, at worst, one extra word per payload. Maybe if we make Yoke have better niches, we could get this to be stack-neutral.

DataPayload(match self.0 {
DataPayloadInner::Yoke(yoke) => DataPayloadInner::Yoke(yoke.map_project(f)),
DataPayloadInner::Ref(r) => {
DataPayloadInner::Yoke(Yoke::new_owned(r.clone()).map_project(f))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this does have a cost. I'm not sure how we can avoid this cost. However, there are a bunch of cases in the code where we don't need owned mapping: ought we introduce map_project_ref() that does not require cloning carts, but does operate on references?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there are a few places in datetime where we map_project in runtime code, so I agree it would be nice if we didn't need to downgrade to ZeroFrom, but I'm not sure if that's avoidable in the general case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not avoidable in the general case, it is avoidable in DateTime.

Unfortunately it's not avoidable in the map property FFI code without significant changes :/ But we might be able to figure out a different way to type-erase there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we pay this cost on every DataPayload construction. With this PR, we only pay the cost when we map_project, so this is a strict improvement.

map_project_cloned() already operates on references.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it also clones the cart, which isn't great.

Something that operates on self, uses &Y and doesn't clone the cart is the missing niche here

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at the files in provider/core/src because I think the rest is from another PR

pub struct DataPayload<M: DataMarker>(pub(crate) DataPayloadInner<M>);

pub(crate) enum DataPayloadInner<M: DataMarker> {
Yoke(Yoke<M::Yokeable, Option<Cart>>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime cost: solved by using as_borrowed, and the fact that we don't need to ZeroFrom everything.

Stack size: it seems to be, at worst, one extra word per payload. Maybe if we make Yoke have better niches, we could get this to be stack-neutral.

provider/core/src/response.rs Outdated Show resolved Hide resolved
provider/core/src/response.rs Outdated Show resolved Hide resolved
DataPayloadInner::Ref(r) => {
let output: <M2::Yokeable as Yokeable<'static>>::Output =
f(Yokeable::transform(*r), PhantomData);
// Safety: <M2::Yokeable as Yokeable<'static>>::Output is the same type as M2::Yokeable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: just downgrade the thing to a Yoke via ZeroFrom instead of doing these unsafe gymnastics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? f operates on a reference so we can use that to avoid doing the expensive ZeroFrom which we're trying to avoid with this PR. It's not gymnastics if it's correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, well there's still necessarily a Clone or ZeroFrom happening inside of the f function, but the advantage is that in map_project_cloned we only need to do that on a subset of fields instead of the whole thing.

I think the safety comment is not quite right; the invariant is "The returned value must be destroyed before the data from was borrowing from is." The fact that you're setting 'a to 'static on line 390 is important to the safety of the borrowed data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "The returned value must be destroyed before the data from was borrowing from is." is irrelevant. The point here is that the types are the same but the compiler doesn't know, I could also use a transmute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I can't use a transmute because the compiler doesn't know they're the same size

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ideally like to avoid having any unsafety here.

I think all we need here is fn upgrade_output<Y: Yokeable<'static>>(output: Y::Output) -> Y, which internally calls Yokeable::make(). Does that make sense? Yokeable::make() is only scarily unsafe when the lifetimes involved aren't 'static.

provider/core/src/response.rs Outdated Show resolved Hide resolved
provider/core/src/response.rs Outdated Show resolved Hide resolved
provider/core/src/response.rs Show resolved Hide resolved
provider/core/src/response.rs Outdated Show resolved Hide resolved
@robertbastian
Copy link
Member Author

This PR starts at the ref commit.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the changes not in the diffbase

utils/zerofrom/src/zero_from.rs Show resolved Hide resolved
provider/core/src/response.rs Outdated Show resolved Hide resolved
provider/core/src/response.rs Outdated Show resolved Hide resolved
@jira-pull-request-webhook

This comment was marked as spam.

Self(DataPayloadInner::Yoke(Yoke::new_owned(data)))
}

#[doc(hidden)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be public?

@robertbastian robertbastian requested a review from sffc June 2, 2023 08:24
@sffc
Copy link
Member

sffc commented Jun 2, 2023

Discussed with @robertbastian: this PR is not a strict improvement. There are pros and cons:

  1. Pro: Makes BakedDataProvider faster, which shows up in a 5-15% improvement in the end-to-end datetime formatting benches
  2. Pro: Allows for const construction of DataPayload from a baked data provider
  3. Con: Some loss of performance of the postcard path
  4. Con: Some increase of code size, whether or not baked or postcard is used
  5. Note: We can't disable the Yoke variant across the board so long as there is library code using map_project

@Manishearth
Copy link
Member

I think the way to do this is probably Yoke<&'static Y, C> where the cart type is an enum between Box<Y> and the Rc/None variants. Projection turns things into a box variant

@Manishearth
Copy link
Member

I think the way to do this is probably Yoke<&'static Y, C> where the cart type is an enum between Box<Y> and the Rc/None variants. Projection turns things into a box variant if necessary

@robertbastian
Copy link
Member Author

It would have to be Box<(Y, Cart)>, right?

@robertbastian robertbastian added the discuss Discuss at a future ICU4X-SC meeting label Jun 5, 2023
@Manishearth
Copy link
Member

Hmm. Yeah maybe, depends on what we want to achieve.

@sffc
Copy link
Member

sffc commented Jun 5, 2023

If we can get map_project to work and keep the DataPayload in the StaticRef state, then we can actually disable the Yoke variant for non-postcard payloads, which I think was part of the motivation behind the issue. Basically it just means that DataPayload::map_project needs to be more restrictive than Yoke::map_project. You can return any field of the input, but you can't construct a new object and return it: you can return &T but not T.

The call sites of map_project in the datetime crate really take full advantage of the flexibility of map_project and use it to construct custom objects, though.

I have long thought that we need to rethink the data model of datetime formatters, and I made a comment recently that we should just resolve patterns into SmallVec<PatternItem> instead of doing Yoke gymnastics. This could be an impetus to make that change.

@Manishearth
Copy link
Member

Yeah though we also map_project() to type-erase CodePointTries in map property FFI.

But yes, that's what I was saying when I was talking about a map_project that still operates on references

@sffc
Copy link
Member

sffc commented Jun 6, 2023

Another thing I was thinking was that users who really want to map_project to a new type should use a new type like DataPayloadAlwaysYoke, or just Yoke directly. It doesn't solve the problem for datetime since those structs have really big stack size, but it works great for properties where the stack is not very big.

@robertbastian
Copy link
Member Author

I've added the databake AnyProvider simplification that this unlocks.

sffc
sffc previously approved these changes Jun 6, 2023
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm satisfied with the short-term and medium-term work that this change unlocks.

@robertbastian
Copy link
Member Author

I removed the datagen changes to keep this PR focused.

@robertbastian robertbastian requested a review from sffc June 6, 2023 23:41
DataPayload(DataPayloadInner::Yoke(
match self.0 {
DataPayloadInner::Yoke(yoke) => yoke,
DataPayloadInner::StaticRef(r) => Yoke::new_owned(zerofrom::ZeroFrom::zero_from(r)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this isn't expensive anymore, neat

DataPayloadInner::Ref(r) => {
let output: <M2::Yokeable as Yokeable<'static>>::Output =
f(Yokeable::transform(*r), PhantomData);
// Safety: <M2::Yokeable as Yokeable<'static>>::Output is the same type as M2::Yokeable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ideally like to avoid having any unsafety here.

I think all we need here is fn upgrade_output<Y: Yokeable<'static>>(output: Y::Output) -> Y, which internally calls Yokeable::make(). Does that make sense? Yokeable::make() is only scarily unsafe when the lifetimes involved aren't 'static.

@robertbastian robertbastian merged commit a576f41 into unicode-org:main Jun 7, 2023
@robertbastian robertbastian deleted the ref branch June 7, 2023 08:36
@Manishearth Manishearth mentioned this pull request Sep 21, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Discuss at a future ICU4X-SC meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants