-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporate build tag into wheel prioritization #3781
Conversation
CodSpeed Performance ReportMerging #3781 will degrade performances by 6.1%Comparing Summary
Benchmarks breakdown
|
)] | ||
#[archive(check_bytes)] | ||
#[archive_attr(derive(Debug))] | ||
pub struct BuildTag(u32, Option<String>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more compact representation for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Using Option<Arc<str>>
here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>
, but I would go with Arc
because I noticed this is being cloned in version_map
.
Getting more compact than that I think would look more like how we deal with Version
: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.
For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]
. So something like this:
pub struct BuildTag(Arc<BuildTagInner>);
enum BuildTagInner {
Compact([u8; 8]),
General(u32, Option<Arc<str>>),
}
You'd probably want to use a bit somewhere in the [u8; 8]
to indicate whether the suffix is present or not.
But the problem here is that the size of BuildTagInner
is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.
However, the heap indirection here does reduce the size of BuildTag
itself to one word of memory. Maybe that's worth it to make WheelFilename
overall smaller. So, something like this:
pub struct BuildTag(Arc<BuildTagInner>);
struct BuildTagInner(u32, Option<Box<str>>);
But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.
Unless we have some data pointing at this as a problem, I'd just switch Option<String>
to Option<Arc<str>>
and call that good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. The vast majority of wheels have no build tag, and of those that do, I've only ever seen them be numerical, so good not to allocate there. (But the spec says they can be non-numerical, hence this dance.)
2a1949a
to
e6bed3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -614,7 +614,7 @@ impl CacheBucket { | |||
Self::FlatIndex => "flat-index-v0", | |||
Self::Git => "git-v0", | |||
Self::Interpreter => "interpreter-v1", | |||
Self::Simple => "simple-v7", | |||
Self::Simple => "simple-v8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I was looking for this!
self.abi_tag.join("."), | ||
self.platform_tag.join(".") | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not need the build tag somewhere in the formatted string?
)] | ||
#[archive(check_bytes)] | ||
#[archive_attr(derive(Debug))] | ||
pub struct BuildTag(u32, Option<String>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Using Option<Arc<str>>
here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>
, but I would go with Arc
because I noticed this is being cloned in version_map
.
Getting more compact than that I think would look more like how we deal with Version
: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.
For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]
. So something like this:
pub struct BuildTag(Arc<BuildTagInner>);
enum BuildTagInner {
Compact([u8; 8]),
General(u32, Option<Arc<str>>),
}
You'd probably want to use a bit somewhere in the [u8; 8]
to indicate whether the suffix is present or not.
But the problem here is that the size of BuildTagInner
is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.
However, the heap indirection here does reduce the size of BuildTag
itself to one word of memory. Maybe that's worth it to make WheelFilename
overall smaller. So, something like this:
pub struct BuildTag(Arc<BuildTagInner>);
struct BuildTagInner(u32, Option<Box<str>>);
But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.
Unless we have some data pointing at this as a problem, I'd just switch Option<String>
to Option<Arc<str>>
and call that good enough for now.
e6bed3e
to
1a5ee74
Compare
Summary
It turns out that in the spec, if a wheel filename includes a build tag, then we need to use it to break ties. This PR implements that behavior. (Previously, we dropped the build tag entirely.)
Closes #3779.
Test Plan
Run:
cargo run pip install -i https://pypi.anaconda.org/intel/simple mkl_fft==1.3.8 --python-platform linux --python-version 3.10
. This now resolves without error. Previously, we selected build tag 63 ofmkl_fft==1.3.8
, which led to an incompatibility with NumPy. Now, we select build tag 70.