Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate build tag into wheel prioritization #3781

Merged
merged 1 commit into from
May 23, 2024

Conversation

charliermarsh
Copy link
Member

Summary

It turns out that in the spec, if a wheel filename includes a build tag, then we need to use it to break ties. This PR implements that behavior. (Previously, we dropped the build tag entirely.)

Closes #3779.

Test Plan

Run: cargo run pip install -i https://pypi.anaconda.org/intel/simple mkl_fft==1.3.8 --python-platform linux --python-version 3.10. This now resolves without error. Previously, we selected build tag 63 of mkl_fft==1.3.8, which led to an incompatibility with NumPy. Now, we select build tag 70.

@charliermarsh charliermarsh added bug Something isn't working compatibility Compatibility with a specification or another tool labels May 23, 2024
Copy link

codspeed-hq bot commented May 23, 2024

CodSpeed Performance Report

Merging #3781 will degrade performances by 6.1%

Comparing charlie/build-tag (1a5ee74) with main (5bebadd)

Summary

⚡ 2 improvements
❌ 1 regressions
✅ 10 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main charlie/build-tag Change
wheelname_parsing_failure[flyte-long-extension] 1.8 µs 1.9 µs -6.1%
wheelname_tag_compatibility[flyte-long-incompatible] 1.5 µs 1.5 µs +6%
wheelname_tag_compatibility[flyte-short-incompatible] 926.7 ns 868.3 ns +6.72%

)]
#[archive(check_bytes)]
#[archive_attr(derive(Debug))]
pub struct BuildTag(u32, Option<String>);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more compact representation for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Using Option<Arc<str>> here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>, but I would go with Arc because I noticed this is being cloned in version_map.

Getting more compact than that I think would look more like how we deal with Version: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.

For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]. So something like this:

pub struct BuildTag(Arc<BuildTagInner>);

enum BuildTagInner {
    Compact([u8; 8]),
    General(u32, Option<Arc<str>>),
}

You'd probably want to use a bit somewhere in the [u8; 8] to indicate whether the suffix is present or not.

But the problem here is that the size of BuildTagInner is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.

However, the heap indirection here does reduce the size of BuildTag itself to one word of memory. Maybe that's worth it to make WheelFilename overall smaller. So, something like this:

pub struct BuildTag(Arc<BuildTagInner>);

struct BuildTagInner(u32, Option<Box<str>>);

But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.

Unless we have some data pointing at this as a problem, I'd just switch Option<String> to Option<Arc<str>> and call that good enough for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. The vast majority of wheels have no build tag, and of those that do, I've only ever seen them be numerical, so good not to allocate there. (But the spec says they can be non-numerical, hence this dance.)

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -614,7 +614,7 @@ impl CacheBucket {
Self::FlatIndex => "flat-index-v0",
Self::Git => "git-v0",
Self::Interpreter => "interpreter-v1",
Self::Simple => "simple-v7",
Self::Simple => "simple-v8",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I was looking for this!

self.abi_tag.join("."),
self.platform_tag.join(".")
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not need the build tag somewhere in the formatted string?

)]
#[archive(check_bytes)]
#[archive_attr(derive(Debug))]
pub struct BuildTag(u32, Option<String>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Using Option<Arc<str>> here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>, but I would go with Arc because I noticed this is being cloned in version_map.

Getting more compact than that I think would look more like how we deal with Version: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.

For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]. So something like this:

pub struct BuildTag(Arc<BuildTagInner>);

enum BuildTagInner {
    Compact([u8; 8]),
    General(u32, Option<Arc<str>>),
}

You'd probably want to use a bit somewhere in the [u8; 8] to indicate whether the suffix is present or not.

But the problem here is that the size of BuildTagInner is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.

However, the heap indirection here does reduce the size of BuildTag itself to one word of memory. Maybe that's worth it to make WheelFilename overall smaller. So, something like this:

pub struct BuildTag(Arc<BuildTagInner>);

struct BuildTagInner(u32, Option<Box<str>>);

But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.

Unless we have some data pointing at this as a problem, I'd just switch Option<String> to Option<Arc<str>> and call that good enough for now.

@charliermarsh charliermarsh enabled auto-merge (squash) May 23, 2024 21:07
@charliermarsh charliermarsh merged commit a9d9a6c into main May 23, 2024
44 of 45 checks passed
@charliermarsh charliermarsh deleted the charlie/build-tag branch May 23, 2024 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compatibility Compatibility with a specification or another tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wheel selection needs to break ties based on build tag
2 participants