Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive regression in compile times #939

Closed
CryZe opened this issue Jun 15, 2024 · 28 comments
Closed

Massive regression in compile times #939

CryZe opened this issue Jun 15, 2024 · 28 comments

Comments

@CryZe
Copy link

CryZe commented Jun 15, 2024

After #923 the dependency tree massively increased. The total amount of crates during compilation went from 8 to 36. In particular the "critical path" to compilation now consists of 17 crates as opposed to 5 crates. This means that the url crate is now the sole reason why basically all projects depending on the url crate (indirectly) should now take a whole 20 seconds or so longer to compile than before.

You can compile an entire GUI framework in the time the URL crate compiles now.

Before:
image

After:
image

@CryZe
Copy link
Author

CryZe commented Jun 25, 2024

I've seen multiple responses that syn is the main offender here... but while that looks like it may be the case, it's just not the full picture. The problem is that the url crate is fundamentally an "early" dependency, as in, almost all of the web backend ecosystem (which is the majority of Rust's usage) builds on top of the url crate, often by building their API bindings on top of reqwest which in turn depends on the url crate. This means that the majority of those crates require the url crate to have been built before any of those crates can start building. By having such a large dependency chain of 17 (and I believe even worse 19 with idna master), all those crates can't start compiling, effectively increasing compile times by a lot. And here's the thing... it has been argued that syn is usually already "sunk cost" by almost always being in the dependency tree already anyway, but that just doesn't matter if url and in turns almost all of the web ecosystem crates have to start compiling much later. That's why I said that real world projects that do already have syn are heavily affected by this as well. I almost feel like the icu family of crates got split up in many small crates to help with compilation times, but I'm almost certain that it had an overall negative effect on real world compile times, by causing an even longer chain, forcing the url crate to not be an early compilation crate anymore, which it definitely needs to be.

I think there's some widespread misunderstanding that the dependency count or something not already being in the dependency tree is what determines compile times, but the scheduling of when crates can start compiling is actually the most important and most overlooked problem.

@hsivonen
Copy link
Collaborator

Thanks for the extra context. Looking at the build of reqwest with default options and without any application code, it is indeed the case that syn gets absorbed into parallelism better than some other parts that show up sequentially in the image here.

I've filed unicode-org/icu4x#5121 and unicode-org/icu4x#5120 . (The latter thing should get LTOed out, so it's not an issue of ICU4X resulting in extra stuff in an optimized binary; you don't pay for what you don't use in binary size, but you do in compile time.)

@hsivonen
Copy link
Collaborator

Looking at the build of reqwest with default options and without any application code, it is indeed the case that syn gets absorbed into parallelism better than some other parts that show up sequentially in the image here.

This was a misreading on my part, and it seems that syn is still a part of the compile time problem.

@hsivonen
Copy link
Collaborator

hsivonen commented Jul 8, 2024

TL;DR: I have implemented a prototype (READMEs and metadata unpolished, repos not in the right org, crates not on crates.io) of a back end adapter for idna 1.0.x where ICU4X is the default and top-level dependents can (once published) opt into the unicode-rs back end by downgrading a crate called idna_adapter from its 1.2.x version stream to its 1.1.x version stream.

I’d appreciate it if the folks for whom compile times or MSRV increase arising from ICU4X was a problem could comment if the approach described below addresses their concerns.


Cargo does not have the Right Thing

Unfortunately, Cargo lacks a mechanism for the top level of the compilation to choose a back end for a crate deep in the dependency tree. There is a pre-RFC by @epage, but it can’t be used today.

Unsuitability of Cargo features

Unsuited, because:

  1. The dependencies of the default case can’t be removed from the dependency graph, which completely defeats the purpose of addressing build times and MSRV if ICU4X is the default (see below).
  2. An intervening crate in the dependency graph can force the decision that really belongs to the top level.
  3. Unused dependencies hanging around in the dependency graph makes it harder to make sure that subsequent additions to the application’s dependency graph don’t accidentally start using the crates that were supposed to be around latently and were not supposed to be contributing to release artifact.

Which back end should be the default

As for which one of ICU4X or unicode-rs should be the default back end, I strongly believe that the default should be the one that has the better attributes for the compilation product rather than the one that has the better attributes for the compilation process.

ICU4X results in a smaller binary (assuming that there isn’t something else in the dependency tree pulling in the unicode-rs crates anyway) and in faster run-time performance when the input contain Unicode/Punycode labels. ICU4X also has less opportunity for incorrect results due to different unicode-rs pieces not getting updated to a new Unicode version in sync (see below).

The run-time performance is equal between the two options when the input contains only ASCII labels consisting only of lower-case letters.

In other ASCII-only cases, unicode-rs is actually a bit (single-digit nanoseconds per domain) faster right now due to unicode-org/icu4x#5187 and the way the different tiers of ASCII optimizations are organized in idna. I think this oddity shouldn’t be decisive for the default: @sffc already suggested how to make this better on the ICU4X side. And even if that doesn’t work out, it would be possible to work around this on the idna side by increasing the idna code complexity if this is deemed truly a deal breaker. (Notably, when idna is used via url, the memory allocation to hold the ToASCII form dominates, so this difference is less notable than when the Cow-returning cases are benchmarked with idna directly.)

Mechanism

In the absence of global features for Cargo and regular Cargo features being unsuitable, I have implemented an approach where top-level dependents who wish to optimize for MSRV, compilation time, or for data overlap with something else in their dependency tree using unicode-normalization but nothing else in their dependency using icu_normalizer could (once published) opt into the unicode-rs back end by doing cargo update -p idna_adapter --precise 1.1.0.

Correctness

Compared to those who wish to avoid ICU4X staying on idna 0.5.0, this approach provides the correctness fixes of idna 1.0: The unicode-rs back end takes a new dependency on unicode-joining-type.

The unicode-rs back end comes with some future correctness risk, though. The two ICU4X crates, icu_normalizer and icu_properties can be expected to get updated together (unless the top level manually does something weird to pin one but not the other) and to form a coherent whole representing a single Unicode version at a time. In the case of the unicode-rs back end, unicode-normalization, unicode-bidi, unicode-joining-type, and idna_mapping (a new crate containing UTS 46 mapping data in the style of idna 0.5.0) are updated separately by different people. If idna_mapping represents a newer Unicode version than any of the other three crates, the results computed by idna will be bogus for characters introduced in the Unicode version delta. Right now, the situation is a special case: idna_mapping and unicode-normalization are on Unicode 15.1, but unicode-bidi and unicode-joining-type are on Unicode 15.0, but I think this works out, because the new characters in Unicode 15.1 are Han characters that have the defaults for the purpose of Bidi_Class and Joining_Type.

This can be addressed by not publishing a new idna_mapping crate until the other three have updated to a new Unicode version.

If unicode-joining-type continues to treat Unicode version updates as semver breaks, updating it requires either updating idna_adapter or publishing an adapter crate between idna_adapter and unicode-joining-type that would turn unicode-joining-type’s semver breaks into non-semver break as observed by idna_adapter. If idna_adapter needed to be updated, projects that have downgraded idna_adapter to opt into unicode-rs would need to update it, which would be inconvenient and, unless, idna_mapping also treated Unicode version updates as semver breaks, incorrect as idna_mapping updates could get picked up before unicode-joining-type updates.

The best solution would be to get unicode-joining-type to treat Unicode version updates as not semver-breaking.

Run-time performance

url crate benches (which don’t benefit properly from idna 1.0 returning a Cow in some cases but which demonstrate the usual way of reaching the idna crate in the Rust ecosystem):

M3 Pro unicode-rs

test hyphen ... bench: 194 ns/iter (+/- 4) = 159 MB/s
test leading_digit ... bench: 189 ns/iter (+/- 2) = 116 MB/s
test long ... bench: 263 ns/iter (+/- 1) = 163 MB/s
test plain ... bench: 135 ns/iter (+/- 1) = 148 MB/s
test punycode_ltr ... bench: 519 ns/iter (+/- 2) = 65 MB/s
test punycode_mixed ... bench: 488 ns/iter (+/- 7) = 57 MB/s
test punycode_rtl ... bench: 722 ns/iter (+/- 8) = 47 MB/s
test short ... bench: 168 ns/iter (+/- 1) = 148 MB/s
test unicode_ltr ... bench: 510 ns/iter (+/- 3) = 66 MB/s
test unicode_mixed ... bench: 555 ns/iter (+/- 19) = 45 MB/s
test unicode_rtl ... bench: 725 ns/iter (+/- 28) = 38 MB/s

M3 Pro ICU4X

test hyphen ... bench: 195 ns/iter (+/- 1) = 158 MB/s
test leading_digit ... bench: 198 ns/iter (+/- 1) = 111 MB/s
test long ... bench: 263 ns/iter (+/- 2) = 163 MB/s
test plain ... bench: 134 ns/iter (+/- 0) = 149 MB/s
test punycode_ltr ... bench: 457 ns/iter (+/- 20) = 74 MB/s
test punycode_mixed ... bench: 331 ns/iter (+/- 14) = 84 MB/s
test punycode_rtl ... bench: 435 ns/iter (+/- 2) = 78 MB/s
test short ... bench: 171 ns/iter (+/- 1) = 146 MB/s
test unicode_ltr ... bench: 452 ns/iter (+/- 9) = 75 MB/s
test unicode_mixed ... bench: 399 ns/iter (+/- 11) = 62 MB/s
test unicode_rtl ... bench: 461 ns/iter (+/- 4) = 60 MB/s

Binary size

urldemo

[profile.release]
opt-level = "z"
lto = true
strip = true

unicode-rs

Apple Silicon Mac
600040 bytes (600024 after strip)

wasm32-wasi
369071 bytes

wasm-opt -Oz -o opt.wasm --enable-bulk-memory target/wasm32-wasi/release/urldemo.wasm
352460 bytes

brotli -9 -o opt.wasm.br opt.wasm
139480 bytes

ICU4X

Apple Silicon Mac
484456 (484440 after strip)

wasm32-wasi
259253

wasm-opt -Oz -o opt.wasm --enable-bulk-memory target/wasm32-wasi/release/urldemo.wasm
218466

brotli -9 -o opt.wasm.br opt.wasm
89499

Compile time

These are the times for compiling reqwest with default macOS dependencies without any application code.

M3 Pro, dev

M3 Pro, release

MSRV

With ICU4X, 1.67. With unicode-rs, 1.57 (for panic in const context).

Testing it

Clone the repos https://github.com/hsivonen/rust-url , https://github.com/hsivonen/idna_adapter , and https://github.com/hsivonen/idna_mapping next to each other.

Switch rust-url to the branch deferadapter. Switch idna_adapter to main for ICU4X and to unicode-rs for unicode-rs. (Leave idna_mapping at main.)

If you also have reqwest next to these, add:

[patch.crates-io]
url = { path = "../rust-url/url/" }
form_urlencoded = { path = "../rust-url/form_urlencoded/" }

(Trying to patch idna directly didn’t work for me.)

@TheBlueMatt
Copy link

Honestly I'd like a way to opt out of IDN handling entirely - in most applications if a domain has unicode in it I can handle it on the frontend and get the punycode version.

@TheBlueMatt
Copy link

In fact, I'd strongly prefer the option to opt my entire application out of IDN handling entirely purely for homograph attack reasons. Not only does it remove code (yay!) but also makes my application (which has no use for IDNs, and in fact in may cases is requesting from URLs where other higher-level specifications mandate the lack of IDN usage) more secure!

@hsivonen
Copy link
Collaborator

hsivonen commented Jul 8, 2024

@TheBlueMatt , noted, but turning off IDN handling entirely is out of scope for the event of folks being unhappy about the implications of using ICU4X.

@TheBlueMatt
Copy link

Yea, that's totally fair, just highlighting it as a thing to consider as there may be nontrivial overlap between the groups.

@sffc
Copy link

sffc commented Jul 8, 2024

I had also thought it might be appropriate to take what we currently have with the ICU4X dependency (including the opportunity for incremental improvements) and make it the default in rust-url 3.0, keeping rust-url 2.0 on the current implementation. Then people need to effectively "opt in" to the worse compile time but better runtime characteristics.

@hsivonen
Copy link
Collaborator

hsivonen commented Jul 9, 2024

@TheBlueMatt

Yea, that's totally fair, just highlighting it as a thing to consider as there may be nontrivial overlap between the groups.

As a technical matter, I observe that your earlier comment said "in most applications if a domain has unicode in it I can handle it on the frontend and get the punycode version." Allowing Punycode through would require something more specific than taking the current idna_adapter design and replacing it with a stub implementation. A mere stub implementation of the current idna_adapter API would have the effect of rejecting Punycode labels, too.

@sffc

Then people need to effectively "opt in" to the worse compile time but better runtime characteristics.

This can become a problem if parts of the ecosystem stay on url 2 as a rejection of compile times. I think it would be worse for the ecosystem to have a split between url 2 and url 3 compared to providing the idna_adapter solution with both url 2 and url 3. The latter would not give an incentive to stay back on url 2 in order to avoid ICU4X compile times.

If we put the idna_adapter solution out there with other things being equal, then the crates.io download stats would become a rough telemetry indicator of how much the ecosystem really cares about avoiding ICU4X compile times. (Yes, there's the issue that some CI systems redownload crates on every run and projects that vendor dependencies don't show up as lots of crates.io downloads.)

@Manishearth
Copy link
Member

The unicode-rs back end takes a new dependency on unicode-joining-type.

Not necessarily, the data could be managed by this crate, which already maintains a copy of some Unicode data. I think the unicode-joining-type situation is more or less orthogonal to this.

TL;DR: I have implemented a prototype (READMEs and metadata unpolished, repos not in the right org, crates not on crates.io) of a back end adapter for idna 1.0.x where ICU4X is the default and top-level dependents can (once published) opt into the unicode-rs back end by downgrading a crate called idna_adapter from its 1.2.x version stream to its 1.1.x version stream.

cc @valenting

One thing missing from this plan is whether or not the Servo team is or ought to commit to a maintenance plan for idna_adapter 1.1.x. I think this might need to be explicitly discussed by the Servo TSC: if our documented way of telling people to obtain the old behavior is "pin the older version", we need to decide under what situations we keep that old version maintained (otherwise in a year or so we'll be in a situation where the solution which we tell people about doesn't work).

I'm not actually sure we need separate crates if this is going to be the default anyway

@sffc
Copy link

sffc commented Jul 9, 2024

Then people need to effectively "opt in" to the worse compile time but better runtime characteristics.
This can become a problem if parts of the ecosystem stay on url 2 as a rejection of compile times. I think it would be worse for the ecosystem to have a split between url 2 and url 3 compared to providing the idna_adapter solution with both url 2 and url 3. The latter would not give an incentive to stay back on url 2 in order to avoid ICU4X compile times.

My thoughts and assumptions related to my suggestion are:

  1. The median user of rust-url will prefer the runtime benefits of the icu4x backend over the compile time impact and therefore it makes sense as the 3.x default, even in its current form.
  2. rust-url 2.x will have a large user base for some time moving forward simply as a result of it being a widely used crate in the ecosytem, regardless of the icu4x dependency issue.
  3. The compile time impact may keep a subset of rust-url users on 2.x for longer than they would otherwise, but this buys us time to scope out the true impact of the situation and act on the numerous options that were proposed in Meta-issue: ICU4X as a low-level dependency unicode-org/icu4x#5124 to reduce the ICU4X compile times.
  4. If in ~6 months we don't land on a solution that reduces rust-url 3.x compile times to the level that are needed by clients, we can re-evaluate our options.

@hsivonen
Copy link
Collaborator

@Manishearth

The unicode-rs back end takes a new dependency on unicode-joining-type.

Not necessarily, the data could be managed by this crate, which already maintains a copy of some Unicode data.

It seems bad to take on even more non-ICU4X maintenance effort in order to proceed with ICU4X use. One way to deal with unicode-joining-type doing semver breaks for Unicode version updates would be to make idna_mapping rather than idna_adapter be the crate that depends on it. (But, really, if we proceed with this idna_adapter approach, it makes sense to just ask if unicode-joining-type could handle semver differently going forward.)

One thing missing from this plan is whether or not the Servo team is or ought to commit to a maintenance plan for idna_adapter 1.1.x.

The expected maintenance is updating idna_mapping to new Unicode data and possibly nudging unicode-normalization, unicode-bidi, and unicode-joining-type to update theirs first.

I note that when I asked for volunteers for whom old MSRV or syn avoidance is important (last paragraph of #937 (comment) ), no one volunteered.

I'm not actually sure we need separate crates if this is going to be the default anyway

What's "this" referring to in that sentence?

@sffc

Then people need to effectively "opt in" to the worse compile time but better runtime characteristics.
This can become a problem if parts of the ecosystem stay on url 2 as a rejection of compile times. I think it would be worse for the ecosystem to have a split between url 2 and url 3 compared to providing the idna_adapter solution with both url 2 and url 3. The latter would not give an incentive to stay back on url 2 in order to avoid ICU4X compile times.

My thoughts and assumptions related to my suggestion are:

  1. The median user of rust-url will prefer the runtime benefits of the icu4x backend over the compile time impact and therefore it makes sense as the 3.x default, even in its current form.

Is "default" in that sentence implying still offering a unicode-rs back end as an option?

@Manishearth
Copy link
Member

Manishearth commented Jul 10, 2024

It seems bad to take on even more non-ICU4X maintenance effort in order to proceed with ICU4X use. One way to deal with unicode-joining-type doing semver breaks for Unicode version updates would be to make idna_mapping rather than idna_adapter be the crate that depends on it. (But, really, if we proceed with this idna_adapter approach, it makes sense to just ask if unicode-joining-type could handle semver differently going forward.)

It's a single unicode property, the crate already maintains unicode tables, this is not that big a deal to include another table generated by the script.

The expected maintenance is updating idna_mapping to new Unicode data and possibly nudging unicode-normalization, unicode-bidi, and unicode-joining-type to update theirs first.

I note that when I asked for volunteers for whom old MSRV or syn avoidance is important (last paragraph of #937 (comment) ), no one volunteered.

That's not a maintenance plan: by maintenance plan I mean that for the benefit of the people who are choosing to stick to older idna/url, the Servo TSC / the url project needs to decide in what scenario those people are supported for bugfixes (which types of bugs, how long into the future, etc). It may decide to do nothing, and then we have to see if that is acceptable to the people in this thread

"There are no volunteers to do this work" is irrelevant: This crate is formally maintained by the Servo TSC (in practice, largely by @valenting with some help from me). When things break, the buck stops there. If people are getting annoyed at the crate for breaking something, it's Valentin, me, and the TSC who have to deal with it. If we pick the route you're proposing, we need to make sure that we're ready to maintain patches of the older stuff to the extent needed by the community, otherwise it's not a solution, it's something that will work for a couple months and then get people angry at us again for putting them in a catch-22 of needing to update their project but being unable to because of other concerns.

I would like us to pick a solution here that does not make the Rust community annoyed at us again for pulling the rug out underneath them. This may mean more maintenance work. If we can't commit to more maintenance work, then perhaps we shouldn't be doing this.

What's "this" referring to in that sentence?

IDNA-via-icu4x. I'm not sure we actually need a new pluggable adapter crate if we're not really supporting pluggability and instead are just switching defaults.

@sffc
Copy link

sffc commented Jul 10, 2024

  1. The median user of rust-url will prefer the runtime benefits of the icu4x backend over the compile time impact and therefore it makes sense as the 3.x default, even in its current form.

Is "default" in that sentence implying still offering a unicode-rs back end as an option?

I was suggesting exclusively unicode-rs for 2.x, icu4x for 3.x.

@hsivonen
Copy link
Collaborator

The expected maintenance is updating idna_mapping to new Unicode data and possibly nudging unicode-normalization, unicode-bidi, and unicode-joining-type to update theirs first.
I note that when I asked for volunteers for whom old MSRV or syn avoidance is important (last paragraph of #937 (comment) ), no one volunteered.

That's not a maintenance plan: by maintenance plan I mean that for the benefit of the people who are choosing to stick to older idna/url, the Servo TSC / the url project needs to decide in what scenario those people are supported for bugfixes (which types of bugs, how long into the future, etc). It may decide to do nothing, and then we have to see if that is acceptable to the people in this thread

The purpose of the idna_adapter suggestion is decoupling the Unicode back end from idna and url bug fixes. If idna 1.0.x gets idna_adapter, it removes reasons to stick to older idna/url due to ICU4X avoidance (since it would become possible to stick to idna_adapter 1.1.x instead).

That is, what I'm suggesting here is about avoiding a situation that would create demand for bug fixes for old versions.

(On a different topic:)

As a technical matter, I observe that your earlier comment said "in most applications if a domain has unicode in it I can handle it on the frontend and get the punycode version." Allowing Punycode through would require something more specific than taking the current idna_adapter design and replacing it with a stub implementation. A mere stub implementation of the current idna_adapter API would have the effect of rejecting Punycode labels, too.

This is probably not accurate and it probably would be possible to accept Punycode labels by stubbing out idna_adapter. Whether that's a good idea is another thing: It might have some legitimacy when some prior layer can actually be trusted to have performed the UTS 46 ToASCII operation, but otherwise, breaking access to the non-ASCII addressing seems bad.

@Manishearth
Copy link
Member

Manishearth commented Jul 11, 2024

The purpose of the idna_adapter suggestion is decoupling the Unicode back end from idna and url bug fixes. If idna 1.0.x gets idna_adapter, it removes reasons to stick to older idna/url due to ICU4X avoidance (since it would become possible to stick to idna_adapter 1.1.x instead).

Okay, but idna_adapter 1.1.x would need to then be maintained, yes?

I'll admit all these version numbers are confusing, I think you have a specific understanding of what each version number implies but it's not clear to me. It would be helpful if you had something listing the status quo and the plan in terms of version numbers for all crates involved (maybe that's already in this thread but I can't find it)

@hsivonen
Copy link
Collaborator

Current situation

  • The latest url is 2.5.2, which depends on idna 0.5.0, which uses unicode-rs.
  • The latest idna is 1.0.2, which depends on ICU4X 1.4.x. That is, the highest-version url does not depend on the highest-version idna at this time.
  • The previous url is 2.5.1, which depends on idna 1.0, which depends on ICU4X.
  • The url before that, 2.5.0, depended on idna 0.5.0.

idna 0.5.0 is less correct than idna 1.0.2, so the difference is not only about Unicode back ends. To the extent correctness means 1.0.2 rejecting inputs that 0.5.0 accepts, arguably the correctness fixes constitute semver breaks on some level of principle.

I'm suggesting

  • Publish idna 1.0.3 that depends on idna_adapter 1.
  • Publish url 2.5.3 that depends on idna 1.0.3.
  • Publish idna_adapter 1.2.0 that depends on ICU4X 1.4.x and declares MSRV as 1.67.
  • Publish idna_adapter 1.1.0 that depends on unicode-rs and idna_mapping and declares MSRV 1.57.
  • When url 3.0.0 comes along for non-ICU4X reasons, make it depend on idna 1.0.3.

This should lead to both url 2.5.3 and url 3.0.0 getting the ICU4X back end by default. Folks who want the unicode-rs back end would be expected to do cargo update -p idna_adapter --precise 1.1.0.

Once ICU4X 2.0 is out, I expect folks who want ICU4X 1.5 to do cargo update -p idna_adapter --precise 1.2.0 (with idna_adapter 1.2.1 depending on ICU4X 2.0 and declaring ICU4X 2.0's MSRV.)

Maintenance

I'm expecting the maintained crates to be the highest-version url, highest-version idna, highest-version idna_mapping, and both idna_adapter 1.1 and idna_adapter 1.2.

For idna_adapter 1.2, I expect the maintenance to be updating the ICU4X version and updating the adapter field types if/when ICU4X tweaks stuff around the types exposed by compiled_data.

For idna_adapter 1.1 and idna_mapping, I expect the maintenance to be rerunning the idna_mapping generator Python script every time a new Unicode version has been published and unicode-normalization, unicode-joining-type, and unicode-bidi have caught up with the new Unicode version. I expect idna_adapter 1.1 itself not to require maintenance except when one of its dependencies does a semver break.

Optional

Also publish idna_adapter 1.0.0 that's stubbed out not to have a Unicode back end and breaks idna to do ASCII only (Punycode not rejected) with dire README warnings that using it is likely a bad idea, but folks who have been asking for the option of turning off IDNA could get it by doing cargo update -p idna_adapter --precise 1.0.0. Maintenance not expected necessary on a level worth worrying about.

@hsivonen
Copy link
Collaborator

Data point about a project downgrading from url 2.5.2 to url 2.5.1 due to 2.5.1 resulting in smaller binary size: awslabs/llrt#487

@hsivonen
Copy link
Collaborator

Per off-issue discussion with maintainers, I plan to pursue my suggestion above, but I have higher-priority items (outside this crate) to deal with first.

@hsivonen
Copy link
Collaborator

Also publish idna_adapter 1.0.0 that's stubbed out not to have a Unicode back end and breaks idna to do ASCII only (Punycode not rejected) with dire README warnings that using it is likely a bad idea, but folks who have been asking for the option of turning off IDNA could get it by doing cargo update -p idna_adapter --precise 1.0.0.

This is indeed feasible.

Code

Tests

@hsivonen
Copy link
Collaborator

The implementation of the above plan is now #965

@hsivonen
Copy link
Collaborator

hsivonen commented Nov 4, 2024

With url 2.5.3 and idna 1.0.3, ICU4X is back, but it's possible to opt into a unicode-rs back end for faster compile times (and larger binary size and slower run-time performance). See https://docs.rs/crate/idna_adapter/latest for instructions.

@hsivonen hsivonen closed this as completed Nov 4, 2024
@tnull
Copy link

tnull commented Nov 4, 2024

With url 2.5.3 and idna 1.0.3, ICU4X is back, but it's possible to opt into a unicode-rs back end for faster compile times (and larger binary size and slower run-time performance). See https://docs.rs/crate/idna_adapter/latest for instructions.

Hm, I have to say that's very unfortunate as it still breaks our MSRV builds on multiple projects that are downstream ldependencies of url. But I guess our only option left is to pin idna_adapter back and hope that by doing so we don't introduce additional incompatibilities or miss critical fixes in the future.

@CryZe
Copy link
Author

CryZe commented Nov 4, 2024

I think the resolution to this issue is still very unsatisfying.

  1. The crate graph itself did not seem to have improved at all.
    1.1 Was there any investigation into whether syn could be removed by expanding the macros before publishing? That alone should heavily improve the situation.
    1.2 Can some of these crates not be merged into a single crate? Why are there so many small crates? That's so bad for compile times.
  2. I still think that with compile times this bad, affecting almost the entire Rust ecosystem (which is heavily web biased), this is absolutely the wrong default, for how little of an edge case it even addresses, at least until the compile times are addressed.

Update: Seems like there's some good discussion at: unicode-org/icu4x#5124

@NobodyXu
Copy link

NobodyXu commented Nov 5, 2024

Can some of these crates not be merged into a single crate? Why are there so many small crates? That's so bad for compile times

Smaller crates reduce compile time because in cargo, multiple crate can be compiled in parallel.

Even if they have a dependency link, once rmetadata is generated they can be compiled in parallel.

@hsivonen
Copy link
Collaborator

hsivonen commented Nov 5, 2024

I think the resolution to this issue is still very unsatisfying.

  1. The crate graph itself did not seem to have improved at all.

The crate graph will improve, but only a little bit and not enough to make syn go away, once ICU4X 2.0 and idna_adapter 1.2.1 are available. If you want a smaller crate graph now, you can pin idna_adapter 1.1.0 (or even 1.0.0 depending on your needs).

  1. I still think that with compile times this bad, affecting almost the entire Rust ecosystem (which is heavily web biased), this is absolutely the wrong default, for how little of an edge case it even addresses, at least until the compile times are addressed.

I note that as the reporter of this issue, you have had the opportunity to have visibility into the plan and the progress on its implementation since July 8, but you didn't speak up before all the work was done and, on November 4, released.

As for the right default:

  1. People whose preference doesn't end up as the default are the ones who show up on bug trackers. Usually there isn't even a signal of the other point of view, but this time there is: URL 2.5.x 4+MB in size after optimised build. #952.
  2. If the default is larger binary size and worse run-time performance in a relatively subtle way, people who would, if they had all the information, want smaller binary size and faster run-time performance may not realize that it's an option they could choose. Note how rustc itself defaults to debug builds, which results, by default, in binaries that are slow and large in a way that isn't even subtle, and still year after year, people show up to complain about Rust being bad instead of figuring out on their own to do a release build. On the other hand, if you are in a situation where you are looking at end-to-end CI build times (CI is more likely to build from scratch and on a single core whereas developers running repeated builds on their own machines are more likely to have cached artifacts of their dependencies and have cargo use multiple cores), there's a better chance that you'll end up reading the README of any of url, idna, or idna_adapter.

@sffc
Copy link

sffc commented Nov 7, 2024

The crate graph will improve, but only a little bit and not enough to make syn go away, once ICU4X 2.0 and idna_adapter 1.2.1 are available

There is a concept of a plan to remove the syn and proc-macro2 deps from this corner of ICU4X, but it needs more design work. This statement is accurate for what we have landed so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants