Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate Unicode functions that will be moved to crates.io #24428

Merged
merged 1 commit into from
Apr 18, 2015

Conversation

kwantam
Copy link
Contributor

@kwantam kwantam commented Apr 14, 2015

This patch

  1. renames libunicode to librustc_unicode,
  2. deprecates several pieces of libunicode (see below), and
  3. removes references to deprecated functions from
    librustc_driver and libsyntax. This may change pretty-printed
    output from these modules in cases involving wide or combining
    characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

  1. char.width() and str.width():
    --> use unicode-width crate
  2. str.graphemes() and str.grapheme_indices():
    --> use unicode-segmentation crate
  3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
    char.compose(), char.decompose_canonical(), char.decompose_compatible(),
    char.canonical_combining_class():
    --> use unicode-normalization crate

@rust-highfive
Copy link
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@kwantam
Copy link
Contributor Author

kwantam commented Apr 14, 2015

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned huonw Apr 14, 2015
@kwantam
Copy link
Contributor Author

kwantam commented Apr 14, 2015

Crates are up on crates.io:

(cc @aturon carrying over from previous discussion in now-closed #24402)

@@ -161,6 +161,9 @@ enum DecompositionType {
/// External iterator for a string decomposition's characters.
///
/// For use with the `std::iter` module.
#[allow(deprecated)]
#[deprecated(reason = "use the crates.io `unicode-decomp` library instead",
since = "1.0.0-nightly-20150415")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These since tags should all be "1.0.0" for now (e.g. they're applicable for the 1.0.0 release)

@kwantam kwantam force-pushed the deprecate_unicode_fns branch 2 times, most recently from 1bf8bd0 to 7aec8a0 Compare April 14, 2015 23:24
@liigo
Copy link
Contributor

liigo commented Apr 15, 2015 via email

@aturon
Copy link
Member

aturon commented Apr 15, 2015

I'm also interested to understand the rationale for breaking these into fine-grained crates, rather than building out libunicode's functionality. Could you elaborate? cc @SimonSapin

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

The two things that convinced me each function should be in its own crate are:

  1. Some of these functions can get by with no_std, which makes them more generally useful (though you might argue that this is less useful nowadays).
  2. Each piece of functionality requires its own relatively large set of Unicode tables. Separating them means that these can be brought into a project piecemeal rather than all at once. (TBQH I'm not sure the extent to which rustc does link time optimization, but in the best case LTO would render this concern moot, at least for optimized builds.)

I suppose one other point in favor is that there are already a few small crates out there that provide their own small bits of Unicode functionality, so an omnibus crate would end up either duplicating their functionality or requiring people to pull in multiple crates anyway.

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

@liigo the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054 .

In short, the notion is that there is no particular reason to have these in the standard library, and on the other hand, including them is a burden both to libstd and to libunicode, because of the stability guarantees that libstd wants to provide to users.

@liigo
Copy link
Contributor

liigo commented Apr 15, 2015

@kwantam

the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054

I don't think grapheme/width management should move out of libunicode, since there're strong related to Unicode. How about moves out the whole libunicode? cc @alexcrichton

@SimonSapin
Copy link
Contributor

I think maybe you all are talking about different things:

One is: should these three new unicode-* crates on crates.io be a single unicode crate on crates.io? I personally tend to prefer multiple smaller single-purpose crates over fewer larger crates containing multiple loosely-related features, but this is debatable.

The other is: should this functionality be in a crate distributed with rustc? (As opposed to crates.io.) The unicode crate perhaps should be renamed rustc_unicode, if only to clarify what we’re talking about. It serves two purpose (with some overlap): provide functionality used in the compile itself, and provide functionality to be re-exported in std.

As @kwantam said, removing them from std (in favor of crates.io) has been discussed before.

But there is some compiler usage where this PR simply adds #[allow(deprecated)]. I think there should be a better path forward for the compiler, IMO “deprecated” means we intend to remove this at some point. Perhaps the required functionality could stay non-depracated in rustc_unicode (where using it directly would require #![feature(rustc_private)] and only the std/collections wrappers would be deprecated.

@huonw
Copy link
Member

huonw commented Apr 15, 2015

One is: should these three new unicode-* crates on crates.io be a single unicode crate on crates.io? I personally tend to prefer multiple smaller single-purpose crates over fewer larger crates containing multiple loosely-related features, but this is debatable.

(We could theoretically offer a unicode crate that just reexports functionality from a variety of unicode-* crates, to have our cake and eat it too.)

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

@SimonSapin I agree regarding renaming libunicode. I was going to suggest libcore_unicode: the non-deprecated functionality really is being used in libcore, libcollections, and libstd, so calling it librustc_unicode might be suggestive of a narrower set of uses than is actually the case.

Regarding the use of width() in librustc_driver and libsyntax: there are a couple spots where this is done for pretty printing. I argued in #24402 that these can just be removed, because they will almost never impact correctness of the pretty printed output. We can also just leave the width tables and functionality behind a feature gate if there's objection to removing them.

If there's general consensus on a rename, I can rename libunicode to whatever name we decide (libcore_unicode and librustc_unicode have been suggested so far), and then leave behind a dummy libunicode that re-exports everything from lib{core,rustc}_unicode with a #deprecated tag to push people toward fixing their code.

@SimonSapin
Copy link
Contributor

I don’t have an opinion on the new name, rustc_unicode was just a straw man proposal.

I argued in #24402 that these can just be removed, because they will almost never impact correctness of the pretty printed output.

If that’s an option, I’d rather have it in this PR than #[allow(deprecated)] outside of tests.

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

If that’s an option, I’d rather have it in this PR than #[allow(deprecated)] outside of tests.

👍 if everyone else is onboard.

@alexcrichton
Copy link
Member

I'm ok not handling graphemes and friends in width calculations for the compiler, hardwiring to 1 seems like it's definitely fine for now.

I don't have a super strong opinion on one crate vs many crates, but I might err on the side of small crates for now as it pushes back on the idea of a "dumping ground" for unicode-related functionality and as @huonw suggested we can always have our own facade crate if necessary.

I would also be fine renaming libunicode in-tree, and it's also probably fine to not have much of a deprecation strategy as it looks like very few crates are still using it and it's unstable to start out with. I would recommend rustc_unicode to align with rustc_bitflags.

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

@alexcrichton would you prefer I leave the feature name the same (unicode) or update it to match the name of the crate?

@alexcrichton
Copy link
Member

I think leaving the same feature name is fine, e.g. "unicode support in general"

@kwantam
Copy link
Contributor Author

kwantam commented Apr 15, 2015

👍 PR updated as discussed.

@aturon
Copy link
Member

aturon commented Apr 15, 2015

I'm on board with the proposed changes.

@kwantam kwantam force-pushed the deprecate_unicode_fns branch 2 times, most recently from d1341f9 to 503533c Compare April 15, 2015 19:42
@bors
Copy link
Contributor

bors commented Apr 17, 2015

⛄ The build was interrupted to prioritize another pull request.

@bors
Copy link
Contributor

bors commented Apr 17, 2015

⌛ Testing commit 29d1252 with merge 7616f19...

@bors
Copy link
Contributor

bors commented Apr 17, 2015

⛄ The build was interrupted to prioritize another pull request.

@bors
Copy link
Contributor

bors commented Apr 17, 2015

⌛ Testing commit 29d1252 with merge b7661c9...

@bors
Copy link
Contributor

bors commented Apr 17, 2015

⛄ The build was interrupted to prioritize another pull request.

@bors
Copy link
Contributor

bors commented Apr 17, 2015

⌛ Testing commit 29d1252 with merge c9069fe...

@bors
Copy link
Contributor

bors commented Apr 17, 2015

💔 Test failed - auto-mac-64-nopt-t

@kwantam
Copy link
Contributor Author

kwantam commented Apr 17, 2015

Not sure how to proceed here. Don't have a mac on which to try and reproduce, and I find it slightly hard to believe that a segfault building libserialize has anything to do with this PR.

@alexcrichton
Copy link
Member

@bors: retry

Ah I believe this was spurious

@kwantam
Copy link
Contributor Author

kwantam commented Apr 17, 2015

Thanks!

@bors
Copy link
Contributor

bors commented Apr 18, 2015

⌛ Testing commit 29d1252 with merge 7a5754b...

bors added a commit that referenced this pull request Apr 18, 2015
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
@bors bors merged commit 29d1252 into rust-lang:master Apr 18, 2015
@kwantam kwantam deleted the deprecate_unicode_fns branch April 18, 2015 17:19
@ftxqxd
Copy link
Contributor

ftxqxd commented Apr 19, 2015

I suppose it’s a bit late now, but shouldn’t deprecating/removing an entire crate + a bunch of commonly-used methods require an RFC, even if it’s marked as unstable? I know there’s been a lot of discussion about this elsewhere, but I thought that any decent-sized change should still go through the full RFC process instead of informally reaching a decision through discussion all about GitHub.

@kwantam
Copy link
Contributor Author

kwantam commented Apr 19, 2015

In fairness, at this point nothing's been removed from libunicode, only deprecated (well, and renamed). We can still put things through the RFC process if that's deemed necessary.

It's possible that we could leave the width-related functions behind a rustc_private feature gate to keep #8706 closed. It's not clear to me how critical that bug is.

@alexcrichton
Copy link
Member

@P1start unfortunately requiring an RFC for nearly any modification to libstd is probably infeasible, especially when it comes to unstable APIs. We've long thought that these APIs would move out of the standard library at some point as they've stuck out as not quite belonging for some time now, but we may not have communicated that clearly enough.

Right now we don't have a great story for external libraries in the rust-lang organization (and elsewhere) in terms of evolving their API, possibly coming back into libstd, etc. I would expect an RFC if these libraries are to be re-included, but for now I wouldn't expect an RFC to move unstable features out into external crates.

@huonw
Copy link
Member

huonw commented Apr 19, 2015

Hm, I'm unclear why the compiler's use of these functions was removed: isn't the point of having the rustc_unicode crate to provide things that the compiler needs. Could you clarify? (i.e. yes, I think we should use the rustc_private feature gate.)

@alexcrichton
Copy link
Member

@huonw I would personally be somewhat uncomfortable having to maintain these tables in the standard library for private usage by the compiler when they don't really come up in practice that often, so to slate them for deletion the compiler was hardwired to char == 1-wide slot on the screen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants