From 2973b9976c3c9b1726e5755ba999235ae5bfda1e Mon Sep 17 00:00:00 2001
From: Simon Sapin <simon.sapin@exyr.org>
Date: Fri, 10 Apr 2015 16:50:51 +0200
Subject: [PATCH 1/2] Rename or replace `str::words` to side-step the ambiguity of “a word”.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 text/0000-str-words.md | 67 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 text/0000-str-words.md

diff --git a/text/0000-str-words.md b/text/0000-str-words.md
new file mode 100644
index 00000000000..f91843d1fc7
--- /dev/null
+++ b/text/0000-str-words.md
@@ -0,0 +1,67 @@
+- Feature Name: str-words
+- Start Date: 2015-04-10
+- RFC PR:
+- Rust Issue:
+
+# Summary
+
+Rename or replace `str::words` to side-step the ambiguity of “a word”.
+
+
+# Motivation
+
+The [`str::words`](http://doc.rust-lang.org/std/primitive.str.html#method.words) method
+is currently marked `#[unstable(reason = "the precise algorithm to use is unclear")]`.
+Indeed, the concept of “a word” is not easy to define in precense of punctuation
+or languages with various conventions, including not using spaces at all to separate words.
+
+[Issue #15628](https://github.com/rust-lang/rust/issues/15628) suggests
+changing the algorithm to be based on [the *Word Boundaries* section of
+*Unicode Standard Annex #29: Unicode Text Segmentation*](http://www.unicode.org/reports/tr29/#Word_Boundaries).
+
+While a Rust implemention of UAX#29 would be useful, it belong on crates.io more than in `std`:
+
+* It carries significant complexity that may be surprising from something that looks as simple
+  as a parameter-less “words” method in the standard library.
+  Users may not be aware of how subtle defining “a word” can be.
+* It is not a definitive answer. The standard itself notes:
+
+  > It is not possible to provide a uniform set of rules that resolves all issues across languages
+  > or that handles all ambiguous situations within a given language.
+  > The goal for the specification presented in this annex is to provide a workable default;
+  > tailored implementations can be more sophisticated.
+
+  and gives many examples of such ambiguous situations.
+
+Therefore, `std` would be better off avoiding the question of defining word boundaries entirely.
+
+
+# Detailed design
+
+Rename the `words` method to `split_whitespace`, and keep the current behavior unchanged.
+(That is, return an iterator equivalent to `s.split(char::is_whitespace).filter(|s| !s.is_empty())`.)
+
+Rename the return type `std::str::Words` to `std::str::SplitWhitespace`.
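+
+A minimal sketch of the proposed API, written as ordinary library code so that it compiles on its own.
+The free function `split_whitespace` and the struct `SplitWhitespace` below are hypothetical stand-ins
+for the proposed `str::split_whitespace` method and `std::str::SplitWhitespace` type,
+not the actual `std` definitions. Since the proposal only renames existing behavior,
+the iterator just wraps the same `split`/`filter` chain:
+
+```rust
+use std::iter::Filter;
+use std::str::Split;
+
+/// Hypothetical stand-in for the proposed `std::str::SplitWhitespace`.
+/// Plain function pointers keep the wrapped iterator type nameable.
+pub struct SplitWhitespace<'a> {
+    inner: Filter<Split<'a, fn(char) -> bool>, fn(&&'a str) -> bool>,
+}
+
+/// Hypothetical stand-in for the proposed `str::split_whitespace` method:
+/// the current `str::words` behavior under its new name.
+pub fn split_whitespace<'a>(s: &'a str) -> SplitWhitespace<'a> {
+    let is_whitespace: fn(char) -> bool = char::is_whitespace;
+    let is_not_empty: fn(&&'a str) -> bool = |piece| !piece.is_empty();
+    SplitWhitespace {
+        // Equivalent to `s.split(char::is_whitespace).filter(|s| !s.is_empty())`.
+        inner: s.split(is_whitespace).filter(is_not_empty),
+    }
+}
+
+impl<'a> Iterator for SplitWhitespace<'a> {
+    type Item = &'a str;
+
+    fn next(&mut self) -> Option<&'a str> {
+        self.inner.next()
+    }
+}
+```
+
+With this sketch, `split_whitespace("  Hello,\tworld!\n ")` yields `"Hello,"` and `"world!"`:
+punctuation stays attached to the pieces, which is why the new name promises whitespace splitting
+rather than “words”.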
+
+Optionally, keep a `words` wrapper method for a while, marked both `#[deprecated]` and `#[unstable]`,
+with a deprecation message that suggests `split_whitespace` or whichever alternative is chosen.
+
+
+# Drawbacks
+
+`split_whitespace` is very similar to the existing `str::split<P: Pattern>(&self, P)` method,
+and adding a separate method for one particular pattern seems like weak API design. (But see Alternatives below.)
+
+
+# Alternatives
+
+* Replace `str::words` with a `struct Whitespace;` type that has a custom `Pattern` implementation,
+  so that it can be passed to `str::split`.
+  However, this requires importing the `Whitespace` symbol separately.
+* Remove `str::words` entirely and tell users to use
+  `s.split(char::is_whitespace).filter(|s| !s.is_empty())` instead.
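+
+For the second alternative, a sketch of what users would write themselves;
+`split_on_whitespace` is a hypothetical helper name used only for illustration.
+The `filter` matters: `split` alone yields an empty string for every leading, trailing,
+or repeated whitespace character, so `" one  two "` splits into `""`, `"one"`, `""`, `"two"`, `""`.
+
+```rust
+/// Hypothetical helper: the expression callers would need
+/// if `str::words` were removed without a replacement.
+fn split_on_whitespace(s: &str) -> Vec<&str> {
+    s.split(char::is_whitespace)
+        // Drop the empty pieces produced by leading, trailing,
+        // or consecutive whitespace characters.
+        .filter(|piece| !piece.is_empty())
+        .collect()
+}
+```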
+
+
+# Unresolved questions
+
+Is there a better alternative?

From 885fbdaed0c485f8df44182ceb0bbea2e07e6883 Mon Sep 17 00:00:00 2001
From: Simon Sapin <simon.sapin@exyr.org>
Date: Fri, 10 Apr 2015 17:10:27 +0200
Subject: [PATCH 2/2] Spelling

---
 text/0000-str-words.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0000-str-words.md b/text/0000-str-words.md
index f91843d1fc7..04bc7875220 100644
--- a/text/0000-str-words.md
+++ b/text/0000-str-words.md
@@ -12,14 +12,14 @@ Rename or replace `str::words` to side-step the ambiguity of “a word”.
 
 The [`str::words`](http://doc.rust-lang.org/std/primitive.str.html#method.words) method
 is currently marked `#[unstable(reason = "the precise algorithm to use is unclear")]`.
-Indeed, the concept of “a word” is not easy to define in precense of punctuation
+Indeed, the concept of “a word” is not easy to define in the presence of punctuation
 or languages with various conventions, including not using spaces at all to separate words.
 
 [Issue #15628](https://github.com/rust-lang/rust/issues/15628) suggests
 changing the algorithm to be based on [the *Word Boundaries* section of
 *Unicode Standard Annex #29: Unicode Text Segmentation*](http://www.unicode.org/reports/tr29/#Word_Boundaries).
 
-While a Rust implemention of UAX#29 would be useful, it belong on crates.io more than in `std`:
+While a Rust implementation of UAX#29 would be useful, it belongs on crates.io rather than in `std`:
 
 * It carries significant complexity that may be surprising from something that looks as simple
   as a parameter-less “words” method in the standard library.