Panic on string with em-dash #129

Closed
rkday opened this issue Apr 8, 2018 · 5 comments

Comments

@rkday

rkday commented Apr 8, 2018

Here's an example program that causes the panic:

extern crate textwrap;

fn main() {
    let problematic_string = "osts: – https://knowledgetester.org/2016/09/20/generalist-specialist-t-and-pi-shaped-testers/ – http://www.dataminingblog.com/the-5-most-common-data-relationships-shown-through-visualization/";
    textwrap::fill(problematic_string, 86);
}

and here's the panic:

thread 'main' panicked at 'byte index 97 is not a char boundary; it is inside '–' (bytes 96..99) of `osts: – https://knowledgetester.org/2016/09/20/generalist-specialist-t-and-pi-shaped-testers/ – http://www.dataminingblog.com/the-5-most-common-data-relationships-shown-through-visualization/`', libcore/str/mod.rs:2257:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:207
   3: std::panicking::default_hook
             at libstd/panicking.rs:223
   4: std::panicking::begin_panic
             at libstd/panicking.rs:402
   5: std::panicking::try::do_call
             at libstd/panicking.rs:349
   6: std::panicking::try::do_call
             at libstd/panicking.rs:325
   7: core::ptr::drop_in_place
             at libcore/panicking.rs:72
   8: core::str::run_utf8_validation
             at libcore/str/mod.rs:0
   9: core::str::traits::<impl core::ops::index::Index<core::ops::range::RangeFrom<usize>> for str>::index
             at /Users/travis/build/rust-lang/rust/src/libcore/str/mod.rs:2010
  10: <core::str::CharIndices<'a> as core::iter::iterator::Iterator>::next
             at /Users/travis/build/rust-lang/rust/src/libcore/option.rs:376
  11: core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index
             at /Users/travis/build/rust-lang/rust/src/libcore/str/mod.rs:2010
  12: core::str::traits::<impl core::ops::index::Index<core::ops::range::RangeFrom<usize>> for str>::index
             at /Users/travis/build/rust-lang/rust/src/libcore/str/mod.rs:1765
  13: textwrap_fail::main
             at /Users/rkd/.cargo/registry/src/github.com-1ecc6299db9ec823/textwrap-0.9.0/src/lib.rs:634
  14: textwrap_fail::main
             at /Users/rkd/.cargo/registry/src/github.com-1ecc6299db9ec823/textwrap-0.9.0/src/lib.rs:544
  15: alloc::string::<impl core::convert::From<&'a str> for alloc::borrow::Cow<'a, str>>::from
             at /Users/travis/build/rust-lang/rust/src/libcore/iter/mod.rs:1701
  16: textwrap_fail::main
             at /Users/rkd/.cargo/registry/src/github.com-1ecc6299db9ec823/textwrap-0.9.0/src/lib.rs:370
  17: textwrap::WrapIterImpl::impl_next::{{closure}}
             at /Users/rkd/.cargo/registry/src/github.com-1ecc6299db9ec823/textwrap-0.9.0/src/lib.rs:749
  18: textwrap_fail::main
             at src/main.rs:5
  19: std::rt::lang_start::{{closure}}
             at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
  20: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:306
  21: panic_unwind::dwarf::eh::read_encoded_pointer
             at libpanic_unwind/lib.rs:102
  22: std::sys_common::at_exit_imp::cleanup
             at libstd/panicking.rs:285
             at libstd/panic.rs:361
             at libstd/rt.rs:58
  23: std::rt::lang_start
             at /Users/travis/build/rust-lang/rust/src/libstd/rt.rs:74
  24: textwrap_fail::main

This is with textwrap 0.9.
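
For readers unfamiliar with this class of panic: Rust refuses to slice a `str` at a byte offset that falls inside a multi-byte UTF-8 character. The following is a minimal standard-library sketch of that behaviour, not textwrap's code; the helper name `tail_from` is made up for illustration.

```rust
/// Slice `text` from byte offset `start`, returning `None` instead of
/// panicking when `start` is out of range or inside a multi-byte character.
/// (`tail_from` is a hypothetical helper, not part of textwrap.)
fn tail_from(text: &str, start: usize) -> Option<&str> {
    text.get(start..)
}

fn main() {
    // U+2013 occupies three bytes in UTF-8, at byte offsets 2, 3 and 4 here.
    let text = "a \u{2013} b";

    // Offset 2 is the first byte of the dash, so it is a valid boundary.
    assert!(text.is_char_boundary(2));
    assert_eq!(tail_from(text, 2), Some("\u{2013} b"));

    // Offset 3 is in the middle of the dash: `&text[3..]` would panic with
    // "byte index 3 is not a char boundary", while `get` returns `None`.
    assert!(!text.is_char_boundary(3));
    assert_eq!(tail_from(text, 3), None);
}
```

`str::get` is useful when an offset comes from arithmetic that might be off by a byte; plain indexing is fine when the offset is known to come from `char_indices` or `find`.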

@rkday
Author

rkday commented Apr 8, 2018

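The character in the error message is U+2013 (https://www.fileformat.info/info/unicode/char/2013/index.htm), which is strictly the Unicode en dash rather than the em dash (U+2014). It takes three bytes in UTF-8, matching the "bytes 96..99" in the panic message.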

However, both these programs compile and run fine:

extern crate textwrap;

fn main() {
    let problematic_string = "–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––";
    textwrap::fill(problematic_string, 86);
}

extern crate textwrap;

fn main() {
    let problematic_string = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa –––––––––––––– bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––";
    textwrap::fill(problematic_string, 86);
}

So it's not just a problem with breaking on that character, or with that character following a long string...
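
When narrowing down a case like this, it can help to map the byte index from the panic message back to the character it falls inside. This is a small standard-library sketch of that lookup; `char_containing` is a hypothetical helper name, and the sample string is made up rather than taken from the report.

```rust
/// Find the character whose UTF-8 encoding covers `byte_index`, returning
/// the byte offset where that character starts together with the character.
/// (`char_containing` is a hypothetical helper for illustration.)
fn char_containing(text: &str, byte_index: usize) -> Option<(usize, char)> {
    text.char_indices()
        .find(|&(start, ch)| (start..start + ch.len_utf8()).contains(&byte_index))
}

fn main() {
    // 'a' = byte 0, 'b' = byte 1, ' ' = byte 2, U+2013 = bytes 3..6, ...
    let text = "ab \u{2013} cd";
    if let Some((start, ch)) = char_containing(text, 4) {
        println!(
            "byte 4 is inside {:?} (bytes {}..{})",
            ch,
            start,
            start + ch.len_utf8()
        );
    }
}
```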

@rkday
Author

rkday commented Apr 8, 2018

Looks like this is fixed in master - using textwrap = { git = "https://github.com/mgeisler/textwrap", rev = "80da77456c23b9f587c09d3657c932b3dbd51e95" } in Cargo.toml means I no longer hit this crash.

@rkday rkday closed this as completed Apr 8, 2018
@mgeisler
Owner

mgeisler commented Apr 8, 2018

Hey @rkday, thanks for reporting this! I did fix an off-by-one error in #118 -- back then I thought it only affected the automatic hyphenation, but it might also be the cause of this: line 634 has

                let remaining_text = &self.source[self.split + self.split_len..];

and the patch in #118 touches self.split.
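
To illustrate why an off-by-one in such an offset matters, here is a hedged sketch. It is not the textwrap implementation: `rest_after` is a hypothetical function, and the local names `split` and `split_len` are only borrowed from the line quoted above. The point is that the offset past a character must advance by that character's UTF-8 width, not by one.

```rust
/// Return the text that follows the first occurrence of `sep` in `source`.
/// The byte offset is advanced by the separator's UTF-8 width, so the
/// resulting index always lands on a character boundary.
/// (Hypothetical sketch; not textwrap's actual code.)
fn rest_after(source: &str, sep: char) -> Option<&str> {
    let split = source.find(sep)?; // byte offset where `sep` starts
    let split_len = sep.len_utf8(); // 3 for U+2013, 1 for ASCII
    source.get(split + split_len..)
}

fn main() {
    let source = "x \u{2013} x";

    // Advancing by the full width of the dash gives the remaining text.
    assert_eq!(rest_after(source, '\u{2013}'), Some(" x"));

    // Advancing by a single byte instead lands inside the dash. Indexing
    // with `&source[3..]` would panic; `get` reports the problem as `None`.
    let split = source.find('\u{2013}').unwrap();
    assert!(source.get(split + 1..).is_none());
}
```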

I'd better make a 0.10.0 release soon then!

@mgeisler
Owner

mgeisler commented Apr 8, 2018

Bisecting points to #113 as the PR that fixed this bug, which makes this issue a duplicate of #99.

@mgeisler mgeisler changed the title from "textwrap panics on a string containing the '–' character" to "Panic on string with em-dash" on Apr 8, 2018
@mgeisler
Owner

mgeisler commented Apr 8, 2018

Thanks for supplying a failing test case -- I was able to shorten it to:

    #[test]
    fn issue_129() {
        // The dash is U+2013 (strictly an en dash), which takes up three
        // bytes in UTF-8. We used to panic since we tried to index into
        // the middle of the character.
        assert_eq!(wrap("x – x", 2), vec!["x", "–", "x"]);
    }
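
For comparison, the expected result in that test can be reproduced with a deliberately naive wrapper written only against the standard library. This is an assumption-laden sketch, not textwrap's algorithm: `wrap_words` is a hypothetical function that measures width by counting `char`s, splits only on whitespace, and never breaks a word longer than the width. Because it never computes byte offsets by hand, it cannot index into the middle of a character.

```rust
/// Greedy word wrapping that measures width in `char`s.
/// (Hypothetical sketch; textwrap handles display width, hyphenation and
/// over-long words, none of which are covered here.)
fn wrap_words(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_width = 0;

    for word in text.split_whitespace() {
        let word_width = word.chars().count();

        // Start a new line if adding a space plus this word would overflow.
        if !current.is_empty() && current_width + 1 + word_width > width {
            lines.push(std::mem::take(&mut current));
            current_width = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_width += 1;
        }
        current.push_str(word);
        current_width += word_width;
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

fn main() {
    // Same input as the regression test, with the dash written as an escape.
    assert_eq!(wrap_words("x \u{2013} x", 2), vec!["x", "\u{2013}", "x"]);
}
```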

mgeisler added a commit that referenced this issue Apr 8, 2018
mgeisler added a commit that referenced this issue Apr 8, 2018
rkday added a commit to rkday/podcast-manager.rs that referenced this issue Apr 8, 2018