Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_to_end / read_to_string returns Invalid Input ("buffer too small") on Windows MSVC #91722

Closed
agausmann opened this issue Dec 9, 2021 · 13 comments · Fixed by #91754
Closed
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-bug Category: This is a bug. O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@agausmann
Copy link
Contributor

agausmann commented Dec 9, 2021

I tried this code:

use std::io::{stdin, Read};

fn main() {
    let mut input = Vec::new();
    stdin().read_to_end(&mut input).unwrap();

    // Or:
    /*
    let mut input = String::new();
    stdin().read_to_string(&mut input).unwrap();
    */

    println!("{:?}", input);
}

This code can result in the error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: InvalidInput, message: "Windows stdin in console mode does not support a buffer too small to guarantee holding one arbitrary UTF-8 character (4 bytes)" }', main.rs:5:37

This error is likely to happen when input is read in small chunks (e.g. a few characters per line, when line buffering is used), to slowly fill the string/vec close to its capacity. Currently, it is easily reproduceable by pressing return 15 times on the terminal input.

This error is uncontrollable by me; when I call read_to_end, the actual read calls and the buffer sizes used are entirely controlled by code in std. I'd expect it to just do the right thing in this case and provide the correct size buffer, and not raise an error that the end user can't easily recover from.

Meta

rustc --version --verbose:

rustc 1.57.0 (f1edd0429 2021-11-29)
binary: rustc
commit-hash: f1edd0429582dd29cccacaf50fd134b05593bd9c
commit-date: 2021-11-29
host: x86_64-pc-windows-msvc
release: 1.57.0
LLVM version: 13.0.0

Also reproducible on nightly:

rustc 1.59.0-nightly (efec54529 2021-12-04)
binary: rustc
commit-hash: efec545293b9263be9edfb283a7aa66350b3acbf
commit-date: 2021-12-04
host: x86_64-pc-windows-msvc
release: 1.59.0-nightly
LLVM version: 13.0.0
Backtrace

Pretty sure this isn't relevant, as the panic is from an `unwrap()` in my code because of an error value created in `std`. But here you go:

stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/efec545293b9263be9edfb283a7aa66350b3acbf\/library\std\src\panicking.rs:498
   1: core::panicking::panic_fmt
             at /rustc/efec545293b9263be9edfb283a7aa66350b3acbf\/library\core\src\panicking.rs:107
   2: core::result::unwrap_failed
             at /rustc/efec545293b9263be9edfb283a7aa66350b3acbf\/library\core\src\result.rs:1660
   3: core::result::Result<T,E>::unwrap
   4: core::ptr::const_ptr::<impl *const T>::is_null
   5: core::ops::function::FnOnce::call_once

@agausmann agausmann added the C-bug Category: This is a bug. label Dec 9, 2021
@agausmann agausmann changed the title read_to_end / read_to_string returns Invalid Input ("buffer too small") on Windows MSVC target read_to_end / read_to_string returns Invalid Input ("buffer too small") on Windows MSVC Dec 9, 2021
@Patrick-Poitras
Copy link
Contributor

Patrick-Poitras commented Dec 9, 2021

Was able to replicate with the above example. Here is a full backtrace.

Backtrace
   0:     0x7ff66a138ccf - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ff66a138ccf - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ff66a138ccf - std::sys_common::backtrace::_print_fmt
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\sys_common\backtrace.rs:67
   3:     0x7ff66a138ccf - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\sys_common\backtrace.rs:46
   4:     0x7ff66a147f2a - core::fmt::write
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\core\src\fmt\mod.rs:1163
   5:     0x7ff66a1368b8 - std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\io\mod.rs:1696
   6:     0x7ff66a13aff6 - std::sys_common::backtrace::_print
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\sys_common\backtrace.rs:49
   7:     0x7ff66a13aff6 - std::sys_common::backtrace::print
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\sys_common\backtrace.rs:36
   8:     0x7ff66a13aff6 - std::panicking::default_hook::closure$1
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:210
   9:     0x7ff66a13a9f7 - std::panicking::default_hook
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:227
  10:     0x7ff66a13b655 - std::panicking::rust_panic_with_hook
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:624
  11:     0x7ff66a13b23b - std::panicking::begin_panic_handler::closure$0
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:521
  12:     0x7ff66a1395f7 - std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure$0,never$>
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\sys_common\backtrace.rs:139
  13:     0x7ff66a13b199 - std::panicking::begin_panic_handler
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:517
  14:     0x7ff66a14cb40 - core::panicking::panic_fmt
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\core\src\panicking.rs:100
  15:     0x7ff66a14cbe3 - core::result::unwrap_failed
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\core\src\result.rs:1616
  16:     0x7ff66a131242 - <&T as core::fmt::Debug>::fmt::h6b71318ff72d370b
  17:     0x7ff66a131316 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb40e258bfbdd6d09
  18:     0x7ff66a1312ec - std::rt::lang_start::{{closure}}::h302d67aaccd9565c
  19:     0x7ff66a1387bb - core::ops::function::impls::impl$2::call_once
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\library\core\src\ops\function.rs:259
  20:     0x7ff66a1387bb - std::panicking::try::do_call
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:403
  21:     0x7ff66a1387bb - std::panicking::try
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:367
  22:     0x7ff66a1387bb - std::panic::catch_unwind
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panic.rs:133
  23:     0x7ff66a1387bb - std::rt::lang_start_internal::closure$2
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\rt.rs:128
  24:     0x7ff66a1387bb - std::panicking::try::do_call
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panicking.rs:403
  25:     0x7ff66a1387bb - std::panicking::try
  26:     0x7ff66a1387bb - std::panic::catch_unwind
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\panic.rs:133
  27:     0x7ff66a1387bb - std::rt::lang_start_internal
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c\/library\std\src\rt.rs:128
  28:     0x7ff66a1312d7 - main
  29:     0x7ff66a14b804 - invoke_main
                               at d:\A01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
  30:     0x7ff66a14b804 - __scrt_common_main_seh
                               at d:\A01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  31:     0x7ff894cb7034 - BaseThreadInitThunk
  32:     0x7ff896482651 - RtlUserThreadStart

It seems that the problem is caused by the vector running out of memory. If for example you use:

use std::io::{stdin, Read};

fn main() {
    let mut input = Vec::new();
    input.reserve(100);
    stdin().read_to_end(&mut input).unwrap();

    println!("{:?}", input);
}

Then it takes more inputs to make it crash.

I think the problem is probably somewhere around here: https://doc.rust-lang.org/stable/src/std/io/mod.rs.html#363

@hkratz
Copy link
Contributor

hkratz commented Dec 9, 2021

@rustbot label O-windows T-libs A-io

@rustbot rustbot added A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 9, 2021
@agausmann
Copy link
Contributor Author

agausmann commented Dec 9, 2021

I agree with with @Patrick-Poitras - I suspect that default_read_to_end is trying to read using a buffer / remaining capacity that is less than 4 bytes, which is rejected by the Read impl for the Windows Stdin type.

(And read_to_string also delegates to default_read_to_end internally, so fixing that would also fix read_to_string)

@Patrick-Poitras
Copy link
Contributor

I'm going to try to tinker with the code and see if I can get it to disappear by replacing line 369 with a <4 instead of ==.

Will report findings shortly.

@Patrick-Poitras
Copy link
Contributor

Patrick-Poitras commented Dec 10, 2021

I can confirm that the bug no longer occurs if line 369 is replaced with

    loop {
        if  buf.capacity() - buf.len() < 4 {
            buf.reserve(32); // buf is full, need more space
        }

I don't know if that fix is serviceable, or if this breaks other things. I'll run tests and if that works I'll create a pull request.

@hkratz
Copy link
Contributor

hkratz commented Dec 10, 2021

I think that should be fixed in the Windows-specific code in library/std/src/sys/windows/stdio.rs. That is the right place for Windows-specific console workarounds.

@Patrick-Poitras
Copy link
Contributor

Patrick-Poitras commented Dec 10, 2021

impl io::Read for Stdin {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        (...)
}

It seems to be a fixed-size array, the fix would require rewriting read to accept something that would allow for overflow. Alternatively, we could try catching the error, and then incrementing the buffer. I'll take a look at what that would modify downsteam.

@Patrick-Poitras
Copy link
Contributor

I'm testing adding the fix to the error handling section at line 377. Seems like the cleanest and easiest place to implement the modification and have it only trigger when there's a problem. At worst the performance hit will be that it will try to read from the buffer twice before returning an error.

@hkratz
Copy link
Contributor

hkratz commented Dec 10, 2021

The io::Read implementation for Stdin should conform to the behavior specified in the documentation of the Read trait. Essentially calling read() with a buf of 0 < length < 4 should work and return (at least) Ok(1) with one byte of the UTF-8 sequence if EOF has not been reached.

To accomplish that I would suggest replacing the one u16 of state for an unpaired surrogate in struct Stdin with an [u8;4], in which we can buffer UTF-8 bytes for the case that read is called with a buf of len < 4 and the UTF-16 read from the console converts to more than buf.len UTF-8 bytes.

@hkratz
Copy link
Contributor

hkratz commented Dec 10, 2021

Similar to what #83258 did for Stdout. CC @Count-Count @ChrisDenton

@Patrick-Poitras
Copy link
Contributor

I like that idea! I'll try implementing it, and this should remove the need to throw the error entirely, if I'm not mistaken.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 3, 2022
…dio-windows, r=Mark-Simulacrum

Modifications to `std::io::Stdin` on Windows so that there is no longer a 4-byte buffer minimum in read().

This is an attempted fix of issue rust-lang#91722, where a too-small buffer was passed to the read function of stdio on Windows. This caused an error to be returned when `read_to_end` or `read_to_string` were called. Both delegate to `std::io::default_read_to_end`, which creates a buffer that is of length >0, and forwards it to `std::io::Stdin::read()`. The latter method returns an error if the length of the buffer is less than 4, as there might not be enough space to allocate a UTF-16 character. This creates a problem when the buffer length is in `0 < N < 4`, causing the bug.

The current modification creates an internal buffer, much like the one used for the write functions

I'd also like to acknowledge the help of `@agausmann` and `@hkratz` in detecting and isolating the bug, and for suggestions that made the fix possible.

Couple disclaimers:

- Firstly, I didn't know where to put code to replicate the bug found in the issue. It would probably be wise to add that case to the testing suite, but I'm afraid that I don't know _where_ that test should be added.
- Secondly, the code is fairly fundamental to IO operations, so my fears are that this may cause some undesired side effects ~or performance loss in benchmarks.~ The testing suite runs on my computer, and it does fix the issue noted in rust-lang#91722.
- Thirdly, I left the "surrogate" field in the Stdin struct, but from a cursory glance, it seems to be serving the same purpose for other functions. Perhaps merging the two would be appropriate.

Finally, this is my first pull request to the rust language, and as such some things may be weird/unidiomatic/plain out bad. If there are any obvious improvements I could do to the code, or any other suggestions, I would appreciate them.

Edit: Closes rust-lang#91722
@bors bors closed this as completed in 50a66b7 Jan 4, 2022
@agausmann
Copy link
Contributor Author

Thank you @Patrick-Poitras! I was willing to fix this myself and finally get my first rust-lang contribution, but you swooped in out of nowhere and picked this up. It was a pleasure working with you though! 😄

@Patrick-Poitras
Copy link
Contributor

Patrick-Poitras commented Jan 5, 2022

@agausmann it was a pleasure for me as well!

Usually I go in and try to provide some diagnostics, and in this case, with your help I was able to figure out exactly where I could fix the problem. It was actually the first (and to date only time) that has just worked. Usually I just dump a bunch of clues that I've been able to find. I would probably not have been able to do anything without your and @hkratz's input.

On the subject of first contributions, there's a tag for easy issues, that gave me my first actually merged PRs. That was enough to get me set up with the toolchain, and that's roughly 80% of the work. I'd really recommend grabbing one if one opens up and you want to have a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-bug Category: This is a bug. O-windows Operating system: Windows T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
4 participants