Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readlink uses PATH_MAX-sized buffer, but PATH_MAX can't be relied upon #1178

Closed
SolraBizna opened this issue Jan 12, 2020 · 8 comments
Closed

Comments

@SolraBizna
Copy link
Contributor

The current nix implementation of readlink, which is a rather nice improvement over the previous version, is:

fn wrap_readlink_result(v: &mut Vec<u8>, res: ssize_t) -> Result<OsString> {
    match Errno::result(res) {
        Err(err) => Err(err),
        Ok(len) => {
            unsafe { v.set_len(len as usize) }
            Ok(OsString::from_vec(v.to_vec()))
        }
    }
}

pub fn readlink<P: ?Sized + NixPath>(path: &P) -> Result<OsString> {
    let mut v = Vec::with_capacity(libc::PATH_MAX as usize);
    let res = path.with_nix_path(|cstr| {
        unsafe { libc::readlink(cstr.as_ptr(), v.as_mut_ptr() as *mut c_char, v.capacity() as size_t) }
    })?;

    wrap_readlink_result(&mut v, res)
}

The Linux man page for readlink has this to say:

Using a statically sized buffer might not provide enough room for the symbolic link contents. The required size for the buffer can be obtained from the stat.st_size value returned by a call to lstat(2) on the link. ... Dynamically allocating the buffer for readlink() and readlinkat() also addresses a common portability problem when using PATH_MAX for the buffer size, as this constant is not guaranteed to be defined per POSIX if the system does not have such limit.

Per the above, the fixed-size buffer should be replaced by one of the following approaches:

  • Use lstat to read the size of the link, allocate a buffer that large plus one, readlink. If the return value isn't exactly v.capacity()-1, start over. (An approach like this is recommended in the Linux manpage, but supposedly some filesystems' stat implementations report a size of zero bytes for all symlinks, so...)
  • Starting with a reasonable minimum size (PATH_MAX.max(128) or something?) and increasing each time, repeatedly readlink until the returned size is strictly less than the buffer size. (maybe fall back to this if lstat returns zero size?)

Also, we should probably guard against readlink returning a size larger than the buffer. I don't know of any actual UNIXoid that does this, but... this is Rust after all.

I would take a crack at this myself, but I'm still recovering from some nastiness. Just composing this issue took me six hours, drifting in and out of dizzy semi-consciousness. Not the best state to write important code in.

@asomers
Copy link
Member

asomers commented Jan 12, 2020

So POSIX doesn't require PATH_MAX to be defined if the system has no such limit. But if it is defined, does that mean it's guaranteed to work?

@SolraBizna
Copy link
Contributor Author

Well, IIRC PATH_MAX is 1024 on Linux, but just now I was able to create 10,000 nested directories named "test" and everything still seemed to work.

@asomers
Copy link
Member

asomers commented Jan 12, 2020

uggh. What's PATH_MAX for, then? Very well, let's try the stat-based approach. Perhaps fall back to the other approach if stat reports a length of 0?

@Artoria2e5
Copy link

PATH_MAX is pretty much just an arbitrary limit in old-school Unix so some of the APIs can have a non-infinite max size. See https://eklitzke.org/path-max-is-tricky.

@asomers
Copy link
Member

asomers commented May 2, 2020

@SolraBizna do you still want to take a crack at fixing this?

@SolraBizna
Copy link
Contributor Author

I'll give it a shot tomorrow.

bors bot added a commit that referenced this issue May 12, 2020
1231: Add support for reading symlinks longer than `PATH_MAX` to `readlink` and `readlinkat` r=asomers a=SolraBizna

This is in response to issue #1178.

The new logic uses the following approach.

- At any time, if `readlink` returns an error, or a value ≥ 0 and < (not ≤!) the buffer size, we're done.
- Attempt to `readlink` into a `PATH_MAX` sized buffer. (This will almost always succeed, and saves a system call over calling `lstat` first.)
- Try to `lstat` the link. If it succeeds and returns a sane value, allocate the buffer to be that large plus one byte. Otherwise, allocate the buffer to be `PATH_MAX.max(128) << 1` bytes.
- Repeatedly attempt to `readlink`. Any time its result is ≥ (not >!) the buffer size, double the buffer size and try again.

While testing this, I discovered that ext4 doesn't allow creation of a symlink > 4095 (Linux's `PATH_MAX` minus one) bytes long. This is in spite of Linux happily allowing paths in other contexts to be longer than this—including on ext4! This was probably instated to avoid breaking programs that assume `PATH_MAX` will always be enough, but ironically hindered my attempt to test support for *not* assuming. I tested the code using an artificially small `PATH_MAX` and (separately) a wired-to-fail `lstat`. `strace` showed the code behaving precisely as expected. Unfortunately, I can't add an automatic test for this.

Other changes made by this PR:
- `wrap_readlink_result` now calls `shrink_to_fit` on the buffer before returning, potentially reclaiming kilobytes of memory per call. This could be very important if the returned buffer is long-lived.
- `readlink` and `readlink_at` now both call an `inner_readlink` function that contains the bulk of the logic, avoiding copy-pasting of code. (This is much more important now that the logic is more than a few lines long.)

Notably, this PR does *not* add support for systems that don't define `PATH_MAX` at all. As far as I know, I don't have access to any POSIX-ish OS that doesn't have `PATH_MAX`, and I suspect it would have other compatibility issues with `nix` anyway.

Co-authored-by: Solra Bizna <solra@bizna.name>
@SUPERCILEX
Copy link
Contributor

@asomers Pretty sure this should have been closed by #1231

@SolraBizna
Copy link
Contributor Author

Oh, neat, I can close it as completed. Guess I should've done that a while ago. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants