Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back io::copy file to pipe optimization #128300

Open
NobodyXu opened this issue Jul 28, 2024 · 13 comments
Open

Add back io::copy file to pipe optimization #128300

NobodyXu opened this issue Jul 28, 2024 · 13 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@NobodyXu
Copy link
Contributor

NobodyXu commented Jul 28, 2024

So to sum up, I think we can add back optimization via:

  • if the file is on squashfs/erofs, then it's immutable (detectable via statfs
  • if the is sealed with F_SEAL_WRITE (prevent modifying of existing contents, shrinking and appending is allowed) and F_SEAL_SEAL (not changing of seals), detectable via fcntl(fd, F_GET_SEALS)
  • if it's on read-only btrfs subvolume (e.g. snapshot), likewise we can do the same for zfs, nilfs, bcachefs
  • If the file is /dev/zero, or from other pseudo fs (e.g. procfs, sysfe), then I think we can assume it's immutable

Otherwise, we'd fallback to:

  • read + write if the file is really small
  • splice to an intermediate pipe with F_MOVE and then splice to the pipe without F_MOVE if the file is more than 1-2 pages
  • if the fs supports reflink and file is really large, then we could create a O_TMPFILE on the system, copy_file_range to it and then splice it

Also, if the file is a named fifo, then just a splice without F_MOVE would work fine.

Originally posted by @NobodyXu in #108283 (comment)

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 28, 2024
@the8472
Copy link
Member

the8472 commented Jul 28, 2024

splice to an intermediate pipe with F_MOVE and then splice to the pipe without F_MOVE if

We never used F_MOVE and it was still a problem. So I'm skeptical it'll work at all but maybe it depends on the direction (from/to pipe).

then it's immutable (detectable via statfs
detectable via fcntl(fd, F_GET_SEALS)

Those cost extra syscalls, which could be unnecessary overhead for small files. So it probably only makes sense to even try these things for large files.

And many of those cases seem rare that it's not worth it. Like who serves e.g. a webserver from a btrfs snapshot?

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 28, 2024

Yeah I think we'd only want tohandle sealed file here, I was probably listing too many cases during my brain storming 😂

However, I don't think sealed file is used very often.

We never used F_MOVE and it was still a problem. So I'm skeptical it'll work at all but maybe it depends on the direction (from/to pipe).

Hmm it seems that SPLICE_F_MOVE is a no-op according to man page, so I think using an intermediate pipe to workaround this will not work.

So the only possible workarond for a really large file and fs supporting reflink would be to use copy_file_range to a O_TMPFILE and then splice.

So it probably only makes sense to even try these things for large files.

Agree, if it is small enough, read + write is OK.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 28, 2024

I think we might want someway for the user to tell std::io::copy that they know that the file won't change before they sent out the data, the application itself could make some reasonable assumption about this.

Checking for sealed file, checking for read-only fs, etc is all about niche cases, I think we need application to convey some information about this?

@the8472
Copy link
Member

the8472 commented Jul 28, 2024

Currently, file seals can be applied only to a file descriptor returned by memfd_create(2)

Also doesn't seem like a common thing.

I think we need application to convey some information about this?

Yeah. The most common thing where sendfile makes sense is a webserver. And the files might not even be marked read-only there.

@NobodyXu
Copy link
Contributor Author

Currently, file seals can be applied only to a file descriptor returned by memfd_create(2)

Also doesn't seem like a common thing.

Thanks I missed that one, if so then file seals seems very uncommon

@NobodyXu
Copy link
Contributor Author

And the files might not even be marked read-only there.

Agree, the more I look at it, the more I think we need another API like rust-lang/libs-team#202 propose, which will always assume using splice is ok:

pub struct ImmutableFile(File);

That's the simplest way I can think of, no need for another copy function as we just want to promise that the file is indeed immutable.

@the8472
Copy link
Member

the8472 commented Jul 28, 2024

I think we don't want to promise anything that relies on specialization until there's a plan to stabilize specialization or at least parts thereof.

@NobodyXu
Copy link
Contributor Author

I think we don't want to promise anything that relies on specialization until there's a plan to stabilize specialization or at least parts thereof.

That makes sense, in that case maybe a dedicated API on File?

impl File {
    /// This is similar to [`std::io::copy`], except it will attempt to use `splice` on Linux or `sendfile` on Unix
    /// to avoid copying data if possible.
    ///
    /// NOTE that if you change the content of the file after this call, it might also change the data in `writer`.
    pub fn copy(&self, writer: &mut W) -> io::Result<u64>
    where
        W: Write + ?Sized;
}

@the8472
Copy link
Member

the8472 commented Jul 28, 2024

That would still require specialization for the writer side. The only way to do this with stable rust is to require both sides to implement all the necessary traits which expose the internal buffers and so on. What CopyRead and CopyWrite currently do. And that's some gnarly machinery that might not be the right fit for std, considering that only useful on some platforms.
Nobody has even generalized the current optimizations to other platforms.

@NobodyXu
Copy link
Contributor Author

That would still require specialization for the writer side.

Yeah but isn't that same as std::io::copy where we specialise internally and not exposing it?

@the8472
Copy link
Member

the8472 commented Jul 28, 2024

We don't guarantee that that happens though, so we can remove it if specialization becomes an issue.

@NobodyXu
Copy link
Contributor Author

NobodyXu commented Jul 28, 2024

We don't guarantee that that happens though, so we can remove it if specialization becomes an issue.

So...maybe we just add it, saying that we assume the File is immutable/append-only for the sake of possible optimization in File::copy, and leave out all the details?

If specialisation is removed, it'd be just a convenient method of std::io::copy

@the8472
Copy link
Member

the8472 commented Jul 28, 2024

Except that we'd have to make that distinction very clear to avoid people getting corrupt data. And adding warnings and choosing a different name for something we can't promise probably isn't worth it.
Now that I have written that I should probably close that ACP.... wait no, that one doesn't strictly require specialization...

@saethlin saethlin added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants