Keep reading when there's space in the buffer for tokio::io::copy #3694
Comments
I would like to try this.
Go ahead!
@cssivision
It doesn't matter. Whatever.
Please be more careful when you take on PRs that someone has already claimed. If you are concerned that it has stalled, you should ask whether they are still working on it.
@cssivision You don't have to close your PR. I'm just saying you should have asked me in advance instead of just submitting a PR, since that risks wasting everyone's time. Please feel free to reopen your PR. (If you don't want to, that's fine.) Thanks!
Is this a real problem that needs to be handled? I find it hard to believe that, with modern drives and network speeds, the 2k buffer isn't filled all the time. I started a conversation that resulted in #3722 because I found reading and writing 2k at a time severely limited throughput. I've found that reducing the calls into the kernel can increase performance, since context switches are a fairly expensive operation, and checking whether a socket is readable or writable probably results in a kernel call as well.

Unless you're targeting tiny machines (I don't know if Rust compiles code for embedded micros), 2k is a tiny amount of RAM (barely bigger than an MTU and much smaller than a GigE jumbo packet!).

I'm asking that, if this is a feature that is to be added, please benchmark it to see how it performs in reading or writing to a gigabit Ethernet link.
That's good to know. However, I believe my other concerns are still valid.
Yeah, we should probably change the default buffer size too. The one in std uses 8k, so I guess that would be reasonable?
8k should definitely be the minimum, since jumbo frames may become more common as GigE becomes the "slowest" network interface. It'll cut the number of kernel calls to a quarter of what they are now, at least.

Another thing to consider is TCP's NO_DELAY option. Some network traffic is chit-chatty, and trying to fill the buffer defeats that option. So I'd avoid trying to fill the 8k buffer; just send what was received. This covers short exchanges that don't fill the buffer and large data transfers that completely fill it. (In other words, short messages won't see a benefit of 8k, but large data transfers will.)
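For context, NO_DELAY here refers to the TCP_NODELAY socket option, which the application sets on the stream itself rather than anything the copy helper controls. A minimal tokio sketch (the address and surrounding setup are purely illustrative):

```rust
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Illustrative address; NO_DELAY (disabling Nagle's algorithm) is
    // configured on the socket, not by tokio::io::copy.
    let stream = TcpStream::connect("127.0.0.1:8080").await?;
    stream.set_nodelay(true)?;
    // ... use the stream, e.g. as one side of tokio::io::copy ...
    Ok(())
}
```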
When it comes to NO_DELAY, that's not something this change would affect. As for filling the buffer, the design I had in mind would only try to read more data into the buffer if the write is not currently able to accept more data.

That is, the idea is that if we are not able to write anything right now, we might as well read some more data from the socket if we have space for it. It would not involve postponing writes in cases where we are able to write, just for the sake of batching many smaller reads together.
Sorry, I wasn't being clear. If I set NO_DELAY on the socket, but you're waiting to fill the 2k buffer before sending it on, then you've defeated the purpose of NO_DELAY. But your later response indicates you're only filling when you can't write, which I'm more in favor of. Especially if you increase the buffer size.
Right, I understand now what you meant with NO_DELAY. To confirm, I am not suggesting that we wait for the buffer to be filled before we try to send; rather, I am suggesting that if the send returns `Poll::Pending`, we use that opportunity to read more data into the buffer.
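A rough sketch of that control flow. The `CopyState` struct below is hypothetical: `cap` is the field name quoted later in the issue, and `pos` is assumed here to track how much of the buffered data has already been written. This is an illustration of the idea, not the actual implementation.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};

// Hypothetical bookkeeping: buf[pos..cap] has been read but not yet written.
struct CopyState {
    buf: Box<[u8]>,
    pos: usize,
    cap: usize,
}

impl CopyState {
    // One "step" of the copy: try to write buffered data, and if the writer
    // is not ready, use the wait to read more data (when there is room).
    fn poll_step<R, W>(
        &mut self,
        cx: &mut Context<'_>,
        mut reader: Pin<&mut R>,
        mut writer: Pin<&mut W>,
    ) -> Poll<std::io::Result<()>>
    where
        R: AsyncRead,
        W: AsyncWrite,
    {
        match writer.as_mut().poll_write(cx, &self.buf[self.pos..self.cap]) {
            Poll::Ready(Ok(n)) => {
                // Some bytes were accepted; advance the write cursor.
                self.pos += n;
                Poll::Ready(Ok(()))
            }
            Poll::Ready(Err(e)) => Poll::Ready(Err(e)),
            Poll::Pending => {
                // The writer can't accept data right now. Instead of going
                // idle, pull more data into the free part of the buffer.
                if self.cap < self.buf.len() {
                    let mut read_buf = ReadBuf::new(&mut self.buf[self.cap..]);
                    if let Poll::Ready(Ok(())) = reader.as_mut().poll_read(cx, &mut read_buf) {
                        let filled = read_buf.filled().len();
                        self.cap += filled;
                    }
                }
                Poll::Pending
            }
        }
    }
}
```

A real implementation would also need to handle EOF, read errors, flushing, and looping until the buffer drains; the point here is only that a pending write need not leave the reader idle.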
@Darksonn This sounds to me like the condition
Hey! So it does seem like the issue got a little stale. Do you think I could offer my take on this?
@CthulhuDen go ahead!
Is anyone currently actively working on this (@CthulhuDen perhaps)? If not, I'd like to tackle it.
@farnz Please go ahead! Sorry, I don't have numbers on hand to back me up, but I found this fix changed nothing: we're always bottlenecked on either input or output, and either way the change did not affect transfer throughput for me. But maybe you will arrive at positive results. Good luck!
Definitely do some benchmarks before and after to make sure your changes improve things. I know it's not easy writing test benchmarks, since they don't necessarily reflect the real world, but maximizing throughput can be affected in unexpected ways.

In a (C++) project, we wrapped a helper function around getting the system time. It was so easy to use that it got called in several places every time the program handled an I/O event. Getting the system time requires entering the kernel, which is pretty expensive. We changed the program to get the system time once when it woke up and save it; the helper function simply returned the cached value for the current I/O event. That easily doubled our throughput.

I know these approaches don't translate well to every situation, but the general point stands: reducing trips into the kernel can pay off in surprising ways.
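As a rough Rust analogue of that time-caching idea (the event type, loop structure, and function names are invented purely for illustration):

```rust
use std::time::Instant;

// Hypothetical event type, just for illustration.
struct Event;

fn handle_event(_ev: &Event, now: Instant) {
    // Handlers receive the timestamp captured at wake-up instead of
    // asking the clock again on every call.
    let _when = now;
}

fn process_wakeup(events: &[Event]) {
    // Read the clock once per wake-up...
    let now = Instant::now();
    for ev in events {
        // ...and reuse the cached value for every event in this batch.
        handle_event(ev, now);
    }
}

fn main() {
    process_wakeup(&[Event, Event, Event]);
}
```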
The changes to actually do extra reads while waiting for the write to become possible aren't that complex; I can put them up for review, but I'd prefer to finish the benchmarks to show that they're worthwhile first. I'm working on those benchmarks now.
So, benchmarks done, and I've been able to create one case in which the change speeds things up, and four in which it doesn't affect things. The one that benefits (significantly) is copying to an IOPS-throttled, unbuffered device.
However, I cannot see a case in which an IOPS-throttled, unbuffered device is a realistic target. If you're sending over a network, the network stack adds buffering; if it's local, the kernel will normally add buffering. The only time this could matter is if you've implemented an unbuffered device wrapper yourself.

My changes (with benchmark numbers on my aging Xeon E3-1245v2) are at master...farnz:tokio:copy_buffer - I'm happy to turn them into a proper pull request if there's still interest in this, because they do at least make it slightly harder to shoot yourself in the foot.
I've come up with a realistic benchmark that shows improvements, and I'm going to force-push my branch ready for a pull request. farnz@382c0bc has the old benchmarks from before the force push, for historic reference (my previous comment). The "trick" is to make the simulated HDD stall randomly as its buffer fills up. I believe this is realistic behaviour, since it's what a real NAS would do under an overload scenario, if you were unlucky enough to come in just after the NAS assigned buffer space to another writer.
And with a bit more thinking, I've been able to write a realistic benchmark that does show improvement. I'm going to force-push my branch and use it for the pull request, but farnz@382c0bc has the old benchmarks. The missing part in my thinking was that concurrent users introduce a random chance of blocking because all buffers are full; it's reasonable to model this as a chance of blocking that's proportional to how full your buffer is (since your buffer empties regularly, this represents a growing chance of blocking as the underlying HDD buffer fills up). With this model in the benchmark, I see wins copying from a slow source to an overloaded HDD simulation.
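A sketch of that stall model as a tokio AsyncWrite wrapper. The struct, field names, fullness-proportional stall probability, and the tiny xorshift RNG are all made up for illustration; a real benchmark would use a proper RNG and a more careful drain model.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

// Simulated overloaded device: the fuller its internal buffer, the more
// likely a write is to stall (return Pending), mimicking a busy NAS/HDD.
struct StallingWriter {
    buffered: usize, // bytes currently "in flight" in the device
    capacity: usize, // illustrative device buffer size
    rng_state: u64,  // crude xorshift state, to avoid extra dependencies
}

impl StallingWriter {
    fn next_rand(&mut self) -> f64 {
        // xorshift64: good enough for a benchmark sketch.
        self.rng_state ^= self.rng_state << 13;
        self.rng_state ^= self.rng_state >> 7;
        self.rng_state ^= self.rng_state << 17;
        (self.rng_state % 1000) as f64 / 1000.0
    }
}

impl AsyncWrite for StallingWriter {
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<std::io::Result<usize>> {
        let fullness = self.buffered as f64 / self.capacity as f64;
        // Chance of stalling grows as the device buffer fills up.
        if self.buffered == self.capacity || self.next_rand() < fullness {
            cx.waker().wake_by_ref(); // re-poll soon; fine for a benchmark
            return Poll::Pending;
        }
        let n = buf.len().min(self.capacity - self.buffered);
        self.buffered += n;
        Poll::Ready(Ok(n))
    }

    fn poll_flush(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        self.buffered = 0; // pretend the device drained everything
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        self.poll_flush(cx)
    }
}
```

Wrapping the copy's destination in something like this lets a benchmark exercise the "read while the writer is stalled" path without real hardware.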
This is the current poll method for `tokio::io::copy` and `tokio::io::copy_bidirectional`:

(embedded snippet: tokio/tokio/src/io/util/copy.rs, lines 28 to 76 at adad8fc)

However, as long as `self.cap < 2048`, it would be possible to attempt to read more data into the buffer. We should do this. One might also consider copying data from the end of the buffer to the start to make space for further reading.

See also #3572.
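A small sketch of that compaction idea, reusing the same hypothetical `pos`/`cap` bookkeeping as in the earlier sketch (illustrative only, not the actual copy.rs code):

```rust
// buf[pos..cap] holds bytes that have been read but not yet written;
// everything before pos has already been written out.
fn make_room(buf: &mut [u8], pos: &mut usize, cap: &mut usize) {
    if *pos > 0 {
        // Slide the unwritten tail to the front so buf[*cap..] becomes
        // free space that further reads can fill.
        buf.copy_within(*pos..*cap, 0);
        *cap -= *pos;
        *pos = 0;
    }
}
```

After compaction, a read into `&mut buf[cap..]` can use the reclaimed space even while the remaining bytes wait for the writer to become ready.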