-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
writev – zero-copy gather output #291
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very useful PR. Overall I'm just wondering if it makes more sense to put a Bigarray-based version of this call in Lwt_bytes, and leave the Bytes version in Lwt_unix. Right now that's the split between Bigarray and Bytes functions in Lwt.
module IO_vectors = | ||
struct | ||
type _bigarray = | ||
(char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use Lwt_bytes.t
instead of a new type definition here? It is the same definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lwt_bytes
depends on Lwt_unix
, otherwise yes.
|
||
type _buffer = | ||
| Bytes of Bytes.t | ||
| Bigarray of _bigarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_buffer
is not exposed in the external signature so shouldnt need the _
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically use the _
to indicate at a glance that something isn't exposed (or in rare cases, like _bigarray
, is exposed, but I wish it weren't).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutable reversed_suffix : _io_vector list; | ||
mutable count : int} | ||
|
||
let create () = {prefix = []; reversed_suffix = []; count = 0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's worth the extra complexity in the interface, but an optional argument of ?bytes
and ?bigarrays
here could be used to initialise the iov in one call rather than multiple appends. I could use that in cstruct to perform the write with less overhead from the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the ordering if both are provided? Perhaps it is better to have separate calls such as from_bigarray_list
?
@avsm Thanks. I chose to have one function handle mixtures of With that and other considerations in mind, I have an ill-defined long-term goal of merging |
includes its first [n] bytes. *) | ||
|
||
val count : t -> int | ||
(** [count vs] is the number of I/O vectors in the sequence [vs]. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit curious as to why did you decide to make this function part of the interface. I can't think of any use cases for it off the top of my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that somebody might want to compare count vs
with system_limit
, in case the latter matters to them. I suppose, in the vast majority of cases, such users could easily keep a counter while accumulating vs
. If you agree, I'll hide this function. We can always expose it later, if there is a request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it also gives an easy way to know when all the data in the vectors has been written. Here is some kind of writev
loop skeleton:
let rec loop () =
Lwt_unix.writev fd vecs >>= fun n ->
Lwt_unix.IO_vectors.drop vecs n;
if Lwt_unix.IO_vectors.count vecs = 0 then Lwt.return_unit
else loop ()
Of course, the user could keep track of how much data is in the vectors, but having count
for this seems like less of a pain.
This could be an argument to expose a size
or byte_count
instead, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular case, it seems like it would be better to add a predicate that would check if the whole io vector has been written yet, rather than expose the raw count. Hopefully we can fulfill other cases in this direct way as well.
@rgrinberg I've replaced
However, |
Apparently, that's only the case on Mac (and I guess other BSDs). Automated testing FTW. |
On 11/28, Anton wrote:
@rgrinberg I've replaced `count` by `empty`. Not fully sure about the new interface:
```
(** [empty vs] is [true] if and only if [vs] has no I/O vectors, or all I/O
vectors in [vs] have zero bytes. *)
```
I'm also not fully sure either. Also, shouldn't `empty` be used to construct an
empty io_vector? Something like is_empty is more in line with conventions (such
as core's)
But I think something like empty/is_empty is pretty normal for most containers
so I don't see the harm in adding it.
|
It seems that the BSD behavior is more strict, while still being POSIX-compliant. There are two options. I favor the first one:
After a brief search, I haven't found any mention that the Linux behavior is deliberate. Will probably look further later, however. BSD writev:
POSIX writev:
Linux writev:
I'll change |
160aa90
to
fcbc348
Compare
Regarding zero-length I/O vector lists, I decided it's better to only document the difference and tell users not to rely on the behavior. Don't really have any argument for dealing with this corner case in any particular way. |
Lwt_unix.writev
takes a sequence ofbytes
andBigarray
buffers, and writes the data to a file descriptor in one system call.Bigarray
buffers are always written without copying.bytes
buffers are written without copying if the file descriptor is in non-blocking mode, which is typical for sockets and pipes in Lwt.Example usage:
There is also a
drop
function to trimio_vectors
if not all bytes are written.Performance
I did some rudimentary performance testing (
writev.c
,writev.ml
), measuring throughput. The conclusions are:Lwt_unix.writev
should always be preferred over multiple calls toLwt_unix.write
(orLwt_bytes.write
, or stdlib'sUnix.single_write
).Lwt_unix.write
on the coalesced buffer. This is presumably due to the overhead of dealing with the I/O vectors by the application and kernel.writev
outperforms coalescing by up to 50%.writev
and coalescing dramatically outperform multiple calls towrite
. Both are about 7× faster on non-blocking Lwt file descriptors, and over 100× faster on blocking ones. The latter is probably due to Lwt synchronizing with worker threads for I/O on blocking descriptors.write
andwritev
system calls from C gives very similar ratios, and the order of magnitude of buffer size at whichwritev
becomes faster than coalescing is the same.Lwt_unix.writev
is at least 90% as fast as thewritev
system call, except for very small buffer sizes mentioned above, where OCaml allocations become relatively significant, but where coalescing is faster in both OCaml and C.The test machines were my OS X computer, and a Linux virtual machine running in Digital Ocean. These aren't very controlled environments. The point was to make sure there are no serious errors affecting performance in this initial implementation, rather than obtain really high-quality measurements. Perhaps I will do some more thorough testing later, and write a short article on the results.
Other notes
Lwt_unix.IO_vector.t
using Cstruct iovec
s directly, though the latter may have to carry extra data or make assumptions about the user retaining references to buffers, and potentially the GC not running between I/O vectors assembly and the call towritev
forbytes
buffers (this may be especially tough in future multicore).Bigarray
s, I put it inLwt_unix
. Hopefully, one day we will have modular implicits, andLwt_bytes
can be folded intoLwt_unix
.send_msg
andrecv_msg
can be switched to use the heterogenous I/O vectors from this PR. I am not in a hurry to break compatibility, however. Also, modular implicits may help here.writev
on Windows for Windows sockets only, but I have left that to be done on demand, or as part of some future effort to port more functions to Windows.Resolves #281.
cc @rgrinberg, @seliopou