Please consider node like pread / pwrite operations #34
Comments
I forgot to elaborate on why this is necessary for p2p use cases: a participating peer usually gets different chunks from different peers in the swarm, and also seeds those chunks as they are requested. That is why reading/writing specific byte ranges is important for such use cases.
/cc @mafintosh here, as he could offer far more detailed insight into this than I can.
At least the currently described API only has pread/pwrite-like operations (well, pwrite; for reading it relies on the existing blob-reading APIs, which, thanks to slicing, are already pretty pread-like anyway). I expect we'll end up with some combination of a pread/pwrite-like API and a way to read/write using ReadableStream/WritableStream, which hopefully should address all use cases. The exact shape is TBD, but it is good to hear that there is a strong desire for pread/pwrite-style APIs.
Did it change? I believe when I submitted this, the writer did not have a position argument.
Are you referring to the FileReader API? I would disagree; it requires reading the whole file and taking only a slice of it after the fact. For some of these applications that would be a problem: for instance, Dat stores all the data in a single sparse file, so loading it all into memory just to reach a few bytes would be highly inefficient.
👍
Yeah, at least at some point the sample code didn't have that position argument, but I changed my mind :)
That is not necessarily true. You can just call slice() on the blob/file you have, and then only pass the sliced blob to FileReader (or wrap the sliced blob in a Response to use a more modern reading API). Creating a slice should be fairly efficient, and at least for file-backed blobs in Chrome it will not involve any reading or copying of the data.
I did not realize that, thanks for the clarification!
The writing API was changed, but it still has pwrite/random-access capabilities. Since I believe this is addressed, closing this issue.
I have not been paying close attention to this, but there's a fundamental difference between reading/writing at random offsets and using a seek-based API: the latter requires task coordination to avoid races on concurrent access, while the former doesn't.
From what I can tell looking at the current spec:

> Writes the content of data into the file associated with stream at position bytes from the top of the file. Also updates the current file cursor offset to the end of the written data.

Even writes with a specified offset would exhibit the problem mentioned in a previous comment, as updating the position affects concurrent reads/writes.
Currently all writes operate on a temporary/shadow file, and until you close the writer no changes are reflected in the actual file. You can't read from the shadow file, so there is definitely no issue between reads and writes. (We do realize this is somewhat limiting, and there definitely are use cases where this model doesn't work; in particular, your use case isn't very efficient in this model.) I'm not sure I entirely understand the concern about writes either. Is the concern that if you use both "pwrite"-style writes and writes without an explicit offset, those might interfere with each other? I'm not sure I understand when you'd be in such a situation. I'd expect that in most cases you'd either want to write at an explicit offset, or you'd want to just stream data at the "current position". What use cases would require mixing the two?
@mkruisselbrink let me try to set some context on the use case. Many p2p applications create an arbitrary number of network connections to fetch chunks of data in parallel, which are then written to the underlying file. This implies that many concurrent read and write operations can occur.

A more specific example would be Dat, which uses random-access-storage for data storage. The random-access-file implementation is very performant, because Node.js provides pread/pwrite and therefore read/write operations do not require task queues. On the other end of the spectrum is random-access-idb, which is the only cross-browser solution, but it performs poorly because IndexedDB has no way to read ranges of data.

This queuing limitation significantly affects performance, which is why I brought this issue up. That being said, if changes aren't reflected until the file is closed, that is probably going to be even more problematic than the lack of pread/pwrite.

It is also worth pointing out that this is not tied to Dat specifically. Any use case that assumes concurrent access to a file that is too large to be practical to keep in memory will face these challenges.
Thanks for the detailed explanation. Unfortunately that is indeed a use case that isn't very well supported today. Something like the NativeIO proposal might be a better fit, with the caveat that, as currently envisioned, it only gives access to origin-scoped/private files. That might still work for P2P applications, though, if they're fine with not storing the final downloaded file to a user-specified location until the entire file is downloaded. Either way, we'll definitely keep use cases like this in mind as we think about the future of this API...
Many p2p use cases depend on pread/pwrite operations, that is, reading a byte range from a provided offset or writing a byte range at a provided offset without changing the file position, as that allows scheduling parallel read/write operations.
While it is possible to use a hand-rolled scheduler and queue read/write operations, doing so greatly affects throughput, and writing such code is non-trivial.
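The hand-rolled scheduler alluded to above can be sketched as a minimal promise-chaining serializer; the names here are illustrative, not from any spec. When only a seek-based (shared-cursor) API is available, every cursor-mutating operation must be funneled through one queue like this to avoid races, which is exactly the throughput cost the issue describes.

```javascript
// Minimal serial task queue: each task runs only after the previous
// one settles, so seek+read / seek+write pairs never interleave.
function makeSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Chain each task onto the previous one.
    const result = tail.then(task);
    // Keep the chain alive even if a task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

// Usage: every cursor-mutating operation goes through the same queue,
// so two logical operations can never overlap.
const enqueue = makeSerialQueue();
const order = [];
Promise.all([
  enqueue(async () => { order.push('seek+write A'); }),
  enqueue(async () => { order.push('seek+write B'); }),
]).then(() => console.log(order)); // logs ['seek+write A', 'seek+write B']
```

With pread/pwrite-style operations none of this machinery is needed, since no shared cursor exists to protect.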