Implement Blob and FileReader #164

Closed
manvalls opened this issue Dec 15, 2014 · 8 comments

@manvalls
Contributor

In the browser we've got the Blob object, which represents some binary data that is typically stored on the hard drive. If we want to read said data, we use the FileReader object. Sometimes we don't need to read that data at all; we just want to send it, and its actual contents don't matter to us.

In the browser this is straightforward: we just call obj.send(blob);. With Node's Buffer API, however, we first have to read the data from the hard drive, which can consume a considerable amount of memory, and only then send it. The closest thing to sending Blobs I can think of in the Node world is stream.pipe(otherStream);, but this is still implemented in plain JavaScript and loads the data from disk into memory chunk by chunk. Multiply those chunks of binary data by thousands of connections and you may find a bottleneck.
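For reference, a minimal sketch of the stream.pipe approach described above. The helper name sendFileViaPipe and the demo function are hypothetical, and a plain Writable stands in for a real socket. Every chunk still passes through a JavaScript-visible Buffer on its way to the destination.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Writable } = require('stream');

// Hypothetical helper: stream a file into any writable destination
// (a socket, an HTTP response, ...) chunk by chunk instead of loading
// the whole file into memory first.
function sendFileViaPipe(filePath, destination) {
  return new Promise((resolve, reject) => {
    const source = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 });
    source.on('error', reject);
    destination.on('error', reject);
    destination.on('finish', resolve);
    source.pipe(destination);
  });
}

// Demo with a stand-in for a network socket that collects what it receives.
function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pipe-demo-'));
  const filePath = path.join(dir, 'data.txt');
  fs.writeFileSync(filePath, 'hello from disk');

  const received = [];
  const fakeSocket = new Writable({
    write(chunk, encoding, callback) {
      received.push(chunk); // each chunk is a Buffer living in JS memory
      callback();
    },
  });

  return sendFileViaPipe(filePath, fakeSocket).then(() => {
    fs.unlinkSync(filePath);
    fs.rmdirSync(dir);
    return Buffer.concat(received).toString();
  });
}
```

Calling demo() resolves with the file's contents once the destination has finished. At most one highWaterMark-sized chunk per stream is buffered at a time, which is the per-connection cost the comment above is concerned about.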

Such operations should be done at a lower level, which would allow the use of things like the sendfile system call. IMO, implementing the Blob object, at first as some sugar around fs and later as a full lower-level implementation, would allow us to save some memory and, at the same time, reduce the gap between Node and the browser's worlds.
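To make the "sugar around fs" idea concrete, here is a rough sketch under stated assumptions. FileBlob is a hypothetical class, not an existing Node API, and it only covers size, slice, and an explicit synchronous read. The point it illustrates is that the object records where the bytes live and touches the disk only when asked.

```javascript
const fs = require('fs');

// Hypothetical FileBlob: remembers a path and a byte range, not the bytes.
class FileBlob {
  constructor(filePath, start = 0, end = fs.statSync(filePath).size) {
    this.filePath = filePath;
    this.start = start;
    this.end = end;
  }

  get size() {
    return this.end - this.start;
  }

  // Modeled on Blob#slice: returns a new view, reads no file data.
  // Negative indexes count from the end, as in the browser API.
  slice(start = 0, end = this.size) {
    const clamp = (n) => Math.min(Math.max(n < 0 ? this.size + n : n, 0), this.size);
    const from = clamp(start);
    const to = Math.max(clamp(end), from);
    return new FileBlob(this.filePath, this.start + from, this.start + to);
  }

  // Bytes are read only when explicitly requested (the FileReader role).
  readSync() {
    const buffer = Buffer.alloc(this.size);
    const fd = fs.openSync(this.filePath, 'r');
    try {
      let offset = 0;
      while (offset < buffer.length) {
        const bytesRead = fs.readSync(
          fd, buffer, offset, buffer.length - offset, this.start + offset);
        if (bytesRead === 0) break; // file is shorter than expected
        offset += bytesRead;
      }
      return buffer.subarray(0, offset);
    } finally {
      fs.closeSync(fd);
    }
  }
}
```

A send method that accepted such an object could, in principle, hand the path and range to a lower-level transfer instead of calling readSync; that part is outside what this sketch shows.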

@chrisdickinson
Contributor

While sendfile is definitely something I'd like to see implemented, I don't see much benefit to copying the browser's Blob / FileReader API. Something -- be it the kernel, or userland javascript -- has to read parts of the file into memory. Blob just allows us to skip the marshalling-into-userland step, which is something that a sendfile implementation would already let us do at a lower level of abstraction (which would enable userland to build APIs like Blob).

@iankronquist
Contributor

sendfile is not POSIX, and the OS X and Linux versions differ wildly. I assume you're talking about the Linux version. I'm interested in working on something like this if it is needed.

@bnoordhuis
Member

Libuv has a sendfile() abstraction that papers over the platform differences. It round-trips through the thread pool so it's not necessarily faster than plain read+write but it's a good starting point.

That said, sendfile() is a one-trick pony. In libuv, we would want a more generic API - possibly one loosely modeled after UNIX pipes - so we can use splice() or other operating system-specific primitives where appropriate.

@benjamingr
Member

It's a one trick pony but it's a very common use case IMO.

@manvalls I'm not sure I understand why you want a Blob object - why does Buffer not suffice here?

@manvalls
Contributor Author

@benjamingr Buffer represents some data that is already in memory. sendfile is all about file descriptors: it transfers data from one to another at the kernel level. So, either obj.sendFile(blob); or obj.sendFile(fd);, or even a Stream.pipe implementation which internally uses sendfile, but sending a Buffer defeats the purpose of sendfile. Anyway, I wanted a Blob implementation for code portability, but I guess sendfile functionality should be made a separate issue.
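As an illustration of the contrast, this sketch shows the userland read+write loop between two file descriptors, which is the path a kernel-level transfer such as sendfile would replace. copyBetweenFds is a hypothetical helper built only on fs.readSync and fs.writeSync; it does not call sendfile itself.

```javascript
const fs = require('fs');

// Hypothetical helper: copy everything from inFd to outFd through one
// reusable buffer. Every byte crosses from the kernel into this Buffer
// and back again, which is the round trip sendfile is meant to avoid.
function copyBetweenFds(inFd, outFd, chunkSize = 64 * 1024) {
  const buffer = Buffer.allocUnsafe(chunkSize);
  let total = 0;
  for (;;) {
    // position null: read from the descriptor's current offset
    const bytesRead = fs.readSync(inFd, buffer, 0, chunkSize, null);
    if (bytesRead === 0) break; // end of file
    let written = 0;
    while (written < bytesRead) {
      // writeSync may write fewer bytes than requested, so loop
      written += fs.writeSync(outFd, buffer, written, bytesRead - written);
    }
    total += bytesRead;
  }
  return total;
}
```

Memory use here is bounded by chunkSize rather than by the file size, but the data is still copied into JavaScript-visible memory once per chunk.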

@faridnsh
Contributor

This has been discussed in nodejs/node-v0.x-archive#1802...

@chrisdickinson
Contributor

> In the browser we've got the Blob object, which represents some binary data that is typically stored on the hard drive. If we want to read said data, we use the FileReader object. Sometimes we don't need to read that data at all; we just want to send it, and its actual contents don't matter to us.

Blob does not specify whether the information it represents is in memory or on disk. At some point, to send the data, something (the kernel, userland code, or JS) will have to load at least part of the information into memory, regardless of whether it was marshalled using FileReader or not. The fact that Blob does not specify the location of the data lets the surrounding APIs "get away" with not having to marshal data into JS when it is not necessary.

In other words, the Blob, File, and FileReader APIs are not the cornerstone of this proposal: the goal is to be able to transparently reduce unnecessary marshalling and unmarshalling from kernel to heap, and from heap to JS during reads and writes to resources. While a small win could be attained by adding a net.Socket#sendfile(fd) method, users would have to know to use sendfile, and know to benchmark it against read+write to make sure they're getting real performance gains. A sendfile method is a dubious win; one that doesn't have the desired "transparency" of something like Blob / File / FileReader. Ultimately, I think the best way to realize that goal is through streams.

Work towards enabling this sort of communication between streams is underway on the whatwg/streams spec to make off-main-thread piping possible. This is delicate, intricate work; and rather than implement this functionality ourselves at present, it would be best to contribute to that work and see what lessons can be learned from its outcomes. Happily, this fits in with the larger desire to gradually move towards whatwg specs where feasible.

In summary: the File, Blob, and FileReader abstractions do not fit iojs' needs. Instead, keep an eye on the whatwg streams spec, and especially that pull request. Once it lands and the dust settles a bit, we will have a clearer view of the terrain and the potential pitfalls. At that point, we can make an informed decision about whether or not to work towards this goal.

@calvinmetcalf
Contributor

One use case that is hard to do in Node but that blobs excel at in the browser is blob URLs: the ability to put arbitrary data in a blob and treat it like a URL. The inability to do this in Node (especially for files) can be annoying when using poorly written libraries that will only take a file path (e.g. compositing in ImageMagick, since you need to pass it two files and it can only take one from stdin). While there are some hacks using named pipes that sort of do the trick, god I miss blob URLs in Node.
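For comparison, a sketch of a simpler workaround than the named-pipe hacks mentioned above: writing the in-memory data to a temporary file so that a path-only library can consume it. This is a different technique from blob URLs and from named pipes, it costs a disk write, and withTempFile is a hypothetical helper name.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: expose in-memory data as a file path for the
// duration of `fn`, then remove the temporary file and directory.
// Only suitable when `fn` finishes with the file synchronously.
function withTempFile(data, fn) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'blob-path-'));
  const filePath = path.join(dir, 'data.bin');
  fs.writeFileSync(filePath, data);
  try {
    return fn(filePath);
  } finally {
    fs.unlinkSync(filePath);
    fs.rmdirSync(dir);
  }
}
```

A library that reads the path asynchronously would need a promise-based variant that defers the cleanup until the work has completed.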
