WebAssembly + NPM #83
Good feature to support, and I'd be interested in taking a stab. Dependency-wise, here is how things look for wasm support:
Out of the box:
Require tweaking:
ndarray = {version = "0.13", optional = true, default_features = false}
chrono = {version = "^0.4.13", optional = true} // via a flag
arrow = {version = "1.0.1", default_features = false} // need to disable pretty-print via a feature flag
rayon = "^1.3.1" // need to use cond_iter or a cfg flag - wasm doesn't have threads
Might have issues:
rand = {version = "0.7", optional = true} // wasm support being ruled out in 0.8, use getrandom crate instead
rand_distr = {version = "0.3", optional = true} // similar to rand
All in all, from a dependency standpoint, things look good. I'll see if I can get a PR up that flips the right flags and uses cond_iter instead of the normal rayon iter. |
Cool.. I am really excited about this one. I couldn't find anything about |
https://github.com/cuviper/rayon-cond It's just the same as a conditional compilation with |
Yes.. I'd rather have that, as it won't increase the compilation times. |
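For readers following along, the cfg-flag approach discussed above can be sketched roughly as follows. This is only an illustration: `sum_squares` is a hypothetical function, not Polars code, and `std::thread` stands in for rayon so the snippet has no dependencies. The point is that the threaded and sequential bodies share one signature and the target picks which one is compiled.

```rust
// Sketch only: `sum_squares` is a hypothetical function, not part of Polars.
// std::thread stands in for rayon so the example needs no external crates.

/// Threaded path, compiled for every target except wasm32.
#[cfg(not(target_arch = "wasm32"))]
pub fn sum_squares(values: &[i64]) -> i64 {
    // Split the input in two halves and sum each half on its own scoped thread.
    let (left, right) = values.split_at(values.len() / 2);
    std::thread::scope(|scope| {
        let l = scope.spawn(|| left.iter().map(|v| v * v).sum::<i64>());
        let r = scope.spawn(|| right.iter().map(|v| v * v).sum::<i64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

/// Sequential fallback, compiled only when targeting wasm32.
#[cfg(target_arch = "wasm32")]
pub fn sum_squares(values: &[i64]) -> i64 {
    values.iter().map(|v| v * v).sum()
}
```

Callers use `sum_squares(&data)` on both targets, and the sequential build never pulls in a threading dependency.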
|
https://github.com/jkelleyrtp/polars/blob/jk/wasm/wasm-test/src/lib.rs I had a slight hiccup, but the basic examples work with some tweaking of the feature flags. Disabling In terms of npm... is the goal to release a "polars.js" package that exposes a rust-based dataframe? It would be nice if the series could be coerced into |
Nice work! Do you know why
My idea was to do it much like I did in Python: keep as much of the memory and operations as possible in Rust, with an option to get data out to Python. Whether this can be done without a copy depends on the memory layout of the Series. If it is a single chunk and there are no nulls in the array, it can be done zero-copy by giving ownership to Python/numpy. Otherwise I just allocate a new array. I don't know about JS/WASM. Is everything in Rust memory also in WASM linear memory and thus accessible from JS? If it is, it can be zero-copy when the aforementioned conditions hold. Getting data in from JS will probably always require a copy, as Arrow memory is 64-byte aligned, which most allocations aren't. |
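The single-chunk, no-nulls rule described here can be sketched as below. Everything in it is an assumption made for illustration: `ChunkedColumn` and `to_contiguous` are hypothetical names rather than Polars types, a plain `Vec<bool>` stands in for an Arrow validity bitmap, and borrowing stands in for handing ownership to the consumer.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a chunked column; not the actual Polars type.
pub struct ChunkedColumn {
    /// Each chunk holds values plus a validity mask (`false` marks a null).
    pub chunks: Vec<(Vec<f64>, Vec<bool>)>,
}

impl ChunkedColumn {
    /// Borrow the data when it is one chunk without nulls; otherwise copy.
    pub fn to_contiguous(&self, null_fill: f64) -> Cow<'_, [f64]> {
        let has_nulls = self
            .chunks
            .iter()
            .any(|(_, valid)| valid.iter().any(|ok| !*ok));
        if self.chunks.len() == 1 && !has_nulls {
            // Zero-copy path: hand out a view of the existing buffer.
            return Cow::Borrowed(self.chunks[0].0.as_slice());
        }
        // Copy path: concatenate all chunks, replacing nulls with `null_fill`.
        let mut out = Vec::new();
        for (values, valid) in &self.chunks {
            for (v, ok) in values.iter().zip(valid) {
                out.push(if *ok { *v } else { null_fill });
            }
        }
        Cow::Owned(out)
    }
}
```

Returning a `Cow` lets the caller treat both paths the same way while still being able to check whether a copy happened.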
With regards to the question:
I am not a WebAssembly expert, but I recently did the
Then, what I understand is that JavaScript and WebAssembly can only exchange data as arrays of scalars. Moreover, since Rust is compiled to WebAssembly, as long as your object lives in Rust it also lives in WebAssembly. |
Then it seems that we can access all data with minimal overhead, so that's great. |
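A minimal sketch of what "accessible with minimal overhead" could look like on the Rust side, assuming a hypothetical `Column` type (not a Polars or wasm-bindgen API). On a wasm32 target the pointer is a byte offset into linear memory, so JS could build a view such as `new Float64Array(memory.buffer, ptr, len)` over the same bytes. The `view` helper below only imitates that step natively.

```rust
/// Hypothetical column type; in a wasm32 build its buffer lives in linear memory.
pub struct Column {
    data: Vec<f64>,
}

impl Column {
    pub fn new(data: Vec<f64>) -> Self {
        Column { data }
    }

    /// Address of the first element (a byte offset into linear memory on wasm32).
    pub fn ptr(&self) -> *const f64 {
        self.data.as_ptr()
    }

    /// Number of f64 elements in the buffer.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}

/// Rebuild a slice from raw parts, imitating what a JS typed-array view does.
///
/// # Safety
/// `ptr` must point to `len` initialized f64 values that outlive the slice.
pub unsafe fn view<'a>(ptr: *const f64, len: usize) -> &'a [f64] {
    unsafe { std::slice::from_raw_parts(ptr, len) }
}
```

One caveat with this pattern: a JS view over `memory.buffer` stops being valid when the wasm memory grows, so views should be recreated after any call that may allocate.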
It seems that in more recent versions, since 8.1, some new dependencies make wasm compilation more difficult, e.g. comfy-table. It would be nice if polars-core contained only computation dependencies, getting rid of all IO, formatting, and compression dependencies. |
Some dependencies have gotten easier. For example, arrow compiles to wasm with the default configurations now. |
As I understand, rayon is not an issue anymore, and Polars can be run without SIMD, so the dependencies that still hinder compilation can trivially be replaced or turned off.
I will make sure that IO and all formatting libraries are optional. Formatting could best be done in JS. |
This appears to be mostly working. What tasks do you need help with? |
Yes, I made a small POC. I wanted to mimic the python api, but some things were not yet possible in wasm bindgen, such as sending a What it boils down to is that there is quite some work to do, and I think we should split it up in 2 packages.
|
Very cool. What do you think about returning Arrow from the WASM context to the JS context and then exposing it to users via the Arrow JS library? The idea is to use Arrow as an IPC format between WASM and JS. You could also use Arrow as an IPC between a web worker and the main thread. We've done something similar in another WASM project with great success. |
@domoritz I was thinking about the WASM solution as a replacement for Arrow JS, but your proposal actually makes sense. There is no need for duplicate ChunkedArray->Primitive->ChunkedArray implementation in polars, that can be shared with Arrow JS. I assume it's not hot code path and when it is (strings, array of numbers) it'll have to be in JS land anyways. I'm not sure about the schema, dictionary and recordbatch header handling though. In which library would you handle (parse) it? |
Glad you like the proposal. We've actually had our own iterator implementation first as well and then switched to Arrow JS so we don't duplicate work. It's been a good decision and I agree that the performance should be almost unaffected (if not better since we avoid repeated calls into wasm).
Not sure I understand the question but I'll try to answer it. If you send record batches from wasm to js, arrow js would construct the schema from the IPC. |
My question was whether we would use the full Arrow IPC for messaging or a simpler, lower-level component: specific "ChunkedArray" types (as pyarrow refers to them). I don't think e.g. pyarrow uses Apache Arrow IPC for arrow<->pyarrow communication. Is pyarrow<->arrow communication the wrong model here (it has a similar calling cost; primitives, lists, and complex types are different, etc.)? |
You could probably use the arrow vectors (which we are changing to be always chunked) but I'm not sure of the benefits. The difference in Python, I think, is that communication between contexts is cheaper. In WASM, you still would need to get e.g. the schema across the boundary, and Arrow's binary format would be more efficient than say JSON. But I might be wrong. I'd say try the simplest solution first and then see whether there are bottlenecks. |
This was my question. Does the JS part have to know anything about headers, footers and metadata? I don't think a python<->c++ call is cheaper than JS<->WASM. I might be wrong, but I know that WASM functions are cheaper to call in NodeJS than their C++ implementations. |
I think that whatever is feasible we should investigate. Ideally I'd like to have a seamless interop with js-arrow and Polars
For interop with pyarrow (e.g. C++ arrow) / Rust arrow we use the arrow C data interface. This is zero-copy and we just send some pointers around. I don't think it can get much faster than that. |
You are right that the C data interface is the best way to interop with languages that can somehow consume these C headers. I'd recommend to just pack your buffers via the IPC format. Also everything that consumes your buffers will likely be javascript which will quickly engage any handbrakes it can find. |
@ritchie46 Was hoping to get some feedback on the POC I suggested when you have some time. I didn't know if you had strong opinions on using WASM and supporting the browser, or if supporting only nodejs was sufficient. |
I will. I had surgery yesterday, need some recovery time. Thank you for the contribution! |
@universalmind303 I'd be glad to help you with Neon bindings. I came here specifically to see if there are efforts going in that direction and saw that you wish to start it. |
Hey guys, first off, props to the amazing work you are doing. I would like to contribute to the wasm implementation in js-polars. I think having a wasm port can be of great value, it could play an important role in cloud native computing. When trying to add lazyframes and expressions I ran into the problem that I was unable to compile to My question now is, does it make sense at all to compile to a target that doesn't have a filesystem? Could you pass in the data in some other way? Through streams? If you don't need a filesystem, how much effort would it be to use a feature flag for everything that depends on the filesystem? I would really appreciate your help. |
Help on this would be awesome! I would really love seeing polars in WASM. We just need to feature gate the We don't need lazy readers to get data into polars lazy. We can just read data eagerly and then continue with E.g.
df.select(some_expr)
# is syntactic sugar for
df.lazy().select(some_expr).collect(no_optimizations=True)
Polars is not very well suited for streaming because our algorithms are not streaming. So a stream must be materialized into polars. There is already some work done by @universalmind303 and his Note that while doing this we must see |
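The "eager is sugar for lazy" idea can be illustrated with a toy model. The sketch below assumes a hypothetical one-column `Frame` and a `LazyFrame` that merely records closures. None of this is the Polars API, and `select` here simply maps each value, which is far simpler than real expressions.

```rust
/// Hypothetical one-column frame used only for illustration; not the Polars API.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub values: Vec<i64>,
}

/// A deferred plan: the already materialized source plus the steps still to run.
pub struct LazyFrame {
    source: Frame,
    plan: Vec<Box<dyn Fn(i64) -> i64>>,
}

impl Frame {
    /// Wrap data that was read eagerly in a lazy plan; no lazy reader is needed.
    pub fn lazy(self) -> LazyFrame {
        LazyFrame { source: self, plan: Vec::new() }
    }

    /// Eager `select`, written as sugar over the lazy path.
    pub fn select(&self, expr: impl Fn(i64) -> i64 + 'static) -> Frame {
        self.clone().lazy().select(expr).collect()
    }
}

impl LazyFrame {
    /// Record a step without executing it.
    pub fn select(mut self, expr: impl Fn(i64) -> i64 + 'static) -> LazyFrame {
        self.plan.push(Box::new(expr));
        self
    }

    /// Run every recorded step, in order, over each value.
    pub fn collect(self) -> Frame {
        let LazyFrame { source, plan } = self;
        let values = source
            .values
            .into_iter()
            .map(|v| plan.iter().fold(v, |acc, step| step(acc)))
            .collect();
        Frame { values }
    }
}
```

Because `lazy()` starts from an in-memory frame, nothing in this path touches a filesystem, which is the property that matters for a wasm target.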
Hey @ritchie46, thanks for the quick reply. That sounds great! I will have to familiarize myself a bit with the code to see what exactly needs to be behind the feature gate. I copied all tests from the |
Okay, false alarm. You can already get it to compile with the right feature gates. Sorry about that |
Something I wanted to do if implementing a browser based js-polars was to use the same Typescript wrapper that is already implemented for node. It would give the users a unified api, as well as massively reduce code duplication. I started work on it a bit on this branch js-polars In theory, a wasm binary with the same interface as the node binary should be plug & play. You should be able to even reuse the test suite. You can see the additional logic added to |
This definitely makes a lot of sense. I have to wrap my head around this and have a look at how the As I said I copied the tests from the |
The wasm-bindgen & js-sys crates are much more robust than napi-rs, so you could do it all in rust. However, some things that are not performance sensitive are much easier to do in Typescript, so it made sense to do them there (hence the wrapper). The wrapper behaves pretty much the same as the python wrapper. Some examples of things that made sense to do in JS.
That would work, but you would end up unnecessarily duplicating a lot of logic. Regarding IO, the js WASM implementation would likely have a slimmed down version of this, as there is no filesystem in the browser. So Also, feel free to join the discord channel, and we could discuss in more detail & I could answer any questions you may have about the JS wrapper. |
Thanks a lot @universalmind303. I will try to think of a way to merge my branch with yours. And I will join the discord channel in case I have some questions. Looking forward |
Has there been any progress here? I have started playing around with nodejs-polars but it would be amazing to get something running in the browser. |
Sorry to use the issue tracker for a question. But it seems that most people that have extended knowledge are participating here. I am not sure if this is possible right now. I want to use JS to define lazy operations on a data frame. The operations are then performed inside of WASM/Rust. JS is only used to compose the operation graph. It seems that pure WASM polars is still not possible. But what should work is reading the lazy data frame in Rust, exposing it using the arrow c ffi, and then using that ffi inside of WASM to actually do the operations. The data frame can then be read-only accessed using the same ffi from any of the boundaries (JS/WASM/Rust) I think. Thanks for answering! |
There's some discussion on this on the discord. @gitkwr put together a minimal example of using polars in wasm. That uses their slightly modified version of polars. @gitkwr has expressed interest on the discord of merging their changes back into polars main.
Not sure I follow your proposal here; there's no ffi between Rust and wasm, just between wasm and JS I think. I did create an example of exposing Arrow data from wasm to JS via the Arrow C Data Interface, that might be relevant in the future. |
Hey folks, has there been any more movement on this card? Happy to lend a hand if there's work that needs doing to move this along |
@eddie-atkinson Polars can compile to wasm after #6050. If you are specifically inquiring about usage within the browser via JavaScript, I'd suggest taking a look at js-polars. It is an MVP of running within the browser. Ideally I'd like to build out more functionality in it, but my knowledge of building browser packages (as well as my time) is quite limited. We'd love additional contributors to help build it out! |
So we had a PR merged a while ago that allowed polars to be compiled to wasm, and here's a minimum working example: https://github.com/gitkwr/polars-wasm-mwe However, since then I think this has regressed with some arrow2 incompatibility (lz4 and zstd dependencies). This needs to be looked at, but I haven't had the time. If you want to look into it, the MWE will go a long way in helping you. Right now I have locked my repos to the last working commit:
|
Has anyone had any success reading a remote arrow file directly from |
@benjaminrwilson have you had success compiling polars to |
I haven't tested this change in depth but it appears that the gap is small: https://gist.github.com/bngo92/40a96093e4ef4643ca5128f34bbb1b98. With this diff, I was able to compile my yew app and use basic polars functionality without any issues. This diff is off of rs-0.30.0 |
It looks like @lorepozo has a better version of my diff. I still needed a few changes of top of the fork in order to get the json, lazy, and sql features to work for me: https://github.com/bngo92/polars/commits/master |
Leaving some breadcrumbs about this… tl;dr:
# Cargo.toml
polars = { version = "*", default_features = false }
Without it, I see errors beginning with:
The only relevant hits I found on the web are this SO and this reddit comment, seemingly both by @llalma in Oct '22 (before #6050, and referring to Polars 0.25.0). Searching "sys::position" in Discord turned up this useful convo between @dimasukr and @universalmind303 from Apr '22, which led me to the Here's a sketch of an MWE:
|
Currently, the CI checks each feature for wasm build-ability with exceptions in crates/Makefile#check-wasm such as Though as you've noticed successful compilation does not imply full support (e.g. file system code compiles but panics at runtime if used). I've used polars on CSVs in wasm by reading raw CSV contents separately, loading it via |
Hello @lorepozo
|
You could use the
use std::io::Cursor;
use polars_io::csv::CsvReader;
use polars_io::SerReader;
let bytes_array = include_bytes!("path/to/file.csv");
let rdr = CsvReader::new(Cursor::new(&bytes_array[..]));
let df = rdr.finish().expect("csv reader error");
// log first row to console
use web_sys::console;
let row = df.get_row(0).expect("get_row error").0;
console::log_1(&format!("{:?}", row).into());
I think you could also use reqwest to fetch the file at runtime instead of including it at compile time but I haven't tried this. |
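The reason the `Cursor` trick works without a filesystem is that the reader only needs something implementing `std::io::Read`. A dependency-free sketch of that idea follows. `read_int_column` and `read_from_bytes` are hypothetical helpers written for this illustration, and they parse only a single integer column, so they are no substitute for a real CSV reader.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Hypothetical helper: parse a one-column CSV of integers from any reader.
/// Returns the header name and the parsed values.
pub fn read_int_column<R: Read>(reader: R) -> std::io::Result<(String, Vec<i64>)> {
    let mut lines = BufReader::new(reader).lines();
    // The first line is the header; an empty input yields an empty name.
    let header = lines.next().transpose()?.unwrap_or_default();
    let mut values = Vec::new();
    for line in lines {
        let line = line?;
        let field = line.trim();
        if field.is_empty() {
            continue; // skip blank lines
        }
        let value = field
            .parse::<i64>()
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
        values.push(value);
    }
    Ok((header, values))
}

/// The bytes could come from `include_bytes!` or from a fetched response body.
pub fn read_from_bytes(bytes: &[u8]) -> std::io::Result<(String, Vec<i64>)> {
    // `Cursor` turns an in-memory byte slice into a `Read` implementation.
    read_int_column(Cursor::new(bytes))
}
```

Keeping the parser generic over `Read` means the same code path serves files on native targets and in-memory buffers in the browser.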
@alicja-januszkiewicz |
I'm having issues with this.
but under a patch in my
[patch.crates-io]
polars = { version = "*", git = "https://github.com/pola-rs/polars", branch = "main", default-features = false, features=["csv"]}
That didn't work. I still get the error:
Anyone gotten it to work when |
See if we can support this with an optional feature.