WebAssembly + NPM #83

Open
ritchie46 opened this issue Sep 22, 2020 · 54 comments
Labels
help wanted Extra attention is needed

Comments

@ritchie46
Member

See if we can support this with an optional feature.

@ritchie46 ritchie46 added the help wanted Extra attention is needed label Sep 23, 2020
@jkelleyrtp

Good feature to support, and I'd be interested in taking a stab at it.

Dependency-wise, here is how things look for wasm support:

Out of the box:

num = "^0.2.1"
fnv = "^1.0.7"
unsafe_unwrap = "^0.1.0"
thiserror = "^1.0.16"
itertools = "^0.9.0"
prettytable-rs = { version="^0.8.0", features=["win_crlf"], optional = true, default_features = false}
parquet = {version = "1", optional = true}
packed_simd_2 = "0.3.4"

Require tweaking:

ndarray = {version = "0.13", optional = true, default_features = false}
chrono = {version = "^0.4.13", optional = true}  // via a flag
arrow = {version = "1.0.1", default_features = false} // need to disable pretty-print via a feature flag
rayon = "^1.3.1" // need to use cond_iter or a cfg flag - wasm doesn't have threads

Might have issues:

rand = {version = "0.7", optional = true}               // wasm support being ruled out in 0.8, use getrandom crate instead
rand_distr = {version = "0.3", optional = true}     // similar to rand

All in all, from a dependency standpoint, things look good. I'll see if I can get a PR up that flips the right flags and uses cond_iter instead of the normal rayon iter.

@ritchie46
Member Author

Cool.. I am really excited about this one. I couldn't find anything about cond_iter. Is it also conditional compilation like cfg?

@jkelleyrtp

Cool.. I am really excited about this one. I couldn't find anything about cond_iter. Is it also conditional compilation like cfg?

https://github.com/cuviper/rayon-cond

It's just the same as a conditional compilation with cfg, and by the looks of it, using cfg might be better because rayon-cond is out of date (2 years!)

@ritchie46
Member Author

ritchie46 commented Oct 13, 2020

It's just the same as a conditional compilation with cfg, and by the looks of it, using cfg might be better because rayon-cond is out of date (2 years!)

Yes.. I'd rather have that, as it won't increase the compilation times.

@cuviper

cuviper commented Oct 20, 2020

rayon-cond is runtime-conditional, not at compilation time. The idea was that it might help with dynamic decisions about parallelism, but if you want a static choice, cfg would be a lot better.
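
To make the static choice concrete, here is a minimal sketch using `cfg`. It uses only the standard library so it compiles without extra crates: the function name `sum_squares` is hypothetical, and `std::thread::scope` merely stands in for rayon's parallel iterators. Only one of the two definitions is compiled for any given target.

```rust
/// Sequential version, compiled only for wasm32 targets, where threads
/// are assumed to be unavailable.
#[cfg(target_arch = "wasm32")]
pub fn sum_squares(values: &[i64]) -> i64 {
    values.iter().map(|v| v * v).sum()
}

/// Threaded version, compiled for every other target. The first half of
/// the slice is summed on a scoped worker thread while the second half
/// is summed on the calling thread.
#[cfg(not(target_arch = "wasm32"))]
pub fn sum_squares(values: &[i64]) -> i64 {
    let (left, right) = values.split_at(values.len() / 2);
    std::thread::scope(|scope| {
        let handle = scope.spawn(|| left.iter().map(|v| v * v).sum::<i64>());
        let right_sum: i64 = right.iter().map(|v| v * v).sum();
        handle.join().expect("worker thread panicked") + right_sum
    })
}
```

Callers use `sum_squares` the same way on every target; only the execution strategy differs, and the choice costs nothing at runtime.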

@jkelleyrtp

https://github.com/jkelleyrtp/polars/blob/jk/wasm/wasm-test/src/lib.rs

I had a slight hiccup, but the basic examples work with some tweaking of the feature flags. Disabling pretty and simd makes the basic examples work on wasm. I'm not sure whether I just haven't run into the right conditions to trip it up and cause a panic.

In terms of npm... is the goal to release a "polars.js" package that exposes a rust-based dataframe? It would be nice if the series could be coerced into TypedArrayBuffers with some metadata so that the data could move in and out of wasm with little serialization/deserialization overhead.

@ritchie46
Member Author

Nice work! Do you know why pretty gave problems?

In terms of npm... is the goal to release a "polars.js" package that exposes a rust-based dataframe? It would be nice if the series could be coerced into TypedArrayBuffers with some metadata so that the data could move in and out of wasm with little serialization/deserialization overhead.

My idea was a bit like what I've done in Python: keep as much of the memory and operations as possible in Rust, with an option to get data out to Python. Whether this can be done without a copy depends on the memory layout of the Series. If it is a single chunk and there are no nulls in the array, it can be done zero-copy by giving ownership to Python/numpy. Otherwise I just allocate a new array.

I don't know about JS/WASM. Is everything in Rust memory also in WASM linear memory and thus accessible from JS? If it is, it can be zero-copy when the aforementioned conditions hold.

Getting data in from JS will probably always require a copy, as Arrow memory is 64-byte aligned, which most allocations aren't.
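
The single-chunk, no-nulls rule can be sketched roughly as follows. Everything here is an assumption for illustration: `Chunk` and `contiguous_values` are hypothetical names, not Polars or Arrow types, a borrowed slice stands in for handing the buffer over, and mapping nulls to NaN on the copy path is just one possible choice.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for one chunk of a float column: a value buffer
/// plus an optional validity mask (`None` means "no nulls").
pub struct Chunk {
    pub values: Vec<f64>,
    pub validity: Option<Vec<bool>>,
}

/// Returns a borrowed view when the column is a single chunk without a
/// validity mask, and otherwise allocates a new contiguous buffer.
pub fn contiguous_values(chunks: &[Chunk]) -> Cow<'_, [f64]> {
    if let [only] = chunks {
        if only.validity.is_none() {
            // Zero-copy path: expose the existing buffer as-is.
            return Cow::Borrowed(only.values.as_slice());
        }
    }
    // Copy path: concatenate all chunks into one new allocation.
    let total = chunks.iter().map(|c| c.values.len()).sum::<usize>();
    let mut out = Vec::with_capacity(total);
    for chunk in chunks {
        match &chunk.validity {
            None => out.extend_from_slice(&chunk.values),
            // Assumption: null slots are written out as NaN.
            Some(mask) => out.extend(
                chunk
                    .values
                    .iter()
                    .zip(mask)
                    .map(|(v, ok)| if *ok { *v } else { f64::NAN }),
            ),
        }
    }
    Cow::Owned(out)
}
```

The `Cow` return type makes the two outcomes visible to the caller: `Cow::Borrowed` means no allocation happened, `Cow::Owned` means a new array was built.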

@marioloko
Contributor

With regards to the question:

I don't know about JS/WASM. Is everything in Rust memory also in WASM linear memory and thus accessible from JS? If it is, it can be zero-copy when the aforementioned conditions hold.

I am not a WebAssembly expert, but I recently did the wasm-pack tutorial (https://rustwasm.github.io/book/game-of-life/implementing.html) and read this, which may be useful for you:

JavaScript's garbage-collected heap — where Objects, Arrays, and DOM nodes are allocated — is distinct from WebAssembly's linear memory space, where our Rust values live. WebAssembly currently has no direct access to the garbage-collected heap (as of April 2018, this is expected to change with the "Interface Types" proposal). JavaScript, on the other hand, can read and write to the WebAssembly linear memory space, but only as an ArrayBuffer of scalar values (u8, i32, f64, etc...). WebAssembly functions also take and return scalar values. These are the building blocks from which all WebAssembly and JavaScript communication is constituted.

So, as I understand it, JavaScript and WebAssembly can only communicate through arrays of scalars. Moreover, since Rust is compiled to WebAssembly, any object that lives in Rust memory also lives in WebAssembly linear memory.
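
As a rough illustration of what that means on the Rust side, a type can expose the address and length of its buffer, and JS can then build a typed-array view over the same bytes. This is a minimal sketch under assumptions: `Column` is a hypothetical type, and no binding layer such as wasm-bindgen is shown.

```rust
/// Hypothetical column type; the name and layout are illustrative only.
pub struct Column {
    values: Vec<f64>,
}

impl Column {
    pub fn new(values: Vec<f64>) -> Self {
        Column { values }
    }

    /// Address of the first element. On a wasm32 target this is an
    /// offset into the module's linear memory.
    pub fn ptr(&self) -> *const f64 {
        self.values.as_ptr()
    }

    /// Number of `f64` elements stored behind `ptr`.
    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```

On the JS side one would typically wrap these two numbers as `new Float64Array(memory.buffer, ptr, len)`, where `memory` is the module's exported memory. Such a view is only valid while the column is alive and linear memory has not grown.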

@ritchie46
Member Author

Then it seems that we can access all data with minimal overhead, so that's great.

@jcheype

jcheype commented Feb 13, 2021

It seems that in more recent versions (since 8.1), some new dependencies make the wasm compilation more difficult, e.g. comfy-table. It would be nice if polars-core contained only computation dependencies, with all IO, formatting, and compression moved out of it.

@domoritz

Some dependencies have gotten easier. For example, arrow compiles to wasm with the default configurations now.

@ritchie46
Member Author

As I understand it, rayon is not an issue anymore, and Polars can be run without SIMD, so the dependencies that still get in the way of compilation can be trivially replaced or turned off.

It seems that in more recent versions (since 8.1), some new dependencies make the wasm compilation more difficult, e.g. comfy-table. It would be nice if polars-core contained only computation dependencies, with all IO, formatting, and compression moved out of it.

I will make sure that IO and all formatting libraries are optional. Formatting could best be done in JS.

ritchie46 added a commit that referenced this issue Apr 13, 2021
ritchie46 added a commit that referenced this issue Apr 14, 2021
ritchie46 added a commit that referenced this issue Apr 14, 2021
@tbro

tbro commented May 18, 2021

This appears to be mostly working. What tasks do you need help with?

@ritchie46
Member Author

Yes, I made a small POC.

I wanted to mimic the Python API, but some things were not yet possible in wasm-bindgen, such as sending a Vec<Series> to a DataFrame. So I was thinking that we probably want a JavaScript wrapper DataFrame and wrapper Series (I also have that in Python) that can use things like builder patterns under the hood to mimic the Python API.

What it boils down to is that there is quite some work to do, and I think we should split it up in 2 packages.

  • js-polars-core -> backend /core wasm
  • js-polars -> written in js/ts that creates a nice api around js-polars-core

@domoritz

Very cool. What do you think about returning Arrow from the WASM context to the JS context and then exposing it to users via the Arrow JS library? The idea is to use Arrow as an IPC format between WASM and JS. You could also use Arrow as an IPC between a web worker and the main thread. We've done something similar in another WASM project with great success.

@alippai

alippai commented May 19, 2021

@domoritz I was thinking about the WASM solution as a replacement for Arrow JS, but your proposal actually makes sense. There is no need for duplicate ChunkedArray->Primitive->ChunkedArray implementation in polars, that can be shared with Arrow JS. I assume it's not hot code path and when it is (strings, array of numbers) it'll have to be in JS land anyways. I'm not sure about the schema, dictionary and recordbatch header handling though. In which library would you handle (parse) it?

@domoritz

Glad you like the proposal. We've actually had our own iterator implementation first as well and then switched to Arrow JS so we don't duplicate work. It's been a good decision and I agree that the performance should be almost unaffected (if not better since we avoid repeated calls into wasm).

I'm not sure about the schema, dictionary and recordbatch header handling though. In which library would you handle (parse) it?

Not sure I understand the question but I'll try to answer it. If you send record batches from wasm to js, arrow js would construct the schema from the IPC.

@alippai

alippai commented May 19, 2021

My question was whether we would use the full Arrow IPC for messaging or a simpler / lower level component, specific "ChunkedArray" types (as pyarrow refers to them). I don't think e.g. pyarrow uses Apache IPC for arrow<->pyarrow communication. Is pyarrow <-> arrow communication a wrong model here (it has similar calling cost, primitives, lists and complex types are different etc)?

@domoritz

domoritz commented May 19, 2021

You could probably use the arrow vectors (which we are changing to be always chunked) but I'm not sure of the benefits. The difference in Python, I think, is that communication between contexts is cheaper. In WASM, you still would need to get e.g. the schema across the boundary, and Arrow's binary format would be more efficient than say JSON. But I might be wrong. I'd say try the simplest solution first and then see whether there are bottlenecks.

@alippai

alippai commented May 19, 2021

This was my question. Does the JS part have to know anything about headers, footers and metadata? I don't think a python<->c++ call is cheaper than JS<->WASM. I might be wrong, but I know that WASM functions are cheaper to call in NodeJS than their C++ implementations.

@ritchie46
Member Author

Very cool. What do you think about returning Arrow from the WASM context to the JS context and then exposing it to users via the Arrow JS library? The idea is to use Arrow as an IPC format between WASM and JS. You could also use Arrow as an IPC between a web worker and the main thread. We've done something similar in another WASM project with great success.

I think that whatever is feasible we should investigate. Ideally I'd like to have a seamless interop with js-arrow and Polars Series similar to how that works in Python Polars.

My question was whether we would use the full Arrow IPC for messaging or a simpler / lower level component, specific "ChunkedArray" types (as pyarrow refers to them). I don't think e.g. pyarrow uses Apache IPC for arrow<->pyarrow communication. Is pyarrow <-> arrow communication a wrong model here (it has similar calling cost, primitives, lists and complex types are different etc)?

For interop with pyarrow (e.g. C++ arrow) / Rust arrow we use the arrow C data interface. This is zero-copy and we just send some pointers around. I don't think it can get much faster than that.

@ankoh

ankoh commented May 19, 2021

Very cool. What do you think about returning Arrow from the WASM context to the JS context and then exposing it to users via the Arrow JS library? The idea is to use Arrow as an IPC format between WASM and JS. You could also use Arrow as an IPC between a web worker and the main thread. We've done something similar in another WASM project with great success.

I think that whatever is feasible we should investigate. Ideally I'd like to have a seamless interop with js-arrow and Polars Series similar to how that works in Python Polars.

My question was whether we would use the full Arrow IPC for messaging or a simpler / lower level component, specific "ChunkedArray" types (as pyarrow refers to them). I don't think e.g. pyarrow uses Apache IPC for arrow<->pyarrow communication. Is pyarrow <-> arrow communication a wrong model here (it has similar calling cost, primitives, lists and complex types are different etc)?

For interop with pyarrow (e.g. C++ arrow) / Rust arrow we use the arrow C data interface. This is zero-copy and we just send some pointers around. I don't think it can get much faster than that.

You are right that the C data interface is the best way to interop with languages that can somehow consume these C headers.
But arrow-js cannot read the C data interface today.
Conceptually, this would require the arrow-js devs to interpret the C headers and all the pointers manually out of your wasm heap which is rather unrealistic.
A compromise would be to consume the C data interface on the wasm side and point arrow-js to the right arrays (maybe just dump the schema and all relevant offsets as json or thrift) but that's also code that does not exist today.

I'd recommend to just pack your buffers via the IPC format.
That's what we do and it works quite well.
It allows you to expose your buffers as real record batch streams and you can still eliminate the explicit ipc packing later without anyone noticing.

Also everything that consumes your buffers will likely be javascript which will quickly engage any handbrakes it can find.
This will outweigh the additional IPC packing by a lot.

@universalmind303
Collaborator

@ritchie46 Was hoping to get some feedback on the POC I suggested when you have some time. I didn't know if you had strong opinions on using WASM and supporting the browser, or if supporting only nodejs was sufficient.

@ritchie46
Member Author

@ritchie46 Was hoping to get some feedback on the POC I suggested when you have some time. I didn't know if you had strong opinions on using WASM and supporting the browser, or if supporting only nodejs was sufficient.

I will. I had surgery yesterday, need some recovery time. Thank you for the contribution!

@dashmug

dashmug commented Nov 23, 2021

@universalmind303 I'd be glad to help you with Neon bindings. I came here specifically to see if there are efforts going in that direction and saw that you wish to start it.

@JanKaul

JanKaul commented Feb 21, 2022

Hey guys,

first off, props to the amazing work you are doing. I would like to contribute to the wasm implementation in js-polars. I think having a wasm port can be of great value, it could play an important role in cloud native computing.
I have started working on some features at this branch https://github.com/JanKaul/polars/tree/js-polars-v0.0.2.

When trying to add lazyframes and expressions I ran into the problem that I was unable to compile to wasm32-unknown-unknown. From what I understood, the cause is the dirs: "4.0" dependency of polars-io, which is required by polars-lazy. This makes total sense since there is no filesystem available in wasm32-unknown-unknown.

My question now is, does it make sense at all to compile to a target that doesn't have a filesystem? Could you pass in the data in some other way? Through streams?

If you don't need a filesystem, how much effort would it be to use a feature flag for everything that depends on the filesystem?

I would really appreciate your help.

@ritchie46
Member Author

Help on this would be awesome! I would really love seeing polars in WASM. We just need to feature gate the dirs dependency then. No problem.

We don't need lazy readers to get data into polars lazy. We can just read data eagerly and then continue with df.lazy(). This is what's needed anyway because the interpreted eager polars operations are syntactic sugar for df.lazy().operation().collect().

E.g.

df.select(some_expr)

# is syntactic sugar for

df.lazy().select(some_expr).collect(no_optimizations=True)

Polars is not very well suited for streaming because our algorithms are not streaming. So a stream must be materialized into polars.

There is already some work done by @universalmind303, and his nodejs implementation likely already covers a lot of it.

Note that while doing this we must treat python-polars as the reference implementation for interpreters.

@JanKaul

JanKaul commented Feb 21, 2022

Hey @ritchie46, thanks for the quick reply. That sounds great! I will have to familiarize myself a bit with the code to see what exactly needs to be behind the feature gate.

I copied all tests from the nodejs implementation and I'm trying to make those two as similar as possible.

@JanKaul

JanKaul commented Feb 21, 2022

Okay, false alarm. You can already get it to compile with the right feature gates. Sorry about that

@universalmind303
Collaborator

Something I wanted to do if implementing a browser based js-polars was to use the same Typescript wrapper that is already implemented for node. It would give the users a unified api, as well as massively reduce code duplication.

I started work on it a bit on this branch js-polars

In theory, a wasm binary with the same interface as the node binary should be plug & play. You should be able to even reuse the test suite.

You can see the additional logic added to series.ts to detect the correct binary here.

@JanKaul

JanKaul commented Feb 21, 2022

This definitely makes a lot of sense. I have to wrap my head around this and have a look at how the napi-rs crate works exactly. So far I tried to make wasm-pack emit the same Typescript interface as nodejs-polars. I need to have a deeper look at your branch to see where the wrapper comes into play. As long as wasm-pack emits the same interface, wouldn't that be sufficient?

As I said, I copied the tests from the nodejs-polars folder and tried to shape the interface accordingly.

@universalmind303
Collaborator

universalmind303 commented Feb 21, 2022

The wasm-bindgen & js-sys crates are much more robust than napi-rs, so you could do it all in rust. However, some things that are not performance sensitive are much easier to do in Typescript, so it made sense to do them there (hence the wrapper). The wrapper behaves pretty much the same as the python wrapper.

Some examples of things that made sense to do in JS.


As long as wasm-pack emits the same interface, wouldn't that be sufficient?

That would work, but you would end up unnecessarily duplicating a lot of logic.


Regarding IO, the js WASM implementation would likely have a slimmed down version of this, as there is no filesystem in the browser. So scan_* would likely be impossible, but read_* could be implemented for string, Buffer, and Stream

Also, feel free to join the discord channel, and we could discuss in more detail & I could answer any questions you may have about the JS wrapper.

@JanKaul

JanKaul commented Feb 21, 2022

Thanks a lot @universalmind303. I will try to think of a way to merge my branch with yours. And I will join the discord channel in case I have some questions.

Looking forward

@einarpersson

Has there been any progress here? I have started playing around with nodejs-polars but it would be amazing to get something running in the browser.

@mainrs

mainrs commented Dec 8, 2022

Sorry to use the issue tracker for a question. But it seems that most people that have extended knowledge are participating here.

I am not sure if this is possible right now. I want to use JS to define lazy operations on a data frame. The operations are then performed inside of WASM/Rust. JS is only used to compose the operation graph.

It seems that pure WASM polars is still not possible. But what should work is reading the lazy data frame in Rust, exposing it using the arrow c ffi, and then using that ffi inside of WASM to actually do the operations. The data frame can then be read-only accessed using the same ffi from any of the boundaries (JS/WASM/Rust) I think.

Thanks for answering!

@kylebarron
Contributor

It seems that pure WASM polars is still not possible

There's some discussion on this on the discord. @gitkwr put together a minimal example of using polars in wasm. That uses their slightly modified version of polars. @gitkwr has expressed interest on the discord of merging their changes back into polars main.

But what should work is reading the lazy data frame in Rust, exposing it using the arrow c ffi, and then using that ffi inside of WASM to actually do the operations

Not sure I follow your proposal here; there's no ffi between Rust and wasm, just between wasm and JS I think. I did create an example of exposing Arrow data from wasm to JS via the Arrow C Data Interface, that might be relevant in the future.

@eddie-atkinson

Hey folks, has there been any more movement on this card?

Happy to lend a hand if there's work that needs doing to move this along

@universalmind303
Collaborator

universalmind303 commented Mar 27, 2023

@eddie-atkinson Polars can compile to wasm after #6050. If you are specifically inquiring about usage within the browser via JavaScript, I'd suggest taking a look at js-polars. It is an MVP of running within the browser. Ideally I'd like to build out more functionality in it, but my knowledge of building browser packages (as well as my time) is quite limited. We'd love additional contributors to help build it out!

@rohit-ptl
Contributor

Hey folks, has there been any more movement on this card?

Happy to lend a hand if there's work that needs doing to move this along

So we had a PR merged a while ago that allowed polars to be compiled to wasm, and here's a minimum working example: https://github.com/gitkwr/polars-wasm-mwe

However, I think this has since regressed due to some arrow2 incompatibility (lz4 and zstd dependencies). This needs to be looked at, but I haven't had the time. If you want to look into it, the MWE will go a long way in helping you.

Right now I have locked on my repos to the last working commit:

[dependencies.polars]
features = [...]
default-features = false
git =  "https://github.com/pola-rs/polars"
rev = "d6da86b8ec32fe4b001e96f1fdedda3db188f7af"

@benjaminrwilson

benjaminrwilson commented Apr 5, 2023

Has anyone had any success reading a remote arrow file directly from s3 into a DataFrame when targeting wasm32-unknown-unknown? If so, would you mind pointing me to an example?

@highway900

@benjaminrwilson have you had success compiling polars to wasm32-unknown-unknown ?

@bngo92

bngo92 commented Jul 13, 2023

I haven't tested this change in depth but it appears that the gap is small: https://gist.github.com/bngo92/40a96093e4ef4643ca5128f34bbb1b98. With this diff, I was able to compile my yew app and use basic polars functionality without any issues.

This diff is off of rs-0.30.0

@bngo92

bngo92 commented Jul 14, 2023

It looks like @lorepozo has a better version of my diff. I still needed a few changes on top of the fork in order to get the json, lazy, and sql features to work for me: https://github.com/bngo92/polars/commits/master

@ryan-williams

Leaving some breadcrumbs about this…

tl;dr: default_features = false in Cargo.toml allowed my wasm-pack build --target web to succeed:

# Cargo.toml
polars = { version = "*", default_features = false }

Without it, I see errors beginning with:

   Compiling crossterm v0.26.1
error[E0432]: unresolved import `sys::position`
  --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/cursor.rs:51:9
   |
51 | pub use sys::position;
   |         ^^^^^^^^^^^^^ no `position` in `cursor::sys`
…
Full output
   Compiling crossterm v0.26.1
error[E0432]: unresolved import `sys::position`
  --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/cursor.rs:51:9
   |
51 | pub use sys::position;
   |         ^^^^^^^^^^^^^ no `position` in `cursor::sys`

error[E0432]: unresolved import `sys::supports_keyboard_enhancement`
   --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/terminal.rs:101:9
    |
101 | pub use sys::supports_keyboard_enhancement;
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no `supports_keyboard_enhancement` in `terminal::sys`

error[E0425]: cannot find value `source` in this scope
  --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/event/read.rs:24:22
   |
24 |         let source = source.ok().map(|x| Box::new(x) as Box<dyn EventSource>);
   |                      ^^^^^^ a field by this name exists in `Self`

error[E0425]: cannot find function `enable_raw_mode` in module `sys`
   --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/terminal.rs:122:10
    |
122 |     sys::enable_raw_mode()
    |          ^^^^^^^^^^^^^^^ not found in `sys`

error[E0425]: cannot find function `disable_raw_mode` in module `sys`
   --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/terminal.rs:129:10
    |
129 |     sys::disable_raw_mode()
    |          ^^^^^^^^^^^^^^^^ not found in `sys`

error[E0425]: cannot find function `size` in module `sys`
   --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/terminal.rs:136:10
    |
136 |     sys::size()
    |          ^^^^ not found in `sys`

error[E0046]: not all trait items implemented, missing: `eval`
  --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/event/filter.rs:52:1
   |
6  |     fn eval(&self, event: &InternalEvent) -> bool;
   |     ---------------------------------------------- `eval` from trait
...
52 | impl Filter for EventFilter {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `eval` in implementation

error[E0308]: mismatched types
   --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossterm-0.26.1/src/terminal.rs:106:33
    |
106 | pub fn is_raw_mode_enabled() -> Result<bool> {
    |        -------------------      ^^^^^^^^^^^^ expected `Result<bool, Error>`, found `()`
    |        |
    |        implicitly returns `()` as its body has no tail or `return` expression
    |
    = note:   expected enum `std::result::Result<bool, std::io::Error>`
            found unit type `()`

Some errors have detailed explanations: E0046, E0308, E0425, E0432.
For more information about an error, try `rustc --explain E0046`.
error: could not compile `crossterm` (lib) due to 8 previous errors

The only relevant hits I found on the web are this SO and this reddit comment, seemingly both by @llalma in Oct '22 (before #6050, and referring to Polars 0.25.0).

Searching "sys::position" in Discord turned up this useful convo between @dimasukr and @universalmind303 from Apr '22, which led me to the default_features = false fix above 🎉.

Here's a sketch of an MWE:

Cargo.toml
[package]
name = "polars-wasm-test"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
polars = { version = "*", default-features = false }
wasm-bindgen = { version = "0.2.87", features = ["serde-serialize"] }

src/lib.rs
use polars::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn df_test() -> JsValue {
    let df = DataFrame::default();
    assert!(df.is_empty());
    true.into()
}

I haven't tried to characterize which features can be added back in; in Discord, @dimasukr said:

Of default features only ipc, fmt don't compile, also dtype-full compiles

I'm only using features = ["csv"], which seems to compile, though reading/writing from the filesystem fails at runtime (as expected).

That basically unblocks me, but it took a while to figure out. Hopefully the above is accurate / helps future searchers. Updates/Corrections from more knowledgeable folks here also welcome!

@lorepozo
Contributor

lorepozo commented Oct 9, 2023

Currently, the CI checks each feature for wasm build-ability with exceptions in crates/Makefile#check-wasm such as fmt.

Though, as you've noticed, successful compilation does not imply full support (e.g. file system code compiles but panics at runtime if used). I've used polars on CSVs in wasm by reading raw CSV contents separately, loading it via arrow2 and using TryInto<DataFrame>.

@HarukiUchito

HarukiUchito commented Oct 31, 2023

Hello @lorepozo
It would be really helpful if you could provide a sample of doing this.

I've used polars on CSVs in wasm by reading raw CSV contents separately, loading it via arrow2 and using TryInto<DataFrame>.

@alicja-januszkiewicz
Contributor

alicja-januszkiewicz commented Nov 5, 2023

Hello @lorepozo It would be really helpful if you could provide a sample of doing this.

I've used polars on CSVs in wasm by reading raw CSV contents separately, loading it via arrow2 and using TryInto<DataFrame>.

You could use the include_bytes macro to include the bytes of the file at compile time. I'm not sure how you'd proceed via arrow2, but instead you could use polars_io::csv::CsvReader to convert those bytes into a polars dataframe:

use std::io::Cursor;
use polars_io::csv::CsvReader;
use polars_io::SerReader;
let bytes_array = include_bytes!("path/to/file.csv");
let rdr = CsvReader::new(Cursor::new(&bytes_array[..]));
let df = rdr.finish().expect("csv reader error");

// log first row to console
use web_sys::console;
let row = df.get_row(0).expect("get_row error").0;
console::log_1(&format!("{:?}", row).into());

I think you could also use reqwest to fetch the file at runtime instead of including it at compile time but I haven't tried this.

@HarukiUchito

@alicja-januszkiewicz
Thank you, your example helped me a lot!

@nleroy917

nleroy917 commented Jan 6, 2024

I'm having issues with this. polars is a dependency of a dependency, which is causing my build to fail. I tried the suggestion by @ryan-williams:

tl;dr: default_features = false in Cargo.toml allowed my wasm-pack build --target web to succeed:

but under a patch in my Cargo.toml:

[patch.crates-io]
polars = { version = "*", git = "https://github.com/pola-rs/polars", branch = "main", default-features = false, features=["csv"]}

That didn't work. I still get the error: error[E0425]: cannot find function `enable_raw_mode` in module `sys`

Anyone gotten it to work when polars is a sub-dependency?
