WIP: burn-train in the browser #938

Closed
wants to merge 87 commits into from

Conversation

AlexErrant
Contributor

Pull Request Template

Checklist

  • Confirm that run-checks script has been executed.

This has never worked on my machine. Gonna lean on CICD for this.

Related Issues/PRs

Closes #921

Changes

Here's part 1, where I replace a spawn in EventStoreClient with a web worker implementation pretty much copied straight from here. The total number of files changed is pretty large, but most of it is just a new example project I made called examples/train-web. That project uses PNPM and Vite, which stylistically diverges from the minimalistic examples/mnist-inference-web. I did this to more accurately reflect usage in a real-world project that uses TypeScript and NPM packages.

To make this PR easier to review, I'm following the atomic commit style and will refrain from force-pushing this branch.

Testing

I have a branch here that demos sender.send working. Console output as follows:

train.js:325 INFO examples/train-web/train/src/lib.rs:17 
Hello from Rust
train.js:325 INFO burn-train/src/metric/store/client.rs:110 
Got msg: OnEventTrain(EndEpoch(1337))
train.js:325 INFO burn-train/src/metric/store/client.rs:110 
Got msg: OnEventValid(EndEpoch(314159))
train.js:325 INFO burn-train/src/metric/store/client.rs:110 
Got msg: End
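
For readers who want to reproduce the shape of that log on a native target, here is a minimal sketch. It assumes nothing about burn-train's real types: the `Event` and `Message` enums are hypothetical stand-ins named after the variants printed above, `run_demo` is an illustrative helper, and a plain OS thread takes the place of the web worker.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins named after the variants in the log output;
// the real burn-train types are not reproduced here.
#[derive(Debug)]
enum Event {
    EndEpoch(usize),
}

#[derive(Debug)]
enum Message {
    OnEventTrain(Event),
    OnEventValid(Event),
    End,
}

/// Sends the three messages from the log to a worker and returns the lines
/// the worker recorded, in the order it received them.
fn run_demo() -> Vec<String> {
    let (sender, receiver) = mpsc::channel::<Message>();

    // Here the receive loop runs on an OS thread. In the PR, this is the
    // part that moves into a web worker.
    let worker = thread::spawn(move || {
        let mut seen = Vec::new();
        for msg in receiver {
            let is_end = matches!(msg, Message::End);
            seen.push(format!("Got msg: {:?}", msg));
            if is_end {
                break;
            }
        }
        seen
    });

    sender.send(Message::OnEventTrain(Event::EndEpoch(1337))).unwrap();
    sender.send(Message::OnEventValid(Event::EndEpoch(314159))).unwrap();
    sender.send(Message::End).unwrap();

    worker.join().unwrap()
}
```

Calling `run_demo()` returns one string per message, formatted with the derived `Debug` output, so the worker stops as soon as it sees `End`.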

Next step

Either getting data loading of mnist working or exploring what it'll take to write to the filesystem.


codecov bot commented Nov 10, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (8686082) 84.45% compared to head (1bddd63) 85.55%.
Report is 24 commits behind head on main.

❗ Current head 1bddd63 differs from pull request most recent head a527fae. Consider uploading reports for the commit a527fae to get more accurate results

Files Patch % Lines
burn-core/src/module/base.rs 0.00% 7 Missing ⚠️
burn-train/src/learner/builder.rs 0.00% 1 Missing ⚠️
burn-train/src/metric/processor/full.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #938      +/-   ##
==========================================
+ Coverage   84.45%   85.55%   +1.09%     
==========================================
  Files         546      509      -37     
  Lines       61507    53925    -7582     
==========================================
- Hits        51946    46133    -5813     
+ Misses       9561     7792    -1769     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Member

@nathanielsimard nathanielsimard left a comment


We need to add the instruction to install cargo-watch, otherwise there is an error.
I think an important step is to have a way to import datasets easily, at least for the example to work.

@nathanielsimard
Member

@AlexErrant Thanks a lot for the video, it helps with reviewing the PR. See my comments below:

  1. We need to support Rust stable on the CI. While testing on nightly or requiring nightly to run the example is acceptable, switching to nightly by default is not an option.

  2. Regarding spawn: I would prefer using a trait instead of FnOnce for a cleaner implementation:

trait AsyncTask {
    fn join(self: Box<Self>) -> Result<(), AsyncTaskError>;
}
pub type AsyncTaskBoxed = Box<dyn AsyncTask>;

You can then use AsyncTaskBoxed where the type is needed.

  3. Concerning the changes in runchecks, many aspects seem "patched" to support nightly. However, for simplification, I suggest avoiding changes in this PR. We should carefully consider a clean way to check parts of the framework that require nightly. I prefer initially not checking them rather than complicating the CI too much. These changes are also easier to review when they are the only modifications in a PR.

While I appreciate your efforts on the CI, since I discovered several small issues, such as not adding --workspace to avoid feature unification, these can be addressed in a subsequent PR.

In conclusion, please avoid updating the CI for this PR. Just incorporate the example and the new way of launching threads.
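
As an illustration of the trait suggested in point 2, here is a minimal sketch of one possible native implementation. Only `AsyncTask` and `AsyncTaskBoxed` come from the comment above. `AsyncTaskError` is named there but not defined, so its shape here is an assumption, and `ThreadTask` and `spawn_native` are hypothetical names that do not come from the PR.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical error type: the comment names it but does not define it.
#[derive(Debug)]
pub struct AsyncTaskError(pub String);

pub trait AsyncTask {
    fn join(self: Box<Self>) -> Result<(), AsyncTaskError>;
}
pub type AsyncTaskBoxed = Box<dyn AsyncTask>;

/// Assumed native implementation backed by an OS thread.
struct ThreadTask {
    handle: JoinHandle<()>,
}

impl AsyncTask for ThreadTask {
    fn join(self: Box<Self>) -> Result<(), AsyncTaskError> {
        // Move the task out of the box so the handle can be consumed.
        let task = *self;
        task.handle
            .join()
            .map_err(|_| AsyncTaskError("worker thread panicked".to_string()))
    }
}

/// Hypothetical spawn helper: callers only ever see `AsyncTaskBoxed`, so a
/// web-worker-backed implementation could be swapped in behind the same type.
pub fn spawn_native<F>(work: F) -> AsyncTaskBoxed
where
    F: FnOnce() + Send + 'static,
{
    Box::new(ThreadTask {
        handle: thread::spawn(work),
    })
}
```

With this shape, `spawn_native(|| { /* work */ }).join()` returns `Ok(())` when the closure finishes and an `AsyncTaskError` when the worker thread panics.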

@AlexErrant
Contributor Author

Sorry it's been a while - I had to take care of some things in real life.

I believe this PR requires changes to the CI because it adds the wasm-bindgen-rayon dependency which in turn requires nightly. Due to feature unification, this means nightly must be used even if we aren't using the browser feature. This can be seen in the above CI run:

Did you forget to enable atomics and bulk-memory features as outlined in wasm-bindgen-rayon README?

This is impossible to do because it's on stable.

I tried adding AsyncTaskBoxed as you suggested here. Please LMK if it's not in line with what you were envisioning.

@antimora added the feature and enhancement labels and removed the feature label on Jan 31, 2024
@L-M-Sherlock

How to build on this branch? I run this command:

RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' cargo +nightly build --target wasm32-unknown-unknown -Z build-std=panic_abort,std

But I get these errors:

error: failed to run custom build command for `zstd-sys v2.0.9+zstd.1.5.5`

and

warning: zstd-sys@2.0.9+zstd.1.5.5: error: unable to create target: 'No available targets are compatible with triple "wasm32-unknown-unknown"'

@Luni-4
Collaborator

Luni-4 commented Feb 2, 2024

@AlexErrant

Is it possible to use Trunk to build this project? It might simplify the building phase and CI

@AlexErrant
Contributor Author

AlexErrant commented Feb 3, 2024

@L-M-Sherlock are you sure you're building burn/examples/train-web/train? Your command works for me there. (Though FWIW, I've been running ./dev.sh.) When I run cargo tree | grep zstd from that dir I get no results, but when I run it from the top-most burn dir I see it's a dependency of burn-tch, which I've no intention of running in the browser. Note that if you want to observe rayon actually working in the browser, you'll need to add some minor changes as shown in the rayon-interleaved branch.

@Luni-4 Trunk looks interesting. It has a library mode, which I think is more relevant for my purposes. Since I'm not building a Rust WASM application (it's JS), I'm not entirely sure if Trunk provides significant advantages for my example project. It has two features: dev server and change detection. My example project currently uses Vite (since I'm (sigh) a boring webdev) which provides a dev server and change detection in JS. For Rust change detection, I'm using cargo-watch. An advantage of cargo-watch is that it can be used for non-wasm targets. It's also entirely optional. Vite serves as the build tool, and I don't think Trunk will simplify the CI process since the vast majority of the complexity is going to come from adding nightly.

@Luni-4
Collaborator

Luni-4 commented Feb 3, 2024

@AlexErrant I think you were writing something similar to this example as far as the wasm part is concerned, since you are importing wasm dependencies in burn/train. Obviously this part cannot be useful for a JS-only application.

@L-M-Sherlock

are you sure you're building burn/examples/train-web/train?

It works when building burn/examples/train-web/train. But ./run-checks.sh is in the top-most dir, and I don't know how to pass the CI check.

@AlexErrant
Contributor Author

AlexErrant commented Feb 4, 2024

@Luni-4 I'm not sure I understand. Let me see if I can try to re-address your concerns.

Is it possible to use Trunk to build this project? It might simplify the building phase and CI

I don't think Trunk will (significantly) simplify the building phase as that's handled by Vite. Trunk is meant for Rust web apps, and my web app is JS. CI requires nightly, which is where the majority of the complexity will come from. As far as I can tell the only thing Trunk will do for me is automatically download and install wasm-bindgen, which isn't that great a benefit as users will first need to download and install Trunk. So really we're just replacing one download with another.

I think you were writing something similar to this example for what concerns the wasm part, as you are importing wasm dependencies in burn/train

That example is a Rust web app, while my example project is JS. burn/examples/train-web/train is a library used by burn/examples/train-web/web, a JS web app. burn/examples/train-web/train depends on burn/burn-train which in turn depends on wasm-bindgen-rayon, but it's still just a library (not a Rust WASM application) at the end of the day. This difference (library vs application) is the main reason why I'm unsure what benefits Trunk provides to my example project.


@L-M-Sherlock ah yeah, this PR currently fails CI (as evidenced by the many ❌s). I was working on getting CI passing, primarily by splitting up the build so feature unification doesn't occur, but ended up discarding all those changes when @nathanielsimard said "please avoid updating the CI for this PR". However, due to "this PR requires changes to the CI because it adds the wasm-bindgen-rayon dependency which in turn requires nightly", I think I'll need to go back to fiddling with CI. Waiting for approval though to avoid wasting everyone's time :)

@Luni-4
Collaborator

Luni-4 commented Feb 4, 2024

@AlexErrant

My concerns came from the fact that you are building wasm in your code: I can see a shell script that compiles your example for the wasm32 target, emitting WebAssembly as output.
Since trunk automatically manages wasm-bindgen-cli and it also offers JavaScript interoperability, I thought it would have been possible to use that framework to simplify the building process, and thus solve some CI problems. That's why I had asked the question, to figure out if it was feasible, but since your use-case is different, that's fine for me.

@nathanielsimard
Member

This pull request has expanded to the extent that I am not comfortable merging it. It interacts with numerous parts of the framework, and I am concerned that merging it might introduce excessive technical debt and slow down the development of other areas of the framework.

I appreciate @AlexErrant for exploring the training of models on wasm, but I believe the scope is somewhat extensive for a single pull request. There is still valuable insight to be gained from this experiment, so I have created issues to track progress on better supporting wasm (#1256).

Ultimately, I believe we need to establish a robust web worker implementation that functions effectively with Rust stable before attempting to support complex workflows on wasm, aside from inference. This is the primary issue to address in enhancing our wasm support (#1250).

Thank you once again for your hard work. Not all pull requests are meant to be merged to add value to the project, and this one is a perfect example. Gaining knowledge and exploring new use cases is highly valuable, and I appreciate your contributions in that regard.

@mfranzs

mfranzs commented Feb 4, 2024

Hi - to confirm the current state of this, is burn-train not supported in the browser currently? That may be the cause of my issue in #1257.

Thank you!

@nathanielsimard
Member

No, burn-train does not currently support wasm. Right now, if you want to train on wasm, you would need to write your own training loop. This is because burn-train extensively relies on the file system, multi-threading, and sqlite for dataset loading, with other functionalities that are not compatible with wasm.

@AlexErrant
Contributor Author

Just to make sure we're on the same page - this PR demoed burn-train working with just multi-threading (via web workers). Sqlite was not required for training to work. (I did use sql.js, but the data could have just as easily been provided via JSON/CSV/uint8array.) I didn't use the file system either, though I added a (trivial) to_bytes method to the model since the only other way to do model saving is to disk.

As a closing thought, since we probably want atomics for communication between web workers, I don't think that'll be landing in stable for quite some time. Relevant issue: rust-lang/rust#77839
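
To make "write your own training loop" and the in-memory `to_bytes` idea concrete, here is a minimal sketch in plain Rust. It deliberately does not use Burn's API: `Linear`, `train`, and `from_bytes` are hypothetical names, the model is a toy one-variable regression, and the byte layout (two little-endian `f32` values) is an assumption chosen for illustration. It does show the constraints discussed above: no file system, no threads, no SQLite.

```rust
/// Toy linear model y = w * x + b. Plain Rust, not Burn's module API.
#[derive(Debug, Clone, Copy)]
struct Linear {
    w: f32,
    b: f32,
}

impl Linear {
    /// Serializes the parameters to memory (little-endian f32) instead of
    /// writing a file. The layout is an assumption made for this sketch.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8);
        out.extend_from_slice(&self.w.to_le_bytes());
        out.extend_from_slice(&self.b.to_le_bytes());
        out
    }

    /// Rebuilds a model from `to_bytes` output; rejects any other length.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 8 {
            return None;
        }
        let w = f32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let b = f32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        Some(Self { w, b })
    }
}

/// Full-batch gradient descent on mean squared error, single-threaded and
/// entirely in memory. `data` holds (x, y) pairs.
fn train(data: &[(f32, f32)], epochs: usize, lr: f32) -> Linear {
    let mut model = Linear { w: 0.0, b: 0.0 };
    let n = data.len() as f32;
    for _ in 0..epochs {
        let (mut grad_w, mut grad_b) = (0.0f32, 0.0f32);
        for &(x, y) in data {
            let err = model.w * x + model.b - y;
            grad_w += 2.0 * err * x / n;
            grad_b += 2.0 * err / n;
        }
        model.w -= lr * grad_w;
        model.b -= lr * grad_b;
    }
    model
}
```

For example, training on points sampled from y = 2x + 1 with `train(&data, 2000, 0.05)` should bring `w` close to 2 and `b` close to 1, and the resulting bytes can be handed to JavaScript as a `Uint8Array` rather than written to disk.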

@AlexErrant
Contributor Author

FYI, I more or less finished a proof of concept that gets the non-trivial project https://github.com/open-spaced-repetition/fsrs-rs training in the browser: https://github.com/AlexErrant/fsrs-browser

I git submoduled burn (https://github.com/AlexErrant/burn/tree/fsrs-browser) to get the above working. I've no intention of maintaining it for anyone else's use; this message simply serves as an FYI so other people can copy me if they wish. In particular, fsrs-rs uses burn v0.11.1, and I don't intend to update that project unless fsrs-rs also updates.

Notably for me, training with ndarray is viable.

Labels
enhancement Enhance existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

burn-train in the browser
6 participants