whisper.cpp for ScribeAR

See the original repo for the whisper.cpp README.

A Primer on WASM

What is WASM?

To paraphrase Wikipedia, WebAssembly (WASM) was created to let us run code at "near-native" speed on the front-end.

WASM achieves this by defining a low-level binary format that compilers can target and that a browser can execute directly. Developers write code in a high-level language, e.g. C, then use a special compiler to compile it to WASM, which can then be served to the frontend and run. (Contrast this with JavaScript, which is sent to the browser as plain text and has to be parsed there before it can run.)
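To make "binary format" concrete, here is a small sketch that is independent of ScribeAR and Emscripten: a hand-assembled WASM module that exports a single function, loaded through the standard WebAssembly JavaScript API. The names wasmBytes, wasmModule, wasmInstance and the exported add function are chosen for illustration only.

```javascript
// A hand-assembled WebAssembly module exporting add(a, b) for two i32 values.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function, of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body of 7 bytes, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is acceptable for a module this small; real builds
// normally go through the asynchronous WebAssembly.instantiate() instead.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

console.log(wasmInstance.exports.add(2, 3)); // prints 5
```

In practice nobody writes these bytes by hand; a compiler such as emscripten produces them from C or C++ source.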

Since WASM is an open standard, many compiler toolchains exist. For ScribeAR we chose emscripten, a gcc-like C / C++ to WASM compiler - mainly because whisper.cpp chose that.

How does Emscripten work?

Refer to the official MDN and Emscripten documentation for more juicy info. We recommend going through this tutorial first to get a feeling for it running.

Similar to gcc, emscripten takes in a bunch of .c or .cpp files, and compiles them into a single executable .wasm file. However, since the .wasm file must be able to interact with a webpage (and the browser at large), it also generates a 'glue' .js file that loads and supports the WASM code. Optionally, it can also generate a demo .html file that runs the WASM code, but we will soon see how to run the WASM code in our own webpage.

A typical compilation result

Also similar to gcc, what exactly emscripten outputs can be controlled with the -o flag.

The demo hello.html file just runs the WASM code and prints its output. How does it do that? If you dig into it, you should see two <script> elements. The first sets up a global object called Module with some members, and the second one just includes the glue hello.js file.

This Module object serves as an interface between hello.html (and our js code in general) and the WASM code. Recall that when an html file is loaded, the (non-module non-async) scripts are run in order. Thus, the first <script> runs, initializes the Module object, and populates it with values and callbacks. Then, the second <script> runs hello.js, which reads its arguments from Module, loads and runs the WASM code (specifically its main function) using them, and finally stores everything in Module (making it a WASM instance). Thus, hello.html can pass data to WASM, and WASM can pass data back.

(More specifically, the print member of Module serves as a stdout redirect, so to speak. It is called whenever the WASM code tries to print to stdout. See here for a full specification of Module)
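The hand-off through Module can be sketched without a real build. In the snippet below, fakeGlue is a hypothetical stand-in for the generated hello.js: it does not load any WASM and only imitates the one behaviour described here, namely that text written to stdout by the C code ends up in Module.print. The capturedLines array is likewise an illustrative name.

```javascript
// Plays the role of the first <script>: set up Module before the glue runs.
const capturedLines = [];
const Module = {
  // Called by the glue once per line that the C code writes to stdout.
  print: (text) => capturedLines.push(text),
};

// Hypothetical stand-in for the second <script> (the generated hello.js).
// The real glue would fetch hello.wasm, instantiate it, and call main().
function fakeGlue(moduleObject) {
  // Pretend that main() executed printf("hello, world!\n").
  moduleObject.print("hello, world!");
}

fakeGlue(Module);
console.log(capturedLines); // prints [ 'hello, world!' ]
```

The important part is the ordering: Module has to exist, with its callbacks attached, before the glue script starts executing.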

You may realize that there are two major problems with how WASM is run so far:

  1. It relies on <script> tags executed in order, which doesn't work once we move from plain html files to something like React
  2. There is no way to directly call a C function in our JS code, or vice versa (print is called implicitly when we printf in C)

There is also a more hidden third problem - What happens when we step it up and introduce pthreads to our C program?

We will see how all of these can be solved in the following sections on modularize, binding stuff, and web workers.

WASM code as a JS Module

As we just said, relying on <script> tags limits what we can do with WASM quite a lot. Luckily, there is an emscripten option aptly named MODULARIZE that outputs hello.js as a JS module exporting a constructor for Module, which can be run anywhere at any time.

(This is a good place to introduce the myriad of options emscripten has, which are helpfully listed on this very hidden website. To enable an option add -s OPTION to the emcc command)

To use MODULARIZE without a build error, we must also use EXPORT_NAME='name' to rename the constructor, and build to a .js file. This is because emscripten's default .html template is not designed for modularized WASM.

We also highly recommend using the EXPORT_ES6 option, which lets you statically import ... from the .js file (it generates the .js file as an ES6 module rather than a UMD module).

To sum it up, we make a modularized version of the hello.c file:

$ emcc -s MODULARIZE -s EXPORT_NAME='makeHello' -s EXPORT_ES6 -o hello.js hello.c

This yields a modularized hello.js file.

(You may also notice that the js file has been minified. We recommend using a tool such as Prettier to un-minify it before editing it by hand.)

To use this module in our code, we import makeHello, call it with an object containing arguments, and it will return a fully populated WASM instance. Notice that the constructor is async, so you would need to do .then((module) => {...}) or something similar.

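For example, the default hello.html can be modified to use the modularized hello.js by importing makeHello and calling it with the same members that the first <script> used to put on Module.

Notice that a classic (non-module) script has to use dynamic import() for this. A proper ES6 module can instead import makeHello statically at the top of the file.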
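A minimal sketch of the calling pattern follows. Because the real hello.js only exists after running emcc, the snippet uses a hypothetical stand-in factory, makeHelloStub, in place of the imported makeHello. The stub only imitates what this section describes (an async constructor that takes an object of arguments and resolves to a populated instance); the two import forms are shown as comments. The names helloOutput and helloReady are illustrative.

```javascript
// With a real build, makeHello would come from the generated file:
//   import makeHello from "./hello.js";                         // static, in an ES6 module
//   const { default: makeHello } = await import("./hello.js");  // dynamic, e.g. in a script

// Hypothetical stand-in so the sketch runs without emcc output.
function makeHelloStub(moduleArgs = {}) {
  return new Promise((resolve) => {
    // Caller-supplied members override the defaults, as with the real Module.
    const instance = { print: console.log, ...moduleArgs };
    instance.print("hello, world!"); // imitates main() calling printf
    resolve(instance);
  });
}

const helloOutput = [];

// The constructor is async, so the instance is only available in .then().
const helloReady = makeHelloStub({
  print: (text) => helloOutput.push(text),
}).then((module) => {
  console.log("instance ready, captured:", helloOutput);
  return module;
});
```

Swapping makeHelloStub for the imported makeHello leaves the rest of the code unchanged, which is the point of MODULARIZE: the caller decides when and where the instance is created.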

Calling C functions in JS, and JS functions in C

WIP, see here

Threading in WASM

WASM implements pthreads using web workers. See here for more info.

To share memory between worker threads, WASM uses SharedArrayBuffer, which browsers disable by default due to security risks. To enable it, a website must:

  1. Be in a secure context
  2. Be cross-origin isolated

See here for more details. To cross-origin isolate your site you need to modify your response headers, or use this hack to modify them on the client side.

Note that in contrast to pthread_create, to create a web worker we need to call new Worker() with the URL of a separate worker js script. As a result, compiling with the USE_PTHREADS option generates an additional *.worker.js file.
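As a sketch of the server-side route, the two response headers that make a page cross-origin isolated are Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp. The handler below is a hypothetical example written for Node's built-in http module; ISOLATION_HEADERS and handleRequest are illustrative names, and ScribeAR itself uses the client-side service worker approach instead.

```javascript
// The two response headers required for cross-origin isolation.
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical request handler that attaches the headers to every response.
function handleRequest(req, res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  res.setHeader("Content-Type", "text/html");
  res.end("<!doctype html><title>isolated page</title>");
}

// To serve it with Node's built-in http module:
//   require("http").createServer(handleRequest).listen(8080);
```

Inside the page, the global crossOriginIsolated property reports whether isolation took effect, and isSecureContext reports the secure-context requirement, so both can be checked before spawning any threads.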

This also causes issues with webpack, as emscripten does not expect the worker and main js files to be bundled. In particular two functions need to be manually fixed:

  • The main .js calls new Worker(url) to create web workers. However, in Webpack4 worker-loader must be used instead, and in Webpack5 the new Worker(new URL(...)) syntax must be used instead.
  • The worker calls importScripts() to run the main .js file, which is largely broken by webpack5. If your main file is modularized you need to import and call the constructor instead.

On a side note, the SINGLE_FILE option makes emscripten embed the .wasm file into the .js file as a blob, which also helps when dealing with webpack.

Building WASM Whisper for ScribeAR

You will need to first download and install emscripten. You will also need cmake.

This instance of Whisper is built from source code in /examples/whisper.wasm. Go into whisper.cpp/ and do:

mkdir build && cd build
emcmake cmake ..
make libmain

This compiles whisper into libmain.js and libmain.worker.js in build/bin (libmain.worker.js is run by the web worker). Copy them both into src/components/api/whisper.

(If you are reading this guide for your own React project, make sure to copy them into the same folder. This is important because we will hardcode some relative paths in a moment which will break if they are in separate folders.)

What did we change to make WASM interface with React?

We made a few changes to the CMakeLists.txt scripts to make the WASM whisper build interface properly with ScribeAR (or a React.js + Webpack 5 app in general).

Changes to /examples/whisper.wasm/CMakeLists.txt :

  • Added MODULARIZE, EXPORT_NAME='makeWhisper', and EXPORT_ES6 to modularize whisper
  • Added ENVIRONMENT=web,worker to build for a browser environment (as opposed to backend node.js environment)

Changes to ScribeAR:

  • coi-serviceworker.js was modified to be TypeScript compliant, and is run by the app to give us access to SharedArrayBuffer for threading
  • Adapter code in index.html was adapted into whisperRecognizer to instantiate and run the WASM module

To elaborate on the last point, if you want to use libmain.js in your own project you need to do the following:

  • Import makeWhisper from libmain.js
  • The following functions are exposed by the WASM module to the js code: init for loading a ggml model into Whisper, and full_default for transcribing a piece of audio. You can find their signatures in the emscripten.c file in the whisper.wasm folder
  • To let the WASM module pass data (in particular the transcript) back to the js code, redirect its stderr in the same way the print member of Module redirects stdout (printErr is the stderr counterpart of print)
  • We recommend referring to index.html to see exactly how these functions are used to create a complete web app