Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package STT wasm for npm publishing #2265

Merged
merged 3 commits into from
Jul 28, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions .github/workflows/build-and-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,9 @@ jobs:
with:
fetch-depth: 0
submodules: 'recursive'
- uses: actions/setup-node@v2
with:
node-version: 12
- name: Install dependencies
run: |
apt-get update
Expand All @@ -477,10 +480,17 @@ jobs:
- run: ./ci_scripts/tf-setup.sh
- run: ./ci_scripts/wasm-build.sh
- run: ./ci_scripts/wasm-package.sh
- run: BAZEL_WASM_EXTRA_FLAGS="--//native_client:wasm_emit=es6" ./ci_scripts/wasm-build.sh
- run: make -C native_client/wasm clean pack
shell: bash
- uses: actions/upload-artifact@v2
with:
name: "libstt.tflite.wasm.zip"
path: ${{ github.workspace }}/artifacts/libstt.zip
- uses: actions/upload-artifact@v2
with:
name: "libstt.tflite.wasm.es6.tgz"
path: ${{ github.workspace }}/native_client/javascript/stt-*.tgz
ccoreilly marked this conversation as resolved.
Show resolved Hide resolved
test-cpp-Linux:
name: "Lin|Test C++ binary"
runs-on: ubuntu-20.04
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
/runs
/logs
/exports
/artifacts
/data/ldc93s1
/native_client/setup.cfg
/native_client/build
Expand All @@ -29,6 +30,8 @@
/native_client/python/dist
/native_client/python/impl.py
/native_client/python/impl_wrap.cpp
/native_client/wasm/dist
/native_client/wasm/package.json
/doc/.build/
/doc/xml-c/
/doc/xml-java/
Expand Down
4 changes: 3 additions & 1 deletion ci_scripts/package-utils.sh
Original file line number Diff line number Diff line change
Expand Up @@ -133,5 +133,7 @@ package_libstt_wasm()

${ZIP} -r9 --junk-paths "${artifacts_dir}/${artifact_name}" \
${stt_dir}/native_client/kenlm/COPYING \
${tensorflow_dir}/bazel-bin/native_client/stt_wasm_bindings/*
${tensorflow_dir}/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.js \
${tensorflow_dir}/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.worker.js \
${tensorflow_dir}/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.wasm
}
2 changes: 1 addition & 1 deletion ci_scripts/wasm-build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ BAZEL_OPT_FLAGS="--copt=-pthread --copt=-fexceptions"
# Bazel caching and emsdk do not play nice together: unless path
# is explicitly passed, emsdk would end up using an old version of
# Python which does not support f-strings, making build fail.
BAZEL_EXTRA_FLAGS="${BAZEL_EXTRA_FLAGS} --action_env=PATH"
BAZEL_EXTRA_FLAGS="${BAZEL_EXTRA_FLAGS} ${BAZEL_WASM_EXTRA_FLAGS} --action_env=PATH"
BAZEL_BUILD_FLAGS="${BAZEL_OPT_FLAGS} ${BAZEL_EXTRA_FLAGS}"
SYSTEM_TARGET=

Expand Down
57 changes: 39 additions & 18 deletions native_client/BUILD
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# Description: Coqui STT native client library.

load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
load("@org_tensorflow//tensorflow:tensorflow.bzl", "lrt_if_needed")
load("@com_github_nelhage_rules_boost//:boost/boost.bzl", "boost_deps")
load("@build_bazel_rules_apple//apple:ios.bzl", "ios_static_framework")
Expand Down Expand Up @@ -352,6 +353,37 @@ cc_binary(
linkopts = DECODER_LINKOPTS,
)

DECODER_LINKOPTS_WASM = DECODER_LINKOPTS + [
"-Os",
reuben marked this conversation as resolved.
Show resolved Hide resolved
"--bind",
"-sWASM=1",
# Allow to grow memory allocation when needed (for example for loading models).
"-sALLOW_MEMORY_GROWTH",
"-sMAXIMUM_MEMORY=4GB",
"-sMALLOC=emmalloc",
"-sMODULARIZE=1",
"-sEXPORT_NAME=STT",
# We need to specify a threadpool size, otherwise TFLite will just do nothing
# and deadlock.
"-sPTHREAD_POOL_SIZE=4",
# This is a library, so no 'main' is expected.
"--no-entry",
# Experiencing problems? Uncomment the flags below to have better
# error messages and sanity checks.
# "-sASSERTIONS=2",
# "-g3",
# "-gsource-map",
]

string_flag(name = "wasm_emit", build_setting_default = "wasm_es5")

config_setting(
name = "wasm_es6",
flag_values = {
":wasm_emit": "es6",
}
)

cc_binary(
# Note that the .js suffix is significant here. See the `-o`
# option at https://emscripten.org/docs/tools_reference/emcc.html
Expand All @@ -361,24 +393,13 @@ cc_binary(
],
deps = [":decoder", ":coqui_stt_bundle"],
copts = ["-std=c++14", "-fno-exceptions", "-fwrapv", "-pthread"],
linkopts = DECODER_LINKOPTS + [
"--bind",
"-sWASM=1",
# Allow to grow memory allocation when needed (for example for loading models).
"-sALLOW_MEMORY_GROWTH",
"-sMAXIMUM_MEMORY=4GB",
"-sMALLOC=emmalloc",
# We need to specify a threadpool size, otherwise TFLite will just do nothing
# and deadlock.
"-sPTHREAD_POOL_SIZE=4",
# This is a library, so no 'main' is expected.
"--no-entry",
# Experiencing problems? Uncomment the flags below to have better
# error messages and sanity checks.
# "-sASSERTIONS=2",
# "-g3",
# "-gsource-map",
],
linkopts = select({
":wasm_es6": DECODER_LINKOPTS_WASM + [
"-sEXPORT_ES6=1",
"-sSINGLE_FILE=1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you kindly add a comment here explaining why, even though we have SINGLE_FILE here, it's producing multiple files? (because of the "pthread" story you mentioned, right?)

Copy link
Collaborator Author

@ccoreilly ccoreilly Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SINGLE_FILE exists since before pthread support and it currently only inlines the wasm bytecode as a base64 string inside the main JS. Pthreads are implemented as separate (smaller) workers which actually load the main JS from within. There are some requests to inline or merge both files when using SINGLE_FILE (see emscripten#9796 and emscripten#14329)

Generally, as noted in the references above, using SINGLE_FILE generates a large asset (larger than the sum of both stt_wasm.js and stt_wasm.wasm), which means longer download and startup times. Blinded by the original goal of making it available via CDN, I thought it would be necessary to inline the wasm binary for ease of use as a library, but I just tested it and webpack can handle the separate asset perfectly (rollup should also be fine; I am not sure about parcel, but I guess it'd also work.)

I will thus remove this flag. At this point we could consider dropping es5 support (ES6 is necessary for bundlers, see emscripten#14135) but ES6 imports are still not supported in workers under Firefox (reference) so I would still add it to the github releases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I agree with what you outlined. Thank you for the thorough explaination !

],
"//conditions:default": DECODER_LINKOPTS_WASM,
}),
)

wasm_cc_binary(
Expand Down
27 changes: 27 additions & 0 deletions native_client/wasm/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
NPM_TOOL ?= npm
PROJECT_NAME ?= stt-wasm
PROJECT_VERSION ?= $(shell cat ../../training/coqui_stt_training/VERSION | tr -d '\n')

clean:
rm -rf ./stt-wasm-*.tgz package.json
rm -rf dist

package.json: package.json.in
sed \
-e 's/$$(PROJECT_NAME)/$(PROJECT_NAME)/' \
-e 's/$$(PROJECT_VERSION)/$(PROJECT_VERSION)/' \
package.json.in > package.json && cat package.json

dist:
mkdir -p dist

dist/stt_wasm.js: dist ../../tensorflow/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.js
cp ../../tensorflow/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.js dist

dist/stt_wasm.worker.js: dist ../../tensorflow/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.worker.js
cp ../../tensorflow/bazel-bin/native_client/stt_wasm_bindings/stt_wasm.worker.js dist

build: package.json dist/stt_wasm.js dist/stt_wasm.worker.js

pack: build
${NPM_TOOL} pack
ccoreilly marked this conversation as resolved.
Show resolved Hide resolved
14 changes: 14 additions & 0 deletions native_client/wasm/package.json.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"name" : "$(PROJECT_NAME)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the packaged library to see if it's possible to use it from within an NPM-based project?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test it with an example I would like to contribute eventually to stt-examples but I think it is a good idea to test it in the CI pipeline as well. I'll add a test step for it.

"version" : "$(PROJECT_VERSION)",
"description": "A Webassembly build for doing speech recognition using a Coqui STT model",
"main": "./dist/stt_wasm.js",
"files": [
"dist"
],
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "Coqui GmbH",
ccoreilly marked this conversation as resolved.
Show resolved Hide resolved
"license": "LGPL-2.1-only"
}