Stream weights to the GPU when loading a model #7994

mattsoulanille · 2023-10-09T17:24:58Z

When downloading model weight data, slice it into weight tensors and push them to the GPU eagerly. This avoids storing an extra copy of the weights on the CPU, allowing larger models (1.3B parameters, and possibly ~6.7B or larger) to be loaded without causing a V8 OOM crash.
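
As a rough illustration of the slicing idea (not the code in this PR), the sketch below buffers incoming chunks only until the next weight is complete, then hands that weight to a sink and drops the bytes. WeightSpec, WeightSink and WeightSlicer are hypothetical names; in the real loader the sink would be whatever creates the tensor on the backend.

```typescript
// Hypothetical types for illustration only; these are not TFJS APIs.
interface WeightSpec {
  name: string;
  byteLength: number;
}

type WeightSink = (name: string, data: Uint8Array) => void;

class WeightSlicer {
  private pending: Uint8Array[] = [];
  private pendingBytes = 0;
  private next = 0;

  constructor(private specs: WeightSpec[], private sink: WeightSink) {}

  // Accept one downloaded chunk and emit every weight it completes.
  push(chunk: Uint8Array): void {
    this.pending.push(chunk);
    this.pendingBytes += chunk.byteLength;
    while (this.next < this.specs.length &&
           this.pendingBytes >= this.specs[this.next].byteLength) {
      const spec = this.specs[this.next++];
      this.sink(spec.name, this.take(spec.byteLength));
    }
  }

  // Bytes still waiting for the rest of their weight to arrive.
  bufferedBytes(): number {
    return this.pendingBytes;
  }

  // Copy the first n buffered bytes into one array and release them.
  private take(n: number): Uint8Array {
    const out = new Uint8Array(n);
    let offset = 0;
    while (offset < n) {
      const head = this.pending[0];
      const want = n - offset;
      if (head.byteLength <= want) {
        out.set(head, offset);
        offset += head.byteLength;
        this.pending.shift();
      } else {
        out.set(head.subarray(0, want), offset);
        this.pending[0] = head.subarray(want);
        offset += want;
      }
    }
    this.pendingBytes -= n;
    return out;
  }
}
```

With this shape, at most one partially received weight is held on the CPU at a time, which is the property the description relies on to avoid the extra full copy.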

When streaming the weights, check CPU_HANDOFF_SIZE_THRESHOLD or WEBGPU_CPU_HANDOFF_SIZE_THRESHOLD to determine whether the weight should be sent to GPU or remain on CPU.
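
A minimal sketch of that decision, under two assumptions that are not confirmed by this description: the threshold is compared against the weight's element count, and the WebGPU flag is consulted only when the active backend is WebGPU. HandoffFlags and shouldUploadToGpu are hypothetical names, and the numbers in the usage note are illustrative, not the library defaults.

```typescript
// Hypothetical container for the two flag values; in TFJS they would be read
// from the environment flags of the same names.
interface HandoffFlags {
  CPU_HANDOFF_SIZE_THRESHOLD: number;
  WEBGPU_CPU_HANDOFF_SIZE_THRESHOLD: number;
}

// Returns true when the weight is large enough to be worth sending to the GPU.
// Assumption: weights below the threshold stay on the CPU.
function shouldUploadToGpu(
    elementCount: number, backend: string, flags: HandoffFlags): boolean {
  const threshold = backend === 'webgpu' ?
      flags.WEBGPU_CPU_HANDOFF_SIZE_THRESHOLD :
      flags.CPU_HANDOFF_SIZE_THRESHOLD;
  return elementCount >= threshold;
}
```

For example, with illustrative thresholds of 128 and 1000, a 500-element weight would go to the GPU under a non-WebGPU backend but stay on the CPU under WebGPU.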

This feature is guarded by the streamWeights option in LoadOptions. Since most of TFJS's graph model saving relies on the CPU copy of the model, model saving is disabled when the model was streamed (i.e. it will throw an error since the weights ArrayBuffer is missing).
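
The save restriction can be pictured with the small stand-in below. SketchModel and loadSketch are hypothetical and only mimic the behavior described here: a streamed load keeps no CPU-side ArrayBuffer, so saving has nothing to serialize and throws.

```typescript
// Hypothetical stand-in for a loaded model; not the TFJS GraphModel class.
class SketchModel {
  // weightData is present only when the weights were loaded without streaming.
  constructor(private weightData?: ArrayBuffer) {}

  // Returns the number of bytes that would be written, or throws if the
  // CPU copy of the weights was never kept.
  save(): number {
    if (this.weightData === undefined) {
      throw new Error(
          'Cannot save: weights were streamed, so no CPU copy exists.');
    }
    return this.weightData.byteLength;
  }
}

// Mirrors the shape of the streamWeights flag on LoadOptions.
function loadSketch(
    bytes: ArrayBuffer, options: {streamWeights?: boolean}): SketchModel {
  return options.streamWeights ? new SketchModel() : new SketchModel(bytes);
}
```

In real use the flag would be passed in the load options when loading the graph model; a caller that needs to save the model afterwards should leave it unset.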

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

pyu10055 · 2023-11-15T23:53:40Z

tfjs-core/src/io/http.ts

+    return fetchURLs;
+  }
+
+  // private get loadOptions(): LoadOptions {


pyu10055 · 2023-11-16T00:02:20Z

tfjs-core/src/io/types.ts

+   * Whether to stream the model directly to the backend or cache all its
+   * weights on CPU. Useful for large models.
+   */
+  streamWeights?: boolean;


Doesn't this flag have the same name as the function streamWeights?: () => ReadableStream<ArrayBuffer>?

mattsoulanille · 2023-11-16

Yes, but they're part of different interfaces. One is on the ModelArtifacts and is the actual function that, when called, will start streaming the weights. It's internal, and users shouldn't really need to access it.

The other is on the LoadOptions interface, and it's used to configure whether the model loader should stream the weights or load them normally. It's exposed to the user.

I can rename the one on ModelArtifacts to something like startWeightsStream or streamWeightsData.

Linchenn

Thanks Matt!

mattsoulanille force-pushed the weights_stream branch from bf9c6fa to af89cf0 Compare November 14, 2023 18:56

mattsoulanille and others added 11 commits November 14, 2023 23:56

Stream weights

9453536

Fix not passing loadOptions

fe23272

Test graph model weights streaming

149460d

Only upload to GPU if it's above the cpu forwarding threshold

7e22599

fix lint

ae7bd51

Refactor decodeWeights

c4a0967

Fix string tensor weight reading

cd8ce23

Test decodeWeightsStreaming

12a22ee

Fix lint

eac3e2d

Do not check webgpu handoff if not using webgpu

03f504c

Remove commented line

ed74c0e

mattsoulanille force-pushed the weights_stream branch from 5c93720 to ed74c0e Compare November 15, 2023 07:56

mattsoulanille marked this pull request as ready for review November 15, 2023 16:57

mattsoulanille requested review from pyu10055 and Linchenn November 15, 2023 16:57

pyu10055 approved these changes Nov 16, 2023

mattsoulanille added 4 commits November 15, 2023 16:15

Remove commented code

7685f1a

Rename streamWeights to getWeightStream

c036afc

Formatting

a987e88

Apply streamWeights -> getWeightStream rename to converter

f4baf51

Linchenn approved these changes Nov 20, 2023

Merge branch 'master' into weights_stream

c2b6559

mattsoulanille merged commit e2ba43c into tensorflow:master Nov 28, 2023
2 checks passed
