Benchmark GPT-tfjs #659

Merged · 17 commits into develop from 657-benchmark-gpt · May 1, 2024
Conversation

@JulienVig (Collaborator) commented on Apr 16, 2024

  • CLI benchmark of gpt-tfjs. Closes Benchmark gpt-tfjs integration #657
  • Fixes a number of issues uncovered along the way: memory leaks, the training loop end condition, and a cleanup of the GPT code
  • Bumps the vue-tsc version

Training

Benchmarks were run on a 2022 MacBook Air M2 with 16GB of RAM.
To reproduce, check out 58f018f and run, for example, `npm -w cli run benchmark_gpt -- --contextLength 128 --batchSize 8`.

Time per token is obtained by measuring the time of 10 training update iterations and dividing by (batch size * context length).
Memory values are the maximum of the memory allocated by the attention mechanism and the memory allocated after computing the gradients; so far, the attention mechanism has always had the higher memory requirement. The actual peak memory allocated during training may differ, but tfjs doesn't expose that information easily.

I leave cells empty (-) where I deemed the benchmark too slow to run. If needed, missing values can be extrapolated.
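For reference, the measurement boils down to simple arithmetic around the training loop. The sketch below is illustrative only (`benchmarkTraining` and `trainingStep` are placeholders, not the actual benchmark_gpt.ts code); `tf.memory().numBytes` is the tfjs API that returns the backend's currently allocated bytes.

```ts
import { performance } from 'node:perf_hooks'
import * as tf from '@tensorflow/tfjs-node'

// `trainingStep` is a placeholder for one optimizer update of the GPT model.
async function benchmarkTraining (
  trainingStep: () => Promise<void>,
  batchSize: number, contextLength: number, iterations = 10
): Promise<{ msPerToken: number, peakGB: number }> {
  let peakBytes = 0
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    await trainingStep()
    // numBytes reports the bytes currently allocated by the backend; sampling it
    // at a few points approximates, but does not equal, the true peak.
    peakBytes = Math.max(peakBytes, tf.memory().numBytes)
  }
  const msPerIteration = (performance.now() - start) / iterations
  return {
    msPerToken: msPerIteration / (batchSize * contextLength),
    peakGB: peakBytes / 1e9
  }
}
```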

gpt-nano:

  • 2.50M parameters
  • number of layers: 3
  • number of heads: 3
  • embedding dimension: 48
| gpt-nano | batch_size=8 | batch_size=16 | batch_size=32 | batch_size=64 |
| --- | --- | --- | --- | --- |
| context_length=128 | 0.53 ms/token, 0.33 GB | 0.60 ms/token, 0.56 GB | 0.82 ms/token, 1.12 GB | 1 ms/token, 2.18 GB |
| context_length=256 | 0.53 ms/token, 0.64 GB | 0.68 ms/token, 1.22 GB | 1.06 ms/token, 2.36 GB | 1.81 ms/token, 4.66 GB |
| context_length=512 | 0.72 ms/token, 1.42 GB | 1.08 ms/token, 2.75 GB | 2 ms/token, 5.42 GB | - |
| context_length=1024 | 1.24 ms/token, 3.56 GB | 2.10 ms/token, 6.98 GB | - | - |
| context_length=2048 | 2.47 ms/token, 10.2 GB | - | - | - |

gpt-micro:

  • 7.23M parameters
  • number of layers: 4
  • number of heads: 4
  • embedding dimension: 128
| gpt-micro | batch_size=8 | batch_size=16 | batch_size=32 |
| --- | --- | --- | --- |
| context_length=128 | 0.84 ms/token, 0.6 GB | 0.77 ms/token, 1 GB | 1.08 ms/token, 1.86 GB |
| context_length=256 | 0.77 ms/token, 1.1 GB | 1.07 ms/token, 2 GB | 1.57 ms/token, 3.8 GB |
| context_length=512 | 1.09 ms/token, 2.3 GB | 2.12 ms/token, 4.4 GB | - |
| context_length=1024 | 2.11 ms/token, 5.8 GB | - | - |

gpt-mini:

  • 12.32M parameters
  • number of layers: 6
  • number of heads: 6
  • embedding dimension: 192
| gpt-mini | batch_size=8 | batch_size=16 |
| --- | --- | --- |
| context_length=128 | 1.3 ms/token, 1 GB | 1.12 ms/token, 1.75 GB |
| context_length=256 | 1.17 ms/token, 1.9 GB | 1.43 ms/token, 3.5 GB |

gpt2:

  • number of layers: 12
  • number of heads: 12
  • embedding dimension: 768
| gpt2 | batch_size=8 |
| --- | --- |
| context_length=128 | 7.56 ms/token, 7.7 GB |
| context_length=256 | 6.83 ms/token, 12.7 GB |

Comparisons

Using the Python nanoGPT benchmark script on the same machine, I get the following comparisons between Python and JS:

| gpt-nano | gpt-tfjs | Python (nanoGPT repo) |
| --- | --- | --- |
| batch_size=8, context_length=128 | 0.53 ms/token | 0.17 ms/token |
| batch_size=32, context_length=512 | 2 ms/token | 0.29 ms/token |

Inference

Run `npm -w cli run benchmark_gpt -- --inference --modelPath <path to trained model>`.
For gpt-nano trained with a context length of 128, inference time averages between 6 and 8 ms/token.
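Presumably the per-token figure is total generation time divided by the number of tokens generated. A rough sketch of that computation follows; the `generate` callback is a stand-in for the model's text-generation routine, not the actual benchmark code.

```ts
import { performance } from 'node:perf_hooks'

// `generate` stands in for the model's text-generation entry point.
async function benchmarkInference (
  generate: (prompt: string, maxNewTokens: number) => Promise<string>,
  prompt: string, maxNewTokens = 100, runs = 5
): Promise<number> {
  const start = performance.now()
  for (let i = 0; i < runs; i++) {
    await generate(prompt, maxNewTokens)
  }
  // Average milliseconds per generated token across all runs.
  return (performance.now() - start) / (runs * maxNewTokens)
}
```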

WebGPT reports 3 ms/token at 5M parameters, which falls between gpt-nano (2.5M) and gpt-micro (7.2M). They also managed to scale up to 1.5B parameters on an M1 Mac with WebGPU.

@JulienVig added the documentation label on Apr 16, 2024
@martinjaggi (Member) commented:

Thanks, very interesting!
Would you have just a very rough reference value for the Python equivalent (nanoGPT) as well?

@JulienVig (Collaborator, Author) commented on Apr 18, 2024

@tharvik I'd be curious to hear your opinion on a few things:

  1. Where do you think we should report the benchmark? I was thinking of reporting them all in this PR and linking to it where relevant (e.g. in gpt/config.ts or the GPT class docstring).
  2. Benchmarking performance requires modifying the GPT source code to keep track of memory. Do you think it's worth keeping around, or should we leave the benchmark on this branch and not merge it?

@JulienVig marked this pull request as ready for review on April 18, 2024 12:32
@JulienVig requested a review from tharvik on April 18, 2024 12:32
@tharvik (Collaborator) left a comment

Superb! It's very nice to have metrics on what we're doing, thanks!

> 1. Where do you think we should report the benchmark? I was thinking of reporting them all in this PR and linking to it where relevant (e.g. in gpt/config.ts or the GPT class docstring).

The nicest thing would be to be able to generate such metrics via the CLI, with the example output of the command being what you have here (not the tables, I mean, but the same content).

> 2. Benchmarking performance requires modifying the GPT source code to keep track of memory. Do you think it's worth keeping around, or should we leave the benchmark on this branch and not merge it?

Not merging it means that it'll slowly drift. I think adding the memory usage to the EpochLogs as you did is the way to go; see my comments related to it.
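For illustration only, one possible shape; the actual EpochLogs definition in discojs-core may differ, so treat the fields below as placeholders.

```ts
// Hypothetical sketch: only the memory field is the point here; the other
// fields are placeholders, not the actual discojs-core definition.
interface EpochLogs {
  epoch: number
  loss: number
  // Peak backend memory observed during the epoch, in bytes (e.g. tf.memory().numBytes).
  peakMemoryBytes?: number
}
```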

Review threads (now outdated and resolved) on: discojs/discojs-core/src/models/index.ts, discojs/discojs-core/src/models/model.ts, discojs/discojs-core/src/models/gpt/model.ts, and docs/examples/benchmark_gpt.ts
@JulienVig requested a review from tharvik on April 20, 2024 16:43
@JulienVig merged commit 2c7fb87 into develop on May 1, 2024 (23 checks passed)
@JulienVig deleted the 657-benchmark-gpt branch on May 1, 2024 12:29
@JulienVig (Collaborator, Author) commented:

The memory values will need to be slightly updated when #807 is merged.

@martinjaggi (Member) commented:

up or down? ;)

@JulienVig (Collaborator, Author) commented:

A very superficial benchmark showed a 10-20% decrease in memory usage!
