Fixing recompiles in KV-cache + compile #1663
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1663
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 3110bf9 with merge base 3fddc56. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
cache_pos = torch.arange(self.size, self.size + seq_len, device=k_val.device)
self.size += seq_len
```

```python
assert (self.cache_pos[0] + seq_len) <= self.k_cache.shape[2]
```
Dynamo doesn't like control flow, so the error is not as informative :(
I don't 100% understand why, but in previous PRs I was instructed to use ValueError instead of assert. I don't know if this is worth an exception. Probably because the assertion msg is not as good as a ValueError's.
The error message without the assertion is even worse (raising like we did before was ideal)
Was the message worse, or is ValueError worse? If it's OK to use ValueError, can we keep it and use a nicer msg instead?
Sorry, there can't be any branching logic here because dynamo will complain about control flow (this is because we're checking the value of a tensor), so the plain assertion error is the best we can do.
We could do something experimental like torch._check, but I'll investigate that another time.
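For reference, a rough sketch of what the `torch._check` route could look like, purely to illustrate the API: it takes a bool/SymBool plus a lazy message callable, and under `torch.compile` (with `torch._dynamo.config.capture_scalar_outputs = True`) the `.item()` call yields an unbacked SymInt so the bound is recorded symbolically instead of branching in Python. Note it leans on `.item()`, which is exactly what is debated below, so treat it as a starting point rather than a recommendation.

```python
# Hypothetical sketch only; not the approach taken in this PR.
curr_pos = self.cache_pos[0].item()  # plain int in eager, SymInt under compile
torch._check(
    curr_pos + seq_len <= self.k_cache.shape[2],
    lambda: (
        f"KV-cache overflow: current position {curr_pos} plus {seq_len} new "
        f"tokens exceeds cache length {self.k_cache.shape[2]}."
    ),
)
```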
I did a quick pass and didn't think through whether this solution is more performant than the previous one (I worry about all of these pre-allocated tensors and their memory consumption). With that being said, would something like the below have fixed it?
`torch._dynamo.mark_dynamic(self.size)`
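For context, a minimal sketch of how `torch._dynamo.mark_dynamic` is typically used: it marks a dimension of a tensor as dynamic, so the exact call above (on an integer attribute) is hypothetical.

```python
import torch

x = torch.randn(8, 128)
# Tell dynamo to treat dim 0 of x as dynamic, so a change in that dimension's
# size does not trigger a recompile of the compiled graph consuming x.
torch._dynamo.mark_dynamic(x, 0)
```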
```python
@property
def size(self) -> int:
    return self.cache_pos[0].item()
```
As a rule of thumb, it's never good to call `.item()`. It is not performant and has issues with `torch.export`.
Noob q, how should I be doing this?
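One possible direction (a hypothetical sketch, not necessarily the approach this PR settles on): keep the position inside the registered `cache_pos` buffer and return the 0-dim tensor itself, so callers index and compare with tensor ops instead of syncing to a Python int via `.item()`.

```python
import torch
from torch import nn


class KVCacheSketch(nn.Module):
    """Illustrative only: track the cache position without calling .item()."""

    def __init__(self, max_seq_len: int) -> None:
        super().__init__()
        # cache_pos[0] always holds the current write position.
        self.register_buffer(
            "cache_pos", torch.arange(0, max_seq_len), persistent=False
        )

    @property
    def size(self) -> torch.Tensor:
        # Return the 0-dim tensor; no host/device sync, friendlier to
        # torch.compile and torch.export than returning a Python int.
        return self.cache_pos[0]
```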
Very slick. I think the PR summary is really clear; it might be good to add a bit more detail in the code itself (and/or docstring), especially around the incrementing of the cache_pos buffer. Personally, I had to think about it for a minute.
Hey @SalmanMohammadi, thank you so much for this PR! I think we are a bit afraid of changing things after already testing them. Do you mind holding it for a bit and doing a bit more testing later this week or next week, just for peace of mind?
Summary
generate_v2

| Branch | Time for inference (s) | Tokens/sec | Bandwidth achieved (GB/s) | Max memory allocated (GB) |
|---|---|---|---|---|
| main | 0.88 | 9.04 | 146.92 | 16.28 |
| fix_kv_compile | 0.88 | 9.09 | 147.69 | 16.29 |
eleuther_eval

| Branch | Time for completion (s) | s/it | Max memory allocated (GB) |
|---|---|---|---|
| main | 368.69 | 3.45 | 21.31 |
| fix_kv_compile | 368.70 | 3.45 | 21.32 |
Outputs for generation and metrics for evaluation were identical. The suite of tests in tests/torchtune/generation also guards against generation outputs changing when using KV-caching. Please let me know if there's more testing you'd like to see!
Raw logs
On main
generate_v2
INFO:torchtune.utils._logging:Running InferenceRecipe with resolved config:
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
checkpoint_files:
- model-00001-of-00004.safetensors
- model-00002-of-00004.safetensors
- model-00003-of-00004.safetensors
- model-00004-of-00004.safetensors
model_type: LLAMA3
output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
recipe_checkpoint: null
device: cuda
dtype: bf16
log_level: INFO
max_new_tokens: 200
model:
_component_: torchtune.models.llama3_1.llama3_1_8b
prompt:
system: You are a helpful and creative AI assistant.
user: What is the capital of France?
seed: 1234
temperature: 0.6
tokenizer:
_component_: torchtune.models.llama3.llama3_tokenizer
max_seq_len: null
path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
top_k: 300
INFO:torchtune.utils._logging:Model was initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:
The capital of France is Paris!
INFO:torchtune.utils._logging:Time for inference: 0.88 sec total, 9.04 tokens/sec
INFO:torchtune.utils._logging:Bandwidth achieved: 146.92 GB/s
INFO:torchtune.utils._logging:Max memory allocated: 16.29 GB
eleuther_eval
INFO:torchtune.utils._logging:Running EleutherEvalRecipe with resolved config:
batch_size: 2
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
checkpoint_files:
- model-00001-of-00004.safetensors
- model-00002-of-00004.safetensors
- model-00003-of-00004.safetensors
- model-00004-of-00004.safetensors
model_type: LLAMA3
output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
recipe_checkpoint: null
device: cuda
dtype: bf16
enable_kv_cache: true
limit: 100
max_seq_length: 4096
model:
_component_: torchtune.models.llama3_1.llama3_1_8b
quantizer: null
seed: 1234
tasks:
- truthfulqa_gen
tokenizer:
_component_: torchtune.models.llama3.llama3_tokenizer
max_seq_len: null
path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
INFO:lm-eval:`group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information.
INFO:torchtune.utils._logging:Running evaluation on the following tasks: ['truthfulqa_gen']
INFO:lm-eval:Building contexts for truthfulqa_gen on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 4057.80it/s]
INFO:lm-eval:Running generate_until requests
Running generate_until requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [05:45<00:00, 3.45s/it]
INFO:torchtune.utils._logging:Eval completed in 368.69 seconds.
INFO:torchtune.utils._logging:Max memory allocated: 21.31 GB
INFO:torchtune.utils._logging:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| truthfulqa_gen | 3 | none | 0 | bleu_acc | ↑ | 0.5700 | ± | 0.0498 |
| | | none | 0 | bleu_diff | ↑ | 14.1211 | ± | 3.2247 |
| | | none | 0 | bleu_max | ↑ | 36.3770 | ± | 2.5794 |
| | | none | 0 | rouge1_acc | ↑ | 0.5700 | ± | 0.0498 |
| | | none | 0 | rouge1_diff | ↑ | 20.3622 | ± | 4.4249 |
| | | none | 0 | rouge1_max | ↑ | 60.7925 | ± | 2.9996 |
| | | none | 0 | rouge2_acc | ↑ | 0.5300 | ± | 0.0502 |
| | | none | 0 | rouge2_diff | ↑ | 20.8200 | ± | 4.8412 |
| | | none | 0 | rouge2_max | ↑ | 49.8049 | ± | 3.5523 |
| | | none | 0 | rougeL_acc | ↑ | 0.5600 | ± | 0.0499 |
| | | none | 0 | rougeL_diff | ↑ | 19.7338 | ± | 4.4891 |
| | | none | 0 | rougeL_max | ↑ | 59.3272 | ± | 3.1075 |
On fix_kv_compile
generate_v2
INFO:torchtune.utils._logging:Running InferenceRecipe with resolved config:
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
checkpoint_files:
- model-00001-of-00004.safetensors
- model-00002-of-00004.safetensors
- model-00003-of-00004.safetensors
- model-00004-of-00004.safetensors
model_type: LLAMA3
output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
recipe_checkpoint: null
device: cuda
dtype: bf16
log_level: INFO
max_new_tokens: 200
model:
_component_: torchtune.models.llama3_1.llama3_1_8b
prompt:
system: You are a helpful and creative AI assistant.
user: What is the capital of France?
seed: 1234
temperature: 0.6
tokenizer:
_component_: torchtune.models.llama3.llama3_tokenizer
max_seq_len: null
path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
top_k: 300
INFO:torchtune.utils._logging:Model was initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:
The capital of France is Paris!
INFO:torchtune.utils._logging:Time for inference: 0.88 sec total, 9.09 tokens/sec
INFO:torchtune.utils._logging:Bandwidth achieved: 147.69 GB/s
INFO:torchtune.utils._logging:Max memory allocated: 16.29 GB
eleuther_eval
INFO:torchtune.utils._logging:Running EleutherEvalRecipe with resolved config:
batch_size: 2
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
checkpoint_files:
- model-00001-of-00004.safetensors
- model-00002-of-00004.safetensors
- model-00003-of-00004.safetensors
- model-00004-of-00004.safetensors
model_type: LLAMA3
output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
recipe_checkpoint: null
device: cuda
dtype: bf16
enable_kv_cache: true
limit: 100
max_seq_length: 4096
model:
_component_: torchtune.models.llama3_1.llama3_1_8b
quantizer: null
seed: 1234
tasks:
- truthfulqa_gen
tokenizer:
_component_: torchtune.models.llama3.llama3_tokenizer
max_seq_len: null
path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
INFO:lm-eval:`group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information.
INFO:torchtune.utils._logging:Running evaluation on the following tasks: ['truthfulqa_gen']
INFO:lm-eval:Building contexts for truthfulqa_gen on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3991.08it/s]
INFO:lm-eval:Running generate_until requests
Running generate_until requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [05:45<00:00, 3.45s/it]
INFO:torchtune.utils._logging:Eval completed in 368.70 seconds.
INFO:torchtune.utils._logging:Max memory allocated: 21.31 GB
INFO:torchtune.utils._logging:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| truthfulqa_gen | 3 | none | 0 | bleu_acc | ↑ | 0.5700 | ± | 0.0498 |
| | | none | 0 | bleu_diff | ↑ | 14.1211 | ± | 3.2247 |
| | | none | 0 | bleu_max | ↑ | 36.3770 | ± | 2.5794 |
| | | none | 0 | rouge1_acc | ↑ | 0.5700 | ± | 0.0498 |
| | | none | 0 | rouge1_diff | ↑ | 20.3622 | ± | 4.4249 |
| | | none | 0 | rouge1_max | ↑ | 60.7925 | ± | 2.9996 |
| | | none | 0 | rouge2_acc | ↑ | 0.5300 | ± | 0.0502 |
| | | none | 0 | rouge2_diff | ↑ | 20.8200 | ± | 4.8412 |
| | | none | 0 | rouge2_max | ↑ | 49.8049 | ± | 3.5523 |
| | | none | 0 | rougeL_acc | ↑ | 0.5600 | ± | 0.0499 |
| | | none | 0 | rougeL_diff | ↑ | 19.7338 | ± | 4.4891 |
| | | none | 0 | rougeL_max | ↑ | 59.3272 | ± | 3.1075 |
Context
What is the purpose of this PR?
Unbeknownst to me, when #1449 landed it broke compatibility with compile. This PR fixes it.
First, let's look at the problem. Running with `torch._logging.set_logs(recompiles=True)` shows a recompile being triggered on every decoding step. What's going on here? In `KVCache.update`, we're using `self.size` to track the current position of the cache and using it to retrieve subsequent positions, so we trigger a recompile every time we run the graph and increment `self.size`.

How can we get around this? No dynamism! Let's prefill the whole length of `cache_pos` we need, and then just index into it correctly. To get around having to keep track of an integer variable for our current cache position, we can just increment our positions every time we update. It seems kind of weird to do it this way (imagine if I used `torch.roll`??), but it ends up being very compile friendly. We're initializing `cache_pos` for the longest possible prefill, after which we'll only ever need `cache_pos[0]` to indicate the current position in the cache for next-token prediction.
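A rough sketch of what such a compile-friendly update can look like (hypothetical class and shapes; the actual implementation lives in torchtune's `KVCache`):

```python
import torch
from torch import nn


class KVCacheSketch(nn.Module):
    """Illustrative sketch: a KV-cache whose position state is a tensor buffer."""

    def __init__(self, batch_size: int, num_heads: int, max_seq_len: int, head_dim: int):
        super().__init__()
        shape = (batch_size, num_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape), persistent=False)
        self.register_buffer("v_cache", torch.zeros(shape), persistent=False)
        # Pre-allocate every position we could ever need; cache_pos[0] is the
        # current insertion point.
        self.register_buffer("cache_pos", torch.arange(0, max_seq_len), persistent=False)

    def update(self, k_val: torch.Tensor, v_val: torch.Tensor):
        seq_len = k_val.shape[2]
        assert (self.cache_pos[0] + seq_len) <= self.k_cache.shape[2]

        # Scatter the new keys/values into the next seq_len slots.
        self.k_cache[:, :, self.cache_pos[:seq_len]] = k_val
        self.v_cache[:, :, self.cache_pos[:seq_len]] = v_val

        # Advance all positions in place. Because the position lives in a tensor
        # buffer rather than a Python int, dynamo sees the same graph on every
        # step and does not recompile.
        self.cache_pos += seq_len
        return self.k_cache, self.v_cache
```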
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.

- `pre-commit install`
- `pytest tests`
- `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.