[NVIDIA] Relax the requirement for providing both `query_seq_lengths` and `key_value_seq_lengths` #23415

kaixih · 2024-09-03T20:41:17Z

This PR addresses an issue where `jax.nn.dot_product_attention` required both `query_seq_lengths` and `key_value_seq_lengths` to be provided together. With this update, users can provide only one of these lengths.

This is motivated by issue #23349.
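
To illustrate the relaxed requirement, here is a minimal sketch of how a padding mask can be built when either length is optional. The helper name `padding_mask` and its signature are assumptions made for illustration, not the function in `jax/_src/nn/functions.py`; only the indexing pattern follows the diff in this PR.

```python
import jax.numpy as jnp

def padding_mask(T, S, q_seqlen=None, kv_seqlen=None):
    """Hypothetical helper: boolean mask broadcastable to (B, T, S)."""
    # Default to True so that a missing length masks nothing on that axis.
    q_mask = True
    kv_mask = True
    if q_seqlen is not None:
        q_indices = jnp.arange(0, T)[None, :, None]
        q_mask = q_indices < q_seqlen[:, None, None]
    if kv_seqlen is not None:
        kv_indices = jnp.arange(0, S)[None, None, :]
        kv_mask = kv_indices < kv_seqlen[:, None, None]
    return jnp.logical_and(q_mask, kv_mask)

# Only key/value lengths are given; every query position stays valid.
mask = padding_mask(T=4, S=6, kv_seqlen=jnp.array([3, 6], dtype=jnp.int32))
```

With only `kv_seqlen` supplied, the result has a singleton query axis and broadcasts across all query positions when it is combined with the attention logits.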

cc. @superbobry

kaixih · 2024-09-03T20:41:49Z

Also, @sbodenstein for review.

superbobry

Can you add a regression test for #23349, please?

superbobry · 2024-09-04T09:37:30Z

jax/_src/nn/functions.py

-  kv_indices = jnp.arange(0, S)[None, None, :]
-  q_mask = q_indices < q_seqlen[:, None, None]
-  kv_mask = kv_indices < kv_seqlen[:, None, None]
+  q_mask = jnp.array(True, dtype=jnp.bool_)


Can you do jnp.bool_(True) or just True here?
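
For context on this suggestion, a small sketch (with made-up shapes and lengths) showing that the three spellings of the default behave the same once they pass through `jnp.logical_and`:

```python
import jax.numpy as jnp

# Illustrative values only: batch of 2, key/value length 4.
kv_seqlen = jnp.array([2, 3], dtype=jnp.int32)
kv_mask = jnp.arange(0, 4)[None, None, :] < kv_seqlen[:, None, None]

# Three ways to express "no masking on the query axis".
as_array = jnp.logical_and(jnp.array(True, dtype=jnp.bool_), kv_mask)
as_scalar_type = jnp.logical_and(jnp.bool_(True), kv_mask)
as_python_bool = jnp.logical_and(True, kv_mask)
```

All three results are boolean arrays with the shape of `kv_mask`, so the plain Python `True` is the shortest option.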

superbobry · 2024-09-04T09:37:44Z

tests/nn_test.py

@@ -122,7 +122,6 @@ def testDotProductAttentionMask(self, mask_mode):

    is_causal = 'causal' in mask_mode
    if 'padding' in mask_mode:
-      q_seqlen = jnp.array([T // 2, T // 4], dtype=jnp.int32)


Is this change necessary?

sbodenstein · 2024-09-10T21:32:23Z

LGTM

kaixih mentioned this pull request Sep 3, 2024

jax.nn.dot_product_attention does not respect key_value_seq_lengths #23349

Open

superbobry requested changes Sep 4, 2024

Relax q_seqlen and kv_seqlen

2d2cbbc

kaixih force-pushed the key_value_seq_lengths branch from 12d93a4 to 2d2cbbc on September 5, 2024 17:43

superbobry approved these changes Sep 11, 2024

google-ml-butler bot added the kokoro:force-run and pull ready (Ready for copybara import and testing) labels Sep 11, 2024

kokoro-team removed the kokoro:force-run label Sep 11, 2024

copybara-service bot merged commit e869a9d into jax-ml:main Sep 11, 2024
16 checks passed
