Pull requests: NVIDIA/TransformerEngine
[JAX] Consolidate FFI and old descriptor implementation for fused attention.
#1295
opened Oct 28, 2024 by
mgoldfarb-nvidia
8 of 13 tasks
[PyTorch] Skip t3hd/th3d for MQA/GQA tests
#1293
opened Oct 28, 2024 by
cyanguwa
8 of 13 tasks
[TE/JAX] XLA FFI calls for layer norm and RMS norm
#1290
opened Oct 26, 2024 by
huanghua1994
6 of 13 tasks
[TE/JAX] Custom call with FFI - lowering all attributes with bind all
#1289
opened Oct 25, 2024 by
phu0ngng
6 of 13 tasks
Add check for GPU availability in attention
#1287
opened Oct 24, 2024 by
cyanguwa
8 of 13 tasks
[PyTorch] Fix get_swa_mask() for padding masks
#1281
opened Oct 21, 2024 by
cyanguwa
6 of 13 tasks
[PyTorch] MultiheadAttention: Pass cu_seqlens to apply_rotary_pos_emb
#1279
opened Oct 21, 2024 by
Marks101
1 of 13 tasks
attention_mask fill with -inf for UnfusedDotProductAttention
#1268
opened Oct 18, 2024 by
Agoniii
1 of 13 tasks
Draft: reduce cudagraph mem via preallocations
#1253
opened Oct 15, 2024 by
JimmyZhang12
13 tasks
Save CUDA Graph memory by reusing input and output tensors
#1234
opened Oct 9, 2024 by
buptzyb
5 of 13 tasks
Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
#1200
opened Sep 24, 2024 by
erhoo82
13 tasks
[PyTorch] Fused dbias-cast-transpose in bias operation
#1168
opened Sep 6, 2024 by
timmoon10
7 of 13 tasks
[PyTorch] Avoid saving fp8_tensors in certain scenarios
#1143
opened Aug 28, 2024 by
cyanguwa
8 of 13 tasks
[PyTorch] Userbuffers support in operation-based API
#1142
opened Aug 27, 2024 by
timmoon10
7 of 13 tasks