{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":669192119,"defaultBranch":"main","name":"triton","ownerLogin":"jungpark-mlir","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2023-07-21T15:09:46.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/83697902?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1727174069.0","currentOid":""},"activityList":{"items":[{"before":"7784890987559022327177119068f0ec307af869","after":"146d74bdd87f8c737e8550773c4324164be7e7d8","ref":"refs/heads/qsmemfix","pushedAt":"2024-09-25T20:57:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"antiagainst","name":"Lei Zhang","path":"/antiagainst","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/487928?s=80&v=4"},"commit":{"message":"Fix function name","shortMessageHtmlLink":"Fix function name"}},{"before":"e5c07e2b567f220ff1b8590590e0fb967b502ea5","after":"7784890987559022327177119068f0ec307af869","ref":"refs/heads/qsmemfix","pushedAt":"2024-09-25T16:06:53.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Merge branch 'main' into qsmemfix","shortMessageHtmlLink":"Merge branch 'main' into qsmemfix"}},{"before":"af233354a45aa7b685f19c3de924fd5efa97cbbf","after":"e5c07e2b567f220ff1b8590590e0fb967b502ea5","ref":"refs/heads/qsmemfix","pushedAt":"2024-09-25T16:06:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Add lit test.","shortMessageHtmlLink":"Add lit test."}},{"before":"0e4deb42ff3d3f27780731559f029ab4828380ef","after":null,"ref":"refs/heads/prefetch-amd","pushedAt":"2024-09-24T10:34:29.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"}},{"before":"576426bccfb9a2c90f2abaa405995738d4a79403","after":"af233354a45aa7b685f19c3de924fd5efa97cbbf","ref":"refs/heads/qsmemfix","pushedAt":"2024-09-23T16:51:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD][Bugfix] Fix a bug in shared memory address calculation\n\nThere was a bug in shared memory base pointer calculation which misuses\n`gep` build function and pass `ptr` type as sharedmem's element type.\nFixed the bug.","shortMessageHtmlLink":"[AMD][Bugfix] Fix a bug in shared memory address calculation"}},{"before":null,"after":"576426bccfb9a2c90f2abaa405995738d4a79403","ref":"refs/heads/qsmemfix","pushedAt":"2024-09-23T16:44:42.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[BACKEND] Switch back to use llvm.load for shared memory load (#4776)\n\nWhen we don't have predicates we can use llvm.load. Using inline asm for\r\ni8 types can cause inefficient code generation in llvm due to the\r\ninteraction with DAG legalizer.","shortMessageHtmlLink":"[BACKEND] Switch back to use llvm.load for shared memory load (triton…"}},{"before":"3ae95a858eac26088102075500e3860864432106","after":"576426bccfb9a2c90f2abaa405995738d4a79403","ref":"refs/heads/main","pushedAt":"2024-09-23T16:44:21.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[BACKEND] Switch back to use llvm.load for shared memory load (#4776)\n\nWhen we don't have predicates we can use llvm.load. Using inline asm for\r\ni8 types can cause inefficient code generation in llvm due to the\r\ninteraction with DAG legalizer.","shortMessageHtmlLink":"[BACKEND] Switch back to use llvm.load for shared memory load (triton…"}},{"before":"9624bb00ea5629bb33e47e4aa27bb487f84d7dd3","after":"0e4deb42ff3d3f27780731559f029ab4828380ef","ref":"refs/heads/prefetch-amd","pushedAt":"2024-09-23T12:56:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Add comment for the changes.","shortMessageHtmlLink":"Add comment for the changes."}},{"before":"a71a661e623aa223372b9cac3f29247a5f122151","after":"9624bb00ea5629bb33e47e4aa27bb487f84d7dd3","ref":"refs/heads/prefetch-amd","pushedAt":"2024-09-23T12:32:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Revert fix, better commited separately.","shortMessageHtmlLink":"Revert fix, better commited separately."}},{"before":"3ae95a858eac26088102075500e3860864432106","after":"a71a661e623aa223372b9cac3f29247a5f122151","ref":"refs/heads/prefetch-amd","pushedAt":"2024-09-20T16:47:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[PREFETCH][AMD] Enable prefetch pass for the amdgpu\nWe've investigated tritongpu-prefetch on the amdgpu and it shows\npositive result in some cases. This change allows the prefetch pass to\nrewrite the loop with the `tt.dot` using `amd_mfma` with a fix in\nthe sharedmem offset.\nThis doesn't insert the pass to the compilation pipeline yet. The pass\nis supposed to be placed just after the pipelining pass.","shortMessageHtmlLink":"[PREFETCH][AMD] Enable prefetch pass for the amdgpu"}},{"before":null,"after":"3ae95a858eac26088102075500e3860864432106","ref":"refs/heads/prefetch-amd","pushedAt":"2024-09-20T09:48:47.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD][CanonicalizePtr] Add a series of fixes for the new pipeliner (#4743)\n\nThis PR is fixing some issues with the `CanonicalizePointer` pass and\r\nthe new pipeliner:\r\n- Don't traverse twice the same nodes\r\n- Don't assume the operation to delete are in the correct order, but\r\n force dropping the reference of the ops before we delete them\r\n- Add support for select operation (+test), which is used when dealing\r\nwith multiple buffer (this part has been coauthored with @sjw36)","shortMessageHtmlLink":"[AMD][CanonicalizePtr] Add a series of fixes for the new pipeliner (t…"}},{"before":"5000e3264ab360711158ef999203d039c24ea7d6","after":"3ae95a858eac26088102075500e3860864432106","ref":"refs/heads/main","pushedAt":"2024-09-20T09:47:59.000Z","pushType":"push","commitsCount":14,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD][CanonicalizePtr] Add a series of fixes for the new pipeliner (#4743)\n\nThis PR is fixing some issues with the `CanonicalizePointer` pass and\r\nthe new pipeliner:\r\n- Don't traverse twice the same nodes\r\n- Don't assume the operation to delete are in the correct order, but\r\n force dropping the reference of the ops before we delete them\r\n- Add support for select operation (+test), which is used when dealing\r\nwith multiple buffer (this part has been coauthored with @sjw36)","shortMessageHtmlLink":"[AMD][CanonicalizePtr] Add a series of fixes for the new pipeliner (t…"}},{"before":"5000e3264ab360711158ef999203d039c24ea7d6","after":"3d93a45575428d4c69cddcb7191f5a403a1328cb","ref":"refs/heads/experiment-prefetch","pushedAt":"2024-09-17T12:26:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Experiment sharedmem prefetch\nincludes\n- ignoring nvidia mma check\n- fix for sharedmem base\n- compiler tweak to provide modified IR","shortMessageHtmlLink":"Experiment sharedmem prefetch"}},{"before":null,"after":"5000e3264ab360711158ef999203d039c24ea7d6","ref":"refs/heads/experiment-prefetch","pushedAt":"2024-09-17T12:20:32.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD] Add check to fix test_store_cache_modifier for MI300 (#4726)\n\nCo-authored-by: Lei Zhang ","shortMessageHtmlLink":"[AMD] Add check to fix test_store_cache_modifier for MI300 (triton-la…"}},{"before":"2df33bbcd32c99dee4c758db9a1a89794affd833","after":"5000e3264ab360711158ef999203d039c24ea7d6","ref":"refs/heads/main","pushedAt":"2024-09-13T20:35:23.000Z","pushType":"push","commitsCount":19,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD] Add check to fix test_store_cache_modifier for MI300 (#4726)\n\nCo-authored-by: Lei Zhang ","shortMessageHtmlLink":"[AMD] Add check to fix test_store_cache_modifier for MI300 (triton-la…"}},{"before":"310405647df51a909943bed71c5a6fd9a3e402b4","after":"2df33bbcd32c99dee4c758db9a1a89794affd833","ref":"refs/heads/main","pushedAt":"2024-09-10T11:00:24.000Z","pushType":"push","commitsCount":10,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[Frontend] Add TRITON_DEV_MODE for easier debugging of frontend errors (#4683)\n\nCurrently triton filters out parts of the stack trace that come from\r\ninside the compiler itself which is great for not confusing users.\r\nHowever this adds the ability to disable it by setting\r\n`TRITON_DEV_MODE=1` in the environment (open to bikeshedding on the\r\nname) for when you're trying to debug the frontend itself.","shortMessageHtmlLink":"[Frontend] Add TRITON_DEV_MODE for easier debugging of frontend errors ("}},{"before":"5e3d85548928df11f3f7c96733df986905c57563","after":"310405647df51a909943bed71c5a6fd9a3e402b4","ref":"refs/heads/main","pushedAt":"2024-09-06T17:34:12.000Z","pushType":"push","commitsCount":17,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Revert \"[AMD] Disable block merging to avoid block argument explosion (#4176)\" (#4631)\n\nTurn back aggressive strategy by default to enable block merging\r\ngiven now we have upstream fixes for it brought in:\r\nhttps://github.com/triton-lang/triton/pull/4619.\r\n\r\nThis reverts commit cf2ad02324fc253970c3ab2666e775406405f213.","shortMessageHtmlLink":"Revert \"[AMD] Disable block merging to avoid block argument explosion ("}},{"before":"241e89c24a0fa5297a59e8e14ad3d62295f54c7a","after":"5e3d85548928df11f3f7c96733df986905c57563","ref":"refs/heads/main","pushedAt":"2024-09-05T09:19:36.000Z","pushType":"push","commitsCount":30,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[PROTON] Improve user experience on the CUPTI backend (#4647)\n\nPreviously, we would search for `libcupti.so` in the `LD_LIBRARY_PATH`.\r\nHowever, we already have `libcupti.so` downloaded from the conda\r\npackage, and it is the version that exactly matches the CUPTI header\r\nfile. Therefore, we can simply copy and paste `libcupti.so` to\r\n`third_party/nvidia/lib/cupti` and search there first. This eliminates\r\nthe need to set `LD_LIBRARY_PATH` in many cases.","shortMessageHtmlLink":"[PROTON] Improve user experience on the CUPTI backend (triton-lang#4647)"}},{"before":"2defd41111ae9001a547d0d9d1f139d460935bf9","after":"fb25cd05ee7573b82cddf2735247c42fd1a05078","ref":"refs/heads/reorder-sh","pushedAt":"2024-09-01T07:26:43.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"antiagainst","name":"Lei Zhang","path":"/antiagainst","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/487928?s=80&v=4"},"commit":{"message":"Fix","shortMessageHtmlLink":"Fix"}},{"before":"ea1f8a544d5866be6f826b4d05cb5fa1c3f26a4e","after":"2defd41111ae9001a547d0d9d1f139d460935bf9","ref":"refs/heads/reorder-sh","pushedAt":"2024-09-01T07:24:33.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"antiagainst","name":"Lei Zhang","path":"/antiagainst","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/487928?s=80&v=4"},"commit":{"message":"Fix","shortMessageHtmlLink":"Fix"}},{"before":"b2a5d9ef44f97c4d463a332662d610032310ef6b","after":"ea1f8a544d5866be6f826b4d05cb5fa1c3f26a4e","ref":"refs/heads/reorder-sh","pushedAt":"2024-09-01T07:21:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"antiagainst","name":"Lei Zhang","path":"/antiagainst","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/487928?s=80&v=4"},"commit":{"message":"Fix","shortMessageHtmlLink":"Fix"}},{"before":"1cd3e1e8135180f724767ca5fb36cb1abb15f132","after":"b2a5d9ef44f97c4d463a332662d610032310ef6b","ref":"refs/heads/reorder-sh","pushedAt":"2024-09-01T07:11:33.000Z","pushType":"push","commitsCount":15,"pusher":{"login":"antiagainst","name":"Lei Zhang","path":"/antiagainst","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/487928?s=80&v=4"},"commit":{"message":"Merge remote-tracking branch 'origin/main' into reorder-sh","shortMessageHtmlLink":"Merge remote-tracking branch 'origin/main' into reorder-sh"}},{"before":"66895a567fd705d673028b6dcf80147749d28812","after":"1cd3e1e8135180f724767ca5fb36cb1abb15f132","ref":"refs/heads/reorder-sh","pushedAt":"2024-08-28T16:35:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Fix format.","shortMessageHtmlLink":"Fix format."}},{"before":"d4c3eb0eaa1eaca98aa7c8650e73e98d4e1e75d1","after":"66895a567fd705d673028b6dcf80147749d28812","ref":"refs/heads/reorder-sh","pushedAt":"2024-08-28T15:53:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"add test in the lit test","shortMessageHtmlLink":"add test in the lit test"}},{"before":"98ad927e5eafa2c9439a9191e07c5b4e5593d052","after":"d4c3eb0eaa1eaca98aa7c8650e73e98d4e1e75d1","ref":"refs/heads/reorder-sh","pushedAt":"2024-08-28T15:24:57.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"Merge branch 'triton-lang:main' into reorder-sh","shortMessageHtmlLink":"Merge branch 'triton-lang:main' into reorder-sh"}},{"before":"f48dbc1b106c93144c198fbf3c4f30b2aab9d242","after":"241e89c24a0fa5297a59e8e14ad3d62295f54c7a","ref":"refs/heads/main","pushedAt":"2024-08-28T15:24:39.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[nvidia backend] Replace cvt instructions with bitwise operations in s8->bf16 conversions (#4563)\n\nHopper has very low throughput of conversion instructions that cause\r\nthis operations to quickly become an ALU bottleneck. Restating it in\r\nterms of bitwise ops and SIMD bf16 instructions increases the throughput\r\nsignificantly and translates to meaningful speedups (e.g. 10% end-to-end\r\non one matmul I was looking at).\r\n\r\nCo-authored-by: Adam Paszke ","shortMessageHtmlLink":"[nvidia backend] Replace cvt instructions with bitwise operations in …"}},{"before":"f48dbc1b106c93144c198fbf3c4f30b2aab9d242","after":"98ad927e5eafa2c9439a9191e07c5b4e5593d052","ref":"refs/heads/reorder-sh","pushedAt":"2024-08-28T14:02:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[AMD] reorder convert_layout in the tritonamdgpu-reorder-instructions pass.\nconvert_layout optimization mainly tries to minimize the data movement.\nIt can also use shared memory and this change tries to avoid unnecessary\nshared memory allocation.","shortMessageHtmlLink":"[AMD] reorder convert_layout in the tritonamdgpu-reorder-instructions…"}},{"before":null,"after":"f48dbc1b106c93144c198fbf3c4f30b2aab9d242","ref":"refs/heads/reorder-sh","pushedAt":"2024-08-27T16:25:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[CODEGEN] Support CUDA 12.6 (#4588)\n\nAccording to the\r\n[table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history),\r\nboth CUDA 12.5 and 12.6 use PTX ISA 8.5","shortMessageHtmlLink":"[CODEGEN] Support CUDA 12.6 (triton-lang#4588)"}},{"before":"f21009031e94f4f4da2d01641d7f20f8b8e2d70b","after":"f48dbc1b106c93144c198fbf3c4f30b2aab9d242","ref":"refs/heads/main","pushedAt":"2024-08-27T16:19:10.000Z","pushType":"push","commitsCount":16,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[CODEGEN] Support CUDA 12.6 (#4588)\n\nAccording to the\r\n[table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history),\r\nboth CUDA 12.5 and 12.6 use PTX ISA 8.5","shortMessageHtmlLink":"[CODEGEN] Support CUDA 12.6 (triton-lang#4588)"}},{"before":null,"after":"f21009031e94f4f4da2d01641d7f20f8b8e2d70b","ref":"refs/heads/onepipe","pushedAt":"2024-08-23T10:50:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jungpark-mlir","name":"Jungwook Park","path":"/jungpark-mlir","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/83697902?s=80&v=4"},"commit":{"message":"[TEST] Use device fixture for test_math_extern (#4558)\n\nMissed this one test the last time.","shortMessageHtmlLink":"[TEST] Use device fixture for test_math_extern (triton-lang#4558)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yNVQyMDo1Nzo1MS4wMDAwMDBazwAAAATAyouu","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0yM1QxMDo1MDozNy4wMDAwMDBazwAAAASiPl75"}},"title":"Activity · jungpark-mlir/triton"}