【MetaX】Merge MetaX's modifications into the mxmaca/2.6 branch #68534

Merged
Changes from all commits (101 commits)
1b696a1
fix windows bug for common lib (#60308)
risemeup1 Dec 26, 2023
a4cd847
[Dy2St] Disable `test_bert` on CPU (#60173) (#60324)
SigureMo Dec 26, 2023
20d3558
[Cherry-pick] fix weight quant kernel bug when n div 64 != 0 (#60184)
wwbitejotunn Dec 26, 2023
203754e
tile (#60261)
lizexu123 Dec 26, 2023
b065877
add chunk allocator posix_memalign return value check (#60208) (#60495)
danleifeng Jan 2, 2024
83ce809
update 2023 security advisory, test=document_fix (#60532)
VigiZhang Jan 3, 2024
ae2e588
fix fleetutil get_online_pass_interval bug2; test=develop (#60545)
danleifeng Jan 4, 2024
97b65c7
fix fused_rope diff (#60217) (#60593)
tianhaodongbd Jan 8, 2024
bbc13eb
[cherry-pick]fix fleetutil get_online_pass_interval bug3 (#60620)
danleifeng Jan 9, 2024
ccdf528
[cherry-pick]update pdsa-2023-019 (#60649)
VigiZhang Jan 9, 2024
e50f43e
[Dy2St][2.6] Disable `test_grad` on release/2.6 (#60662)
SigureMo Jan 9, 2024
7b0d2e9
fix bug of ci (#59926) (#60785)
zyfncg Jan 15, 2024
e738f49
[Dy2St][2.6] Disable `test_transformer` on `release/2.6` and update R…
SigureMo Jan 15, 2024
d788e9b
[Dy2St][2.6] Increase `test_transformer` and `test_mobile_net` ut tim…
SigureMo Jan 17, 2024
1aa5f4b
[Cherry-pick] fix set_value with scalar grad (#60930)
zoooo0820 Jan 19, 2024
0f732a5
[cherry-pick] This PR enable offset of generator for custom device. (…
zhaohaixu Jan 23, 2024
ac1702b
fix core dump when fallback gather_nd_grad and MemoryAllocateHost (#6…
raining-dark Jan 25, 2024
ff119d0
fix qat tests (#61211) (#61284)
zzjjay Jan 30, 2024
aeaa0ca
[Security] fix draw security problem (#61161) (#61338)
wanghuancoder Jan 31, 2024
0227a0d
fix _decompress security problem (#61294) (#61337)
wanghuancoder Jan 31, 2024
f99d4f2
Fix CVE-2024-0521 (#61032) (#61287)
wanghuancoder Jan 31, 2024
af9b8c5
[Security] fix security problem for prune_by_memory_estimation (#61382)
vn-ncvinh Feb 1, 2024
9cd0c91
[Security] fix security problem for run_cmd (#61285) (#61398)
wanghuancoder Feb 1, 2024
5f3bbeb
[Security] fix download security problem (#61162) (#61388)
wanghuancoder Feb 1, 2024
60325a1
check eval for security (#61389)
wanghuancoder Feb 1, 2024
0ccb9cb
[cherry-pick] adapt c_embedding to phi namespace for custom devices (…
BeingGod Feb 2, 2024
f025385
[CherryPick] Fix issue 60092 (#61427)
yuanlehome Feb 5, 2024
3452e61
Fix unique (#60840) (#61044)
warrentdrew Feb 5, 2024
a37f6fb
cinn(py-dsl): skip eval string in python-dsl (#61380) (#61586)
6clc Feb 5, 2024
9250f66
remove _wget (#61356) (#61569)
lyuwenyu Feb 5, 2024
e5a85b6
fix layer_norm decompose dtyte bugs, polish codes (#61631)
lxd-cumt Feb 6, 2024
12e5c97
fix doc style (#61688)
GreatV Feb 19, 2024
39010bf
merge (#61866)
wanghuancoder Feb 21, 2024
b6a38d0
[security] refine _get_program_cache_key (#61827) (#61896)
wanghuancoder Feb 21, 2024
96c2aaf
repeat_interleave support bf16 dtype (#61854) (#61899)
tianhaodongbd Feb 21, 2024
2175de0
Support Fake GroupWise Quant (#61900)
RachelXu7 Feb 21, 2024
f09d9d8
fix launch when elastic run (#61847) (#61878)
wentaoyu Feb 21, 2024
df0155f
[Paddle-TRT] fix solve (#61806)
zhink Feb 22, 2024
04ac1c0
[Cherry-Pick] Fix CacheKV Quant Bug (#61966)
ZhangHandi Feb 22, 2024
5f048c9
Synchronized the paddle2.4 adaptation changes
Mar 15, 2024
50c9e9a
clear third_party dependencies
Mar 15, 2024
2a21139
change submodules to right commits
Mar 18, 2024
a6a5259
build pass with cpu only
Mar 18, 2024
6ea0936
build success with maca
Mar 21, 2024
a400805
build success with cutlass and fused kernels
Mar 25, 2024
c08de70
build with flash_attn and mccl
Mar 28, 2024
633780b
build with test, fix some bugs
Apr 29, 2024
c6b7c36
fix some bugs
May 21, 2024
0c0d79c
fixed some compilation bugs
May 23, 2024
126ca25
fix bug in previous commit
May 24, 2024
8a5bf30
fix bug with split when col_size is bigger than 256
May 27, 2024
85f37b2
add row_limit to show full kernel name
Jun 11, 2024
f46156d
add env.sh
Jun 19, 2024
4616eaf
add shape record
Jun 24, 2024
4adb9f9
modify paddle version
Jun 25, 2024
e47f023
wuzhao optimized the performance of the elementwise kernel.
Jul 8, 2024
2ccdb35
fix split when dtype is fp16
Jul 11, 2024
2eaad07
fix bug in previous commit
Jul 17, 2024
fd69368
adapt flash_attn new capi
Jul 24, 2024
e9afe2c
change eigen path
Jul 26, 2024
b86cf6d
modify mcname -> replaced_name
Aug 1, 2024
2aa1e91
fix some build bugs
Aug 9, 2024
edfb3ee
Merge "fix some build bugs" into mx/dev-2.6.0
Aug 9, 2024
fcdeee1
add PADDLE_ENABLE_SAME_RAND_A100
Aug 18, 2024
e13e1bd
Merge "add PADDLE_ENABLE_SAME_RAND_A100" into mx/dev-2.6.0
Aug 19, 2024
0b844ea
remove redundant warning, add patch from 2.6.1
Aug 27, 2024
ffef707
improve VectorizedBroadcastKernel
Jul 8, 2024
70b9ec9
fix bugs
Jul 14, 2024
b322680
split ElementwiseDivGrad
Jul 14, 2024
2902791
VectorizedElementwiseKernel can now use vecSize = 8
Jul 18, 2024
508a177
improve ModulatedDeformableCol2imCoordGpuKernel:1.block size 512->64;…
Jul 19, 2024
393df39
Optimize depthwise_conv2d_grad compute (InputGrad):
Jul 22, 2024
c461333
improve VectorizedBroadcastKernel with LoadType =
Jul 25, 2024
3a47751
fix ElementwiseDivGrad
Jul 25, 2024
cefdffc
Revert "Optimize depthwise_conv2d_grad compute (InputGrad):"
Jul 25, 2024
4543355
improve ElementwiseDivGrad and ElementwiseMulGrad
Jul 26, 2024
df38d3c
improve FilterBBoxes
Jul 27, 2024
cb28f43
improve deformable_conv_grad op:1.adaptive block size;2.FastDivMod;3.…
Jul 28, 2024
bdf6be4
improve ModulatedDeformableIm2colGpuKernel:1.adaptive block size;2.Fa…
Jul 28, 2024
3d67920
improve KeBNBackwardData:replace 1.0/sqrt with rsqrt
Jul 28, 2024
2e72a0c
Improve KeBNBackwardData, FilterGradAddupGpuKernel kernels. Improve n…
Jul 29, 2024
8079046
Optimize depthwise_conv2d:
Jul 29, 2024
2f6170b
Improve CheckFiniteAndUnscaleKernel by splitting the kernel into mult…
Jul 30, 2024
8ceabc0
Revert "Improve CheckFiniteAndUnscaleKernel by splitting the kernel i…
Jul 31, 2024
add2987
improve ScatterInitCUDAKernel and ScatterCUDAKernel
Aug 1, 2024
7b7b153
fix bugs and make the code easier to read
Aug 1, 2024
2800d94
Optimize FilterGrad and InputGradSpL
Aug 6, 2024
f6a6421
Improve CheckFiniteAndUnscaleKernel by putting address access to shar…
Aug 8, 2024
824a4cd
Optimize SwinTransformer
Aug 9, 2024
b99e439
Optimize LayerNormBackwardComputeGradInputWithSmallFeatureSize
Aug 15, 2024
b756d63
improve KeMatrixTopK:1.fix private memory;2.modify max grid size;3.ch…
Aug 16, 2024
37d4165
Modify LayerNorm Optimization
Aug 21, 2024
29999eb
improve roi_align op:1.adaptive block size;2.FastDivMod.
Aug 28, 2024
b37090c
add workaround for parameters dislocation when calling BatchedGEMM<fl…
Sep 2, 2024
531f3c7
fix McFlashAttn string
Sep 2, 2024
d180d30
[C500-27046] fix wb issue
Sep 3, 2024
5235601
Support compiling external ops
Sep 10, 2024
cd89fd8
Merge "Support compiling external ops" into mx/dev-2.6.0
Sep 10, 2024
f7eb629
support flash attn varlen api and support arm build
Sep 19, 2024
ee123e0
Add a copyright notice
Sep 27, 2024
898a221
Modify some third-party dependency addresses to public network addresses
Sep 29, 2024
40 changes: 13 additions & 27 deletions .gitmodules
@@ -1,6 +1,7 @@
[submodule "third_party/protobuf"]
path = third_party/protobuf
url = https://github.com/protocolbuffers/protobuf.git
tag = paddle
ignore = dirty
[submodule "third_party/pocketfft"]
path = third_party/pocketfft
@@ -21,10 +22,11 @@
[submodule "third_party/utf8proc"]
path = third_party/utf8proc
url = https://github.com/JuliaStrings/utf8proc.git
tag = v2.6.1
ignore = dirty
[submodule "third_party/warpctc"]
path = third_party/warpctc
url = https://github.com/baidu-research/warp-ctc.git
url = http://pdegit.metax-internal.com/pde-ai/warp-ctc.git
ignore = dirty
[submodule "third_party/warprnnt"]
path = third_party/warprnnt
@@ -33,10 +35,12 @@
[submodule "third_party/xxhash"]
path = third_party/xxhash
url = https://github.com/Cyan4973/xxHash.git
tag = v0.6.5
ignore = dirty
[submodule "third_party/pybind"]
path = third_party/pybind
url = https://github.com/pybind/pybind11.git
tag = v2.4.3
ignore = dirty
[submodule "third_party/threadpool"]
path = third_party/threadpool
@@ -45,39 +49,25 @@
[submodule "third_party/zlib"]
path = third_party/zlib
url = https://github.com/madler/zlib.git
tag = v1.2.8
ignore = dirty
[submodule "third_party/glog"]
path = third_party/glog
url = https://github.com/google/glog.git
ignore = dirty
[submodule "third_party/eigen3"]
path = third_party/eigen3
url = https://gitlab.com/libeigen/eigen.git
ignore = dirty
[submodule "third_party/snappy"]
path = third_party/snappy
url = https://github.com/google/snappy.git
ignore = dirty
[submodule "third_party/cub"]
path = third_party/cub
url = https://github.com/NVIDIA/cub.git
ignore = dirty
[submodule "third_party/cutlass"]
path = third_party/cutlass
url = https://github.com/NVIDIA/cutlass.git
ignore = dirty
[submodule "third_party/xbyak"]
path = third_party/xbyak
url = https://github.com/herumi/xbyak.git
tag = v5.81
ignore = dirty
[submodule "third_party/mkldnn"]
path = third_party/mkldnn
url = https://github.com/oneapi-src/oneDNN.git
ignore = dirty
[submodule "third_party/flashattn"]
path = third_party/flashattn
url = https://github.com/PaddlePaddle/flash-attention.git
ignore = dirty
[submodule "third_party/gtest"]
path = third_party/gtest
url = https://github.com/google/googletest.git
@@ -98,15 +88,11 @@
path = third_party/rocksdb
url = https://github.com/Thunderbrook/rocksdb
ignore = dirty
[submodule "third_party/absl"]
path = third_party/absl
url = https://github.com/abseil/abseil-cpp.git
ignore = dirty
[submodule "third_party/jitify"]
path = third_party/jitify
url = https://github.com/NVIDIA/jitify.git
[submodule "third_party/cutlass"]
path = third_party/cutlass
url = http://pdegit.metax-internal.com/pde-ai/cutlass.git
ignore = dirty
[submodule "third_party/cccl"]
path = third_party/cccl
url = https://github.com/NVIDIA/cccl.git
[submodule "third_party/eigen3"]
path = third_party/eigen3
url = ssh://gerrit.metax-internal.com:29418/MACA/library/mcEigen
ignore = dirty
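
The hunks above repoint several submodules (for example warp-ctc, cutlass, and eigen3) at MetaX-internal hosts and adjust the pinned tags and ignore settings. For an existing checkout, the recorded submodule remotes have to be re-synced before they can be fetched again; the following is a minimal, generic git sketch, not part of this PR, and it assumes network access to the new hosts:

```sh
# Re-point existing submodule remotes at the URLs now recorded in .gitmodules,
# then fetch and check out the pinned commits.
git submodule sync --recursive
git submodule update --init --recursive
```
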
23 changes: 20 additions & 3 deletions CMakeLists.txt
@@ -1,3 +1,4 @@
# 2024 - Modified by MetaX Integrated Circuits (Shanghai) Co., Ltd. All Rights Reserved.
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -24,7 +25,7 @@ endif()
# https://cmake.org/cmake/help/v3.0/policy/CMP0026.html?highlight=cmp0026
cmake_policy(SET CMP0026 OLD)
cmake_policy(SET CMP0079 NEW)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake" $ENV{CMAKE_MODULE_PATH})
set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})

@@ -92,6 +93,7 @@ endif()

if(WITH_GPU AND NOT APPLE)
enable_language(CUDA)
set(CMAKE_CUDA_COMPILER_VERSION 11.6)
message(STATUS "CUDA compiler: ${CMAKE_CUDA_COMPILER}, version: "
"${CMAKE_CUDA_COMPILER_ID} ${CMAKE_CUDA_COMPILER_VERSION}")
endif()
@@ -255,7 +257,7 @@ option(WITH_SYSTEM_BLAS "Use system blas library" OFF)
option(WITH_DISTRIBUTE "Compile with distributed support" OFF)
option(WITH_BRPC_RDMA "Use brpc rdma as the rpc protocal" OFF)
option(ON_INFER "Turn on inference optimization and inference-lib generation"
ON)
OFF)
option(WITH_CPP_DIST "Install PaddlePaddle C++ distribution" OFF)
option(WITH_GFLAGS "Compile PaddlePaddle with gflags support" OFF)
################################ Internal Configurations #######################################
@@ -283,7 +285,7 @@ option(
OFF)
option(WITH_LITE "Compile Paddle Fluid with Lite Engine" OFF)
option(WITH_CINN "Compile PaddlePaddle with CINN" OFF)
option(WITH_NCCL "Compile PaddlePaddle with NCCL support" ON)
option(WITH_NCCL "Compile PaddlePaddle with NCCL support" OFF)
option(WITH_RCCL "Compile PaddlePaddle with RCCL support" ON)
option(WITH_XPU_BKCL "Compile PaddlePaddle with BAIDU KUNLUN XPU BKCL" OFF)
option(WITH_CRYPTO "Compile PaddlePaddle with crypto support" ON)
@@ -474,6 +476,21 @@ if(WITH_GPU)
# so include(cudnn) needs to be in front of include(third_party/lite)
include(cudnn) # set cudnn libraries, must before configure
include(tensorrt)

include_directories("$ENV{MACA_PATH}/tools/cu-bridge/include")
include_directories("$ENV{MACA_PATH}/include")
include_directories("$ENV{MACA_PATH}/include/mcblas")
include_directories("$ENV{MACA_PATH}/include/mcr")
include_directories("$ENV{MACA_PATH}/include/mcdnn")
include_directories("$ENV{MACA_PATH}/include/mcsim")
include_directories("$ENV{MACA_PATH}/include/mcsparse")
include_directories("$ENV{MACA_PATH}/include/mcfft")
include_directories("$ENV{MACA_PATH}/include/mcrand")
include_directories("$ENV{MACA_PATH}/include/common")
include_directories("$ENV{MACA_PATH}/include/mcsolver")
include_directories("$ENV{MACA_PATH}/include/mctx")
include_directories("$ENV{MACA_PATH}/include/mcpti")
include_directories("$ENV{MACA_PATH}/mxgpu_llvm/include")
# there is no official support of nccl, cupti in windows
if(NOT WIN32)
include(cupti)
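The CMakeLists.txt hunks above take the MACA SDK location from the MACA_PATH environment variable, append $ENV{CMAKE_MODULE_PATH} to CMake's module search path, and flip ON_INFER and WITH_NCCL to OFF by default. A hypothetical configure sequence under those assumptions is sketched below; the /opt/maca prefix, the module-path location, and the option values are illustrative only (the branch's env.sh presumably sets the real ones), not something this PR prescribes:

```sh
# Assumed environment for a MACA build; adjust to the local SDK install.
export MACA_PATH=/opt/maca                # hypothetical MACA SDK prefix read via $ENV{MACA_PATH}
export CMAKE_MODULE_PATH=$MACA_PATH/cmake # hypothetical extra module dir picked up via $ENV{CMAKE_MODULE_PATH}

# Configure and build; WITH_NCCL and ON_INFER now default to OFF, so they are
# spelled out here only for clarity.
cmake -B build -DWITH_GPU=ON -DWITH_NCCL=OFF -DON_INFER=OFF
cmake --build build -j"$(nproc)"
```
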
183 changes: 183 additions & 0 deletions NOTICE
@@ -0,0 +1,183 @@
The following files may have been modified by MetaX Integrated Circuits (Shanghai) Co., Ltd. in 2024.

.gitmodules
CMakeLists.txt
cmake/cuda.cmake
cmake/cudnn.cmake
cmake/cupti.cmake
cmake/external/brpc.cmake
cmake/external/cryptopp.cmake
cmake/external/cutlass.cmake
cmake/external/dgc.cmake
cmake/external/dlpack.cmake
cmake/external/eigen.cmake
cmake/external/flashattn.cmake
cmake/external/jemalloc.cmake
cmake/external/lapack.cmake
cmake/external/libmct.cmake
cmake/external/mklml.cmake
cmake/external/protobuf.cmake
cmake/external/pybind11.cmake
cmake/external/utf8proc.cmake
cmake/flags.cmake
cmake/generic.cmake
cmake/inference_lib.cmake
cmake/nccl.cmake
cmake/third_party.cmake
env.sh
paddle/fluid/distributed/fleet_executor/test/interceptor_ping_pong_with_brpc_test.cc
paddle/fluid/eager/api/manual/eager_manual/forwards/multiply_fwd_func.cc
paddle/fluid/eager/auto_code_generator/eager_generator.cc
paddle/fluid/eager/auto_code_generator/generator/eager_gen.py
paddle/fluid/framework/details/build_strategy.cc
paddle/fluid/framework/distributed_strategy.proto
paddle/fluid/inference/api/resource_manager.cc
paddle/fluid/inference/api/resource_manager.h
paddle/fluid/inference/tensorrt/plugin/layernorm_shift_partition_op.cu
paddle/fluid/inference/tensorrt/plugin/matmul_op_int8_plugin.h
paddle/fluid/inference/tensorrt/plugin/preln_residual_bias_plugin.cu
paddle/fluid/memory/allocation/CMakeLists.txt
paddle/fluid/memory/allocation/allocator_facade.cc
paddle/fluid/operators/CMakeLists.txt
paddle/fluid/operators/correlation_op.cu
paddle/fluid/operators/elementwise/elementwise_op_function.h
paddle/fluid/operators/fused/CMakeLists.txt
paddle/fluid/operators/fused/attn_gemm_int8.h
paddle/fluid/operators/fused/cublaslt.h
paddle/fluid/operators/fused/fused_gate_attention.h
paddle/fluid/operators/fused/fused_gemm_epilogue_op.cu
paddle/fluid/operators/fused/fused_layernorm_residual_dropout_bias.h
paddle/fluid/operators/fused/fused_multi_transformer_int8_op.cu
paddle/fluid/operators/fused/fused_multi_transformer_op.cu
paddle/fluid/operators/fused/fused_multi_transformer_op.cu.h
paddle/fluid/operators/fused/fused_softmax_mask.cu.h
paddle/fluid/operators/math/inclusive_scan.h
paddle/fluid/operators/matmul_op.cc
paddle/fluid/operators/row_conv_op.cu
paddle/fluid/operators/sparse_attention_op.cu
paddle/fluid/platform/cuda_graph_with_memory_pool.cc
paddle/fluid/platform/device/gpu/cuda/cuda_helper.h
paddle/fluid/platform/device/gpu/cuda_helper_test.cu
paddle/fluid/platform/device/gpu/gpu_types.h
paddle/fluid/platform/device_context.h
paddle/fluid/platform/dynload/CMakeLists.txt
paddle/fluid/platform/dynload/cublas.h
paddle/fluid/platform/dynload/cublasLt.cc
paddle/fluid/platform/dynload/cublasLt.h
paddle/fluid/platform/dynload/cusparseLt.h
paddle/fluid/platform/init.cc
paddle/fluid/platform/init_phi_test.cc
paddle/fluid/pybind/eager_legacy_op_function_generator.cc
paddle/fluid/pybind/fleet_py.cc
paddle/fluid/pybind/pybind.cc
paddle/phi/api/profiler/profiler.cc
paddle/phi/backends/dynload/CMakeLists.txt
paddle/phi/backends/dynload/cublas.h
paddle/phi/backends/dynload/cublasLt.cc
paddle/phi/backends/dynload/cublasLt.h
paddle/phi/backends/dynload/cuda_driver.h
paddle/phi/backends/dynload/cudnn.h
paddle/phi/backends/dynload/cufft.h
paddle/phi/backends/dynload/cupti.h
paddle/phi/backends/dynload/curand.h
paddle/phi/backends/dynload/cusolver.h
paddle/phi/backends/dynload/cusparse.h
paddle/phi/backends/dynload/cusparseLt.h
paddle/phi/backends/dynload/dynamic_loader.cc
paddle/phi/backends/dynload/flashattn.h
paddle/phi/backends/dynload/nccl.h
paddle/phi/backends/dynload/nvjpeg.h
paddle/phi/backends/dynload/nvrtc.h
paddle/phi/backends/dynload/nvtx.h
paddle/phi/backends/gpu/cuda/cuda_device_function.h
paddle/phi/backends/gpu/cuda/cuda_helper.h
paddle/phi/backends/gpu/forwards.h
paddle/phi/backends/gpu/gpu_context.cc
paddle/phi/backends/gpu/gpu_context.h
paddle/phi/backends/gpu/gpu_decls.h
paddle/phi/backends/gpu/gpu_resources.cc
paddle/phi/backends/gpu/gpu_resources.h
paddle/phi/backends/gpu/rocm/rocm_device_function.h
paddle/phi/core/custom_kernel.cc
paddle/phi/core/distributed/check/nccl_dynamic_check.h
paddle/phi/core/distributed/comm_context_manager.h
paddle/phi/core/enforce.h
paddle/phi/core/flags.cc
paddle/phi/core/visit_type.h
paddle/phi/kernels/funcs/aligned_vector.h
paddle/phi/kernels/funcs/blas/blas_impl.cu.h
paddle/phi/kernels/funcs/blas/blaslt_impl.cu.h
paddle/phi/kernels/funcs/broadcast_function.h
paddle/phi/kernels/funcs/concat_and_split_functor.cu
paddle/phi/kernels/funcs/cublaslt.h
paddle/phi/kernels/funcs/deformable_conv_functor.cu
paddle/phi/kernels/funcs/distribution_helper.h
paddle/phi/kernels/funcs/dropout_impl.cu.h
paddle/phi/kernels/funcs/elementwise_base.h
paddle/phi/kernels/funcs/elementwise_grad_base.h
paddle/phi/kernels/funcs/fused_gemm_epilogue.h
paddle/phi/kernels/funcs/gemm_int8_helper.h
paddle/phi/kernels/funcs/inclusive_scan.h
paddle/phi/kernels/funcs/layer_norm_impl.cu.h
paddle/phi/kernels/funcs/math_cuda_utils.h
paddle/phi/kernels/funcs/reduce_function.h
paddle/phi/kernels/funcs/scatter.cu.h
paddle/phi/kernels/funcs/top_k_function_cuda.h
paddle/phi/kernels/funcs/weight_only_gemv.cu
paddle/phi/kernels/fusion/cutlass/utils/cuda_utils.h
paddle/phi/kernels/fusion/gpu/attn_gemm.h
paddle/phi/kernels/fusion/gpu/fused_dropout_add_utils.h
paddle/phi/kernels/fusion/gpu/fused_dropout_helper.h
paddle/phi/kernels/fusion/gpu/fused_layernorm_residual_dropout_bias.h
paddle/phi/kernels/fusion/gpu/fused_linear_param_grad_add_kernel.cu
paddle/phi/kernels/fusion/gpu/fused_softmax_mask_upper_triangle_utils.h
paddle/phi/kernels/fusion/gpu/fused_softmax_mask_utils.h
paddle/phi/kernels/fusion/gpu/mmha_util.cu.h
paddle/phi/kernels/gpu/accuracy_kernel.cu
paddle/phi/kernels/gpu/amp_kernel.cu
paddle/phi/kernels/gpu/batch_norm_grad_kernel.cu
paddle/phi/kernels/gpu/contiguous_kernel.cu
paddle/phi/kernels/gpu/decode_jpeg_kernel.cu
paddle/phi/kernels/gpu/deformable_conv_grad_kernel.cu
paddle/phi/kernels/gpu/depthwise_conv.h
paddle/phi/kernels/gpu/dist_kernel.cu
paddle/phi/kernels/gpu/flash_attn_grad_kernel.cu
paddle/phi/kernels/gpu/flash_attn_kernel.cu
paddle/phi/kernels/gpu/flash_attn_utils.h
paddle/phi/kernels/gpu/gelu_funcs.h
paddle/phi/kernels/gpu/generate_proposals_kernel.cu
paddle/phi/kernels/gpu/group_norm_kernel.cu
paddle/phi/kernels/gpu/interpolate_grad_kernel.cu
paddle/phi/kernels/gpu/kthvalue_kernel.cu
paddle/phi/kernels/gpu/llm_int8_linear_kernel.cu
paddle/phi/kernels/gpu/masked_select_kernel.cu
paddle/phi/kernels/gpu/nonzero_kernel.cu
paddle/phi/kernels/gpu/roi_align_grad_kernel.cu
paddle/phi/kernels/gpu/roi_align_kernel.cu
paddle/phi/kernels/gpu/strided_copy_kernel.cu
paddle/phi/kernels/gpu/top_k_kernel.cu
paddle/phi/kernels/gpu/top_p_sampling_kernel.cu
paddle/phi/kernels/gpu/unique_consecutive_functor.h
paddle/phi/kernels/gpu/unique_kernel.cu
paddle/phi/kernels/gpudnn/conv_cudnn_v7.h
paddle/phi/kernels/gpudnn/softmax_gpudnn.h
paddle/phi/kernels/impl/deformable_conv_grad_kernel_impl.h
paddle/phi/kernels/impl/llm_int8_matmul_kernel_impl.h
paddle/phi/kernels/impl/matmul_kernel_impl.h
paddle/phi/kernels/impl/multi_dot_kernel_impl.h
paddle/phi/kernels/primitive/datamover_primitives.h
paddle/phi/kernels/primitive/kernel_primitives.h
paddle/phi/tools/CMakeLists.txt
paddle/utils/flat_hash_map.h
patches/eigen/TensorReductionGpu.h
python/paddle/base/framework.py
python/paddle/distributed/launch/controllers/watcher.py
python/paddle/profiler/profiler_statistic.py
python/paddle/utils/cpp_extension/cpp_extension.py
python/paddle/utils/cpp_extension/extension_utils.py
test/CMakeLists.txt
test/cpp/CMakeLists.txt
test/cpp/jit/CMakeLists.txt
test/cpp/new_executor/CMakeLists.txt
test/legacy_test/test_flash_attention.py
tools/ci_op_benchmark.sh
2 changes: 1 addition & 1 deletion README.md
@@ -20,7 +20,7 @@ PaddlePaddle is originated from industrial practices with dedication and commitm

## Installation

### Latest PaddlePaddle Release: [v2.5](https://github.com/PaddlePaddle/Paddle/tree/release/2.5)
### Latest PaddlePaddle Release: [v2.6](https://github.com/PaddlePaddle/Paddle/tree/release/2.6)

Our vision is to enable deep learning for everyone via PaddlePaddle.
Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest features of PaddlePaddle.
4 changes: 2 additions & 2 deletions README_cn.md
@@ -18,9 +18,9 @@

## 安装

### PaddlePaddle最新版本: [v2.5](https://github.com/PaddlePaddle/Paddle/tree/release/2.5)
### PaddlePaddle 最新版本: [v2.6](https://github.com/PaddlePaddle/Paddle/tree/release/2.6)

跟进PaddlePaddle最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases)
跟进 PaddlePaddle 最新特性请参考我们的[版本说明](https://github.com/PaddlePaddle/Paddle/releases)

### 安装最新稳定版本:
```
2 changes: 1 addition & 1 deletion README_ja.md
@@ -20,7 +20,7 @@ PaddlePaddle は、工業化に対するコミットメントを持つ工業的

## インストール

### PaddlePaddle の最新リリース: [v2.5](https://github.com/PaddlePaddle/Paddle/tree/release/2.5)
### PaddlePaddle の最新リリース: [v2.6](https://github.com/PaddlePaddle/Paddle/tree/release/2.6)

私たちのビジョンは、PaddlePaddle を通じて、誰もが深層学習を行えるようにすることです。
PaddlePaddle の最新機能を追跡するために、私たちの[リリースのお知らせ](https://github.com/PaddlePaddle/Paddle/releases)を参照してください。