AutoSharded_Transformer_Based_on_PyTorch

4-GPU deployment (single node):

    CUDA_VISIBLE_DEVICES=4,5,6,7 python Transformer_AutoShard_Test.py

Result (maximum memory set to 10000): 50%+ speedup relative to FSDP.

8-GPU deployment (single node):

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python Transformer_AutoShard_Test.py

Result (maximum memory set to 10000): 50%+ speedup relative to FSDP.
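The project name and the maximum-memory setting suggest that transformer layers are partitioned across the visible GPUs automatically, subject to a per-device memory budget. As a rough, hypothetical illustration only (the function names `estimate_layer_mem` and `auto_shard` are invented here and are not this repo's API), a greedy in-order partition under a memory budget might be sketched as:

```python
import torch.nn as nn

def estimate_layer_mem(layer: nn.Module) -> int:
    """Rough per-layer memory estimate: parameter bytes only.

    A real planner would also account for activations and optimizer state.
    """
    return sum(p.numel() * p.element_size() for p in layer.parameters())

def auto_shard(layers, num_devices: int, max_mem_bytes: int):
    """Greedily fill each device up to max_mem_bytes, then move to the next.

    Returns a list mapping each layer index to a device index; the last
    device absorbs any overflow once all earlier devices are full.
    """
    placement, device, used = [], 0, 0
    for layer in layers:
        need = estimate_layer_mem(layer)
        if used and used + need > max_mem_bytes and device < num_devices - 1:
            device += 1
            used = 0
        placement.append(device)
        used += need
    return placement

# Tiny demo model: 8 small encoder layers spread over 4 devices.
layers = [nn.TransformerEncoderLayer(d_model=64, nhead=4, dim_feedforward=128)
          for _ in range(8)]
print(auto_shard(layers, num_devices=4, max_mem_bytes=300_000))
```

This is only a sketch of the general idea; the actual sharding strategy, memory units, and API in `Transformer_AutoShard_Test.py` may differ.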