PyTorch 1.12.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
Optim
- Remove overly restrictive assert in Adam #80222
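As loose context for the Adam fix above, here is a minimal sketch of stepping `torch.optim.Adam` after round-tripping its state dict (e.g. when resuming from a checkpoint); the model, shapes, and hyperparameters are placeholder assumptions, and this is not claimed to be the exact reproducer for #80222.

```python
import torch

# Build a tiny model and an Adam optimizer with some state.
model = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(2, 4)).sum().backward()
opt.step()

# Round-trip the optimizer state dict, as when resuming from a checkpoint.
state = opt.state_dict()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.load_state_dict(state)

# Step again after the reload.
opt.zero_grad()
model(torch.randn(2, 4)).sum().backward()
opt.step()
```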
Autograd
- Convolution forward over reverse internal asserts in specific case #81111
- 25% performance regression from v0.1.1 to v0.2.0 when calculating the Hessian #82504
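For readers hitting the Hessian entry above, a minimal sketch of computing a Hessian with the stock `torch.autograd.functional.hessian` API; the function `f` and the input size are arbitrary assumptions, not taken from the regression report.

```python
import torch

def f(x):
    # Arbitrary scalar-valued function of a vector input.
    return (x.sin() * x).sum()

x = torch.randn(5)
H = torch.autograd.functional.hessian(f, x)
print(H.shape)  # torch.Size([5, 5])
```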
Distributed
- Fix distributed store to use add for the counter of DL shared seed #80348
- Raise proper timeout when sharing the distributed shared seed #81666
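Both entries above concern the c10d key-value store used to share the DataLoader seed across ranks. Below is a minimal single-process sketch of that store API (`TCPStore` with `set`/`add`/`get` and a timeout); the host, port, and key names are illustrative assumptions that only loosely echo the issue titles.

```python
from datetime import timedelta
import torch.distributed as dist

# Single-process stand-in for the rank-0 store; host and port are assumptions.
store = dist.TCPStore("127.0.0.1", 29500, world_size=1, is_master=True,
                      timeout=timedelta(seconds=30))

store.set("dl_shared_seed", "12345")     # one rank publishes a value
store.add("dl_shared_seed_recv_cnt", 1)  # other ranks bump a counter via add()
print(store.get("dl_shared_seed"))       # b'12345'
```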
NN
- Allow registering float16 weight_norm on CPU and speed up test #80600 (see the sketch after this list)
- Fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 #80930
- Weight_norm is not working with float16 #80599
- New release breaks the torch.nn.utils.weight_norm backward pass, breaking all Wav2Vec2 implementations #80569
- Disable src mask for transformer and multiheadattention fastpath #81277
- Make nn.stateless correctly reset parameters if the forward pass fails #81262
- torchvision.transforms.functional.rgb_to_grayscale() + torch.nn.Conv2d() don't work on 1080 GPU #81106
- Transformer and CPU path with src_mask raises error with torch 1.12 #81129
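Several entries above touch `torch.nn.utils.weight_norm` on CPU and the Transformer fastpath with `src_mask`. The sketch below is a small smoke test of both code paths on a fixed build; all shapes, sizes, and module configurations are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# weight_norm on CPU: forward + backward, the path covered by the fixes above.
layer = nn.utils.weight_norm(nn.Linear(8, 8))   # registers weight_g / weight_v
layer(torch.randn(4, 8)).sum().backward()
print(layer.weight_g.grad.shape, layer.weight_v.grad.shape)

# TransformerEncoderLayer with an explicit (all-False) boolean src_mask on CPU.
enc = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True).eval()
src = torch.randn(2, 5, 16)
mask = torch.zeros(5, 5, dtype=torch.bool)
with torch.no_grad():
    out = enc(src, src_mask=mask)
print(out.shape)  # torch.Size([2, 5, 16])
```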
Data Loader
- Locking lower ranks seed recipients #81071
CUDA
- os.environ["CUDA_VISIBLE_DEVICES"] has no effect #80876 (see the sketch after this list)
- share_memory() on CUDA tensors no longer no-ops and instead crashes #80733
- [Prims] Unbreak CUDA lazy init #80899
- PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637
- Remove overly restrictive checks for cudagraph #80881
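The first two CUDA entries are about process-level device selection and sharing CUDA tensors across processes. A hedged sketch of the expected behavior (as described by the issue titles) follows; the device index and tensor size are assumptions.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # must be set before CUDA is initialized

import torch

if torch.cuda.is_available():
    print(torch.cuda.device_count())       # reflects the visible-device filter
    t = torch.ones(4, device="cuda")
    t.share_memory_()                      # expected to be a harmless no-op for CUDA tensors
```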
ONNX
- ONNX cherry picks #82435
MPS
- MPS cherry picks #80898
Other
- Don't error if _warned_capturable_if_run_uncaptured not set #80345
- Initializing libiomp5.dylib, but found libomp.dylib already initialized. #78490
- Assertion error - _dl_shared_seed_recv_cnt - pt 1.12 - multi node #80845
- Add 3.10 stdlib to torch.package #81261
- CPU-only c++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489