Update README
Tom94 committed Oct 30, 2021
1 parent fa3dfe7 commit ef1cccc
Showing 1 changed file with 2 additions and 3 deletions: README.md
```diff
@@ -112,9 +112,8 @@ producing an image every 1000 training steps. Each 1000 steps should take roughl
 - CMake __v3.17 or higher__.
 - A __C++17__ capable compiler.
 - A high-end NVIDIA GPU that supports TensorCores and has a large amount of shared memory. The framework was tested primarily with an RTX 3090.
-  - Ampere GeForce GPUs: compiles out of the box.
-  - Ampere A100: requires changing `CMAKE_CUDA_ARCHITECTURE` to 80 in `CMakeLists.txt`.
-  - Turing GPUs: requires changing `CMAKE_CUDA_ARCHITECTURE` to 75 in `CMakeLists.txt` as well as changing `SmArch` in `include/tiny-cuda-nn/cutlass_matmul.h` to `cutlass::arch::Sm75`.
+  - Ampere GPUs: compiles out of the box.
+  - Turing GPUs: requires changing `SmArch` in `include/tiny-cuda-nn/cutlass_matmul.h` to `cutlass::arch::Sm75`.
 - The fully fused MLP component of this framework requires a __very large__ amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or high-end enterprise GPUs. Lower end cards must reduce the `n_neurons` parameter or use the `CutlassMLP` (better compatibility but slower) instead.
 
 ## Compilation
```
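The Turing instruction that this commit keeps can be sketched as a C++ excerpt. This is a hypothetical illustration, not the verbatim contents of `include/tiny-cuda-nn/cutlass_matmul.h`; only the identifiers `SmArch` and `cutlass::arch::Sm75` come from the README, and the surrounding code and default value are assumptions:

```cpp
// Sketch of the edit described in README.md (hypothetical excerpt).
// CUTLASS dispatches tensor-core kernels by SM architecture tag;
// Turing GPUs (compute capability 7.5, e.g. RTX 2080 Ti) need Sm75.
#include <cutlass/arch/arch.h>

// Assumed default for Ampere GPUs:
// using SmArch = cutlass::arch::Sm80;

// Change for Turing GPUs, per the README:
using SmArch = cutlass::arch::Sm75;
```

After this change, Ampere GPUs compile out of the box, so only Turing users need to touch the header.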
