Adds caching layer to tensor allocations #670

coreylowman · 2023-04-07T12:41:51Z

Instead of freeing memory immediately on drop, this PR holds it in a device-specific cache object. When another allocation of the same size is requested, the cached memory is reused.

Related to #541

Summary

Changes Cuda::Vec and Cpu::Vec to be wrapper objects that hold both the data and a handle to a cache object:

#[derive(Debug)]
pub struct CachableVec<E> {
    pub(crate) data: Vec<E>,
    pub(crate) destination: Arc<RwLock<BTreeMap<usize, Vec<BytesPtr>>>>,
}

The destination field is the cache that the data is placed into on Drop.
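To illustrate the shape of that cache, here is a minimal sketch of a size-keyed free list. It is not the code from this PR: `SizeCache`, `try_pop`, `insert`, and `num_cached` are hypothetical names, and plain `Vec<u8>` buffers stand in for the type-erased `BytesPtr` values so the example stays in safe Rust.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the cache: maps a size in bytes to the free
/// buffers of exactly that size. Cloning shares the same underlying map.
#[derive(Debug, Default, Clone)]
pub struct SizeCache {
    free: Arc<RwLock<BTreeMap<usize, Vec<Vec<u8>>>>>,
}

impl SizeCache {
    /// Take a cached buffer of exactly `num_bytes`, if one is available.
    pub fn try_pop(&self, num_bytes: usize) -> Option<Vec<u8>> {
        let mut free = self.free.write().unwrap();
        let bucket = free.get_mut(&num_bytes)?;
        let buf = bucket.pop();
        // Drop empty buckets so the map does not grow without bound.
        if bucket.is_empty() {
            free.remove(&num_bytes);
        }
        buf
    }

    /// Return a buffer to the cache, filed under its length.
    pub fn insert(&self, buf: Vec<u8>) {
        let mut free = self.free.write().unwrap();
        free.entry(buf.len()).or_default().push(buf);
    }

    /// Total number of buffers currently held in the cache.
    pub fn num_cached(&self) -> usize {
        self.free.read().unwrap().values().map(Vec::len).sum()
    }
}
```

Only exact size matches are reused in this sketch, which mirrors the "same size" wording in the description. A request for a different size falls through to a fresh allocation.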

A number of special implementations/usages are required for this to work:

When a new tensor allocation is requested, it can be pulled from the device-specific cache object
impl<E> Clone for CachableVec<E> should attempt to pull the new memory from the cache
impl<E> Drop for CachableVec<E> should insert the memory back into the cache object
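The three points above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation merged here: the element type is fixed to `f32`, the cache stores whole `Vec<f32>` buffers keyed by length rather than `BytesPtr` values keyed by size, and the `alloc` helper and `Cache` alias are hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Simplified cache (assumption): free `Vec<f32>` buffers grouped by length.
pub type Cache = Arc<RwLock<BTreeMap<usize, Vec<Vec<f32>>>>>;

#[derive(Debug)]
pub struct CachableVec {
    pub data: Vec<f32>,
    pub destination: Cache,
}

/// Allocate `len` elements, reusing a cached buffer of the same length if
/// one exists. A reused buffer keeps its old contents; a fresh one is zeroed.
pub fn alloc(cache: &Cache, len: usize) -> CachableVec {
    let reused = cache.write().unwrap().get_mut(&len).and_then(Vec::pop);
    CachableVec {
        data: reused.unwrap_or_else(|| vec![0.0; len]),
        destination: cache.clone(),
    }
}

impl Clone for CachableVec {
    /// Pull the destination buffer from the cache when possible, then copy.
    fn clone(&self) -> Self {
        let mut out = alloc(&self.destination, self.data.len());
        out.data.copy_from_slice(&self.data);
        out
    }
}

impl Drop for CachableVec {
    /// Hand the buffer back to the cache instead of freeing it.
    fn drop(&mut self) {
        let data = std::mem::take(&mut self.data);
        // If the lock is poisoned, fall back to freeing the buffer normally.
        if let Ok(mut cache) = self.destination.write() {
            cache.entry(data.len()).or_default().push(data);
        }
    }
}
```

With this sketch, dropping a vector and then calling `alloc` with the same length returns the same heap buffer, while `clone` only allocates when the cache has no buffer of that length.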

Results

Cpu

cargo +nightly bench --bench conv2d -F cpu-mkl-matmul

branch	fwd	bwd
main	320ms	990ms
this	280ms	900ms

cargo bench --bench batchnorm2d

branch	fwd	bwd
main	270ms	401ms
this	263ms	312ms

Cuda

cargo +nightly bench --bench conv2d -F cuda

branch	infer	fwd	bwd
main	4.8ms	5ms	20ms
this	3.7ms	4.2ms	15ms

cargo bench --bench batchnorm2d -F cuda

branch	infer	fwd	bwd
main	3.8ms	8.8ms	25ms
this	1.8ms	4.4ms	9.3ms

Adding caching to cpu

421c107

Merge branch 'main' into cpu-caching

f24863a

coreylowman mentioned this pull request Apr 8, 2023

Tensor allocation caching #678

Closed

coreylowman added 5 commits April 8, 2023 15:55

Merge branch 'main' into cpu-caching

3f000cb

Tmp commit of cuda caching

10fd4ef

check passing

2ae8496

Adding alloc_empty

7c6fc52

Updating conv2d

78bd1f4

coreylowman linked an issue Apr 9, 2023 that may be closed by this pull request

Tensor allocation caching #678

Closed

coreylowman added 4 commits April 10, 2023 13:09

Using alloc_empty in cuda kernels

a705132

Reusing on clone

d077ece

Using alloc_empty for tensor_from_host_buf

3d33c9c

Using dev.null instead of replace_with_empty

731096c

coreylowman added 10 commits April 10, 2023 14:24

Fixing issue with clone

2fe9084

Fixing cpu cache allocations

bfe00f7

Adding custom Clone impl for CachableVec

2d2fc69

Using alloc_elem in stack/concat

99054a5

Adding empty_cache to DeviceStorage

e112b24

Adds TensorCache and uses in Cpu

947a835

Merge branch 'cpu-caching' of https://github.com/coreylowman/dfdx into cpu-caching

cb72ceb

Using TensorCache object in cuda

0769887

Adding comments to tensor cache

fa83774

Styling

19daef0

Cleanup

d472116

coreylowman added 9 commits April 11, 2023 11:15

Making CPU caching safer

ea52f38

Adding ability to disable cache

995d2b9

Adding allocation details to tensor docstring

f99ad4f

Formatting and adding unit tests

d0fd8fa

Adding unit tests for Cpu & Cuda

ed19aff

Adding second pass forward for resnet18 integration test

fa9ceb5

Fixing integration tests

fcf33bf

Fixing cpu tests

a67104f

Fixing cuda unit tests

ec4badf

coreylowman changed the title from "[WIP] Adds caching layer to tensor allocations" to "Adds caching layer to tensor allocations" Apr 12, 2023

coreylowman added 3 commits April 12, 2023 14:24

Fixing allocation error without fast alloc

057f7a4

Fixing cudnn kernel

bbeebf3

Fixing memory usage from cuda -> cpu transfer

df1d960

coreylowman added 7 commits April 12, 2023 15:00

Merge branch 'main' into cpu-caching

6e53e4d

Fixing tensor_to_array

c113a4a

Updating cudarc version

7635f85

Merge branch 'main' into cpu-caching

b5aadb0

Clippy suggestions

5200b1b

Fixing no-std support

9780e03

Satify clippy

4b86657

coreylowman merged commit 6eb8698 into main Apr 12, 2023

coreylowman deleted the cpu-caching branch April 12, 2023 17:49

coreylowman mentioned this pull request Apr 12, 2023

Memory Leakage in cudarc 0.9.x / dfdx 0.11.x #643

Open