
Fix TensorStorage memory deallocation #145

Merged
8 commits merged into kornia:main on Sep 28, 2024

Conversation

emilmgeorge (Contributor):

Changes in this PR

Details
A new struct TensorCustomAllocationOwner is added to represent custom memory allocations made by TensorAllocators. This struct implements the Drop trait to deallocate the associated memory. An instance of this struct is then passed as the owner to arrow_buffer::Buffer::from_custom_allocation, so that when the Arrow buffer is dropped, the memory is also deallocated using the correct TensorAllocator.
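
As an illustration, a minimal sketch of the idea (the field layout and the deallocation call here are simplified assumptions, not the exact code in this PR; the real owner frees the memory through the stored TensorAllocator):

use std::alloc::{self, Layout};
use std::ptr::NonNull;

// Owner of a custom allocation: it remembers how the memory was allocated
// and frees it in Drop. (Sketch only; the PR's version goes through the
// TensorAllocator rather than std::alloc directly.)
struct TensorCustomAllocationOwner {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl Drop for TensorCustomAllocationOwner {
    fn drop(&mut self) {
        // Runs when the last reference (held by the Arrow buffer) goes away.
        unsafe { alloc::dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

// The owner is then handed to the Arrow buffer, roughly:
//   let owner = std::sync::Arc::new(TensorCustomAllocationOwner { ptr, layout });
//   let buffer = unsafe {
//       arrow_buffer::Buffer::from_custom_allocation(ptr, layout.size(), owner)
//   };
// so dropping the buffer also runs the owner's Drop and frees the memory.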

Future changes
When the allocator_api feature reaches stable Rust, it may be possible to use a standard container such as Vec<T, A = ..> instead of the above-mentioned struct to handle custom deallocation.
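
For illustration only, a rough nightly-only sketch of what that could look like (requires the unstable allocator_api feature; make_buffer is a hypothetical helper, not part of this PR):

#![feature(allocator_api)]

use std::alloc::{Allocator, Global};

// With allocator_api, a Vec carries its allocator in its type, so a separate
// owner struct for custom deallocation would no longer be needed.
fn make_buffer<A: Allocator>(len: usize, alloc: A) -> Vec<u8, A> {
    let mut v = Vec::with_capacity_in(len, alloc);
    v.resize(len, 0u8);
    v
}

fn main() {
    let buf = make_buffer(16, Global);
    assert_eq!(buf.len(), 16);
}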

edgarriba (Member) left a comment:
LGTM. Added some suggestions to make the testing more robust.

crates/kornia-core/src/storage.rs:
// bytes_allocated value should not change.
{
    let vec = Vec::<u8>::with_capacity(len);
    let storage = TensorStorage::<u8, _>::from_vec(vec, allocator.clone());
edgarriba (Member):
I think we need to resolve properly here how the allocator is involved when we create a storage from a Vec.

edgarriba (Member):
Worth checking apache/arrow-rs#6362.

emilmgeorge (Contributor, author):
Yes that would be good. This is my related understanding:

  • The standard Vec and Buffer::from_vec do not support custom allocators until allocator_api comes to stable Rust. So, for future custom allocators like CudaAllocator, from_vec will have to involve a copy and use Buffer::from_custom_allocation.
  • For CpuAllocator, we can keep the zero-copy path using Buffer::from_vec (as it is currently).
    But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc, dealloc} (the global allocator) instead of std::alloc::System's alloc/dealloc. This is because Vec uses the global allocator, which is usually std::alloc::System but can be changed by the user via the #[global_allocator] attribute. By going through std::alloc::{alloc, dealloc}, CpuAllocator always matches the allocator used by the Vec, even when the user changes it (see the sketch below).
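
A minimal sketch of that CpuAllocator change, assuming a TensorAllocator trait with alloc/dealloc methods that take a Layout (the trait shape here is an assumption; the one in kornia-core may differ):

use std::alloc::{self, Layout};

// Hypothetical trait, only approximating kornia's TensorAllocator.
pub trait TensorAllocator {
    fn alloc(&self, layout: Layout) -> *mut u8;
    fn dealloc(&self, ptr: *mut u8, layout: Layout);
}

#[derive(Clone)]
pub struct CpuAllocator;

impl TensorAllocator for CpuAllocator {
    fn alloc(&self, layout: Layout) -> *mut u8 {
        // Go through the global allocator (std::alloc::alloc) so this always
        // matches the allocator Vec uses, even if the user overrides it with
        // #[global_allocator].
        unsafe { alloc::alloc(layout) }
    }

    fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { alloc::dealloc(ptr, layout) }
    }
}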

I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)

I haven't done any of the above in this PR though. Please let me know your thoughts and I can change accordingly.

edgarriba (Member), Sep 26, 2024:
> But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc,dealloc} (Global allocator) instead of std::alloc::System.{alloc,dealloc}

please, do 👍

> I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)

Maybe the behaviour for CUDA should be that when a CUDA storage is created from a Vec, the data is consumed, CUDA memory is allocated, the data is copied to the device, and the original CPU vector is deallocated? I haven't faced the full use case yet. Probably we should use the kornia::dnn module to try these workflows and prototype from there. I found a similar C++ implementation that may serve as a reference: https://gist.github.com/CommitThis/1666517de32893e5dc4c441269f1029a

edgarriba (Member):
One more request in this direction is the ability to easily create Image views, since Image is a tuple struct wrapping a Tensor: https://github.com/kornia/kornia-rs/blob/main/crates/kornia-image/src/image.rs#L59

In some workflows I have different kinds of images built from Tensor, which I need to convert with Image::new (calling as_slice().to_vec() every time, which involves copies) in order to use any kornia function. Unless we decide, e.g., to adapt the whole API to accept Tensor, with Image in the end being just a trait that gives some semantics and defines specific image types with formats, e.g. Rgb8U, Mono8U.

emilmgeorge (Contributor, author):
I didn't quite understand the image view part. But for converting a Tensor3 to an Image without a copy, we could implement the TryFrom trait or a from_tensor function that reuses the passed tensor if the channel dimension matches.

edgarriba (Member):
Image in the end is a Tensor3, so it would be a bit overkill to have a from_tensor method. I think ideally for this case we might want a method that somehow transfers ownership of the storage, shape, and strides?

emilmgeorge (Contributor, author):
Just to make myself clear, this is what I had in mind:

impl<T, const C: usize> TryFrom<Tensor<T, 3, CpuAllocator>> for Image<T, C>
where
    T: SafeTensorType,
{
    type Error = ImageError;

    fn try_from(value: Tensor<T, 3, CpuAllocator>) -> Result<Self, Self::Error> {
        if value.shape[2] == C {
            Ok(Self(value))
        } else {
            Err(ImageError::InvalidTensorShape)
        }
    }
}

Used like:

let image: Image<_, 3> = tensor.try_into().unwrap(); 
// OR
let image = Image::<_, 3>::try_from(tensor).unwrap();

edgarriba (Member):
Oh, I see, sounds good! I'll try it myself in a separate PR. I also have some potential improvements for the image struct, like adding a third ImageColorSpace parameter in order to define more specific types like type ImageRgb8 = Image<u8, 3, ColorSpace::Rgb>, where the color space will have associated values too.

emilmgeorge (Contributor, author):
Regarding this PR, I just want to note that it only fixes the deallocation issues noted in the first comment. It does not currently include the changes related to into_vec for non-CpuAllocators as discussed above in this review thread. I had started it, but it is not ready yet. I can send a separate PR when it's ready (hope that's ok).

edgarriba (Member) left a comment:
LGTM

// Deallocation should happen when `storage` goes out of scope.
{
    let _storage = TensorStorage::<u8, _>::new(len, allocator.clone())?;
    assert_eq!(*allocator.bytes_allocated.borrow(), len as i32);
edgarriba (Member):
No pointer test here?

emilmgeorge (Contributor, author):
In this case the storage is not created from another buffer, so there is no original pointer to compare to. Are there any other checks I should have added?

@edgarriba merged commit eeabb12 into kornia:main on Sep 28, 2024. 9 checks passed.
@emilmgeorge deleted the bugfix/mem-leak branch on September 28, 2024 at 13:01.