Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error source; free leaked allocations; fix refcounting; BSTR is wide, not UTF-8; convert return_hr to expression #10

Merged
merged 7 commits into from
Nov 10, 2020

Conversation

MarijnS95
Copy link
Member

After a look at current code with @DBouma we decided that taking references to ComPtrs is ever so slightly more natural. More importantly DXC returns objects with their refcount already incremented (detached from an internal ComPtr by taking the pointer out without decreasing the refcount). To ensure adequate cleanup it is convenient if these pointers can't be left lingering around by accident.

Simplify conversion loops and in particular pointer magic in get_children to shove a raw pointer into a ComPtr, which com_rs was not designed for. The returned pointer

In case we want to walk the .source() trace and print all inner
exceptions (when these are usually rewrapped in thiserror types in
higher layers), this exception needs to be marked as source to be
returned from that function.
@MarijnS95 MarijnS95 changed the title Use *mut ComPtr instead of *mut *mut Fix error source; free leaked (array) allocations; convert return_hr to expression Nov 10, 2020
for i in 0..result_length {
// get_children allocates a buffer to pass the result in. Vec will
// free it on Drop.
Vec::from_raw_parts(result, result_length as usize, result_length as usize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be unsafe because result is from a different allocator then what Vec::drop will call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more inconvenient than thought because the tests use CoTaskMemFree, but the GetChildren function allocating these seem to override the thread allocator:

https://github.com/microsoft/DirectXShaderCompiler/blob/a8d9780046cb64a1cea842fa6fc28a250e3e2c09/tools/clang/tools/libclang/dxcisenseimpl.cpp#L836

I don't think this overrides what allocator is used by CoTaskMemFree, this is merely a concept in DXC?

So, while it is possible to allocate things on a different IMalloc (we have a SetMalloc on IDxcLibrary) I don't think that's used for this particular list (CoTaskMemAlloc), and should hence be safely freed using CoTaskMemFree on Windows and free everywhere else?

Perhaps we should open up an issue in DXC to clarify this interface, and provide a convenient function to do the cleanup for us in their OS-dependent way, as we can't be guaranteed this never changes.

Copy link
Member Author

@MarijnS95 MarijnS95 Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows CI didn't fail meaning I didn't mistype pub use winapi::um::combaseapi::CoTaskMemFree; nor forgot the feature flag, but I have not tested actually running this code on Windows yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think CoTaskMemFree here is fine because the DXC unit-tests also call CoTaskMemFree here.

@MarijnS95 MarijnS95 force-pushed the use-comptr-type branch 2 times, most recently from c2098e8 to 3aabfc6 Compare November 10, 2020 10:48
Solves the following Valgrind trace:

    ==168242== 24 bytes in 1 blocks are definitely lost in loss record 2 of 14
    ==168242==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
    ==168242==    by 0x5AFC3F4: ???
    ==168242==    by 0x5AFCFB3: ???
    ==168242==    by 0x12BD23: hassle_rs::intellisense::ffi::IDxcCursor::get_children (macros.rs:108)
    ==168242==    by 0x1167E6: hassle_rs::intellisense::wrapper::DxcCursor::get_all_children (wrapper.rs:210)
    ==168242==    by 0x111B8A: intellisense_tu::main (intellisense-tu.rs:43)
    ==168242==    by 0x112F5A: core::ops::function::FnOnce::call_once (function.rs:227)
    ==168242==    by 0x1132FD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:137)
    ==168242==    by 0x113510: std::rt::lang_start::{{closure}} (rt.rs:66)
    ==168242==    by 0x14F810: call_once<(),Fn<()>> (function.rs:259)
    ==168242==    by 0x14F810: do_call<&Fn<()>,i32> (panicking.rs:373)
    ==168242==    by 0x14F810: try<i32,&Fn<()>> (panicking.rs:337)
    ==168242==    by 0x14F810: catch_unwind<&Fn<()>,i32> (panic.rs:379)
    ==168242==    by 0x14F810: std::rt::lang_start_internal (rt.rs:51)
    ==168242==    by 0x1134E6: std::rt::lang_start (rt.rs:65)
Make return_hr more versatile by providing Ok or Err as expession
instead of forcing a return out of the current function. This follows
Rusts convention to provide a block result as an unterminated expression
on the last line, and allows it to be used in an expression context
including with the question mark operator.
On Windows the right API needs to be used to free this pointer in the
same way as it was allocated by DXC (using CoTaskMemAlloc). On
non-Windows platforms this call is [delegated] to `free` from libc.

[delegated]: https://github.com/microsoft/DirectXShaderCompiler/blob/a8d9780046cb64a1cea842fa6fc28a250e3e2c09/include/dxc/Support/WinAdapter.h#L46
The same call to SysFreeString happens in the Windows counterpart.

Solves the following leaks found by valgrind:

    ==213075== 40 bytes in 1 blocks are definitely lost in loss record 4 of 13
    ==213075==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
    ==213075==    by 0x5B03395: ???
    ==213075==    by 0x5AFC77E: ???
    ==213075==    by 0x12C766: hassle_rs::intellisense::ffi::IDxcCursor::get_display_name (macros.rs:108)
    ==213075==    by 0x116373: hassle_rs::intellisense::wrapper::DxcCursor::get_display_name (wrapper.rs:236)
    ==213075==    by 0x11286D: intellisense_tu::main (intellisense-tu.rs:33)
    ==213075==    by 0x11371A: core::ops::function::FnOnce::call_once (function.rs:227)
    ==213075==    by 0x11348D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:137)
    ==213075==    by 0x111E10: std::rt::lang_start::{{closure}} (rt.rs:66)
    ==213075==    by 0x150A20: call_once<(),Fn<()>> (function.rs:259)
    ==213075==    by 0x150A20: do_call<&Fn<()>,i32> (panicking.rs:373)
    ==213075==    by 0x150A20: try<i32,&Fn<()>> (panicking.rs:337)
    ==213075==    by 0x150A20: catch_unwind<&Fn<()>,i32> (panic.rs:379)
    ==213075==    by 0x150A20: std::rt::lang_start_internal (rt.rs:51)
    ==213075==    by 0x111DE6: std::rt::lang_start (rt.rs:65)
    ==213075==
    ==213075== 56 (16 direct, 40 indirect) bytes in 1 blocks are definitely lost in loss record 5 of 13
    ==213075==    at 0x483ADEF: operator new(unsigned long) (vg_replace_malloc.c:342)
    ==213075==    by 0x4FD058D: ???
    ==213075==    by 0x4FC04D9: ???
    ==213075==    by 0x40112DD: call_init.part.0 (in /usr/lib/ld-2.32.so)
    ==213075==    by 0x40113C7: _dl_init (in /usr/lib/ld-2.32.so)
    ==213075==    by 0x4A100E4: _dl_catch_exception (in /usr/lib/libc-2.32.so)
    ==213075==    by 0x4015704: dl_open_worker (in /usr/lib/ld-2.32.so)
    ==213075==    by 0x4A10087: _dl_catch_exception (in /usr/lib/libc-2.32.so)
    ==213075==    by 0x4014F3D: _dl_open (in /usr/lib/ld-2.32.so)
    ==213075==    by 0x489434B: ??? (in /usr/lib/libdl-2.32.so)
    ==213075==    by 0x4A10087: _dl_catch_exception (in /usr/lib/libc-2.32.so)
    ==213075==    by 0x4A10152: _dl_catch_error (in /usr/lib/libc-2.32.so)
    ==213075==
    ==213075== 124 bytes in 3 blocks are definitely lost in loss record 9 of 13
    ==213075==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
    ==213075==    by 0x5B03395: ???
    ==213075==    by 0x5AFC77E: ???
    ==213075==    by 0x12C766: hassle_rs::intellisense::ffi::IDxcCursor::get_display_name (macros.rs:108)
    ==213075==    by 0x116373: hassle_rs::intellisense::wrapper::DxcCursor::get_display_name (wrapper.rs:236)
    ==213075==    by 0x112F63: intellisense_tu::main (intellisense-tu.rs:52)
    ==213075==    by 0x11371A: core::ops::function::FnOnce::call_once (function.rs:227)
    ==213075==    by 0x11348D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:137)
    ==213075==    by 0x111E10: std::rt::lang_start::{{closure}} (rt.rs:66)
    ==213075==    by 0x150A20: call_once<(),Fn<()>> (function.rs:259)
    ==213075==    by 0x150A20: do_call<&Fn<()>,i32> (panicking.rs:373)
    ==213075==    by 0x150A20: try<i32,&Fn<()>> (panicking.rs:337)
    ==213075==    by 0x150A20: catch_unwind<&Fn<()>,i32> (panic.rs:379)
    ==213075==    by 0x150A20: std::rt::lang_start_internal (rt.rs:51)
    ==213075==    by 0x111DE6: std::rt::lang_start (rt.rs:65)

Fixes: 94fbad7 ("Support libdxcompiler.so on Linux (#5)")
Resulting strings were only one character long because the second byte
for ASCII text is NULL.

On Linux DXC properly uses wchar_t which is 32-bit instead of 16-bit.

Fixes: 94fbad7 ("Support libdxcompiler.so on Linux (#5)")
Valgrind shows that we are leaking the DxcIncludeHandlerWrapper
allocation:

    ==96758== 485 (80 direct, 405 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 17
    ==96758==    at 0x483A77F: malloc (vg_replace_malloc.c:307)
    ==96758==    by 0x1A1E4B: alloc::alloc::alloc (alloc.rs:74)
    ==96758==    by 0x1A1F09: alloc::alloc::Global::alloc_impl (alloc.rs:153)
    ==96758==    by 0x1A2069: <alloc::alloc::Global as core::alloc::AllocRef>::alloc (alloc.rs:203)
    ==96758==    by 0x1A1D9A: alloc::alloc::exchange_malloc (alloc.rs:281)
    ==96758==    by 0x199A1A: new<hassle_rs::wrapper::DxcIncludeHandlerWrapper> (boxed.rs:175)
    ==96758==    by 0x199A1A: hassle_rs::wrapper::DxcCompiler::prep_include_handler (wrapper.rs:248)
    ==96758==    by 0x199D6F: hassle_rs::wrapper::DxcCompiler::compile (wrapper.rs:278)
    ==96758==    by 0x195C70: hassle_rs::utils::compile_hlsl (utils.rs:115)
    ==96758==    by 0x127BDF: include::main (autogen_ops.rs:0)
    ==96758==    by 0x1275CA: core::ops::function::FnOnce::call_once (autogen_context.rs:1683)
    ==96758==    by 0x127ADD: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:137)
    ==96758==    by 0x127530: std::rt::lang_start::{{closure}} (rt.rs:66)

This happens because of Box::into_raw, which moves the pointer out of
the box. Taking this pointer out of a referenced Box makes it live on
till the end of the function where it is appropriately dropped.

Properly freeing DxcIncludeHandlerWrapper exposes a second issue: The
blobs are now freed but have already been freed within DXC becaues of a
missing refcount: we are supposed to increment it before writing the
pointer to it to `include_source` [1].
Now that the blobs have a proper refcount and live on until DXC cleans
them up when not needed, they do not need to be kept alive inside
DxcIncludeHandlerWrapper anymore.

Finally, clean up the unnecessary clone of `source`: This String is
already retrieved by value and can be moved straight into the Rc.

[1]: https://github.com/microsoft/DirectXShaderCompiler/blob/a8d9780046cb64a1cea842fa6fc28a250e3e2c09/tools/clang/unittests/dxc_batch/dxc_batch.cpp#L553-L554
@MarijnS95
Copy link
Member Author

This turned out to be a deeper rabbit hole than anticipated. A couple mistakes were made in the Linux conversion for BSTR: this string was not freed like its Windows counterpart and interpreted as UTF-8 instead of a widestring (UTF-16 on Windows but UCS-4, 32-bit on Unix).

DXC doesn't implement their BSTR type properly either. This string is allowed to contain NULL characters by prepending the size in the four preceding bytes of the pointer (that is, *(ptr as *const u32).offset(-1) contains the size in bytes), but still ends the entire thing with a trailing NULL character. I don't feel like addressing that upstream right now.

Finally leaks and use-after-free showed up around DxcIncludeHandlerWrapper: this object was never freed and sneakily held on to DxcBlobEncoding instances which were improperly refcounted. Fixing the leak in DxcIncludeHandlerWrapper surfaced the double drop on DxcBlobEncoding stored in blobs.


if let Some(source) = source {
let pinned_source = Rc::new(source.clone());
let pinned_source = Rc::new(source);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this where we should use a Pin<Rc<String>>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yeah, I'm not sure 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this makes me curious how it works. The documentation really only specifies self-referential structs and moving of that object, but that is of no relevance here: even if the String moves, the underlying slice on the heap is unlikely to move.

However, Pin works on pointer types. String is a "pointer" of type &str to some owned data, and this is clear when interacting with a Pin<String>: it will only ever give the pointer type &str back meaning we cannot do any disastrous modifications like .reserve (which usually reallocates) that are only available on the String type (whenever we have mutable access to it). In other words, it appears Pin takes over the ownership for us and only ever allows us to retrieve the value of the underlying pointer.

Note that I should have been more rigorous in cleaning the Rc up too: this value is not ever borrowed more than once and hence has no effect (except that the value is technically borrowed a second time inside IDxcBlobEncoding, but Rust does not know that and cannot model it as the object exposing that interface is owned by DXC).

@MarijnS95 MarijnS95 changed the title Fix error source; free leaked (array) allocations; convert return_hr to expression Fix error source; free leaked allocations; fix refcounting; BSTR is wide, not UTF-8; convert return_hr to expression Nov 10, 2020
@Jasper-Bekkers Jasper-Bekkers merged commit c7186a4 into master Nov 10, 2020
@Jasper-Bekkers Jasper-Bekkers deleted the use-comptr-type branch November 10, 2020 23:30
MarijnS95 added a commit that referenced this pull request Mar 15, 2021
We [recently] fixed memory leaks when calling into the intellisense part
of DXC and made the decision shortly after merging to [implement
null-invariant `BSTR`s properly] in DXC's shimmed type for non-Windows,
which was promptly accepted.

Currently our implementation of `SysFreeString` together with that PR
segfaults because it frees the pointer at offset `+4` of the allocation.
That is corrected together with `SysStringLen` now reading the size of
the string from this prefix instead of looking for a (wrong!) null
character. At least our Windows and Linux implementation `utils.rs` is
the same again.

[recently]: #10
[implement null-invariant `BSTR`s properly]: microsoft/DirectXShaderCompiler#3250
MarijnS95 added a commit that referenced this pull request May 5, 2021
We [recently] fixed memory leaks when calling into the intellisense part
of DXC and made the decision shortly after merging to [implement
null-invariant `BSTR`s properly] in DXC's shimmed type for non-Windows,
which was promptly accepted.

Currently our implementation of `SysFreeString` together with that PR
segfaults because it frees the pointer at offset `+4` of the allocation.
That is corrected together with `SysStringLen` now reading the size of
the string from this prefix instead of looking for a (wrong!) null
termination character.  At least our Windows and Linux implementation
`utils.rs` is the same again.

[recently]: #10
[implement null-invariant `BSTR`s properly]: microsoft/DirectXShaderCompiler#3250
MarijnS95 added a commit that referenced this pull request May 5, 2021
We [recently] fixed memory leaks when calling into the intellisense part
of DXC and made the decision shortly after merging to [implement
null-invariant `BSTR`s properly] in DXC's shimmed type for non-Windows,
which was promptly accepted.

Currently our implementation of `SysFreeString` together with that PR
segfaults because it frees the pointer at offset `+4` of the allocation.
That is corrected together with `SysStringLen` now reading the size of
the string from this prefix instead of looking for a (wrong!) null
termination character.  At least our Windows and Linux implementation
`utils.rs` is the same again.

[recently]: #10
[implement null-invariant `BSTR`s properly]: microsoft/DirectXShaderCompiler#3250
Jasper-Bekkers pushed a commit that referenced this pull request May 5, 2021
…ize (#11)

We [recently] fixed memory leaks when calling into the intellisense part
of DXC and made the decision shortly after merging to [implement
null-invariant `BSTR`s properly] in DXC's shimmed type for non-Windows,
which was promptly accepted.

Currently our implementation of `SysFreeString` together with that PR
segfaults because it frees the pointer at offset `+4` of the allocation.
That is corrected together with `SysStringLen` now reading the size of
the string from this prefix instead of looking for a (wrong!) null
termination character.  At least our Windows and Linux implementation
`utils.rs` is the same again.

[recently]: #10
[implement null-invariant `BSTR`s properly]: microsoft/DirectXShaderCompiler#3250
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants