Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up an automated valgrind test #69

Open
adamcrume opened this issue Mar 10, 2017 · 7 comments
Open

Set up an automated valgrind test #69

adamcrume opened this issue Mar 10, 2017 · 7 comments

Comments

@adamcrume
Copy link
Contributor

Not usually necessary for Rust, but since we're doing a lot of unsafe stuff with pointers, we should have a valgrind test. Ideally, Travis would run it and fail the build if there are issues.

@nbigaouette-eai
Copy link
Contributor

It is always a good idea to run code/tests through valgrind. One could simply run the current tests (without docs) through valgrind, and add more that use yet uncovered code path.

Something interesting too would be to add code coverage to make sure as much as possible is covered.

And then even fuzzing; I've heard good things of cargo fuzz.

@nbigaouette-eai
Copy link
Contributor

Note that jemalloc removed valgrind support (see jemalloc/jemalloc@9a8add1) but this does not seems to be included in rust's version of a jemalloc. Rust uses jemalloc @ 33184bf69813087bf1885b0993685f9d03320c69 plus a couple of patches, which is jemalloc 4.1.0 plus four patches.

@Enet4
Copy link
Contributor

Enet4 commented Mar 14, 2017

Well, I read this recent blog post on using Valgrind for this very purpose, and we are still advised to change the memory allocation system (using the nightly alloc_system feature). Other than that, this is a useful resource to whoever is attempting to resolve this issue.

@nbigaouette-eai
Copy link
Contributor

Great read, thanks.

Maybe an optional combination alloc_system+nightly+valgrind could be added. As @adamcrume mentioned in #66, it might not be a good idea to depend on alloc_system and nightly for the crate, but an option to increase the testing is definitely a good idea.

@adamcrume
Copy link
Contributor Author

We might copy the suggestion at rust-lang/rust#28224 (comment), which is to add a nightly feature and add this to the code:

#![cfg_attr(feature="nightly", feature(alloc_system))]
#[cfg(feature="nightly")]
extern crate alloc_system;

Then we can build on the nightly channel with --features=nightly and run valgrind against that.

@nbigaouette-eai
Copy link
Contributor

So I've experimented running the tests through valgrind, but it's not as trivial as it looks.

First, as mentioned above, using jemalloc (the default memory allocator in rust) will confuse valgrind. So one has to build the tests using the system allocator, which is only available when using a nightly build of rust. This means Travis would need to be setup to run nightly, possibly in addition to stable rust.

Second (and most importantly), we don't want to run cargo through valgrind, but the tests binaries. This means the command used to run the tests should not be something like valgrind cargo test because valgrind will catch cargo errors, not the bindings' ones. Suppressions could be written but this is a tedious task.

"Normally" people would run valgrind directly on the test binaries, for example valgrind tensorflow_rust.git/target/debug/deps/lib-aab328fe9f627d08. This has two problems:

  1. How to programatically extract the filename? There is a couple of these binaries in the target directory. We'd need to find them and execute them. This complexifies the .travis.yml file.
  2. The binaries are linked dynamically to libtensorflow.so and so must find them at run time. Simply running the binaries will fail with error while loading shared libraries: libtensorflow.so: cannot open shared object file: No such file or directory if LD_LIBRARY_PATH is not set properly.

Cargo will set a working environment for the binaries so it can run them. It also knows exactly what to run. So it's sad that can't directly run cargo valgrind test or something like that.

Instead of using valgrind, we could try out the different sanitizers as provided by llvm. See here for more information: https://github.com/japaric/rust-san
With a nightly rust newer than a month old, the sanitizers are included. Using them is quite easy and should work I think in Travis. On my machine for example:

$ cd tensorflow-sys
$ RUSTFLAGS="-Z sanitizer=leak" cargo +nightly test -vv -j 1 --target x86_64-unknown-linux-gnu -- --nocapture
[...]
=================================================================
==28527==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x55620849f0ea  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x400ea)
    #1 0x55620846d0ac  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe0ac)
    #2 0x55620846d020  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe020)
    #3 0x55620846d1bd  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe1bd)
    #4 0x55620847b5be  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x1c5be)
    #5 0x556208470040  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x11040)
    #6 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #7 0x55620846f680  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x10680)
    #8 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #9 0x5562084762f6  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x172f6)
    #10 0x5562084d0924  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x71924)
    #11 0x7ffbbbee02e6  (/usr/lib/libpthread.so.0+0x72e6)

SUMMARY: LeakSanitizer: 4 byte(s) leaked in 1 allocation(s).
error: test failed

when adding a simple std::mem::forget(Box::new(42)); to a test to force a memory leak.

This is not perfect; the implementation is not bug free nor finished. I'm also not seeing where the allocation was performed (as shown in the link example) but at least it's catching it.

What I think should be done is to first extend the Travis configuration to allow stable/beta/nightly rust. This would allow adding more testing pieces, like sanitizers or valgrind. I'll open a PR about this.

@adamcrume
Copy link
Contributor Author

I'd like to see how a compiler-based sanitizer works when interacting with code it didn't compile, i.e. what happens when we allocate something through libtensorflow.so and then don't release that.

I assume that the offsets in that stack trace could be traced back to a line of code via nm voodoo. Maybe they'll build that into the sanitizer at some point.

adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jun 17, 2017
This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.
adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jun 17, 2017
This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.
adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jul 29, 2017
This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.
ramon-garcia pushed a commit to ramon-garcia/tensorflow-rust that referenced this issue May 20, 2023
* As part of tensorflow#69 refactor to break views from core dataset manipulation

* tensorflow#69 Allow for navigatable data sets that are wide
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants