Assertion failure, node->inode != RM_NO_INODE. Latest develop branch compiled on Raspberry Pi 4 #549

Closed
james-cook opened this issue Jan 23, 2022 · 7 comments · Fixed by #551

Comments

@james-cook

james-cook commented Jan 23, 2022

Version: latest develop branch, --version shows 2.10.1
Platform - Raspberry pi 4 with 4GB RAM

rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
▕░░░░░░░░░░░░░░░░░░░░░░░░░▏                Traversing (25831 usable files / 4013 + 2 ignored files / folders)
**
ERROR:lib/pathtricia.c:80:rm_node_check_inode: assertion failed: (node->inode != RM_NO_INODE)
ERROR: Aborting due to a fatal error. (signal received: Aborted)
ERROR: Please file a bug report (See rmlint -h)

I went back and recompiled 2.10.1 master to check: it compiles and runs without error with the same command, directories, and files.

Note: This is a placeholder for the failure and investigation.
Hopefully I will have more time next week to recompile and run as advised here: #547 (comment)

@james-cook
Author

james-cook commented Jan 24, 2022

Using information from:

Go ahead and open a separate issue for the assertion failure. It would be helpful if you could run rmlint in gdb (gdb --args rmlint ...) and print a backtrace. It seems like it should actually be impossible without some kind of corruption so building with ASAN would also be useful (CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address' scons DEBUG=1 ).

This is the run with the recompiled rmlint, using the same command on the same directories and files:

(gdb) run
Starting program: /usr/bin/rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb638c380 (LWP 5790)]
[New Thread 0xb59ff380 (LWP 5791)]
▕░░░░░░░░░░░░░░░░░░░░░░░░░▏                Traversing (25834 usable files / 4013 + 2 ignored files / folders)
[Thread 0xb638c380 (LWP 5790) exited]
**
ERROR:lib/pathtricia.c:80:rm_node_check_inode: assertion failed: (node->inode != RM_NO_INODE)

Thread 1 "rmlint" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0xb6a4df14 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xb6a39230 in __GI_abort () at abort.c:79
#2  0xb6cbc8a8 in g_assertion_message () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#3  0xb6cbc948 in g_assertion_message_expr () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#4  0x00020378 in rm_node_check_inode ()
#5  0x00020550 in rm_node_get_inode ()
#6  0x0002082c in rm_file_parent_inode ()
#7  0x00020850 in rm_file_cmp_samefile ()
#8  0x00020a4c in rm_file_cmp_samefile_full ()
#9  0xb6c8e01c in  () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
(gdb) bt full
#0  0xb6a4df14 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {0 <repeats 27 times>, 100, 1, 0, 57, 7}}
        pid = <optimized out>
        tid = <optimized out>
#1  0xb6a39230 in __GI_abort () at abort.c:79
        save_stage = 1
        act =
          {__sigaction_handler = {sa_handler = 0x10, sa_sigaction = 0x10}, sa_mask = {__val = {0, 0, 492432, 557472, 4246540800, 117, 492432, 3204439296, 557472, 117, 0, 509800, 1, 509800, 3066799548, 3067431920, 3070224744, 3067434900, 0, 3070224744, 93, 1962934272, 3066674540, 509800, 0, 0, 4246540800, 509800, 509800, 3067435364, 94, 3070224744}}, sa_flags = 344292, sa_restorer = 0xbeffdd64}
        sigs = {__val = {32, 0 <repeats 31 times>}}
#2  0xb6cbc8a8 in g_assertion_message () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#3  0xb6cbc948 in g_assertion_message_expr () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#4  0x00020378 in rm_node_check_inode ()
#5  0x00020550 in rm_node_get_inode ()
#6  0x0002082c in rm_file_parent_inode ()
#7  0x00020850 in rm_file_cmp_samefile ()
#8  0x00020a4c in rm_file_cmp_samefile_full ()
#9  0xb6c8e01c in  () at /lib/arm-linux-gnueabihf/libglib-2.0.so.0
(gdb)

ASAN:
Compiling with the flags shown:

sudo CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address' scons DEBUG=1 --prefix=/usr install

leads to an error when I run the program in gdb:

(gdb) run
Starting program: /usr/bin/rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
==7199==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
[Inferior 1 (process 7199) exited with code 01]
(gdb)

head of the compile "log":

Scons: Reading SConscript files ...
>> Appending custom build flags : -fsanitize=address
>> Appending custom link flags : -fsanitize=address
Checking whether the C compiler works... yes
Checking for git revision... (cached) yes
Checking for pkg-config... (cached) yes

@cebtenzzre
Collaborator

ASAN generates its own reports so gdb isn't necessary. Seems like part of rmlint may have been built without ASAN. Try a clean rebuild with those flags:

$ scons -c
$ export CFLAGS='-fsanitize=address' LDFLAGS='-fsanitize=address'
$ scons config
$ scons DEBUG=1
$ sudo -E scons DEBUG=1 --prefix=/usr install

If you get the same error you can work around the issue with LD_PRELOAD=/usr/lib/libasan.so rmlint .... If ASAN reports nothing besides leaks you could also try valgrind on a clean build without ASAN (valgrind rmlint ...).

@james-cook
Author

james-cook commented Jan 24, 2022

Investigating...

Just FYI, from the compiles (not just with the sanitise flags):

scons DEBUG=1
s/timestamp.c
Compiling ==> lib/formats/uniques.c
Compiling ==> lib/fts/fts.c
Building manpage from rst...
Using sphinx-build binary: /usr/bin/sphinx-build
Linking Static Library ==> librmlint.a
Ranlib Library ==> librmlint.a
Linking Program ==> rmlint
/usr/bin/ld: librmlint.a(reflink.o): in function `rm_dedupe_main':
reflink.c:(.text+0x249c): warning: lchmod is not implemented and will always fail
Cannot import `sphinx_bootstrap_theme`; falling back to `nature`.
^ This is no error, will cause only slightly different html output.
Zipping manpage...
scons: done building targets.

Not sure if the rm_dedupe_main - reflink.c - lchmod warning is important.

The Raspberry Pi does not have libasan at the location you mentioned.

I found it at /usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so. I assume this is the correct libasan for gcc 8, since it is the only libasan.so under /usr.

Is it OK to just preload the dynamic library and run as shown below, or must I install libasan5 explicitly?

LD_PRELOAD=/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so rmlint --progress -S dma -s -1TB --keep-all-tagged DIR1 // DIR2
=================================================================
==10047==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb2a03b80 at pc 0x00082e00 bp 0xb05fc20c sp 0xb05fc204
READ of size 8 at 0xb2a03b80 thread T2 (pool)
    #0 0x82dff in rm_file_new (/usr/bin/rmlint+0x82dff)
    #1 0x4d017 in rm_traverse_file (/usr/bin/rmlint+0x4d017)
    #2 0x4f503 in rm_traverse_directory (/usr/bin/rmlint+0x4f503)
    #3 0x36ad7 in rm_mds_factory (/usr/bin/rmlint+0x36ad7)

0xb2a03b80 is located 8 bytes to the right of 88-byte region [0xb2a03b20,0xb2a03b78)
allocated by thread T2 (pool) here:
    #0 0xb6a8bbbb in __interceptor_malloc (/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so+0xe1bbb)
    #1 0x79c57 in fts_alloc (/usr/bin/rmlint+0x79c57)

Thread T2 (pool) created by T0 here:
    #0 0xb69f59c7 in pthread_create (/usr/lib/gcc/arm-linux-gnueabihf/8/libasan.so+0x4b9c7)
    #1 0xb66be523  (/lib/arm-linux-gnueabihf/libglib-2.0.so.0+0x9c523)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/bin/rmlint+0x82dff) in rm_file_new
Shadow bytes around the buggy address:
  0x36540720: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540730: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540740: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 05
  0x36540750: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x36540760: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
=>0x36540770:[fa]fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
  0x36540780: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
  0x36540790: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407a0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
  0x365407c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==10047==ABORTING

This doesn't look like the original error though (?) - I don't see the initial traversal output on screen.
And, as you mention, these are only leaks.

@cebtenzzre
Collaborator

I am able to reproduce the heap-buffer-overflow report with a 32-bit x86 build, but not on x86_64, so I suspect an error in the RM_PLATFORM_32-related code. I will debug further, thanks.

@cebtenzzre
Collaborator

Try patching lib/config.h.in like this and rebuilding:

diff --git a/lib/config.h.in b/lib/config.h.in
index e9a5a3c0..30fda4e2 100644
--- a/lib/config.h.in
+++ b/lib/config.h.in
@@ -57,6 +57,7 @@
 #define LLI G_GINT64_FORMAT
 
 
+#include <stdint.h> /* for UINTPTR_MAX */
 #define RM_PLATFORM_32 (UINTPTR_MAX == 0xffffffff)
 #define RM_PLATFORM_64 (UINTPTR_MAX == 0xffffffffffffffff)
 

cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Jan 24, 2022
The worst of them were an undefined UINTPTR_MAX in RM_PLATFORM_*, which
could both be false and caused a stat struct to be mis-casted in
traverse.c, and a non-macro HASHER_FADVISE_FLAGS that made
rm_hasher_request_readahead a no-op since commit 31cd32f.

Also add a static assert in the usual ADD_FILE to make sure it never
casts between incompatible stat structs, and -Werror=undef so we don't
allow undefined macros to silently evaluate to zero.

Fixes sahib#549
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Jan 24, 2022
The worst of them were an undefined UINTPTR_MAX in RM_PLATFORM_*, which
could both be false and caused a stat struct to be mis-casted in
traverse.c, and a non-macro HASHER_FADVISE_FLAGS that made
rm_hasher_request_readahead a no-op since commit 31cd32f.

Also add a static assert in the usual ADD_FILE to make sure it never
casts between incompatible stat structs, and -Werror=undef so we don't
allow undefined macros to silently evaluate to zero.

The UINTPTR_MAX issue is a regression caused by 90edf02, which removed
the inttypes.h include in config.h.in.

Fixes sahib#549
@james-cook
Author

I can confirm that the patch fixes the assertion failure on my platform.
Thanks :)

@james-cook
Author

Closing. Please re-open if needed.

cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Aug 7, 2022
@cebtenzzre cebtenzzre reopened this Aug 8, 2022
@cebtenzzre cebtenzzre linked a pull request Aug 9, 2022 that will close this issue
@cebtenzzre cebtenzzre removed the has-pr label Aug 9, 2022
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Aug 12, 2022
The worst of them were an undefined UINTPTR_MAX in RM_PLATFORM_*, which
caused a stat struct to be mis-casted in traverse.c, and a non-macro
HASHER_FADVISE_FLAGS that made rm_hasher_request_readahead a no-op since
commit 31cd32f.

Also add a static assert to make sure ADD_FILE never casts between
incompatible stat structs, and -Werror=undef so we don't allow undefined
macros to silently evaluate to zero.

The UINTPTR_MAX issue is a regression caused by 90edf02, which removed
the inttypes.h include from config.h.in.

Fixes sahib#549
cebtenzzre added a commit to cebtenzzre/rmlint that referenced this issue Sep 18, 2022
cebtenzzre added a commit that referenced this issue Jun 20, 2023
The worst of them were an undefined UINTPTR_MAX in RM_PLATFORM_*, which
caused a stat struct to be mis-casted in traverse.c, and a non-macro
HASHER_FADVISE_FLAGS that made rm_hasher_request_readahead a no-op since
commit 31cd32f "bugfix: hopefully fix very seldom `g_mutex_clear
assert` by waiting for all buffers to return".

Also add static assertions to make sure ADD_FILE never casts between
incompatible stat structs, and -Werror=undef so we don't allow undefined
macros to silently evaluate to zero.

The UINTPTR_MAX issue is a regression caused by 90edf02 "includes:
de-lint a couple of hundred #includes", which removed the inttypes.h
include from config.h.in.

Fixes #549