M1/M2 Compatibility #56

Open
gilaroni opened this issue Nov 20, 2022 · 57 comments
Labels
enhancement, help wanted

Comments

@gilaroni

will there be a version for m1?

@LouisBrunner changed the title from "m1 compatability" to "M1/M2 Compatibility" on Dec 21, 2022
@LouisBrunner added the enhancement and help wanted labels on Dec 21, 2022
@LouisBrunner
Owner

Hi @gilaroni,

I have started on M1 support. It might take a while, but yes, it's on the roadmap.

@MalwarePup

Hello,
any news here?

@eliesaikali

Any news here? It is still not possible to install valgrind:

brew install --HEAD LouisBrunner/valgrind/valgrind
==> Fetching louisbrunner/valgrind/valgrind
==> Cloning https://github.com/LouisBrunner/valgrind-macos.git
Updating /Users/xxx/Library/Caches/Homebrew/valgrind--git
==> Checking out branch main
Already on 'main'
Your branch is up to date with 'origin/main'.
HEAD is now at ee485f9 docs: Update README for Homebrew error (#72)
==> Installing valgrind from louisbrunner/valgrind
==> ./autogen.sh
==> ./configure --prefix=/opt/homebrew/Cellar/valgrind/HEAD-ee485f9 --enable-only64bit --build=amd64-darwin
==> make
Last 15 lines from /Users/xxx/Library/Logs/Homebrew/valgrind/03.make:
fixup_macho_loadcmds.c:465:22: error: use of undeclared identifier 'x86_thread_state64_t'
= (x86_thread_state64_t*)(&w32s[2]);
^
fixup_macho_loadcmds.c:467:36: error: no member named '__rsp' in 'struct __darwin_arm_thread_state64'; did you mean '__sp'?
init_rsp = state64->__rsp;
^~~~~
__sp
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/include/mach/arm/_structs.h:141:13: note: '__sp' declared here
__uint64_t __sp; /* Stack pointer x31 */
^
7 errors generated.
make[2]: *** [fixup_macho_loadcmds] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

If reporting this issue please do so at (not Homebrew/brew or Homebrew/homebrew-core):
https://github.com/louisbrunner/homebrew-valgrind/issues

valgrind's formula was built from an unstable upstream --HEAD.
This build failure is expected behaviour.
Do not create issues about this on Homebrew's GitHub repositories.
Any opened issues will be immediately closed without response.
Do not ask for help from Homebrew or its maintainers on social media.
You may ask for help in Homebrew's discussions but are unlikely to receive a response.
Try to figure out the problem yourself and submit a fix as a pull request.
We will review it but may or may not accept it.
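
A note on the errors in the log above: configure was invoked with --build=amd64-darwin, yet the SDK headers only expose struct __darwin_arm_thread_state64 (with __sp), so code written for the x86_64 thread state (x86_thread_state64_t, __rsp) cannot compile. A minimal sketch of the usual remedy, choosing the thread-state layout by architecture, might look like the following. The struct and function names here are hypothetical stand-ins so the sketch builds without the macOS SDK; this is not the code in fixup_macho_loadcmds.c.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Mach thread-state structs. In the log
 * above, the real types are x86_thread_state64_t (field __rsp) and
 * struct __darwin_arm_thread_state64 (field __sp). */
typedef struct { uint64_t rsp; } demo_x86_state64;
typedef struct { uint64_t sp;  } demo_arm_state64;

typedef enum { DEMO_ARCH_X86_64, DEMO_ARCH_ARM64 } demo_arch;

/* Return the initial stack pointer stored in a raw thread-state blob,
 * using the layout that matches the architecture of the binary. */
static uint64_t demo_initial_sp(demo_arch arch, const void *state)
{
    switch (arch) {
    case DEMO_ARCH_X86_64:
        return ((const demo_x86_state64 *)state)->rsp;
    case DEMO_ARCH_ARM64:
        return ((const demo_arm_state64 *)state)->sp;
    }
    return 0; /* unknown architecture */
}
```

In real code the choice would more likely be made at compile time with preprocessor guards; the runtime switch above only keeps the sketch portable.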

@TommyJD93

any update for the M1/M2 compatibility?

@hacknus

hacknus commented Mar 19, 2023

Ran into the same error as @eliesaikali on my M2 Pro MacBook Pro running macOS Ventura 13.2.1 (22D68).

@whisper-bye

+1

@carl-alphonce

M1 Pro MacBook Pro running macOS Ventura 13.3.1. Same issue as @eliesaikali and @hacknus.

@fakecore

Any details about the roadmap for running Valgrind on the M1 architecture?

@zakariazh

any update for the M1/M2 compatibility?

@JoonasMykkanen

Still need help?

@kalip2

kalip2 commented Jul 8, 2023

Would really like Valgrind on Apple Silicon Macs for school purposes. We're using it at school!

brew install --HEAD LouisBrunner/valgrind/valgrind
Error: Valgrind is currently incompatible with ARM-based Macs, see https://github.com/LouisBrunner/valgrind-macos/issues/56

@kalip2

kalip2 commented Jul 14, 2023

Pleeeassseee bump this up. We're using valgrind in our cs101 class to help with memory leaks in projects related to singly linked lists and pointers. We're also going to need valgrind in our upcoming cs102 class. I don't mind using our college's ubuntu desktops with valgrind, but the labs close at 10pm, and a lot of us do our best studying after 10pm.

@paulfloyd
Contributor

Speaking from experience, a huge amount of work is needed to get things running smoothly.

@MartinDelille

What are the alternatives to Valgrind on macOS that would be usable in a continuous integration environment?

@eliesaikali

What are the alternatives to Valgrind on macOS that would be usable in a continuous integration environment?

@MartinDelille You can use the leaks tool on macOS if you like.
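
For anyone wanting something to try such a leak checker on, here is a small sketch of a singly linked list in C. The names are made up for illustration; forgetting to call list_free is the kind of mistake these tools are meant to report.

```c
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Push a value onto the front of the list. Returns the new head, or the
 * old head unchanged if the allocation fails. */
static node *list_push(node *head, int value)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Release every node and return how many were freed. Skipping this call
 * leaves all nodes allocated at exit, which a leak checker should flag. */
static size_t list_free(node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Built into an executable whose main pushes a few nodes and omits list_free, running it as `leaks --atExit -- ./a.out` on macOS should list the unreleased nodes; check `man leaks` for the options available on your system.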

@MartinDelille

I wasn't aware of this Xcode tool! Thanks a lot! 👌

@MalwarePup

Speaking from experience, a huge amount of work is needed to get things running smoothly.

Yes, probably, but some people want to contribute, like @JoonasMykkanen, and he hasn't gotten an answer.

@rogerburtonpatel

I don't have the bandwidth to contribute to this right now-- can I buy you a coffee or some takeout? You're holding all our hopes and dreams.

@paulfloyd
Contributor

FreeBSD arm64 is now working reasonably well. No signals or threads just yet.

== 709 tests, 235 stderr failures, 47 stdout failures, 5 stderrB failures, 6 stdoutB failures, 0 post failures ==

@LouisBrunner
Owner

LouisBrunner commented Feb 29, 2024

Great news, I have managed to run a guest binary on M1 through Valgrind for the first time. But while I am very pleased, let me be absolutely clear: this is in no way stable and can at best be described as an experimental prototype. I just thought it was too much progress not to post it.

If you'd like to test it for yourself, you can clone this repo and build the feature/m1 branch directly. Do not expect it to work, as it probably won't (at the moment only simple Unix utilities work, and most likely only those without threads or forks). If it crashes, do not open a new issue or post a comment here for now (I will close/delete them). Posting on Discord is fine, however.

Summary of the roadmap ahead:

  • Most binaries failing: need to investigate where those SIGSEGV come from (most likely initializers related) -> seemingly an issue when objc is being used
  • Many mmaps still failing with "Operation not permitted": no clue yet, very inconsistent (maybe some kind of kernel heuristic?)
  • Clearing initializers at startup: currently extremely hardcoded, might be doable using dyld (because Valgrind has this now)
  • mprotect + most errors: need to identify malloc metadata (would love to replicate the vmmap logic)
  • Other errors: combination of dyld shenanigans and other unknown issues
  • Other errors: still a lot left, esp with libxpc, libobjC, malloc shenanigans, etc etc etc
  • VEX PAUTH support: need actual support in place, or maybe passing directly to host
  • Malloc/free tracking is broken: just like on macOS 11 x86 when it came out
  • sys_icache_invalidate/___chkstk_darwin: not GPL compatible, will need a complete rewrite
  • MDEP Syscall class: clean up as it doesn't seem to exist in arm
  • Threads: currently broken, crash with NSThreadPoisoned
  • Fork: needs reviewing, most likely broken
  • Signals (syswrap/fixup_guest_state_after_syscall_interrupted): needs reviewing, most likely broken
  • GET_STARTREGS, get_StackTrace_wrk, N_CFI_REGS, pub_tool_machine: needs reviewing
  • ALLOW_RWX_WRITE: arguments are probably wrong in place
  • Tests: they don't even build yet

@Miljoen

Miljoen commented Apr 16, 2024

Thanks for all of the hard work on this @LouisBrunner, getting valgrind to work for Apple Silicon is going to be so great for anyone trying to do development in C/C++ on a Mac.

As it stands I switched to Ubuntu for my tasks that require memchecks, but just know that many of us cannot wait to jump back and check out any first stable version.

@paulfloyd
Contributor

And a bit of news from upstream. This morning I pushed the code for FreeBSD arm64. Other than the occasional failure related to setting up memory for new threads (I think) it works pretty well.

Might help a bit with stuff like signal resumption where Darwin and FreeBSD have very similar code.

@rdoeffinger

rdoeffinger commented Apr 23, 2024

Not sure if helpful, but FYI I got something semiworking on a complex app with 2 patches, one that adds a (hackish) DC_ZVA instruction implementation and another that avoids crashes trying to load debug info for a lot of system libraries.
mypatch.diff.txt
EDIT: Also needs this, as some things assume 64-byte cache lines it seems:

--- a/coregrind/m_machine.c
+++ b/coregrind/m_machine.c
@@ -1859,7 +1859,8 @@ Bool VG_(machine_get_hwcaps)( void )
      vai.hwcaps |= VEX_HWCAPS_ARM64_LRCPC;
      vai.hwcaps |= VEX_HWCAPS_ARM64_DIT;
 
-     ULong ctr_el0 = 0;
+     // 64 byte cachelines
+     ULong ctr_el0 = 0x00040004;
 #else
      r = VG_(sigprocmask)(VKI_SIG_UNBLOCK, &tmp_set, &saved_set);
      vg_assert(r == 0);

With this, memcheck seems to work perfectly for my case. --tool=massif however does not...
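
For readers wondering why 0x00040004 means 64-byte cache lines: in the Arm architecture, CTR_EL0 stores IminLine in bits [3:0] and DminLine in bits [19:16], each as log2 of the line length in 4-byte words. A minimal decoding sketch follows; the function names are illustrative and are not Valgrind's.

```c
#include <stdint.h>

/* Smallest instruction cache line in bytes.
 * IminLine = CTR_EL0[3:0], log2 of the line length in 4-byte words. */
static unsigned ctr_el0_icache_line_bytes(uint64_t ctr_el0)
{
    return 4u << (ctr_el0 & 0xF);
}

/* Smallest data cache line in bytes.
 * DminLine = CTR_EL0[19:16], same encoding as IminLine. */
static unsigned ctr_el0_dcache_line_bytes(uint64_t ctr_el0)
{
    return 4u << ((ctr_el0 >> 16) & 0xF);
}
```

With 0x00040004 both fields are 4, so both functions return 4 << 4 = 64 bytes, whereas the previous value of 0 decodes to 4-byte lines.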

@paulfloyd
Contributor

There are a few bugzilla items similar to this.

First we need a better implementation of the mrs instructions.

Are they documented for Darwin?

@LouisBrunner
Owner

Not sure if helpful, but FYI I got something semiworking on a complex app with 2 patches

Oh, very interesting! Does your app use threads at all? Do you interface with objc in any way?

one that adds a (hackish) DC_ZVA instruction implementation and another that avoids crashes trying to load debug info for a lot of system libraries. mypatch.diff.txt EDIT: Also needs this, as some things assume 64-byte cache lines it seems:

--- a/coregrind/m_machine.c
+++ b/coregrind/m_machine.c
@@ -1859,7 +1859,8 @@ Bool VG_(machine_get_hwcaps)( void )
      vai.hwcaps |= VEX_HWCAPS_ARM64_LRCPC;
      vai.hwcaps |= VEX_HWCAPS_ARM64_DIT;
 
-     ULong ctr_el0 = 0;
+     // 64 byte cachelines
+     ULong ctr_el0 = 0x00040004;
 #else
      r = VG_(sigprocmask)(VKI_SIG_UNBLOCK, &tmp_set, &saved_set);
      vg_assert(r == 0);

Thanks, I will look into those. If you have any extra background on the reasoning behind those fixes, that would be very nice.

Some specific questions I have:

  • Regarding the debugging fix: does your binary have no debug symbols or truncated ones?
  • Regarding the cache lines: any reference for this?

With this, memcheck seems to work perfectly for my case. --tool=massif however does not...

If you get it working, feel free to post here. I am focusing mostly on memcheck as it's the default tool and there is still so much that is broken.

Are they documented for Darwin?

Couldn't find much personally. The OSS source code has a few mentions of the Apple-specific registers, and that's about it. I don't even think you can run mrs at all in user-space (at least Valgrind always got SIGILL'd).

First we need a better implementation of the mrs instructions.

Not sure what you mean by this. I added a few extra ones here, here and here.

@paulfloyd
Contributor

I've only started looking at this.

My understanding is that the MRS instructions are kind of like a 'soft' version of CPUID. Unlike CPUID, which returns baked-in values, MRS traps to the kernel. What I don't know yet is whether it is always safe to use a dirty helper or not. It will be a problem if the kernel reports capabilities that Valgrind doesn't support.

@rdoeffinger

First we need a better implementation of the mrs instructions.

I don't think so, the OS parts of macOS at least seem to not bother and just assume things, like the granularity at which dc zva works. As far as I can tell the libc usages don't care what the MSR contains (understandable, the loop would be far more complex and costly if it did).

Oh, very interesting! Does your app use threads at all? Do you interface with objc in any way?

No, no threads at all (by default). Just a command-line application primarily targeting Linux, so no objc either.
There is a python dependency, but for other reasons we can run that in a separate process, so as long as fork works all that can be skipped (though fork not seeming to work in massif is one of the problems).

Regarding the debugging fix: does your binary have no debug symbols or truncated ones?

Sorry, I should indeed have written more: there is no issue with my binary. The problem is with in-memory system libraries, where this triggered. First I tried to skip all offending binaries by name in img_from_memory (it ONLY happens with system libraries and when img_from_memory is used), but it got a bit much, so I went with this more general solution.

Regarding the cache lines: any reference for this?

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check.
The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic...
Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do.
DISCLAIMER: it's all guesswork from my side.
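
To make the matching requirement above concrete: in the Arm architecture, DCZID_EL0 reports the DC ZVA block size in bits [3:0] (BS, log2 of the size in 4-byte words) and sets bit 4 (DZP) when the instruction is prohibited. DC ZVA zeroes the naturally aligned block of that size containing the given address. A minimal sketch, with illustrative names that are not taken from Valgrind:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of DCZID_EL0. */
typedef struct {
    bool     zva_allowed;  /* false when DZP (bit 4) is set */
    unsigned block_bytes;  /* bytes zeroed by one DC ZVA: 4 << BS */
} dczid_info;

static dczid_info decode_dczid_el0(uint64_t dczid_el0)
{
    dczid_info info;
    info.zva_allowed = ((dczid_el0 >> 4) & 1) == 0;
    info.block_bytes = 4u << (dczid_el0 & 0xF);
    return info;
}

/* Start of the naturally aligned block that DC ZVA would zero for addr.
 * block_bytes must be a power of two, which the encoding guarantees. */
static uint64_t zva_block_start(uint64_t addr, unsigned block_bytes)
{
    return addr & ~((uint64_t)block_bytes - 1);
}
```

An emulator that reports BS = 4 here must zero exactly 64 bytes per DC ZVA; code that hardcodes a block size, as the macOS libraries seem to, only works if that size matches the real hardware.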

If you get it working, feel free to post here. I am focusing mostly on memcheck as it's the default tool and there is still so much that is broken.

Yes, makes total sense. The massif issues, whatever they are, seem a bit beyond my skills. For now I'll just hope that someone else does some fixes that just happen to fix that, too!

@paulfloyd
Contributor

First we need a better implementation of the mrs instructions.

I don't think so, the OS parts of macOS at least seem to not bother and just assume things, like the granularity at which dc zva works. As far as I can tell the libc usages don't care what the MSR contains (understandable, the loop would be far more complex and costly if it did).

macOS isn't my only concern. I also want to ensure that everything works correctly on FreeBSD and Linux.

@rdoeffinger

rdoeffinger commented Apr 24, 2024

None of my patches should change anything for FreeBSD or Linux.
At most there is a question if on non-macOS either the "dc zva" instruction should keep producing an illegal instruction error, or alternatively if DCZID_EL0 should be changed to report a value indicating support for dc zva.
But that seems at best a tiny improvement of questionable practical relevance that would come far down the priority list.

@paulfloyd
Contributor

@rdoeffinger

That ticket is old and lacks reproducers, but it sounds like Android might have the same issue (optimized libraries not checking if the instruction is available before using it).
If the cache line size is correctly detected by the (existing) code on Android, my patch should also work for that case.
A bit of polish and/or finally updating the code for DCZID_EL0 to report the instruction as supported might make sense though.

@rdoeffinger

rdoeffinger commented Apr 24, 2024

I put my patch also on that ticket, maybe there is some user feedback on it. Anyway feel free to re-use any part of my patches in any way you want.

@Yuri6037

Yuri6037 commented Apr 30, 2024

Hello,

I've just tried building the M1 branch on my M1 Max and I got the exact same issue as rdoeffinger related to debug info.
This is the log I get on ANY application before applying rdoeffinger's patch:

➜  valgrind-macos git:(feature/m1) ./vg-in-place ls
==18327== Memcheck, a memory error detector
==18327== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==18327== Using Valgrind-3.23.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==18327== Apple Silicon support is currently experimental, see https://github.com/LouisBrunner/valgrind-macos/issues/56 in case of issues.
==18327== Command: ls
==18327== 
==18327== Valgrind: debuginfo reader: ensure_valid failed:
==18327== Valgrind:   during call to ML_(img_strdup)
==18327== Valgrind:   request for range [845575592, +1) exceeds
==18327== Valgrind:   valid image size of 777363456 for image:
==18327== Valgrind:   "/usr/lib/system/libsystem_darwindirectory.dylib"
==18327== 
==18327== Valgrind: debuginfo reader: Possibly corrupted debuginfo file.
==18327== Valgrind: I can't recover.  Giving up.  Sorry.
==18327== 

After applying the patch and recompiling, vg-in-place is now able to trace trivial applications such as ls. I have tried running a more advanced app of my own, but it has 2 issues:

  • VG is unable to load a binary if it depends on a DYLD_LIBRARY_PATH. The environment seems completely ignored when launching vg.
  • When placing the dylib in a location that VG accepts, it ends up throwing LOTS of memory errors.

This is the output of VG when attempting to trace a custom CLI tool using a Rust dylib (everything is built with debug info including the Rust dylib):

➜  valgrind-macos git:(feature/m1) ✗ DYLD_LIBRARY_PATH=../bpx-edit-core/target/debug/ ./vg-in-place ../bpx-edit-core/a.out
==19581== Memcheck, a memory error detector
==19581== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==19581== Using Valgrind-3.23.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==19581== Apple Silicon support is currently experimental, see https://github.com/LouisBrunner/valgrind-macos/issues/56 in case of issues.
==19581== Command: ../bpx-edit-core/a.out
==19581== 
--19581-- run: /usr/bin/dsymutil "/Users/xx/Projects/valgrind-macos/libBPXEditCore.dylib"
--19581-- UNKNOWN mach_msg2 unhandled MACH64_MSG_VECTOR option
--19581-- UNKNOWN mach_msg2 unhandled MACH64_MSG_VECTOR option (repeated 2 times)
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18B0F2154: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028E192F: dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028EEE1B: dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581== 
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18ACE9FE4: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
==19581== Conditional jump or move depends on uninitialised value(s)
==19581==    at 0x18ACE9FE8: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
==19581== Use of uninitialised value of size 8
==19581==    at 0x18ACE9FEC: object_getClass (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x18ADC2A8B: xpc_dictionary_get_int64 (in /usr/lib/system/libxpc.dylib)
==19581==    by 0x18B0F217F: _ds_item (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1E87: ds_user_byuid (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F1B83: search_item_bynumber (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B0F4287: getpwuid_r (in /usr/lib/system/libsystem_info.dylib)
==19581==    by 0x18B11E813: _CFStringGetUserDefaultEncoding (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x18B11E233: __CFInitialize (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==19581==    by 0x1028B105B: invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const::$_0::operator()() const (in /usr/lib/dyld)
==19581==    by 0x1028EF307: invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void ( block_pointer)(unsigned int), void const*) const (in /usr/lib/dyld)
==19581==    by 0x1028E299B: invocation function for block in dyld3::MachOFile::forEachSection(void ( block_pointer)(dyld3::MachOFile::SectionInfo const&, bool, bool&)) const (in /usr/lib/dyld)
==19581==    by 0x1028922FB: dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==19581== 
[SNIP]
==19581== 
==19581== Process terminating with default action of signal 5 (SIGTRAP)
==19581==    at 0x18CA3D9E0: _NSThreadPoisoned (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18C272D07: -[NSThread init] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18CA3E6CF: ____mainNSThread_block_invoke (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18AF0A3E7: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==19581==    by 0x18AF0BC67: _dispatch_once_callout (in /usr/lib/system/libdispatch.dylib)
==19581==    by 0x18C272C9F: _NSThreadGet0 (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18C2728FF: _NSInitializePlatform (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==19581==    by 0x18ACE7CF3: load_images (in /usr/lib/libobjc.A.dylib)
==19581==    by 0x1028A478F: dyld4::RuntimeState::notifyObjCInit(dyld4::Loader const*) (in /usr/lib/dyld)
==19581==    by 0x1028AD44F: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581==    by 0x1028AD3FF: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581==    by 0x1028AD3FF: dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const (in /usr/lib/dyld)
==19581== 
==19581== HEAP SUMMARY:
==19581==     in use at exit: 33,966 bytes in 492 blocks
==19581==   total heap usage: 609 allocs, 117 frees, 42,575 bytes allocated
==19581== 
==19581== LEAK SUMMARY:
==19581==    definitely lost: 10,336 bytes in 323 blocks
==19581==    indirectly lost: 0 bytes in 0 blocks
==19581==      possibly lost: 2,536 bytes in 38 blocks
==19581==    still reachable: 21,094 bytes in 131 blocks
==19581==         suppressed: 0 bytes in 0 blocks
==19581== Rerun with --leak-check=full to see details of leaked memory
==19581== 
==19581== Use --track-origins=yes to see where uninitialised values come from
==19581== For lists of detected and suppressed errors, rerun with: -s
==19581== ERROR SUMMARY: 1285 errors from 161 contexts (suppressed: 2 from 2)
[1]    19581 trace trap  DYLD_LIBRARY_PATH=../bpx-edit-core/target/debug/ ./vg-in-place

I wonder if these errors are due to VG internal issues or if they are real and Apple really needs to fix their dyld... Are these errors expected?

EDIT: I also wonder why it's calling _NSThreadPoisoned as in my test I have not used any threads.

@LouisBrunner
Owner

@Yuri6037 FYI, I edited your comment to remove the bulk of the errors (they aren't particularly useful for this discussion and can still be accessed through the GitHub edit history) because your message made it difficult to scroll through this issue.

Regarding the debugging fix: does your binary have no debug symbols or truncated ones?

Sorry, I should indeed have written more: there is no issue with my binary. The problem is with in-memory system libraries, where this triggered. First I tried to skip all offending binaries by name in img_from_memory (it ONLY happens with system libraries and when img_from_memory is used), but it got a bit much, so I went with this more general solution.

I've just tried building the M1 branch on my M1 Max and I got the exact same issue as rdoeffinger related to debug info. This is the log I get on ANY application before applying rdoeffinger's patch:

Good to know that it isn't an isolated issue. I have yet to reproduce it, I will have to look into it a bit more.

  • VG is unable to load a binary if it depends on a DYLD_LIBRARY_PATH. The environment seems completely ignored when launching vg.

Valgrind should provide all DYLD env vars to the real dyld when running your binary. If you have an MRE, I would gladly look into that issue.

  • When placing the dylib in a location that VG accepts, it ends up throwing LOTS of memory errors.

I wonder if these errors are due to VG internal issues or if they are real and Apple really needs to fix their dyld... Are these errors expected?

This is usually due to Valgrind not knowing how memory is laid out on macOS. In the past, such errors have been hidden through a copious amount of suppressions (a special Valgrind file). I haven't seen those specific ones yet, so I don't know what's causing them, but the general rule I found is: if you dig down to why an issue like _NSThreadPoisoned happens, you usually fix a lot of seemingly unrelated warnings (by mapping memory correctly).

EDIT: I also wonder why it's calling _NSThreadPoisoned as in my test I have not used any threads.

As stated in my latest update, this is currently expected. I haven't had time to investigate why that happens, apart from the fact that objc is being loaded at some point. You don't have to use threads specifically: if you use any macOS capability which relies on objc, you will hit this crash.

Regarding the cache lines: any reference for this?

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check. The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic... Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do. DISCLAIMER: it's all guesswork from my side.

I am not very knowledgeable about this, so thanks for this background, as it will make it easier to research and fix.

@paulfloyd
Contributor

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check. The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic... Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do. DISCLAIMER: it's all guesswork from my side.

I am not very knowledgeable about this, so thanks for this background, as it will make it easier to research and fix.

I just pushed a bunch of changes upstream for arm64 concerning several mrs and dc opcodes. That includes mrs dczid_el0 and dc zva.

@LouisBrunner
Owner

I was planning to merge the arm64 changes into main last weekend, but unfortunately I encountered a performance issue which needs to be addressed first.

Good news

Valgrind is basically functional on Apple Silicon on feature/m1. This branch incorporates a lot of changes:

  • macOS arm64 compatibility (of course):
    • Valgrind tools are now dynamic binaries using dummy libSystem.B and libdyld; this setup (which is mandatory) has minimal impact on the functioning of Valgrind
    • proper macOS JIT memory management
    • support loading Mach-O images with a slide (with a few hacks to bypass kernel restrictions)
  • improved macOS integration:
  • arm64 improvements (this benefits all arm64 targets):
    • most of FEAT_PAUTH should be emulated somewhat correctly
    • fmaxnm/fminnm support
    • improved FEAT_FP16 support
    • basic support for parts of FEAT_LRCPC and FEAT_DIT
  • improved reporting/debugging:
    • leverage macOS VM tags to improve memory layouts (similar to what vmmap does)
    • pick up symbols from __DATA/__DATA_DIRTY for clearer stacktrace/memory access reports
    • add GDB aarch64 registers definition (similar to all other Valgrind architectures)
    • improved LLDB support, including full symbols loading, by implementing part of their GDB Remote Protocol Extensions (and small vgdb changes)
    • very hacky support for arm64 tagged pointers in Memcheck

These changes mean that the regression tests have improved significantly: macOS 13 amd64 has 30% fewer failures, and macOS 14 arm64 has fewer failures than the current main branch (amd64!) of this repository. arm64 support has also been tested extensively on macOS 14 and 15.

So you can run Valgrind, it will probably work well and all is great.

Bad news

There is some kind of memory issue which happens while Valgrind is running (ironic, isn't it?). It is very obvious when running a GUI application or the regression tests: your machine will slow to a crawl, and simply stopping Valgrind will not be enough to restore your system to a usable state; you will need to restart. Such freezes, where a forced reset was required, happened to me a handful of times while working on Valgrind.

I have been sampling my system with vm_stat and I am seeing some very weird things. However, I have yet to find the reason behind this.
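If anyone wants to collect similar samples, here is a rough sketch of helpers for turning one-shot vm_stat output into numbers that can be logged over time. It assumes the usual text format, a header such as `Mach Virtual Memory Statistics: (page size of 16384 bytes)` followed by lines such as `Pages free: 3422.`; the function names are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Parse a "Label:      12345." line from one-shot vm_stat output.
 * Returns 1 and stores the page count if the line starts with `label:`. */
static int parse_vm_stat_line(const char *line, const char *label,
                              unsigned long long *pages)
{
    size_t n = strlen(label);
    const char *start;
    char *end;
    unsigned long long v;

    if (strncmp(line, label, n) != 0 || line[n] != ':')
        return 0;
    start = line + n + 1;
    v = strtoull(start, &end, 10);   /* skips the padding spaces */
    if (end == start)
        return 0;                    /* no digits found */
    *pages = v;
    return 1;
}

/* Extract the page size from the header line, e.g.
 * "Mach Virtual Memory Statistics: (page size of 16384 bytes)". */
static int parse_vm_stat_page_size(const char *line,
                                   unsigned long long *bytes)
{
    static const char key[] = "page size of ";
    const char *p = strstr(line, key);
    char *end;
    unsigned long long v;

    if (p == NULL)
        return 0;
    p += sizeof key - 1;
    v = strtoull(p, &end, 10);
    if (end == p)
        return 0;
    *bytes = v;
    return 1;
}
```

Multiplying a page count by the page size gives bytes, which makes it easier to compare samples taken before, during and after a Valgrind run.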

Due to the dramatic effect of this issue (requiring a reboot or basically crashing your computer altogether), I can't release the current state of arm64 on main.

Going forward

I have been crunching pretty hard trying to get Valgrind ready for release and need to take a break from it. I would have liked to announce the release of the first macOS arm64 version, but it wasn't meant to be. However, we have never been so close to a stable release, and I am confident that we are at the end of this long journey.

@paulfloyd
Contributor

Bravo! When you are ready we can work again on getting this merged upstream.

@pilotniq

Thank you for the amazing work. I hope you are rejuvenated by your break.

I tried to run the version in the feature/m1 branch on a program of mine and got the following error. I'm on macOS 14.7 (23H124) on an Apple M2 Max. I don't know if the following output is of interest:

` % ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump
==92036== Memcheck, a memory error detector
==92036== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==92036== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==92036== Command: ./ref_speaker_curl_pump
==92036==
--92036-- VALGRIND INTERNAL ERROR: Valgrind received a signal 10 (SIGBUS) - exiting
--92036-- si_code=1; Faulting address: 0x7000017F2BD0; sp: 0x7000017ea720

valgrind: the 'impossible' happened:
Killed by fatal signal

host stacktrace:
==92036== at 0x15A9F2974: vgModuleLocal_check_macho_and_get_rw_loads (readmacho.c:134)
==92036== by 0x15A9D28EB: vgPlain_di_notify_mmap (debuginfo.c:1395)
==92036== by 0x15AA42423: vgSysWrap_darwin_mmap_after (syswrap-darwin.c:4711)
==92036== by 0x15AA22C63: vgPlain_post_syscall (syswrap-main.c:2713)
==92036== by 0x15AA225B3: vgPlain_client_syscall (syswrap-main.c:2634)
==92036== by 0x15AA20643: handle_syscall (scheduler.c:1208)
==92036== by 0x15AA1E04B: vgPlain_scheduler (scheduler.c:1582)
==92036== by 0x15AA31AF3: run_a_thread_NORETURN (syswrap-darwin.c:126)

sched status:
running_tid=1

Thread 1: status = VgTs_Runnable syscall unix:197 (lwpid 259)
==92036== at 0x104039354: __mmap (in /usr/lib/dyld)
==92036== by 0x304FF563F: ??? (in /dev/ttys021)
==92036== by 0x10405D177: dyld4::SyscallDelegate::withReadOnlyMappedFile(Diagnostics&, char const*, bool, void ( block_pointer)(void const*, unsigned long, bool, dyld4::FileID const&, char const*)) const (in /usr/lib/dyld)
==92036== by 0x104058AD7: dyld4::JustInTimeLoader::makeJustInTimeLoaderDisk(Diagnostics&, dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&, bool, unsigned int, mach_o::Layout const*) (in /usr/lib/dyld)
==92036== by 0x10404D877: dyld4::Loader::makeDiskLoader(Diagnostics&, dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&, bool, unsigned int, mach_o::Layout const*) (in /usr/lib/dyld)
==92036== by 0x10404EFC3: invocation function for block in dyld4::Loader::getLoader(Diagnostics&, dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036== by 0x10404DF23: dyld4::Loader::forEachResolvedAtPathVar(dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&, dyld4::ProcessConfig::PathOverrides::Type, bool&, void ( block_pointer)(char const*, dyld4::ProcessConfig::PathOverrides::Type, bool&)) (in /usr/lib/dyld)
==92036== by 0x10403CFAB: dyld4::ProcessConfig::PathOverrides::forEachPathVariant(char const*, dyld3::Platform, bool, bool, bool&, void ( block_pointer)(char const*, dyld4::ProcessConfig::PathOverrides::Type, bool&)) const (in /usr/lib/dyld)
==92036== by 0x10404DA5B: dyld4::Loader::forEachPath(Diagnostics&, dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&, void ( block_pointer)(char const*, dyld4::ProcessConfig::PathOverrides::Type, bool&)) (in /usr/lib/dyld)
==92036== by 0x10404E14F: dyld4::Loader::getLoader(Diagnostics&, dyld4::RuntimeState&, char const*, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036== by 0x104056B8F: invocation function for block in dyld4::JustInTimeLoader::loadDependents(Diagnostics&, dyld4::RuntimeState&, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036== by 0x104076C9F: invocation function for block in mach_o::Header::forEachDependentDylib(void ( block_pointer)(char const*, mach_o::DependentDylibAttributes, mach_o::Version32, mach_o::Version32, bool&)) const (in /usr/lib/dyld)
==92036== by 0x1040764CB: mach_o::Header::forEachLoadCommand(void ( block_pointer)(load_command const*, bool&)) const (in /usr/lib/dyld)
==92036== by 0x104076993: mach_o::Header::forEachDependentDylib(void ( block_pointer)(char const*, mach_o::DependentDylibAttributes, mach_o::Version32, mach_o::Version32, bool&)) const (in /usr/lib/dyld)
==92036== by 0x1040568EB: dyld4::JustInTimeLoader::loadDependents(Diagnostics&, dyld4::RuntimeState&, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036== by 0x10403A88B: dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) (in /usr/lib/dyld)
==92036== by 0x104039EF3: (below main) (in /usr/lib/dyld)
client stack range: [0x3047FC000 0x304FF7FFF] client SP: 0x304FF5500
valgrind stack range: [0x7000016EC000 0x7000017EBFFF] top usage: 16160 of 1048576

Note: see also the FAQ in the source distribution.
...
`

@paulfloyd
Contributor

paulfloyd commented Nov 19, 2024

What kind of binary is ref_speaker_curl_pump?

The 4k buffer on line 1164 of debuginfo.c might not be big enough. Can you try making it bigger?

I need to clean up VG_(di_notify_mmap) and ML_(check_macho_and_get_rw_loads). ML_(check_macho_and_get_rw_loads) should be more like ML_(check_elf_and_get_rw_loads) taking the fd rather than relying on VG_(di_notify_mmap) to read the start of the binary into a fixed size buffer.
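To illustrate why a fixed-size buffer can be too small, here is a minimal sketch of the check involved: a 64-bit Mach-O file starts with a 32-byte header whose `sizeofcmds` field gives the total size of the load commands that follow, so a reader needs `sizeof(header) + sizeofcmds` bytes before it can walk them safely. The struct is redeclared locally so the example compiles anywhere, and the `sketch_`/`macho_` names are hypothetical, not the ones used in readmacho.c. It also assumes a thin (non-fat) file and a host with the same byte order as the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MH_MAGIC_64 0xfeedfacfu   /* same value as MH_MAGIC_64 */

/* Local copy of the mach_header_64 layout (normally <mach-o/loader.h>). */
typedef struct {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;   /* total bytes of all load commands */
    uint32_t flags;
    uint32_t reserved;
} sketch_mach_header_64;

/* Bytes needed to hold the header plus every load command,
 * or 0 if buf does not start with a 64-bit Mach-O header. */
static size_t macho_bytes_needed(const void *buf, size_t len)
{
    sketch_mach_header_64 h;

    if (len < sizeof h)
        return 0;
    memcpy(&h, buf, sizeof h);
    if (h.magic != SKETCH_MH_MAGIC_64)
        return 0;
    return sizeof h + (size_t)h.sizeofcmds;
}

/* 1 if the first len bytes contain all load commands, 0 otherwise. */
static int macho_prefix_is_complete(const void *buf, size_t len)
{
    size_t need = macho_bytes_needed(buf, len);
    return need != 0 && need <= len;
}
```

With a check like this, a caller holding only the first 4096 bytes can tell that the prefix is incomplete and read more from the fd instead of walking past the end of the buffer.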

@pilotniq

pilotniq commented Nov 19, 2024

Thanks for the response @paulfloyd !

What kind of binary is ref_speaker_curl_pump?

% file ref_speaker_curl_pump
ref_speaker_curl_pump: Mach-O 64-bit executable arm64

The main is a C program, compiled with clang:
Apple clang version 15.0.0 (clang-1500.3.9.4)

It is linked with code written in Rust and C++.

The 4k buffer on line 1164 of debuginfo.c might not be big enough. Can you try making it bigger?

Thanks! That did change the behavior. If I increase the size to 16384 or 65536, there is an error later:

% ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump
==55757== Memcheck, a memory error detector
==55757== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==55757== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==55757== Command: ./ref_speaker_curl_pump
==55757== 
==55757== Warning: set address range perms: large range [0x700001c000, 0x7e00024000) (defined)
==55757== Warning: set address range perms: large range [0x27e00024000, 0x107000020000) (defined)
[ two minute delay here ]
==55757== Warning: set address range perms: large range [0x7e00024000, 0x27e00024000) (noaccess)
objc[55757]: realized class 0x1ea441fb0 has corrupt data pointer: malloc_size(0x10bb008e0) = 0
zsh: killed     ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump

In this invocation, ref_speaker_curl_pump will just look at the command line arguments (none), then detect that some environment variables are not set, and do an error exit, all within C code. But I guess it doesn't even get that far.

The C files were compiled with -gdwarf-4 (since the valgrind on our Linux CI environment doesn't support the most recent debug info format). Replacing -gdwarf-4 with just -g and rebuilding, I get a similar output, except there is a run: /usr/bin/dsymutil "./ref_speaker_curl_pump" line before the warnings:

% ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump     
==7945== Memcheck, a memory error detector
==7945== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==7945== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==7945== Command: ./ref_speaker_curl_pump
==7945== 
--7945-- run: /usr/bin/dsymutil "./ref_speaker_curl_pump"
==7945== Warning: set address range perms: large range [0x700001c000, 0x7e00024000) (defined)
==7945== Warning: set address range perms: large range [0x27e00024000, 0x107000020000) (defined)
==7945== Warning: set address range perms: large range [0x7e00024000, 0x27e00024000) (noaccess)
objc[7945]: realized class 0x1ea441fb0 has corrupt data pointer: malloc_size(0x10b4008e0) = 0
zsh: killed     ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump

The executable was built with these -f flags: -fno-omit-frame-pointer -fsanitize=address -fsanitize=float-cast-overflow -fsanitize=float-divide-by-zero -fsanitize=undefined -fsanitize-address-use-after-scope

@paulfloyd
Contributor

Don't build your exe with sanitizers. Just -g and -fno-omit-frame-pointer are enough for Valgrind. Mixing sanitizers and Valgrind usually doesn't work.
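One way to catch this mix-up early is to have the program report whether it was compiled with AddressSanitizer. This is only a sketch: `__has_feature(address_sanitizer)` is the Clang check and `__SANITIZE_ADDRESS__` is the macro GCC defines, while `BUILT_WITH_ASAN` and `built_with_asan` are names invented for this example. It does not detect the undefined-behavior or float sanitizers.

```c
/* Detect an AddressSanitizer build at compile time. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif

#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif

#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

/* Returns 1 if this translation unit was built with -fsanitize=address. */
static int built_with_asan(void)
{
    return BUILT_WITH_ASAN;
}
```

A test harness could call `built_with_asan()` at startup and print a warning before the binary is run under Valgrind.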

Apple has done a lot of work hardening its allocators and deallocators recently using type-aware functions. I don't know if that has spilled over into malloc_size.
