Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

golang version 1.20 on MacOS ARM processors generates status-go-library with missing symbols #20135

Closed
siddarthkay opened this issue May 21, 2024 · 10 comments · Fixed by #20248
Closed
Assignees

Comments

@siddarthkay
Copy link
Contributor

siddarthkay commented May 21, 2024

Problem

make test-contract on MacOS fails with :

dyld[30877]: missing symbol called

The problem initially showcased itself after the golang 1.20 upgrade was merged.
ref -> #19802

To fix this problem quickly I had modified the status-go-library derivation to build with goPackage 1.19 only for Darwin
ref -> #19965

However a PR last week at status-go bumped go-waku version and along with it a bunch of libraries that are incompatible with golang 1.19, specifically quic-go
ref -> status-im/status-go#5150

Here are possible solutions to the current problem:

  • Ask Devs to stop developing on MacOS OR
  • revert the status-go PR OR
  • find a proper fix for the missing symbol issue.

note : further upgrading go to 1.21 or 1.22 may or may not fix this issue.

@siddarthkay siddarthkay self-assigned this May 21, 2024
@siddarthkay siddarthkay changed the title golang version 1.20 on MacOS Silicon generates status-go-library with missing symbols golang version 1.20 on MacOS ARM processors generates status-go-library with missing symbols May 21, 2024
@siddarthkay
Copy link
Contributor Author

To investigate I tried adding more debug flags to the node process that calls contract tests like this :

diff --git a/Makefile b/Makefile
index b2ad755f6..6b5b62d32 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
        yarn node-pre-gyp rebuild && \
        yarn shadow-cljs compile mocks && \
        yarn shadow-cljs compile test && \
-       node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+       node --trace-warnings --trace-uncaught --inspect-brk --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
 endif

 test: export SHADOW_OUTPUT_TO := target/test/test.js

and then I open chrome://inspect and set check these 2 values :

  • pause on uncaught exceptions
  • pause on caught exceptions

just to get a feel of whats going on.

@siddarthkay
Copy link
Contributor Author

siddarthkay commented May 22, 2024

right away I see something suspicious :

"Cannot find module 'bufferutil'
Require stack:
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/buffer-util.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/permessage-deflate.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"

and

"Cannot find module 'utf-8-validate'
Require stack:
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/validation.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/receiver.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"

@siddarthkay
Copy link
Contributor Author

To validate whether this error is related to our missing symbols problem I nuked all of @WalletConnect dependencies, usage and its reference in tests and this error message disappeared but the make test-contract was still failing the same way.

This proves that the error messages related to 'bufferutil' and 'utf-8-validate' are not causing the failures we are interested in. It would also make sense for this not to be the culprit since what we are looking for has to be something specific to MacOS.

@siddarthkay
Copy link
Contributor Author

siddarthkay commented May 22, 2024

Next up I decided to try and find the backtrace of the abort signal. I added a signal tracing function like this in modules/react-native-status/nodejs/status.cpp

diff --git a/modules/react-native-status/nodejs/status.cpp b/modules/react-native-status/nodejs/status.cpp
index 620d77032..6ae37f637 100644
--- a/modules/react-native-status/nodejs/status.cpp
+++ b/modules/react-native-status/nodejs/status.cpp
@@ -8,6 +8,7 @@
 #include <node.h>
 #include <string>
 #include <queue>
+#include <execinfo.h>

 #include "../../../result/libstatus.h"

@@ -27,6 +28,25 @@ using v8::Number;
 using v8::Value;


+void SignalHandler(int signal) {
+    printf("Caught signal: %d\n", signal);
+
+    // Print the backtrace
+    void *backtraceBuffer[8];
+    int numFrames = backtrace(backtraceBuffer, 8);
+    char **symbolList = backtrace_symbols(backtraceBuffer, numFrames);
+
+    printf("Backtrace:\n");
+    for (int i = 0; i < numFrames; ++i) {
+        printf("%s\n", symbolList[i]);
+    }
+
+    free(symbolList);
+
+    std::abort();  // Terminate the program
+}
+
+
 void _MultiAccountGenerateAndDeriveAddresses(const FunctionCallbackInfo<Value>& args) {
        Isolate* isolate = args.GetIsolate();
         Local<Context> context = isolate->GetCurrentContext();
@@ -1925,6 +1945,8 @@ void _RestoreAccountAndLogin(const FunctionCallbackInfo<Value>& args) {


 void init(Local<Object> exports) {
+       signal(SIGABRT, SignalHandler);
+       signal(SIGSEGV, SignalHandler);
        NODE_SET_METHOD(exports, "multiAccountGenerateAndDeriveAddresses", _MultiAccountGenerateAndDeriveAddresses);
        NODE_SET_METHOD(exports, "multiAccountImportPrivateKey", _MultiAccountImportPrivateKey);
        NODE_SET_METHOD(exports, "multiAccountLoadAccount", _MultiAccountLoadAccount);

@siddarthkay
Copy link
Contributor Author

And then when the abort trap is triggered by node process we are able to latch on to that signal in the cpp code and print the backtrace.
Here is what I see :

dyld[51565]: missing symbol called
Caught signal: 6
Backtrace:
0   status_nodejs_addon.node            0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
2   dyld                                0x0000000182560628 abort_with_payload_wrapper_internal + 104
3   dyld                                0x000000018256065c dyld + 493148
4   dyld                                0x00000001824f26b0 _ZN5dyld411CacheFinderD2Ev + 0
5   dyld                                0x0000000182526b98 _ZN5dyld44APIs11_tlv_atexitEPFvPvES1_ + 0
6   status_nodejs_addon.node            0x000000016007853c runtime.syscall.abi0 + 44
7   status_nodejs_addon.node            0x0000000160076c7c runtime.asmcgocall.abi0 + 124
Caught signal: 6
Backtrace:
0   status_nodejs_addon.node            0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
2   libsystem_pthread.dylib             0x0000000182876c20 pthread_kill + 288
3   libsystem_c.dylib                   0x0000000182783a20 abort + 180
4   status_nodejs_addon.node            0x0000000160001e1c _ZN6status39_MultiAccountGenerateAndDeriveAddressesERKN2v820FunctionCallbackInfoINS0_5ValueEEE + 0
5   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
6   dyld                                0x0000000182560628 abort_with_payload_wrapper_internal + 104
7   dyld                                0x000000018256065c dyld + 493148
.
.
.

@siddarthkay
Copy link
Contributor Author

these env vars help in adding more metadata to dyld calls.

git diff Makefile
diff --git a/Makefile b/Makefile
index b2ad755f6..0eee25cf2 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
        yarn node-pre-gyp rebuild && \
        yarn shadow-cljs compile mocks && \
        yarn shadow-cljs compile test && \
-       node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+       DYLD_PRINT_APIS=1 DYLD_PRINT_BINDINGS=1 node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
 endif

 test: export SHADOW_OUTPUT_TO := target/test/test.js

@siddarthkay
Copy link
Contributor Author

I guess one viable route here would be to build status-go-library with go 1.19 and compare the binary file with building status-go-library from 1.20
I wouldnt know what to compare between the two libraries but I can try comparing results of
nm -u libstatus.a since that should give me an idea of the -u flag is supposed to display undefined symbols.

@siddarthkay
Copy link
Contributor Author

I first found a tag in status-go where the go-waku upgrade had not happened and I picked https://github.com/status-im/status-go/releases/tag/v0.179.12 just to be safe.

I then built status-go library like this :

make status-go-library

I stored the output from reading symbols of the compiled libstatus.a like this :

nm -u result/libstatus.a > go-119-symbols.log

I then removed the go 1.19 hack fix which would ensure we now use go 1.20

diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 3a49a98a78d..831fbfd837a 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -1,9 +1,6 @@
-{ stdenv, meta, source, buildGo119Package, buildGo120Package }:
-let
-  # https://github.com/status-im/status-mobile/issues/19802
-  # only for Darwin to fix Integration Tests failing with missing symbols on go 1.20
-  buildGoPackageIntegrationTest = if stdenv.isDarwin then buildGo119Package else buildGo120Package;
-in buildGoPackageIntegrationTest {
+{ stdenv, meta, source, buildGoPackage }:
+
+buildGoPackage {
   pname = source.repo;
   version = "${source.cleanVersion}-${source.shortRev}";

and then I then cleaned up the status-go library with make clean and built status-go library again with make status-go-library. This would ensure that result folder would now contain binary built with go 1.20.

I then create a new log file to compare symbols like this :

nm -u result/libstatus.a > go-120-symbols.log

@siddarthkay
Copy link
Contributor Author

And indeed there were differences which is what I would expect since the obscure error message is of a missing symbol.

Screenshot 2024-05-29 at 6 45 43 PM

However I found a familiar symbol which was missing.

Screenshot 2024-05-29 at 6 45 57 PM

And indeed this problem was introduced in go 1.20 and happens only on Darwin
related issues in go repo

The suggestion was to add -lresolv flag along with netgo which indeed made the crash go away.

@siddarthkay
Copy link
Contributor Author

Here is how I added those flags

diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 831fbfd837a..da3bec1e11f 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -22,11 +22,14 @@ buildGoPackage {
   '';
 
   # Build the Go library
+  # ld flags and netgo tag are necessary for integration tests to work on MacOS
+  # https://github.com/status-im/status-mobile/issues/20135
   buildPhase = ''
     runHook preBuild
     go build \
       -buildmode='c-archive' \
-      -tags='gowaku_skip_migrations gowaku_no_rln' \
+      -ldflags '-w -s -extldflags "-lresolv"' \
+      -tags='gowaku_skip_migrations gowaku_no_rln netgo' \
       -o "$out/libstatus.a" \
       $NIX_BUILD_TOP/main.go
     runHook postBuild

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant