Is it possible to get (or compile) a .pdb file for sharp.node for a specific sharp version? #3569

Closed
TomaterID opened this issue Feb 23, 2023 · 38 comments

@TomaterID

Question about an existing feature

What are you trying to achieve?

I am currently using sharp in my photo management application, Tonfotos. The library does a great job, thank you guys!

However, it crashes from time to time on the customer side. I am collecting crash dumps, but unfortunately I cannot get much useful information from them, as I don't have the .pdb file corresponding to the binary that is shipped with the npm module. The best stack trace I can get looks like this:

(... skipped windows internals...)
KERNELBASE!RaiseException+0x69
sharp_win32_x64+0x294cc
sharp_win32_x64+0xbac6
sharp_win32_x64+0x421e2
sharp_win32_x64+0x2d680
sharp_win32_x64+0x2c5dd
(... more skipped ....)

Obviously something happened inside sharp, but we will never know exactly what and where until we have the corresponding .pdb file. I am using sharp 0.29.3, and I would love to update if I could check that this bug was indeed fixed in a later version. But this amount of information is far from enough to find similar issues here. Having the name of the failing function would be a great help: I could look through the code change history and make an educated decision about which version to choose.

The last resort would be to upgrade to the latest version and pray, but that would be a lottery. It would not guarantee that this issue goes away, and it would carry a non-zero probability of adding new issues, too. (No offense, it is just the way life is.)

When you searched for similar issues, what did you find that might be related?

Unfortunately, I was not able to find relevant discussions. Everything with the 'pdb' keyword looks to be an installation issue.

Please provide a minimal, standalone code sample, without other dependencies, that demonstrates this question

Hmm... not sure this is relevant to my question.

Please provide sample image(s) that help explain this question

I have no idea what causes the crash or whether it is connected to any particular images. It is happening on the customer side and all I get is a crash dump from crashpad.

So I would be grateful for any help:

  1. Instructions on how to build sharp.pdb for my version of sharp
  2. Or maybe there is already a prebuilt one somewhere
  3. Or maybe you know for sure this bug was already fixed and can point me to the right version to upgrade to
  4. Or any other suggestions.

Thank you very much!

@lovell
Owner

lovell commented Feb 23, 2023

Rather than the sharp binary, I suspect this will be coming from one of libvips dependencies. This means you'll need to build your own debug binaries for the specific version of libvips you're using with https://github.com/libvips/build-win64-mxe and its --with-debug option.

I'm 80% certain this will relate to font discovery/rendering. The next v0.32.0 release of sharp will include prebuilt binaries that include the latest cairo, pango and fontconfig, all of which have seen a lot of font-related fixes and improvements in the last year, so waiting might be easier.

@TomaterID
Author

Thank you for your reply! My next question was going to be about libvips, since I have crashes with that one too; I will now try building it myself and saving the PDB file for the future. However, my original question was indeed about sharp, as it does crash. Any ideas how to debug this?

@lovell
Owner

lovell commented Feb 24, 2023

Please can you provide a full stack trace of a crash that you believe is caused directly by sharp (as opposed to a crash from one of libvips dependencies where a call to sharp happens to appear in the call stack).

@TomaterID
Author

Sure. This archive includes sample .dmp file as well as stack trace decoded by cdb.exe
sharp_win32_x64+0x294cc.zip

@lovell
Owner

lovell commented Feb 24, 2023

Thanks, the salient part appears to be:

0000007a`4a7f9f80 00007ff7`1c4f0d37     : 00000000`00000002 00007ff8`00000000 0000007a`c00000bb 00000000`00000000 : KERNELBASE!SleepEx+0x9e
0000007a`4a7fa020 00007ff8`d83a0207     : 00000000`00000000 00007ff7`1c4f0c70 00000000`00000000 aaaa0064`6c6f6873 : tonfotos!crashpad::`anonymous namespace'::UnhandledExceptionHandler+0xc7
0000007a`4a7fa1a0 00007ff8`da655530     : 0000007a`4a7fa3e0 00007ff8`da6f7658 00000000`00000000 0000007a`4a7fa378 : KERNELBASE!UnhandledExceptionFilter+0x1e7
0000007a`4a7fa2c0 00007ff8`da63c876     : 00007ff8`da724a24 00007ff8`da5b0000 0000007a`4a7fa3e0 00007ff8`da5e0e7b : ntdll!memset+0x13b0
0000007a`4a7fa300 00007ff8`da65241f     : 00000000`00000000 0000007a`4a7fa8e0 0000007a`4a7fb340 00000000`00000000 : ntdll!_C_specific_handler+0x96
0000007a`4a7fa370 00007ff8`da6014a4     : 00000000`00000000 0000007a`4a7fa8e0 0000007a`4a7fb340 00000000`00000001 : ntdll!_chkstk+0x11f
0000007a`4a7fa3a0 00007ff8`da6011f5     : 00000000`00000000 0000007a`4a7fb1e0 00000000`00000000 0000007a`4a7faaf0 : ntdll!RtlRaiseException+0x434
0000007a`4a7faab0 00007ff8`d82bcd29     : 0000007a`4a7fb388 00007ff8`b33e7020 0000007a`4a7fb4a0 0000007a`4a7fb4b8 : ntdll!RtlRaiseException+0x185
*** WARNING: Unable to verify checksum for sharp-win32-x64.node
0000007a`4a7fb320 00007ff8`b33b94cc     : 00006832`00e3df80 00000000`00000000 0000007a`4a7fc370 00007ff8`b3392154 : KERNELBASE!RaiseException+0x69
0000007a`4a7fb400 00007ff8`b339bac6     : 0000007a`4a7fd960 0000007a`4a7fd7b0 0000007a`4a7fd960 0000007a`4a7fd7b0 : sharp_win32_x64+0x294cc
0000007a`4a7fb460 00007ff8`b33d21e2     : 0000007a`4a7fc370 00007ff8`b33ba0a5 0000007a`4a7fb658 0000007a`4a7fd7b0 : sharp_win32_x64+0xbac6
0000007a`4a7fb560 00007ff8`b33bd680     : 00007ff8`b33d21cc 0000007a`4a7fda60 0000007a`4a7fda60 aaaaaaaa`aaaaaaaa : sharp_win32_x64+0x421e2
0000007a`4a7fb5a0 00007ff8`b33bc5dd     : 00007ff8`b33d21cc 0000007a`4a7fc4d8 0000034f`00000100 00007ff7`1a5a311c : sharp_win32_x64+0x2d680
0000007a`4a7fb5d0 00007ff8`da6517a6     : 00000000`00000000 00000000`00000002 0912df01`00000000 0000007a`4a7fd7b0 : sharp_win32_x64+0x2c5dd
0000007a`4a7fb6b0 00007ff8`b33a01eb     : aaaaaaaa`aaaaaaaa 0000024d`85289300 00006832`00e3df80 0000034f`cfc6a7d1 : ntdll!RtlCaptureContext2+0x4a6 (TrapFrame @ 0000007a`4a7fba38)
0000007a`4a7fda60 00007ff8`b339cb94     : 00006832`0061fff0 00007ff7`1a541d49 0000034f`cfc6a721 00006832`01a6bd28 : sharp_win32_x64+0x101eb
0000007a`4a7fdb10 00007ff7`194955bc     : aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa : sharp_win32_x64+0xcb94
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!`anonymous namespace'::uvimpl::Work::AfterThreadPoolWork::<lambda_1>::operator()+0x3e (Inline Function @ 00007ff7`194955bc)
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!napi_env__::CallIntoModule+0x4c (Inline Function @ 00007ff7`194955bc)
0000007a`4a7fdbb0 00007ff7`195d0388     : aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa 0000007a`4a7fdcb0 00007ff7`1f5a02e0 : tonfotos!`anonymous namespace'::uvimpl::Work::AfterThreadPoolWork+0xdc
0000007a`4a7fdc90 00007ff7`18fabcbc     : aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa : tonfotos!uv__work_done+0xc8
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!uv_process_reqs+0x16f (Inline Function @ 00007ff7`18fabcbc)
0000007a`4a7fdcf0 00007ff7`18f7872b     : 00000000`00000000 00006832`002d1080 00006832`0060e900 00006832`00708000 : tonfotos!uv_run+0x1ec
0000007a`4a7fed90 00007ff7`18f78b22     : 00000000`000007bc 00007ff8`00000000 00000000`00000000 00007ff7`1aa63023 : tonfotos!node::Environment::CleanupHandles+0x14b
0000007a`4a7fee00 00007ff7`18f4502c     : 00007ff7`1f4f6260 00007ff7`1f4f6260 00006832`00708380 00007ff7`18f785c5 : tonfotos!node::Environment::RunCleanup+0xc2
0000007a`4a7ff080 00007ff7`1798c4f4     : aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa 00007ff7`1f4f6260 aaaaaaaa`aaaaaaaa : tonfotos!node::FreeEnvironment+0x6c
0000007a`4a7ff0e0 00007ff7`1797d723     : aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa 00000000`00000030 0000007a`4a7ff248 : tonfotos!electron::NodeEnvironment::~NodeEnvironment+0x14
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!std::__1::default_delete<electron::NodeEnvironment>::operator()+0x8 (Inline Function @ 00007ff7`1797d723)
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!std::__1::unique_ptr<electron::NodeEnvironment,std::__1::default_delete<electron::NodeEnvironment> >::reset+0x19 (Inline Function @ 00007ff7`1797d723)
0000007a`4a7ff110 00007ff7`1846ca9a     : 0000007a`4a7ff270 0000007a`4a7ff538 00000000`00000000 0000007a`4a7ff268 : tonfotos!electron::ElectronBrowserMainParts::PostMainMessageLoopRun+0xc3
0000007a`4a7ff1c0 00007ff7`1846e2ed     : 00006832`002bc8a0 0000007a`4a7ff538 00000000`00000000 aaaaaaaa`aaaaaaaa : tonfotos!content::BrowserMainLoop::ShutdownThreadsAndCleanUp+0x1ca
0000007a`4a7ff2d0 00007ff7`18469bb8     : 0000007a`4a7ff4a0 00007ff7`1aed54ee 00006832`00291970 00006832`002408c0 : tonfotos!content::BrowserMainRunnerImpl::Shutdown+0xad

This looks like it might be an attempt to shutdown libuv whilst threads from its worker pool are still processing data. If you've not seen it, Electron provides an API to handle graceful shutdowns - https://www.electronjs.org/docs/latest/api/app#event-before-quit

The stack dump mentions Electron v14.2.6, which is EOL and uses an out-dated Node.js, so I'd recommend upgrading that bit first.

@TomaterID
Author

Sorry, I am not sure I understand. Yes, there is a 'before-quit' event in Electron and I use it. Is there something I should also do with the sharp library in that handler in order to avoid such crashes?

Updating Electron is a huge effort, as it contains lots of breaking changes, and the current version works just fine. In any case, Electron does not look like the one causing the crashes, so I don't think this is the priority right now.

@TomaterID
Author

By the way, I provided you with just one crash dump, but the thing is, this is what happens regularly for different users. And the stack is always the same, it always has this sharp_win32_x64+0x294cc line.
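
Because the recurring frame is a module-relative offset, dumps from different users can be grouped on it even without symbols, as long as they come from the same build of the binary. A minimal sketch of such bucketing, assuming each dump has already been decoded to plain text (for example by cdb.exe) and that unsymbolized frames have the form module+0xOFFSET; the function names are hypothetical:

```javascript
// Sketch: bucket decoded stack traces by their first unsymbolized frame
// inside a given module, e.g. "sharp_win32_x64+0x294cc".
// Assumption: traces are plain text; symbolized frames ("module!func+0x..")
// are not handled here.
function signatureFrame(trace, moduleName) {
  // Escape regex metacharacters in the module name before building the pattern.
  const escaped = moduleName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`\\b${escaped}\\+0x[0-9a-fA-F]+`);
  const match = trace.match(pattern);
  return match ? match[0] : null;
}

function bucketBySignature(traces, moduleName) {
  const buckets = new Map();
  for (const trace of traces) {
    const key = signatureFrame(trace, moduleName) ?? '(no frame in module)';
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}

// Example input built from frames quoted in this thread.
const exampleTraces = [
  'KERNELBASE!RaiseException+0x69\nsharp_win32_x64+0x294cc\nsharp_win32_x64+0xbac6',
  'KERNELBASE!RaiseException+0x69\nsharp_win32_x64+0x294cc\nsharp_win32_x64+0x421e2',
  'KERNEL32!BaseThreadInitThunk+0x1d\nntdll!RtlUserThreadStart+0x28',
];
console.log(bucketBySignature(exampleTraces, 'sharp_win32_x64'));
```

The first two example traces land in the same bucket because they share the topmost sharp frame; the third has no frame in the module and is counted separately.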

@lovell
Owner

lovell commented Feb 24, 2023

Did you see sharp.counters() and the sharp.queue event emitter? These will allow you to check for any in-flight processing. You should prevent shutdown until these are all complete, i.e. counters are all zero.

https://sharp.pixelplumbing.com/api-utility#counters
https://sharp.pixelplumbing.com/api-utility#queue
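
One way to act on this advice is to poll the counters and hold the quit until they reach zero. A minimal sketch, assuming sharp.counters() returns an object of the form { queue, process } as documented at the links above; isIdle, waitForIdle and the Electron wiring shown in the trailing comment are hypothetical helpers, not part of sharp or Electron:

```javascript
// Sketch: delay shutdown until sharp reports no queued or in-flight work.
// `getCounters` is injected so the helper can be exercised without sharp;
// in an application it would be `() => sharp.counters()`.
function isIdle(counters) {
  return counters.queue === 0 && counters.process === 0;
}

// Resolves to true once the counters are all zero, or to false if
// `timeoutMs` elapses first (so a stuck task cannot block quitting forever).
function waitForIdle(getCounters, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const check = () => {
      if (isIdle(getCounters())) return resolve(true);
      if (Date.now() >= deadline) return resolve(false);
      setTimeout(check, intervalMs);
    };
    check();
  });
}

// Hypothetical Electron wiring (not executed here):
//
//   let draining = false;
//   app.on('before-quit', (event) => {
//     if (draining) return;        // second pass: let the quit proceed
//     event.preventDefault();      // hold the quit while work drains
//     draining = true;
//     stopSchedulingNewWork();     // hypothetical app-specific hook
//     waitForIdle(() => sharp.counters()).then(() => app.quit());
//   });
```

The timeout is a design choice for this sketch: it trades a possible crash at exit for never hanging the quit indefinitely.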

@lovell
Owner

lovell commented Feb 24, 2023

Upgrading Electron to a version with Node.js 16 would bring in nodejs/node#35021 that might fix this.

@kleisauke
Contributor

Building a PDB locally for sharp v0.29.3 using this patch:

--- a/node_modules/sharp/binding.gyp
+++ b/node_modules/sharp/binding.gyp
@@ -205,7 +205,8 @@
               'VCCLCompilerTool': {
                 'ExceptionHandling': 1,
                 'Optimization': 1,
-                'WholeProgramOptimization': 'true'
+                'WholeProgramOptimization': 'true',
+                'DebugInformationFormat': 3 # Generate a PDB
               },
               'VCLibrarianTool': {
                 'AdditionalOptions': [

Produces this stack trace in WinDbg (the .symopt +0x40 option was used to load mismatched PDBs):

00 00007ff8`b33b94cc     : 00006832`00e3df80 00000000`00000000 0000007a`4a7fc370 00007ff8`b3392154 : KERNELBASE!RaiseException+0x69
01 00006832`00e3df80     : 00000000`00000000 0000007a`4a7fc370 00007ff8`b3392154 00007ff8`b3390000 : sharp_win32_x64!_invalid_parameter+0x9c [minkernel\crts\ucrt\src\appcrt\misc\invalid_parameter.cpp @ 112] 
02 00000000`00000000     : 0000007a`4a7fc370 00007ff8`b3392154 00007ff8`b3390000 00000000`19930520 : 0x00006832`00e3df80

So, it looks like you ran into #2999, which was fixed in sharp v0.30.0.
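
A note on reading these frames: the return address 00007ff8`b33b94cc shown for frame 00 is the same location that the original, unsymbolized trace printed as sharp_win32_x64+0x294cc. Subtracting the offset from the address gives the address at which the module was loaded in that process, which is why the offset stays the same across dumps of the same binary while absolute addresses can differ from one process to the next. A small worked example (helper names are hypothetical):

```javascript
// Sketch: recover a module's load address from one WinDbg frame.
// WinDbg prints 64-bit addresses as "hhhhhhhh`llllllll"; BigInt is used
// because these values exceed Number's exact integer range.
function parseWinDbgAddress(text) {
  return BigInt('0x' + text.replace('`', ''));
}

function moduleBase(returnAddress, offsetHex) {
  return parseWinDbgAddress(returnAddress) - BigInt(offsetHex);
}

// Two frames from the original dump give the same base.
const baseA = moduleBase('00007ff8`b33b94cc', '0x294cc');
const baseB = moduleBase('00007ff8`b339bac6', '0xbac6');
console.log('0x' + baseA.toString(16)); // 0x7ff8b3390000
console.log(baseA === baseB);           // true
```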

@kleisauke
Contributor

> So, it looks like you ran into #2999, which was fixed in sharp v0.30.0.

Actually, the stack trace of that issue was:

ucrtbase!invoke_watson+0x18
ucrtbase!_invalid_parameter_internal+0x39260
ucrtbase!invalid_parameter+0x2c
ucrtbase!invalid_parameter_noinfo+0x9
ucrtbase!_get_osfhandle+0x3bf1f
libvips_42!vips__open+0x51
libvips_42!vips_tracked_open+0xa
libvips_42!vips_target_write_amp+0x655
libvips_42!vips_object_build+0x19
libvips_42!vips_target_new_to_file+0xae
libvips_42!vips_jpegsave_mime+0x996
libvips_42!vips_object_build+0x19
libvips_42!vips_cache_operation_buildp+0x43
libvips_cpp!vips::VImage::call_option_string+0x80 [node_modules\sharp\src\libvips\cplusplus\VImage.cpp @ 536] 
libvips_cpp!vips::VImage::call+0x11 [node_modules\sharp\src\libvips\cplusplus\VImage.cpp @ 562] 
libvips_cpp!vips::VImage::jpegsave+0xea [node_modules\sharp\src\libvips\cplusplus\vips-operators.cpp @ 1777] 
sharp_win32_x64!PipelineWorker::Execute+0x5cb9 [node_modules\sharp\src\pipeline.cc @ 975] 
sharp_win32_x64!Napi::AsyncWorker::OnExecute+0x1e [node_modules\node-addon-api\napi-inl.h @ 4890] 
node!v8::base::SharedMutex::SharedMutex+0x8e8
node!uv_queue_work+0x28e
node!uv_poll_stop+0xed
node!inflateValidate+0x24c30
KERNEL32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28

So, it's probably not related to that.

@TomaterID
Author

@lovell thank you for the clarification. Actually, I have lots of other native stuff going on in the background of my application, but I don't need to do any magic in 'before-quit' to wait for it to finish, as Electron (the Node core) just waits for all running tasks to finish before killing the process. I have pretty heavy computations for face recognition, and Node waits for them to finish without issue. Therefore my guess is that sharp should behave the same way, unless there is a bug in sharp. My point is, rather than building strange workarounds it would make sense to figure out what exactly is wrong and maybe fix it.

As I can see from the discussion (thank you guys!), there is no clarity on what exactly this bug is. Therefore my plan should probably remain as it was:

  • Generate .pdb and new sharp.node (AFAIK you can't build compatible PDB for old code, the only way is to build one together with code)
  • Release new version, collect new dumps
  • Analyze the problem with confidence and then decide what to do - fix/upgrade/make workaround in 'before-quit'/etc.

Please let me know what you think. In the meantime I will try to figure out how to compile sharp.node when it is installed as an npm library.

@lovell
Owner

lovell commented Feb 28, 2023

Were you able to add a call to sharp.counters() in the before-quit event handler to verify the queue length? If so, have you been able to confirm that this is always zero when Electron shuts down all threads?

@TomaterID
Author

I don't believe this will tell us anything about this issue, for multiple reasons:

  • In Electron, 'will-quit' is called when the app only intends to quit, not right before it is killed. At that moment sharp.counters() could legitimately be >0, but that does not mean anything. In response to this event I initiate shutdown of background processing, and sometimes several seconds pass between this call and the actual moment the program shuts down (i.e. all threads have finished their execution). I guess you would like to know the counters at that final moment, but this is not where 'will-quit' can help us.
  • Moreover, even though this bug occurs from time to time, it only occurs in production, at the user's site. I cannot reproduce it in my test environment. I believe it would be helpful to know the counter values right before the program crashes, but I have no idea how to get those values delivered to me together with the crash log.

@lovell
Owner

lovell commented Mar 1, 2023

Commit 4ec883e addresses what could be the source of a possible race condition at shutdown, especially when using Node.js 14, as it sometimes becomes no longer possible to call back into JS from C++. This will be in v0.32.0.

Node.js 16 improves the situation too, via commits nodejs/node@e326c41 and nodejs/node@602fb3b

@TomaterID
Author

Quick update from my side. I managed to modify my build system so that I now have a matching .node file in the production build and a .pdb file in my archive for future crash dump analysis. That was tricky, since npm ci was overwriting the locally built sharp.node with the downloaded prebuilt binary every time. But I managed to work around that with a few scripts.

Long story short, I now have the full stack of the crash. However, it does not really shed more light (at least for me) on the reasons for it, as it all happens during the closing stage. But we already knew that.

In any case, I am updating production to 0.31.3 now, and we will see whether that fixes this particular issue. So far in my tests everything seems to work fine and I don't see any incompatibilities. We will see about crashes after I release it.

However, although I can now get a crash stack with symbols for sharp.node, apparently this is not the biggest issue. There are many more crashes inside libvips-42, and that one is much trickier to build, let alone build with a PDB. I spent a few hours figuring out how to build the library and managed to do it using build-win64-mxe, but I can't go any further and build a PDB, as I have zero knowledge of the build tools you guys are using.

I hope some of the errors will go away with the version update, but it is hard to believe that all of them will, as there is a wide variety of them.

I would like to ask for your help with instructions on how to build libvips with a PDB. I assume it would be beneficial for everyone if I were able to provide you with crash dumps with symbols from actual production use.

However, the ideal solution would be to add PDBs as a target to your build system, so that every time you publish a .zip with prebuilt binaries to GitHub, you would also publish a .zip with the matching PDBs. But that is more of a Christmas wish; I would be grateful for any help, such as a hint on how to build the PDBs myself.

@TomaterID
Author

TomaterID commented Mar 28, 2023

Status update. I have updated sharp to 0.31.3 (the latest at that moment), and enough time has now passed since my release to analyze the results. Overall, the number of crashes seems to have decreased by a factor of two or so (though this is a guesstimate rather than an accurate calculation).

Nevertheless, the issue that I described at the beginning of this thread is still there, though it is not as frequent now. I can provide dump files if you'd like.

Any suggestions?

Also, any ideas on how to make PDB's for libvips?

@lovell
Owner

lovell commented Apr 27, 2023

@TomaterID Are you still seeing this problem with the latest sharp v0.32.1?

@TomaterID
Author

I still see it on 0.31.3, which was the latest at that moment. Please give me some time to update to 0.32.1.

@TomaterID
Author

Status update. I did some sorting of crash reports after updating to 0.32.1. I'm afraid the problem is still there:

[screenshot]

I can send .dmp files if you want; there are many of them. I am actually getting overwhelmed with sharp crashes (most of them in libvips, though). As the popularity of my application grows, I am becoming less confident that I can continue with sharp being so unreliable. Currently it is responsible for 95%+ of all crashes of my application on the customer side.

I would appreciate any thoughts and suggestions, not only about this particular crash but also about all the other crashes that come from libvips. I can provide you with lots of .dmp files for debugging, but with no access to libvips PDBs, and no way to build them myself, I am not sure those .dmp files will be of any use to anyone.

Please help.

@lovell
Owner

lovell commented May 8, 2023

Thanks for the updates. Which version of Electron are you using now? Does it include a version of Node.js with commits nodejs/node@e326c41 and nodejs/node@602fb3b ?

@TomaterID
Author

I am currently using Electron 14.2.6; I am not sure how to check whether a certain commit is included.

@lovell
Owner

lovell commented May 10, 2023

https://github.com/electron/electron/blob/v14.2.6/DEPS#L20 says Node.js v14.17.0, which would suggest not, unless it has been patched.

A possible side effect of not having commit nodejs/node@e326c41 would look like the crash highlighted in #3569 (comment)

I highly recommend you upgrade to a version of Electron that uses Node.js 16+.

@TomaterID
Author

@lovell ok, thanks. I will update to Electron 22 (the last one to support Windows 7/8/8.1, which is still a significant portion of users). That will involve lots of testing, but I had to do that anyway, sooner or later.

I hope that will help us get rid of this particular issue at least; it accounts for roughly 25% of crashes, so that is a good start. I will let you know about the results after a while.

@TomaterID
Author

With Electron v22.3.8 and sharp v0.32.1, the error is still there.

If you want, I can provide .dmp files.

[screenshot]

@kleisauke
Contributor

> Also, any ideas on how to make PDB's for libvips?

I'll look into producing PDBs by default for release builds, to facilitate post-mortem debugging with Windows Debugger. When building libvips' Windows binaries with the --with-debug option it produces debug info in DWARF format by default, this means you currently need to debug with GDB or LLDB.

Looking at the stack trace above, I'm not sure if this is a problem with libvips (or any of its dependencies), as I'm not seeing any libvips_42! references within the stack trace. AFAIK, this call stack information should also be present without having to recompile the binaries with debug symbols, e.g. the stack trace mentioned in #3569 (comment) was done on the released binaries.

Without a reproduction it'll be hard to debug further. Does this crash only happen during shutdown? If so, it may be worth recompiling sharp with the NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS definition, see: nodejs/node-addon-api#975.

By default, throwing an exception on a terminating environment (eg. worker threads) will cause a fatal exception, terminating the Node process. This is to provide feedback to the user of the runtime error, as it is impossible to pass the error to JavaScript when the environment is terminating. In order to bypass this behavior such that the Node process will not terminate, define the preprocessor directive NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS.
https://github.com/nodejs/node-addon-api/blob/main/doc/setup.md

@TomaterID
Author

TomaterID commented May 23, 2023

> I'll look into producing PDBs by default for release builds, to facilitate post-mortem debugging with Windows Debugger.

Thank you very much, @kleisauke! That would help a lot, since unfortunately sharp is not yet stable enough and crashes a lot, and most of that happens inside libvips. Unfortunately it happens on the end-user side and all I can get is crash dumps. Having PDBs would simplify troubleshooting a lot.

> Looking at the stack trace above, I'm not sure if this is a problem with libvips (or any of its dependencies), as I'm not seeing any libvips_42! references within the stack trace. AFAIK, this call stack information should also be present without having to recompile the binaries with debug symbols, e.g. the stack trace mentioned in #3569 (comment) was done on the released binaries.

You are totally right, this one happens inside sharp itself, which is why I started by reporting it first. This is a very frequent one, but it is just one of many, and most of the others are inside libvips. I will be posting separate issues as I process those dumps and triage them (unfortunately, this is a very laborious process). I have already posted one of them: #3677. Please stay tuned for more.

> Without a reproduction it'll be hard to debug further. Does this crash only happen during shutdown?

Crash dumps are all we have, unfortunately. I was not able to reproduce any of it myself and have zero idea about the circumstances. I hoped that reading the stack trace would give you more information about the situation in which it happens and a hint about how to reproduce it. @lovell had a theory that this is one of the already fixed bugs, but unfortunately I cannot confirm that: after updating sharp and Electron it is still there.

However, if this one only happens during shutdown, then it may turn out not to be a real problem for users, unlike other crashes that happen during photo archive processing and are pretty annoying, as you have to restart the program many times until your archive gets processed.

Anyway, I am open to your suggestions and ready to provide any support, including crash dumps. I would even love to debug it myself and push a PR, but unfortunately without PDBs there is nothing I can do.

kleisauke added a commit to libvips/build-win64-mxe that referenced this issue May 23, 2023
@kleisauke
Contributor

Commit libvips/build-win64-mxe@676260e adds support for this.

The statically-linked 64-bit libvips Windows binaries and corresponding PDB files built from that commit can be found here:
https://libvips-packaging.s3.amazonaws.com/vips-dev-w64-web-8.14.2-static.zip
https://libvips-packaging.s3.amazonaws.com/vips-pdb-w64-web-8.14.2-static.zip

Hope this helps.

@TomaterID
Author

@kleisauke thank you so much!

What do I need to do to make sure my production code has corresponding PDB files? Do I wait for next release or just replace binaries in latest version?

@TomaterID
Author

I have just submitted another crash: #3679

If we can fix these three (#3569, #3677 and #3679), my app would crash roughly four times less often, which is a big deal. I am looking forward to more information on how to apply PDBs to a production build so I can get actionable crash dumps. As you know, you can only use a PDB that was built together with the binary that crashed, so I need to release those binaries to the public first; then we will be able to collect dumps that match those PDBs.

kleisauke added a commit to libvips/build-win64-mxe that referenced this issue May 23, 2023
@kleisauke
Contributor

> What do I need to do to make sure my production code has corresponding PDB files?

As a best effort, I rebased the above commit on top of the released v8.14.2 binaries. You can find these PDB files here:
https://libvips-packaging.s3.amazonaws.com/vips-pdb-w64-web-8.14.2-static-rebased.zip

However, I'm not sure if the Visual Studio debugger is able to associate these PDBs to the previous released binaries, since the COFF debug directory is not available on those DLLs. AFAIK, WinDbg would load these mismatched PDBs without any issues using the .symopt +0x40 option.

@TomaterID
Author

You can be sure that VC will not load those PDBs even if you build them from the same sources. For some reason it builds different ones every time, and I guess that VC checks the checksums. For that reason I now build sharp myself for production in order to get matching PDBs.

I had never heard of the .symopt +0x40 option. I will check whether I can make it work with existing dumps in WinDbg and will report back here later.

@TomaterID
Author

> However, I'm not sure if the Visual Studio debugger is able to associate these PDBs to the previous released binaries, since the COFF debug directory is not available on those DLLs. AFAIK, WinDbg would load these mismatched PDBs without any issues using the .symopt +0x40 option.

This approach seems to work fine with WinDbg, thank you! I will soon provide more info about crashes.

@TomaterID
Author

Just in case, here is a fresh stack trace for this particular crash. Using the .symopt trick I now have line numbers:

.  0  Id: 604.24a0 Suspend: 0 Teb: 00000001`2f274000 Unfrozen ""
Child-SP          RetAddr               : Args to Child                                                           : Call Site
00000001`2fbf99f8 00007ffc`5832b44e     : 00000001`2fbf9ab8 00000001`2fbfb950 00000000`00035fef 00000000`00000028 : ntdll!NtDelayExecution+0x14
*** WARNING: Unable to verify checksum for tonfotos.exe
00000001`2fbf9a00 00007ff7`f4a71387     : 00000000`00000002 00007ffc`00000000 00000001`c00000bb 00000000`00000000 : KERNELBASE!SleepEx+0x9e
00000001`2fbf9aa0 00007ffc`5840dd57     : 00000000`00000000 00007ff7`f4a712c0 00000000`00000000 00007ff7`f28b4d90 : tonfotos!crashpad::`anonymous namespace'::UnhandledExceptionHandler+0xc7 [C:\projects\src\third_party\crashpad\crashpad\client\crashpad_client_win.cc @ 186]
00000001`2fbf9c20 00007ffc`5a8f54f0     : 00000001`2fbf9e60 00007ffc`5a997618 00000000`00000000 00000001`2fbf9df8 : KERNELBASE!UnhandledExceptionFilter+0x1e7
00000001`2fbf9d40 00007ffc`5a8dc876     : 00007ffc`5a9c4a24 00007ffc`5a850000 00000001`2fbf9e60 00007ffc`5a880e7b : ntdll!memset+0x13b0
00000001`2fbf9d80 00007ffc`5a8f23df     : 00000000`00000000 00000001`2fbfa360 00000001`2fbfad40 00000000`00000000 : ntdll!_C_specific_handler+0x96
00000001`2fbf9df0 00007ffc`5a8a14a4     : 00000000`00000000 00000001`2fbfa360 00000001`2fbfad40 00000000`00000001 : ntdll!_chkstk+0x11f
00000001`2fbf9e20 00007ffc`5a8a11f5     : 00000000`00000000 00000001`2fbfabe0 00000000`00000000 00000001`2fbfa570 : ntdll!RtlRaiseException+0x434
00000001`2fbfa530 00007ffc`5830cf19     : 00000000`00000000 00007ffc`3a03bbe0 00000001`2fbfaef0 00000001`2fbfaef0 : ntdll!RtlRaiseException+0x185
*** WARNING: Unable to verify checksum for sharp-win32-x64.node
00000001`2fbfad20 00007ffc`3a00d2cc     : 00000001`2fbfd130 00002670`00020cc0 00000001`2fbfd430 64342d33`3139342d : KERNELBASE!RaiseException+0x69
00000001`2fbfae00 00007ffc`39fede33     : 00000001`2fbfd2e0 00002670`0015c008 00000001`2fbfd2e0 00000001`2fbfd130 : sharp_win32_x64!_CxxThrowException+0x90 [d:\a01\_work\12\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
00000001`2fbfae60 00007ffc`3a026182     : 00000001`2fbfbd70 00007ffc`3a00dea5 00000001`2fbfb058 00000001`2fbfd130 : sharp_win32_x64!Napi::Error::ThrowAsJavaScriptException+0xe3 [D:\Programming\tonfotos\node_modules\sharp\node_modules\node-addon-api\napi-inl.h @ 3077]
00000001`2fbfaf60 00007ffc`3a011480     : 00007ffc`3a02616c 00000001`2fbfd430 00000001`2fbfd430 0000020c`00e5a4b5 : sharp_win32_x64!`Napi::details::WrapCallback<<lambda_5b9db19950e03d93a469a94c39aaa749> >'::`1'::catch$5+0x16 [D:\Programming\tonfotos\node_modules\sharp\node_modules\node-addon-api\napi-inl.h @ 81]
00000001`2fbfafa0 00007ffc`3a0103dd     : 00007ffc`3a02616c 00000001`2fbfbed8 00000000`00000100 00002670`0015c130 : sharp_win32_x64!_CallSettingFrame_LookupContinuationIndex+0x20 [d:\a01\_work\12\s\src\vctools\crt\vcruntime\src\eh\amd64\handlers.asm @ 98]
00000001`2fbfafd0 00007ffc`5a8f1766     : 00000000`00000000 00002670`00000002 0000020c`00000000 00000001`2fbfd130 : sharp_win32_x64!__FrameHandler4::CxxCallCatchBlock+0x115 [d:\a01\_work\12\s\src\vctools\crt\vcruntime\src\eh\frame.cpp @ 1393]
00000001`2fbfb0b0 00007ffc`39ff291b     : 00000000`00000028 00000154`f7fedd30 00002670`00020cc0 00000001`2fbfd5a0 : ntdll!RtlCaptureContext2+0x4a6 (TrapFrame @ 00000001`2fbfb438)
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : sharp_win32_x64!Napi::AsyncWorker::OnWorkComplete::__l5::<lambda_5b9db19950e03d93a469a94c39aaa749>::operator()+0xd (Inline Function @ 00007ffc`39ff291b) [D:\Programming\tonfotos\node_modules\sharp\node_modules\node-addon-api\napi-inl.h @ 5195]
00000001`2fbfd430 00007ffc`39feec44     : 00000001`2fbfead0 00007ff7`f2646ef9 aaaaaaaa`00000000 00002670`000dc000 : sharp_win32_x64!Napi::details::WrapCallback<<lambda_5b9db19950e03d93a469a94c39aaa749> >+0x2b [D:\Programming\tonfotos\node_modules\sharp\node_modules\node-addon-api\napi-inl.h @ 79]
00000001`2fbfd4e0 00007ff7`f147edaa     : 00002670`01305658 00000000`00000001 00002670`01305600 00002670`00214000 : sharp_win32_x64!Napi::AsyncWorker::OnWorkComplete+0x44 [D:\Programming\tonfotos\node_modules\sharp\node_modules\node-addon-api\napi-inl.h @ 5193]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!`anonymous namespace'::uvimpl::Work::AfterThreadPoolWork::<lambda_1>::operator()+0x40 (Inline Function @ 00007ff7`f147edaa) [C:\projects\src\third_party\electron_node\src\node_api.cc @ 1108]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!napi_env__::CallIntoModule+0x56 (Inline Function @ 00007ff7`f147edaa) [C:\projects\src\third_party\electron_node\src\js_native_api_v8.h @ 88]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!node_napi_env__::CallbackIntoModule+0x56 (Inline Function @ 00007ff7`f147edaa) [C:\projects\src\third_party\electron_node\src\node_api.cc @ 82]
00000001`2fbfd580 00007ff7`f15baac2     : 00000000`00000000 00000000`00000000 00002670`00214870 00007ff7`f815f4b0 : tonfotos!`anonymous namespace'::uvimpl::Work::AfterThreadPoolWork+0xea [C:\projects\src\third_party\electron_node\src\node_api.cc @ 1107]
00000001`2fbfd670 00007ff7`f0eb7a2d     : 00000000`00000000 00000000`00000000 00007ff7`f815f3e8 00000000`00000000 : tonfotos!uv__work_done+0xc2 [C:\projects\src\third_party\electron_node\deps\uv\src\threadpool.c @ 312]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!uv_process_reqs+0x161 (Inline Function @ 00007ff7`f0eb7a2d) [C:\projects\src\third_party\electron_node\deps\uv\src\win\req-inl.h @ 194]
00000001`2fbfd6d0 00007ff7`f0e89c53     : 00002670`0009c7e0 00002670`00167980 00000000`00000000 00002670`00214000 : tonfotos!uv_run+0x1dd [C:\projects\src\third_party\electron_node\deps\uv\src\win\core.c @ 619]
00000001`2fbfe770 00007ff7`f0e8a573     : 00000000`00000000 aaaaaaaa`aaaaaaaa aaaaaaaa`aaaaaaaa 00000000`6ca27b90 : tonfotos!node::Environment::CleanupHandles+0x163 [C:\projects\src\third_party\electron_node\src\env.cc @ 1033]
00000001`2fbfe7e0 00007ff7`f0e49ce2     : aaaaaaaa`aaaaaaaa 00007ff7`f736d890 00000000`00000021 00007ffc`58351568 : tonfotos!node::Environment::RunCleanup+0x223 [C:\projects\src\third_party\electron_node\src\env.cc @ 1087]
00000001`2fbfeab0 00007ff7`ef26f514     : 00000000`00000000 00000001`2fbfebe0 00000001`2fbfec10 00002670`00020000 : tonfotos!node::FreeEnvironment+0xb2 [C:\projects\src\third_party\electron_node\src\api\environment.cc @ 396]
00000001`2fbfeb80 00007ff7`ef25ca1f     : 00000000`00000000 00000000`00000000 00000000`00000040 00aaaaaa`aaaaaaaa : tonfotos!electron::NodeEnvironment::~NodeEnvironment+0x14 [C:\projects\src\electron\shell\browser\javascript_environment.cc @ 310]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!std::Cr::default_delete<electron::NodeEnvironment>::operator()+0x8 (Inline Function @ 00007ff7`ef25ca1f) [C:\projects\src\buildtools\third_party\libc++\trunk\include\__memory\unique_ptr.h @ 49]
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!std::Cr::unique_ptr<electron::NodeEnvironment,std::Cr::default_delete<electron::NodeEnvironment> >::reset+0x19 (Inline Function @ 00007ff7`ef25ca1f) [C:\projects\src\buildtools\third_party\libc++\trunk\include\__memory\unique_ptr.h @ 281]
00000001`2fbfebb0 00007ff7`f0158d84     : 00000001`2fbfed40 00000001`2fbfed20 00000001`2fbfed50 00000001`2fbfed48 : tonfotos!electron::ElectronBrowserMainParts::PostMainMessageLoopRun+0x22f [C:\projects\src\electron\shell\browser\electron_browser_main_parts.cc @ 628]
00000001`2fbfeca0 00007ff7`f015a8ae     : aaaaaaaa`aaaaaaaa 00000000`00000000 00000000`67e9ed47 00000000`67df8e0a : tonfotos!content::BrowserMainLoop::ShutdownThreadsAndCleanUp+0x1c4 [C:\projects\src\content\browser\browser_main_loop.cc @ 1090]
00000001`2fbfedb0 00007ff7`f0155f28     : 00000000`00000010 00000000`003d0900 00000000`00000000 00000000`00000000 : tonfotos!content::BrowserMainRunnerImpl::Shutdown+0x8e [C:\projects\src\content\browser\browser_main_runner_impl.cc @ 191]
00000001`2fbfee50 00007ff7`ef3f51ef     : 0000d26d`7da4b02e 00000000`00000018 00000000`ffffffff 00007ff7`f737b440 : tonfotos!content::BrowserMain+0xd8 [C:\projects\src\content\browser\browser_main.cc @ 35]
00000001`2fbfef10 00007ff7`ef3f68b0     : 00000000`00000001 00007ff7`f0ecb368 aaaaaaaa`aaaaaaaa 00007ff7`f2c9af06 : tonfotos!content::RunBrowserProcessMain+0xdf [C:\projects\src\content\app\content_main_runner_impl.cc @ 716]
00000001`2fbff010 00007ff7`ef3f6258     : 00000000`00000001 00000000`00000008 00007ff7`f751000b 0000d26d`7da4afbe : tonfotos!content::ContentMainRunnerImpl::RunBrowser+0x610 [C:\projects\src\content\app\content_main_runner_impl.cc @ 1257]
00000001`2fbff190 00007ff7`ef3f2670     : 0000fee7`cdb0a857 00007ffc`5a8834f1 00000000`00000000 00007ff7`f751e858 : tonfotos!content::ContentMainRunnerImpl::Run+0x358 [C:\projects\src\content\app\content_main_runner_impl.cc @ 1116]
00000001`2fbff2d0 00007ff7`ef3f27af     : 00000001`2fbff601 00000154`cee84a48 00000154`cee84a50 00007ff7`f80fb8f0 : tonfotos!content::RunContentProcess+0x610 [C:\projects\src\content\app\content_main.cc @ 346]
00000001`2fbff530 00007ff7`ef16d0d1     : 00000000`00000010 00000001`2fbff6c1 00007ff7`eefa0000 00000000`00000000 : tonfotos!content::ContentMain+0x6f [C:\projects\src\content\app\content_main.cc @ 374]
00000001`2fbff5d0 00007ff7`f2fe3e62     : 00000000`00000000 00007ff7`f2fe3ed9 00000000`00000000 00000000`00000000 : tonfotos!wWinMain+0x3c1 [C:\projects\src\electron\shell\app\electron_main_win.cc @ 244]
*** WARNING: Unable to verify checksum for KERNEL32.DLL
(Inline Function) --------`--------     : --------`-------- --------`-------- --------`-------- --------`-------- : tonfotos!invoke_main+0x21 (Inline Function @ 00007ff7`f2fe3e62) [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 118]
00000001`2fbff7c0 00007ffc`59cc7614     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : tonfotos!__scrt_common_main_seh+0x106 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
00000001`2fbff800 00007ffc`5a8a26a1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x14
00000001`2fbff830 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

@lovell
Owner

lovell commented May 26, 2023

Thanks for the updates. Given you're compiling your own sharp, and if you've not already tried, please can you add NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS to the defines here and see if it helps.

sharp/binding.gyp

Lines 72 to 74 in 3340120

'defines': [
  'NAPI_VERSION=7'
],
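
A minimal sketch of what that change might look like, assuming the rest of the target in binding.gyp is left untouched and the new define is simply appended to the existing list. With this define set, node-addon-api is expected to swallow a JavaScript exception that can no longer be thrown because the environment is shutting down, which matches the `ThrowAsJavaScriptException` frame reached from `FreeEnvironment` in the trace above:

```python
# Sketch only: surrounding keys of the target are omitted.
'defines': [
  'NAPI_VERSION=7',
  # Assumption: appended as a bare define with no value
  'NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS'
],
```

The native module has to be rebuilt from source after the edit for the define to take effect.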

@TomaterID
Author

OK, will do. We'll have to wait for results.

@lovell
Owner

lovell commented Jun 3, 2023

I've added NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS via commit f5845c7 - this will be part of v0.32.2

@lovell
Owner

lovell commented Jul 21, 2023

Closing as this was superseded by #3677

@lovell lovell closed this as completed Jul 21, 2023