Repeated crash on 4.3.2-alpine with no log output #256

Closed
noahsbwilliams opened this issue Apr 24, 2021 · 30 comments · Fixed by #263

Comments

@noahsbwilliams

noahsbwilliams commented Apr 24, 2021

Container mysteriously exits without error output (docker logs ghost shows only a standard restart message).

Site shows "We'll be right back" / updating screen during startup.

Tested with SQLite & MySQL.

Downgrading to 4.2.2 appears to fix temporarily.

Multiple other users have reported the same issue.

@pascalandy
Contributor

pascalandy commented Apr 25, 2021

EDIT: it crashes when I upload a picture.

@Gelmo

Gelmo commented Apr 25, 2021

I am able to reproduce this issue consistently by uploading a page image. Reproducible on 4.3.2, not reproducible on 4.2.0

@Gelmo

Gelmo commented Apr 25, 2021

@pascalandy Does the issue occur when you upload a page image for a new Page?

@leandrotoledo

Here it's been crashing when it serves static content I've uploaded previously.

Version: 4.3.2-alpine
Database: sqlite3
Theme: attila (attila-3.0.0)

Using the non-alpine image temporarily fixes the issue.

@pascalandy
Contributor

I was able to replicate the issue as well.
@acburdine maybe you remember: we had a similar issue a while back (upload pic crash on alpine img.)

@acburdine
Collaborator

acburdine commented Apr 28, 2021

I'll try and spend some time later today to see if I can get an idea on what's occurring. Edit: ran out of time today, will do my best to get this looked at either tomorrow or Friday

If I had to take a guess, I'd imagine the sharp dependency got updated, and we're missing some library somewhere on alpine that's causing the crash.

@pascalandy
Contributor

Yeah, I remember this sharp thing, thanks for your time :)

@pascalandy
Contributor

From a core developer from Ghost:

4.2.2 is the latest available version. The 4.3.x versions were deprecated because of a migrations bug.

@acburdine
Collaborator

> The 4.3.x versions were deprecated because of a migrations bug.

This is correct, but the migrations bug isn't related to the issue with alpine.

@acburdine
Collaborator

definitely looks like sharp is the issue here - it bumped from v0.25 => v0.28 recently, which bumped the required version of libvips

@acburdine
Collaborator

Not entirely sure what to do here unfortunately.... Sharp v0.28 requires v8.10.6 of libvips, but the latest version available on the 3.13 branch of alpine's package index is 8.10.5 😞 The edge branch has 8.10.6 available, but I'm not sure if it's a good idea to add the edge branch to the build.
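
The mismatch described here comes down to a version comparison. A minimal Python sketch, using only the version numbers quoted in this comment; the helper names `parse_version` and `satisfies` are hypothetical and are not part of sharp or of Alpine's tooling:

```python
def parse_version(text):
    """Turn a dotted version such as '8.10.6' or 'v8.10.6' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def satisfies(available, required):
    """Return True when the available version is at least the required one."""
    return parse_version(available) >= parse_version(required)


REQUIRED_LIBVIPS = "8.10.6"  # what sharp v0.28 needs, per the comment above
ALPINE_3_13 = "8.10.5"       # newest libvips on the Alpine 3.13 branch
ALPINE_EDGE = "8.10.6"       # libvips available on the edge branch

for name, version in [("alpine 3.13", ALPINE_3_13), ("alpine edge", ALPINE_EDGE)]:
    status = "ok" if satisfies(version, REQUIRED_LIBVIPS) else "too old"
    print(f"{name}: libvips {version} -> {status}")
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "8.9.1" above "8.10.5".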

@acburdine
Collaborator

Ok, it's definitely a sharp issue - after doing some additional debugging/adding some log statements I was able to get this:

[2021-04-29 12:02:34] INFO "GET /ghost/api/canary/admin/users/?limit=all&include=roles" 200 204ms
[2021-04-29 12:02:34] INFO "GET /ghost/api/canary/admin/snippets/?limit=all" 200 174ms
[2021-04-29 12:02:37] INFO "GET /ghost/api/canary/admin/slugs/post/test%20post/" 200 32ms
[2021-04-29 12:02:37] INFO "POST /ghost/api/canary/admin/posts/" 201 135ms
Segmentation fault

That said, I'm not experienced enough with C/C++ to know how to fix/debug much further 😕 In the meantime, I'll look and see if there's a way to improve Ghost's validation checks so that it will not run sharp at all if something didn't install correctly.

@acburdine
Collaborator

Ok - I think I've got at least a temporary solution. Switching back to node 12 for the alpine image fixes the issue for me. Will make a PR for that here shortly.

acburdine added a commit to acburdine/ghost-1 that referenced this issue Apr 29, 2021
refs docker-library#256
- node:14-alpine3.13 has an issue where the underlying sharp image library causes a segfault
- node:12-alpine3.12 doesn't appear to have this problem, so we'll use it for now
acburdine added a commit that referenced this issue Apr 29, 2021
refs #256
- node:14-alpine3.13 has an issue where the underlying sharp image library causes a segfault
- node:12-alpine3.12 doesn't appear to have this problem, so we'll use it for now
@acburdine
Collaborator

the fixed 4-alpine image should hopefully go out today. Will keep this issue open until the underlying issue is discovered/resolved 👍🏻

@pascalandy
Contributor

pascalandy commented Apr 29, 2021

I confirm that rolling back to node 12 fixed the issue for Ghost v4.3.3. See here.

@pascalandy
Contributor

pascalandy commented Apr 29, 2021

As this is not the first time sharp is causing trouble, we should create a test in our CI. I guess we need to:

  1. start the container
  2. test the container is running the default ghost homepage
  3. create a new post
  4. upload a picture in the post

But I don't know how to automate steps 2-3. Here is my GitHub Action test for my Ghost image.
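
As a starting point for step 2 only, a rough Python sketch of a readiness check that polls the homepage until it answers 200. It assumes the container is already running and publishing Ghost's default port 2368 on localhost; `fetch_status` and `wait_until_healthy` are hypothetical helper names. Creating a post and uploading a picture would need authenticated Admin API calls and are not attempted here.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return None


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (assumed URL), left commented out so importing this has no side effects:
# ok = wait_until_healthy("http://localhost:2368/")
```

The `fetch` and `sleep` parameters are injectable so the polling logic can be exercised in a unit test without a running container.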

@tianon
Member

tianon commented Apr 29, 2021

Ooof, sharp is causing issues for all non-amd64 architectures too... 🤕

ERR! sharp Use with glibc 2.28 requires manual installation of libvips >= 8.9.1
info sharp Attempting to build from source via node-gyp but this may fail due to the above error
info sharp Please see https://sharp.pixelplumbing.com/install for required dependencies

(Debian's libvips is only 8.7.4)

@acburdine
Collaborator

acburdine commented Apr 29, 2021

> Ooof, sharp is causing issues for all non-amd64 architectures too...

Hmm, I wonder if that will actually cause an issue with the running container though. Ghost has some checks in the source code and will skip running sharp code if sharp didn't install correctly. The issue afaict with Alpine is that sharp thinks it's installed correctly, and then segfaults whenever there's a call to its code.

That isn't to say we shouldn't try and fix sharp on non-amd64, but I'd be interested to know if it actually breaks the container at present.
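
One way to tell "installed but crashing" apart from "not installed" is to exercise the native code in a throwaway child process and look at how it exits. The following is a rough Python sketch of that idea, not a description of how Ghost performs its checks; `probe` is a hypothetical name, the demo commands are stand-ins for a real call into sharp, and the signal handling assumes a POSIX system.

```python
import signal
import subprocess
import sys


def probe(command, timeout=30):
    """Run command in a child process and classify how it ended."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        signum = -result.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exit code {result.returncode}"


# Stand-in children; a real probe would run a small script that calls the native module.
healthy = [sys.executable, "-c", "print('fine')"]
crashing = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]

print(probe(healthy))
print(probe(crashing))
```

Because the crash happens in the child, the parent survives and can log the result or disable the feature instead of exiting silently.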

@tianon
Member

tianon commented Apr 29, 2021

All the non-amd64 builds of Ghost 4.x are failing completely (although frankly, the Node + Yarn + gyp error spam is really hard to parse through to find the actual root cause for why it's completely balking).

@tianon
Member

tianon commented Apr 29, 2021

(Failing to build, to be clear.)

@acburdine
Collaborator

acburdine commented Apr 29, 2021

ohhhhh ok, didn't realize the build itself was failing.

I have some code that I was working with locally earlier to try and debug the alpine issue that was essentially repeating our "force sqlite3 install" process, but for sharp instead of sqlite - I'll see if that approach works for debian. (though I'm not 100% sure how to test building on a different arch... nvm, I have a Raspberry Pi lying around that I can try and run the build on 😅 )

@tianon
Member

tianon commented Apr 29, 2021

If you're using Docker Desktop (Mac or Windows) you should be able to get pretty far with docker build --platform linux/arm64 (for example), which will do some emulation and might be faster than your Pi. 😄
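
To try several architectures in one go, the same command can be generated per platform. A small Python sketch that only builds and prints the command lines; the `ghost-test` tag prefix and the `build_commands` helper are made up for illustration, and the platform list is taken from the ones mentioned in this thread:

```python
import shlex


def build_commands(platforms, context=".", tag_prefix="ghost-test"):
    """Return one `docker build` argument list per target platform."""
    commands = []
    for platform in platforms:
        # Tags cannot contain '/', so linux/arm/v7 becomes linux-arm-v7.
        suffix = platform.replace("/", "-")
        commands.append(
            ["docker", "build", "--platform", platform, "-t", f"{tag_prefix}:{suffix}", context]
        )
    return commands


for command in build_commands(["linux/amd64", "linux/arm64", "linux/arm/v7"]):
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` directly.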

@acburdine
Collaborator

> docker build --platform linux/arm64

TIL - thanks! 😄

@leandrotoledo

I can confirm it's working now 👍 Thank you!

@pascalandy
Contributor

Yes, it works with linux/amd64, but not with linux/arm64,linux/arm/v7.

From: https://github.com/firepress-org/ghostfire/blob/master/.github/workflows/build.yml#L256

@tianon
Member

tianon commented May 1, 2021

Yep, that's this:

> All the non-amd64 builds of Ghost 4.x are failing completely (although frankly, the Node + Yarn + gyp error spam is really hard to parse through to find the actual root cause for why it's completely balking).

Which turns out to be related to/caused by sharp (it requires too new of a libvips and there aren't precompiled Linux releases of sharp for anything but amd64).

@acburdine
Collaborator

acburdine commented May 1, 2021

I did figure out why the build error's occurring. In theory, sharp's an optional dep and should just be ignored if the build fails; however, we're force-installing sqlite3 with the --build-from-source option on non-amd64 architectures (or rather, arch w/o pre-built binaries). When we attempt to force install sqlite3, sharp also attempts to reinstall + build from source at the same time.

I went down the rabbit hole a bit trying to see if we could get sharp to install/build on non-amd64, and it looks like the only way to get it to work is to upgrade the version of glibc used in Debian Buster, which requires compiling glibc from source, and then (after maybe re-compiling the other intermediary tools used?) compiling libvips from source. I'm not experienced enough with C compilation to really understand all the steps involved, so I probably wouldn't be able to figure out (in a relatively timely manner) how to fix the sharp installation issue.

I'm thinking our best bet to fix the Debian build issue is to figure out how to get yarn to not attempt to re-install/recompile sharp from source when we're attempting to install sqlite3. If we're able to do that, sharp will still not be installed, but the Ghost code already knows how to handle that case and will still function

@pascalandy
Contributor

This was fixed and confirmed by the Ghost Core team:
TryGhost/Ghost#12967 (comment)

I was able to build Ghost using ARG NODE_VERSION="14-alpine3.13"

@acburdine
Collaborator

I'll test updating to node 14 again on alpine to see if it's fixed 👍🏻

acburdine added a commit to acburdine/ghost-1 that referenced this issue Jul 8, 2021
closes docker-library#256
- sharp's issue has been fixed, node 14 doesn't cause segfaults on alpine anymore
@acburdine
Collaborator

confirmed alpine + node 14 works again 🎉

acburdine added a commit that referenced this issue Jul 8, 2021
closes #256
- sharp's issue has been fixed, node 14 doesn't cause segfaults on alpine anymore

6 participants