Docker Helper error on macOS Apple Silicon: "qemu: uncaught target signal 11 (Segmentation fault)" #8841

Closed
sookwalinga opened this issue Feb 5, 2024 · 23 comments

Comments

@sookwalinga

sookwalinga commented Feb 5, 2024

Using Docker helper with the latest version of CHT v4.5.1 results in a 502 Bad Gateway error and the couchdb docker container keeps restarting. It was working fine before with v4.2.4.

I ran it on a MacBook Air with an M1 chip and 8GB RAM.


mrjones adds that @andrablaj was able to reproduce, so this is likely a CHT issue and not a user error or configuration problem:

I confirm I encounter the same error as @sookwalinga (502 Bad Gateway) on my Mac machine with Apple Silicon chip, by using the master branch of cht-core.

CouchDB logs below:

2024-02-05 17:28:46 Segmentation fault
2024-02-05 17:29:46 /docker-entrypoint.sh: line 38: warning: command substitution: ignored null byte in input
2024-02-05 17:29:46 /docker-entrypoint.sh: line 38: [: too many arguments
2024-02-05 17:29:46 /docker-entrypoint.sh: line 48: warning: command substitution: ignored null byte in input
2024-02-05 17:29:46 /docker-entrypoint.sh: line 48: [: too many arguments
2024-02-05 17:29:46 Waiting for cht couchdb
2024-02-05 17:29:47 Segmentation fault
@mrjones-plip
Contributor

Hi @sookwalinga - thanks for filing this ticket to let us know you're having an issue.

I see there's a forum topic you started with the same subject. We most often engage with the community on the forums, so I'm going to close this ticket and encourage you and others to follow up on the forums.

thanks!

@mrjones-plip mrjones-plip changed the title from "public.ecr.aws/medic/cht-couchdb:4.5.1 keeps restarting and I get a 502 Bad Gateway on opening the launched app" to "CHT 4.5.1 Docker Helper error on macOS Apple Silicon: qemu: uncaught target signal 11 (Segmentation fault)" Feb 5, 2024
@mrjones-plip mrjones-plip reopened this Feb 5, 2024
@andrablaj
Member

andrablaj commented Feb 5, 2024

I tested Docker Helper with several releases as below:

4.2.4 - Worked
4.3.0 - Worked
4.3.1 - Worked
4.3.2 - Worked
4.4.0 - Failed

4.4.0 CouchDB error logs similar to 4.5.1:

2024-02-05 19:50:19 /docker-entrypoint.sh: line 38: warning: command substitution: ignored null byte in input
2024-02-05 19:50:19 /docker-entrypoint.sh: line 38: [: too many arguments
2024-02-05 19:50:19 /docker-entrypoint.sh: line 48: warning: command substitution: ignored null byte in input
2024-02-05 19:50:19 /docker-entrypoint.sh: line 48: [: too many arguments
2024-02-05 19:50:19 Waiting for cht couchdb
2024-02-05 19:50:20 Segmentation fault

4.4.0 nginx container outputs errors too:

2024-02-05 19:50:18 Launching Nginx
2024-02-05 19:50:18 2024/02/05 19:50:18 [warn] 1#1: the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:37
2024-02-05 19:50:18 nginx: [warn] the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:37
2024-02-05 19:50:18 2024/02/05 19:50:18 [warn] 1#1: the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:38
2024-02-05 19:50:18 nginx: [warn] the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:38
2024-02-05 19:50:18 2024/02/05 19:50:18 [emerg] 1#1: SSL_CTX_use_PrivateKey("/etc/nginx/private/key.pem") failed (SSL: error:05800074:x509 certificate routines::key values mismatch)
2024-02-05 19:50:18 nginx: [emerg] SSL_CTX_use_PrivateKey("/etc/nginx/private/key.pem") failed (SSL: error:05800074:x509 certificate routines::key values mismatch)
2024-02-05 19:50:44 Running SSL certificate checks
2024-02-05 19:50:44 self signed SSL cert already exists.
2024-02-05 19:50:44 Launching Nginx
2024-02-05 19:50:44 2024/02/05 19:50:44 [warn] 1#1: the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:37
2024-02-05 19:50:44 nginx: [warn] the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:37
2024-02-05 19:50:44 2024/02/05 19:50:44 [warn] 1#1: the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:38
2024-02-05 19:50:44 nginx: [warn] the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/nginx.conf:38
2024-02-05 19:50:44 2024/02/05 19:50:44 [emerg] 1#1: SSL_CTX_use_PrivateKey("/etc/nginx/private/key.pem") failed (SSL: error:05800074:x509 certificate routines::key values mismatch)
2024-02-05 19:50:44 nginx: [emerg] SSL_CTX_use_PrivateKey("/etc/nginx/private/key.pem") failed (SSL: error:05800074:x509 certificate routines::key values mismatch)
2024-02-05 19:50:44 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2024-02-05 19:50:44 /docker-entrypoint.sh: Configuration complete; ready for start up

4.4.1 - Failed

4.4.1 CouchDB error logs:

2024-02-05 20:06:12 /docker-entrypoint.sh: line 38: warning: command substitution: ignored null byte in input
2024-02-05 20:06:12 /docker-entrypoint.sh: line 38: [: too many arguments
2024-02-05 20:06:12 /docker-entrypoint.sh: line 48: warning: command substitution: ignored null byte in input
2024-02-05 20:06:12 /docker-entrypoint.sh: line 48: [: too many arguments
2024-02-05 20:06:12 Waiting for cht couchdb
2024-02-05 20:06:12 Segmentation fault

To conclude, it looks like the problem was introduced with 4.4.0 (maybe related to CouchDB 3 upgrade?).

Additionally, it seems like the not secure warning was introduced with 4.4.1. (couldn’t see it in 4.4.0 as the container didn’t start properly, but I guess it was a 4.4.0 change)

@mrjones-plip mrjones-plip changed the title from "CHT 4.5.1 Docker Helper error on macOS Apple Silicon: qemu: uncaught target signal 11 (Segmentation fault)" to 'Docker Helper error on macOS Apple Silicon: "qemu: uncaught target signal 11 (Segmentation fault)"' Feb 5, 2024
@sookwalinga
Author

Hi @sookwalinga - thanks for filing this ticket to let us know you're having an issue.

I see there's a forum topic you started with the same subject. We most often engage with the community on the forums, so I'm going to close this ticket and encourage you and others to follow up on the forums.

thanks!

Noted with thanks, will do that next time

@mrjones-plip
Contributor

After doing some experimenting in fixing CHT Core dev env for CouchDB, I think we could automate this for docker helper. Basically, I think we could detect both macOS and Apple Silicon with calls like this: echo "OSTYPE";echo $OSTYPE;echo "\nuname -m";uname -m, and then custom docker build a native arm64 image.

Here's what that command looks like on my Linux desktop:

OSTYPE
linux-gnu

uname -m
x86_64

Here's what my Intel Mac shows:

OSTYPE
darwin17.0

uname -m
x86_64

And here's what an M1 MacBook Pro shows:

OSTYPE
darwin23.0

uname -m
arm64

I'll work a PR up so docker helper will:

  1. Detect if we're running on an Apple Silicon mac
  2. Build the CouchDB image locally from the Dockerfile
  3. Create a 2nd CouchDB docker yaml file in the right directory which overrides which image to use - preferring the locally built one
  4. I believe when the upgrade service starts the CHT, it should just throw the 2nd file in the pile of yaml files to run and use the local image
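
The detection in step 1 could be sketched roughly as below. This is only an illustration, not the code that ended up in docker helper: the function name `is_apple_silicon` is hypothetical, and the fallback to `uname -s` for shells that do not set `OSTYPE` is an assumption added here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) only for macOS on arm64.
# Usage: is_apple_silicon [OSTYPE_VALUE] [MACHINE_VALUE]
# With no arguments it inspects the machine it is running on.
is_apple_silicon() {
  check_os="${1:-}"
  check_machine="${2:-}"
  if [ -z "$check_os" ]; then
    # OSTYPE is set by bash; other shells fall back to a lowercased `uname -s`.
    check_os="${OSTYPE:-$(uname -s | tr 'A-Z' 'a-z')}"
  fi
  if [ -z "$check_machine" ]; then
    check_machine="$(uname -m)"
  fi
  case "$check_os" in
    darwin*) [ "$check_machine" = "arm64" ] ;;
    *) return 1 ;;
  esac
}

if is_apple_silicon; then
  echo "Apple Silicon detected: a native arm64 CouchDB image is needed"
else
  echo "Not Apple Silicon: the published image should work"
fi
```

Passing the values in explicitly, for example `is_apple_silicon darwin23.0 arm64`, makes it easy to check the three machines listed above without owning each of them.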

@mrjones-plip
Contributor

@dianabarsan - While I think my above idea might work, and unblock Apple Silicon Docker Helper users, I realize you could have docker helper install CHT Core 4.0.0 but if your local branch is checked out at ~4.5.1@master then you'll build an image with the very wrong version of couch (2.x vs 3.x). We cooouuuullllddd git check out the version you asked for, but then that's a whole rabbit hole I'm not sure is worth it.

In the end...maybe do the bigger lift and try and get native arm64 builds working on GHA? for at least CouchDB?

Hmm - Thoughts?

@dianabarsan
Member

I think we should look into getting arm64 builds. This can be a separate build step that we only do for releases if we decide it's too much for branches.
I'm not exactly sure how this works. Would we use two image tags?

@mrjones-plip
Contributor

mrjones-plip commented Feb 8, 2024

@dianabarsan - yeah - good call on the bigger fix!

We're after Multi-Architecture (aka "Multi-Platform") builds. From my reading, you just pass in some extra flags to build for arm64 in addition to amd64 (I'll avoid the temptation to say we should include linux/arm/v7 and linux/arm/v6 support for Raspberry Pis). So it's not two tags on the image, it's a platform flag in the build, like --platform linux/amd64,linux/arm64.

There are some good notes from Docker, and this article on GHA looks helpful.
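
As a rough sketch of what the `--platform` flag looks like on the command line, the snippet below assembles a `docker buildx build` invocation from a list of platforms and prints it instead of running it. The function name, the image tag `example/cht-couchdb:dev` and the `./couchdb` build context are placeholders invented for illustration; they are not taken from the CHT build scripts.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a multi-platform buildx command.
# Usage: print_multiplatform_build IMAGE CONTEXT PLATFORM...
print_multiplatform_build() {
  build_image="$1"
  build_context="$2"
  shift 2
  platform_list=""
  for platform in "$@"; do
    # Join the platforms with commas, as --platform expects.
    platform_list="${platform_list:+$platform_list,}$platform"
  done
  echo "docker buildx build --platform $platform_list --tag $build_image $build_context"
}

# Placeholder image name and context, for illustration only.
print_multiplatform_build example/cht-couchdb:dev ./couchdb linux/amd64 linux/arm64
```

One tag then refers to both architectures, and Docker on each machine picks the variant matching its own platform.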

@1yuv - since you have an Apple Silicon laptop - are you interested in researching and improving CHT Docker images we build to be Multi-Platform? I know you've enjoyed hacking on some other CHT Core pieces of code and you came to mind for this as well! Not a big deal if it doesn't work out for you to do the work.

@1yuv
Member

1yuv commented Feb 8, 2024

Thanks @mrjones-plip, I have recently used the development environment directly without docker helper. But I can take a look at all these reference docs and see how we can improve the user experience.

@mrjones-plip
Contributor

@1yuv - awesome - great to hear! I suspect if you tried to run the development set up on CHT 4.5 without a custom CouchDB image it would fail because of the reasons above.

The goal would be to implement a fix upstream: by updating our GHA build process to be Multi-Platform, both Docker Helper and the development environment would just work without any extra steps to build the apple-silicon-couchdb (as shown here under "macOS").

@1yuv
Member

1yuv commented Feb 9, 2024

Hi @mrjones-plip, I know the other services are working fine at the moment. Do we want to build multi-platform images for the other services as well, or only for couchdb for now? I suspect that since everything else is working, we'd go only for couchdb?

@mrjones-plip
Contributor

My hope is that the level of effort to do all images is only marginally more than to do one image. Let me know if that's not realistic!

My thinking is that the ability to natively run all containers will be a huge benefit to developers, both internal and external to Medic, so we should just go for it \o/

@1yuv
Member

1yuv commented Feb 9, 2024

I will test with couchdb images first. My understanding is that the image size will be different if we build for multi-platform compared to now. Once we have a proof of concept for couchdb and know the image sizes, we can opt for the others. Effort-wise, it should not be different.

1yuv added a commit that referenced this issue Feb 9, 2024
@1yuv 1yuv self-assigned this Feb 9, 2024
1yuv added a commit that referenced this issue Feb 9, 2024
@1yuv
Member

1yuv commented Feb 9, 2024

I've made the changes and the build is successful with multi-platform, both locally and with GitHub Actions. However, images built with buildx are not saved locally and need to be pushed directly. Currently the build is failing because it tries to publish later but, unlike the regular build's output, the image is not available at the moment of publishing.

I tried some other options as well and there are limitations. I am hesitant to make this change because we currently first build, publish, and save images, and this has made it easier for us to have different workflows. I will explore further whether there is a way to save first and publish later without breaking our current workflow.

@mrjones-plip
Contributor

Awesome - thanks for the update!

My inclination is to go all in and carefully update our whole build process. While the images are going to be bigger, I think it's a very worthwhile trade off! Let's see what your research next shows as a path forward.

cc @garethbowen @dianabarsan @Hareet

@garethbowen
Contributor

I'm not too concerned about the size of images.

However, images built with buildx are not saved locally and need to be pushed directly.

This sounds like it'll have unintended consequences. Is anyone else worried about pushing more and earlier?

1yuv added a commit that referenced this issue Feb 12, 2024
@mrjones-plip
Contributor

mrjones-plip commented Feb 13, 2024

@garethbowen - can you speak to specific concerns you have?

I'm otherwise not at all concerned - assuming we don't publish images publicly that people think are a GA release.

@1yuv - this page states that we can run buildx with the load: true flag and it is not pushed. At a stage in the build you call it with push: true and it's published (to ECR in our case).

I assume the trick to get this to work for us is to run the test in the same runner that still has the cached build image? I'm not sure how hard/invasive this is to our current pipeline.

@1yuv
Member

1yuv commented Feb 13, 2024

Hi @mrjones-plip, if you take a look at where load: true is used, it's only building a single-platform image. Loading is possible with a single-platform image, like a regular image. The link describes what I was suggesting in our call: building and testing on one architecture (amd64 in our case), then later building and pushing for both platforms. When we build and push the image later on, it will use the image from the cache.

@mrjones-plip
Contributor

mrjones-plip commented Feb 13, 2024

Ah - gotcha - thanks for pointing that out. Sorry I missed that!

My take is that this is OK because we're testing the same image we're deploying as it caches the image on push as opposed to building it again (which might be different than the tested image):

The linux/amd64 image is only built once in this workflow. The image is built once, and the following steps use the internal cache from the first Build and push step. The second Build and push step only builds linux/arm64.
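
The build-test-publish sequence discussed above might look roughly like this with the plain `docker buildx` CLI, where `--load` and `--push` are the command-line counterparts of `load: true` and `push: true`. This is a sketch under assumptions, not the CHT pipeline: the `run` and `build_test_publish` helpers, the image name, and the `./run-tests.sh` test command are all hypothetical, and whether the second build actually reuses the cached amd64 layers depends on the builder and cache configuration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper. Usage: build_test_publish IMAGE CONTEXT TEST_COMMAND
build_test_publish() {
  publish_image="$1"
  publish_context="$2"
  publish_test_cmd="$3"
  # 1. Single-platform build, loaded into the local image store so it can be tested.
  run docker buildx build --platform linux/amd64 --load --tag "$publish_image" "$publish_context" || return 1
  # 2. Test the loaded image (the test command is a placeholder).
  run sh -c "$publish_test_cmd" || return 1
  # 3. Multi-platform build, pushed straight to the registry.
  run docker buildx build --platform linux/amd64,linux/arm64 --push --tag "$publish_image" "$publish_context"
}

# Placeholder names, for illustration only.
build_test_publish example/cht-couchdb:dev ./couchdb "./run-tests.sh"
```

Because nothing is pushed until step 3, a failed test leaves no image in the registry.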

@garethbowen
Contributor

can you speak to specific concerns you have?

Mostly about pushing garbage images that we then end up paying for. The later you push the less likely they are to be garbage. I don't think it'll be significantly different to what we already do.

1yuv added a commit that referenced this issue Feb 16, 2024
@1yuv
Member

1yuv commented Feb 16, 2024

I was able to build a multi-platform couchdb docker image and it was correctly pushed to the intermediate (private) repository. However, when we finally publish the image to public ECR, we pull and push. At that point, the pull only fetches the docker image for linux/amd64, since that's what the runner environment is, and it pushes that same image to public ECR, so the arm64 image is not published to public ECR.

I am thinking about using regctl which seems capable of copying image properly.

Other alternative would be to directly publish image to public ecr at the time of testing, and use that. If a test fails, we can delete that image, otherwise leave it as is and there's no need to publish second time.

@garethbowen, @dianabarsan, interested to hear your opinion on this.
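
For illustration, a registry-to-registry copy that keeps every platform could be expressed with either `regctl image copy` or `docker buildx imagetools create`. The sketch below only prints the command for review. Both commands exist, but whether either suits this pipeline is an assumption, and the function name and both registry names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints the command that would copy SRC to DST without
# a local pull, so the whole manifest list (amd64 and arm64) is carried over.
# Usage: copy_image_cmd TOOL SRC DST   (TOOL is "regctl" or "imagetools")
copy_image_cmd() {
  copy_tool="$1"
  copy_src="$2"
  copy_dst="$3"
  case "$copy_tool" in
    regctl)
      echo "regctl image copy $copy_src $copy_dst" ;;
    imagetools)
      echo "docker buildx imagetools create --tag $copy_dst $copy_src" ;;
    *)
      echo "unknown tool: $copy_tool" >&2
      return 1 ;;
  esac
}

# Placeholder registry names, for illustration only.
copy_image_cmd regctl private.example/cht-couchdb:branch public.example/cht-couchdb:branch
copy_image_cmd imagetools private.example/cht-couchdb:branch public.example/cht-couchdb:branch
```

Either way the image never passes through the runner's local image store, which is where the arm64 variant was being lost.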

1yuv added a commit that referenced this issue Feb 16, 2024
1yuv added a commit that referenced this issue Feb 17, 2024
1yuv added a commit that referenced this issue Feb 17, 2024
@garethbowen
Contributor

@1yuv What about the "2022 docker built-in solution", the second comment on that SO post you linked?

Other alternative would be to directly publish image to public ecr at the time of testing, and use that. If a test fails, we can delete that image, otherwise leave it as is and there's no need to publish second time.

Better to keep it out of public for as long as possible. And deleting public images might be confusing if someone installs it before it gets deleted.

1yrr added a commit to 1yrr/cht-core that referenced this issue Mar 7, 2024
1yuv added a commit that referenced this issue Mar 7, 2024
1yrr added a commit to 1yrr/cht-core that referenced this issue Mar 7, 2024
1yuv added a commit that referenced this issue Mar 7, 2024
1yuv added a commit that referenced this issue Mar 7, 2024
1yuv added a commit that referenced this issue Mar 7, 2024
1yuv added a commit that referenced this issue Mar 8, 2024
@mrjones-plip mrjones-plip modified the milestone: 4.7.0 Mar 8, 2024
1yuv added a commit that referenced this issue Mar 15, 2024
feat(#8841): Support multiplatform images
@mrjones-plip
Contributor

With #8918 merged, we can close this (but please re-open if I'm being premature!)

HUGE thanks to @1yuv for all work here - excellent job!

@mrjones-plip
Contributor

@sookwalinga - thanks for filing this ticket! While the ticket is closed as "fixed", CHT 4.7 is not out yet, so you'll need to take some extra steps to run Docker Helper on Apple Silicon with the latest CHT Core. Mainly, you need to select a compatible version when prompted, as seen in the final step:

  1. run the helper script: ./cht-docker-compose.sh
  2. Choose "yes" to create a new project: Would you like to initialize a new project [y/N]? y
  3. Choose "no" when prompted to use the latest: Do you want to run the latest CHT Core version (4.5.2) [Y/n]? n
  4. When the large list of versions is shown, select a version that is between 4.0.0 and 4.3.2, or 4.7.0 or later. As 4.7.0 is not yet out, you can choose 8841-multiplatform-images until it is released!

Thanks again!
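
The version rule from the last step can be written down as a small check. This is a sketch only: the function names are hypothetical, it handles plain numeric MAJOR.MINOR.PATCH versions and deliberately reports branch names such as 8841-multiplatform-images as "unknown", and treating 4.7.0 itself as compatible is an assumption based on the milestone set on this issue.

```shell
#!/bin/sh
# Turn "MAJOR.MINOR.PATCH" into one integer so versions compare numerically.
version_to_int() {
  saved_ifs="$IFS"
  IFS=.
  set -- $1            # split on dots into $1 $2 $3
  IFS="$saved_ifs"
  echo $(( ${1:-0} * 1000000 + ${2:-0} * 1000 + ${3:-0} ))
}

# Exit status 0: expected to run on Apple Silicon per the comment above.
# Exit status 1: inside the affected range (after 4.3.2 and before 4.7.0).
# Exit status 2: not a plain numeric version, for example a branch name.
works_on_apple_silicon() {
  case "$1" in
    ''|*[!0-9.]*) return 2 ;;
  esac
  candidate_int="$(version_to_int "$1")"
  if [ "$candidate_int" -ge "$(version_to_int 4.0.0)" ] &&
     [ "$candidate_int" -le "$(version_to_int 4.3.2)" ]; then
    return 0
  fi
  [ "$candidate_int" -ge "$(version_to_int 4.7.0)" ]
}

for candidate in 4.2.4 4.3.2 4.4.0 4.5.2 4.7.0; do
  if works_on_apple_silicon "$candidate"; then
    echo "$candidate: expected to work"
  else
    echo "$candidate: affected"
  fi
done
```

The results line up with the releases tested earlier in this thread: 4.2.4 through 4.3.2 worked, while 4.4.0 and later failed.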
