
devspace dev command terminated with exit code 139 #2448

Closed
GDXbsv opened this issue Dec 1, 2022 · 5 comments · Fixed by #2771
Labels: bug, kind/bug (Something isn't working)

Comments


GDXbsv commented Dec 1, 2022

What happened?
DevSpace cannot sync; it fails with the error start_dev: initial sync: Sync - connection lost: command terminated with exit code 139 when running an amd64 container in an arm64 cluster.
In the directory I see:

ls -la
total 468444
drwxr-xr-x 2 root root      4096 Dec  1 08:51 .
drwxr-xr-x 1 root root      4096 Nov  2 09:05 ..
-rw------- 1 root root 164651008 Dec  1 08:51 core
-rw-r--r-- 1 root root  28102656 Dec  1 08:44 qemu_devspacehelper_20221201-084439_162.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:44 qemu_devspacehelper_20221201-084439_182.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:44 qemu_devspacehelper_20221201-084439_189.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:45 qemu_devspacehelper_20221201-084504_207.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:45 qemu_devspacehelper_20221201-084504_227.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:45 qemu_devspacehelper_20221201-084504_234.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084616_248.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084617_267.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084617_274.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084625_288.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084626_308.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:46 qemu_devspacehelper_20221201-084626_315.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:48 qemu_devspacehelper_20221201-084811_328.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:48 qemu_devspacehelper_20221201-084816_331.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:51 qemu_devspacehelper_20221201-085145_344.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:51 qemu_devspacehelper_20221201-085146_364.core
-rw-r--r-- 1 root root  28102656 Dec  1 08:51 qemu_devspacehelper_20221201-085146_371.core

A new qemu_devspacehelper_*.core dump appears after each sync attempt.
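For anyone triaging this: the qemu_*.core naming suggests the crashes come from the emulated amd64 helper rather than from the application itself. A quick check from inside the pod (the pod name is a placeholder; under qemu user-mode emulation, uname -m typically reports the emulated architecture):

# x86_64 here, on an arm64 node, means the container runs under emulation;
# core_pattern shows how the core files are being produced
kubectl exec -it <pod> -- sh -c 'uname -m; cat /proc/sys/kernel/core_pattern'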

What did you expect to happen instead?
A successful sync.

How can we reproduce the bug? (as minimally and precisely as possible)
Mac ARM (m1/m2)

brew install colima docker k3d devspace 
colima start --cpu 7 --memory 7 --disk 100 --runtime docker
k3d cluster create
Then try to sync with an amd64 image, as sketched below.
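As a concrete sketch of that last step (image and cluster names here are placeholders; any linux/amd64-only image that DevSpace injects its helper into should do):

# pull an amd64 image explicitly and make it available to the k3d cluster
docker pull --platform linux/amd64 nginx:latest
k3d image import nginx:latest -c k3s-default
# then point a dev/devImage config at it and run:
devspace dev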

My devspace.yaml:

version: v2beta1
name: devspace-all
vars:
  DEVSPACE_ENV_FILE: ".env"
  DEVSPACE_FLAGS: '-n all'
  GITLAB_CONTAINER_REGISTRY_USER:
    question: gitlab container registry username?
  GITLAB_CONTAINER_REGISTRY_PASS:
    question: gitlab container registry password?
    password: true
  QUAY_CONTAINER_REGISTRY_USER:
    question: quay container registry username?
  QUAY_CONTAINER_REGISTRY_PASS:
    question: quay container registry password?
    password: true
  COMPOSER_GITHUB_TOKEN:
    question: composer github token?
    password: true
  XDEBUG_HOST:
    question: xdebug host? [host.docker.internal -- WIN] [172.17.0.1 - LIN] [MAC - see in ifconfig or in your network connection]
    default: "172.17.0.1"
  CORE_PATH:
    question: path to core? [./core] | [../core] | [/global/path/core]

deployments:
  docker-core:
    helm:
      chart:
        name: ./cluster/chart
      values:
       credentials:
         - registry: quay.io
           username: "${QUAY_CONTAINER_REGISTRY_USER}"
           password: "${QUAY_CONTAINER_REGISTRY_PASS}"
         - registry: registry.gitlab.com
           username: "${GITLAB_CONTAINER_REGISTRY_USER}"
           password: "${GITLAB_CONTAINER_REGISTRY_PASS}"
         - registry: gitlab.com
           username: "${GITLAB_CONTAINER_REGISTRY_USER}"
           password: "${GITLAB_CONTAINER_REGISTRY_PASS}"


dev: {}

commands:
  database-refresh:
    command: |-
      ./bin/setupDatabaseKube.sh
    description: Refresh the slim dump and run migrations on the DB.

profiles:
  - name: core
    merge:
      deployments:
        docker-core:
          helm:
            values:
              develop:
                core: 1
      dev:
        core:
          labelSelector:
            io.kompose.service: core
          ssh:
            enabled: true
          devImage: registry.gitlab.com/gotphoto/platform/core/app:develop-latest
          patches: &core-patches
            - op: remove
              path: spec.containers.lifecycle
          env: &core-envs
            - name: XDEBUG_CONFIG
              value: "client_host=${XDEBUG_HOST}"
          sync:
            - path: ${CORE_PATH}:/srv/www
              printLogs: true
#              excludeFile: .gitignore
              excludePaths:
                - .cache_composer/
                - /app/tmp/
                - ${CORE_PATH}/app/tmp/
                - .git
              uploadExcludePaths:
                - ${CORE_PATH}/vendor/
                - /vendor/
                - ${CORE_PATH}/Plugin/
                - /Plugin/
                - ${CORE_PATH}/app/tmp/
                - /app/tmp/
              downloadExcludePaths:
                - /.cache_composer/
                - /app/tmp/
              onUpload:
                # These post-sync commands will be executed after DevSpace synced changes to the container in the given order
                exec:
                  - command: |-
                      composer config github-oauth.github.com ${COMPOSER_GITHUB_TOKEN}
                    onChange: ["composer.lock"]
                  - command: |-
                      composer install --no-interaction --prefer-dist
                    onChange: ["composer.lock"]
                  - command: "chmod -R 0777 ./"      # string   | Command that should be executed after DevSpace made changes
        core-commands:
          labelSelector:
            io.kompose.service: core-commands
          devImage: registry.gitlab.com/gotphoto/platform/core/app:develop-latest
          patches: *core-patches
          env: *core-envs
        core-order-events:
          labelSelector:
            io.kompose.service: core-order-events
          devImage: registry.gitlab.com/gotphoto/platform/core/app:develop-latest
          patches: *core-patches
          env: *core-envs

Local Environment:

  • DevSpace Version: 6.2.1
  • Operating System: macOS
  • ARCH of the OS: ARM64

Kubernetes Cluster:

  • Local: k3d
  • Client Version: v1.25.4 (Kustomize v4.5.7)
  • Server Version: v1.24.6+k3s1

Anything else we need to know?

GDXbsv added the kind/bug label on Dec 1, 2022
loft-bot added the bug label on Dec 1, 2022
@th3fallen

I'm actually getting a very similar bug on M1, but it only seems to happen for images coming from my registry; if I reference one directly from Docker, it works... very interesting.

@tbondarchuk
Contributor

I believe I've found the root cause of this issue: it's devspacehelper-arm64 failing if there are more than 500 env variables. 🤷

@FabianKramm could you please check?

Steps to reproduce:

# modify/remove toleration as needed
kubectl run -i --tty --rm debug --image=busybox --restart=Never --overrides='{"spec":{"nodeSelector":{"kube/nodetype":"app-arm"},"tolerations":[{"operator":"Exists"}]}}' -- sh

# get the current number of env vars and export dummy ones until there are more than 500
env | wc -l
for i in $(seq 1 50); do export x$i=$i; done

# get and run helper
wget https://github.com/devspace-sh/devspace/releases/download/v6.3.2/devspacehelper-arm64
chmod +x devspacehelper-arm64
./devspacehelper-arm64 version

Resulting error:

# ./devspacehelper-arm64 version
fatal error: failed to get system page size
runtime: panic before malloc heap initialized

runtime stack:
runtime.throw({0xa4c537?, 0x0?})
        /Users/runner/hostedtoolcache/go/1.19.7/x64/src/runtime/panic.go:1047 +0x40 fp=0xffffff981c00 sp=0xffffff981bd0 pc=0x45550
runtime.mallocinit()
        /Users/runner/hostedtoolcache/go/1.19.7/x64/src/runtime/malloc.go:365 +0x2c0 fp=0xffffff981c30 sp=0xffffff981c00 pc=0x1bb30
runtime.schedinit()
        /Users/runner/hostedtoolcache/go/1.19.7/x64/src/runtime/proc.go:693 +0xa0 fp=0xffffff981c90 sp=0xffffff981c30 pc=0x48cf0
runtime.rt0_go()
        /Users/runner/hostedtoolcache/go/1.19.7/x64/src/runtime/asm_arm64.s:86 +0xa8 fp=0xffffff981cc0 sp=0xffffff981c90 pc=0x73778
# 

After unsetting a few vars to bring the count down to 500 or fewer, the helper works again.
Tested on amd64: no such issue. I added 50k dummy vars and it still works.
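If you want to check whether one of your own pods is over that threshold, a quick check (pod and container names are placeholders):

# count the env vars the container actually sees
kubectl exec <pod> -c <container> -- env | wc -l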

A bit of context: I have a namespace with a lot of pods, so even an empty pod has about 450 env vars, all like these:

WEB_APP_SERVICE_PORT_HTTP=80
WEB_APP_SERVICE_PORT=80
WEB_APP_PORT_80_TCP=tcp://172.20.127.23:80
WEB_APP_PORT_80_TCP_ADDR=172.20.127.23
WEB_APP_PORT_80_TCP_PORT=80
WEB_APP_SERVICE_HOST=172.20.127.23
WEB_APP_PORT_80_TCP_PROTO=tcp
WEB_APP_PORT=tcp://172.20.127.23:80

multiplied per service. A few services have a lot of env vars/secrets, which explains why I was seeing this 139 exit code only for some apps. And I guess it might also explain @th3fallen's Docker vs. private registry issue: the private images probably had some extra vars added on top of the base image during the build.
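As a rough sanity check of that math: with the 8 injected variables shown above per service, ~56 services already puts a pod past 450 vars. A rough count of just the service-link variables inside a pod (the pattern is an approximation, not exhaustive):

env | grep -Ec '(_SERVICE_HOST|_SERVICE_PORT|_PORT)(=|_)'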

@lizardruss
Collaborator

@tbondarchuk
I'm just catching up on this issue. Would it be possible to turn off pod.spec.enableServiceLinks so that fewer environment variables are created? (See the sketch below.)
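For anyone who wants to try that: enableServiceLinks is a standard pod-spec field, so a patch along these lines (the deployment name is a placeholder) should drop the injected service variables on the next rollout:

# disable service-link env var injection for the pod template
kubectl patch deployment <name> --type merge \
  -p '{"spec":{"template":{"spec":{"enableServiceLinks":false}}}}'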


jordiclariana commented Dec 12, 2023

Hi, I'm experiencing a panic when running ./devspacehelper-arm64 which seems related to this issue.
I ran several tests and ended up concluding that UPX is to blame (which also explains why the panic comes with such a useless backtrace).

First of all, it's worth mentioning that this panic does not happen on all the machines I tested, but mainly on a single one (making it even odder). That machine is a MacBook Pro with an M1 chip and 32 GB RAM, updated to the latest macOS version (13.2.1). I'm running Docker Desktop 4.26.0 (it was failing with 4.25.x as well). I have reinstalled Docker Desktop several times and reprovisioned its Kubernetes cluster from scratch several times as well.

In my tests, running the original devspacehelper-arm64 from the releases page (the latest version at that moment, v6.3.6, but also previous versions), I found that the number of arguments and/or the number of environment variables matters for reproducing the issue. For example, take this execve captured with strace:

execve("/tmp/devspacehelper-arm64", ["/tmp/devspacehelper-arm64.orig", "sync", "downstream", "--exclude", "/node_modules", "--exclude", "/public/humans.txt", "--exclude", "/public/storage", "--exclude", "/public/build", "--exclude", "/bootstrap/cache", "--exclude", "/storage", "--exclude", "/vendor", "--exclude", ".npmrc", "--exclude", ".env", "--exclude", ".env.*", "--exclude", ".phpunit.result.cache", "--exclude", ".phpunit", "--exclude", "/.circleci", "--exclude", "/.github", "--exclude", "!/.github/CODEOWNERS", "--exclude", "/hack", "--exclude", "/kubernetes", "--exclude", "/.api", "--exclude", ".git", "--exclude", ".git-blame-ignore-revs", "--exclude", ".gitattributes", "--exclude", ".gitignore", "--exclude", "manifest", "--exclude", "/.bin", "--exclude", "VERSION", "--exclude", "/docs", "--exclude", "/tests", "--exclude", "toc.json", "--exclude", "docker-compose*", "--exclude", "Dockerfile", "--exclude", "dev.Dockerfile", "--exclude", "sandbox.Dockerfile", "--exclude", "sandbox-list.txt", "--exclude", "phpstan*", "--exclude", "Makefile", "--exclude", "secrets", "--exclude", ".sops.yaml", "--exclude", "devspace.yaml", "--exclude", "foo.yaml", "--exclude", ".devspace", "--exclude", "/devspace", "--exclude", "/foo", "--exclude", "phpunit.xml", "--exclude", "sonar-project.properties", "--exclude", "server.php", "--exclude", "rector.php", "--exclude", "README.md", "--exclude", "cypress.json", "--exclude", ".eslintrc.js", "--exclude", ".php-cs-fixer*", "--exclude", ".pre-commit-config.yaml", "--exclude", ".prettierrc", "--exclude", ".stignore", "--exclude", ".stylelintrc.json", "--exclude", "company.code-workspace", "--exclude", "jest.config.js", "--exclude", "/resources/js/**/*.spec.ts", "--exclude", "/client/**/*.spec.ts", "--exclude", "company.iml", "--exclude", "/bar", "--exclude", "/bar.*", "--exclude", ".dockerignore", "--exclude", ".*.swp", "--exclude", "perfReports", "/var/www/html"], 0xffffe7ac01c0 /* 373 vars */) = 0

Notice the 373 vars at the end. For reference, running env | wc in the pod gives 439 467 21204 (more lines than vars since some env vars are multiline, and a total of ~21 KB).
This command fails with the same panic mentioned in the comment above.
When playing with the command arguments, removing some --exclude flags (sometimes one, two, or even up to three) makes the command work (no panic, expected behavior). On the other hand, if the arguments are kept untouched but some env vars are removed, the same thing happens: the command works, no panic.

I played with the devspace source code and compiled a version myself, adding some debug logging here and there, to figure out where exactly the panic was coming from. No debug output was ever printed, meaning the panic happens before the devspacehelper code is even reached. So I tried to unpack the binary with:

upx -d -o devspacehelper-arm64.unc /tmp/devspacehelper-arm64

Then running the same command as above with the uncompressed binary worked (no panic).

Moreover, one can run this easy test to determine the maximum number of env vars UPX can support:

$ env -i bash -c 'for i in {0..496}; do export ENVVAR_$i="$i"; done; env | wc -l; /tmp/devspacehelper-arm64 version; echo;'
500
v6.3.6
$ env -i bash -c 'for i in {0..497}; do export ENVVAR_$i="$i"; done; env | wc -l; /tmp/devspacehelper-arm64 version; echo;'
501
fatal error: failed to get system page size
runtime: panic before malloc heap initialized

runtime stack:
runtime.throw({0xa51b7e?, 0x0?})
	/Users/runner/hostedtoolcache/go/1.20.11/x64/src/runtime/panic.go:1047 +0x40 fp=0xffffefdc1820 sp=0xffffefdc17f0 pc=0x43820
runtime.mallocinit()
	/Users/runner/hostedtoolcache/go/1.20.11/x64/src/runtime/malloc.go:367 +0x2a4 fp=0xffffefdc1850 sp=0xffffefdc1820 pc=0x1b5d4
runtime.schedinit()
	/Users/runner/hostedtoolcache/go/1.20.11/x64/src/runtime/proc.go:712 +0x9c fp=0xffffefdc18b0 sp=0xffffefdc1850 pc=0x46fbc
runtime.rt0_go()
	/Users/runner/hostedtoolcache/go/1.20.11/x64/src/runtime/asm_arm64.s:86 +0xa4 fp=0xffffefdc18e0 sp=0xffffefdc18b0 pc=0x729d4

With more than 500 vars, there is a panic.

So, to me, this is definitely a UPX problem. I went to their GitHub project and saw no open issue describing this problem. Before opening an issue on their side (which I would guess should be up to the devspace maintainers), I would like to know if anybody can reproduce this behavior on their end. Or the devspace maintainers could even consider an alternative to UPX (I know, bold move) for compressing the binary.
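Until the packaging changes, a possible stopgap (untested here; it assumes upx is available in the container and that the helper sits at the path shown, so locate it first if not) is to unpack the injected helper in place, as with the upx -d command above:

# decompress the UPX-packed helper in place inside the pod
kubectl exec <pod> -- sh -c 'upx -d /tmp/devspacehelper-arm64'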

xvzf added a commit to xvzf/devspace that referenced this issue Dec 12, 2023
This is a mitigation for devspace-sh#2448 (comment)

Signed-off-by: Matthias Riegler <matthias.riegler@ankorstore.com>
@jordiclariana

For those curious, I opened an issue on the UPX project: upx/upx#743

Andrioden pushed a commit to Andrioden/devspace that referenced this issue Jan 29, 2024
This is a mitigation for devspace-sh#2448 (comment)

Signed-off-by: Matthias Riegler <matthias.riegler@ankorstore.com>
Signed-off-by: André S. Hansen <andre.ok@online.no>