
Devfile Registry deployment on minikube with helm keeps crash looping #1295

Closed
michael-valdron opened this issue Oct 18, 2023 · 3 comments · Fixed by devfile/registry-support#187
Labels: area/registry (Devfile registry for stacks and infrastructure), kind/bug (Something isn't working), severity/blocker (Issues that prevent developers from working)

Comments

michael-valdron (Member) commented Oct 18, 2023

Which area is this feature related to?

/kind bug

Which area is this bug related to?

/area registry

What versions of software are you using?

Go project

Operating System and version: N/A

Go Pkg Version: 1.18

Node.js project

Operating System and version: N/A

Node.js version: 18

Yarn version: 1.22.19

package.json: https://github.com/devfile/devfile-web/blob/91b745246e20f760efd74758022420d7302becf6/package.json

Web browser

Operating System and version: N/A

Browser name and version: N/A

Bug Summary

Describe the bug:

Deploying the devfile registry on minikube using the helm chart leaves the index server and registry viewer containers in a repeating CrashLoopBackOff state for over 10 minutes. This causes the devfile registry integration tests under registry-support, which wait at most 10 minutes for the deployment to become available, to time out.

To Reproduce:

  1. Start minikube v1.21.0 using Kubernetes v1.21.0 with default settings.
  2. If using docker, run the integration testing script: bash .ci/run_tests_minikube_linux.sh, then skip to step 5. Otherwise, follow steps 3-4 to run the deployment manually (for example, with podman or with the default next tag).
  3. Deploy devfile registry by running: helm install devfile-registry ./deploy/chart/devfile-registry --set global.ingress.domain=$(minikube ip).nip.io
    • Add --set devfileIndex.image=quay.io/<user>/devfile-index --set devfileIndex.tag=<tag_label> to specify your own image
  4. Immediately after deploying, run kubectl wait deploy/devfile-registry --for=condition=Available --timeout=600s to wait for available condition
    • Run the two commands as helm install ... & kubectl wait ... (backgrounding the install) to best simulate the timing of the script
  5. The wait process will fail with the reported error
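Taken together, steps 3-5 can be sketched as a single shell function. This is an illustrative sketch, not the actual CI script: the function name and the HELM/KUBECTL/MINIKUBE_IP overrides are assumptions added here so the timing pattern (backgrounded install, immediate wait) is explicit and the function can be exercised without a cluster.

```shell
# deploy_and_wait: mirrors the CI script's timing -- the helm install is
# backgrounded and kubectl wait starts immediately afterwards.
# HELM / KUBECTL / MINIKUBE_IP are hypothetical overrides for illustration;
# the defaults match the commands in the reproduction steps above.
deploy_and_wait() {
  helm_bin="${HELM:-helm}"
  kubectl_bin="${KUBECTL:-kubectl}"
  ip="${MINIKUBE_IP:-$(minikube ip)}"

  # Step 3: deploy the registry chart in the background.
  "$helm_bin" install devfile-registry ./deploy/chart/devfile-registry \
    --set global.ingress.domain="${ip}.nip.io" &

  # Step 4: the 600s limit is the 10-minute deployment budget CI enforces.
  "$kubectl_bin" wait deploy/devfile-registry \
    --for=condition=Available --timeout=600s
  status=$?

  wait              # reap the backgrounded helm install
  return "$status"  # step 5: non-zero here is the reported timeout failure
}
```

With the crash-looping containers, the kubectl wait call never sees the Available condition, so the function returns non-zero after the full 600 seconds.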

Expected behavior

Deploys successfully within 10 minutes without experiencing frequent CrashLoopBackOff states.

Any logs, error output, screenshots etc? Provide the devfile that sees this bug, if applicable

Full error log: devfile_registry_error.log

Error Message

+ kubectl wait deploy/devfile-registry --for=condition=Available --timeout=600s
error: timed out waiting for the condition on deployments/devfile-registry

Container State Details

Containers:
  devfile-registry:
    ...
    State:           Waiting
      Reason:        CrashLoopBackOff
    Last State:      Terminated
      Reason:        Error
      Exit Code:     137
      Started:       Wed, 18 Oct 2023 21:05:13 +0000
      Finished:      Wed, 18 Oct 2023 21:06:15 +0000
    Ready:           False
    Restart Count:   6
    ...
  registry-viewer:
    ...
    State:           Waiting
      Reason:        CrashLoopBackOff
    Last State:      Terminated
      Reason:        Error
      Exit Code:     134
      Started:       Wed, 18 Oct 2023 21:05:00 +0000
      Finished:      Wed, 18 Oct 2023 21:05:00 +0000
    Ready:           False
    Restart Count:   6
    ...

Events

  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  10m                    default-scheduler  Successfully assigned default/devfile-registry-54488859b-rzhnh to minikube
  Normal   Pulling    9m48s                  kubelet            Pulling image "quay.io/devfile/oci-registry:next"
  Normal   Pulled     9m48s                  kubelet            Successfully pulled image "quay.io/devfile/registry-viewer:next" in 12.152383211s
  Normal   Created    9m34s                  kubelet            Created container oci-registry
  Normal   Pulled     9m34s                  kubelet            Successfully pulled image "quay.io/devfile/oci-registry:next" in 14.156196105s
  Normal   Started    9m33s                  kubelet            Started container oci-registry
  Normal   Pulled     9m32s                  kubelet            Successfully pulled image "quay.io/devfile/registry-viewer:next" in 1.103481248s
  Normal   Created    9m32s (x2 over 9m48s)  kubelet            Created container registry-viewer
  Normal   Started    9m31s (x2 over 9m48s)  kubelet            Started container registry-viewer
  Warning  BackOff    9m30s (x2 over 9m31s)  kubelet            Back-off restarting failed container
  Warning  Unhealthy  9m28s (x3 over 9m30s)  kubelet            Startup probe failed: Get "http://172.17.0.4:3000/viewer": dial tcp 172.17.0.4:3000: connect: connection refused
  Normal   Killing    9m28s                  kubelet            Container devfile-registry failed startup probe, will be restarted
  Normal   Pulling    8m58s (x3 over 10m)    kubelet            Pulling image "quay.io/devfile/registry-viewer:next"
  Normal   Started    8m58s (x2 over 10m)    kubelet            Started container devfile-registry
  Normal   Created    8m58s (x2 over 10m)    kubelet            Created container devfile-registry
  Normal   Pulled     4m50s (x6 over 10m)    kubelet            Container image "devfile-index:latest" already present on machine

Additional context

Any workaround?

Increase the timeout limit of the integration tests; however, this does not solve the underlying problem of the devfile registry taking over 10 minutes to deploy.

Suggestion on how to fix the bug

Unknown at this time.

@michael-valdron michael-valdron added the severity/blocker Issues that prevent developers from working label Oct 18, 2023
@openshift-ci openshift-ci bot added kind/bug Something isn't working area/registry Devfile registry for stacks and infrastructure labels Oct 18, 2023
michael-valdron (Member, Author) commented:

Might block #1197 if this bug is not fixed by the time of review testing.

thepetk (Contributor) commented Oct 19, 2023

After some investigation, I found the following:

Cause of the Failure

The CI check was failing with a CrashLoopBackOff error, caused by the registry-viewer container never starting: Node.js aborted during startup and the container hung (related issue: nodejs/node#48444):

 1: 0x55dd0567ab94 node::Abort() [node]
 2: 0x55dd0567aed1 node::Assert(node::AssertionInfo const&) [node]
 3: 0x55dd056fb06c node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0x55dd056fb1d7 node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0x55dd0563656c node::V8Platform::Initialize(int) [node]
 6: 0x55dd05632a0b node::InitializeOncePerProcess(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, node::ProcessFlags::Flags) [node]
 7: 0x55dd05632e8e node::Start(int, char**) [node]
 8: 0x7f0af3764eb0  [/lib64/libc.so.6]
 9: 0x7f0af3764f60 __libc_start_main [/lib64/libc.so.6]
10: 0x55dd055a0545 _start [node]

Proposed fix

A first approach would be to update the GitHub Action that sets up minikube (manusa/actions-setup-minikube) so that we can update the minikube and Kubernetes versions used, as this error is fixed in later versions.

Another improvement would be to increase the memory in the minikube start args to 4 GB.
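Both changes would land in the workflow's action configuration. A sketch of what that step might look like, where the action pin, the version numbers, and the memory flag are all assumptions for illustration (the `with:` keys follow manusa/actions-setup-minikube's documented inputs):

```yaml
- name: Setup Minikube
  uses: manusa/actions-setup-minikube@v2.9.0   # pinned tag assumed
  with:
    minikube version: 'v1.31.2'    # assumed newer release carrying the fix
    kubernetes version: 'v1.27.4'  # assumed; anything past the v1.21.0 line
    github token: ${{ secrets.GITHUB_TOKEN }}
    start args: '--memory=4gb'     # the 4 GB bump suggested above
```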

More detailed logging

Since the CrashLoopBackOff is not related to the kubectl wait command itself, I think we could add more detailed logging, using the kubectl logs command, for the case where the kubectl wait conditions are not met. An example of more detailed logging can be found here
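A minimal sketch of that logging fallback, assuming a wrapper function in the CI script (the function name and the KUBECTL override are hypothetical; the kubectl commands match those used elsewhere in this issue):

```shell
# wait_or_dump_logs: run `kubectl wait` as before, but on timeout dump
# recent logs from every container in the deployment so the CI output
# shows *why* the pod crash-looped instead of only "timed out waiting".
# KUBECTL is a hypothetical override so the function can be tested
# without a cluster; it defaults to the real kubectl binary.
wait_or_dump_logs() {
  kubectl_bin="${KUBECTL:-kubectl}"
  if "$kubectl_bin" wait deploy/devfile-registry \
      --for=condition=Available --timeout=600s; then
    echo "deployment available"
  else
    echo "wait timed out; dumping recent container logs" >&2
    "$kubectl_bin" logs deploy/devfile-registry --all-containers --tail=100 >&2
    return 1
  fi
}
```

In the failure reported here, the dumped logs would have surfaced the Node.js abort stack trace from registry-viewer directly in the CI output.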

thepetk (Contributor) commented Oct 19, 2023

I've created a PR with the proposed workaround. I've assigned the issue to myself, and since it already has a PR I've removed the refinement date, added it to the current sprint (because it is a blocker), and story-pointed it based on the time/complexity spent.

Projects
Status: Done ✅