Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

troubleshoot 3rd party signal handling and child process termination #40

Closed
1 task done
Courey opened this issue Feb 26, 2024 · 6 comments · Fixed by #39
Closed
1 task done

troubleshoot 3rd party signal handling and child process termination #40

Courey opened this issue Feb 26, 2024 · 6 comments · Fixed by #39
Assignees

Comments

@Courey
Copy link
Contributor

Courey commented Feb 26, 2024

Bug Report Details ⚠️🐛

Description

When using npm run to remix run to run arc sandbox which uses dockerode to start a docker instance, if you use control + c to close the process, the child processes do not gracefully terminate this can lead to hanging containers till they are killed in the 10 second docker timeout.

Steps to Reproduce

  • Is the bug reproducible?

Steps to Reproduce

When you are using dev_v2 mode in an Architect app, when you interrupt npm run dev with ^C, it kills all subprocesses with extreme prejudice and does not permit Architect plugins to clean up.

To reproduce:

  • Start a basic Architect project using create-remix.
  • Add a toy sandbox plugin that has a start method (which would normally start a sidecar process --- such as an OpenSearch server) and a stop method (which would normally stop the sidecar process and delete any temporary directories). For example:
plugin-sandbox.js

export default {
  sandbox: {
    start() {
      console.log("starting example sandbox plugin");
    },
    end() {
      console.log("stopping example sandbox plugin");
    },
  },
};
  • Install this plugin in app.arc.
app.arc

...
@plugins
plugin-remix
  src plugin-remix.js
plugin-example
  src plugin-example.js
...
  • Fire up the dev server using npm run dev. Wait until it's up and running.
  • Stop the dev server with a keyboard interrupt (^C on Linux and macOS).

Expected Behavior

The dev server should come down cleanly and you should see both of the log messages from the Architect sandbox plugin, both starting example sandbox plugin and stopping example sandbox plugin. Here's the good output with dev_v2: false set. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v1.

$ npm run dev

> dev
> npm-run-all build --parallel "dev:*"


> build
> remix build

 info  building... (NODE_ENV=production)
 info  built (315ms)

> dev:remix
> remix watch


> dev:arc
> cross-env NODE_ENV=development arc sandbox

Watching Remix app in development mode...
💿 Building...
         App ⌁ remix-architect-app
      Region ⌁ us-west-2
     Profile ⌁ @aws profile / AWS_PROFILE not configured
     Version ⌁ Architect 10.13.1
         cwd ⌁ /private/tmp/my-remix-app

✓ Sandbox @http (HTTP API mode / Lambda proxy v2.0 format / live reload) routes
    any /* ................................ server

    http://localhost:3333

✓ Sandbox Started in 9ms
❤︎ Local environment ready!

⚬ Sandbox Running 1 Sandbox startup plugin
starting example sandbox plugin
✓ Sandbox Ran Sandbox startup plugin in 7ms
✓ Sandbox File watcher now looking for project changes
💿 Rebuilt in 441ms
[REMIX DEV] D7AD53A4 ready
^C
⚬ Sandbox Running 1 Sandbox shutdown plugin
stopping example sandbox plugin
✓ Sandbox Ran Sandbox shutdown plugin in 0ms
^C

Actual Behavior

Here's the bad output. You see starting example sandbox plugin, but you never see stopping example sandbox plugin. For a reproducer, see https://github.com/lpsinger/remix-bug-report-arc-sandbox-killed-ungracefully/tree/dev_v2.

$ npm run dev
> dev
> remix dev --no-restart -c "arc sandbox -e testing"

 💿  remix dev

 info  building...
 info  built (536ms)
         App ⌁ remix-architect-app
      Region ⌁ us-west-2
     Profile ⌁ @aws profile / AWS_PROFILE not configured
     Version ⌁ Architect 10.13.1
         cwd ⌁ /private/tmp/my-remix-app

✓ Sandbox @http (HTTP API mode / Lambda proxy v2.0 format / live reload) routes
    any /* ................................ server

    http://localhost:3333

✓ Sandbox Started in 9ms
❤︎ Local environment ready!

⚬ Sandbox Running 1 Sandbox startup plugin
starting example sandbox plugin
✓ Sandbox Ran Sandbox startup plugin in 4ms
✓ Sandbox File watcher now looking for project changes
^C

Links to feature design doc, conversations, mockups, etc.

Leo's original issue

Stakeholders

Active developers using macOS and developers working on OpenSearch/Elasticsearch.

@lpsinger, @Courey, @ronitagarwala01, @tylerbarna, @Vidushi-GitHub

Acceptance Criteria

The cause of the errant signal handling across all the packages must be figured out and solutions offered.

@Courey Courey self-assigned this Feb 26, 2024
@lpsinger
Copy link
Member

@Courey, I took the liberty of editing the stakeholders section.

@tylerbarna
Copy link
Member

I've also noticed that at startup, an error is raised with the opensearch sandbox connection before the sandbox is able to start properly:

(base) tbarna@Desktop:~/gcn.nasa.gov$ npm run dev  
npm run dev

> dev
> run-p "dev:*"


> dev:remix
> remix dev --manual -c "arc sandbox -e testing --host localhost"


> dev:esbuild
> node esbuild.js --dev


> dev:sass
> sass --watch -Inode_modules/nasawds/src/theme -Inode_modules/@uswds -Inode_modules/@uswds/uswds/packages app:app

[watch] build finished, watching for changes...

 💿  remix dev

Sass is watching for changes. Press Ctrl-C to stop.

 info  building...
 info  built (5.3s)
         App ⌁ remix-gcn
      Region ⌁ us-east-1
     Profile ⌁ @aws profile / AWS_PROFILE not configured
     Version ⌁ Architect 10.16.3
         cwd ⌁ /home/tbarna/gcn.nasa.gov

✓ Sandbox @tables created in local database
✓ Sandbox @events and @queues ready on local event bus
✓ Found new functions to hydrate!
⚬ Hydrate Hydrating dependencies in 1 path
⦾ Hydrate Hydrating python/
environment variable ORIGIN must be set for production. Proceeding anyway since we are in development
environment variable ORIGIN must be set for production. Proceeding anyway since we are in development
environment variable SESSION_SECRET must be set for production. Proceeding anyway since we are in development
⦿ Hydrate Hydrating python/
environment variable ORIGIN must be set for production. Proceeding anyway since we are in development

environment variable ORIGIN must be set for production. Proceeding anyway since we are in development

environment variable SESSION_SECRET must be set for production. Proceeding anyway since we are in development

environment variable GITHUB_API_TOKEN must be set for production. Proceeding anyway since we are in development
⊙ Hydrate Hydrating python/
environment variable GITHUB_API_TOKEN must be set for production. Proceeding anyway since we are in development
⊙ Hydrate Hydrating python/
making up fake object size

making up fake object size

environment variable RECAPTCHA_SITE_KEY must be set for production. Proceeding anyway since we are in development

ConnectionError: connect ECONNREFUSED 127.0.0.1:9200
    at ClientRequest.onError (/home/tbarna/gcn.nasa.gov/node_modules/@opensearch-project/opensearch/lib/Connection.js:129:16)
    at ClientRequest.emit (node:events:518:28)
    at Socket.socketErrorListener (node:_http_client:495:9)
    at Socket.emit (node:events:518:28)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  meta: {
    body: null,
    statusCode: null,
    headers: null,
    meta: {
      context: null,
      request: [Object],
      name: 'opensearch-js',
      connection: [Object],
      attempts: 3,
      aborted: false
    }
  }
}

I've only included a snippet here so as to not leave an overly long comment, but this error is raised about 5 times before the sandbox connects.

Any chance this is related, or is this expected behaviour?

@lpsinger
Copy link
Member

Any chance this is related, or is this expected behaviour?

No, that's not related. The Remix app server starts up faster than OpenSearch does; you'll get that error message if you navigate to a route that talks to OpenSearch before it is ready. It's unrelated and harmless.

@lpsinger
Copy link
Member

lpsinger commented Mar 1, 2024

We have a new lead to chase down: nasa-gcn/gcn.nasa.gov#2010.

@Courey
Copy link
Contributor Author

Courey commented Mar 1, 2024

This ticket is blocked by nasa-gcn/gcn.nasa.gov#2010

@lpsinger
Copy link
Member

@Courey, would you please check if architect/sandbox#732 made a difference?

@Courey Courey transferred this issue from nasa-gcn/gcn.nasa.gov Jun 13, 2024
lpsinger pushed a commit that referenced this issue Jun 24, 2024
@github-project-automation github-project-automation bot moved this from In Review to Done in GCN Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants