Terminating invest subprocesses | Automated browser tests #50

davemfish · 2020-10-08T16:33:01Z

This PR adds a feature to gracefully terminate an invest run during execution, with a cancel button. It also adds an automated browser test, using puppeteer to drive the browser, to test this functionality on the final built application.

I'm especially interested in review and advice on spawning and terminating subprocesses from node on different operating systems. This comes up in a few places:

src/InvestJob.jsx: investExecute method -- the invest model subprocess
src/main_helpers: createPythonFlaskProcess function -- launching the flask app
tests/binary_tests/puppet.test.js -- launching the built electron app as a spawned process

I believe the builds should all be successful, but puppet.test.js fails/errors in CI for various reasons:

Windows: the test fails because on Windows we resort to force-killing the invest subprocess, which gives the app no time to handle the request to "Cancel Run" and display the proper message.
MacOS: the test errors with the message "Unable to create basic Accelerated OpenGL renderer. Core Image is now using the software OpenGL renderer. This will be slow." I think this only comes up in the context of puppeteer, not when installing and launching the electron app. It would be great if @phargogh or @emlys could try running the app and this puppet test locally.
Ubuntu: this was passing, but now seems unreliable, failing for various reasons. In CI, this test requires a virtual frame buffer provided by https://github.com/marketplace/actions/gabrielbb-xvfb-action. I'm not sure if that's a source of unreliability. The test is reliable outside of CI.

This PR also fixes #47, fixes #46, fixes #36

dcdenu4 · 2020-10-13T11:54:18Z

src/InvestJob.jsx

+      this.investRun.terminate = () => {
+        if (this.state.jobStatus === 'running') {
+          // process.kill(-this.investRun.pid, 'SIGTERM'); // does not kill
+          // this.investRun.kill(); // does not kill
+          // This kills, but no chance to handle it and show 'Run Cancelled'
+          exec(`taskkill /pid ${this.investRun.pid} /t /f`);
+          // exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill
+        }
+      };


I did a little bit of looking into this and I probably found what you did; a lot of nodejs issues about how process.kill doesn't function correctly on Windows. It also seems like Windows doesn't use Signals in the same way.

The threads I was on:

process.kill can't kill process group on Windows nodejs/node#3617

process: process.kill() and signals on Windows nodejs/node#12378

Yeah, looks like only POSIX systems implement signals, but even Windows should handle interrupts of some sort. One SO post I found mentioned Win32 messages, which makes me wonder if there's just an alternate, windows-specific way to handle events. If that's the case, then how would we do this in node? All of the child_process docs talk about processes like they're expecting POSIX signals, which sounds like it would work great until signals don't exist on Windows.
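For reference, the platform split discussed in this thread could be sketched roughly as below. This is a minimal sketch, not the code in src/InvestJob.jsx: `killPlanFor` and `terminateTree` are hypothetical names, and the POSIX branch assumes the child was spawned with `detached: true` so that it leads its own process group. The `taskkill` flags are the ones from the diff above.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: describe how to stop a child and its descendants.
// Returns a plain object so the decision can be inspected without killing anything.
function killPlanFor(pid, platform = process.platform) {
  if (platform === 'win32') {
    // /t includes child processes, /f forces termination (flags from the diff above).
    return { kind: 'exec', command: `taskkill /pid ${pid} /t /f` };
  }
  // A negative pid targets the whole process group; this assumes the child
  // was spawned with { detached: true } so that it is a group leader.
  return { kind: 'signal', target: -pid, signal: 'SIGTERM' };
}

// Hypothetical wrapper that carries out the plan.
function terminateTree(pid, platform = process.platform) {
  const plan = killPlanFor(pid, platform);
  if (plan.kind === 'exec') {
    exec(plan.command);
  } else {
    process.kill(plan.target, plan.signal);
  }
  return plan;
}
```

Keeping the decision in `killPlanFor` separate from the side effect makes the platform logic easy to unit test on any operating system.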

dcdenu4 · 2020-10-13T12:32:41Z

tests/app.test.js

+      .toBeInTheDocument();
+    const cancelButton = await findByText('Cancel Run');
+    fireEvent.click(cancelButton);
+    expect(await findByText('Run Canceled')).toBeInTheDocument();


So on Windows this will error because we can't properly kill a process?

Yes. Technically it fails because that "Run Canceled" message only shows when the subprocess has time to listen and run code on an exit event. Right now on Windows we're resorting to a force kill and there's no such time.

Oh interesting! So then the callback is executed by the child process rather than the parent?

@phargogh Maybe I misstated that. The callback is executed by the parent. Node (the parent) has a reference to the child process (this.investRun) that is listening for events from the child. So maybe the problem here is that the child never has the chance to emit the event?

So now that I'm actually testing this on Windows, my last two statements are both wrong. The exit code on Windows after the force-kill is 1.

dcdenu4 · 2020-10-13T12:35:41Z

tests/binary_tests/flaskapp.test.js

+  // In the CI the flask app takes more than 10x as long to startup.
+  // Especially so on macos.
+  // So, allowing many retries, especially because the error
+  // that is thrown if all retries fail is swallowed by jest


emlys · 2020-10-13T18:48:31Z

Looks like it's not working on my mac because of the meddlesome space character in ~/Library/Application Support! The full command ends up as: invest -vvv run seasonal_water_yield --headless -d /Users/emily/Library/Application Support/invest-workbench/tmp/data-EYud8J/datastack.json. Hence, an invest: error: unrecognized arguments: Support/invest-workbench/tmp/data-EYud8J/datastack.json.

davemfish · 2020-10-13T18:57:06Z

> Looks like it's not working on my mac because of the meddlesome space character in ~/Library/Application Support! The full command ends up as: invest -vvv run seasonal_water_yield --headless -d /Users/emily/Library/Application Support/invest-workbench/tmp/data-EYud8J/datastack.json. Hence, an invest: error: unrecognized arguments: Support/invest-workbench/tmp/data-EYud8J/datastack.json.

Aha! Great catch, @emlys ! Some quotes should do that job. I'll patch that up.
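A minimal sketch of two ways to keep the datastack path intact, assuming the command is launched through Node's `child_process` module. `investArgs` and `investCommand` are hypothetical helpers, not functions from this PR; the CLI flags are copied from the command quoted above.

```javascript
const { spawn } = require('child_process');

// Option 1: pass arguments as an array. Without a shell, Node hands each
// element to the program as one argument, so spaces need no quoting.
function investArgs(modelName, datastackPath) {
  return ['-vvv', 'run', modelName, '--headless', '-d', datastackPath];
}

// Option 2: if the command must be a single string run through a shell,
// wrap the path in double quotes so the shell does not split it.
// Simple quoting only: a path that itself contains double quotes would need escaping.
function investCommand(modelName, datastackPath) {
  return `invest -vvv run ${modelName} --headless -d "${datastackPath}"`;
}

// Usage sketch (not executed here):
// spawn('invest', investArgs('seasonal_water_yield', datastackPath));
// spawn(investCommand('seasonal_water_yield', datastackPath), { shell: true });
```

The array form sidesteps shell quoting rules entirely, which differ between Windows and POSIX shells; the quoted string is the smaller change when the existing code already builds one command string.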

dcdenu4 · 2020-10-13T19:02:36Z

invest-flask.spec

    # add rtree dependency dynamic libraries from conda environment
    invest_a.binaries += [
        (os.path.basename(name), name, 'BINARY') for name in
        glob.glob(os.path.join(conda_env, 'lib/libspatialindex*'))]
-elif is_win:
+else:


Since we've made the switch to conda for the invest build workflows, could we add:

    # add rtree dependency dynamic libraries from conda environment
    invest_a.binaries += [
        (os.path.basename(name), name, 'BINARY') for name in
        glob.glob(os.path.join(conda_env, 'Library/bin/spatialindex*.dll'))]

This would also mean a change on line 9,10 to set conda_env for windows as well.

Hi @dcdenu4 , I'm on the fence about touching this stuff here in this PR vs. moving the whole python side of this project into invest. If we changed this PyInstaller script here, we'd also have to change the Actions workflow, because it's not using conda for Windows right now.

Now that you have all that set up for invest in the release branch, it could be time to make a new branch like release/workbench-alpha on natcap/invest and move all of the python & PyInstaller stuff out of the workbench project.

phargogh

Looks good to me @davemfish ! That node behavior on process exit is pretty surprising and I do wonder about what alternatives/workarounds we might be able to implement, like we talked about on the coffee call this morning. Maybe monitoring the process for cancellation would be best? Or maybe we can just trust that cancellation will always go through? I suppose if all else fails, there's always task manager.

phargogh · 2020-10-15T23:13:19Z

src/InvestJob.jsx

+          // This kills, but no chance to handle it and show 'Run Cancelled'
+          exec(`taskkill /pid ${this.investRun.pid} /t /f`);
+          // exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill


Given that we have to run a program to kill the process on windows, this looks right. So then it looks like the this.investRun.on('exit', (code) ... part down below is the one that won't execute once this.investRun exits?

That's how I understand it as well.

phargogh · 2020-10-15T23:21:53Z

src/InvestJob.jsx

+      this.investRun.terminate = () => {
+        if (this.state.jobStatus === 'running') {
+          // process.kill(-this.investRun.pid, 'SIGTERM'); // does not kill
+          // this.investRun.kill(); // does not kill
+          // This kills, but no chance to handle it and show 'Run Cancelled'
+          exec(`taskkill /pid ${this.investRun.pid} /t /f`);
+          // exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill
+        }
+      };


Yeah, looks like only POSIX systems implement signals, but even Windows should handle interrupts of some sort. One SO post I found mentioned Win32 messages, which makes me wonder if there's just an alternate, windows-specific way to handle events. If that's the case, then how would we do this in node? All of the child_process docs talk about processes like they're expecting POSIX signals, which sounds like it would work great until signals don't exist on Windows.

phargogh · 2020-10-15T23:35:43Z

tests/app.test.js

+      .toBeInTheDocument();
+    const cancelButton = await findByText('Cancel Run');
+    fireEvent.click(cancelButton);
+    expect(await findByText('Run Canceled')).toBeInTheDocument();


Oh interesting! So then the callback is executed by the child process rather than the parent?

phargogh · 2020-10-15T23:36:36Z

tests/binary_tests/flaskapp.test.js

+  // In the CI the flask app takes more than 10x as long to startup.
+  // Especially so on macos.
+  // So, allowing many retries, especially because the error
+  // that is thrown if all retries fail is swallowed by jest


emlys · 2020-10-16T17:25:36Z

puppet.test.js still isn't working for me :( It's still failing on expect(vals.includes('active')).toBeTruthy();. vals is simply nav-link disabled. Everything works fine when I run it manually, and I can click on the Log tab right after clicking Execute. Maybe the assertion is happening too soon and it takes a moment for the Log tab to become active?

davemfish · 2020-10-16T18:05:14Z

> puppet.test.js still isn't working for me :( It's still failing on expect(vals.includes('active')).toBeTruthy();. vals is simply nav-link disabled. Everything works fine when I run it manually, and I can click on the Log tab right after clicking Execute. Maybe the assertion is happening too soon and it takes a moment for the Log tab to become active?

Thanks for continuing to test this @emlys . That's a good thought because the Log tab does not become active immediately, but I just double-checked and the test should be waiting and retrying that expect because it is wrapped in waitFor like this:

await waitFor(async () => {
  const prop = await logTab.getProperty('className');
  const vals = await prop.jsonValue();
  expect(vals.includes('active')).toBeTruthy();
});
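To illustrate why a wrapped assertion like this should tolerate a tab that becomes active late, here is a rough sketch of the retry-until-pass pattern. It is not the implementation of the `waitFor` used in puppet.test.js; `retryUntilPasses` and its `timeout` and `interval` defaults are assumptions made for illustration.

```javascript
// Hypothetical helper: call `assertion` repeatedly until it stops throwing,
// or rethrow its last error once `timeout` milliseconds have elapsed.
async function retryUntilPasses(assertion, { timeout = 1000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    try {
      return await assertion();
    } catch (error) {
      if (Date.now() >= deadline) {
        throw error;
      }
      // Pause before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, interval));
    }
  }
}
```

With a pattern like this, a failure means the condition never became true within the timeout, not that the check ran too early.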

That same Application Support/invest-workbench folder should have logfiles that might reveal what's happening after the Execute button is clicked. Sometimes I tail -f that log and watch it while running the puppet test.

…treat it the same as other exit events

emlys · 2020-10-16T19:00:24Z

Oh 🤦‍♀️ I needed to npm run dist again for the changes to take effect. The test is passing now!

davemfish · 2020-10-16T20:55:32Z

Thanks for reviewing this, everyone. I pushed some changes to address the platform-dependent logic around killing the subprocess. I was partially mistaken when describing the problem earlier. The callback that listens for the exit event fires the same way for all OS. On Windows, the force kill yields an exit code of 1, whereas on other systems the code was null. And that was the only real reason for the different behavior.
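The difference described above can be sketched as follows, assuming Node's documented behavior that a child's 'exit' event receives `(code, signal)` and that `code` is `null` when the child was terminated by a signal. `exitStatus` and `looksSignalKilled` are hypothetical names for illustration, not the code pushed in this PR.

```javascript
// Hypothetical helper: classify the `code` argument of a child process
// 'exit' event without branching on the operating system.
function exitStatus(code) {
  // 0 is the only success value. A force-kill on Windows (code 1) and a
  // signal kill elsewhere (code null) both fall through to 'error'.
  return code === 0 ? 'success' : 'error';
}

// A check like this would recognise a cancelled run only where the kill
// arrives as a signal, which is the platform difference described above.
function looksSignalKilled(code) {
  return code === null;
}
```

Treating every non-zero or null code the same way is one way to keep the exit handler free of platform checks.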

The only thing outstanding now is the Mac Build Action. I've made an issue for that (#52). Mac builds are good; it's just the puppeteer test that's not running properly in Actions. I don't want that to stop us from merging this PR though.

davemfish added 30 commits September 11, 2020 12:04

added a cancel button for killing an invest run. only working on non-…

e9c933a

…windows so far.

maybe no longer need to manually copy the geos dll for shapely.

a7740ae

trying the detached shell and group-kill method on Windows too.

f2d2d56

quick patch for the windows Actions python env

468fd52

added a test for the LogTab response to terminating an invest run.

1f709a8

preparing to use spectron to launch app and run an invest model.

2e8aa8a

Merge branch 'bugfix/39-react-composition-refactor' into feature/23-c…

14b83a4

…ancel-invest-job

npm update and fixing non-backwards compatible test.

c2a067a

spectron test seems to require a very long timeout limit.

031b8ef

experimenting with webdriver selectors.

b403d4e

major update to electron including breaking changes.

42713b1

updates for the breaking changes to remote module

a9621a4

Revert "updates for the breaking changes to remote module"

f235247

This reverts commit a9621a4.

electron's remote module must now be explicitly enabled.

869c991

we don't need to switch to the new remote module yet, see natcap#45.

dcf3c0a

updates to the node build script

24a1a22

removed a package run script that's not likely to be useful.

8bff5fc

working on e2e test using puppeteer to control the packaged electron …

d7d2a45

…app.

fix natcap#46 and do more explicit handling of 'undefined' invest arg…

1588494

… values during SetupTab initialization.

related to natcap#46. Better handling of which args are exported in a…

bf9a2ed

…rgs dicts and datastacks - specifically hidden args and n_workers.

a working puppeteer test.

586d5c8

removing the buggy spectron test

74c281a

uninstalled spectron

5ef439f

glob for the electron app executeable in puppet test.

3ccc25d

run binary tests after electron-builder, so we can test with puppeteer

9fedde2

debugging puppet test binary path

12796d1

more debugging puppet test binary path

80e8d1e

more debugging puppet test binary path

4eff941

more debugging puppet test binary path

55c3675

debugging the electron-builder config and locations of packaged inves…

f19601a

…t binaries.

davemfish added 2 commits October 8, 2020 08:38

extending the puppet test timeout for Windows CI.

93b92f0

just a little cleanup

f791db1

davemfish requested review from phargogh, emlys and dcdenu4 October 8, 2020 16:33

davemfish added this to the 0.1.0-alpha milestone Oct 8, 2020

This was linked to issues Oct 8, 2020

Automated test of the binary builds #36

Closed

Save Parameters button does not produce a viable datastack #46

Closed

davemfish assigned dcdenu4, davemfish, phargogh and emlys Oct 8, 2020

removing an unused import

bce633b

dcdenu4 reviewed Oct 13, 2020

davemfish added 2 commits October 13, 2020 12:12

quote the datastack path that is passed to the invest cli.

462191e

fixing a bug with the expected exit codes from invest

357d415

phargogh reviewed Oct 16, 2020

remove the special handling of a user-terminated invest run and just …

96229eb

…treat it the same as other exit events

show the Run Canceled text via the logStdErr prop

42b9fb4

dcdenu4 approved these changes Oct 20, 2020

davemfish merged commit 65070af into natcap:main Oct 20, 2020
