CI tests are hitting OOM #312

Closed
metral opened this issue Jan 27, 2020 · 2 comments

metral commented Jan 27, 2020

Problem description

We're seeing intermittent fatal error: runtime: out of memory errors in Travis CI, due to what seem to be leaky tests.

These leaks seem tied specifically to the pulumi/eks test surface rather than to the sheer quantity of tests: eks currently has fewer than 15 tests, while pulumi/examples has ~90.

One theory is that this repo makes heavier use of dynamic providers than any other repo. Another is that once test failures occur, they compound and trigger further failures, leading to further resource starvation.

Tests are run in parallel, with a current maximum of 20 concurrent jobs.
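For context, a minimal sketch of how that parallelism knob is typically plumbed into Go's test runner, assuming the integration tests are driven by go test (the exact Makefile target and package path in this repo may differ):

```sh
# Sketch: cap how many Go test cases run concurrently. Note -parallel only
# limits tests that call t.Parallel(); 20 matches the default mentioned above,
# and the package path here is an assumption.
TESTPARALLELISM=20
go test -v -count=1 -timeout 2h -parallel ${TESTPARALLELISM} ./nodejs/eks/...
```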

  • We've started testing in a slimmed-down EC2 VM (t2.medium) to mimic the Travis CI runtime with fewer resources than Travis
  • AWS Region: us-west-2
  • Swap is not enabled by default in the VM (a sketch for temporarily enabling it while diagnosing follows this list)
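Because swap is off, the kernel OOM-kills processes as soon as physical memory runs out. A minimal sketch for temporarily adding a swap file on the test VM while diagnosing (run as root; the 4 GiB size is arbitrary):

```sh
# Temporarily add a 4 GiB swap file for diagnosis.
fallocate -l 4G /swapfile      # or: dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon --show                  # confirm swap is active
```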

Errors & Logs

  • Output of /var/log/kern.log: (attached: kern.log)

  • Output of ps aux | grep node && ps aux | grep pulumi after repro: (attached screenshot: ps-node-pulumi)

  • Output of top after repro: (attached screenshot)

Reproducing the issue

  • Run all tests using make test_all in the nodejs/eks directory
  • After a few failures occur, Ctrl-C to interrupt tests
  • Leaked processes should accumulate, and SSH responsiveness and general use of the VM should become noticeably slow (a few diagnostic commands are sketched after this list)
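A few standard commands (not specific to this repo) that help confirm the leak and quantify memory pressure after interrupting the tests:

```sh
# Top memory consumers -- leaked node / pulumi processes show up here.
ps aux --sort=-%mem | head -n 20

# Count leftover test-related processes.
pgrep -c node; pgrep -c pulumi

# Overall memory/swap usage and any OOM-killer activity.
free -h
dmesg | grep -i 'out of memory'
```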

Related Issues


metral commented Jan 29, 2020

Recent update, mirroring the Slack thread:

We repro’d the starvation issue on a test VM with no failures occurring; it seems that just running all tests concurrently is enough to do the machine in. Node processes initially shot up to consume most of the CPU, and just now kswapd0, snapd, and a couple of pulumi-language processes are together accounting for over 150% CPU usage (see the screenshots below for data).

OTOH, in a separate Travis run I’ve set TESTPARALLELISM=3 (vs. the current default of 20), and that is humming along with no failures for now, but at this pace it will inevitably hit Travis's 2-hour test run limit.
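For reference, that reduced-parallelism run amounts to overriding the variable for a one-off invocation, roughly as follows (assuming TESTPARALLELISM is read by the Makefile as in the sketch above):

```sh
# One-off run with reduced test parallelism (default is 20).
cd nodejs/eks
TESTPARALLELISM=3 make test_all
```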

ps aux | grep node: (screenshot)

ps aux | grep pulumi: (screenshot)

top: (screenshot)
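As a rough way to quantify what the screenshots show, the resident memory held by the node and pulumi processes can be summed with standard tools (a sketch, not the exact command used here):

```sh
# Approximate resident memory (MiB) held by processes named exactly "node"
# and "pulumi"; language-host processes would need their own pattern.
ps -o rss= -C node   | awk '{sum += $1} END {print sum/1024, "MiB (node)"}'
ps -o rss= -C pulumi | awk '{sum += $1} END {print sum/1024, "MiB (pulumi)"}'
```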


metral commented Feb 3, 2020

Closed with pulumi/pulumi-kubernetes#974

metral closed this as completed on Feb 3, 2020