[BUG] Chromium instability on self hosted github actions running on EKS #15953

Closed
Meemaw opened this issue Jul 26, 2022 · 5 comments

Comments

Meemaw commented Jul 26, 2022

Context:

  • Playwright Version: 1.21.1
  • Operating System: Linux
  • Node.js version: 16.11.0
  • Browser: Chromium
  • Extra: Running on self-hosted GitHub Actions runners on EKS (k8s)

Describe the bug

We are seeing extreme instability of the Chromium browser in our Playwright E2E test suite running on self-hosted GitHub Actions runners on EKS (k8s).

This surfaces as even the most basic operations timing out, e.g. page.reload() or page.goto() with a timeout of 15000ms (the browser just freezes).

Enabling debug logs for browsers via DEBUG=pw:browser* exposed some errors that might be relevant:

[pid=4307][err] [4307:4386:0726/122121.955851:ERROR:zygote_host_impl_linux.cc(263)] Failed to adjust OOM score of renderer with pid 4387: Permission denied (13) +81ms
[pid=4249][err] [4249:4357:0726/122122.079424:ERROR:bus.cc(398)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix") +0ms
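
For triaging longer captures of this output, a small parser can split each line into its parts so errors can be grouped by Chromium source file. This is only a sketch: `parseBrowserLogLine` and `countBySource` are hypothetical helpers (not part of Playwright), and the regular expression assumes the line layout seen in the two lines above, with Chromium's usual `[pid:tid:MMDD/HHMMSS.micros:SEVERITY:file(line)]` prefix.

```javascript
// Hypothetical helpers, not part of Playwright. They assume lines shaped like:
// [pid=N][err] [pid:tid:MMDD/HHMMSS.micros:SEVERITY:file.cc(line)] message +Nms
const LINE_RE =
  /^\[pid=(\d+)\]\[(\w+)\] \[(\d+):(\d+):(\d{4})\/([\d.]+):(\w+):([^()\]]+)\((\d+)\)\] (.*?)(?: \+(\d+)ms)?$/;

// Parse one log line; returns null when the line does not match the layout.
function parseBrowserLogLine(line) {
  const m = LINE_RE.exec(line.trim());
  if (!m) return null;
  return {
    pid: Number(m[1]),        // process id reported by Playwright
    stream: m[2],             // "err" or "out"
    threadId: Number(m[4]),   // second field of the Chromium prefix
    severity: m[7],           // e.g. "ERROR"
    source: m[8],             // Chromium source file that logged the message
    sourceLine: Number(m[9]),
    message: m[10],
  };
}

// Count parsed lines per source file, skipping lines that do not match.
function countBySource(lines) {
  const counts = {};
  for (const line of lines) {
    const parsed = parseBrowserLogLine(line);
    if (parsed) counts[parsed.source] = (counts[parsed.source] || 0) + 1;
  }
  return counts;
}
```

Feeding the full log through `countBySource` would show whether `zygote_host_impl_linux.cc` and `bus.cc` dominate or whether other sources appear around the time of a freeze.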

I can provide full log output through a separate channel.

Note that the same test suite works without issues on a CircleCI machine executor (VM) setup.

Member

pavelfeldman commented Jul 26, 2022

It looks similar to jlandure/alpine-chrome#109. We are already passing --no-sandbox by default, so you can try passing the --disable-gpu launch argument. They posted their deployment config https://github.com/Zenika/alpine-chrome/pull/139/files as they were closing the bug; maybe it helps. I assume you are using the official Docker image or installing native dependencies using install-deps.

Author

Meemaw commented Jul 26, 2022

We are starting context like this:

const context = await browser.launchPersistentContext(userDataDir, {
  headless: false,
  args: [
    `--disable-extensions-except=${extensionsArgs}`,
    `--load-extension=${extensionsArgs}`,
    "--no-sandbox",
    "--disable-gpu",
  ],
})

Note that "--no-sandbox" and "--disable-gpu" are the additional arguments we tried on EKS; they weren't needed on CircleCI VMs.

We are installing deps using npx playwright@1.21.1 install-deps chromium

Member

pavelfeldman commented Jul 26, 2022

No ideas off the top of my head. I assume you run with xvfb-run and that your headless: false is there because you are testing an extension. This puts your use case in a very narrow niche (extension testing + k8s); there aren't many users like this, and I don't think we've seen anything like your issue. I'd try bisecting it using non-persistent contexts, headless operation, persistent contexts w/o the extension, etc., to figure out what triggers the issue.
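
One way to organize that bisection is to enumerate the launch variants up front and run the same small smoke test against each. The sketch below is illustrative only: `bisectionVariants` and `openVariant` are hypothetical helper names, `extensionPath` and `userDataDir` are placeholders, and the extension is only paired with a headed persistent context because that is the combination used in this report. The Playwright calls used (`chromium.launch`, `browser.newContext`, `chromium.launchPersistentContext`) are passed in through the `chromium` parameter rather than imported here.

```javascript
// Hypothetical bisection plan; variant names are illustrative only.
function bisectionVariants(extensionPath) {
  const baseArgs = ["--no-sandbox", "--disable-gpu"];
  const extensionArgs = [
    `--disable-extensions-except=${extensionPath}`,
    `--load-extension=${extensionPath}`,
  ];
  return [
    { name: "non-persistent, headless", persistent: false, headless: true, args: baseArgs },
    { name: "non-persistent, headed", persistent: false, headless: false, args: baseArgs },
    { name: "persistent, headless, no extension", persistent: true, headless: true, args: baseArgs },
    { name: "persistent, headed, no extension", persistent: true, headless: false, args: baseArgs },
    {
      name: "persistent, headed, with extension",
      persistent: true,
      headless: false,
      args: [...extensionArgs, ...baseArgs],
    },
  ];
}

// Open a context for one variant. `chromium` is Playwright's BrowserType,
// supplied by the caller, e.g. const { chromium } = require("playwright");
async function openVariant(chromium, variant, userDataDir) {
  const options = { headless: variant.headless, args: variant.args };
  if (variant.persistent) {
    const context = await chromium.launchPersistentContext(userDataDir, options);
    return { context, close: () => context.close() };
  }
  const browser = await chromium.launch(options);
  const context = await browser.newContext();
  return { context, close: () => browser.close() };
}
```

Running the same `page.goto` and `page.reload()` sequence against each variant, in order, should show which step from the simplest setup to the full extension setup first reproduces the freeze.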

@pavelfeldman
Member

We need more information to act on this report. Please file a new one and link to this issue when you get back to it!

Author

Meemaw commented Aug 5, 2022

@pavelfeldman a bizarre update on this: we are seeing tests pass and be much more stable in this environment when executing the test command with DEBUG=pw:browser*.

Do you have an explanation for how this could affect the tests?

Relying on this for reliability seems like a mega hack, so I'm wondering if we could figure out what the underlying cause is.
