Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Security's Cypress tests on all PRs #167516

Merged
merged 7 commits into from
Oct 6, 2023

Conversation

oatkiller
Copy link
Contributor

@oatkiller oatkiller commented Sep 28, 2023

Summary

Security Solution writes e2e and other tests using Cypress. In the past, these tests, if they failed on a tracked branch, couldn't be easily skipped. They also weren't run in parallelized jobs. For primarily these reasons, they didn't run on most Kibana PRs.

This PR moves these Cypress tests back to the main PR pipeline. Tests that fail on tracked branches create (or update) Github issues which can be used with the skip-test github workflow script to easily skip the failing tests. The pipeline steps are parallelized and run in under 40 minutes.

Open Questions

  • Should this PR enable Serverless Security Defend Workflows Cypress Tests @patrykkopycinski

Some buildkite pipelines that used to run only on Security PRs now run on all PRs:

These steps run on all PRs with these changes

  • Security Solution Cypress Tests (general tests that haven't been organized into an area team)
  • Explore tests
  • Investigations Tests
  • Defend Workflows Tests
  • Defend Workflows Serverless
  • Threat Intel Tests
  • OS Query Tests
  • Security Solution Burning Changed Specs (these run only recently changed specs a few extra times)
  • Security Solution OpenAPI codegen
  • OSQuery burning
  • OSQuery Serverless
And these already run on all PRs
  • Serverless Security Cypress Tests
  • Serverless Explore tests
  • Serverless Investigations Tests

Security Cypress tests run in the main on merge pipeline instead of the on merge unsupported ftrs pipeline:

These steps run in the on merge pipeline with these changes:

  • Security Solution Cypress Tests
  • Explore Cypress Tests
  • Investigations Cypress Tests
  • Defend Workflows Cypress Tests
  • Defend Workflows Serverless Cypress Tests
  • Threat Intelligence Cypress Tests
  • Osquery Cypress Tests
and these already run on the `on merge` pipeline
  • Serverless Security Cypress Tests
  • Serverless Explore - Security Solution Cypress Tests
  • Serverless Investigations - Security Solution Cypress Tests

Additional work to be done:

We need to consolidate build steps, enhance test skipping to support Cypress-grep flags, avoid out-of-memory errors in cypress, enhance parallelization, improve Cypress reporting, and probably other things. These are tracked separately. Reach out to me if you need details.

For maintainers

@oatkiller oatkiller force-pushed the enable-security-cypress-tests branch from 6a36349 to 6421112 Compare September 28, 2023 14:48
@@ -82,7 +82,6 @@ const uploadPipeline = (pipelineContent: string | object) => {
GITHUB_PR_LABELS.includes('ci:all-cypress-suites')
) {
pipeline.push(getPipeline('.buildkite/pipelines/pull_request/security_solution.yml'));
pipeline.push(getPipeline('.buildkite/pipelines/pull_request/defend_workflows.yml'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the body of this file directly into base.yml, so it will always run.

@@ -1,40 +1,4 @@
steps:
- command: .buildkite/scripts/steps/functional/security_solution.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were moved to the pull_request base.yml so they will always run.

@@ -1,18 +1,4 @@
steps:
- command: .buildkite/scripts/steps/functional/osquery_cypress.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved to the base.yml so it will always run.

@@ -1,26 +0,0 @@
steps:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was all moved to base.yml

@@ -80,6 +80,30 @@ steps:
# - exit_status: '*'
# limit: 1

- command: .buildkite/scripts/steps/functional/defend_workflows.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defend Workflows Cypress tests, including serverless tests, will run on all PRs now.

@oatkiller oatkiller marked this pull request as ready for review September 28, 2023 20:13
@oatkiller oatkiller requested a review from a team as a code owner September 28, 2023 20:13
@jbudz
Copy link
Member

jbudz commented Sep 28, 2023

🥳 !! How do we feel about including these in the on-merge pipeline too? By expanding the scope to all pull requests, operations will typically be the contact for triage. Our notifications come through via on-merge - too much noise on pull requests.

@MadameSheema
Copy link
Member

🥳 !! How do we feel about including these in the on-merge pipeline too? By expanding the scope to all pull requests, operations will typically be the contact for triage. Our notifications come through via on-merge - too much noise on pull requests.

Completely agree with this!! @oatkiller when doing that change, please remember to remove our tests from .buildkite/pipelines/on_merge_unsupported_ftrs.yml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oatkiller I believe we should do the same for the threat intelligence cypress tests.

@PhilippeOberti may you please confirm if those tests are still owned by us? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oatkiller I believe we should do the same for the threat intelligence cypress tests.

@PhilippeOberti may you please confirm if those tests are still owned by us? Thanks!

I'd say they are yes, as @lgestc and I are the ones who worked on them, and we're both in the Investigations' team

@MadameSheema

This comment was marked as resolved.

@oatkiller

This comment was marked as resolved.

@oatkiller oatkiller added the release_note:skip Skip the PR/issue when compiling release notes label Sep 29, 2023
@oatkiller

This comment was marked as resolved.

@@ -76,7 +76,6 @@ const uploadPipeline = (pipelineContent: string | object) => {
GITHUB_PR_LABELS.includes('ci:all-cypress-suites')
) {
pipeline.push(getPipeline('.buildkite/pipelines/pull_request/security_solution.yml'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files now just run:

  • cypress burn
  • OpenAPI codegen
  • Serverless OS Query tests

I don't think these steps match the doAnyChangesMatch call above.

For example, changes to /^fleet_packages.json/ probably shouldn't trigger those actions. Also, all PRs will run our tests now.

With that in mind, I think we need to re-evaluate these filters.

@xcrzx I believe you added the OpenAPI codegen step. Do you know when it needs to run?

@patrykkopycinski Should Serverless Osquery Cypress Tests run on all PRs now?

Copy link
Contributor

@xcrzx xcrzx Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xcrzx I believe you added the OpenAPI codegen step. Do you know when it needs to run?

We need to run it whenever there are changes to schema files (*.schema.yaml) or generated artifacts (*.gen.ts) in Security Solution. But as we're making the codegen script configurable, other teams might also start using it in their plugins.

UPD: We also need to run codegen on any changes to packages/kbn-openapi-generator

Copy link
Contributor

@patrykkopycinski patrykkopycinski Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, all Servereless Cypress Tests should run on all PRs

@jbudz
Copy link
Member

jbudz commented Sep 29, 2023

do you have an opinion here? I think either approach is acceptable.

We don't have a great UX yet for soft failures. When we had them enabled for serverless tests PR comments displaying Build failed but still allowing merges caused a lot of confusion. I'm not opposed to further consideration and we can improve on the PR comment, but would it be okay to follow up with a separate issue?

@@ -49,66 +49,6 @@ steps:
- exit_status: '*'
limit: 1

- command: .buildkite/scripts/steps/functional/security_solution.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were moved to on_merge.yml

@@ -114,6 +114,66 @@ steps:
- exit_status: '*'
limit: 1

- command: .buildkite/scripts/steps/functional/security_solution.sh
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were copied from on_merge_unsupported_ftrs.yml

Copy link
Member

@jbudz jbudz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Timing wise, I would prefer waiting until the 8.11 branch is cut before merging. If we're intentionally targeting this before, no problem

@MadameSheema
Copy link
Member

@MadameSheema @patrykkopycinski @jbudz do you have an opinion here? I think either approach is acceptable.

@oatkiller Not a strong opinion on any of the above solutions if both are valid. My only thought is that if possible, would be great to minimize the number of files where our code lives, in this way is going to be easier to maintain the future.

@oatkiller I believe we should do the same for the threat intelligence cypress tests.
@PhilippeOberti may you please confirm if those tests are still owned by us? Thanks!
I'd say they are yes, as @lgestc and I are the ones who worked on them, and we're both in the Investigations' team

Are we going to move the threat intelligence tests as well?

LGTM. Timing wise, I would prefer waiting until the 8.11 branch is cut before merging. If we're intentionally targeting this before, no problem.

Agree! It would be nice to merge it if possible after 8.11 branch cut to avoid possible issues.

@oatkiller
Copy link
Contributor Author

Hey all. I’m aiming to enable our Cypress e2e tests on all PRs. Can you help me find the right time to merge this? Tomorrow is feature freeze.

  • Option 1: Merge before feature freeze. If we do this, then all of the PRs in the rush to feature-freeze will run our tests. That’s good news! The possible downside is that our tests may be flaky and we could end up blocking the build. On the other hand, we have a workflow for skipping flaky tests, so we have a way to deal with that.
  • Option 2: Merge soon after FF. Once the 8.11 branch is cut, most changes going into 8.11 will already be in main. This will allow PRs to go into main without running our tests (as they do today) but it de-risks the release process of other teams.
  • Option 3: Merge well after FF. Pick a date in the future (e.g. next week) and merge then.

cc: @paul-tavares @patrykkopycinski @kevinlog @eyalkraft @tehilashn @peluja1012 @marshallmain @yctercero @asnehalb @YulNaumenko @michaelolo24 @stephmilovic @MindyRS @MadameSheema @charlie-pichette @k-g-elastic @epixa @crowens @arpadkiraly

@MadameSheema
Copy link
Member

@oatkiller I vote for option 2, FF is happening tomorrow so the PR cut will arrive soon :)

@kevinlog
Copy link
Contributor

kevinlog commented Oct 2, 2023

@oatkiller I agree with @MadameSheema on option 2.

@oatkiller
Copy link
Contributor Author

cc: @mchopda

@paul-tavares
Copy link
Contributor

Hey @oatkiller - thanks for the ping. I also think option 2 is best.

As a side note: I wonder if some sort of "warning" to others that commit to Kibana should be sent out. I often encounter CI errors while the PR is pending that have nothing to do with the changes introduced by the PR (they are failures in other security solution test suites). I get around them by just keep merging upstream and they eventually generate a 💚 build, but this type of issue (unstable tests - maybe?) will likely a) frustrate others outside of security solution, b) cause an increase in "I need help with your failing tests" in our security team channels.

Just a heads up.

@oatkiller oatkiller requested a review from pheyos October 5, 2023 17:41
@oatkiller
Copy link
Contributor Author

@MadameSheema @patrykkopycinski @jbudz do you have an opinion here? I think either approach is acceptable.

@oatkiller Not a strong opinion on any of the above solutions if both are valid. My only thought is that if possible, would be great to minimize the number of files where our code lives, in this way is going to be easier to maintain the future.

@oatkiller I believe we should do the same for the threat intelligence cypress tests.
@PhilippeOberti may you please confirm if those tests are still owned by us? Thanks!
I'd say they are yes, as @lgestc and I are the ones who worked on them, and we're both in the Investigations' team

Are we going to move the threat intelligence tests as well?

LGTM. Timing wise, I would prefer waiting until the 8.11 branch is cut before merging. If we're intentionally targeting this before, no problem.

Agree! It would be nice to merge it if possible after 8.11 branch cut to avoid possible issues.

I will move the threat intelligence tests as well.

@oatkiller
Copy link
Contributor Author

@MadameSheema @patrykkopycinski @jbudz do you have an opinion here? I think either approach is acceptable.

@oatkiller Not a strong opinion on any of the above solutions if both are valid. My only thought is that if possible, would be great to minimize the number of files where our code lives, in this way is going to be easier to maintain the future.

@oatkiller I believe we should do the same for the threat intelligence cypress tests.
@PhilippeOberti may you please confirm if those tests are still owned by us? Thanks!
I'd say they are yes, as @lgestc and I are the ones who worked on them, and we're both in the Investigations' team

Are we going to move the threat intelligence tests as well?

LGTM. Timing wise, I would prefer waiting until the 8.11 branch is cut before merging. If we're intentionally targeting this before, no problem.

Agree! It would be nice to merge it if possible after 8.11 branch cut to avoid possible issues.

I will move the threat intelligence tests as well.

Thanks. I'll figure out a way to notify some more people.

@oatkiller oatkiller force-pushed the enable-security-cypress-tests branch from 40000d0 to cd133a1 Compare October 5, 2023 18:53
@oatkiller oatkiller requested review from a team as code owners October 5, 2023 18:53
ensure jobs in on_merge have the same labels and parallelism as the equivalent jobs in the pull_request pipeline
ensure all Security cypress tests, including serverless tests, are in on_merge
limit: 1

- command: .buildkite/scripts/steps/functional/defend_workflows_serverless.sh
label: 'Defend Workflows Cypress Tests on Serverless'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrykkopycinski is this right?

@oatkiller
Copy link
Contributor Author

Update!:

Hey all, Thanks for the previous reviews.

I have made a few changes:

  • burn steps run on all PRs now. This seems to work fine.
  • codegen steps for OpenAPI run on all PRs now
  • Threat Intel steps run on all PRs

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@PhilippeOberti PhilippeOberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the Protections Experience team!

Copy link
Contributor

@nkhristinin nkhristinin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@oatkiller oatkiller merged commit ea0a1a0 into elastic:main Oct 6, 2023
@kibanamachine kibanamachine added v8.12.0 backport:skip This commit does not require backporting labels Oct 6, 2023
dej611 pushed a commit to dej611/kibana that referenced this pull request Oct 17, 2023
## Summary

Security Solution writes e2e and other tests using Cypress. In the past,
these tests, if they failed on a tracked branch, couldn't be easily
skipped. They also weren't run in parallelized jobs. For primarily these
reasons, they didn't run on most Kibana PRs.

This PR moves these Cypress tests back to the main PR pipeline. Tests
that fail on tracked branches create (or update) Github issues which can
be used with the skip-test github workflow script to easily skip the
failing tests. The pipeline steps are parallelized and run in under 40
minutes.

### Open Questions

- [ ] Should this PR enable Serverless Security Defend Workflows Cypress
Tests @patrykkopycinski

### Some buildkite pipelines that used to run only on Security PRs now
run on all PRs:

These steps run on all PRs with these changes

- Security Solution Cypress Tests (general tests that haven't been
organized into an area team)
- Explore tests
- Investigations Tests
- Defend Workflows Tests
- Defend Workflows Serverless
- Threat Intel Tests
- OS Query Tests
- Security Solution Burning Changed Specs (these run only recently
changed specs a few extra times)
- Security Solution OpenAPI codegen
- OSQuery burning
- OSQuery Serverless

<details>
  <summary><b>And these already run on all PRs</b></summary>

  - Serverless Security Cypress Tests
  - Serverless Explore tests
  - Serverless Investigations Tests
</details>

### Security Cypress tests run in the main `on merge` pipeline instead
of the `on merge unsupported ftrs` pipeline:

These steps run in the `on merge` pipeline with these changes:

- Security Solution Cypress Tests
- Explore Cypress Tests
- Investigations Cypress Tests
- Defend Workflows Cypress Tests
- Defend Workflows Serverless Cypress Tests
- Threat Intelligence Cypress Tests
- Osquery Cypress Tests

<details>
<summary><b>and these already run on the `on merge`
pipeline</b></summary>

- Serverless Security Cypress Tests
- Serverless Explore - Security Solution Cypress Tests
- Serverless Investigations - Security Solution Cypress Tests

</details>

### Additional work to be done:

We need to consolidate build steps, enhance test skipping to support
Cypress-grep flags, avoid out-of-memory errors in cypress, enhance
parallelization, improve Cypress reporting, and probably other things.
These are tracked separately. Reach out to me if you need details.

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.