Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropping tests with knapsack_pro #147

Closed
sulami opened this issue Mar 21, 2024 · 16 comments · Fixed by #172 or #180
Closed

Dropping tests with knapsack_pro #147

sulami opened this issue Mar 21, 2024 · 16 comments · Fixed by #172 or #180
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@sulami
Copy link

sulami commented Mar 21, 2024

Current behaviour
This is a similar issue to #130 in its results, but with a different root cause (I suspect). We're observing a lot of tests and suites being discarded without a session or module ID.

We're using RSpec + knapsack_pro (in queue mode) to run tests in CI. Knapsack runs the tests in small batches which are dynamically distributed to worker nodes, which speeds up overall test runtime.

knapsack_pro 7 made a lot of changes to the runner, which might be related. It seems like we're losing track of the session and module through this process somehow.

I've confirmed that knapsack_pro 6 is fine, so the massive PR is at fault somewhere. I've also confirmed that it only occurs in queue mode. I'm not entirely sure which project to log this on, but it feels like something that's probably easier to fix here(?)

Expected behaviour
Tests and suites have the correction session and module IDs and make it to Datadog.

Steps to reproduce

  1. Setup a project with RSpec & test tracing (standard setup, nothing fancy)
  2. Run tests with Knapsack Pro 7 in queue mode according to the docs, effectively setting a few env vars and running bundle exec rake knapsack_pro:queue:rspec

Annoyingly this requires a paid Knapsack Pro account. I'm happy to run anything to help on ours to aid in debugging.

Environment

  • datadog-ci-rb version: 0.8.3
  • Configuration block (Datadog.configure ...):
require "datadog/ci"

Datadog.configure do |c|
    c.tracing.enabled = (ENV["DD_ENV"] == "ci")
    c.ci.enabled = true
    c.service = "komoju-ci"
    c.ci.instrument :rspec
end
  • Test framework version: rspec 3.13.0
  • CI Provider: GH Actions
  • Ruby version: 3.2.3
  • Operating system: Linux
@sulami sulami added the bug Something isn't working label Mar 21, 2024
@anmarchenko
Copy link
Member

anmarchenko commented Mar 21, 2024

Hi! sorry that you ran into this issue!

Unfortunately it is not possible for us to instrument all the existing test runners from the start: there are so many of them and most of them skew the default way to run test suites. We need to instrument every such case in a custom way to propagate context and ensure that all tests are connected to sessions, modules, and suites.

Currently we don't support knapsack_pro (TIL about it), but there is a workaround for you: you actually can use Datadog test visibility without session/module/suites info in so called "legacy test level visibility mode". You will still have information on tests performance and flaky tests detection in this mode.

To use it, configure datadog/ci like that:

require "datadog/ci"

if ENV["DD_ENV"] == "ci"
  Datadog.configure do |c|
    c.ci.enabled = true

    # this line sets test visibility mode
    c.ci.force_test_level_visibility = true

    c.service = "komoju-ci"
    c.ci.instrument :rspec
  end
end

As for knapsack_pro instrumentation, I will mark this issue as a feature request and add it to our backlog. Contributions are always welcome if you would like to take a stab on that yourself!

@anmarchenko anmarchenko added enhancement New feature or request and removed bug Something isn't working labels Mar 21, 2024
@sulami
Copy link
Author

sulami commented Mar 22, 2024

Alright, I guess that works for us in the interim, thank you.

@timlkelly
Copy link

timlkelly commented Mar 28, 2024

I am also experiencing this issue. However, when I try KnapsackPro 7.0.1 with c.ci.force_test_level_visibility = true, it does not work, and I see the following debug log:

D, [2024-03-27T21:41:56.593695 #93] DEBUG -- ddtrace: [ddtrace] (/usr/local/bundle/gems/datadog-ci-0.8.3/lib/datadog/ci/test_visibility/recorder.rb:305:in `fix_test_suite!') Trying to fix test suite for test [add some examples to (or delete) /app/packs/profile/spec/models/inferred_employer_interest_job_spec.rb] but no single test suite is running.

@anmarchenko, any thoughts about this message? I imagine it might have something to do with the Knapsack configuration

@sulami Are you able to share any more of the Knapsack configuration that allowed this to work with 7.0.1?


As for knapsack_pro instrumentation, I will mark this issue as a feature request and add it to our backlog. Contributions are always welcome if you would like to take a stab on that yourself!

@anmarchenko are there details about how to add test runner instrumentation to datadog-ci-rb? Where would you suggest to start?

@sulami
Copy link
Author

sulami commented Mar 28, 2024

@sulami Are you able to share any more of the Knapsack configuration that allowed this to work with 7.0.1?

It's literally just the original config I shared plus the force_test_level_visibility line.

We also switched from the rake task to using the knapsack_pro binary, which is recommended in the docs to fix various issues, so our invocation is now bundle exec knapsack_pro queue:rspec.

@anmarchenko
Copy link
Member

anmarchenko commented Mar 28, 2024

Hi @timlkelly!

I am puzzled that you don't see your tests with force_test_level_visibility config as it should just work. The debug line you mentioned is irrelevant in this case because it is stating that no test_suite is running at the time (and it shouldn't, because test suites are not tracked in test level visibility mode). I should remove this log when force_test_level_visibility == true.

Did it work for you with bundle exec knapsack_pro queue:rspec command that @sulami shared? If not, send me the full debug log to andrey.marchenko at datadoghq.com, I will dig into why it doesn't work for you.

As for adding the knapsack_pro instrumentation: it is a question of discovery what knapsack pro does with RSpec runner so that Datadog::CI.start_test_session()/test_session.finish and Datadog::CI.start_test_module()/test_module.finish are not called. You can see them in our RSpec instrumentation here. The rest is finding a way to detect that tests are running under knapsack_pro and instrumenting their runner as done here.

I see that there is some considerate interest in knapsack_pro integration, so I will definitely prioritize this after Intelligent Test Runner and code coverage for Ruby are released, so I would say some time in Q3 2024 it could happen (just an estimation).

@timlkelly
Copy link

@anmarchenko I did switch from the rake command to the knapsack binary and saw no change. Thank you, I will compile some logs and send them to you! I appreciate the help.

@timlkelly
Copy link

Ah! @anmarchenko I did more testing and discovered it was not working as expected because of the competing Datadog configuration from the Rails initializer. Specifically, the initializer set c.tracing.enabled = false, while the Datadog configuration with RSpec did not have that option

The documentation I read suggested that this was an optional setting for Datadog::CI, so I wanted to share my findings. I did not realize that disabling tracing would stop all CI Visibility. This may be my naivety to Datadog's entire suite of tools.

@anmarchenko
Copy link
Member

Oh yes, @timlkelly it happens a lot when tracing is disabled by accident when running tests. We had very unfortunate recomendation in our docs before to use c.tracing.enabled = (ENV["DD_ENV"] == "ci") in order to turn on/off test visibility. Now we recommend using the following approach:

if ENV["DD_ENV"] == "ci"
  Datadog.configure do |c|
    # Configures the tracer to ensure results delivery
    c.ci.enabled = true

    # The name of the service or library under test
    c.service = 'my-ruby-app'

    # Enables the RSpec instrumentation
    c.ci.instrument :rspec
  end
end

... and never touch the c.tracing.enabled setting but it still happens that tracing is disabled by some conflicting config. I will think about making it more obvious that test visibility is built upon tracing and requires it to be enabled.

I am happy that you found the issue and it works now

@timlkelly
Copy link

timlkelly commented Apr 9, 2024

As you said above, I know that adding Knapsack support is not an active work effort (and may never be), but I wanted to share what I've found. In case it's helpful in the future or at least provide a little more context on what's happening.

Knapsack Queue implements its own runner, which is why Datadog CI is unable to start/finish the test session and modules. While Knapsack does prepend the RSpec Runner with their own extensions, it also uses knapsack__ prefixed methods. The core method that Datadog prepends, #run_specs is now #knapsack__run_specs. Because of this, the original prepended RSpec method #run_specs is never called.

The change in method naming makes it a little more difficult to follow the existing approach of prepending Datadog on top of the RSpec methods.

For some context, I created a new Rails app (ruby 3.3, rails 7.1). Then added rspec-rails (3.13), ddtrace (1.21), and knapsack_pro (7.0.1). Configured ddtrace and knapsack accordingly and then created a few simple RSpec tests. Here is a gist of a couple of call stacks, one with only RSpec and another with a Knapsack. This was created by pausing execution within the example spec. Hope this can give some insight into what Knapsack is doing differently.

This is only Knapsack Queue. I haven't looked at the non-queue Knapsack runner yet. I hope to spend more time digging into this to find a solution but just wanted to knowledge share my findings so far 😃

@anmarchenko
Copy link
Member

anmarchenko commented Apr 9, 2024

Thanks for your insights @timlkelly! The main challenge in the instrumentation would be to create a test suite to reproduce knapsack_pro behaviour in our datadog-ci's tests.

Maybe we could just (hehe) instrument knapsack__run_specs directly, but there could be a catch: if CI nodes that knapsack_pro use to run tests are not forked from main process but managed somehow by external processes then we would need to propagate context from the main process that runs specs to all of the queue runners (which might be impossible without submitting a patch to knapsack_pro itself). We have kind of a similar issue with Circle CI parallel runners and for them our current approach is to treat each of the parallel runs as a separate test session (which is far from optimal but gives us at least some level of visibility).

@ArturT
Copy link

ArturT commented Apr 29, 2024

@timlkelly , your diagnosis is correct here
#147 (comment)
We added custom methods to RSpec with the knapsack__ prefix in the knapsack_pro 7.x version.

This is only Knapsack Queue. I haven't looked at the non-queue Knapsack runner yet.

We introduced changes only for Knapsack Pro Queue Mode.

Regarding to what @anmarchenko said here #147 (comment)
I think you would have to override the knapsack__run_specs behaviour as it's done for RSpec here:

test_session = CI.start_test_session(
tags: {
CI::Ext::Test::TAG_FRAMEWORK => Ext::FRAMEWORK,
CI::Ext::Test::TAG_FRAMEWORK_VERSION => CI::Contrib::RSpec::Integration.version.to_s
},
service: datadog_configuration[:service_name]
)
test_module = CI.start_test_module(Ext::FRAMEWORK)
result = super
return result unless test_module && test_session
if result != 0
test_module.failed!
test_session.failed!
else
test_module.passed!
test_session.passed!
end
test_module.finish
test_session.finish

 Annoyingly this requires a paid Knapsack Pro account. I'm happy to run anything to help on ours to aid in debugging.

You can create a free account for testing purposes https://knapsackpro.com/

Maybe we could just (hehe) instrument knapsack__run_specs directly, but there could be a catch: if CI nodes that knapsack_pro use to run tests are not forked from main process but managed somehow by external processes then we would need to propagate context from the main process that runs specs to all of the queue runners (which might be impossible without submitting a patch to knapsack_pro itself). 

The knapsack_pro command is started on each parallel CI node as a new process. We do not fork the Ruby process.
The knapsack_pro gem communicates with the Knapsack Pro API, which is responsible for orchestrating tests. More about Queue Mode here: https://docs.knapsackpro.com/overview/#queue-mode-dynamic-split

Does it make it easier?
I'm not familiar with DataDog but what we do to know that tests data are comming from the same CI build when it runs many parallel CI nodes, we simily look at commit hash, branch name, number of parallel nodes and CI build ID provided by CI provider. You can read these data from the CI provider env vars. Example env vars we read from Github Actions: https://github.com/KnapsackPro/knapsack_pro-ruby/blob/master/lib/knapsack_pro/config/ci/github_actions.rb

The main challenge in the instrumentation would be to create a test suite to reproduce knapsack_pro behaviour in our datadog-ci's tests.

We have integration tests for RSpec in Knapsack Pro Queue Mode to reproduce different RSpec configurations. Perhaps you could use that:
https://github.com/KnapsackPro/knapsack_pro-ruby/blob/45d7c135eb92f13fe8eea7260825c644e5111ddb/spec/integration/runners/queue/rspec_runner_spec.rb#L381

Alternatively, we have a Rails app with an example test suite:
https://github.com/KnapsackPro/rails-app-with-knapsack_pro
You can use one of the bin scripts and replace the API token with yours.
https://github.com/KnapsackPro/rails-app-with-knapsack_pro/blob/master/bin/knapsack_pro_queue_rspec_record_first_run
You can play with Knapsack env vars to simulate different CI configurations.
Docs for our env vars: https://docs.knapsackpro.com/ruby/reference/

I hope this helps.

@anmarchenko
Copy link
Member

@ArturT thanks a lot for the info, it helps!

If knapsack-pro does not but rather run a specific subset of tests on each independent runner, I think for the first version our best chance would to instrument each runner as a separate test session so that we would have N test sessions if test suite is split across N test runners.

After that we'll see if any deeper integration is possible using knapsack API to propagate Datadog context (test_session_id) across test runners: maybe we won't have to do that and simple runners instrumentation will be quite enough.

@ArturT
Copy link

ArturT commented May 2, 2024

After that we'll see if any deeper integration is possible using knapsack API to propagate Datadog context (test_session_id) across test runners: maybe we won't have to do that and simple runners instrumentation will be quite enough.

You can use a CI build ID for the Datadog context (test_session_id) to ensure all subsets of the test suite from parallel nodes are merged into one test suite on the DataDog side.

You would have to detect CI build IDs based on different CI provider env vars.
The knapsack_pro gem detects a CI provider and reads its CI build ID.
KnapsackPro::Config::Env.ci_node_build_id returns CI build ID.

For example, if Knapsack Pro is running inside of the Github Actions then CI build ID is based on GITHUB_RUN_ID.

Here is a list of supported CI providers.

@anmarchenko anmarchenko self-assigned this May 6, 2024
@anmarchenko anmarchenko added this to the 1.0 milestone May 6, 2024
@anmarchenko anmarchenko modified the milestone: 1.0 May 7, 2024
@anmarchenko
Copy link
Member

@sulami @timlkelly I added Knapsack Pro support for test sessions, it will be released together with 1.0 release next week.

To upgrade follow the upgrade guide here: https://github.com/DataDog/datadog-ci-rb/blob/main/docs/UpgradeGuide.md

If you use other Datadog products (APM/ASM/profiling), you'll need to upgrade to 2.0 of datadog library too: https://github.com/DataDog/dd-trace-rb/blob/master/docs/UpgradeGuide2.md

Please share your experience after upgrading! I want to know everything you encounter when running your tests with Knapsack and Datadog to see if there is anything else I can do to make it better.

@anmarchenko
Copy link
Member

Reopening this issue because the following problems were found:

  • currently Datadog traces knapsack_pro:rspec_test_example_detector which is only used by knapsack to collect test examples - this creates a lot of noise in test runs page
  • there are reports of knapsack still dropping tests when running knapsack_pro:rspec_go commands even with 1.0.0.beta4 version

@anmarchenko anmarchenko reopened this May 21, 2024
@3v0k4
Copy link

3v0k4 commented May 21, 2024

@anmarchenko in the next few days @ArturT is unavailable, but I'm around if you have specific questions related to Knapsack Pro I can help with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants