Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test fails inside Nix sandbox #333

Closed
flokli opened this issue Jan 16, 2021 · 12 comments
Closed

Integration test fails inside Nix sandbox #333

flokli opened this issue Jan 16, 2021 · 12 comments
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. keepalive Never close from staleness

Comments

@flokli
Copy link
Contributor

flokli commented Jan 16, 2021

I tried packaging grafana/agent for NixOS, and ran into an issue while running the tests:

=== RUN   TestIntegrationRegistration
--- PASS: TestIntegrationRegistration (0.00s)
PASS
ok      github.com/grafana/agent/pkg/integrations       2.548s
=== RUN   TestNodeExporter
    node_exporter_test.go:43:
                Error Trace:    node_exporter_test.go:43
                Error:          Received unexpected error:
                                failed to create node_exporter: failed to open sysfs: could not read /sys: stat /sys: no such file or directory
                Test:           TestNodeExporter
                Messages:       failed to setup node_exporter
--- FAIL: TestNodeExporter (0.00s)
=== RUN   TestNodeExporter_IgnoredFlags

Apparently the node exporter integration test tries to access /sys, which doesn't quite work in that sandboxed environment. Could the test be skipped in such cases?

@rfratto
Copy link
Member

rfratto commented Jan 16, 2021

I'd be happy to add a build tag to make tests for that package skippable. I won't be able to work on this for a few more days, but if you want to take a shot at it, you're more than welcome to!

@flokli
Copy link
Contributor Author

flokli commented Jan 19, 2021

For now, while packaging this in nixpkgs I simply removed pkg/integrations/node_exporter/node_exporter_test.go before running the tests.

Inside Nix sandboxed builds, /sys simply doesn't exist. Maybe that test can skip if it detects /sys not being present.

@stale
Copy link

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue/PR mark as stale due lack of activity label Mar 19, 2021
@flokli
Copy link
Contributor Author

flokli commented Mar 20, 2021 via email

@stale stale bot removed the stale Issue/PR mark as stale due lack of activity label Mar 20, 2021
@stale
Copy link

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue/PR mark as stale due lack of activity label Jun 2, 2021
@rfratto rfratto added the keepalive Never close from staleness label Jun 2, 2021
@stale stale bot removed the stale Issue/PR mark as stale due lack of activity label Jun 2, 2021
@rfratto
Copy link
Member

rfratto commented Aug 18, 2021

Hey @flokli, is this still a problem? Are there any other tests that are failing inside Nix that we can filter out?

@flokli
Copy link
Contributor Author

flokli commented Aug 19, 2021

Okay, build flags seem to be the way to add conditional tests.

We could add a integration flag, filter these out, and set it in the drone config and Makefile.

But I'm not about the naming. I don't want to disable all these tests. Some integrations tests seem to work, even inside the sandbox. Should we skip them too, or call this flag with_node_exporter_integration_tests?

Why does the redis integration test not fail? I do see time="2021-08-19T17:05:03Z" level=error msg="Couldn't connect to redis instance" in the logs, but it still seem to succeed…

go test -v -p 12 ./pkg/integrations/redis_exporter
=== RUN   TestRedisCases
=== RUN   TestRedisCases/Default_config
time="2021-08-19T17:05:03Z" level=error msg="Couldn't connect to redis instance"
=== RUN   TestRedisCases/Include_exporter_metrics
time="2021-08-19T17:05:03Z" level=error msg="Couldn't connect to redis instance"
=== RUN   TestRedisCases/Lua_script_read_OK
time="2021-08-19T17:05:03Z" level=error msg="Couldn't connect to redis instance"
=== RUN   TestRedisCases/Lua_script_read_fail
=== RUN   TestRedisCases/no_address_given
=== RUN   TestRedisCases/valid_password_file
time="2021-08-19T17:05:03Z" level=error msg="Couldn't connect to redis instance"
=== RUN   TestRedisCases/invalid_password_file
--- PASS: TestRedisCases (0.01s)
    --- PASS: TestRedisCases/Default_config (0.00s)
    --- PASS: TestRedisCases/Include_exporter_metrics (0.00s)
    --- PASS: TestRedisCases/Lua_script_read_OK (0.00s)
    --- PASS: TestRedisCases/Lua_script_read_fail (0.00s)
    --- PASS: TestRedisCases/no_address_given (0.00s)
    --- PASS: TestRedisCases/valid_password_file (0.00s)
    --- PASS: TestRedisCases/invalid_password_file (0.00s)
PASS

@rfratto
Copy link
Member

rfratto commented Oct 1, 2021

Sorry for taking so long to get back to you @flokli, things have been busy and I lost track of some of these issues.

This is a really interesting problem to have, and I'm not sure how to break this down. What are the limitations of the sandbox? No network, no access to host folders like /sys or /dev that something like node_exporter might use?

It's starting to sound more and more like we might just need a sandbox tag, but I'm starting to question whether it's worthwhile running these tests in a restricted environment anyway. It'd ensure some base behavior of the Agent is functional, but could miss out on other things being broken. What do you think?

@flokli
Copy link
Contributor Author

flokli commented Oct 7, 2021

Sorry for taking so long to get back to you @flokli, things have been busy and I lost track of some of these issues.

No worries, thanks for getting back to me! :-)

What are the limitations of the sandbox? No network, no access to host folders like /sys or /dev that something like node_exporter might use?

Yeah, certainly no network access, only private networking. I assume the sandbox also prevents access to some of these folders, to rule out hardware-specific impurities leaking into builds. /dev and /proc are also very minimal, not sure about /sys.

It's not super well documented, but https://github.com/NixOS/nix/blob/bedd12ec14062bb23bbd87dd892219b84abbcce9/src/libstore/build/local-derivation-goal.cc#L1580 should get you an idea if you want to know exactly.

--

More generally: What are the tests supposed to test? Why do the redis ones succeed, even though there's obviously no redis database started, and why does the node-exporter one fail even though it might only not be able to get some metrics?

@rfratto
Copy link
Member

rfratto commented Oct 7, 2021

Yeah, certainly no network access, only private networking. I assume the sandbox also prevents access to some of these folders, to rule out hardware-specific impurities leaking into builds. /dev and /proc are also very minimal, not sure about /sys.

It's not super well documented, but https://github.com/NixOS/nix/blob/bedd12ec14062bb23bbd87dd892219b84abbcce9/src/libstore/build/local-derivation-goal.cc#L1580 should get you an idea if you want to know exactly.

Thanks, this is helpful!

More generally: What are the tests supposed to test? Why do the redis ones succeed, even though there's obviously no redis database started, and why does the node-exporter one fail even though it might only not be able to get some metrics?

I don't think we've been very consistent in testing these, but for integrations in particular, I'd say there's a few things we'd be interested in looking at:

  1. Testing the translation from the agent's integration to the exporter's config (if a translation exists, sometimes the exporter's config is sufficient).
  2. A test where where we test the entire integration (including the exporter)

The first case can probably just be performed as a unit test safely within a sandbox. The second case almost always requires network access (or root FS access for node_exporter), and it sounds like that's where we'd want to start using the sandbox flag.

@flokli
Copy link
Contributor Author

flokli commented Oct 7, 2021

Yeah, this sounds like a good plan 👍

@rfratto
Copy link
Member

rfratto commented Jun 8, 2022

I'd like to close this as part of our normal bug cleanup process. If we run into similar issues again, we can open a new issue for tracking.

@rfratto rfratto closed this as completed Jun 8, 2022
@rfratto rfratto closed this as not planned Won't fix, can't repro, duplicate, stale Jun 8, 2022
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 22, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. keepalive Never close from staleness
Projects
None yet
Development

No branches or pull requests

2 participants