Testing nixos-rebuild switch
#20886
Comments
Hey @danbst, what about https://nixos.org/nixos/manual/index.html#sec-writing-nixos-tests where it says:
Isn't that what you are looking for in terms of a writable store? Do you have your trials for this kind of test available somewhere? |
Thanks for the pointer. Looks like I skipped a large part of the documentation. Also, looking at the firewall test, I now understand that my intent can be implemented. But first let's consider two cases:
So, first case is closer to user experience, but hard to implement (how would you know what is As for the second case, it is doable. Here is an example that shows problem with
But it looks hacky.
|
Regarding your second solution: I don't think it's as hacky as you think. There is no difference between switching to a modified version of your current profile and switching to a completely different profile. Both are treated as a new profile, and nix/nixos only cares about the changes as a convenience so that it doesn't have to restart the whole system. Of course your current implementation of 2) does the IP-address change, but that is because of the magic the test system does with the defined nodes. If there is a way to define the same node with your changes, maybe within the 'let' block, without the IP-address change, then it's perfectly valid. Maybe you can mimic what the test system does to the nodes (basically its configuration for eth0). If we found a way to define several states for one node, we could even add something like Maybe you are onto something here, maybe all tests should be of the kind:
|
I really like the idea of always starting a minimal NixOS ISO, then switching to the intended config. This should really be baked into the testing procedure. This would be tremendous! |
related, when i was trying to fix xen, a about 2 minutes after that, a watchdog within the hypervisor would trip, and the machine would hard-reset it made it rather difficult to debug stuff until i had figured that out, and switched to but automatically testing that with nixos's qemu framework would be even more difficult |
Proposal:
|
Some more insights on this issue.
|
One more datapoint for such tests - https://stackoverflow.com/questions/44220338/taskserver-fails-to-start-on-nixos The test should:
|
(cc @aszlig btw ---^) |
Testing a switch to a new configuration is already done in the Taskserver test, but the Stack Overflow issue could have been triggered more easily if we had a test that goes from an older version to a newer one, not just stable/unstable. I personally have a similar use case for something like this in my deployments, where I test whether database migrations work successfully. Unfortunately, this needs a lot of imports from derivations and I usually have eval times of several minutes on these deployments :-/ So I guess testing between releases should probably end up in a dedicated jobset, which has two inputs like Unfortunately, I think this won't work for more complicated tests, so I'd propose something like a snapshot command in the Perl test driver, which would save the current machine state (including a memory dump). Having such a snapshot command would also help speed up boot times in tests, because all that's needed is to restore the state instead of going through the full boot. The idea of starting with a basic machine would also have that advantage. Also, I've been thinking about making it easier to switch between system configurations. I haven't come up with a very good solution yet, but these are the ones I had in mind:
Had some more ideas, but I'm currently writing-impaired (broken shoulder and typing with one hand) so I won't list all of them as I didn't come up with a good solution that's both simple and declarative yet. |
Another case that should have been caught by rebuild-switch tests: #56134 |
I tried the suggested methods to write a test that tests switch-to-configuration to aid fixing #60180 but with no success. The problem is that both I want a way to build a config which I can switch to, but not create a bogus node for that config, as that
let
  commonConfig = ./common/letsencrypt/common.nix;
in import ./make-test.nix {
  name = "acme";
  nodes = rec {
    letsencrypt = ./common/letsencrypt;
    webserver = { config, pkgs, ... }: {
      imports = [ commonConfig ];
      networking.firewall.allowedTCPPorts = [ 80 443 ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} a.example.com
      '';
      services.nginx.enable = true;
      services.nginx.virtualHosts."a.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
    };
    webserver2 = { config, pkgs, ... }: {
      imports = [ webserver ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} b.example.com
      '';
      services.nginx.virtualHosts."b.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
    };
    client = commonConfig;
  };
  testScript = {nodes, ...}:
    let
      newServerSystem = nodes.webserver2.config.system.build.toplevel;
      switchToNewServer = "${newServerSystem}/bin/switch-to-configuration test";
    in
    ''
      $client->waitForUnit("default.target");
      $letsencrypt->waitForUnit("default.target");
      $letsencrypt->waitForUnit("boulder.service");
      $webserver->waitForUnit("nginx.service");
      # This step fails already, as "a.example.com" points to two servers
      $webserver->waitForUnit("acme-a.example.com.service");
      $client->succeed('curl https://a.example.com/ | grep -qF "hello world"');
      $webserver->succeed("${switchToNewServer}");
      $webserver->waitForUnit("acme-b.example.com.service");
      $client->succeed('curl https://b.example.com/ | grep -qF "hello world"');
    '';
}
methods doesn't work in this particular case, because both nodes register their domain names and now I have two nodes with the same domain name running, which obviously breaks requesting certificates. So now the tests already fail before doing Any way to have a "new config" and "old config" without creating two nodes? |
@danbst thanks for pointing me to However, it doesn't seem to work in this case. whilst So when you try to use it in a NixOS test, you'll get the following error:
Example test here:
let
  commonConfig = ./common/letsencrypt/common.nix;
in import ./make-test.nix {
  name = "acme";
  nodes = {
    letsencrypt = ./common/letsencrypt;
    webserver = { config, pkgs, ... }: {
      imports = [ commonConfig ];
      networking.firewall.allowedTCPPorts = [ 80 443 ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} a.example.com
      '';
      services.nginx.enable = true;
      services.nginx.virtualHosts."a.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
      nesting.clone = [
        ({config, pkgs, ... }: {
          networking.extraHosts = ''
            ${config.networking.primaryIPAddress} b.example.com
          '';
          services.nginx.virtualHosts."b.example.com" = {
            enableACME = true;
            forceSSL = true;
            locations."/".root = pkgs.runCommand "docroot" {} ''
              mkdir -p "$out"
              echo hello world > "$out/index.html"
            '';
          };
        })
      ];
    };
    client = commonConfig;
  };
  testScript =
    ''
      $client->waitForUnit("default.target");
      $letsencrypt->waitForUnit("default.target");
      $letsencrypt->waitForUnit("boulder.service");
      $webserver->waitForUnit("nginx.service");
      $webserver->waitForUnit("acme-a.example.com.service");
      $client->succeed('curl https://a.example.com/ | grep -qF "hello world"');
      # Incrementally enabling virtual hosts should just work
      $webserver->succeed("/run/current-system/fine-tune/child-1/bin/switch-to-configuration test");
      $webserver->waitForUnit("acme-b.example.com.service");
      $client->succeed('curl https://b.example.com/ | grep -qF "hello world"');
    '';
}
|
Because nesting.clone calls 'eval-config.nix' manually, without the 'extraArgs' argument that provides the 'nodes' argument to NixOS modules in NixOS tests, evaluation of 'nesting.clone' definitions would fail with the following error:
while evaluating the module argument `nodes' in "<redacted>" while evaluating the attribute '_module.args.nodes' at undefined position: attribute 'nodes' missing, at <redacted./nixpkgs/lib/modules.nix:163:28
By not using 'extraArgs' but a NixOS module instead, the nodes parameter gets propagated to the 'eval-config.nix' call that 'nesting.clone' makes too, getting rid of the error.
See https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/activation/top-level.nix#L13-L23
See https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/build-vms.nix#L27
See NixOS#20886 (comment)
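The idea of passing 'nodes' through a module can be illustrated outside the test framework. What follows is a minimal sketch, not the code from the PR: it assumes `<nixpkgs>` is on the Nix path, and the `seen` option and the placeholder `nodes` value are hypothetical names used only for illustration. It sets `_module.args.nodes` from an ordinary module, so anything that re-evaluates the same module list receives `nodes` as well.

```nix
let
  lib = import <nixpkgs/lib>;

  # Placeholder standing in for the attribute set of test machines.
  nodes = { webserver = "placeholder"; client = "placeholder"; };

  eval = lib.evalModules {
    modules = [
      # The argument is provided by a module, not by a side channel,
      # so it travels with the module list.
      { _module.args.nodes = nodes; }

      # A consumer module that asks for `nodes` as a module argument.
      ({ nodes, ... }: {
        options.seen = lib.mkOption {
          type = lib.types.listOf lib.types.str;
          description = "Hypothetical option recording the node names.";
        };
        config.seen = builtins.attrNames nodes;
      })
    ];
  };
in
  eval.config.seen
```

Evaluating this with `nix-instantiate --eval --strict` should list the placeholder node names. An argument supplied only through 'extraArgs' lives outside the module list, which matches the explanation above of why the nested evaluation could not find it.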
I just created a PR that fixes these issues. |
Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:
|
There are many tests in NixOS that test some functionality, but only a few of them test activating or deactivating that functionality. Because of that, NixOS tends to be "don't forget to reboot the machine after nixos-rebuild switch".
Specifically, the avahi service works, but enabling it doesn't (#19034). A fix is proposed (#20871), but I think there should be an automated way to test enabling/disabling/upgrading a service.
I tried some time ago to write such a test (for #15815), but was hit by the immutable /nix/store in the test machine, so I have no idea how to do that properly.
cast @kampfschlaefer for suggestions
(of course, not everything can be enabled/disabled on switch. Perhaps this can be documented as a test too)
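For the avahi case, a test along these lines might look like the following. It is only a sketch built from the pieces discussed in the comments: it assumes the file sits next to make-test.nix in nixos/tests, that 'nesting.clone' evaluates correctly inside tests, and that the child configuration ends up under fine-tune/child-1. The test name is hypothetical.

```nix
# Sketch only; see the assumptions stated above.
import ./make-test.nix {
  name = "switch-enables-avahi";  # hypothetical test name

  machine = { ... }: {
    # Base configuration: avahi is left disabled.
    # The child configuration is built into the same system closure,
    # so no writable /nix/store is needed to switch to it.
    nesting.clone = [
      { services.avahi.enable = true; }
    ];
  };

  testScript = ''
    $machine->waitForUnit("multi-user.target");
    # The service must not be running before the switch.
    $machine->fail("systemctl is-active avahi-daemon.service");
    # Activate the child configuration without rebooting.
    $machine->succeed("/run/current-system/fine-tune/child-1/bin/switch-to-configuration test");
    # If enabling on switch is broken, this is where the test would hang and fail.
    $machine->waitForUnit("avahi-daemon.service");
  '';
}
```

The same shape could be reversed (service enabled in the base configuration, disabled in the child) to cover deactivation.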