ZFS as a module #3107
One question about storage-init, but I am kicking off the tests in parallel.
@rouming, the riscv64 build is broken: https://github.com/lf-edge/eve/actions/runs/4449491006/jobs/7818263989?pr=3107#step:8:827
@@ -34,4 +34,6 @@ zfs_set_default_parameters() {
    set_module_parameter zfs zfs_smoothing_write 5
}

# Load ZFS and set parameters
modprobe zfs
Now that the ZFS module is loadable again, I'm wondering whether it would make sense to pass all parameters at load time instead of using /sys/module, which was only needed because the module was built-in.
What is the difference? All the options take effect only when you do the mount. From the kernel's perspective I do not see a big difference in when the static option variables are updated: on modprobe or just after.
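The "just after modprobe" variant discussed here amounts to writing the value into the module's parameters directory under /sys/module. A minimal sketch, assuming the parameter is writable at runtime: the helper name `set_param_via_sysfs` and the `SYSFS_ROOT` override are hypothetical and are not EVE's actual `set_module_parameter` implementation.

```sh
#!/bin/sh
# Hypothetical helper, not EVE's set_module_parameter: write a value to
# <sysfs root>/module/<module>/parameters/<name>.
# SYSFS_ROOT defaults to /sys; point it elsewhere to try this without root.
set_param_via_sysfs() {
    param_file="${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
    if [ ! -w "$param_file" ]; then
        echo "cannot write $param_file" >&2
        return 1
    fi
    printf '%s\n' "$3" > "$param_file"
}

# Demo against a scratch directory that mimics the sysfs layout.
SYSFS_ROOT="$(mktemp -d)"
mkdir -p "$SYSFS_ROOT/module/zfs/parameters"
: > "$SYSFS_ROOT/module/zfs/parameters/zfs_smoothing_write"
set_param_via_sysfs zfs zfs_smoothing_write 5
cat "$SYSFS_ROOT/module/zfs/parameters/zfs_smoothing_write"
```

The load-time alternative is to pass the same setting on the command line, as in `modprobe zfs zfs_smoothing_write=5`.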
Well, there isn't much difference in terms of execution, but I don't see any reason to keep two functions (set_module_parameter and zfs_set_default_parameters) that were created exclusively because the module was built-in, when we can just load the module and set the parameters at once. By the way, the right way to load the module and pass these parameters is to create a file zfs.conf in /etc/modprobe.d using the format: options <module_name> <parameter(s)>. We do this, for instance, in /hostfs/etc/modprobe.d/kms.conf. And to load the zfs module, you can add the module name to /hostfs/etc/modules, unless I'm missing something that justifies this script for the module loading...
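The modprobe.d approach suggested in this comment can be sketched as follows. This is only an illustration: the helper name `render_modprobe_options` and the scratch output directory are hypothetical, and the single parameter shown (`zfs_smoothing_write=5`) is the one visible in the diff; the full list lives in `zfs_set_default_parameters`.

```sh
#!/bin/sh
# Hypothetical helper: print one modprobe.d "options" line for a module.
# Usage: render_modprobe_options <module> <param=value>...
render_modprobe_options() {
    module="$1"
    shift
    # "$*" joins the remaining arguments with single spaces.
    printf 'options %s %s\n' "$module" "$*"
}

# Write into a scratch directory rather than the real /etc/modprobe.d,
# so the sketch can be run without touching the system.
conf_dir="${CONF_DIR:-$(mktemp -d)}"
render_modprobe_options zfs zfs_smoothing_write=5 > "$conf_dir/zfs.conf"
cat "$conf_dir/zfs.conf"
```

With such a file in the modprobe configuration directory, a plain `modprobe zfs` applies the parameters at load time, so no write to /sys/module is needed afterwards.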
So I made an attempt, but it is not that simple with our container model. The description is here: #3107 (comment). I am withdrawing that patch and leaving this PR as it is supposed to be: just a revert in order to fix the licensing problem. I assume we need to revisit this and fix how we call modprobe so that it is done the proper way.
Looks like the riscv build is failing for some reason.
It could be this line: diff -cw .config .config.new, which silently fails. I will take a look.
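One way a check like this can go unnoticed: `diff` exits with status 1 when the files differ and 2 on trouble (for example a missing file), and if that status is never examined the build simply carries on. The sketch below wraps the same `diff -cw` call so that any non-zero status is reported. The function name `check_config_unchanged` and the sample config lines are hypothetical, and whether this is what actually breaks the riscv64 build is not established here.

```sh
#!/bin/sh
# Hypothetical wrapper: fail loudly if the two kernel configs differ.
check_config_unchanged() {
    old="$1"
    new="$2"
    # diff exit status: 0 = identical, 1 = different, 2 = trouble.
    diff -cw "$old" "$new"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "config check failed (diff exit status $rc)" >&2
        return "$rc"
    fi
}

# Demo on scratch files instead of a real .config (contents are illustrative).
tmp="$(mktemp -d)"
printf 'CONFIG_ZFS=m\n' > "$tmp/.config"
printf 'CONFIG_ZFS=y\n' > "$tmp/.config.new"
check_config_unchanged "$tmp/.config" "$tmp/.config" && echo "same: ok"
check_config_unchanged "$tmp/.config" "$tmp/.config.new" || echo "different: rc=$?"
```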
0cd1283 to 37f4061
LGTM
37f4061 to d815eb5
d815eb5 to 37e53fc
Run eden
OK, thanks @rouming. Let's discuss it and create a proper issue to handle modprobe in a better (and unified) way. LGTM.
@rouming I've re-run the eden tests about 7 times and I don't think they have passed for any of the zfs variants at any time.
(But it does work if I use the rootfs to update a device running with a ZFS /persist).
So these failures need some investigation to make sure installation and onboarding work correctly with a ZFS /persist.
37e53fc to f27a06a
@eriknordmark according to my understanding, the profile server test cases fail for some reason, for example:
so some of the VMs are running, which means that the error is not ZFS related (I assume that if /persist were read-only, could not be mounted, had errors, etc., we could not boot and run VMs at all). Maybe it is a memory or timing issue? I will check more precisely. Update: also 2023-03-24T12:32:57.9577308Z [pool_name:"persist" storage_type:STORAGE_TYPE_INFO_ZFS zfs_version:"zfs-kmod-2.1.2-1" current_raid:STORAGE_RAID_TYPE_NORAID compression_ratio:1 zpool_size:3489660928 storage_state:STORAGE_STATUS_ONLINE disks:{disk_name:{name:"/dev/sda9" serial:"QM00001"} status:STORAGE_STATUS_ONLINE state:"No error."} pool_status_msg:"OK"] no errors, all good
Seems like you're correct that this PR doesn't make it any worse. But glancing at eden results for other PRs, it seems like 100% of the eden runs for zfs have been failing for the past week or two (unrelated to this PR). I have no idea whether this is an issue with flaky test(s) which somehow are more brittle when using zfs compared to ext4, or if there is an EVE-OS issue affecting ZFS. It feels scary to continue to accept PRs until we know which case it is. So someone needs to root-cause these failures before we proceed with any PRs.
Kick off tests again
Don't do that! The kdump kernel is not for re-formatting the FS, but for attempting to collect a dump. Even if the FS is corrupted, we don't re-format the FS. Even if the ZFS modules are not loaded, we don't re-format the FS. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
This partially reverts commit 17dd29b. With this revert we build ZFS again as a module. Why? The Linux kernel is licensed under the GPL, but OpenZFS is licensed under the CDDL. These two are incompatible. So we compile both (kernel and ZFS) and ship them separately. This patch breaks the collection of kernel dumps when the /persist volume is formatted as ZFS. We still jump into kdump and output the dmesg to the console, but no dump or logs are stored in the /persist volume. This will be fixed in future releases of EVE-OS. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
f27a06a to c395d5c
@rouming note that this never passes any of the Eden tests with zfs (I think 2x(6+8)=28 tries all failed). So I don't know what the impact will be on either product quality or our ability to use the eden tests to test product quality. Can we please prioritize getting the failing tests fixed? I really hate driving blind.
@eriknordmark I'm running the following locally on the latest master:
(copied from the zfs run from GitHub Actions) All tests pass successfully. From what I see on GitHub, all failures are random in nature, for example a timeout expiring while waiting for a VM state. My simple assumption is that zfs is greedy for resources, and since we don't control the testing environment, hitting these failures is just a matter of bad luck. @dautovri Can we have more resources on GitHub Actions and somehow guarantee dedicated hardware for our tests?
@rouming I will create a support ticket to increase the size of the GH runners, plus we can try to set up our own self-hosted GH runners.
@dautovri Thanks, Ruslan! That is definitely what we need. Let's start by increasing the number of runners, and then move to self-hosted GH runners.
This partially reverts commit 17dd29b.
With this revert we build ZFS again as a module. Why?
The Linux kernel is licensed under the GPL, but OpenZFS is licensed
under the CDDL. These two are incompatible. So we compile both
(kernel and ZFS) and ship them separately.
This patch breaks the collection of kernel dumps when the /persist
volume is formatted as ZFS. We still jump into kdump and output
the dmesg to the console, but no dump or logs are stored in the
/persist volume. This will be fixed in future releases of
EVE-OS.
cc: @deitch
cc: @eriknordmark