-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport: prune bitmaps fix #360
backport: prune bitmaps fix #360
Conversation
@aesteve-rh I see that the first part that is taken from #346 is related to a different bug, https://bugzilla.redhat.com/show_bug.cgi?id=2125290, and seems unrelated to the other changes - shall we clone bz 2125290 to make it visible to QE? where did the issue that it addressed came from (QE/users-list/out test)? |
Appeared in OST run logs, but the error was ignored, and the OST jobs succeeded anyway, so it went unnoticed until logs were inspected. This should have low overall impact, as having a host with no lvm devices is rare. Nonetheless, as we are going to do another build, I thought it may be nice to include. |
ack, we had a similar case on ovirt-engine and we backported the fix without having a d/s bug
right, definitely makes sense to include it, the question is how :) |
a1b686e
to
9c9f881
Compare
OK, please add this PR to the bug. |
It is already tracked in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2141371 |
I see, OK. |
Add flag warn_stderr to Command class to log to WARNING level instead of DEBUG level if the command execution has data in the stderr output, even if the command finished succesfully. Also, the logged message changes if the new warn_stderr flag is set, printing the command that originated the error output. Legacy debug logging remains unchanged. ProgressCommand in qemuimg will enable the flag so that qemuimg commands show warning on errors. Example: 09:49:34,851 WARNING (MainThread) [storage.operation] Command ['/usr/bin/qemu-img', 'commit', '-p', '-t', 'none', '-b'...] succeeded with warnings: bytearray(b"qcow2_free_clusters failed: No space left on device\nqemu-img: Lost persistent bitmaps during inactivation of node \'#block365\': Failed to write bitmap \'stale-bitmap5\' to file: No space left on device\nqemu-img: Failed to flush the refcount block cache: No space left on device\n") (operation:159)'] Signed-off-by: Albert Esteve <aesteve@redhat.com>
Add merge test in block volumes, to have a realistic scenario with volumes and bitmaps. When bitmaps are added to the base volume after the top volume is create, measuring the top volume will not consider the base bitmaps size, even with backing enabled. This will result in not performing an extend and thus, failing the merge process with `Failed to write bitmap 'xxx' to file: No space left on device`. This test has been used to reproduce the error in https://bugzilla.redhat.com/2141371 Signed-off-by: Albert Esteve <aesteve@redhat.com>
Add prune_bitmaps function, to remove all stale bitmaps from a base volume (that are missing from the top volume). Signed-off-by: Albert Esteve <aesteve@redhat.com>
Prune the stale base volume bitmaps during the prepare step on a merge operation. These stale bitmaps can cause the merge operation to fail due to 'No space left on device'. In this case, qemu does not end with error, so the failure goes unnoticed. As there is not a reliable way to measure the size of these stale bitmaps, and they are invalid and can never be used for incremental backup, it is better to prune them to avoid the error. Related: oVirt#352 Signed-off-by: Albert Esteve <aesteve@redhat.com>
Fix ResourceAlreadyAcquired raise to use f-string instead of %-format. Also, the %-format is misued, with a comma in between where the '%' symbol should be, causing a bogus output. Before: vdsm.storage.resourceManager.ResourceAlreadyAcquired: ('%s is already acquired by %s', '00_storage.a2ad7a23-96c5-4c23-9007-c0d1892ea07d', 'fb7c6873-74b4-4895-876a-a52ab7bfa082') After: vdsm.storage.resourceManager.ResourceAlreadyAcquired: '00_storage.a2ad7a23-96c5-4c23-9007-c0d1892ea07d is already acquired by fb7c6873-74b4-4895-876a-a52ab7bfa082' Signed-off-by: Albert Esteve <aesteve@redhat.com>
Add prune_bitmaps method to internal hsm api. Also add locking to prune_bitmaps method: - shared lock on the storage domain - exclusive lock on the image Since this should be called only from virt code, and the vm owns the volumes, no other storage code should access them (engine should ensure this), but it is safer to take locks so bad engine request will fail at Vdsm instead of corrupting the volume. By explicitely taking the locks with a guarded context, we do not need to call the _produce_volume helper method, as it already takes a shared lock on the storage domain. Instead, produce the volumes directly. Signed-off-by: Albert Esteve <aesteve@redhat.com>
Prune stale bitmaps before a live merge to avoid failing with ENOSPC. Fixes: oVirt#352 Signed-off-by: Albert Esteve <aesteve@redhat.com>
Record and verify prune_bitmaps calls in the tests. Signed-off-by: Albert Esteve <aesteve@redhat.com>
Add support to fake errors in prune_bitmaps calls to the IRS fake class. Signed-off-by: Albert Esteve <aesteve@redhat.com>
9c9f881
to
3879121
Compare
Commits/PRs cherry picked for backporting (in order):
Signed-off-by: Albert Esteve aesteve@redhat.com