core: skip thresholds check for LSM remove snapshot
When a snapshot is removed at the end of a Live Storage Migration (LSM)
operation, try to remove it even if disk space is low.
Otherwise "RemoveSnapshotCommand" fails validation, leaves a
3-chunk snapshot (7.5 GiB) unreleased, and reports the LSM operation
as failed (although the "move" part actually succeeded).
Because of the 3 extra chunks that are temporarily used during the LSM
flow, we might temporarily fall below the disk space threshold.
Currently the code not only leaves almost 8 GiB of "junk" unreleased,
but also leaves the Storage Domain in an unhealthy low-space state
that blocks other operations and requires manual intervention,
instead of cleanly recovering from this temporary low disk space
situation by performing the proper cleanup.

Before the fix:
  2022-08-23 13:14:56,448+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-45) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.
  2022-08-23 13:16:10,455+03 WARN  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [03ef74cf-ddbe-4027-805f-02631bf96929] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iSCSI_SD2
  2022-08-23 13:16:15,555+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
  2022-08-23 13:16:15,581+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] EVENT_ID: USER_MOVE_IMAGE_GROUP_FAILED_TO_DELETE_SRC_IMAGE(2,025), Possible failure while deleting iSCSI_VM1_Disk1 from the source Storage Domain iSCSI_SD2 during the move operation. The Storage Domain may be manually cleaned-up from possible leftovers (User:admin@internal-authz).
At the end, "iSCSI_SD2" has 4 GiB available, while the
"Critical Space Action Blocker" is 5 GiB.

After the fix:
  2022-08-24 11:00:43,164+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.
We fall below the threshold and then automatically recover:
7.5 GiB are returned to the Storage Domain.
At the end "iSCSI_SD2" has 12 GiB available that is above

Signed-off-by: Pavel Bar <pbar@redhat.com>
barpavel committed Aug 24, 2022
1 parent 27dc8ee commit fdf08e5
Showing 1 changed file with 4 additions and 4 deletions.
@@ -242,9 +242,9 @@ private void removeImages() {
             } else {
                 RemoveSnapshotSingleDiskParameters parameters = buildRemoveSnapshotSingleDiskParameters(
                         source, dest, getSnapshotActionType());
-                ActionReturnValue actionReturnValueturnValue = runInternalActionWithTasksContext(
+                ActionReturnValue actionReturnValue = runInternalActionWithTasksContext(
                         getSnapshotActionType(), parameters);
-                getTaskIdList().addAll(actionReturnValueturnValue.getInternalVdsmTaskIdList());
+                getTaskIdList().addAll(actionReturnValue.getInternalVdsmTaskIdList());
             }

         List<Guid> quotasToRemoveFromCache = new ArrayList<>();
@@ -434,13 +434,13 @@ protected boolean validate() {
      * @return True if there is enough space in all relevant storage domains. False otherwise.
      */
     private boolean validateStorageDomains() {
-        MultipleStorageDomainsValidator storageDomainsValidator = getStorageDomainsValidator(getStoragePoolId(), getStorageDomainsIds());
         if (DisksFilter.filterImageDisks(getSourceImages()).isEmpty()) {
             return true;
         }

+        MultipleStorageDomainsValidator storageDomainsValidator = getStorageDomainsValidator(getStoragePoolId(), getStorageDomainsIds());
         return validate(storageDomainsValidator.allDomainsExistAndActive())
-                && validate(storageDomainsValidator.allDomainsWithinThresholds())
+                && (validate(storageDomainsValidator.allDomainsWithinThresholds()) || getParameters().getParentCommand() == ActionType.LiveMigrateDisk)
                 && validate(storageDomainsValidator.allDomainsHaveSpaceForMerge(getAllDisksSnapshot(getSourceImages()), getSnapshotActionType()))
                 && validate(storageDomainsValidator.isSupportedByManagedBlockStorageDomains(getActionType()));
     }
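
The effect of the changed condition can be sketched in isolation. The following is a minimal, self-contained model, not the engine code: the class `ThresholdBypassSketch`, the nested `ActionType` enum, and the helper `withinThresholds` are hypothetical stand-ins, and the rule "the check fails when free space is below the Critical Space Action Blocker" is an assumption drawn from the logs above (4 GiB free against a 5 GiB blocker).

```java
class ThresholdBypassSketch {
    // Hypothetical stand-in for the engine's ActionType enum; only the
    // two values needed for this illustration are modeled.
    enum ActionType { RemoveSnapshot, LiveMigrateDisk }

    // Assumed rule: the thresholds check passes only when free space
    // is at or above the critical blocker value.
    static boolean withinThresholds(double freeGiB, double criticalBlockerGiB) {
        return freeGiB >= criticalBlockerGiB;
    }

    // Mirrors the shape of the new condition in the diff:
    // (thresholds check) || (parent command is LiveMigrateDisk).
    static boolean validateStorageDomains(double freeGiB,
                                          double criticalBlockerGiB,
                                          ActionType parentCommand) {
        return withinThresholds(freeGiB, criticalBlockerGiB)
                || parentCommand == ActionType.LiveMigrateDisk;
    }

    public static void main(String[] args) {
        double freeGiB = 4.0;     // free space reported in the log
        double blockerGiB = 5.0;  // "Critical Space Action Blocker"

        // Standalone snapshot removal: still blocked by low space.
        System.out.println(validateStorageDomains(freeGiB, blockerGiB, null));

        // Snapshot removal run as part of LSM: allowed to proceed.
        System.out.println(validateStorageDomains(freeGiB, blockerGiB, ActionType.LiveMigrateDisk));

        // After the 7.5 GiB are released, 4 + 7.5 = 11.5 GiB is above the blocker.
        System.out.println(withinThresholds(freeGiB + 7.5, blockerGiB));
    }
}
```

Note that in the diff the `validate(...)` call is the left operand of `||`, so the thresholds check is still evaluated before the parent command is compared. This sketch models only the boolean outcome, not any validation messages that `validate(...)` may record.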