core: skip thresholds validation for remove snapshot
Try to remove the snapshot even when the Storage Domain is low on disk space.
Otherwise, when remove snapshot is performed at the end of a Live Storage
Migration (LSM) operation, "RemoveSnapshotCommand" fails on validation
and leaves a 3-chunk snapshot (7.5 GiB) unreleased. Additionally, the LSM
operation is reported as failed, though the "move" part actually succeeded.
Because of the 3 extra chunks that are temporarily used during the LSM flow,
we might temporarily fall below the disk space threshold.
Currently the code not only leaves almost 8 GiB of "junk" unreleased,
but also leaves the Storage Domain in an unhealthy low-space state
that blocks other operations and requires manual intervention,
instead of cleanly recovering from this temporary low disk space
situation by performing the proper cleanup.

Before the fix:
  2022-08-23 13:14:56,448+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-45) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.
  2022-08-23 13:16:10,455+03 WARN  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [03ef74cf-ddbe-4027-805f-02631bf96929] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iSCSI_SD2
  2022-08-23 13:16:15,555+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
  2022-08-23 13:16:15,581+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] EVENT_ID: USER_MOVE_IMAGE_GROUP_FAILED_TO_DELETE_SRC_IMAGE(2,025), Possible failure while deleting iSCSI_VM1_Disk1 from the source Storage Domain iSCSI_SD2 during the move operation. The Storage Domain may be manually cleaned-up from possible leftovers (User:admin@internal-authz).
At the end "iSCSI_SD2" has 4 GiB available,
which is below the 5 GiB "Critical Space Action Blocker".

After the fix "RemoveSnapshotCommand" executes successfully:
  2022-08-24 11:00:43,164+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.

We fall below the threshold and then automatically recover:
7.5 GiB is returned to the Storage Domain.
At the end "iSCSI_SD2" correctly has 12 GiB available,
which is above the 5 GiB "Critical Space Action Blocker".
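
The space figures above can be checked with a small sketch. It assumes the 7.5 GiB is split evenly across the 3 chunks (2.5 GiB each) and that the blocker triggers when free space is strictly below 5 GiB; the class and method names are hypothetical and are not part of the engine code. The result, 11.5 GiB, is roughly consistent with the 12 GiB reported above.

```java
// Hypothetical helper for illustration only; not part of ovirt-engine.
final class LowSpaceRecoverySketch {
    // 5 GiB "Critical Space Action Blocker" quoted in the commit message.
    static final double CRITICAL_BLOCKER_GIB = 5.0;

    // Assumption: the domain is blocked when free space is strictly below the blocker.
    static boolean isBelowBlocker(double freeGiB) {
        return freeGiB < CRITICAL_BLOCKER_GIB;
    }

    // Free space once the temporary snapshot chunks are released.
    static double freeAfterCleanup(double freeGiB, int chunks, double chunkGiB) {
        return freeGiB + chunks * chunkGiB;
    }

    public static void main(String[] args) {
        double before = 4.0;                               // free space reported in the log
        double after = freeAfterCleanup(before, 3, 2.5);   // 3 chunks, assumed 2.5 GiB each
        System.out.println("below blocker before cleanup: " + isBelowBlocker(before));
        System.out.println("free after cleanup (GiB): " + after);
        System.out.println("below blocker after cleanup: " + isBelowBlocker(after));
    }
}
```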

Signed-off-by: Pavel Bar <pbar@redhat.com>
barpavel authored and ahadas committed Aug 28, 2022
1 parent ffedbdd commit a952c95
Showing 1 changed file with 3 additions and 4 deletions.
@@ -242,9 +242,9 @@ private void removeImages() {
             } else {
                 RemoveSnapshotSingleDiskParameters parameters = buildRemoveSnapshotSingleDiskParameters(
                         source, dest, getSnapshotActionType());
-                ActionReturnValue actionReturnValueturnValue = runInternalActionWithTasksContext(
+                ActionReturnValue actionReturnValue = runInternalActionWithTasksContext(
                         getSnapshotActionType(), parameters);
-                getTaskIdList().addAll(actionReturnValueturnValue.getInternalVdsmTaskIdList());
+                getTaskIdList().addAll(actionReturnValue.getInternalVdsmTaskIdList());
             }
 
         List<Guid> quotasToRemoveFromCache = new ArrayList<>();
@@ -434,13 +434,12 @@ protected boolean validate() {
      * @return True if there is enough space in all relevant storage domains. False otherwise.
      */
     private boolean validateStorageDomains() {
-        MultipleStorageDomainsValidator storageDomainsValidator = getStorageDomainsValidator(getStoragePoolId(), getStorageDomainsIds());
         if (DisksFilter.filterImageDisks(getSourceImages()).isEmpty()) {
             return true;
         }
 
+        MultipleStorageDomainsValidator storageDomainsValidator = getStorageDomainsValidator(getStoragePoolId(), getStorageDomainsIds());
         return validate(storageDomainsValidator.allDomainsExistAndActive())
-                && validate(storageDomainsValidator.allDomainsWithinThresholds())
                 && validate(storageDomainsValidator.allDomainsHaveSpaceForMerge(getAllDisksSnapshot(getSourceImages()), getSnapshotActionType()))
                 && validate(storageDomainsValidator.isSupportedByManagedBlockStorageDomains(getActionType()));
     }
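
The effect of dropping the allDomainsWithinThresholds() check can be sketched in isolation. The following is a simplified model, not the engine code: the Domain class, the method names, the 1 GiB merge requirement, and the "free space below 5 GiB" reading of the threshold are all assumptions made for illustration, and the managed block storage check is left out.

```java
// Simplified, hypothetical model of the validation chain; not the oVirt engine classes.
final class ValidationChainSketch {
    // 5 GiB "Critical Space Action Blocker" quoted in the commit message.
    static final double CRITICAL_BLOCKER_GIB = 5.0;

    // Hypothetical stand-in for the state of one storage domain.
    static final class Domain {
        final boolean active;
        final double freeGiB;

        Domain(boolean active, double freeGiB) {
            this.active = active;
            this.freeGiB = freeGiB;
        }
    }

    static boolean existsAndActive(Domain d) {
        return d.active;
    }

    // Assumption: "within thresholds" means free space is not below the blocker.
    static boolean withinThresholds(Domain d) {
        return d.freeGiB >= CRITICAL_BLOCKER_GIB;
    }

    static boolean hasSpaceForMerge(Domain d, double requiredGiB) {
        return d.freeGiB >= requiredGiB;
    }

    // Chain before the fix: the threshold check can veto the cleanup.
    static boolean validateBeforeFix(Domain d, double mergeGiB) {
        return existsAndActive(d) && withinThresholds(d) && hasSpaceForMerge(d, mergeGiB);
    }

    // Chain after the fix: only the space actually needed for the merge is checked.
    static boolean validateAfterFix(Domain d, double mergeGiB) {
        return existsAndActive(d) && hasSpaceForMerge(d, mergeGiB);
    }

    public static void main(String[] args) {
        Domain lowSpace = new Domain(true, 4.0); // 4 GiB free, as in the log
        double mergeGiB = 1.0;                   // assumed merge requirement
        System.out.println("before fix: " + validateBeforeFix(lowSpace, mergeGiB));
        System.out.println("after fix:  " + validateAfterFix(lowSpace, mergeGiB));
    }
}
```

In this model the merge-space check still rejects a domain that truly cannot hold the merge, so removing the threshold check only unblocks cleanups that have enough room to run.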