This repository has been archived by the owner on Oct 16, 2022. It is now read-only.

Removing snapshots fails if BTRFS quota groups (qgroups) is disabled #680

Closed
twoktwok opened this issue Dec 2, 2020 · 9 comments

Comments

@twoktwok

twoktwok commented Dec 2, 2020

If ...

  • the option "Enable BTRFS qgroups (recommended)" is disabled in the GUI settings -and-
  • btrfs quotas are disabled on the system (sudo btrfs quota disable /)

... Timeshift fails to completely remove snapshots on the first try.

e.g. (sample output)

-------------
[...]
E: ERROR: can't list qgroups: quotas not enabled

E: btrfs returned an error: 256
E: Failed to query subvolume quota
[...]
-------------
Removing snapshot: 2020-12-02_20-08-29
Deleting subvolume: @ (Id:1513)
Deleted subvolume: @ (Id:1513)

Destroying qgroup: 0/1513
E: Failed to destroy qgroup: '0/1513'
E: Failed to remove snapshot: 2020-12-02_20-08-29
-------------

Problem: On this first try, everything except the snapshot's "info.json" file is actually deleted. As a result, Timeshift still lists this snapshot! A second remove attempt then successfully removes the already deleted snapshot from the list.
This behaviour seriously messes up scheduling, because snapshots that have already been deleted are still counted.

I assume the failed "destroy qgroup" step causes the remove process to exit before info.json and the snapshot folder are deleted.
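To spot snapshots left in this half-removed state, the log can be scanned for a "Deleted subvolume" line followed by a "Failed to remove snapshot" line. The following is only a rough sketch that assumes the log lines look like the sample output above; the function name `find_partially_removed` is made up for illustration and is not part of Timeshift.

```python
import re

def find_partially_removed(log_text):
    """Return snapshots whose subvolume was deleted but whose removal still failed."""
    partial = []
    current = None
    subvolume_deleted = False
    for raw in log_text.splitlines():
        line = raw.strip()
        match = re.match(r"Removing snapshot: (\S+)", line)
        if match:
            # A new removal starts: remember its name and reset the state.
            current = match.group(1)
            subvolume_deleted = False
        elif line.startswith("Deleted subvolume:"):
            subvolume_deleted = True
        elif line.startswith("E: Failed to remove snapshot:"):
            if current and subvolume_deleted:
                partial.append(current)
            current = None
    return partial

sample = """\
Removing snapshot: 2020-12-02_20-08-29
Deleting subvolume: @ (Id:1513)
Deleted subvolume: @ (Id:1513)

Destroying qgroup: 0/1513
E: Failed to destroy qgroup: '0/1513'
E: Failed to remove snapshot: 2020-12-02_20-08-29
"""
print(find_partially_removed(sample))
```

Any snapshot reported this way is probably one that Timeshift still lists even though its data is gone.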


System:

  • Arch Linux with kernel 5.9.11-arch2-1
  • XFCE
  • Timeshift version 20.11.1

Note: I have experienced some serious system freezes on a low-spec machine caused by [btrfs-cleaner] and [btrfs-transaction] after removing snapshots through timeshift. This issue vanishes when btrfs quotas are disabled. That's when I noticed the above bug.

@ptandler

@twoktwok if I have some spare time I will look at the sources and see if this error can be avoided.

At least a check if the option is enabled could be added, I guess.
Also, there might be a way to handle this error without leaving the snapshot dir in an inconsistent state. It could be that the qgroups are disabled although Timeshift's option is enabled.

@ptandler

I can confirm these error messages: I can find them in the logs when I disabled quota and deselected the option to enable them again (to avoid #697) as you described.

I had some fun browsing the code:

The message "Failed to query subvolume quota" is printed in Main.vala:3979 in method query_subvolume_quota(string subvol_name). This is called via query_subvolume_quotas() by restore_execute_btrfs() in Main.vala:2880 and by query_subvolume_info_thread() in Main.vala:3943.

As far as I understand, it is the update of the btrfs snapshot sizes that fails. I have no idea where these sizes are used or for what reason.

The other message "Failed to destroy qgroup" is printed in Subvolume.vala:177 in remove().
This is called by SnapshotRepo.auto_remove() > SnapshotRepo.remove_untagged() > Snapshot.remove(bool) > Snapshot.remove_btrfs().

When the qgroup cannot be deleted in Subvolume.remove(), it returns false for an error - and this causes Snapshot.remove_btrfs() also to abort (in Snapshot.vala:530).
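To make the effect of that early abort concrete, here is a tiny Python model of the flow as I understand it. This is not the Vala code from Timeshift: the dictionary keys and both function names are made up for illustration, and the order of the cleanup steps is an assumption based on the report above.

```python
def destroy_qgroup(quotas_enabled):
    # Stand-in for the qgroup destroy step: it fails when quotas are off.
    return quotas_enabled

def remove_snapshot(snapshot, quotas_enabled):
    """Model of the reported flow: a failed qgroup step aborts the removal."""
    snapshot["subvolume"] = False        # the subvolume is deleted first
    if not destroy_qgroup(quotas_enabled):
        return False                     # early return: the cleanup below never runs
    snapshot["info.json"] = False
    snapshot["folder"] = False
    return True

snap = {"subvolume": True, "info.json": True, "folder": True}
ok = remove_snapshot(snap, quotas_enabled=False)
print(ok, snap)
# prints: False {'subvolume': False, 'info.json': True, 'folder': True}
```

The subvolume is gone, but info.json is still there, which matches the snapshot staying in the list until a second remove attempt.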

The question is (e.g. @teejee2008): What should be the correct behaviour?

  • when qgroups are enabled in the settings, this really is an error and should be reported (to the user via the UI in some way?). However, since the snapshot and the subvolume have already been deleted at that point, maybe remove_btrfs() should continue and clean up the rest to keep things consistent ...?
  • when qgroups are disabled in the settings, this behaviour is the expected one - I think! However, it might make sense to check whether quota is enabled and, in that case, also destroy the qgroup. But when this fails, it definitely shouldn't abort the rest of the removal. Maybe log an info message, but nothing more.
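The two cases above could be combined roughly as in the following Python sketch. It is only one possible reading of the proposal, not an agreed design and not Timeshift code: `use_qgroups` stands for the GUI setting, `quotas_enabled` for the actual state of the filesystem, and `destroy_qgroup` is a hypothetical callback that returns True on success.

```python
import logging

log = logging.getLogger("timeshift-sketch")

def remove_snapshot(snapshot, use_qgroups, quotas_enabled, destroy_qgroup):
    """Hypothetical flow: a qgroup failure is reported but never stops cleanup."""
    snapshot["subvolume"] = False
    failed = False
    # Try to destroy the qgroup if the setting asks for it or quota is actually on.
    if use_qgroups or quotas_enabled:
        failed = not destroy_qgroup()
        if failed:
            # A real error only when the user asked for qgroups; otherwise just info.
            level = logging.ERROR if use_qgroups else logging.INFO
            log.log(level, "could not destroy qgroup")
    # Always finish the cleanup so the snapshot disappears from the list.
    snapshot["info.json"] = False
    snapshot["folder"] = False
    # Report failure to the caller only in the "qgroups enabled in settings" case.
    return not (failed and use_qgroups)

snap = {"subvolume": True, "info.json": True, "folder": True}
result = remove_snapshot(snap, use_qgroups=False, quotas_enabled=True,
                         destroy_qgroup=lambda: False)
print(result, snap)
```

Whether a failed destroy should still surface as an error return when the setting is enabled is exactly the open question; the sketch only shows that reporting and cleanup can be decoupled.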

Once we agree on the expected behaviour, it shouldn't be too complicated to provide a PR.

I'm looking forward to hearing your thoughts.

@thetayloredman

Can this please get resolved ASAP? My system hard-locks for 9/10ths of the time it's online due to these stupid quota groups, and if I disable them I may accidentally roll back to a snapshot which is already deleted (would this brick my system?)

@Shadow505

@teejee2008: Has there been any progress on this? This is a major bug. As said by the previous comment, it is not possible to delete snapshots when qgroups are disabled.

I disabled them both in the timeshift settings and system wide but still, attempting to delete snapshots will fail with the error that has been mentioned here before.

It's also very annoying because, when the system is configured to create a snapshot whenever packages are installed, this bug produces a dozen error messages.

And if you enable qgroups, the system will constantly freeze up for a long time.

I would appreciate it if this could be resolved.

Thanks!

@elexx

elexx commented Sep 14, 2021

I'd like to give this my +1 as I'm also encountering this problem.

@ptandler

As a workaround, I enable quota groups from time to time and disable them again after deleting snapshots. This seems to work ... well ...

@l0rdg3x

l0rdg3x commented Dec 29, 2021

I'm facing this problem too, I hope for a fix ASAP.
Thanks

@slush0

slush0 commented Jan 7, 2022

Seconding this: the inability to run properly without btrfs quotas is a serious issue. Before digging into it, I suspected that a hardware/HDD problem was freezing my machine. Only because I installed btrfs and Timeshift on a second machine did I realize this must be a software bug. It cost me hours to diagnose and find the cause, which is btrfs quotas and Timeshift triggering btrfs-cleaner.

@Rbthn

Rbthn commented May 10, 2022

It's not a fix, but I found a workaround.

  • The configuration is saved in /etc/timeshift/timeshift.json.
  • Move, remove or rename this file
    sudo mv /etc/timeshift/timeshift.json /etc/timeshift/timeshift.json.old
  • Launch timeshift-gtk
  • In the setup, disable qgroups.
  • Verify that you didn't change any other settings
    diff /etc/timeshift/timeshift.json.old /etc/timeshift/timeshift.json

Even though the config files are the same for me, the behaviour of timeshift changes and it works without errors again.
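If the plain diff is hard to read (for example because the key order or whitespace changed), the two files can also be compared as parsed JSON. This is just a small sketch: the helper name `changed_keys` and the key names in the example are made up, so the real keys in timeshift.json may look different.

```python
import json

MISSING = "<missing>"  # marker for a key that exists in only one of the files

def changed_keys(old_text, new_text):
    """Return {key: (old_value, new_value)} for every top-level key that differs."""
    old, new = json.loads(old_text), json.loads(new_text)
    diff = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key, MISSING), new.get(key, MISSING)
        if before != after:
            diff[key] = (before, after)
    return diff

# Hypothetical config contents, only to show the output shape.
old_cfg = '{"example_qgroup_option": "true", "schedule_daily": "true"}'
new_cfg = '{"schedule_daily": "true", "example_qgroup_option": "false"}'
print(changed_keys(old_cfg, new_cfg))
# prints: {'example_qgroup_option': ('true', 'false')}
```

For the real files, the text of /etc/timeshift/timeshift.json.old and /etc/timeshift/timeshift.json would be read and passed in instead of the example strings. An empty result means the parsed settings are identical.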

teejee2008 added a commit that referenced this issue May 28, 2022
…are created or removed

This fixes multiple issues related to quota groups but also removes the ability to display columns for 'Size' and 'Unshared Size' for BTRFS snapshots.
9 participants