
Unreasonably high memory consumption during restore #1067

Closed
buengese opened this issue Feb 17, 2024 · 12 comments · Fixed by rustic-rs/rustic_core#165
Labels
A-commands (Area: Related to commands in `rustic`), A-core (Area: Generally related to `rustic_core`), C-bug (Category: Something isn't working as expected)

Comments

@buengese

buengese commented Feb 17, 2024

I've recently tried out rustic on one of my existing repositories and noticed unreasonably high memory usage compared to restic: rustic was killed by the OOM killer after consuming more than 12 GB of RAM, while the original restic uses less than 2 GB of RAM during the entire restore.

The command being run is: rustic -r /path/to/repo restore 8c15dc1a .
Rustic version: 0.7.0

The repo in question is around 4.5 TB in size and contains 14991 files.

Edit: I'm just noticing that this might be more a rustic_core kind of issue...

@github-actions github-actions bot added the S-triage Status: Waiting for a maintainer to triage this issue/PR label Feb 17, 2024
@aawsome
Member

aawsome commented Feb 17, 2024

Thanks for opening this issue @buengese
Actually, this is unexpected. The only reason I can imagine is that it is related to the local scan rustic does to determine which already-present data can be re-used.
Can you give the output of the rustic restore run until it aborts?

@buengese
Author

buengese commented Feb 17, 2024

Here is the basic output:

root@vm-nas:/mnt/media-ingress/restore# rustic --log-level=info --log-file rustic-restore.log -r <redacted> restore 8c15dc1a .
enter repository password: [hidden]
[00:00:00] reading index...               ████████████████████████████████████████       1503/1503
[00:00:00] collecting file information... ⠁
[WARN] Note: additional entries exist in destination
[00:00:23] collecting file information...
Files:  15011 to restore, 0 unchanged, 0 verified, 0 to modify, 1 additional
Dirs:   991 to restore, 15 to modify, 0 additional
[00:01:39] restoring file contents...     ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 998.72 MiB/4.47 TiB   21.10 MiB/s  (ETA 5d)
Getötet ("Killed")

Additionally, this is from the log file at info logging level, though it doesn't seem to contain any particularly useful info.

19:22:53 [INFO] using no config file, none of these exist: /root/.config/rustic/rustic.toml, /etc/rustic/rustic.toml, ./rustic.toml
19:23:00 [INFO] repository local:<redacted>: password is correct.
19:23:01 [INFO] using cache at /root/.cache/rustic/361eb9d4ab59bbaac47481b8f351c4f681eac44c8e4887c567bd58c15395c3d6
19:23:02 [INFO] getting snapshot...
19:23:26 [WARN] Note: additional entries exist in destination
19:23:26 [INFO] total restore size: 4.5 TiB

OOM message from the Kernel log:

[18949.707360] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0-1,oom_memcg=/lxc/1003,task_memcg=/lxc/1003/ns/user.slice/user-0.slice/session-50.scope,task=rustic,pid=65670,uid=0
[18949.708190] Memory cgroup out of memory: Killed process 65670 (rustic) total-vm:14989124kB, anon-rss:11756028kB, file-rss:1920kB, shmem-rss:0kB, UID:0 pgtables:23448kB oom_score_adj:0

Worth noting that this is an LXC container, so cgroups are involved. While I don't think that should cause any weird behaviour, it is something worth keeping in mind.

@aawsome
Member

aawsome commented Feb 17, 2024

Ok, trying to investigate what could use that much memory during the actual restore... Can you please give some further information:

  • Output of rustic repoinfo (to see pack size statistics)
  • Output of /bin/time rustic restore -n or something similar to get the memory usage of a dry-run restore

Thanks a lot!

@buengese
Author

buengese commented Feb 17, 2024

Sure.
Something I only just noticed: the repo is using a non-standard pack size of 128M.

Here is the repoinfo output

| File type | Count | Total Size |
|-----------|-------|------------|
| Key       |     1 |      449 B |
| Snapshot  |     1 |      234 B |
| Index     |  1503 |  372.5 MiB |
| Pack      |  9013 |    4.4 TiB |
| Total     | 10518 |    4.4 TiB |


| Blob type |   Count | Total Size | Total Size in Packs |
|-----------|---------|------------|---------------------|
| Tree      |    1007 |  209.4 MiB |           209.4 MiB |
| Data      | 3157727 |    4.4 TiB |             4.4 TiB |
| Total     | 3158734 |    4.4 TiB |             4.4 TiB |

| Blob type  | Pack Count | Minimum Size | Maximum Size |
|------------|------------|--------------|--------------|
| Tree packs |          2 |     93.0 MiB |    116.5 MiB |
| Data packs |       9011 |    472.4 MiB |    519.8 MiB |

Time output looks like this:

167.71user 8.23system 2:57.78elapsed 98%CPU (0avgtext+0avgdata 1138612maxresident)k
37680889inputs+0outputs (2149major+647536minor)pagefaults 0swaps

The dry run behaved just fine and peaked at only a little over 1.2 GB. So I've gone a little further and thrown a profiler at it during a real restore. I'm sure there is a more Rust-native tool than heaptrack, but I'm not that familiar with the Rust ecosystem. This is where all the allocations seem to be coming from:

[Screenshot: heaptrack allocation profile, 2024-02-17 21:34]

@aawsome
Member

aawsome commented Feb 17, 2024

Actually you have a (data) pack size of around 500 MiB - the rustic "standard" would be around 100 MiB for your repository size.

Thanks for the memory profiling. Yes, it seems the following is happening:

  • Your restore needs most of the data from the repository
  • This means you basically need the full content of each pack file
  • which implies that restore reads the complete pack file into memory before processing it
  • And - this is not yet optimized in any way - rustic uses a concurrency of 20 for restoring, which means 20 pack files are kept in memory in parallel.
  • This gives the 10 GiB memory usage plus the 1.2 GiB rustic needs for index data and the "restore plan". There is some other overhead, so arriving at around 12 GiB sounds plausible.
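
For reference, a quick back-of-the-envelope check of these numbers; a minimal sketch with values assumed from this thread (20 reader threads, ~500 MiB packs, ~1.2 GiB measured by the dry run), not actual rustic code:

```rust
// Rough memory estimate for the current restore implementation, using values
// from this thread: 20 reader threads, ~500 MiB data packs (see the repoinfo
// output above) and ~1.2 GiB for index data plus the restore plan (dry run).
fn main() {
    let reader_threads = 20.0_f64;
    let pack_size_mib = 500.0_f64;
    let index_and_plan_mib = 1200.0_f64;

    let pack_buffers_gib = reader_threads * pack_size_mib / 1024.0;
    let total_gib = pack_buffers_gib + index_and_plan_mib / 1024.0;

    println!("pack buffers in memory: ~{pack_buffers_gib:.1} GiB"); // ~9.8 GiB
    println!("estimated total:        ~{total_gib:.1} GiB");        // ~10.9 GiB, plus other overhead
}
```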

IIRC restic uses a streaming approach, which of course doesn't need the pack-sized buffers in memory.

I'll think about what we can improve here. But this might need an enhancement to how rustic talks to the backend and may also need to wait until we tackle concurrency the right way...

In the meantime, I can only suggest decreasing the restore concurrency by self-compiling rustic after setting https://github.com/rustic-rs/rustic_core/blob/5ccf12c71fd6730e78f2679ea18c0e5be7cd84e4/crates/core/src/commands/restore.rs#L35 to a lower value. Sorry.

@aawsome
Member

aawsome commented Feb 17, 2024

An alternative would be to use lower pack sizes (but I don't know if you want to do this).
To let rustic do the repacking, the following steps are needed:

  • use rustic config to set the relevant data pack size settings. Don't forget to specify --set-max-packsize-tolerate-percent so rustic doesn't tolerate overly large pack files
  • run rustic prune - it should repack all packs that are too large.

@buengese
Author

buengese commented Feb 17, 2024

Yeah, that makes perfect sense to me. Thank you for looking into this. That being said, I think a concurrency of 20 might just be too high a default given the current approach of reading all data into memory. I know the default pack size for both restic and rustic is much lower, but there are reasons (specifically, limits on the maximum number of files per directory imposed by a number of cloud storage providers) to increase it. Maybe it would be possible to limit concurrency based on pack size for now.
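
A minimal sketch of what such a pack-size-aware limit could look like; purely illustrative (the memory budget, function name and numbers are assumptions, not rustic's actual policy):

```rust
// Derive the restore reader concurrency from a memory budget and the repository's
// pack size instead of using a fixed constant. Illustrative only; not rustic's API.
fn readers_for(memory_budget_mib: u64, pack_size_mib: u64, max_readers: u64) -> u64 {
    (memory_budget_mib / pack_size_mib).clamp(1, max_readers)
}

fn main() {
    // A ~2 GiB budget with ~500 MiB packs allows 4 concurrent readers...
    println!("{}", readers_for(2048, 500, 20)); // 4
    // ...while with smaller packs (e.g. ~32 MiB) the existing cap of 20 still applies.
    println!("{}", readers_for(2048, 32, 20)); // 20
}
```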

@aawsome
Member

aawsome commented Feb 17, 2024

Actually, we could limit the amount of data which is read from the backend at once, e.g. to something like 32 MiB or 64 MiB. That would not decrease performance (IMO) and would keep memory usage for pack file data below MAX_READER_THREADS_NUM * LIMIT.
And it would keep memory usage independent from the pack size.
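
A minimal sketch of that idea, assuming a plain local file and ignoring blob boundaries; the names, the example path and the 32 MiB limit are illustrative, not rustic_core's backend API:

```rust
use std::fs::File;
use std::io::{self, Read};

/// Process a pack file in fixed-size chunks instead of loading it whole, so the
/// peak memory per reader stays around `limit` bytes regardless of pack size.
/// Simplified sketch: real code would have to respect blob boundaries and would
/// read from the backend, not necessarily from a local file.
fn process_in_chunks(path: &str, limit: usize, mut handle: impl FnMut(&[u8])) -> io::Result<()> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; limit];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // end of pack file
        }
        handle(&buf[..n]); // decrypt/verify/write the data contained in this chunk
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // 32 MiB read limit; with 20 readers that bounds pack buffers to ~640 MiB total.
    process_in_chunks("example.pack", 32 * 1024 * 1024, |chunk| {
        println!("got {} bytes", chunk.len());
    })
}
```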

@aawsome
Member

aawsome commented Feb 17, 2024

Another optimization would be to remove index entries from memory just before starting the actual restore (all information needed for the restore is already contained in the restore plan). That would, in your case, free most of the 1.2 GiB during the actual restore. Combined with a read limit as described above, I think memory usage would drop below 1.5 GiB.
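
A minimal sketch of that second idea; the types and functions are placeholders rather than rustic_core's actual structures, and the whole index is dropped here for simplicity, whereas the real change would only drop the data entries:

```rust
// Placeholder types standing in for rustic_core's in-memory index and restore plan.
struct Index;       // blob -> pack location entries; can be GiB-sized for large repos
struct RestorePlan; // everything the restore phase needs, derived from the index

fn build_plan(_index: &Index) -> RestorePlan {
    RestorePlan
}

fn execute_plan(_plan: &RestorePlan) {
    // long-running phase: restore file contents
}

fn restore(index: Index) {
    let plan = build_plan(&index);
    drop(index); // the index is no longer needed; free its memory before restoring
    execute_plan(&plan);
}

fn main() {
    restore(Index);
}
```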

@buengese
Author

That does sound like an even better plan.

@aawsome
Member

aawsome commented Feb 17, 2024

@buengese Can you try whether rustic-rs/rustic_core#165 works for you? The memory usage with that PR should be roughly 2 GiB in your case.
To try it out, you have to self-compile rustic-rs using rustic_core from the PR.

@aawsome aawsome added C-bug Category: Something isn't working as expected A-commands Area: Related to commands in `rustic` A-core Area: Generally related to `rustic_core` and removed S-triage Status: Waiting for a maintainer to triage this issue/PR labels Feb 17, 2024
aawsome added a commit to rustic-rs/rustic_core that referenced this issue Feb 21, 2024
@buengese
Author

Sorry for not answering here earlier, but I can confirm that this works.

github-merge-queue bot pushed a commit that referenced this issue Sep 23, 2024
reduce memory usage of the `restore` command by removing the data
entries from the index before doing the actual restore.

see #1067

TODO:
- [x] remove Cargo changes once
rustic-rs/rustic_core#166 is merged