Why does the size of the save grow with each save? #79653

Open
IdleSol opened this issue Feb 13, 2025 · 19 comments
Labels
(S1 - Need confirmation) Report waiting on confirmation of reproducibility

Comments

@IdleSol

IdleSol commented Feb 13, 2025

Describe the bug

I created a world and a character. Then I did the following:

  1. save and exit
  2. check the size of the save folder
  3. load the save
  4. repeat steps 1-3.
N  size (bytes)  diff
0  4 085 876     -
1  4 093 493     +7 617
2  4 098 858     +5 365
3  4 104 377     +5 519
4  4 109 912     +5 535
5  4 115 435     +5 523

What kind of extra information is added to the save every time I save and/or quit the game?

For 4 of the tests, I waited a few extra minutes of real time between loading and saving, without doing anything in the game. It made no significant difference.
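A minimal Python sketch of this measure-and-compare loop, assuming that "size of the save folder" means the sum of the sizes of all files in it (a file manager may report allocated disk space instead, so absolute numbers can differ). `folder_size` and `size_deltas` are hypothetical helper names; `size_deltas` reproduces the diff column of the table above from the measured sizes.

```python
import os


def folder_size(path):
    """Sum the sizes in bytes of all files below `path` (recursive).

    Assumption: this is how "size of the save folder" was measured."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def size_deltas(sizes):
    """Byte difference between each measurement and the previous one."""
    return [after - before for before, after in zip(sizes, sizes[1:])]


# Sizes in bytes from the table above (N = 0..5).
sizes = [4_085_876, 4_093_493, 4_098_858, 4_104_377, 4_109_912, 4_115_435]
print(size_deltas(sizes))  # [7617, 5365, 5519, 5535, 5523]
```

Calling `folder_size` on the world's save folder after each save and exit, and appending the result to `sizes`, gives the same kind of table for any later experiment.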

Attach save file

n/a

Steps to reproduce

n/a

Expected behavior

The growth is fine if the added data is useful information.

Screenshots

No response

Versions and configuration

Testing was done on cdda-windows-with-graphics-x64-2025-02-12-1549, running under an emulator (Arch + portproton).

Also tested on the stable version 0.H for Linux.

Additional context

Of course, the growth is small, just a few kilobytes per save, which is nothing compared with the size of a late-game save. Or maybe there is a bug that causes such large save sizes.

@IdleSol IdleSol added the (S1 - Need confirmation) Report waiting on confirmation of reproducibility label Feb 13, 2025
@PatrikLundell
Contributor

Unless someone already knows, you might be able to figure it out by using diff tools to see what differs between the saves.

One thing I can think of: if you're using the same save as the basis for #79651, it might be egg-hatching checks causing new animals to appear (and the eggs to disappear in the process). That would be a bug, but a number of things are checked when a map (actually a submap) is loaded into the game, and this could be a result of those checks running more often than they should on save/load cycles with no time passing in between. Note that this is just a guess.

@IdleSol
Author

IdleSol commented Feb 13, 2025

Part 2, which is what I originally set out to test.

While playing in my main world, I noticed a dramatic increase in save size, from ~340 MB to ~468 MB. This surprised me, because I wasn't traveling to new regions but clearing out a military base. On top of that, I was destroying unneeded corpses and items with a vehicle.

There is one caveat: on one of the trips I took a new route to the base, so a few new OMTs were revealed. But their number is negligible compared with the OMTs already revealed.

So I did an experiment:

  1. I created a new character in the test world (started in the shelter as a survivor).
  2. The character teleports to every OMT adjacent to the shelter's OMT, so that moving around within the shelter's OMT does not trigger the map generator.
  3. I gave the character a debugging backpack.
  4. I revealed overmaps until I found the military base.
  5. The character teleports to the boundary OMTs of the military base (the minefield).
  6. The character teleports to all the other boundary OMTs of the military base and uses the debug menu to kill monsters. Again, this is so that moving within the base does not trigger the map generator.
  7. The character teleports to the central OMT of the military base and kills the remaining monsters.
  8. The character teleports back to the shelter.
  9. Two map marks are created:
  • point 1 = shelter
  • point 2 = military base warehouse
  10. The game is saved and the first measurement of the save size (N0) is made.

Clarification: by "save" I mean “save and exit”. Then I check the size and load the save.

  11. After loading, I save again and check the size (N1).
  12. Step 11 is repeated (N2).
  13. The character teleports to mark 2 (warehouse) and collects absolutely every item in the warehouse.
  14. The character teleports to mark 1 (shelter) and lays the items out on the floor.
  15. Steps 13 and 14 are repeated until all items from the warehouse are in the shelter.
  16. I save again and check the size (N3).
  17. Load, save and exit, and check the size of the save (N4).
  18. Take items out of containers (mostly food and medicine; I didn't touch the sealed rations and canned goods, since there were too many and I got lazy).
  19. Save and check the size (N5).
  20. Repeat step 19 (N6).
  21. The character teleports to point 2, and I check the size of the save (N7).
  22. The character teleports back to point 1, and I check the size again (N8).

Here are the results:

N  size (bytes)  diff
0  19 990 117    -
1  20 001 338    +11 221
2  20 009 007    +7 669
3  20 969 618    +960 611
4  20 976 316    +6 698
5  20 477 393    -498 923
6  20 484 532    +7 139
7  20 505 235    +20 703
8  20 527 472    +22 237

As you can see, moving items from one place to another added about 1 MB (~+5%) to the save size, while taking items out of containers reduced it by about 0.5 MB.

Saving after teleporting to another location also increases the size.

@IdleSol
Author

IdleSol commented Feb 13, 2025

One thing I can think of is if you're using the same save as you use as the basis for #79651

No, it was a fresh test world. All saves happened at the same in-game moment, the game start time: 8 hours 0 minutes and 0 seconds.

Unless someone knows, you might be able to try to figure it out by using diff tools to see what's different in the save.

If no one has an answer, I'll check which files grow in size and compare them against each other. I'm still hoping someone knows what's wrong, though.

UPD. Or someone will say they can't reproduce it, and the cause is my machine and/or configuration. Or maybe this is even normal behavior for files re-saved on Linux.

@PatrikLundell
Contributor

A number of comments:

  • How are you able to collect and spread things without any time passing? Debug functionality?
  • Are you using the compression logic? It may affect things in various ways, so I'd turn that off (larger saves, but probably more consistent results, and an additional level of complexity is removed).
  • Removing things from containers should result in a decrease in size, as the game needs to store an item locator only for an item on the ground rather than keeping track of both that the item is stored within the container and an item locator for the item referencing the container. I may be incorrect on this, but that's how I think it works.

@IdleSol
Author

IdleSol commented Feb 13, 2025

How are you able to collect and spread things without any time passing? Debug functionality?

Time stood still only in the first test, where all I did was load and save. In the test where I was moving items, time passed as normal. Debug functions were only used to teleport the character, spawn a bag and reveal the map.

Are you using the compression logic?

I hadn't thought of that

@ZhilkinSerg
Contributor

You can just copy world folders after each save and then compare to see what was modified each time.
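A minimal Python sketch of this approach: copy the world folder after each save and compare per-file sizes between copies. The helper names are hypothetical. Comparing sizes only catches files that grew or shrank; a file rewritten with the same size needs a content comparison, for example with `filecmp.cmp(a, b, shallow=False)`.

```python
import os
import shutil


def snapshot(world_dir, dest_dir):
    """Copy the whole world folder so a later save can be compared with it.
    Note: shutil.copytree raises an error if dest_dir already exists."""
    shutil.copytree(world_dir, dest_dir)


def file_sizes(root):
    """Map each file path (relative to `root`) to its size in bytes."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def changed_files(before, after):
    """Return {path: (old_size, new_size)} for files that were added, removed
    or resized between two snapshots; a missing file has size None."""
    changes = {}
    for path in sorted(set(before) | set(after)):
        old, new = before.get(path), after.get(path)
        if old != new:
            changes[path] = (old, new)
    return changes
```

For example (folder names are illustrative): call `snapshot("save/test", "save0/test")` right after the first save and exit, load and save again, then print `changed_files(file_sizes("save0/test"), file_sizes("save/test"))`.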

@akrieger
Member

Not directly addressing the root cause, but while working on #78857 I did notice files getting rewritten that I would expect to be 'static'. Apparently, under the hood, the division of data between files is not as clean as it could be. Maybe after enabling save compression I'll start a longer-term project to reorganize that.

@ZhilkinSerg
Copy link
Contributor

There are a lot of things happening:

  • data changes during the reload cycle
  • data and empty data structures appear after reload
  • JSON arrays are not sorted in a reproducible manner
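To tell real changes apart from mere reordering, both versions of a file can be normalized before comparing. A minimal Python sketch, assuming the file content is plain JSON; the set of keys treated as unordered (`UNORDERED_KEYS`) is an assumption built from names mentioned in this thread, not a list taken from the game's code. Other arrays keep their order, because order can carry meaning there (the "terrain" arrays, for instance, look run-length encoded).

```python
import json

# Assumption: arrays under these keys come from unordered containers, so
# their order carries no meaning. Names are taken from this thread.
UNORDERED_KEYS = {"city_tiles", "placed_unique_set"}


def canonicalize(value, parent_key=None):
    """Recursively normalize parsed JSON. Arrays stored under an unordered key
    are sorted; every other array keeps its order, since order can matter
    there (e.g. run-length encoded terrain)."""
    if isinstance(value, dict):
        return {k: canonicalize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # Elements are normalized without the parent key, so nested arrays
        # such as [x, y] coordinate pairs are never sorted themselves.
        items = [canonicalize(v) for v in value]
        if parent_key in UNORDERED_KEYS:
            items.sort(key=lambda v: json.dumps(v, sort_keys=True))
        return items
    return value


def same_content(text_a, text_b):
    """True if two JSON documents differ at most in object key order and in
    the order of arrays under UNORDERED_KEYS."""
    a = canonicalize(json.loads(text_a))
    b = canonicalize(json.loads(text_b))
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```

Only keys known to be unordered should go into `UNORDERED_KEYS`; sorting every array would hide genuine changes in arrays whose order matters.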

@IdleSol
Author

IdleSol commented Feb 13, 2025

I wish I understood which files are responsible for what. A quick check showed that the contents of the files are changing, and also that everything in them is crammed onto a single line, which makes it a lot of fun to figure out.

Example 1: uistate.json

--"input_history":{},
++"input_history":{"item_filter":[],"list_item_downvote":[],"list_item_priority":[]},

Example 2: o.1.1

"city_tiles":[
  [163,68],
  [152,51],
  [87,16],
...

vs

"city_tiles":[
  [4,30],
  [167,80],
  [145,63],
...

Either none of the entries match, or it's just a different order. There are 40k+ lines in this file (after some processing), and trying to tell whether something has changed or the order is just different is a hopeless endeavor if you don't know what the data is for and how it is created.

I still lean toward something being added, since the size keeps increasing.
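To answer "was something added, or was it just reordered?" for an array such as "city_tiles", the two versions can be compared as multisets, ignoring order. A minimal Python sketch; `multiset_diff` is a hypothetical helper, and entries are counted through their JSON text because Python lists are not hashable.

```python
import json
from collections import Counter


def multiset_diff(old_items, new_items):
    """Compare two JSON arrays as multisets, ignoring order.

    Returns (added, removed). Two empty lists mean the arrays hold exactly
    the same entries, possibly in a different order."""
    def as_key(item):
        # Lists are unhashable in Python, so count entries by their JSON text.
        return json.dumps(item, sort_keys=True)

    old_count = Counter(as_key(item) for item in old_items)
    new_count = Counter(as_key(item) for item in new_items)
    added = [json.loads(k) for k in (new_count - old_count).elements()]
    removed = [json.loads(k) for k in (old_count - new_count).elements()]
    return added, removed
```

Applied to the full "city_tiles" arrays of the two o.1.1 versions, two empty lists would mean the tiles were only reordered; anything in `added` is genuinely new data.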

This is the processed output of diff:

./save0/test:						      |	./save/test:
  413117  #0JDQvdC00LXRgNGB0L7QvSDQn9GN0YDRgNC40YE=.sav	      |	  414888  #0JDQvdC00LXRgNGB0L7QvSDQn9GN0YDRgNC40YE=.sav
  966686  o.0.0						      |	  971052  o.0.0
 1102729  o.0.1						      |	 1104791  o.0.1
 1100876  o.1.0						      |	 1104110  o.1.0
  911745  o.1.1						      |	  918294  o.1.1
    2473  uistate.json					      |	    2537  uistate.json


./save0/test/maps/2.9.-1:				      |	./save/test/maps/2.9.-1:
 9688  75.299.-1.map					      |	 9772  75.299.-1.map

./save0/test/maps/2.9.1:				      |	./save/test/maps/2.9.1:
 2367  73.302.1.map					      |	 2641  73.302.1.map
 2145  74.302.1.map					      |	 2419  74.302.1.map
 1172  75.298.1.map					      |	 1309  75.298.1.map
 2067  75.302.1.map					      |	 2341  75.302.1.map
 1350  76.298.1.map					      |	 1487  76.298.1.map
 1210  76.299.1.map					      |	 1347  76.299.1.map
 1756  76.302.1.map					      |	 2030  76.302.1.map
 1298  77.299.1.map					      |	 1435  77.299.1.map
 1334  77.300.1.map					      |	 1471  77.300.1.map
 2232  77.302.1.map					      |	 2506  77.302.1.map
 2237  78.297.1.map					      |	 2511  78.297.1.map
 1965  78.298.1.map					      |	 2239  78.298.1.map
 1002  78.299.1.map					      |	 1276  78.299.1.map
  827  78.300.1.map					      |	 1101  78.300.1.map
 1241  78.301.1.map					      |	 1515  78.301.1.map
 1407  78.302.1.map					      |	 1818  78.302.1.map


./save0/test/maps/2.9.-2:				      |	./save/test/maps/2.9.-2:
 10601  74.299.-2.map					      |	 10685  74.299.-2.map


./save0/test/maps/2.9.-4:				      |	./save/test/maps/2.9.-4:
 22347  78.297.-4.map					      |	 22431  78.297.-4.map
 17412  78.298.-4.map					      |	 17496  78.298.-4.map

save0 - initial save

@PatrikLundell
Contributor

PatrikLundell commented Feb 13, 2025

Stuff ending up in "random" order is a price you pay for using unordered (hash-based) maps and sets (i.e. the data structure, not the game feature), I believe. Their iteration order depends on the hash values, the bucket count and the insertion history rather than on the stored values themselves, so it isn't guaranteed to be the same between saves (loading the entries in a different order can change it, for example).
"placed_unique_set" is a std::unordered_set, so its order not being consistent is to be expected.

Edit: the following segment is incorrect:
The "placements" JSON array is created by using emplace() into a std::vector. I would guess it would retain the order if emplace_back() was used, but I don't know what criteria "emplace" uses to determine location (a guess would be a "random" hash key again).

I'd take the smallest of the map files and diff it between saves (before or after manually making it readable by untangling the horrible anti-human one-liners) to try to find the things that grow.

@ZhilkinSerg's images indicate some things are "refreshed" on reload. In those cases either the regeneration on load should be removed, or the saving should be removed (no point in saving something that's going to be overwritten on load). That unnecessary work shouldn't grow the saves, though, just change them needlessly.

Edit:
"city_tiles" is, again, a std::unordered_set, so no consistency can be expected there either.

@akrieger
Member

akrieger commented Feb 14, 2025

The "placements" JSON array is created by using emplace() into a std::vector

Raw vector::emplace requires an explicit position iterator argument; the new element is constructed in place just before that position. emplace for a map does not take a position. vector never has 'random' insertion.

@PatrikLundell
Contributor

@akrieger is of course correct. The loading code loads into an unordered_map.

@IdleSol
Author

IdleSol commented Feb 14, 2025

Since I deleted the old saves, I had to make new ones. But this time I decided to compare three saves instead of two.

file (sizes in bytes)                            test0      test1      test2
./#0JDQvdC00LXRgNGB0L7QvSDQn9GN0YDRgNC40YE=.sav  135 404    137 147    137 306
./o.0.0                                          1 097 853  1 105 502  1 113 151
./o.1.0                                          944 024    946 207    948 376
./uistate.json                                   2 473      2 537      2 537
./maps/6.1.1/219.43.1.map                        1 019      1 156      1 156

This time there are fewer files that change.

But there are two peculiarities: uistate.json and 219.43.1.map change only after the first resave. On the second resave (test2) they are already unchanged.

And if you look at the difference in 219.43.1.map between test0 and test1:

{
  "version":36,
  "coordinates":[439,86,1],
  "turn_last_touched":5212800,
  "temperature":0,
  "terrain":[["t_open_air",144]]
++  "radiation":[0,144],
++  "furniture":[],
++  "items":[],
++  "traps":[],
++  "fields":[],
++  "cosmetics":[],
++  "spawns":[],
++  "vehicles":[],
++  "partial_constructions":[]
},

That is, for something (a tile? an OMT?) at coordinates [439,86,1], many fields were missing in the first save and were only added in the second.

@PatrikLundell
Contributor

The "data" added seems to be null data. My guess would be that unnecessary fields weren't generated originally, but added by the loading code as "missing" and then kept thereafter.
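The guess above (the loader fills in missing fields with empty values, and the saver then writes back everything it holds) can be modelled in a few lines of Python to see why a file grows on the first resave and then stays the same. This is a toy model of that hypothesis, not the game's loading code; `DEFAULT_FIELDS` is copied from the fields that appeared in the diff above, and `load_submap` and `resave` are hypothetical names.

```python
import copy
import json

# Fields that appeared after the first resave in the diff above, with the
# values they had there. Assumption: these are the loader's defaults.
DEFAULT_FIELDS = {
    "radiation": [0, 144],
    "furniture": [],
    "items": [],
    "traps": [],
    "fields": [],
    "cosmetics": [],
    "spawns": [],
    "vehicles": [],
    "partial_constructions": [],
}


def load_submap(record):
    """Toy loader: fill every missing field with a copy of its default."""
    loaded = dict(record)
    for field, default in DEFAULT_FIELDS.items():
        loaded.setdefault(field, copy.deepcopy(default))
    return loaded


def resave(text):
    """Toy save cycle: load the record and write back everything it holds."""
    return json.dumps(load_submap(json.loads(text)), separators=(",", ":"))


original = json.dumps({"version": 36, "coordinates": [439, 86, 1],
                       "turn_last_touched": 5212800, "temperature": 0,
                       "terrain": [["t_open_air", 144]]}, separators=(",", ":"))
first = resave(original)
second = resave(first)
print(len(original) < len(first), first == second)  # True True
```

The model only reproduces the "grows once, then settles" pattern; it says nothing about why some records are written in the abbreviated form in the first place.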

I don't think this is the stuff we should be looking at, but rather things that keep growing for no apparent reason.

However, the realization that we need 3 saves rather than two to detect what's growing for no reason (adding "missing" empty data is a bad reason, but still a reason) is important.
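The three-save idea can be turned into a simple per-file classification. A minimal Python sketch with hypothetical names, fed with the sizes from the three-save table above; equal sizes do not prove equal contents, so "stable" or "settled" only means the size stopped changing.

```python
def classify(sizes):
    """Classify a file from its sizes in three consecutive saves.

    'stable'  : never changed size
    'settled' : changed once (e.g. fields filled in on the first resave), then stopped
    'growing' : still grew between the second and the third save
    'other'   : any other pattern, such as shrinking
    """
    s0, s1, s2 = sizes
    if s0 == s1 == s2:
        return "stable"
    if s1 == s2:
        return "settled"
    if s2 > s1:
        return "growing"
    return "other"


# Sizes in bytes (test0, test1, test2) from the three-save table above.
table = {
    "#0JDQvdC00LXRgNGB0L7QvSDQn9GN0YDRgNC40YE=.sav": (135_404, 137_147, 137_306),
    "o.0.0": (1_097_853, 1_105_502, 1_113_151),
    "o.1.0": (944_024, 946_207, 948_376),
    "uistate.json": (2_473, 2_537, 2_537),
    "maps/6.1.1/219.43.1.map": (1_019, 1_156, 1_156),
}
for name, file_sizes in table.items():
    print(f"{name}: {classify(file_sizes)}")
```

With this data, uistate.json and the map file come out as "settled", while the .sav file and both overmap files come out as "growing", which are the ones worth diffing.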

@IdleSol
Author

IdleSol commented Feb 14, 2025

[
  {
    "version":36,
    "coordinates":[642,224,1],
    "turn_last_touched":5212800,
    "temperature":0,
    "terrain":[["t_open_air",65],"t_treetop",["t_open_air",78]],
    "radiation":[0,144],
    "furniture":[],
    "items":[],
    "traps":[],
    "fields":[],
    "cosmetics":[],
    "spawns":[],
    "vehicles":[],
    "partial_constructions":[]
  },

  {
    "version":36,
    "coordinates":[642,225,1],
    "turn_last_touched":5212800,
    "temperature":0,
    "terrain":[["t_open_air",144]]
  },

  {
    "version":36,
    "coordinates":[643,224,1],
    "turn_last_touched":5212800,
    "temperature":0,
    "terrain":[["t_open_air",144]]
  },
  
  {
    "version":36,
    "coordinates":[643,225,1],
    "turn_last_touched":5212800,
    "temperature":0,
    "terrain":[["t_open_air",47],"t_treetop",["t_open_air",96]],
    "radiation":[0,144],
    "furniture":[],
    "items":[],
    "traps":[],
    "fields":[],
    "cosmetics":[],
    "spawns":[],
    "vehicles":[],
    "partial_constructions":[]
  }
]

This one is from a different world, but this time it's a complete map file, created by saving right at the start of the game. Note the following coordinates:

  • [642,224,1] and [643,225,1] - All fields are present even though they are empty
  • [642,225,1] and [643,224,1] - Some fields are missing, i.e. the record is abbreviated.

And I don't believe the game flips a coin to decide to save here and not there. There must be a reason for it.

Were we simply lucky that those locations had empty fields? Or were the fields not empty there, but the game didn't save them the first time and replaced them with empty ones the second time?

@PatrikLundell
Contributor

The two entries containing the extra empty fields have treetops in them, while the abbreviated ones contain only open air. That's probably the cause.
Why? I don't know (if you want a guess: treetops can support things, while open air can't).
Is it important? I don't think so.
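The treetop guess can be checked mechanically against a whole map file: every record should be abbreviated exactly when its terrain is pure open air. A minimal Python sketch, assuming the map file is a plain JSON array like the one above. Treating a missing "furniture" key as the marker of an abbreviated record, and reading each "terrain" entry as either a bare terrain id or an [id, count] pair, are both inferences from that sample, not from the game's code.

```python
import json


def is_abbreviated(record):
    """Assumption: abbreviated records are the ones without a "furniture" key."""
    return "furniture" not in record


def only_open_air(record):
    """True if every terrain entry is t_open_air. An entry is either a bare
    terrain id or an [id, count] pair, as in the sample map file."""
    for entry in record["terrain"]:
        ter_id = entry if isinstance(entry, str) else entry[0]
        if ter_id != "t_open_air":
            return False
    return True


def counterexamples(map_text):
    """Coordinates of records that contradict the hypothesis
    "abbreviated if and only if the terrain is only open air"."""
    return [r["coordinates"] for r in json.loads(map_text)
            if is_abbreviated(r) != only_open_air(r)]
```

An empty result for every .map file would be consistent with the guess, though it still wouldn't explain why the game treats the two kinds of submap differently.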

@PatrikLundell
Contributor

PatrikLundell commented Feb 14, 2025

Comparing o.1.-1 save iteration 3 to save iteration 2:
The "camps" section contains 3 copies of the pair ("apis_hive" followed by "bare_bones_NPC_camp") versus 2 copies. Thus, it seems one copy of this pair is generated once per save. It can also be noted that the position of every copy in both versions is [278, -57, 0]. And "bare_bones_NPC_camp" is not something that should be generated in the world, as it's a PC camp version.

A large section starting with "predecessors" is growing, but since the entries are in random order it's not really possible to tease out what the additions are (114689 bytes growing to 115654).

Edit:
Camps get added through two processes during the loading of a save, which probably is what causes the duplication:

  • Reading the "camps" JSON structure, adding the entries directly to the vector.
  • Calls to overmapbuffer::add_camp() from migrate_camps() from overmap::unserialize(), where code adds a camp if oter_id_should_have_camp() is true while processing "layers" (whatever that is). Unfortunately, in the code this comes before the processing of the "camps" JSON, even though in the cases I've seen while debugging the JSON had been read first, possibly because it comes from a different file that happens to be read earlier?

The vector is written to JSON, obviously without knowledge of whether the entries were read from the save or invented from the overmap terrain data originally.

Regardless, the code shouldn't recklessly add camps based on overmap ids. Adding a camp based on an overmap id should only happen if the camp doesn't already exist. If it cannot be guaranteed that the JSON data is read first, the JSON reading should check whether the entry already exists and replace it (assuming it's an overmap-generated entry of inferior quality), or check which one is the better one (existing saves with multiple entries would probably use the first one to store the faction status, such as quests, etc.). This kind of processing might also be useful as a "migration" step to weed out duplicates from existing saves.
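The suggested fix (add a camp from overmap data only if it isn't already there, and drop duplicates from existing saves while keeping the first entry) is sketched below in Python rather than in the game's C++. The camp records are simplified dictionaries: "owner" appears in the save data quoted in this thread, but identifying a camp by its owner plus a "pos" field is an assumption for illustration, as are all the function names.

```python
def camp_key(camp):
    """Hypothetical identity of a camp: its owner and its position."""
    return (camp["owner"], tuple(camp["pos"]))


def add_camp_if_absent(camps, camp):
    """Insert-if-absent: only add a camp derived from overmap terrain when no
    camp with the same key exists yet. Returns True if the camp was added."""
    key = camp_key(camp)
    if any(camp_key(existing) == key for existing in camps):
        return False
    camps.append(camp)
    return True


def dedupe_camps(camps):
    """Migration step: keep only the first camp for each key, since the first
    entry is the one most likely to hold the faction state."""
    seen = set()
    result = []
    for camp in camps:
        key = camp_key(camp)
        if key not in seen:
            seen.add(key)
            result.append(camp)
    return result
```

With insert-if-absent on the overmap path, repeated load/save cycles stop adding copies as long as the saved camps are already in the list; if that path can run first, the JSON reader would need the same check (or a replace) for the result to stay at one entry per camp.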

Edit 2:
I tried short-circuiting overmap::migrate_camps() by placing a "return" at the beginning, and things seemed to be generated just fine. A save file from the new world contained an Isherwood camp part, Mr Lapin, and Apis, and seemed to contain the same things after a load/save cycle. My guess is that migrate_camps was used at some point in the past for migration purposes. I don't trust this brief experiment enough to be confident about writing a PR, though.

@IdleSol
Author

IdleSol commented Feb 14, 2025

The o.1.0 file from the main world weighs 8 MB. A search for "owner":"apis_hive","name" returns 69 matches.

The count is also 69 for "owner":"tacoma_commune","name", "owner":"exodii","name" and "owner":"robofac","name".
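The same counts can be produced for several owners at once with a plain substring search. A minimal Python sketch, assuming the file is compact one-line JSON with no spaces after the colons; `count_owner_entries` is a hypothetical helper.

```python
def count_owner_entries(text, owners):
    """Count occurrences of '"owner":"<id>","name"' for each owner id.
    str.count counts non-overlapping occurrences of the substring."""
    return {owner: text.count(f'"owner":"{owner}","name"') for owner in owners}


# Usage (the file name is an example): read the whole overmap file as text.
# with open("o.1.0", encoding="utf-8") as f:
#     print(count_owner_entries(f.read(),
#                               ["apis_hive", "tacoma_commune", "exodii", "robofac"]))
```

Equal counts for every owner would fit the idea that each load adds one copy of each camp, rather than one faction being duplicated on its own.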

UPD.

What do these files store? Looking through all the files, o.0.0 alone is 304 MB, while the whole folder is 651 MB.

For comparison, in the save from February 10th this file was 10 MB, with a total folder size of 340 MB.

@PatrikLundell
Contributor

PatrikLundell commented Feb 14, 2025

That probably means the save has been loaded and saved 68 times since its creation, but there may obviously be other factors involved that we haven't identified.
Hm, what happens if you travel around and maps get loaded into the reality bubble? In the scary case you'd get another copy each time you do that. Note that this is speculation, not something I've tested.

My save folder contains a bunch of other files, plus the maps folder and a .mm1 folder.

Given that I saw unwarranted growth in the "predecessors" data, there's probably something fishy going on there as well.

Edit:
I tried teleporting back and forth 5 times between the start and a tile 6 OMTs away. It didn't grow the "camps" section any more than just loading and saving did.

Projects
None yet
Development

No branches or pull requests

4 participants