Chunked archive readers for large files #127

Open
wants to merge 6 commits into
base: master

Conversation

yretenai
Contributor

@yretenai yretenai commented Feb 9, 2024

Allows large files to be read. The issue occurs because these files (usually .umaps) are over 2 GB, which is too big to fit in a single array. The new reader instead loads the file in chunks through a 128 MB buffer.
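As a rough illustration of the chunking idea described above, here is a minimal sketch of a reader that keeps only one fixed-size window of a seekable stream in memory. The class name `ChunkedReader` and all of its members are hypothetical and are not the reader added by this PR; only the 128 MB default comes from the description.

```csharp
using System;
using System.IO;

// Hypothetical sketch: not the reader implemented in this PR.
public sealed class ChunkedReader : IDisposable
{
    private readonly Stream _source;
    private readonly byte[] _buffer;
    private long _bufferStart;  // absolute offset of _buffer[0] in the source
    private int _bufferLength;  // number of valid bytes currently in _buffer

    public ChunkedReader(Stream source, int chunkSize = 128 * 1024 * 1024)
    {
        if (!source.CanSeek)
            throw new ArgumentException("Source must be seekable.", nameof(source));
        _source = source;
        _buffer = new byte[chunkSize];
    }

    public long Length => _source.Length;
    public long Position { get; set; }

    // Copies up to 'count' bytes starting at Position; returns the bytes copied.
    public int Read(byte[] destination, int offset, int count)
    {
        int total = 0;
        while (count > 0 && Position < Length)
        {
            // Refill when Position falls outside the buffered window.
            if (Position < _bufferStart || Position >= _bufferStart + _bufferLength)
            {
                Fill(Position);
                if (_bufferLength == 0) break; // source ended unexpectedly
            }

            int inBuffer = (int)(Position - _bufferStart);
            int n = Math.Min(count, _bufferLength - inBuffer);
            Buffer.BlockCopy(_buffer, inBuffer, destination, offset, n);
            Position += n;
            offset += n;
            count -= n;
            total += n;
        }
        return total;
    }

    private void Fill(long start)
    {
        _source.Seek(start, SeekOrigin.Begin);
        _bufferStart = start;
        _bufferLength = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        while (_bufferLength < _buffer.Length)
        {
            int read = _source.Read(_buffer, _bufferLength, _buffer.Length - _bufferLength);
            if (read == 0) break;
            _bufferLength += read;
        }
    }

    public void Dispose() => _source.Dispose();
}
```

With this shape the file position is a `long`, so offsets past 2 GB are addressable, while the only large allocation is the one chunk buffer.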

This might prevent the issue described in #95.

I briefly tested this and it seems to work, but because of the minor refactor in PakFileReader, I can't rule out that I introduced side effects.

@yretenai yretenai marked this pull request as ready for review February 9, 2024 10:50
@yretenai yretenai changed the title Chunked patch readers for large files Chunked archive readers for large files Feb 9, 2024
@GMatrixGames GMatrixGames self-requested a review February 9, 2024 16:29
Collaborator

@GMatrixGames GMatrixGames left a comment

Please also add a way to forcibly disable reading files larger than int.MaxValue. That restriction should be on by default; anyone who wants to allow large files should opt in manually. Reading large files uses a very large amount of memory and takes significantly longer than other files (which is to be expected).

I would rather prevent accidental loading of files this size than let people use it unknowingly, which would most likely cause their system to lock up and become unresponsive for a considerable amount of time or until a restart.

@yretenai
Contributor Author

yretenai commented Feb 10, 2024

7bfb62a adds two new options in Globals:

  • AllowLargeFiles, which defaults to false and prevents large ubulk and uptnl files from loading. With it disabled, processing embedded vertex streams (landscape proxies) and textures (usually heightmaps and weightmaps) in maps will raise an error. Would it be better to just return null?
  • LargeFileLimit, which controls what is considered a large file (currently set to 2 GB; should it be lower?).

Large files as determined by LargeFileLimit will now also always use the chunk reader.
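The interaction between the two options can be sketched roughly as below. The option names `AllowLargeFiles` and `LargeFileLimit` come from the comment above; the `GlobalsSketch` and `SizePolicy` classes, the `ReadStrategy` enum, the exact 2 GB value, and the `>=` comparison are assumptions made for illustration and may not match the actual code in the commit.

```csharp
using System;

// Hypothetical stand-in for the two options described above.
public static class GlobalsSketch
{
    public static bool AllowLargeFiles = false;        // off by default, as requested in review
    public static long LargeFileLimit = int.MaxValue;  // assumed value for "2 GB"
}

public enum ReadStrategy { Rejected, SingleArray, Chunked }

public static class SizePolicy
{
    // Decides how a payload of the given size would be read.
    // The >= comparison is an assumption; the real boundary may differ.
    public static ReadStrategy Choose(long size)
    {
        if (size < 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        bool isLarge = size >= GlobalsSketch.LargeFileLimit;
        if (!isLarge)
            return ReadStrategy.SingleArray;

        // Large files are refused unless the user opted in,
        // and always go through the chunked reader when allowed.
        return GlobalsSketch.AllowLargeFiles ? ReadStrategy.Chunked : ReadStrategy.Rejected;
    }
}
```

Whether a rejected file should throw or return null is the open question raised in the comment; the sketch leaves that decision to the caller.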

Collaborator

@4sval 4sval left a comment

Please create a method in GameFile.cs to verify the size of the payload.

@yretenai
Contributor Author

Added HasValidSize, which validates the size.

@Chuanhsing

Chuanhsing commented Feb 20, 2024

I get an out-of-memory error with this patch when extracting the Palworld umap file "Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5":

                var allExports = provider.LoadAllObjects(path);
                var fullJson = JsonConvert.SerializeObject(allExports, Formatting.Indented);
                File.WriteAllText(mapJsonPath, fullJson);

I also get an out-of-memory error when extracting "Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5/_Generated_/"; the original version is able to extract this directory.

@yretenai
Contributor Author

> I get an out-of-memory error with this patch when extracting the Palworld umap file "Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5"

This PR only enables reading of large files; you would still end up using a large amount of memory if you serialize the result to a JSON file.
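One way to trim part of that memory use, sketched here as a suggestion rather than anything in this PR, is to stream the JSON straight to disk with Json.NET instead of building the whole document as one string first. `JsonExport.WriteJson` is a hypothetical helper name. The loaded objects themselves still have to fit in memory, so this only removes the intermediate string.

```csharp
using System.IO;
using Newtonsoft.Json;

// Hypothetical helper; not part of this PR or of the library.
public static class JsonExport
{
    public static void WriteJson(object value, string path)
    {
        // CreateDefault applies JsonConvert.DefaultSettings, if any are configured.
        var serializer = JsonSerializer.CreateDefault();
        serializer.Formatting = Formatting.Indented;

        using var stream = File.Create(path);
        using var textWriter = new StreamWriter(stream);
        using var jsonWriter = new JsonTextWriter(textWriter);

        // Tokens are written to the file as they are produced,
        // so the full JSON text is never held as a single string.
        serializer.Serialize(jsonWriter, value);
    }
}
```

In the snippet quoted earlier in the thread, this would replace the `JsonConvert.SerializeObject` and `File.WriteAllText` pair with a single `JsonExport.WriteJson(allExports, mapJsonPath)` call.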

> I also get an out-of-memory error when extracting "Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5/_Generated_/"; the original version is able to extract this directory.

The new code shouldn't run on any of the Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5/_Generated_/ umaps, as they're all under 2 GB. I'll check again later; it could be that my refactors to decompression broke something.

@Chuanhsing

Chuanhsing commented Feb 22, 2024

> The new code shouldn't run on any of the Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5/_Generated_/ umaps, as they're all under 2 GB. I'll check again later; it could be that my refactors to decompression broke something.

Some more information:
It stops at Pal/Content/Pal/Maps/MainWorld_5/PL_MainWorld5/_Generated_/MainGrid_L0_X-15_Y-3_DL0, which normally produces a 7.53 MB JSON file.
The computer had 60 GB of free memory before running the program.
In Globals.cs, both AlwaysUseChunkedReader and AllowLargeFiles are set to true.

4 participants