Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored AssetDB config file search to allow more wildcards #148

Merged
merged 1 commit into from
Dec 1, 2022

Conversation

elmodor
Copy link
Contributor

@elmodor elmodor commented Nov 30, 2022

Fixes/Solves
The AssetDB config search was done via an recursive call. This is usually slow.

I tried to refactor the _get_file_config_value function. I tried two methods:

  • Using a regex
  • Using a normal string find

Both methods would allow different wildcards for the config file:

  • Regex would allow any amount of wildcards anywhere
  • String find only on the right or left side

I compared the loading times:

  • Default: ~ 305ms
  • Regex: ~ 405ms
  • Find: ~ 260ms

Assets used:

  • Default pack
  • additionally 80 custom cards

While I would like a regex approach I don't think it is worth sacrificing loading times. Especially with loarget packs and more assets, it will take even longer.

Using the find method we are: faster, don't have a recursive call, and would be able to use wildcards at the beginning:

  • "My_Card_b.png"
  • "*_b.png"
  • "My_Card*"
    All 3 combinations would work.

The method I used is:

  • Go over all sections and see if the sections name is inside the file name
  • Sort the results (shortest to longest) - longest should obviously be the exact name match
  • call config.get_value with all possibilities from generic (*) to specific (exact match)

What are your thoughts about this?

Would solve #146

@elmodor elmodor force-pushed the refactor_assetdb_loading branch from 716251c to 5258e46 Compare November 30, 2022 17:12
@drwhut
Copy link
Owner

drwhut commented Nov 30, 2022

This is a great optimisation, thank you! I agree the find method seems to be best here, but I think I've found a couple of holes:

  • I'm not sure if the in operator is enough here: e.g. if a file is called Item.test.png, and the config file has a section titled [*.test], then this would pick up that file, but should it? Should we only match the beginning and the end of the string?
  • With setting the config value at the end, would it not be better to go in reverse order, and break out the loop once a value is found (i.e. not null)?

@elmodor
Copy link
Contributor Author

elmodor commented Nov 30, 2022

I'm not sure if the in operator is enough here: e.g. if a file is called Item.test.png, and the config file has a section titled [*.test], then this would pick up that file, but should it? Should we only match the beginning and the end of the string?

So basically we would check:

var search_term = section.replace("*","")
if search.begins_with(search_term) or search.ends_with(search_term):

which is around 10ms slower than before.
I think if there are ambiguous wildcards there's no way to know which one is the right one.
What should [*.test] match to? It makes no sense without a file ending. It wouldn't match anything if it doesn't match Item.test.png.
So not sure what the right move is here. Either one works and is faster than before.

With setting the config value at the end, would it not be better to go in reverse order, and break out the loop once a value is found (i.e. not null)?

I was thinking about it but decided to go for the easier and route.
I tried it and I didn't noticed any speedup. Normally we only have like 1-2 found values anyway.
To check if I need to break the for loop I need to do:

if value != default

Which leads to the issue that default and value are of different types (int, object, string) which threw an error. If we do some casting or more logic there I don't think we will gain any speed up at all.

@drwhut
Copy link
Owner

drwhut commented Nov 30, 2022

So not sure what the right move is here. Either one works and is faster than before.

Personally, when I see [*.test], that means to me any file ending with .test, rather than a file that has .test in the middle, which for me would make more sense to be caught by [*.test*], which... wait.

Could we check if it matches the beginning and/or the end depending on where the wildcards are? (in the example I gave, [*.test] would turn into .test, but only check the end of the string) But then do we want to include a sanity check to see if there is a wildcard in the middle?

Also, IIRC ConfigFile.get_value returns null if a section does not contain the given key, so I think you may need to check for this in the last for loop anyways.

EDIT: There's a method in ConfigFile for checking if a section has a given key!

@elmodor
Copy link
Contributor Author

elmodor commented Nov 30, 2022

No the issue was that sometimes default value is a reference:

			var value = _get_file_config_value(config, from.get_file(), "value", Reference.new())
			var suit  = _get_file_config_value(config, from.get_file(), "suit", Reference.new())

this can not be compared to with an int (which is the correct return value).

Anyhow that's not needed anymore thanks to the section check.

So I updated both versions (is pushed). Regex has a global regex cache to avoid compiling the same regex all over again. This is quite the speedup.
Regex is now ~ 330ms
The other find (with checking whether the wildcard is at the beginning or end) is ~ 280ms

The regex might be a valid option after all? It would catch all possible wildcards.

EDIT: However, regex is also more dangerous. What if the filename is something [card?($%.png]. Works to load, kills the regex probably. And adding more security checks makes it just slower. So, maybe not

@elmodor elmodor force-pushed the refactor_assetdb_loading branch from 60f082a to 6a497a5 Compare November 30, 2022 21:43
@drwhut
Copy link
Owner

drwhut commented Dec 1, 2022

Yeah, I think with regex there's going to be so many edge cases we need to worry about that I don't think it's worth implementing (plus, I think having the wildcards anywhere might be a bit overkill, the wildcards at the beginning and end work fine for 99% of use cases).

With the find method, we just need to test wildcards at the beginning, at the end, and at both ends.

@elmodor elmodor force-pushed the refactor_assetdb_loading branch from 6a497a5 to 9a83821 Compare December 1, 2022 14:38
@elmodor elmodor force-pushed the refactor_assetdb_loading branch from 9a83821 to d0b087c Compare December 1, 2022 14:40
@elmodor elmodor changed the title WIP: Refactored AssetDB config file search to allow more wildcards Refactored AssetDB config file search to allow more wildcards Dec 1, 2022
@elmodor
Copy link
Contributor Author

elmodor commented Dec 1, 2022

Cleaned up, removed regex and added documentation.

Copy link
Owner

@drwhut drwhut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@drwhut drwhut merged commit 3cc944a into drwhut:master Dec 1, 2022
@elmodor elmodor deleted the refactor_assetdb_loading branch December 1, 2022 21:00
@elmodor elmodor mentioned this pull request Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants