ci_find_file/unix: add fast-path for file lookup #1350
This achieves 2 major improvements:

- If the directory and filename are correct, only a single stat syscall is needed to check whether the file exists. There's no need to open the directory (1 syscall) and check whether each of the entries matches the filename (1 syscall each). For perspective, each syscall requires entering the kernel, a switch from user mode into kernel mode, and therefore on the order of 1000 CPU cycles or more.
- If the filename contains a directory separator to dive into a subdir, the existing method fails. This change allows using e.g. PlayMP3File("sounds/sound25.ogg") successfully.
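To make the fast path concrete, here is a minimal sketch of the idea, not the PR's actual code. `fast_path_lookup` and `kFastPathBufSize` are hypothetical names. The function joins `dir_name` and `file_name` into a fixed-size stack buffer, rejects names containing `..` as the reviewed diff does, and confirms the exact-case path with one `stat()` call. Because `stat()` resolves the whole path, a `file_name` such as `sounds/sound25.ogg` works too. A `nullptr` result means the caller should fall back to the existing directory scan.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>

// Hypothetical size: large enough for typical asset paths, small enough to
// keep the stack frame modest.
static const std::size_t kFastPathBufSize = 512;

// Hypothetical fast path: if dir_name/file_name already names an existing
// regular file with exactly this spelling, one stat() call is enough.
// Returns a malloc'd copy of the joined path (caller frees), or nullptr if the
// caller should fall back to the case-insensitive directory scan.
char *fast_path_lookup(const char *dir_name, const char *file_name)
{
    if (!dir_name || !file_name || std::strstr(file_name, ".."))
        return nullptr;  // mirrors the ".." guard in the reviewed diff

    char buf[kFastPathBufSize];
    int n = std::snprintf(buf, sizeof buf, "%s/%s", dir_name, file_name);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf)
        return nullptr;  // path did not fit: fail gracefully, use the slow path

    struct stat st;
    // stat() follows symlinks and walks subdirectories such as "sounds/".
    if (stat(buf, &st) != 0 || !S_ISREG(st.st_mode))
        return nullptr;  // missing (e.g. wrong case on a case-sensitive FS) or not a file

    return strdup(buf);  // caller owns the result
}
```

If `snprintf` reports that the joined path did not fit, the sketch gives up on the fast path instead of truncating, so an oversized path simply costs the old slow lookup.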
Why use goto in a C++ codebase? If you can't refactor your code, use the do { ... } while(false) pattern with break, so the stack keeps making sense.
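For comparison, a minimal sketch of the `do { ... } while (false)` pattern suggested here, assuming a hypothetical helper `join_normalized_dowhile` that copies the file name, converts backslashes to slashes (standing in for `fix_filename_slashes`) and returns a malloc'd `dir/file` string. Each failure path uses `break` to reach one cleanup section instead of `goto`.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: every failure path breaks out of the do/while(false)
// block and lands on the single cleanup section below it.
char *join_normalized_dowhile(const char *dir_name, const char *file_name)
{
    char *filename = nullptr;
    char *result = nullptr;

    do {
        if (!dir_name || !file_name)
            break;
        filename = strdup(file_name);
        if (!filename)
            break;
        for (char *p = filename; *p; ++p)  // stand-in for fix_filename_slashes()
            if (*p == '\\')
                *p = '/';

        std::size_t len = std::strlen(dir_name) + 1 + std::strlen(filename) + 1;
        result = static_cast<char *>(std::malloc(len));
        if (!result)
            break;
        std::strcpy(result, dir_name);
        std::strcat(result, "/");
        std::strcat(result, filename);
    } while (false);

    // Single exit: free(nullptr) is a no-op, so this is safe on every path.
    std::free(filename);
    return result;
}
```

The trade-off raised in the thread is visible here: all the logic sits one indentation level deeper than it would with a `goto out;` label.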
Because it's valid C++?
The change you propose is much more intrusive, as it adds another indentation level to all the code that follows and makes the diff 50+ lines rather than 10.
You can modify more lines; it will still be code that builds. It will do the same on your architecture and not blow up my wasm port. Now I need to PR a new version of this for wasm, because there's no goto there.
Yeah, but it's not only intrusive, it also gets in the way of git features like git blame, since every reindented line would be attributed to this change.
Emscripten/wasm doesn't support goto? This is the first I've heard of it. Do you have a link with more info about that?
Everything that uses goto has to be refactored. This is one of the reasons I can't build everything from source, or build older code that uses goto from source. It's also why I bring in some things from Emscripten ports. Emscripten has a mechanism, using Binaryen, that can kind of guess the gotos and refactor them into this do-while-false pattern, and I use it. It doesn't always work, but sometimes it does.
This seems to be a restriction of the wasm bytecode format itself. Emscripten certainly can deal with goto, as it is used in a lot of codebases; for example, it is used extensively even in musl libc, which Emscripten uses as its libc implementation.
The current master branch has 1278 uses of goto across various source files, so if Emscripten can't deal with it, your effort is bound to fail anyway; this one goto doesn't change the overall situation.
Just found this comment by kripken (Emscripten lead dev) confirming that Emscripten can deal with goto just fine: WebAssembly/design#796 (comment)
I strongly discourage using goto. There's no need for it at all; it's a simple refactor to avoid it. If you can't refactor it into an inline function, use the do-while-false break pattern. There are additional problems:
That said, this entire file will probably get axed soon, since it already has other problems, like the compile-time case-sensitivity assumption, which doesn't hold at all on MacOS.
I don't have much of a problem with this change as long as it keeps working everywhere for now. This function is very old code that was probably going to be refactored later anyway.
NOTE: the only other
It may be okay to put one fix even if it's temporary.
The rest of this particular file uses
To be fair, we do not have this requirement in the code style in general. Personally, I don't find this condition cryptic at all, and we use this pointer check everywhere in the code, including the newest code. I usually write it the same way, for instance, and nobody has ever mentioned that it's difficult to understand.
Why is that a mystery? It's just a temp buffer, used in a sprintf one line below, and that's it.
By the way, since we started talking about
The return statements on lines 155, 160, 165 may leave local buffers unfreed.
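A minimal sketch of goto-based single-exit cleanup, the style used for the leak fix discussed here. `existing_path_goto` is a hypothetical helper, not the actual `ci_find_file` code. Every pointer the `out` label frees is initialized before the first `goto`, which matters both for correctness and because C++ rejects a `goto` that jumps past an initialized declaration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>

// Hypothetical helper: build "dir/file" and return it (malloc'd) only if that
// path exists. Every early exit jumps to `out`, so the temporary copies are
// freed exactly once on every path.
char *existing_path_goto(const char *dir_name, const char *file_name)
{
    // All state the cleanup touches is set up before the first goto.
    char *directory = nullptr;
    char *filename = nullptr;
    char *result = nullptr;
    std::size_t len = 0;
    struct stat st;

    if (!dir_name || !file_name)
        goto out;
    directory = strdup(dir_name);
    filename = strdup(file_name);
    if (!directory || !filename)
        goto out;  // an early `return` here would leak whichever copy succeeded

    len = std::strlen(directory) + 1 + std::strlen(filename) + 1;
    result = static_cast<char *>(std::malloc(len));
    if (!result)
        goto out;
    std::snprintf(result, len, "%s/%s", directory, filename);
    if (stat(result, &st) != 0) {
        std::free(result);  // path does not exist: report failure
        result = nullptr;
    }

out:
    std::free(directory);
    std::free(filename);
    return result;
}
```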
I found goto being used in:
It's big enough to hold a file path under most circumstances, but not so big that it causes the problems of huge stack buffers (potentially overflowing thread stacks). And if it isn't big enough, the code fails gracefully and uses the non-fast-path implementation below.
It's not really dangerous, but I already checked the code for these issues.
The code is only used on case-sensitive filesystems, so only on platforms that have '/' as the directory separator.
What?
Once out of rational arguments, you resort to "discouraging" its use without arguments... alright.
It's not cryptic; it's actually clearer than the gratuitous comparisons, which make the code longer for no reason while doing the same thing, and cause more strain and effort for the reader.
I'll take a look and fix those in a follow-up commit, if they are indeed an issue.
I think @ericoporto may be referring to the fprintf calls. Apparently only one of them was wrapped in AGS_PLATFORM_DEBUG.
They do not have
@@ -115,7 +115,7 @@ char *ci_find_file(const char *dir_name, const char *file_name)
   fix_filename_slashes(filename);
 }

-if(directory && filename) {
+if(directory && filename && !strstr(filename, "..")) {
I don't think this is a correct check. It's possible for a file name to contain two consecutive dots.
Yeah, but in that case we fall back to the chdir/opendir approach below.
sure!
Thank you :) (that's even more elaborate than I had in mind)
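The thread doesn't show the final version of this check, so the following is only one hypothetical way to narrow it: `has_dotdot_component` rejects `..` only when it is a whole path component. Names such as `foo..bar.ogg` then stay on the fast path, while `../x` and `sounds/../x` still fall back to the directory scan.

```cpp
#include <cstring>

// Hypothetical check: true if ".." appears as a complete component, i.e.
// delimited by '/' or by the start/end of the string.
bool has_dotdot_component(const char *path)
{
    const char *p = path;
    while (*p) {
        const char *end = std::strchr(p, '/');
        std::size_t len = end ? static_cast<std::size_t>(end - p) : std::strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;
        if (!end)
            break;
        p = end + 1;  // continue with the next component
    }
    return false;
}
```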
Case sensitivity: there's no need for the additional comment clause on the right; it's false. MacOS is Unix and case-insensitive by default: https://support.apple.com/guide/disk-utility/file-system-formats-available-in-disk-utility-dsku19ed921c/mac
Unfortunately this can't be fixed in this PR; AGS should be able to work out case sensitivity at runtime. Also, this function used to print a debug message when it found a diamond, and now it doesn't. I still strongly object to using goto in the AGS codebase: using AGS's own string type would make this code much cleaner and remove any need for goto, at no performance cost at all.
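As a rough sketch of what working out case sensitivity at runtime could look like, the hypothetical `probe_case_sensitive` below creates a file with a lowercase name and checks whether the uppercase spelling resolves to it. This is an illustration under assumptions, not an AGS API: it needs a writable directory, the answer only applies to the filesystem holding that directory, and a real implementation would probably cache the result per mount.

```cpp
#include <cstdio>
#include <string>
#include <sys/stat.h>

// Hypothetical probe. Returns 1 if the filesystem under `dir` looks
// case-sensitive, 0 if it looks case-insensitive, and -1 if the probe
// could not run (e.g. `dir` missing or not writable).
int probe_case_sensitive(const char *dir)
{
    const std::string lower = std::string(dir) + "/.ags_case_probe";
    const std::string upper = std::string(dir) + "/.AGS_CASE_PROBE";

    struct stat st;
    // Refuse to run if either probe name already exists, so no user file is touched.
    if (stat(lower.c_str(), &st) == 0 || stat(upper.c_str(), &st) == 0)
        return -1;

    std::FILE *f = std::fopen(lower.c_str(), "w");
    if (!f)
        return -1;
    std::fclose(f);

    // On a case-insensitive filesystem the uppercase name finds the same file.
    const int result = (stat(upper.c_str(), &st) == 0) ? 0 : 1;
    std::remove(lower.c_str());
    return result;
}
```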
The comment says "only used on UNIX platforms", not "on ALL UNIX platforms".
This would require rewriting the whole function and adjusting its callers, since the code that calls it expects a buffer that it has to free.
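A minimal sketch of a middle ground, assuming `std::string` as a stand-in for AGS's own string class (whose API isn't shown in this thread): temporaries live in RAII strings, so early returns cannot leak, and only the final path is copied into a malloc'd buffer, so existing callers can keep calling `free()`. `find_exact_for_caller` is a hypothetical name, and it covers only the exact-case lookup, not the case-insensitive scan.

```cpp
#include <cstdlib>   // callers release the result with std::free()
#include <cstring>   // strdup
#include <string>
#include <sys/stat.h>

// Hypothetical sketch: no goto or do/while(false) is needed because every
// temporary owns its memory.
char *find_exact_for_caller(const char *dir_name, const char *file_name)
{
    if (!dir_name || !file_name)
        return nullptr;

    std::string filename(file_name);
    for (char &c : filename)  // stand-in for fix_filename_slashes()
        if (c == '\\')
            c = '/';

    const std::string path = std::string(dir_name) + "/" + filename;
    struct stat st;
    if (stat(path.c_str(), &st) != 0 || !S_ISREG(st.st_mode))
        return nullptr;  // both strings are released automatically here

    return strdup(path.c_str());  // caller must free(), as before
}
```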
Does it have to, if there was no search and the file was found by a matching name as-is? The Windows variant also does not print anything.
@rofl0r I think your memory-leak fix with gotos may leave some things uninitialized. There is a case where
EDIT: the fact that it uses chdir is annoying; it makes this function harder to use in a multithreaded environment... If we ever rewrite it, chdir should not be used, IMHO.
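One way to drop `chdir` is to scan through a directory file descriptor with the POSIX.1-2008 calls `open(..., O_DIRECTORY)`, `fdopendir()` and `fstatat()`. The sketch below (hypothetical name `ci_scan_dir_nochdir`) applies an `lstat`-style check (regular file or symlink) but leaves the process-wide working directory untouched, which is what makes `chdir` a problem for multithreaded use. It also compares names before stat'ing, so only a matching entry costs an extra syscall.

```cpp
#include <cstdlib>   // callers release the returned name with std::free()
#include <cstring>
#include <dirent.h>
#include <fcntl.h>
#include <strings.h> // strcasecmp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical chdir-free slow path: returns a malloc'd copy of the on-disk
// entry name that matches file_name case-insensitively, or nullptr.
char *ci_scan_dir_nochdir(const char *dir_name, const char *file_name)
{
    int dfd = open(dir_name, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return nullptr;
    DIR *dir = fdopendir(dfd);  // on success, `dir` takes ownership of dfd
    if (!dir) {
        close(dfd);
        return nullptr;
    }

    char *match = nullptr;
    while (dirent *entry = readdir(dir)) {
        if (strcasecmp(entry->d_name, file_name) != 0)
            continue;  // name comparison needs no syscall
        struct stat st;
        // Resolve the name relative to the directory fd, like lstat() after
        // chdir(), without changing the current working directory.
        if (fstatat(dirfd(dir), entry->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode))) {
            match = strdup(entry->d_name);  // copy before closedir() frees the entry
            break;
        }
    }
    closedir(dir);  // also closes dfd
    return match;
}
```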
About what you wrote in the PR description:
From what I understand, your added code only deals with the trivial case-matching situation. Does this mean that if there's
Good catch, it's fixed now.
Absolutely correct. Using chdir is a real hack and a code smell.
Yes, the fast-path code (and that use case) only works when the case is 100% correct.
@ericoporto I am okay with
But it had definitely better be rewritten at some point in the future; it's not good for a number of reasons. Some cases won't work with it at all, not even after this fix (I mentioned one example above). You mentioned that your emscripten port does not work with it? Could you explain how you deal with the legacy route finder code: do you not include it, or did you have to refactor the whole thing? There are also multiple
Are there other things that do not work in your port? Could I see the changes that were necessary for it to work?