Use WASM function names in compiled objects #8627
Instead of generating symbol names in the format "wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible) something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME". This helps when debugging or profiling the generated code.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
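As a rough illustration of the two naming schemes in the description, here is a minimal sketch. The helper `symbol_name` and its parameters are hypothetical and are not Wasmtime's actual code; it only shows falling back to the index-based form when no name is available.

```rust
/// Hypothetical helper (not Wasmtime's API): builds the symbol name for a
/// compiled Wasm function, preferring the name from the name section.
fn symbol_name(module_id: u32, func_index: u32, func_name: Option<&str>) -> String {
    match func_name {
        // A usable name exists: "wasm[$MODULE_ID]::$FUNCTION_NAME".
        Some(name) if !name.is_empty() => format!("wasm[{module_id}]::{name}"),
        // No name: keep the original "wasm[$MODULE_ID]::function[$FUNCTION_INDEX]".
        _ => format!("wasm[{module_id}]::function[{func_index}]"),
    }
}
```

Calling `symbol_name(0, 7, Some("fib"))` would give a name a profiler can display directly, while `symbol_name(0, 7, None)` keeps the old index-based form.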
Is it guaranteed that the names section only contains unique names?
Thanks! This makes sense to me and I've often thought this would be nice as well. A hesitation I've had in the past though is that I'm a bit wary about exposing user-provided data to native tools like (a good example being @bjorn3's suggestion where the |
I wonder if we could define suitable "cleanup" for names: (i) rewrite duplicates with numeric suffixes, (ii) replace or remove any characters not in the usual set ( |
Could also add the function_index for the cases where uniqueness isn't guaranteed. Alright, I'll cook up a patch with the suggestions.
Yeah, I think that'd work well (albeit a bit more involved in the implementation). While I can't say with certainty that spaces are not supported, my guess is that tools would likely eventually choke on non-C-looking symbols, so I'd be tempted to leave them out.
How about
? This avoids needing to either add numbers into the sanitized name and then worrying about removing those numbers when capping the length, or needing to account for the length of the numbers that might be added after sanitizing and capping the name. |
Alright, pushed some changes that:
I think this should be sufficient. Can anyone please take a look again?
Force-pushed from dc98f8f to 6cbaade.
Force-pushed from 07aaafc to 19b22f1.
crates/wasmtime/src/compile.rs
Outdated
    .trim_matches('?')
    .trim_end_matches('?')
    .to_string();
name.truncate(Self::MAX_SYMBOL_LEN);
.take(N) in the iterator chain should allow the truncation to happen inline (and stop iteration early if so, since the whole chain is lazy)
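A small sketch of what truncating inline with `take` could look like. The function `capped`, the constant `MAX_LEN`, and the space-to-`?` mapping are made up for this example and are not taken from the PR. One difference worth keeping in mind: `take(n)` counts characters, whereas `String::truncate(n)` counts bytes and panics if `n` does not fall on a character boundary.

```rust
/// Hypothetical cap, chosen only to keep the example short.
const MAX_LEN: usize = 4;

/// Replaces spaces with '?' and keeps at most MAX_LEN characters.
/// Iterator adapters are lazy, so `map` is not run on characters
/// that come after the first MAX_LEN items have been taken.
fn capped(name: &str) -> String {
    name.chars()
        .map(|c| if c == ' ' { '?' } else { c })
        .take(MAX_LEN)
        .collect()
}
```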
Done.
Actually, no, I'm not sure how to do this; I want to truncate the string after trim_matches('?') removed repeated occurrences of '?' in the cleaned string, not before.
You mention that you intended to turn any run of multiple ? into a single ?. Note that trim_matches doesn't do that. Instead, it deletes all copies of the given pattern from both ends of the string. That's why it can return a borrowed &str instead of a heap-allocated String: it doesn't change the middle of the string, so it can just adjust the start pointer and the length.
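The behaviour of `trim_matches` described above can be checked with a short sketch. The wrapper name `trim_question_marks` is hypothetical; `str::trim_matches` itself is the standard library method.

```rust
/// Removes every leading and trailing '?' from `s`.
/// The result borrows from `s`: no allocation happens, and any '?'
/// in the middle of the string is left untouched.
fn trim_question_marks(s: &str) -> &str {
    s.trim_matches('?')
}
```

For an input such as `"??a??b??"` the inner `??` survives, which is why this call alone cannot collapse runs of `?`.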
I think this should do more or less what you want (though I'm typing it in GitHub's web editor without testing it so who knows). It deletes all ? at the beginning of the string, then compresses any run of multiple ? into a single ?. It can leave a single ? at the end of the string but I couldn't think of a simple way to fix that and I don't think it matters. Also it uses take like Chris suggested.
let mut last_char = '?';
let name = name
.chars()
.map(|c| if bad_char(c) { '?' } else { c })
.filter(|c| { let keep = last_char != c; last_char = c; keep })
.take(Self::MAX_SYMBOL_LEN)
.collect::<String>();
Force-pushed from 19b22f1 to f70b328.
Looks reasonable to me, thanks for the iterations!
Filter symbol names to include only characters that are usually used for function names, and that might be produced by name mangling. Replace everything else with a question mark (and all repeated question marks by a single one), and then truncate to a length of 96 characters. This should be enough not only to avoid passing user-controlled strings to tools such as "perf" and "objdump", but also to make it easier to disambiguate symbols that might have the same name but different indices.
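The steps in this commit message can be sketched as follows. This is not the code that landed in Wasmtime: the function names and, in particular, the allowed character set in `is_allowed` are assumptions made for illustration, since the thread does not spell out the exact set. Only the 96-character cap comes from the commit message.

```rust
/// Cap taken from the commit message above.
const MAX_SYMBOL_LEN: usize = 96;

/// ASSUMPTION: an illustrative set of "function-name-like" characters.
/// The set actually used by the PR is not shown in this thread.
fn is_allowed(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '_' | '$' | '.' | ':' | '<' | '>' | '@')
}

/// Replaces disallowed characters with '?', collapses runs of '?' into
/// one, and stops once MAX_SYMBOL_LEN characters have been emitted.
fn sanitize(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        let c = if is_allowed(c) { c } else { '?' };
        // Skip a '?' that directly follows another '?'.
        if c == '?' && out.ends_with('?') {
            continue;
        }
        // Every emitted character is ASCII, so byte length equals
        // character count here.
        if out.len() == MAX_SYMBOL_LEN {
            break;
        }
        out.push(c);
    }
    out
}
```

Collapsing before checking the cap gives the same result as collapsing the whole string first and truncating afterwards, while still stopping early on long inputs.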
Force-pushed from f70b328 to ba7737b.
Ah, force-pushed after my review; could you do subsequent updates as additional commits? Let me know when stable and I can re-review.
It's stable now. (I'm not using the |
This is going to be such a big usability improvement for people using tools like perf! I'm excited.
Ah, the CI test failures are because I don't like that this test is relying on this implementation detail of Wasmtime, but I suppose it's fine. So you can either change this PR to continue to use Either way, you'll also need to run |
Alright, thanks for the review! This should be the last round. (@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)
Oh, you're absolutely right. 😅 Looks good to me, let's ship it!
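To make the "keep" versus "kep" point concrete, here is a hedged comparison of the two filtering strategies. Both function names are hypothetical, and the tweak that actually landed is not shown in this thread; the second function is only one possible way to handle that case. Unlike the suggestion in the review, neither version strips a leading `?`.

```rust
/// Drops any character equal to the one before it. This mirrors the
/// behaviour objected to above: doubled letters are collapsed too.
fn collapse_all_repeats(s: &str) -> String {
    let mut last = None;
    s.chars()
        .filter(|&c| {
            let keep = last != Some(c);
            last = Some(c);
            keep
        })
        .collect()
}

/// Drops a '?' only when the previous character was also '?',
/// leaving every other repeated character alone.
fn collapse_question_marks(s: &str) -> String {
    let mut last = None;
    s.chars()
        .filter(|&c| {
            let keep = !(c == '?' && last == Some('?'));
            last = Some(c);
            keep
        })
        .collect()
}
```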