-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add csv-format ouput #44
Conversation
70bd90c
to
e425f40
Compare
Hi @csmoe! Were you planning on adding csv support for all the other analyses in this PR as well, or were you planning on doing them as follow ups? Basically, I'm wondering if I should start reviewing now or later :) |
@fitzgen Yes, csv for all. I’ll commits more tomorrow |
Sounds good -- thanks! |
@fitzgen is csv suitable to store nested data like( |
I think a north star to guide the design would be something like "what is most useful for someone who wants to write a custom script to post-process twiggy analyses, and doesn't want to reimplement code that is already implemented in twiggy?" For dominators, I think flattening the tree to a list and including each item's immediate dominator in the row would allow people to recover the tree without hassle. Something like:
For paths, I think returning a list of every path (rather than a list of each item we are finding paths for) would be best. Something like:
What do you think? |
@fitzgen thank you, that's a good approach, I'll work on your advice. |
cbf7474
to
54a89bc
Compare
@csmoe double checking: you are still working on this PR, and aren't waiting for my review, right? |
@fitzgen Yes, if it’s done I’ll ping you for review. |
b3d1ac5
to
5842370
Compare
@@ -0,0 +1,11 @@ | |||
Generic,ApproximateMonomorphizationBloatBytes,ApproximateMonomorphizationBloatPercent,TotalSize,TotalSizePercent,Monomorphizations | |||
alloc::slice::merge_sort,1977,3.396673768125902,3003,5.159439213799739,"alloc::slice::merge_sort::hb3d195f9800bdad6, alloc::slice::merge_sort::hfcf2318d7dc71d03, alloc::slice::merge_sort::hcfca67f5c75a52ef" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store the monos into a string splited with comma.
@fitzgen almost done, waiting for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this @csmoe!
For the future, it might be easier to do one csv implementation at a time, so that it isn't such a big PR with longer turn around on feedback. That way, we can also land incremental functionality as we go.
Overall, this looks great, but there are a few things that aren't quite answering the right question. See the inline comments below. It would be useful to compare the csv output and text output for what is otherwise the same command to verify that the paths/immediate dominators/etc are correct in the csv output.
Thanks again, @csmoe!
analyze/analyze.rs
Outdated
shallow_size_percent: size_percent, | ||
retained_size: Some(retained_size), | ||
retained_size_percent: Some(retained_size_percent), | ||
// TODO CSMOE: find immediate dominator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this fixed in later commits? Should get this fixed before merging.
analyze/csv.rs
Outdated
#[derive(Debug, Default, Serialize)] | ||
pub struct CsvRecord { | ||
#[serde(skip_serializing_if = "Option::is_none")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is every command using the same csv records? I find this mildly surprising, and I assumed that they would each have their own specific csv record type, rather than adding options to all the fields and skipping None
s. That would be a bit less "dynamically typed" although it is more boilerplate.
Did you consider that approach? Why did you end up going down this route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a "common" CsvRecord
because most part of the records shared the fields from id
to size_related_stuff
, and the other commands like dominator
might have one or two extra fields.
Declaring new Record
inside every method seems a bit trivial, but duplicated skipping None
s too.
It seems you prefer the individual record, I'll fix that.
@@ -0,0 +1 @@ | |||
generic,approximate_monomorphization_bloat_bytes,approximate_monomorphization_bloat_percent,total_size,total_size_percent,monomorphizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing the test!(...)
for this expectation file? Also, it would be good to have a test that actually has some output.
analyze/analyze.rs
Outdated
@@ -411,7 +415,7 @@ impl traits::Emit for DominatorTree { | |||
retained_size: Some(retained_size), | |||
retained_size_percent: Some(retained_size_percent), | |||
// TODO CSMOE: find immediate dominator | |||
immediate_dominator: Some(id.0), | |||
immediate_dominator: Some(idom), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looks the TODO is fixed -- we should remove the comment then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess it is done in this commit :-p
analyze/analyze.rs
Outdated
@@ -402,7 +402,11 @@ impl traits::Emit for DominatorTree { | |||
items.retained_size(id), | |||
(items.retained_size(id) as f64) / (items.size() as f64) * 100.0, | |||
); | |||
|
|||
let idom = if let Some(idom) = items.predecessors(id).last() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right.
Predecessors are just the normal neighbors graph edges reversed. Instead of asking questions like "what other items does X refer to?" it allows us to ask "what items refer to X?"
The immediate dominator of X is the parent of item X in the dominators tree. Although we have it available, we don't actually save this information inside ir::Items::compute_dominators
right now, so we will need to do that. We can add an immediate_dominators: Option<BTreeMap<Id, Id>>
member to ir::Items
and fill it in inside compute_dominators
.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fix later.
analyze/analyze.rs
Outdated
@@ -601,8 +600,9 @@ impl traits::Emit for Paths { | |||
let item = &items[id]; | |||
let size = item.size(); | |||
let size_percent = (size as f64) / (items.size() as f64) * 100.0; | |||
let mut path = String::with_capacity(item.name().len()); | |||
path.push_str(item.name()); | |||
let mut callers = items.predecessors(id).into_iter().map(|i| items[i].name()).collect::<Vec<&str>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right either.
If we have
fn a() {}
fn b() { a() }
fn c() { a() }
fn d() { c() }
Then a
's predecessors will be [b, c]
. But the paths to a
are [ "a <- b", "a <- c <- d" ]
. The predecessors just give us one depth of incoming edges, not the paths, which we construct recursively by processing the edges.
As written, this will emit b -> c -> a
for the csv paths for a
, but that isn't an actual path to a
. It should emit two paths:
b -> a
d -> c -> a
The most straightforward way to fix this is to do a similar thing as recursive_callers
does in emit_text
and emit_json
to use the predecessors to walk the paths.
Long term, every single emit_blah
shouldn't have to redo this walk, and I have been meaning to fix this by doing the walking in the paths
function that creates the Paths
object and just storing the results in some sort of tree structure.
ir/ir.rs
Outdated
@@ -378,6 +378,11 @@ impl Id { | |||
pub fn root() -> Id { | |||
Id(u32::MAX, u32::MAX) | |||
} | |||
|
|||
/// Get the real id of a item. | |||
pub fn real_id(&self) -> u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me what this is for. Both components are real, and dropping one or the other loses the unique-ness property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I don't know why the Id
is represented by a u32-tuple(my fault, I should ask for your advice before coding). So how can I write a Id
? Just raw tuple as it's or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a pair of (section index, item within that section index)
To get something that could be serialized as a single thing, you could combine them into a u64
:
let top = self.0 as u64 << 32;
top | (self.1 as u64)
I would call this serializable
or something rather than real_id
.
@csmoe is this ready for me to look at again? |
@fitzgen yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this @csmoe ! I know it was a lot of back and forth.
Next time, we can do it more incrementally by implementing CSV formatting for one analysis at a time, for example. That should make PRs less monolithic, and land faster.
Thanks again for sticking with this!
@address #8