Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add csv-format ouput #44

Merged
merged 11 commits into from
May 22, 2018
Merged

[WIP] Add csv-format ouput #44

merged 11 commits into from
May 22, 2018

Conversation

csmoe
Copy link
Member

@csmoe csmoe commented Apr 20, 2018

@csmoe csmoe force-pushed the csv_format branch 2 times, most recently from 70bd90c to e425f40 Compare April 20, 2018 13:17
@fitzgen
Copy link
Member

fitzgen commented Apr 23, 2018

Hi @csmoe! Were you planning on adding csv support for all the other analyses in this PR as well, or were you planning on doing them as follow ups? Basically, I'm wondering if I should start reviewing now or later :)

@csmoe
Copy link
Member Author

csmoe commented Apr 23, 2018

@fitzgen Yes, csv for all. I’ll commits more tomorrow

@fitzgen
Copy link
Member

fitzgen commented Apr 23, 2018

Sounds good -- thanks!

@csmoe
Copy link
Member Author

csmoe commented Apr 24, 2018

@fitzgen is csv suitable to store nested data like(dominatortree/paths)? If so, could you give some tips?

@fitzgen
Copy link
Member

fitzgen commented Apr 24, 2018

I think a north star to guide the design would be something like "what is most useful for someone who wants to write a custom script to post-process twiggy analyses, and doesn't want to reimplement code that is already implemented in twiggy?"

For dominators, I think flattening the tree to a list and including each item's immediate dominator in the row would allow people to recover the tree without hassle. Something like:

"Id","Name","Retained Size","Retained Percent","Immediate Dominator"
98,"foo::bar::baz",1234,42.1337,33

For paths, I think returning a list of every path (rather than a list of each item we are finding paths for) would be best. Something like:

"Name", "Shallow Size","Shallow Percent","Path"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> fluffy"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> woof"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> bark"
"foo::bar::baz",1234,42.1337,"blah -> choo -> blech"

What do you think?

@csmoe
Copy link
Member Author

csmoe commented Apr 25, 2018

@fitzgen thank you, that's a good approach, I'll work on your advice.

@csmoe csmoe force-pushed the csv_format branch 2 times, most recently from cbf7474 to 54a89bc Compare April 26, 2018 03:54
@fitzgen
Copy link
Member

fitzgen commented May 2, 2018

@csmoe double checking: you are still working on this PR, and aren't waiting for my review, right?

@csmoe
Copy link
Member Author

csmoe commented May 2, 2018

@fitzgen Yes, if it’s done I’ll ping you for review.

@csmoe csmoe force-pushed the csv_format branch 3 times, most recently from b3d1ac5 to 5842370 Compare May 10, 2018 12:15
@@ -0,0 +1,11 @@
Generic,ApproximateMonomorphizationBloatBytes,ApproximateMonomorphizationBloatPercent,TotalSize,TotalSizePercent,Monomorphizations
alloc::slice::merge_sort,1977,3.396673768125902,3003,5.159439213799739,"alloc::slice::merge_sort::hb3d195f9800bdad6, alloc::slice::merge_sort::hfcf2318d7dc71d03, alloc::slice::merge_sort::hcfca67f5c75a52ef"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the monos into a string splited with comma.

@csmoe
Copy link
Member Author

csmoe commented May 10, 2018

@fitzgen almost done, waiting for your review.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this @csmoe!

For the future, it might be easier to do one csv implementation at a time, so that it isn't such a big PR with longer turn around on feedback. That way, we can also land incremental functionality as we go.

Overall, this looks great, but there are a few things that aren't quite answering the right question. See the inline comments below. It would be useful to compare the csv output and text output for what is otherwise the same command to verify that the paths/immediate dominators/etc are correct in the csv output.

Thanks again, @csmoe!

shallow_size_percent: size_percent,
retained_size: Some(retained_size),
retained_size_percent: Some(retained_size_percent),
// TODO CSMOE: find immediate dominator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fixed in later commits? Should get this fixed before merging.

analyze/csv.rs Outdated
#[derive(Debug, Default, Serialize)]
pub struct CsvRecord {
#[serde(skip_serializing_if = "Option::is_none")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is every command using the same csv records? I find this mildly surprising, and I assumed that they would each have their own specific csv record type, rather than adding options to all the fields and skipping Nones. That would be a bit less "dynamically typed" although it is more boilerplate.

Did you consider that approach? Why did you end up going down this route?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a "common" CsvRecord because most part of the records shared the fields from id to size_related_stuff, and the other commands like dominator might have one or two extra fields.
Declaring new Record inside every method seems a bit trivial, but duplicated skipping Nones too.
It seems you prefer the individual record, I'll fix that.

@@ -0,0 +1 @@
generic,approximate_monomorphization_bloat_bytes,approximate_monomorphization_bloat_percent,total_size,total_size_percent,monomorphizations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing the test!(...) for this expectation file? Also, it would be good to have a test that actually has some output.

@@ -411,7 +415,7 @@ impl traits::Emit for DominatorTree {
retained_size: Some(retained_size),
retained_size_percent: Some(retained_size_percent),
// TODO CSMOE: find immediate dominator
immediate_dominator: Some(id.0),
immediate_dominator: Some(idom),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks the TODO is fixed -- we should remove the comment then :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess it is done in this commit :-p

@@ -402,7 +402,11 @@ impl traits::Emit for DominatorTree {
items.retained_size(id),
(items.retained_size(id) as f64) / (items.size() as f64) * 100.0,
);

let idom = if let Some(idom) = items.predecessors(id).last() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right.

Predecessors are just the normal neighbors graph edges reversed. Instead of asking questions like "what other items does X refer to?" it allows us to ask "what items refer to X?"

The immediate dominator of X is the parent of item X in the dominators tree. Although we have it available, we don't actually save this information inside ir::Items::compute_dominators right now, so we will need to do that. We can add an immediate_dominators: Option<BTreeMap<Id, Id>> member to ir::Items and fill it in inside compute_dominators.

Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fix later.

@@ -601,8 +600,9 @@ impl traits::Emit for Paths {
let item = &items[id];
let size = item.size();
let size_percent = (size as f64) / (items.size() as f64) * 100.0;
let mut path = String::with_capacity(item.name().len());
path.push_str(item.name());
let mut callers = items.predecessors(id).into_iter().map(|i| items[i].name()).collect::<Vec<&str>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right either.

If we have

fn a() {}
fn b() { a() }
fn c() { a() }
fn d() { c() }

Then a's predecessors will be [b, c]. But the paths to a are [ "a <- b", "a <- c <- d" ]. The predecessors just give us one depth of incoming edges, not the paths, which we construct recursively by processing the edges.

As written, this will emit b -> c -> a for the csv paths for a, but that isn't an actual path to a. It should emit two paths:

b -> a
d -> c -> a

The most straightforward way to fix this is to do a similar thing as recursive_callers does in emit_text and emit_json to use the predecessors to walk the paths.

Long term, every single emit_blah shouldn't have to redo this walk, and I have been meaning to fix this by doing the walking in the paths function that creates the Paths object and just storing the results in some sort of tree structure.

ir/ir.rs Outdated
@@ -378,6 +378,11 @@ impl Id {
pub fn root() -> Id {
Id(u32::MAX, u32::MAX)
}

/// Get the real id of a item.
pub fn real_id(&self) -> u32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what this is for. Both components are real, and dropping one or the other loses the unique-ness property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I don't know why the Id is represented by a u32-tuple(my fault, I should ask for your advice before coding). So how can I write a Id? Just raw tuple as it's or?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a pair of (section index, item within that section index)

To get something that could be serialized as a single thing, you could combine them into a u64:

let top = self.0 as u64 << 32;
top | (self.1 as u64)

I would call this serializable or something rather than real_id.

@fitzgen
Copy link
Member

fitzgen commented May 21, 2018

@csmoe is this ready for me to look at again?

@csmoe
Copy link
Member Author

csmoe commented May 22, 2018

@fitzgen yes

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this @csmoe ! I know it was a lot of back and forth.

Next time, we can do it more incrementally by implementing CSV formatting for one analysis at a time, for example. That should make PRs less monolithic, and land faster.

Thanks again for sticking with this!

@fitzgen fitzgen merged commit 28df29b into rustwasm:master May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants