[WIP] Add csv-format ouput #44

csmoe · 2018-04-20T06:09:52Z

fitzgen · 2018-04-23T15:57:10Z

Hi @csmoe! Were you planning on adding csv support for all the other analyses in this PR as well, or were you planning on doing them as follow ups? Basically, I'm wondering if I should start reviewing now or later :)

csmoe · 2018-04-23T16:00:40Z

@fitzgen Yes, csv for all. I'll commit more tomorrow.

fitzgen · 2018-04-23T16:01:29Z

Sounds good -- thanks!

csmoe · 2018-04-24T08:46:23Z

@fitzgen Is csv suitable for storing nested data like the dominator tree or paths? If so, could you give me some tips?

fitzgen · 2018-04-24T17:14:57Z

I think a north star to guide the design would be something like "what is most useful for someone who wants to write a custom script to post-process twiggy analyses, and doesn't want to reimplement code that is already implemented in twiggy?"

For dominators, I think flattening the tree to a list and including each item's immediate dominator in the row would allow people to recover the tree without hassle. Something like:

"Id","Name","Retained Size","Retained Percent","Immediate Dominator"
98,"foo::bar::baz",1234,42.1337,33
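To illustrate why including the immediate dominator in each row is enough to recover the tree, here is a minimal sketch of what a post-processing script might do with the flattened rows. The `DominatorRow` type and `children_by_dominator` function are hypothetical names made up for this example, not part of twiggy, and treating a self-dominating row as a root is an assumption of the sketch.

```rust
use std::collections::BTreeMap;

/// One flattened dominator-tree row, mirroring the CSV columns sketched above.
/// (Hypothetical type for illustration; not twiggy's actual record.)
#[derive(Debug, Clone, PartialEq)]
pub struct DominatorRow {
    pub id: u64,
    pub name: String,
    pub retained_size: u32,
    pub immediate_dominator: u64,
}

/// Group row ids by their immediate dominator, so that `children[&x]`
/// lists the items that `x` immediately dominates.
pub fn children_by_dominator(rows: &[DominatorRow]) -> BTreeMap<u64, Vec<u64>> {
    let mut children: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    for row in rows {
        // Assumption of this sketch: a row that names itself as its
        // immediate dominator is a root and has no parent edge.
        if row.immediate_dominator != row.id {
            children
                .entry(row.immediate_dominator)
                .or_default()
                .push(row.id);
        }
    }
    children
}
```

Walking the resulting map from a root id gives back the nested tree, so the CSV consumer does not need to reimplement the dominator analysis itself.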

For paths, I think returning a list of every path (rather than a list of each item we are finding paths for) would be best. Something like:

"Name","Shallow Size","Shallow Percent","Path"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> fluffy"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> woof"
"foo::bar::baz",1234,42.1337,"blah -> dodo -> bark"
"foo::bar::baz",1234,42.1337,"blah -> choo -> blech"

What do you think?

csmoe · 2018-04-25T12:53:25Z

@fitzgen Thank you, that's a good approach. I'll follow your advice.

fitzgen · 2018-05-02T21:48:46Z

@csmoe double checking: you are still working on this PR, and aren't waiting for my review, right?

csmoe · 2018-05-02T22:08:00Z

@fitzgen Yes, if it’s done I’ll ping you for review.

csmoe · 2018-05-10T12:17:33Z

twiggy/tests/expectations/monos_wasm_csv

@@ -0,0 +1,11 @@
+Generic,ApproximateMonomorphizationBloatBytes,ApproximateMonomorphizationBloatPercent,TotalSize,TotalSizePercent,Monomorphizations
+alloc::slice::merge_sort,1977,3.396673768125902,3003,5.159439213799739,"alloc::slice::merge_sort::hb3d195f9800bdad6, alloc::slice::merge_sort::hfcf2318d7dc71d03, alloc::slice::merge_sort::hcfca67f5c75a52ef"


Store the monos in a single string, separated by commas.

csmoe · 2018-05-10T12:18:38Z

@fitzgen almost done, waiting for your review.

fitzgen

Thanks for sticking with this @csmoe!

For the future, it might be easier to do one csv implementation at a time, so that it isn't such a big PR with longer turn around on feedback. That way, we can also land incremental functionality as we go.

Overall, this looks great, but there are a few things that aren't quite answering the right question. See the inline comments below. It would be useful to compare the csv output and text output for what is otherwise the same command to verify that the paths/immediate dominators/etc are correct in the csv output.

Thanks again, @csmoe!

fitzgen · 2018-05-11T17:09:28Z

analyze/analyze.rs

+                shallow_size_percent: size_percent,
+                retained_size: Some(retained_size),
+                retained_size_percent: Some(retained_size_percent),
+                // TODO CSMOE: find immediate dominator


Is this fixed in later commits? Should get this fixed before merging.

fitzgen · 2018-05-11T17:12:17Z

analyze/csv.rs

 #[derive(Debug, Default, Serialize)]
 pub struct CsvRecord {
+    #[serde(skip_serializing_if = "Option::is_none")]


Is every command using the same csv records? I find this mildly surprising, and I assumed that they would each have their own specific csv record type, rather than adding options to all the fields and skipping Nones. That would be a bit less "dynamically typed" although it is more boilerplate.

Did you consider that approach? Why did you end up going down this route?

I created a "common" CsvRecord because most of the records share the fields from id through the size-related ones, and the other commands, like dominators, only need one or two extra fields.
Declaring a new record inside every method seems a bit trivial, but skipping Nones everywhere is duplication too.
It seems you prefer individual records, so I'll fix that.
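For reference, the "one record type per command" approach discussed above could look roughly like the following. This is a dependency-free sketch under stated assumptions: the PR itself derives serde's `Serialize`, whereas the `CsvRow` trait, the `quote` helper, and both record types here are hypothetical stand-ins written by hand so the example compiles on its own.

```rust
/// Quote a text field for CSV, doubling any embedded double quotes.
fn quote(field: &str) -> String {
    format!("\"{}\"", field.replace('"', "\"\""))
}

/// Anything that can be written as one CSV line.
/// (Hypothetical trait; stands in for a serde-based serializer.)
pub trait CsvRow {
    fn header() -> &'static str;
    fn row(&self) -> String;
}

/// Record for a `top`-style listing: no optional fields needed.
pub struct TopRecord {
    pub name: String,
    pub shallow_size: u32,
    pub shallow_size_percent: f64,
}

impl CsvRow for TopRecord {
    fn header() -> &'static str {
        "name,shallow_size,shallow_size_percent"
    }
    fn row(&self) -> String {
        format!(
            "{},{},{}",
            quote(&self.name),
            self.shallow_size,
            self.shallow_size_percent
        )
    }
}

/// Record for a dominators-style listing: its extra columns are plain
/// fields rather than `Option`s that other commands would have to skip.
pub struct DominatorRecord {
    pub id: u64,
    pub name: String,
    pub retained_size: u32,
    pub retained_size_percent: f64,
    pub immediate_dominator: u64,
}

impl CsvRow for DominatorRecord {
    fn header() -> &'static str {
        "id,name,retained_size,retained_size_percent,immediate_dominator"
    }
    fn row(&self) -> String {
        format!(
            "{},{},{},{},{}",
            self.id,
            quote(&self.name),
            self.retained_size,
            self.retained_size_percent,
            self.immediate_dominator
        )
    }
}
```

The trade-off is the one raised in the review: more boilerplate per command, but each command's columns are fixed by its type instead of depending on which `Option`s happen to be `Some`.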

fitzgen · 2018-05-11T17:14:04Z

twiggy/tests/expectations/monos_wasm_csv

@@ -0,0 +1 @@
+generic,approximate_monomorphization_bloat_bytes,approximate_monomorphization_bloat_percent,total_size,total_size_percent,monomorphizations


I think this is missing the test!(...) for this expectation file? Also, it would be good to have a test that actually has some output.

fitzgen · 2018-05-11T17:16:14Z

analyze/analyze.rs

@@ -411,7 +415,7 @@ impl traits::Emit for DominatorTree {
                retained_size: Some(retained_size),
                retained_size_percent: Some(retained_size_percent),
                // TODO CSMOE: find immediate dominator
-                immediate_dominator: Some(id.0),
+                immediate_dominator: Some(idom),


Ok, looks like the TODO is fixed -- we should remove the comment then :)

Ok, I guess it is done in this commit :-p

fitzgen · 2018-05-11T17:28:16Z

analyze/analyze.rs

@@ -402,7 +402,11 @@ impl traits::Emit for DominatorTree {
                items.retained_size(id),
                (items.retained_size(id) as f64) / (items.size() as f64) * 100.0,
            );
-
+            let idom = if let Some(idom) = items.predecessors(id).last() {


This isn't quite right.

Predecessors are just the normal neighbors graph edges reversed. Instead of asking questions like "what other items does X refer to?" it allows us to ask "what items refer to X?"

The immediate dominator of X is the parent of item X in the dominators tree. Although we have it available, we don't actually save this information inside ir::Items::compute_dominators right now, so we will need to do that. We can add an immediate_dominators: Option<BTreeMap<Id, Id>> member to ir::Items and fill it in inside compute_dominators.

Does that make sense?

Yes, I'll fix it later.
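To make the difference between predecessors and the immediate dominator concrete, here is a small standalone sketch. It uses the simple iterative "dominator sets" formulation on a toy graph of `u32` nodes; it is not how twiggy computes dominators, and the function names and the `BTreeMap<u32, u32>` result (analogous to the suggested `immediate_dominators` member) are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The items that refer to `node`: one level of reversed edges.
pub fn predecessors(edges: &[(u32, u32)], node: u32) -> Vec<u32> {
    edges.iter().filter(|e| e.1 == node).map(|e| e.0).collect()
}

/// Immediate dominators for a small graph, computed by shrinking
/// per-node dominator sets to a fixpoint. `edges` are (from, to) pairs.
pub fn immediate_dominators(root: u32, edges: &[(u32, u32)]) -> BTreeMap<u32, u32> {
    let mut nodes: BTreeSet<u32> = BTreeSet::new();
    nodes.insert(root);
    let mut preds: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(from, to) in edges {
        nodes.insert(from);
        nodes.insert(to);
        preds.entry(to).or_default().push(from);
    }

    // dom[root] = {root}; every other node starts with "all nodes".
    let mut dom: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for &n in &nodes {
        let init: BTreeSet<u32> = if n == root {
            std::iter::once(root).collect()
        } else {
            nodes.clone()
        };
        dom.insert(n, init);
    }

    let mut changed = true;
    while changed {
        changed = false;
        for &n in &nodes {
            if n == root {
                continue;
            }
            // Intersect the dominator sets of all predecessors, then add n.
            let mut acc: Option<BTreeSet<u32>> = None;
            for p in preds.get(&n).into_iter().flatten() {
                let p_dom = &dom[p];
                acc = Some(match acc {
                    None => p_dom.clone(),
                    Some(a) => a.intersection(p_dom).cloned().collect(),
                });
            }
            let mut new_dom = acc.unwrap_or_default();
            new_dom.insert(n);
            if new_dom != dom[&n] {
                dom.insert(n, new_dom);
                changed = true;
            }
        }
    }

    // The strict dominators of a node form a chain, so the closest one
    // (the immediate dominator) is the one with the largest dominator set.
    let mut idom = BTreeMap::new();
    for &n in &nodes {
        let closest = dom[&n]
            .iter()
            .filter(|&&d| d != n)
            .max_by_key(|d| dom[*d].len());
        if let Some(&d) = closest {
            idom.insert(n, d);
        }
    }
    idom
}
```

On a diamond-shaped graph (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3), node 3 has two predecessors, 1 and 2, but neither dominates it; its immediate dominator is 0. That is why taking the last predecessor does not answer the right question.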

fitzgen · 2018-05-11T17:40:43Z

analyze/analyze.rs

@@ -601,8 +600,9 @@ impl traits::Emit for Paths {
            let item = &items[id];
            let size = item.size();
            let size_percent = (size as f64) / (items.size() as f64) * 100.0;
-            let mut path = String::with_capacity(item.name().len());
-            path.push_str(item.name());
+            let mut callers = items.predecessors(id).into_iter().map(|i| items[i].name()).collect::<Vec<&str>>();


This isn't quite right either.

If we have

fn a() {}
fn b() { a() }
fn c() { a() }
fn d() { c() }

Then a's predecessors will be [b, c]. But the paths to a are [ "a <- b", "a <- c <- d" ]. The predecessors just give us one depth of incoming edges, not the paths, which we construct recursively by processing the edges.

As written, this will emit b -> c -> a for the csv paths for a, but that isn't an actual path to a. It should emit two paths:

b -> a
d -> c -> a

The most straightforward way to fix this is to do a similar thing as recursive_callers does in emit_text and emit_json to use the predecessors to walk the paths.

Long term, every single emit_blah shouldn't have to redo this walk, and I have been meaning to fix this by doing the walking in the paths function that creates the Paths object and just storing the results in some sort of tree structure.
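A rough sketch of the kind of walk described above, using the a/b/c/d example. This is not twiggy's `recursive_callers`; `paths_to`, `walk`, and the string-keyed predecessor map are hypothetical names chosen to keep the example self-contained, and skipping callers already on the stack is an assumption added here to guard against cycles.

```rust
use std::collections::BTreeMap;

/// Enumerate every caller path that ends at `target`, formatted like
/// `d -> c -> a`. `preds` maps an item to the items that refer to it.
pub fn paths_to<'a>(target: &'a str, preds: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<String> {
    let mut out = Vec::new();
    let mut stack = vec![target];
    walk(target, preds, &mut stack, &mut out);
    out
}

fn walk<'a>(
    item: &'a str,
    preds: &BTreeMap<&'a str, Vec<&'a str>>,
    stack: &mut Vec<&'a str>,
    out: &mut Vec<String>,
) {
    // Callers not already on the stack; skipping the rest avoids looping
    // forever on recursive call cycles.
    let callers: Vec<&'a str> = preds
        .get(item)
        .into_iter()
        .flatten()
        .copied()
        .filter(|c| !stack.contains(c))
        .collect();

    if callers.is_empty() {
        // Top of a path reached: emit it with the outermost caller first.
        // A bare target with no callers at all produces no row.
        if stack.len() > 1 {
            let path: Vec<&str> = stack.iter().rev().copied().collect();
            out.push(path.join(" -> "));
        }
        return;
    }

    for caller in callers {
        stack.push(caller);
        walk(caller, preds, stack, out);
        stack.pop();
    }
}
```

With `a` referred to by `b` and `c`, and `c` referred to by `d`, `paths_to("a", ...)` yields the two rows `b -> a` and `d -> c -> a`, one CSV row per path.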

fitzgen · 2018-05-11T17:44:24Z

ir/ir.rs

@@ -378,6 +378,11 @@ impl Id {
    pub fn root() -> Id {
        Id(u32::MAX, u32::MAX)
    }
+
+    /// Get the real id of a item.
+    pub fn real_id(&self) -> u32 {


It is not clear to me what this is for. Both components are real, and dropping one or the other loses the unique-ness property.

I wasn't sure why the Id is represented by a u32 tuple (my fault, I should have asked for your advice before coding). So how should I write an Id? Just the raw tuple as it is, or something else?

It is a pair of (section index, item within that section index)

To get something that could be serialized as a single thing, you could combine them into a u64:

let top = (self.0 as u64) << 32;
top | (self.1 as u64)

I would call this serializable or something rather than real_id.
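Spelled out as a full method, the suggestion might look like the sketch below. The `Id` struct here is a local stand-in for the (section index, item index) pair, and `from_serializable` is a hypothetical inverse added only to show that no information is lost. The cast is parenthesized because `self.0 as u64 << 32` does not parse: the compiler reads `<<` after a type as the start of generic arguments.

```rust
/// Stand-in for the `Id` pair described above:
/// (section index, item index within that section).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Id(pub u32, pub u32);

impl Id {
    /// Pack both components into a single u64 so the id can be written
    /// as one CSV field while staying unique.
    pub fn serializable(&self) -> u64 {
        let top = (self.0 as u64) << 32;
        top | (self.1 as u64)
    }

    /// Hypothetical inverse of `serializable`, for consumers that want
    /// the pair back.
    pub fn from_serializable(raw: u64) -> Id {
        Id((raw >> 32) as u32, raw as u32)
    }
}
```

Because the section index occupies the high 32 bits and the item index the low 32 bits, two distinct ids can never collapse to the same number, which is the property a single-component `real_id` would lose.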

fitzgen · 2018-05-21T22:33:30Z

@csmoe is this ready for me to look at again?

csmoe · 2018-05-22T00:00:53Z

@fitzgen yes

fitzgen

Thanks for sticking with this @csmoe! I know it was a lot of back and forth.

Next time, we can do it more incrementally by implementing CSV formatting for one analysis at a time, for example. That should make PRs less monolithic, and land faster.

Thanks again for sticking with this!

csmoe force-pushed the csv_format branch 2 times, most recently from 70bd90c to e425f40 Compare April 20, 2018 13:17

csmoe force-pushed the csv_format branch 2 times, most recently from cbf7474 to 54a89bc Compare April 26, 2018 03:54

csmoe force-pushed the csv_format branch 3 times, most recently from b3d1ac5 to 5842370 Compare May 10, 2018 12:15

fitzgen reviewed May 11, 2018

csmoe force-pushed the csv_format branch from 5842370 to 29ae91f Compare May 20, 2018 10:50

csmoe added 10 commits May 20, 2018 19:26

csv-format: top

0f394e1

csv-format: refactor csv into mod

14a842d

csv-format: dominators

545d8cd

csv-format: monos

28cc3b1

csv-format: fix top_2_csv

0159521

csv-format: path

2420c14

csv-format: dominator:idom

5f865e1

csv-format: paths:path

84c4eae

rebase

cd71873

address suggestions

ce2a921

csmoe force-pushed the csv_format branch from 29ae91f to ce2a921 Compare May 20, 2018 12:12

csmoe force-pushed the csv_format branch from 2b55bcb to 0185e8d Compare May 20, 2018 12:36

fix test

3ca7ff4

csmoe force-pushed the csv_format branch from 0185e8d to 3ca7ff4 Compare May 20, 2018 12:44

fitzgen approved these changes May 22, 2018

fitzgen merged commit 28df29b into rustwasm:master May 22, 2018
