-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Meeting 2014 08 18
Lars Bergstrom edited this page Aug 19, 2014
·
1 revision
- Administrative: Mike Poessy online to debug recurring Vidyo echo issues w/ this meeting
- Work week dates
- Hi-pri Rust issues: https://github.com/servo/servo/issues/2853 (larsberg)
- Need an
unsafe
code audit of all variable lifetimes. See https://github.com/rust-lang/rust/issues/16366#issuecomment-52393356 and https://github.com/dotdash/rust-azure/commit/053d2643f006cb26568ca3a8864e70307cea7b8f Servo should use a different, lifetime-respecting, transmute? (larsberg) - JS blog post is ~ready! (kmc) -Servo vs Firefox, for Android mobile phone and Desktop : https://www.dropbox.com/s/e23p7c6e7ek6i5k/Servo-vs-Firefox.pdf (Laleh)
kmc, jack, larsberg, azita, laleh, glenn, patrick, martin, simon
- jack: Unanimous support for November 3, so I'll fill out the forms for that. Still not sure where we will be, but will figure that out soon.
- larsberg: Some of them in good shape; tehre are existing Rust bugs.
- jack: I'll talk with Rust team on the Cargo issues. Going through them quickly, calling from within autotools, you get extra files written by it. When that happens, Cargo does a full rebuild. So, every time there's an autotools dependency, it rebuilds the whole thing. This is bad because it causes a recompile of SpiderMonkey. Workaround is for Cargo to respect an IGNORE file. (.gitignore, .htignore, etc.). #348 is a problem for continuous integration because it rebuilds the dependencies for each profile. "Cargo build" then "cargo test" then "cargo doc" builds everything three times. They may fix this; we can work around it with more builders, if needed. #382 is related to native dependencies. Any time it needs to rebuild a native dependency, it deletes all of the native dependencies and builds them all from scratch. Not a huge problem because we don't messs around in our C dependencies too much. We'll talk about it in the rust workweek tomorrow. There are some other cargo bugs Simon has run into.
- SimonSapin: Everything I've run into in Cargo is filed in bugs
- jack: Hit a bug that went away if we moved a line of code. Turns out that Servo was broken, not Rust! It was a real bug. Some changes in how Rust optimizes caused us to do bad things. WE worked around one, but found another (mrobinson did) that was really bad. Adding a println! made it go away, which confused us. What happened is that the Rust compiler got much better about reusing stack space, so if we misused stack space, the pointers would not become invalid. In both of these bugs in Servo, we were calling transmute on stack allocated code and passing it outside of the block. WE've talked about doing a transmute audit before. Lars is suggesting a different transmute.
- larsberg: Yeah, we usually just change the type, not the lifetime.
- jack: I'll talk with Niko. There are less versions of transmute in Rust and they're getting rid of them. Maybe pcwalton can talk about it. It'd make it hard if they remove them...
- zwarich: We can make them utility functions, since we can define them in terms of transmute.
- kmc: Do we maybe want lints to catch them? If we had a central place, then it would help. Maybe we could have a rule for transmute that has the type info.
- jack: Do we have knowledge of if it's a stack value? A nice first pass would be to tell that we are transmuting something with stack lifetime and returning it transmuting something with stack lifetime and returning it.
- kmc: Maybe we would write a lint that would warn but not error on unsafe blocks we haven't audited, too. It does introduce more warnings, but it would be nice to have them in place. It can be nice to have everything set to warn.
- jack: A warning on uses of transmute (as long as it wasn't the default) seems great. maybe a 'make transmute-warn'?
- kmc: Could add allow(transmute) after you've audited the file to disable it. The JS thing is simple - just have an attribute on the struct and anywhere it's mentioned it's analyzed. I'll see what form that could take.
- jack: Takeaway here is that it's worth looking at any transmutes you're using and see if you can use a more restricted form.
- SimonSapin: Before we get a version with DST, we might want to audit our transmutes, because it changes the representation of trait objects and will break some of our code.
- jack: Yes, in parallel.rs.
- pcwalton: Aren't trait objects still going to be two pointers?
- SimonSapin: YEs, but they're swapped!
- kmc: We could write a lint that warns about transmuting a trait object.
- pcwalton: I tried to typedef all that stuff away into UnsafeFlow
- jack: There are at least two different special structs that unwrap the trait object. I filed a bug for this. When the code was written, there was no Trait object type, but now there is, so we could remove that.
- SimonSapin: Yes, we should use that type for the primary representation of that stuff.
- jack: If we use the standard library object Trait type, that should be fixed automatically, right?
- SimonSapin: yes.
- jack: The version of Rust we updated to also has lifetime inference. So, if you're working on code with a lot of lifetimes, I didn't go through and try to remove the explicit ones, but if you see lots of lifetimes, you might want to try to remove them and see if the compiler can infer them.
- SimonSapin: Where can we read the new rules?
- pcwalton: If there's one parameter in the output and input parameter, you can remove the lifetime parameter.
- jack: About 80% of the lifetime annotations in the Rust compiler went away with this rule, so I suspect it will work. I fixed all of the cases where the compiler could no longer infer a type.
- kmc: It's ready, and I'll put a link here.
- jack: Need more review?
- kmc: It's on github and in a wordpress draft, but you have to be logged in for the latter. https://github.com/jdm/who-roots-the-rooters/blob/master/dump.md (anyone can see) https://blog.mozilla.org/research/?p=269&preview=true (with final formatting; need to be logged in)
- kmc: Sent the link to Dave; he is reviewing.
- azita: Please ping Paul Jarrat, too, so he can also give feedback.
- jack: He'll also want us to publish it Tuesday or Thursday instead of Monday or Friday. I'll send him e-mail to look over it. Do you think you could have feedback turned around later this week? Or next Tuesday?
- kmc: Unless there's some huge change, I can have it sorted out.
- jack: I'll ping Paul after this meeting. Let's shoot for Wednesday unless we hear differently.
- kmc: Who's Paul?
- jack: PR person.
- laleh: In part A of the PDF is a comparison between servo and firefox on android. The time is about the same, but our power usage is currently more than firefox on android. Part B compares firefox with servo on the desktop. You can see that firefox is currently better than servo on 8 threads, which makes sense because servo also runs for much longer.
- jack: It looks like the relative power use and times are about equal. If we could optimize the rest of servo to be as fast as firefox, we should match.
- pcwalton: It looks like the problem is not rendering or layout - probably script or parsing.
- gw: And worker threads that are spinning.
- jack: perf-rainbow?
- laleh: Yes. Perf is about the same, but pwoer is much higher.
- pcwalton: We do get a benefit with more threads, though, which is great. I'm not too surprised by these numbers, given that there are inefficiencies in scripts, in particular. I wouldn't be surprised if those explain the majority of these numbers.
- jack: Why are we so close on android and not desktop? Maybe glfw?
- pcwalton: Coudl be frontend issues - setting up the GL context. There's a lot of stuff that goes into Android app lifecycle, which is really complex, and we might be doing it wrong. -jack: We have the glut stuff still there. MAybe we could compile it for desktop and see if performance improves.
- pcwalton:Could measure startup time - time until script task starts, from process launch until script.
- zwarich: Layout, power, and performance tests should not be measuring what happens when we start the browser.
- pcwalton: Yes, there are some numbers in Part C that break those out separately.
- jack: Is Part C desktop?
- laleh: Yes. And just time. These are partitioning the time between rendering, compositing, and layout. It's interesting that 90% is in render/composite/layout, but in Servo it's at most 61%.
- jack: Yes, that we knwo about because layout is what we optimized. We'd expect that layout will be less of our time because we've made it fast.
- pcwalton: I expect rendering to be similar. We're using skia, as firefox is...
- jack: Firefox doesn't use the skia backend.
- pcwalton: Maybe dominated by the same backend code, at least on android - fills, blits, etc.
- jack: We could switch to the cairo backend in SErvo?
- zwarich: No, we built in lots of dependencies that require the skia backend. I tried using the other backends on CPU/GPU, but they crash deep in native code.
- jack: Other things to look at first. How do you know when firefox is done?
- laleh: Until the page is loaded.
- jack: That might be noisy.
- pcwalton: Maybe we should try to do a JS-based thing and measure how long that takes?
- laleh: The firefox profiler shows when it is done, and the rest is just waiting. It's just the power part that is manual.
- jack: Maybe we can check with someone on the firefox team to see how to automatically close it.
- laleh: On Servo, we can just use the -x flag.
- jack: Yes, using the Firefox one would be great. We could also add more flags to Servo to make it easier to measure other power scenarios.
- laleh: We can see the time in layout in Servo with 8 threads is less time than the layout time in firefox.
- jack: Do we count selector matching in the layout time.
- pcwalton: Yes.
- jack: And we know that part isn't optimized.
- pcwalton: wat?
- jack: No bloom filter...
- pcwalton: Doesn't help with perf-rainbow.
- jack: OK, good. We'll see if we can follow up with more numbers.
- jack: We were chatting with dherman on the Servo architecture and talked about selector matching and how we don't have a bloom filter. I think the Zoom paper is where they mentioned it. When bjz talked with sfowler about what they'd done, they had done a perfect hashing for it. The reason they did that is that they wanted to share the bloom filter across the layout threads to copy it around cheaply. But we may not need to share it at all, since we're not doing COW DOM. We may be able to store it in the DOM with a readonly pointer, since we don't mutate it.
- pcwalton: Yes you are. You are using a different filter for every node. But you can't have one for every DOM node. So, what gecko does is it has one bloom filter and adds/removes stuff from it as it goes through, caching what's seen above you (the ancestors). It's a counting bloom filter so you can remove stuff.
- jack: So save the state when it does the children, undoes the state, etc.?
- pcwalton: Yes. And the bloom filter is trash at the end of layout. It's a transient thing.
- jack: Maybe 2 bloom filters?
- pcwalton: Maybe. Lots of various tricks you can do, but it's not trivial. Hard to say; I also don't know how much it helps to begin with. IT's going to be very page-dependent and situational. Style sharing may be good enough that bloom filters don't help that much.
- zwarich: Webkit was the first to have this. In the original bug, it said it sped up style matching by 30% on sites with descendent selectors.
- pcwalton: Style sharing also has to not kick in; those heuristics have gotten much better.
- zwarich: Descendent selectors have gotten much more common over time in real pages.
- kmc: Yes, because the syntax is shorter, right?
- pcwalton: Yes.
- jack: I was thinking maybe we could implement them now that we're not doing COW DOM easily, and while that's not true, mayb ewe could still do it. Along those lines, our roadmap doesn't have optimizations listed because it's more coarse-grained. Is that the most important optimizaiton for selector matching?
- pcwalton: As long as we have Atomized it, it's good.
- gw: Still a DOMString for the class that gets split; I have a patch out for that. I've changed most of the other attributes to Atom
- pcwalton: After the Atomization is complete, we just need the bloom filter for selector matching and it should be good. The bloom filter might just be a nice situational assist.
- zwarich: And CSS selector JIT!
- jack: dherman was in touch with the CSS JIT folks at blink and they're not going to do it because it didn't result in better perf for them.
- pcwalton: That's one datapoint. Webkit said that it helped, though. Not saying we should or shouldn't do it, but it does seem more low priority than script optimizations or SIMD text layout.
- jack: Do you know who did the CSS JIT in webkit?
- zwarich: Yes. I asked him how it did on real pages and said the gains showed up on real pages. With all the optimizations they've done, style resolution is no longer a hot spot in webkit. Layout is now the thing they need to optimize. I have no reason to believe that his claims on both microbenchmarks and reality are false. i'm very curious why the blink developers aren't going with it - complexity, interacting with other parts of blink, or if it really wasn't faster. Because only the final is important to us.
- pcwalton: Maybe V8 interaction?
- jack: Can we talk with them, cameron?
- zwarich: I'll get a hold of him after the Blink post-mortem comes out.
- jack: Is there a bug on the last atomization?
- gw: No, I'll open it. Should be pretty easy.
- kmc: I wonder if it would be worth hacking class parsing in the HTML parser to spit out a list of atoms instead of a single string? Then you don't have to split it even once.
- SimonSapin: In the DOM, not parser. We already store the positions of the whitespace there, but don't use it yet. We could also store the Atoms directly in the DOM.
- pcwalton: I think that's what Gecko does.
- SimonSapin: Don't know if we should use a HashSet or just a VEctor...
- pcwalton: I don't know what Gecko does there.
- SimonSapin: I just submitted a PR with basic block layout for vertical writing modes! https://github.com/servo/servo/pull/3110#issuecomment-52576175
- jack: YAY!
- zwarich: Maybe we should get it passing again...
- jack: Some was broken by the corruption.
- SimonSapin: I couldn't get a reftest for ACID2 working because the rendering across many linuxes (mine, larsberg's, and TRavis's) were not the same.
- pcwalton: How not the same?
- SimonSapin: We should check. It was off by one pixel.
- jack: Bug already? I saw screenshots, but maybe a bug?
- SimonSapin: It's in the PR. https://github.com/servo/servo/pull/2741
- jack: If we can put the screenshots in there, maybe we
- larsberg: I suspect memory issues; lots of random corruption.
- jack: WIsh there was a rustc mode that did 0xDEADBEEF into reused stack slots or something so we'd crash instead of just corrupting.
- zwarich: The stack reuse uses LLVM's register allocator, so unfortunately there's no real way to implement it. You could do something that like scribbles the stack whenever you leave scope (write over the
alloca
s). Maybe that wouldn't be too hard to implement, but couldn't quite do it with the stack slots. - jack: Yeah, I just wonder how we'll find all these.
- zwarich: If we switch to the transmutes that respect lifetimes, we'd probably be fine. The ones that remain and need to change a lifetime are the ones we'll need to really look at. So the first step is just changing to the safer version of transmute.
- kmc: How would you write the type that doesn't change it? 'a T -> 'a S? What if it's a struct with a lifetime parameter?
- zwarich: You'd have to make a type-specific transmute that guarantees it. Maybe with macros or syntax extensions? Could have some sort of type-specific lifetime-invariant transmute generated for all the cases. I feel like most of them are plain references rather than internal ones, but maybe not. I assume it's just going to be the stupidest mistake most of the time.
- jack: Transmute_lifetime was removed on 5/9 because "it was likely a strong indication that the code is incorrect." That seems like the opposite of what we've discovered.
- pcwalton: You can use type annotations on transmute; just the parts that matter. You can transmute &_ to *_.
- jack: Harder to see that it's wrong. I can imagine that we're not allowed to use transmute EVER in servo. But a bare transmute vs. an annotated one seems harder to see.
- ryan: Question about the experiment between firefox and servo. How did you measure power usage?
- laleh: Android ro desktop?
- ryan: firefox and servo on android.
- leleh: We attach a power meter to the phone and have a tool from the firefox team that reads the numbers from the ammeter that is connected to the battery.
- jack: There's a blog post on that - can we put it in the notes?
- laleh: Here is a link to it: https://developer.mozilla.org/en-US/Apps/Build/Performance/Performance_fundamentals/Power
- laleh: It has a picture of the custom printed equipment we have.
- jack: Do we have a lot? Can we send them?
- laleh: It's 3d printed, we can send the file.
- jack: ryan, would your team like one of these?
- ryan: I was just curious.
- jack: Let us know if you want one and I can have it shipped. I was talking with the Daala team and the ARM7 chips with differently-powered cores. Some of the Samsung devices ship with these chips already and it would be nice to use.
- kmc: Are these the big-little ones?
- jack: Yes, I think the Galaxy S4 has one. It wasn't clear if you could tell the linux scheduler which threads to put on which processors.
- zwarich: It's linux, of course you can! It looks likes it's the Samsung S4, S5, and Galaxy Note 3. Also, the latest Qualcomm Snapdragons, which are probably in higher-end phones than the FFOS phones that use SoC today.
- jack: I knew they existed, but didn't know they were shipping products yet. It'd be neat to see how Servo runs on them and if we can take advantage of those chipsets.
- zwarich: I'd be very curious about it. Because the little cores are MUCH slower. They're in-order cores that are not much better than what was in phones four years ago. It would be very interesting to see if layout can even run fast on that thing or if it requires an out of order CPU.
- jack: Yeah, seems like they got most of the power gains by being in-order only.
- zwarich: Yes, in theory code can switch between the two. In ARM's implementation, there is a big hit for switching a thread from the big to small cores. You have to go through an explicit transfer state; you can't do it lazily, so there must be a kernel feature to avoid any penalties associated with that.
- jack: Seems like you want the layout threadpool on the little cores.
- kmc: Linux scheduler supports affinity. May just need to look at the CPU ids and probably need a native thread spun up, but should be able to do it.
- jack: We should track android experiment pages. We almost have reliable android builds and nightlies soon!
- zwarich: Starting on Samsung products with kernel 3.10, it's all just built-in for heterogenous scheduling. Nothing crazy.
- kmc: At least we're not going continuous integration for ios; that sounds horrible.