Port NodeJS backend into Rust. #22

Mark-Simulacrum · 2016-08-03T21:19:35Z

Removes JS file and replaces backend directory with Rust port.

Fixes: #21.

Running list of major future work, either in this PR or in separate ones (@nrc, feel free to add):

Use liblog rather than println for logging.
Track down a good way of parsing dates passed from the client.
Tests.
Think about and possibly replace error-chain.
Rewrite the make_times function to return a Timings struct which will keep the total phase separate, since it is a meta-phase. This is a huge project: it affects most of the codebase, and it will also cause problems for the code that iterates over phases, which probably expects the total phase to exist.
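
One possible shape for the Timings idea in the last item, as a rough sketch only. The field names, the use of f64 seconds, and the (name, time) input shape are assumptions made for illustration and are not taken from the codebase.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape for the proposed `Timings` struct: real phases live
/// in a map, while the "total" meta-phase is stored on its own.
#[derive(Debug, PartialEq)]
struct Timings {
    /// Time per phase, keyed by phase name; never contains "total".
    phases: BTreeMap<String, f64>,
    /// The "total" meta-phase, if the input contained one.
    total: Option<f64>,
}

/// Assumed input: (phase name, seconds) pairs, possibly including "total".
fn make_times(raw: &[(&str, f64)]) -> Timings {
    let mut phases = BTreeMap::new();
    let mut total = None;
    for &(name, time) in raw {
        if name == "total" {
            // Keep the meta-phase out of the per-phase map.
            total = Some(time);
        } else {
            phases.insert(name.to_string(), time);
        }
    }
    Timings { phases, total }
}
```

With this layout, code that iterates over phases no longer sees "total" at all, which is exactly why the existing iteration code would need to be audited first.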

Mark-Simulacrum · 2016-08-03T21:40:01Z

backend/Cargo.toml

@@ -0,0 +1,15 @@
+[package]
+authors = ["Mark-Simulacrum <mark.simulacrum@gmail.com>"]


I'm not sure what to put in here--let me know and I can replace it/change it.

["Mark-Simulacrum <mark.simulacrum@gmail.com>", "Nicholas Cameron <ncameron@mozilla.com>", "The rustc-perf contributors"]

nrc · 2016-08-03T21:50:51Z

Just at a first glance, it would be nice, I think, to break out at least a couple of modules - ideally main.rs should be just the extern crates and a main function. You should probably leave the JS version there for the moment - it will be easy to remove later, but it might be interesting to run both for testing. Could you add some more comments please, particularly around the major data structures.

Mark-Simulacrum · 2016-08-03T21:55:28Z

Regarding the JS version: should I leave the ES6 version (which this work was at least partly based on, although I compared against both along the way to make sure I wasn't missing anything) or the version in master right now?

nrc · 2016-08-03T22:10:22Z

I guess the old version, since that is still our working reference.

Mark-Simulacrum · 2016-08-03T23:32:15Z

Pushed the modularization and documentation. Should be at least ready for another review pass.

Mark-Simulacrum · 2016-08-04T00:26:53Z

backend/src/load.rs

+
+        // Post processing to generate the summary data.
+        let summary_rustc = Summary::new(&data_rustc, last_date);
+        let summary_benchmarks = Summary::new(&data_benchmarks, last_date);


I just did a test, and both of these complete in less than a second (in debug mode). Should we move them into the server logic, rather than splitting the summary into two steps? I don't see many advantages to doing so, but wanted to leave a comment noting that here.

We might want to take a global look at what code is where later, but for now I think this is fine.

Mark-Simulacrum · 2016-08-04T01:25:14Z

I've also been thinking about the median computation for the summary page, and I'm not sure that it's beneficial: can you explain to me what the intended benefits are? I feel like it doesn't help to show the true by-week results, and skews the data by potentially never showing a week (especially if the week is an outlier!). I may be misreading something though.

nrc · 2016-08-04T04:25:56Z

Re the medians: it is the median of three commits at approx. the same point in the week, not three weeks, so we should never miss a week. The benefit is meant to be that there is some noise in the results, and if we don't take the median we are more likely to see a difference due to noise rather than signal; hopefully using a median gets rid of some of the noise.

Mark-Simulacrum · 2016-08-04T04:29:14Z

it is the median of three commits at approx. the same point in the week, not three weeks

I'm not sure I follow you here: Is there a guarantee that there are >=3 commits per week? If not, then the (at least current) logic takes the previous 3 data points, not necessarily within the current week:

// For a given date we'll get the three most recent sets of data
// and take the median for each value.
let start_idx = start_idx(data, &date);
assert!(start_idx >= 3, "Less than 3 days of data");
let mut weeks = Vec::new();
for idx in 0..3 {
    weeks.push(&data[start_idx - idx].by_crate);
}

nrc · 2016-08-04T04:32:55Z

It's not guaranteed, and at the moment we've been struggling with perf issues (ha! irony), but we were averaging three runs per day, so it would be highly unlikely to ever miss a week. We could add a guard against this in the code.

Mark-Simulacrum · 2016-08-04T04:36:43Z

A guard sounds good to me. (And this can be a future work type thing, I think).

Does this sound good as a specification? Check whether all three data points we just pulled are within the current week. Drop any data points that are outside the current week (and log a warning?). Possibly log another warning when 0 data points are left, and skip the week if no data points were found for it.
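
A minimal sketch of that specification, under stated assumptions: Point, week_median, and the use of plain day numbers instead of real dates are all hypothetical stand-ins, a week is taken to be the half-open range [week_start, week_start + 7), and eprintln! stands in for a real logging call.

```rust
/// Hypothetical data point: a day number (days since some epoch) and one
/// measured value. The real `Data` struct holds much more.
struct Point {
    day: i64,
    value: f64,
}

/// Median of up to three most recent points ending at `end_idx`, keeping only
/// points whose day lies in `[week_start, week_start + 7)`.
/// Returns `None` when nothing is left, so the caller can skip the week.
fn week_median(data: &[Point], end_idx: usize, week_start: i64) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let end = std::cmp::min(end_idx, data.len() - 1);
    let start = end.saturating_sub(2);
    let mut values: Vec<f64> = data[start..end + 1]
        .iter()
        .filter(|p| p.day >= week_start && p.day < week_start + 7)
        .map(|p| p.value)
        .collect();

    let dropped = (end + 1 - start) - values.len();
    if dropped > 0 {
        eprintln!("warning: dropped {} data point(s) outside the week", dropped);
    }
    if values.is_empty() {
        eprintln!("warning: no data for week starting at day {}", week_start);
        return None;
    }

    values.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = values.len() / 2;
    if values.len() % 2 == 1 {
        Some(values[mid])
    } else {
        // Two values left: use their mean as the median.
        Some((values[mid - 1] + values[mid]) / 2.0)
    }
}
```

Returning None instead of asserting keeps one sparse week from taking down the whole summary page.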

Mark-Simulacrum · 2016-08-04T05:08:05Z

Also, I've been looking at the rust-lang/rust#34831 issue (as a case of a pre-existing URL pointing at perf.rust-lang.org), and upon testing it with the current code, things break. I don't know of a good way to "best effort parse this date we're throwing at you" in Rust (and even in JS, truth be told). Without that, we'll either need to try and list all the cases we can think of (attempting parsing with each, failing, and moving on to the next one), or think of something else. Perhaps input sanitization? Regex-based search and recombine the input into something more readily parseable?

Let me know if you think we shouldn't care about backwards compatibility with URLs (I think we should). I'm also not 100% certain how to generate the URL in that issue; my attempts don't seem to produce the HH:MM:SS GMT+0000 (UTC) part.

Error(Msg("while parsing Mon Jun 13 2016 19:51:42 GMT+0000 (UTC)"), (Some(ParseError(TooLong)), stack backtrace:
...
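
As a rough illustration of the "sanitize the input" option for strings like the one in the error above, here is a standard-library-only sketch. ParsedDate and parse_js_date are hypothetical names; the sketch assumes the layout "Weekday Mon DD YYYY [HH:MM:SS ...]", treats the time as optional, and ignores the trailing timezone text (that is, it assumes UTC). It does not validate the day against the month's length.

```rust
/// Plain calendar fields pulled out of a JS `Date.toString()`-style string.
#[derive(Debug, PartialEq)]
struct ParsedDate {
    year: i32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
}

/// Best-effort parse of e.g. "Mon Jun 13 2016 19:51:42 GMT+0000 (UTC)".
/// Anything after the time (timezone offset and name) is ignored.
fn parse_js_date(input: &str) -> Option<ParsedDate> {
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let mut parts = input.split_whitespace();
    let _weekday = parts.next()?;
    let month_name = parts.next()?;
    let month = MONTHS.iter().position(|m| *m == month_name)? as u32 + 1;
    let day: u32 = parts.next()?.parse().ok()?;
    let year: i32 = parts.next()?.parse().ok()?;

    // The time part is optional: some URLs only carry the date.
    let (hour, minute, second) = match parts.next() {
        Some(time) => {
            let mut hms = time.split(':');
            let h: u32 = hms.next()?.parse().ok()?;
            let m: u32 = hms.next()?.parse().ok()?;
            let s: u32 = hms.next()?.parse().ok()?;
            (h, m, s)
        }
        None => (0, 0, 0),
    };

    if day == 0 || day > 31 || hour > 23 || minute > 59 || second > 59 {
        return None;
    }
    Some(ParsedDate { year, month, day, hour, minute, second })
}
```

The extracted fields could then be handed to whatever date type the backend uses, which sidesteps the TooLong error caused by the trailing timezone text.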

nrc · 2016-08-04T05:14:00Z

backend/src/util.rs

+
+/// Return where the passed date is (Ok(usize)) or should go (Err(usize)) in
+/// the data slice.
+fn get_insert_location(data: &[Data], date: &NaiveDateTime) -> ::std::result::Result<usize, usize> {


Given the return type, this should probably be called find or something like that

I was tempted, but find [to me] implies returning &Item (&Data) in this case; position could work, but since this can return the position where something should go, not where it is, I wasn't too sure.

Also, do you think it makes sense to return Result<usize, usize> here? Or would it be better to return usize and do the if is_ok(), unwrap(), else, unwrap_err() logic in here? We perform that logic below anyway, as near as I can tell.
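
For comparison, the standard library's slice binary search already uses the same Result<usize, usize> convention. The sketch below assumes the slice is sorted by date and models the date as a plain integer; the Data stand-in and the helper index_or_insertion_point are hypothetical, and whether the function in this PR is implemented this way is not shown here.

```rust
/// Stand-in for the real `Data` struct; only the date matters here, and it
/// is modelled as an integer timestamp for the sake of the sketch.
struct Data {
    date: i64,
}

/// `Ok(i)`: an entry with this date sits at index `i`.
/// `Err(i)`: no such entry; inserting at `i` keeps the slice sorted.
fn get_insert_location(data: &[Data], date: i64) -> Result<usize, usize> {
    data.binary_search_by(|d| d.date.cmp(&date))
}

/// Collapses both cases into a single index, replacing the
/// is_ok / unwrap / unwrap_err dance at the call site.
fn index_or_insertion_point(data: &[Data], date: i64) -> usize {
    get_insert_location(data, date).unwrap_or_else(|idx| idx)
}
```

Keeping the Result return type lets callers that care about the found/not-found distinction keep it, while callers that do not can collapse it in one line.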

nrc · 2016-08-04T05:26:57Z

backend/src/load.rs

+                data_rustc.push(Data::new(date, &header, times, true));
+
+                for timing in times.members() {
+                    let crate_name = timing["crate"].as_str().unwrap().to_string();


What is going on here: .as_str().unwrap().to_string() ?

timing["crate"] is a JsonValue, so to convert it to a string we need to take an Option<&str> out, unwrap that (it could be something else) and then convert it to a String.

This is one of the places where Serde might help... I'm still split on which one is better (Serde's ergonomics are terrible IMO if you don't use the Serialize/Deserialize auto-impls).

Mark-Simulacrum · 2016-08-04T18:36:30Z

I'd like to add tests, but I've been struggling to come up with good isolated portions of the code which can be tested. I think we can definitely do some form of overall diff-like testing, though (create an InputData with predetermined state and call the server functions to generate output). That shouldn't be too difficult, as the server functions don't depend on a request being passed, just the input data and [optionally] a body.

Mark-Simulacrum · 2016-08-04T19:56:31Z

backend/src/server.rs

+    }
+
+    if kind == "benchmarks" && body["crates"].contains("total") {
+        return Err(format!("bad value for crates {:?} with kind benchmarks", body["crates"]).into());


Any ideas as to why a "benchmarks" kind can't have "total" in the crates? I'm a little confused by this.

nrc · 2016-08-05T04:18:46Z

Yeah, how we test this is a big question for me. I think in the medium term we should refactor to make portions testable, one piece at a time, and add unit tests - test coverage is a big goal for me for the rewrite.

In the short-term, that is probably too much to do. Some diff testing with real data from the timing repo would be good to do, if it is not too complex.

Mark-Simulacrum · 2016-08-05T04:19:25Z

backend/src/util.rs

+            data.len() - 1
+        }
+    }
+}


Both start_idx and end_idx basically just clamp the value into the bounds of 0..data.len(); is there something in the stdlib better than what we do now?

Not that I know about, might be worth asking on #rust though
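
A small sketch of the clamping being discussed, using only std::cmp::min. The helper name clamp_index is hypothetical, and returning None for an empty slice is a choice made for this example rather than what start_idx and end_idx do. Newer toolchains also provide Ord::clamp, which can express the same bound check.

```rust
/// Clamps `idx` into the valid index range of `data`.
/// Returns `None` for an empty slice, where no index is valid.
fn clamp_index<T>(data: &[T], idx: usize) -> Option<usize> {
    if data.is_empty() {
        None
    } else {
        // usize cannot go below 0, so only the upper bound needs checking.
        Some(std::cmp::min(idx, data.len() - 1))
    }
}
```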

Mark-Simulacrum · 2016-08-05T04:23:56Z

I think I can do some dirty testing with real data. Unfortunately, due to a few changes in logic (the percent update is the main one), we can't compare every page. I think I'll wait on the tests until we're blocking on them for merge if I can't come up with some good way of doing them quickly and cheaply. My current slow and semi-painful idea is to Save as... each page from a run of the JS version and do the same here and then diff them; but this grows rather quickly due to the semi-large number of variants for each page: kind, group_by, etc.

nrc · 2016-08-05T04:31:22Z

It might be worth separating out any corrections into a separate commit, so we can be sure that the old version matches the new version without fixes, then add the fixes back in. Not sure if that is easy enough to be worthwhile.

Mark-Simulacrum · 2016-08-05T04:35:32Z

Hmm, yeah, I'm not sure how easy that would be either. I think the only major difference should be on the summary page (and this has so far proven true from visual testing); but I may be wrong. There may also be subtle changes that are more difficult to detect/fix: for example, right now I believe the dates on the summary page are off by one compared to the dates on perf.rust-lang.org. I'm not certain why this is, although I have suspicions that it has to do with my updating the data directory on my machine while the official one seems to have stopped auto-updating.

nrc · 2016-08-05T04:39:28Z

I should have thought of this earlier, but we should move either the Rust or node implementation into a separate directory so we know what is what.

Mark-Simulacrum · 2016-08-05T04:44:06Z

I see a couple of options for that. I'm not sure which is better, but I personally prefer either the first or the third. Creating a directory for a single file feels odd to me.

1. Moving Rust into a rs-backend directory
2. Moving Node into a node-backend directory
3. Moving the [single file] node perf.js to the top level and renaming it to node-backend.js.

nrc · 2016-08-05T04:47:32Z

I like 1 or 2, so 1 seems like a plan we can both get behind :-)

nrc · 2016-08-05T05:01:27Z

backend/src/server.rs

+    let mut chain = Chain::new(router);
+    chain.link(State::<InputData>::both(data));
+
+    Iron::new(chain).http("0.0.0.0:2346").unwrap();


Can you pull the ip addr into a const please?

Crate-level or module-level? I'm torn either way: server-specific but also semi-global.

I'd probably stick at the top of the crate to make it as easy as possible to change. We'll probably read it from a config file at some point, I guess.
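
A sketch of what that could look like at the top of the crate. The names SERVER_ADDRESS and server_address are assumptions; only the address string itself comes from the diff above.

```rust
use std::net::SocketAddr;

/// Address the server listens on. Kept at the top of the crate so it is
/// easy to find and change until it is read from a config file.
const SERVER_ADDRESS: &str = "0.0.0.0:2346";

/// Parses the constant once, so a typo in it fails loudly at startup.
fn server_address() -> SocketAddr {
    SERVER_ADDRESS
        .parse()
        .expect("SERVER_ADDRESS must be a valid socket address")
}
```

The call in the diff would then pass SERVER_ADDRESS instead of the string literal.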

nrc · 2016-08-05T05:44:20Z

Awesome work, thank you!

nrc mentioned this pull request Aug 3, 2016

Update JS backend code to ES6. #20

Closed

Rewrite NodeJS backend in Rust.

df7669b

The intent is to increase maintainability and allow for faster development. This rewrite does not intend to change any functionality other than fixing the percent-handling code to use the proper formula. Fixes: #21.

nrc merged commit 085f5ff into rust-lang:master Aug 5, 2016

Mark-Simulacrum deleted the rust-port branch August 7, 2016 22:14
