-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port NodeJS backend into Rust. #22
Conversation
@@ -0,0 +1,15 @@ | |||
[package] | |||
authors = ["Mark-Simulacrum <mark.simulacrum@gmail.com>"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what to put in here--let me know and I can replace it/change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
["Mark-Simulacrum <mark.simulacrum@gmail.com>", "Nicholas Cameron <ncameron@mozilla.com>", "The rustc-perf contributors"]
Just at a first glance, it would be nice, I think, to break out at least a couple of modules - ideally main.rs should be just the |
Regarding the JS version: should I leave the ES6 version (upon which this work was based, at least in part: although I compared with both along the way to make sure I wasn't missing anything) or the version in master right now? |
I guess the old version, since that is still our working reference. |
Pushed the modularization and documentation. Should be at least ready for another review pass. |
|
||
// Post processing to generate the summary data. | ||
let summary_rustc = Summary::new(&data_rustc, last_date); | ||
let summary_benchmarks = Summary::new(&data_benchmarks, last_date); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a test, and both of these complete in less than a second (in debug mode). Should we move them into the server logic, rather than splitting the summary into two steps? I don't see many advantages to doing so, but wanted to leave a comment noting that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to take a global look at what code is where later, but for now I think this is fine.
I've also been thinking about the median computation for the summary page, and I'm not sure that it's beneficial: can you explain to me what the intended benefits are? I feel like it doesn't help to show the true by-week results, and skews the data by potentially never showing a week (especially if the week is an outlier!). I may be misreading something though. |
re the medians, it is the median of three commits at apx the same point in the week, not three weeks, so we should never miss a week. The benefit is meant to be that there is some noise in the results and if we don't take the median, we are more likely to see a difference due to noise rather than signal, hopefully using a median gets rid of some of the noise. |
I'm not sure I follow you here: Is there a guarantee that there are >=3 commits per week? If not, then the (at least current) logic takes the previous 3 data points, not necessarily within the current week: // For a given date we'll get the three most recent sets of data
// and take the the median for each value.
let start_idx = start_idx(data, &date);
assert!(start_idx >= 3, "Less than 3 days of data");
let mut weeks = Vec::new();
for idx in 0..3 {
weeks.push(&data[start_idx - idx].by_crate);
} |
It's not guaranteed, and at the moment we've been struggling with perf issues (ha! irony), but we were averaging three runs per day, so it would be highly unlikely to ever miss a week. We could add a guard against this in the code. |
A guard sounds good to me. (And this can be a future work type thing, I think). This sound good as to a specification?: Assert that all three data points we just pulled are within the current week. Drop any data points that are outside the current week (and log a warning?). Possibly log another warning when 0 data points are now left and skip the week if no data points for it were found. |
Also, I've been looking at the rust-lang/rust#34831 issue (as a case of a pre-existing URL pointing at perf.rust-lang.org), and upon testing it with the current code, things break. I don't know of a good way to "best effort parse this date we're throwing at you" in Rust (and even in JS, truth be told). Without that, we'll either need to try and list all the cases we can think of (attempting parsing with each, failing, and moving on to the next one), or think of something else. Perhaps input sanitization? Regex-based search and recombine the input into something more readily parseable? Let me know if you think we shouldn't care about backwards compatibility with URLs (I think we should). I'm also not 100% certain how to generate the URL in that issue, my attempts don't seem to produce the
|
|
||
/// Return where the passed date is (Ok(usize)) or should go (Err(usize)) in | ||
/// the data slice. | ||
fn get_insert_location(data: &[Data], date: &NaiveDateTime) -> ::std::result::Result<usize, usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the return type, this should probably be called find
or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was tempted, but find
[to me] implies returning &Item
(&Data
) in this case; position
could work, but since this can return the position where something should go, not where it is, I wasn't too sure.
Also, do you think it makes sense to return Result<usize, usize> here? Or would returning usize and doing the if is_ok(), unwrap(), else, unwrap_err() logic in here be better? We perform that logic anyway below near as I can tell.
data_rustc.push(Data::new(date, &header, times, true)); | ||
|
||
for timing in times.members() { | ||
let crate_name = timing["crate"].as_str().unwrap().to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is going on here: .as_str().unwrap().to_string()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timing["crate"]
is a JsonValue
; so to convert it to a string we need to take a Option<&str>
out, unwrap that (it could be something else) and then convert it to a String
.
This is one of the places where Serde might help... I'm still split on which one is better (Serde's ergonomics are terrible IMO if you don't use the Serialize/Deserialize auto-impls).
I'd like to add tests, but I've been struggling to come up with good isolated portions of the code which can be tested. We can definitely do some form of overall diff-like testing though I think (create a |
} | ||
|
||
if kind == "benchmarks" && body["crates"].contains("total") { | ||
return Err(format!("bad value for crates {:?} with kind benchmarks", body["crates"]).into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas as to why a "benchmarks" kind can't have "total" in the crates? I'm a little confused by this.
Yeah, how we test this is a big question for me. I think in the medium term we should refactor to make portions testable, one piece at a time, and add unit tests - test coverage is a big goal for me for the rewrite. In the short-term, that is probably too much to do. Some diff testing with real data from the timing repo would be good to do, if it is not too complex. |
data.len() - 1 | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both start_idx
and end_idx
basically just clamp the value into the bounds of 0..data.len(); is there something in the stdlib better than what we do now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know about, might be worth asking on #rust though
I think I can do some dirty testing with real data. Unfortunately, due to a few changes in logic (the percent update is the main one), we can't compare every page. I think I'll wait on the tests until we're blocking on them for merge if I can't come up with some good way of doing them quickly and cheaply. My current slow and semi-painful idea is to |
It might be worth separating out any corrections into a separate commit, so we can be sure that the old version matches the new version without fixes, then add the fixes back in. Not sure if that is easy enough to be worthwhile. |
Hmm, yeah, I'm not sure how easy that would be either. I think the only major difference should be on the summary page (and this has so far proven true from visual testing); but I may be wrong. There may also be subtle changes that are more difficult to detect/fix: for example, right now I believe the dates on the summary page are off by one compared to the dates on perf.rust-lang.org. I'm not certain why this is, although I have suspicions that it has to do with my updating the data directory on my machine while the official one seems to have stopped auto-updating. |
I should have thought of this earlier, but we should move either the Rust or node implementation into a separate directory so we know what is what. |
I see a couple of options for that, not sure which is better, but I prefer either the first or the third personally. Creating a directory for a single file feels odd to me.
|
I like 1 or 2, so 1 seems like a plan we can both get behind :-) |
let mut chain = Chain::new(router); | ||
chain.link(State::<InputData>::both(data)); | ||
|
||
Iron::new(chain).http("0.0.0.0:2346").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull the ip addr into a const please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crate-level or module-level? I'm torn either way: server-specific but also semi-global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably stick at the top of the crate to make it as easy as possible to change. We'll probably read it from a config file at some point, I guess.
The intent is to increase maintainability and allow for faster development. This rewrite does not intend to change any functionality other than fixing the percent-handling code to use the proper formula. Fixes: #21.
Awesome work, thank you! |
Removes JS file and replaces backend directory with Rust port.
Fixes: #21.
Running list to keep track of major future, either in this PR or separate, work (@nrc, feel free to add):
total
phase separate, since it is a meta-phase. This is a huge project since this affects most of the codebase; as well as causing problems with the iteration over phases code: it probably expects the total phase to exist.