-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show Contents of Readme on Crate Pages #869
Conversation
225d8b9
to
1308f1c
Compare
@carols10cents This PR should be ready for review now :) Tell me if you need anything concerning this. |
Cargo.toml
Outdated
@@ -49,6 +50,9 @@ serde_derive = "1.0.0" | |||
serde = "1.0.0" | |||
clippy = { version = "=0.0.142", optional = true } | |||
chrono = "0.4.0" | |||
pulldown-cmark = { version = "0.0.15", default-features = false } | |||
ammonia = "0.5.0" | |||
syntect = "1.7.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dependency version specification, did we want to leave out the specific patch version e.g. 0.5
compared to 0.5.0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I just copied and pasted the requirement line from the crates' pages (example). If I understand correctly the cargo documentation, this is just a stylistic issue. I'd be happy to change it if that's what we want, but I'm not sure this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave out the patch version, we get the latest patch version when the Cargo.lock file gets updated via cargo update
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand... The current lock file references diesel 0.14.1
despite having diesel = 0.14.0
in the Cargo.toml. Where is the behavior you describe documented ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes you're right, apologies, I should have read the cargo documentation as well.
src/uploaders.rs
Outdated
@@ -54,35 +54,54 @@ impl Uploader { | |||
} | |||
} | |||
|
|||
pub fn readme_location(&self, crate_name: &str, version: &str) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a documentation comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it ! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to get crates.io to cooperate on Debian sid, I'll be able to review this fully during this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing!!! So much work, thank you so much!!!!! I have a few small review comments, and I have some overall thoughts I'm going to leave in a non-review comment, but I'm super excited for this!!! ❗️ ❣️
src/uploaders.rs
Outdated
@@ -82,40 +82,34 @@ impl Uploader { | |||
format!("readmes/{}/{}-{}.html", name, name, version) | |||
} | |||
|
|||
/// Uploads a file using the configured uploader (either S3, Local or NoOp). | |||
/// It returns a a tuple containing the path of the uploaded file | |||
/// and its checksum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this refactoring!!!!! 😻 😻 😻 😻
src/uploaders.rs
Outdated
response | ||
}; | ||
if handle.response_code().unwrap() != 200 { | ||
if let Err(e) = self.delete(req.app().clone(), &crate_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I noticed from the old code that's not in the new code is this delete
call if the response isn't 200-- I think what this does is makes sure we delete anything on s3 in case, say, the file was partially uploaded. Was there a reason for removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This delete
call is here to remove the uploaded crate if the readme couldn't be uploaded. It technically wasn't removed. In the new code, I'm creating the Bomb
for the uploaded crate before uploading the readme so that if the readme upload fails a CargoResult::Error()
is returned, triggering the Bomb and deleting the uploaded crate file. I hope I got the Bomb
behavior correct.
src/bin/render-readmes.rs
Outdated
version.num, | ||
path.display() | ||
)); | ||
match file { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL this code was really confusing to me until I looked at the tar
docs and saw that tar::Entries
iterates over Result<Entry>
😂😂😂
Since everywhere else in find_file_by_path
is calling expect
and crashing, wdyt about calling expect
instead of this match
and making this method return just String
rather than Option<String>
, and removing the unwrap
on line 200?
I'm not entirely sure what the error would be here, so I don't know what to say in the expect
. I was going to suggest making the unwrap
on line 200 be an expect
that says "file not found" but I see that there's already an expect
that says just that in this method!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I might have done a little bit too much expect
/unwrap
for a script supposed to be long-running though it would seem that panic
'ing in a thread does not affect the main one.
While I'm at it, do we want to refactor the threading model on this script to use a real threadpool and not create a thread for each crate ? It would certainly improve performance but we couldn't expect
/unwrap
as much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that we're at the scale where we need to be concerned about performance at that level
src/bin/render-readmes.rs
Outdated
} | ||
|
||
/// Creates and Arc over an App instance. | ||
fn make_app() -> Arc<App> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplication with src/bin/server.rs
is a bit 😿😿😿
Do you think it would be possible to factor out the duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely agree with you ! Do you know where we would want this, as a new function in the app
module (app::make
) ? As a new method on the App
structure (App::from_env
) ? Somewhere else ?
src/render.rs
Outdated
} | ||
|
||
/// Renders the given markdown to HTML using the current settings. | ||
pub fn to_html(&self, text: &str) -> CargoResult<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!!!!!!
Again, this is looking awesome!! Since this is such a big change though (which again, is awesome!!!), I want to do a bit more review and testing before getting this out in production.
@vignesh-sankaran I'm definitely interested in how this affects development-- let's try to get your development environment fixed. Please ping me if you're still having trouble! |
Oh ummmm one more thing I need to debug before merging: when I run the tests locally on osx with 1.19.0 (and 1.18.0), I get an invalid memory access 😱😱😱 end of output of
This doesn't make much sense to me because it worked on travis! What are you running, @kureuil? Have you ever seen this? @vignesh-sankaran once you get your dev env up, please try running the tests and report back! |
@carols10cents I've never seen this during development of this PR. I wrote this code using Rust 1.18 and nightly-2017-07-04 on Fedora 25 and 26. Do you have any more logs, or a coredump file ? |
@carols10cents Yes I have run into that problem in macOS, and I don't know how to figure out what dependency is causing that problem :(. The Cargo.lock file has been updated, I'll try it again. All tests run fine on Linux, specifically Debian Sid. The problem I'm specifically having is not knowing which part of the fn new() function handling the /crates/new endpoint requires superuser permissions. This could take me a while to figure out since I'll have to log everything happening in that function until I see where the 500 error is being thrown. @kureuil, have you run into this problem? I can't seem to publish a crate locally to the locally hosted backend unless I run it as |
@vignesh-sankaran I did not run into this problem when I was developing, and I even tested it with the local uploader. However, I did not test the local upload after the refactor of |
@kureuil Ah yes that'd also explain the 404 I ran into when trying to run this in the local dev environment, since the directory to upload with READMEs doesn't exist. Specifically, the front end couldn't find the README file to display. The permissions problem actually exists even on master, so I'll still have to look into this. This gives me a starting point to work with though :). EDIT: @carols10cents, ok so I'm still running to problems on macOS, it could be one of the 4 new dependencies we're adding to |
Little update on this, I think I found the source of the segfault. The culprits are git2 and syntect and their regex functions. git2 wraps libgit2 which has a regfree function (declared as a weak alias) and syntect uses the onigumura regex library which also defines a regfree function which because it is not declared as a weak alias is overriding libgit2's function. So when libgit2 think it is calling its function it is in reality calling onig's one. I don't even know how it is reliably working on linux... I'll remove it for now and will make the syntax highlighting client-side for now. I'll try to submit another PR later when this issue will be fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll document private types as well, see if we can get them displayed on the crates.io documentation pages as discussed here
src/uploaders.rs
Outdated
fn crate_path(name: &str, version: &str) -> String { | ||
// No slash in front so we can use join | ||
format!("crates/{}/{}-{}.crate", name, name, version) | ||
} | ||
|
||
fn readme_path(name: &str, version: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment please :)
Yeah, I have the CORS config and a custom bucket policy to make all my uploaded objects public, but I think this one is already in place.
|
This removes the need for an app instance in the render-readmes script
So I tried running the render readmes script on staging by using
Indeed, there is no index checkout, because tmp directories aren't shared between dynos, and Good news is the render readmes script doesn't actually need a git checkout! It needs a curl::Easy handle, an uploader, and a database connection. These all happen to be conveniently available if you have an App, but it isn't too terrible to get them without one. I had to change uploader upload a bit, since we want to use the app's proxy handle for testing, but the render readmes script doesn't need to use the app-configured proxy, so I just passed in the curl::Easy handle rather than the app. After these changes, I was able to run the render readmes script on staging! It took like 10 seconds to do 26 versions, most of which don't have readmes, so not a very representative test. I got the CORS settings worked out with alex and I've got this deployed on staging. Interesting ones:
So it looks like it's working! I'll probably deploy this sometime tomorrow when I'm able to monitor the processing and impact :) |
let's queue this up (i'm about to get on a plane so i'm not going to deploy it til later) bors: r+ |
869: Show Contents of Readme on Crate Pages r=carols10cents Hey everyone, I started working on README rendering earlier and I thought that I should get some feedback now that it is working correctly on my local instance. I edited the `/crates/new` endpoint so that when a crate is uploaded to the registry, its README (markdown only) is rendered to HTML and a `readmes/{crate}/{crate}-{version}.html` file is created on S3 when the crate file is uploaded there. Then, when visiting a crate page, the Ember application makes a request to `{s3}/readmes/{crate}/{crate}-{version}.html`: if the file is found, its content is rendered in place, if not nothing happens. This way, the renderd README is cached and versioned, following @ashleygwilliams [recommendations](#81 (comment)). On the `crate/version` route, the README contents is displayed in place of the package description. The generated HTML is sanitized using [Ammonia](https://crates.io/crates/ammonia). This crate removes all tags and attributes that are not whitelisted, making the resulting markup as safe as possible for the end user. Bonus point: is uses the html5ever crate (created by the Servo project) to parse HTML. ~~~Codeblocks syntax highlighting is done server-side using [syntect](https://crates.io/crates/syntect). This crate uses Sublime syntax files to highlight code samples, which means that we automatically have access to a lot of pre-existing grammars and we can easily add/create missing ones.~~~ Syntax highlighting is now done client-side using PrismJS due to syntect's dependency onigumura conflicting with git2. Finally, I added a new script in `src/bin` named `render-readmes`. Its goal is to render the README of every uploaded crate version and to upload it on S3. This means that if modify the renderer behavior and we want to retroactively apply it, we're just one command (and a lil' bit of time) away ! (Though I might have left too much `unwrap()`/`expect()` in it) TODO: * [x] Security checks in `render::markdown_to_html` (script tag, javascript callbacks, more ?) * [x] Tests * [x] Documentation * [x] ~Edit delete-crate & delete-version scripts to delete README files~ It would seem that the scripts only delete database entries, not stored files * [x] Script to render published crates README (?) Fixes #81 Preview: ![screenshot from 2017-07-10 00-22-45](https://user-images.githubusercontent.com/1275463/27998080-f83bad7a-6505-11e7-884e-95ee4f7c2a47.png)
Build succeeded |
IT'S ALIVE!!!!!! (render-readmes is still running :) but everything is looking great!!!!!) |
Is this documented anywhere on crates.io's website? Sorry, I couldn't find any just now |
@alfiedotwtf Looks like it's rendering here now, was this the crate in question?. I'm not across how exactly readme rendering works, but I think it takes a little while to render server side? Otherwise it could be an add on preventing the loading of the readme on the client side? |
Yeah, I think that's the confusion... I read about this pull request a while back in "This Week in Rust", and so I added a README.md to my crate this morning, but when I Looking at the cargo.io documentation, I couldn't find anything on how to trigger the rendering on the crate page. So I ended up looking at other crates that did have it working, and found that to trigger this feature I needed to add the "readme" config item in Config.toml :) |
Hmm sounds like something that could be made clearer, could you perhaps open an issue up in the cargo repository? doc.crates.io's source code is currently in the cargo repo for some reason :). |
Ah, thanks for the pointer... done: rust-lang/cargo#4620 |
Hey everyone, I started working on README rendering earlier and I thought that I should get some feedback now that it is working correctly on my local instance.
I edited the
/crates/new
endpoint so that when a crate is uploaded to the registry, its README (markdown only) is rendered to HTML and areadmes/{crate}/{crate}-{version}.html
file is created on S3 when the crate file is uploaded there.Then, when visiting a crate page, the Ember application makes a request to
{s3}/readmes/{crate}/{crate}-{version}.html
: if the file is found, its content is rendered in place, if not nothing happens.This way, the renderd README is cached and versioned, following @ashleygwilliams recommendations.
On the
crate/version
route, the README contents is displayed in place of the package description.The generated HTML is sanitized using Ammonia. This crate removes all tags and attributes that are not whitelisted, making the resulting markup as safe as possible for the end user. Bonus point: is uses the html5ever crate (created by the Servo project) to parse HTML.