-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Put hashes in asset filenames, not querystrings #118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I updated the file_fingerprint test to match the change.
71b2559
to
aa72284
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this has gone in already, but I've got some concerns about the approach we've taken here, which I'm finding quite subtle and pretty hard to follow. Was this an urgent change, or could it have waited a day or two? I appreciate this isn't my field of expertise, so I'd be interested to hear @tombye's thoughts on the approach as well.
Currently our specified image is not being shown on twitter cards, for example in https://twitter.com/msuxg/status/1407254592333697024 In our html we currently have ``` <meta property="og:image" content="https://www.gov.uk/alerts/assets/images/opengraph-image.png?f86f1d0dd106bfbcd8ad1ee5ea68e75e"> ... <meta property="og:image" content="/assets/images/govuk-opengraph-image.png"> ``` where the first is the image we want to show and the second is html added by the GOV.UK Frontend. https://github.com/alphagov/govuk-frontend/blob/v3.12.0/src/govuk/template.njk#L26 Note the second meta image currently 404s for us because it is a relative url rather than an absolute url. Twitter should be following the spec of open graph, where the first image is given preference and it ignores the second - https://ogp.me/#array. However, I think it is not following this correctly and that is why our image is not showing on twitter cards. This hypothesis backed by a manual test by me to remove the second meta tag and I see that twitter renders our card with our first image correctly. Therefore it leaves us with a few options 1. Get twitter to fix a potential bug with their twitter card implementation 2. Change the second meta tag to use an absolute URL 3. Get the design system to remove their fallover tag 4. Add in specific meta tags for twitter images Option 1 and 3 will take time and effort. Option 2 may not work because we won't have a file hosted in our assets directory at that location because of #118 Therefore I've gone for option 4 because I've tested it and it appears to work and it is also quick and non controversial. You can see the spec documented at https://developer.twitter.com/en/docs/twitter-for-websites/cards/overview/markup
This is 90% working now. Just need to figure out why |
Just in case it helps: would it be easier to get the Python code to do the renaming? It already knows where the compiled assets are, and how to fingerprint them, so it's not a big step to rename them as well. |
@benthorner Totally forgot that Ollie had a go at doing it in Python: OllieJC@6126d5e |
The only problem with doing it all in python is that the JavaScript build generates a sourcemap to give the browser a view of the uncompressed code. This is linked to the compressed code by a comment at it's end, containing the name of the sourcemap. In other words, if we rename the compressed file, we need to rename this too. If rollup (the library that builds the JS) is allowed to insert the hash itself, it will update this reference too. All that being the case, we've been looking at how we can keep the hash generation in the frontend build but make it work with the python. Caveat: we could just do a search-and-replace on the sourcemap comment but messing with the contents of files doesn't seem a great idea if we can avoid it. |
Thanks for your thoughts @tombye. I've wondered about source maps in the past: why do we have them? If they're just a nice-to-have, then we could just get rid of them instead of spending more effort to make them work. |
It can be very hard to debug JS in the browser without them as we obfuscate our code as well as minify it. They aren't necessarily needed on production but I can't seem much value in doing extra work to remove them when:
|
Me and @quis have been working on some changes which move this to the approach I mentioned in #118 (comment). This new approach:
That means as long as assets have 8 character SHAs in their name, we trust Gulp to generate the right SHA for their contents. This also includes:
|
Thanks for the update @tombye, that does sound like a simpler approach: all the knowledge about the hashing is in one place, and the Python code just needs to care about knowing what the file names are (a simpler goal). I noticed there are quite a few "WIP" commits. Can we squash those at all? That might make the PR easier to review. |
Using querystring hashes is a DoS vulnerability because it means that a request with a new querystring will miss the edge and pass through to the origin.
Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
This requires a bit of bodging because `gulp-hash-filename` sees the filename and extension for `file.js.map` as `file.js` and `.map`. We should be able to do this with rollup by adding `entryFileNames: '[name]-[hash].js'` to its output config but that seems to use a different hashing algorithm so is incompatible with `gulp-hash-filename` and our backend code 🤡
Changes the plugin used to add a SHA to the filenames of all file types except JavaScript. For JavaScript files, use Rollup's own hashing method. Note: this method means the output file matches that input so references to it in templates need to change. This commit also creates a copy of each file it adds a SHA to, so the python filter code can find it without needing to know the SHA at any point.
It turns out that rollup's 'input' option allows you to define the 'name' property (used in the format string that defines the final name of output files). This can be done by passing in an object containing name:path pairs. For example: 'input': { 'foo': '/path/to/file.js' }, 'output': { 'entryFileNames': '[name].js' } ...will produce 'foo.js' These changes use this feature to return the JavaScript files to the names they had before and updates the HTML to reflect this. See the docs for the input option for more detail: https://www.rollupjs.org/guide/en/#input
The JavaScript files produced by rollup have SHAs in their names that are not just generated from their final contents*. Because of this, it is tricky to determine their filename. This proposes a different approach to the file_fingerprint filter, changing it so it gets the SHA'ed filename by regexp rather than using the same hashing as the gulp task. *This issue contains some useful information on how rollup generates its hashes: rollup/rollup#2839
Includes: - fix for how options are sent to gulp plugin that does the asset SHA'ing - give the HTML5 shiv a SHA in its name - add quotes to hrefs specified on certain tags without them* - stop the static asset build producing duplicate files for those with hashes in the name, these aren't used to verify the hash any more - remove unused gulp plugins and moves the gulp-sha256-filename plugin into dev dependencies - add hash to name of SVG fallback images *HTML5 allows the current form (which is why it all works) but it's a good idea to quote them for consistency.
Some image URLs are baked into the GOVUK Frontend components so can't be passed through our file_fingerprint filter, to get a SHA. Because of this, we need to keep them without the SHA.
1. updates the existing one so it covers the new behaviour 2. adds a test for when a file can't be found to match the path sent
I went through those commits, squashed them into one and added a summary of the combined changes. Hopefully this should help. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying @tombye ⭐.
I ended up looking at the overall diff: from a reviewers perspective, the commits are hard to follow because they tell a meandering story of dead ends and small fixes (stuff I don't need to know). Fortunately the overall diff wasn't too hard to look at in one go. Definitely something I'll be pushing for in future PRs I review, though. As a quick suggestion, I reckon you could squash all but the last two commits into one that explains the approach.
Anyway, nice one for pursuing this and finding a single approach for it, especially with the weird rollup [name]-[hash]
bit 💯. I've only got a few more minor comments, so I'll approve this now.
gulpfile.js
Outdated
}) | ||
}), | ||
// Terser is a replacement for UglifyJS | ||
rollupPluginTerser() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't mentioned in the commit. I can see all the other entries have it, so I'm assuming it's intentional. (Another argument for DRYing up 😉.)
gulpfile.js
Outdated
@@ -69,7 +91,8 @@ const javascripts = { | |||
] | |||
}).then(bundle => { | |||
return bundle.write({ | |||
file: paths.dist + 'javascripts/govuk-frontend-details.js', | |||
dir: paths.dist + 'javascripts/', | |||
entryFileNames: '[name]-[hash].js', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a long time to find the code that generates [hash]
, to be sure it will always be 8 chars long. Would have been helpful to have a reference in the commit message that introduces this: https://github.com/rollup/rollup/search?q=digest
We’re calling Rollup three times, repeating the same config each time. We can make this less duplicative by having each task share the same config. I’ve done this by making a wrapper function, rather than making the config itself reusable. The way I’ve done it avoids more duplication. Unfortunately I couldn’t work out to do this in such a way that let the tasks keep their names in the console output.
We moved the hash from the query string to file name in this PR: #118 The current code is based on the file name being unchanged.
Using querystring hashes is a DoS vulnerability because it means that a request with a new querystring will miss the edge and pass through to the origin.