integrity for downloads #68

dveditz · 2017-03-25T16:44:52Z

When we were first discussing sub-resource integrity, verifying downloads was one of the original desires. It got booted from the "MVP" early on (I can't remember why) and didn't get carried over from the old issues space to this one. Now it's time to take it up again.

If part of the concern was about navigations vs downloads and/or wanting to know whether we had to check integrity before we started the download, we could restrict it to links that also have the HTML download attribute.


mikewest · 2017-03-27T05:44:54Z

SRI for explicit downloads seems like low-hanging fruit. You're thinking something like <a href='' download integrity="...">?

I vaguely recall @bzbarsky having concerns about content-encoding: gzip, but I think @devd worked them out. Otherwise, the infrastructure should be there.
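For reference, the integrity metadata SRI already uses for scripts and stylesheets is the hash algorithm name, a hyphen, and the Base64-encoded digest (e.g. `sha384-` followed by the digest). A minimal Python sketch of how a site might compute such a value for a file it offers as a download; the helper name `sri_value` is hypothetical, and reading in chunks is an assumption made so large files need not fit in memory:

```python
import base64
import hashlib


def sri_value(path: str, alg: str = "sha384", chunk_size: int = 1 << 20) -> str:
    """Return an SRI-style integrity string ("<alg>-<Base64 digest>") for a file."""
    hasher = hashlib.new(alg)
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return f"{alg}-{base64.b64encode(hasher.digest()).decode('ascii')}"
```

The result would then be placed in the markup, e.g. `<a href="..." download integrity="...">`, with the file name and URL being whatever the site actually serves.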

mikewest · 2017-03-27T05:49:59Z

(We just need someone to sign up to do the work... Y'all volunteering? :) )

annevk · 2017-03-27T06:04:11Z

It seems the main concerns that held this back in the past (proposed as https://wiki.whatwg.org/wiki/Link_Hashes and also various alternatives, see https://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0188.html; one of which was once added to the standard: an https+aes scheme) were lack of implementer interest and the worry that the integrity value would get out of sync with the download, and the user would just use some other tool to get the resource.

annevk · 2017-03-27T06:06:22Z

Note also that unless we carve out an exception (let's not?) this will require CORS, which is new for downloads. So you end up with <a crossorigin download=... integrity=...> and you'd have to define both crossorigin and integrity for <a>.

shekyan · 2017-03-27T07:12:10Z

Sounds easy and interesting. I can try to write it up, if nobody more qualified signs up for this.

mikewest · 2017-03-27T07:39:00Z

> Note also that unless we carve out an exception (let's not?) this will require CORS

That's a good point.

We require CORS for subresource fetches because we'd otherwise be exposing the content of the resource via the hashes. Does the same apply to downloads? As far as I know, <a download> is fire-and-forget in Chrome; we don't expose a success/failure event or give the site access to its downloaded resources. Is the data exposed via one of the performance/timing APIs?

annevk · 2017-03-27T10:33:21Z

We've had requests already, e.g., in whatwg/html#954. I don't think we should try to postpone the need for safety as that will just make it very brittle.

mikewest · 2017-03-27T10:35:30Z

Got it. In that case, I completely agree that the CORS requirement is something we should keep in place.

riking · 2018-03-09T16:01:22Z

Looks like this issue has fallen by the wayside?

Content integrity for downloads has resurfaced in the news [1], including cases where an HTTPS page links to a plain-HTTP download. While those cases should be fixed, adding download integrity feels like low-hanging fruit from my uninformed point of view.

[1]: https://citizenlab.ca/2018/03/bad-traffic-sandvines-packetlogic-devices-deploy-government-spyware-turkey-syria/

Add the integrity check for `a` and `area` elements with the download attribute. It doesn't affect `a` and `area` elements without the download attribute. Known issues with this proposal:

- It doesn't define the behavior of the `crossorigin` attribute.
- It doesn't explain how to handle "open in a new tab/window" actions on links: should the user agent download the file in the same tab, or can the integrity check be performed in the new tab/window?

annevk · 2018-03-11T04:25:12Z

Given that the download attribute works in terms of navigation at the moment this actually seems even harder. Perhaps there is some way to decouple it from navigation, but that would be quite a major change to implementations.

tdelmas · 2018-03-30T22:07:10Z

I created #78 to try to push the discussion forward, as this feature could really improve the security of the global ecosystem.

annevk · 2018-04-03T06:46:01Z

Unfortunately, I don't think that helps as it doesn't address the issues.

Marcono1234 · 2020-03-15T14:44:06Z

Is there something one (with limited HTML and HTTP knowledge) can do to help with the process of this issue?

Popular software such as GIMP or LibreOffice uses mirrors, and I would expect that the average computer user does not know how to verify a download's integrity, or even that doing so is important.

Regarding the linked WHATWG mailing-list thread, it would be necessary to clarify what this issue intends to cover:

- verify integrity of linked sites / documents: ❌
  - Therefore it would make sense to call the attribute download-integrity, to make clear that it has no effect unless used for downloads (it would still require the download attribute).
- verify integrity of downloaded files: ✔️
- protect against corruption while the download is saved to the file system (due to file-system errors): ❓
  - Hashing the downloaded data on the fly (instead of re-reading the file) would be more performant.

Supporting a length value describing the size of the downloaded content in bytes would allow failing fast, even while the download is still in progress, if the content is larger than the specified length.
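A sketch of what hashing on the fly with a length limit could look like on the receiving side. It assumes the body arrives as an iterable of byte chunks (a stand-in for a network stream); `verify_stream` and `IntegrityError` are hypothetical names, not part of any browser API:

```python
import hashlib
from typing import Iterable, Optional


class IntegrityError(Exception):
    """Raised when downloaded content does not match the expected metadata."""


def verify_stream(chunks: Iterable[bytes], alg: str, expected_digest: bytes,
                  expected_length: Optional[int] = None) -> int:
    """Hash chunks as they arrive and abort as soon as the length limit is exceeded.

    Returns the total number of bytes received when everything matches.
    """
    hasher = hashlib.new(alg)
    received = 0
    for chunk in chunks:
        received += len(chunk)
        # Fail fast: no need to wait for the end of an oversized download.
        if expected_length is not None and received > expected_length:
            raise IntegrityError(f"received more than {expected_length} bytes")
        hasher.update(chunk)
    if expected_length is not None and received != expected_length:
        raise IntegrityError(f"expected {expected_length} bytes, got {received}")
    if hasher.digest() != expected_digest:
        raise IntegrityError("checksum mismatch")
    return received
```

Because the digest is updated per chunk, the file never has to be re-read from disk after it is written.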

The proposed format should also support specifying multiple checksum algorithms in case the user agent does not support all of them, which will become increasingly likely as new checksum algorithms emerge.

Therefore the following would in my opinion be a good format:

<a href="..." download download-integrity="INTEGRITY_DATA">

With INTEGRITY_DATA having this format (pseudo grammar):

INTEGRITY_DATA:
    (CHECKSUM,)+
    length:[1-9][0-9]*

CHECKSUM:
    ALG_NAME
    :
    CHECKSUM_VALUE

ALG_NAME:
    [a-zA-Z0-9-_]+

CHECKSUM_VALUE:
    Base64

Algorithm names should be clearly defined (either here or elsewhere) and should be matched case-sensitively, both to rule out variants such as "SHA-1", "shA-1" and "sHa-1", and because in some programming languages a case-insensitive comparison easily goes wrong when it uses the system language and that language has special lowercasing rules (e.g. Turkish).

The checksum bytes are Base64-encoded because even in hex notation the value can be quite long: for SHA-512 it is 128 characters in hex but only 88 in Base64. Base64 padding (trailing =) is required and must not be omitted.
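To make the size comparison concrete, a quick Python check (the input `b"example"` is arbitrary; any input gives a 64-byte SHA-512 digest):

```python
import base64
import hashlib

digest = hashlib.sha512(b"example").digest()   # 64 raw bytes
hex_form = digest.hex()                        # 2 characters per byte
b64_form = base64.b64encode(digest).decode()   # 4 characters per 3 bytes, padded

print(len(digest), len(hex_form), len(b64_form))  # 64 128 88
print(b64_form.endswith("=="))  # True: 64 bytes leave a 1-byte remainder, so two "=" are appended
```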

Example:
<a href="example.com/download" download download-integrity="length:1245667025,md5:1B2M2Y8AsgTpgAmY7PhCfg==,sha256:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=">
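A minimal parser sketch for the proposed attribute value, to show how the rules above (case-sensitive names, mandatory Base64 padding) could be enforced. Since the grammar places `length` last while the example places it first, the sketch assumes entries may appear in any order; `parse_integrity_data` and `IntegrityData` are hypothetical names:

```python
import base64
import binascii
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

_ALG_NAME = re.compile(r"[A-Za-z0-9_-]+")
_LENGTH = re.compile(r"[1-9][0-9]*")


@dataclass
class IntegrityData:
    checksums: Dict[str, bytes] = field(default_factory=dict)  # alg name -> raw digest
    length: Optional[int] = None


def parse_integrity_data(value: str) -> IntegrityData:
    """Parse comma-separated 'name:value' entries (assumed to be in any order)."""
    result = IntegrityData()
    for entry in value.split(","):
        name, sep, payload = entry.partition(":")
        if not sep or not _ALG_NAME.fullmatch(name):
            raise ValueError(f"malformed entry: {entry!r}")
        if name == "length":
            if result.length is not None or not _LENGTH.fullmatch(payload):
                raise ValueError(f"bad length entry: {entry!r}")
            result.length = int(payload)
            continue
        # Names are stored exactly as written: matching is case-sensitive.
        if name in result.checksums:
            raise ValueError(f"duplicate algorithm: {name}")
        # Padding is mandatory, so a valid value is always a multiple of 4 long.
        if not payload or len(payload) % 4 != 0:
            raise ValueError(f"missing or unpadded Base64: {entry!r}")
        try:
            result.checksums[name] = base64.b64decode(payload, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"invalid Base64 in {entry!r}") from exc
    return result
```

Applied to the example above, this yields `length == 1245667025` plus a 16-byte `md5` digest and a 32-byte `sha256` digest.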

User agent behavior

If length is present, then the user agent must use it to verify the integrity.
If multiple checksums are present, it may pick any of them; it is advised to pick the strongest one.

If no checksum algorithm is supported, it may show a warning to the user or simply ignore the checksum information. It may also display the algorithms and checksum values to the user so they have a chance to verify the integrity manually.
Note: It might make sense to add a warn-if-none-supported:true/false value to the download-integrity attribute. The default value is false. If true, the user agent must warn the user. The use case would be mirror sites, where failing to verify the integrity could have security implications.
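The selection and warning rules above can be sketched as follows. The preference order is an illustrative assumption (the proposal leaves the algorithm registry to be defined elsewhere), and `choose_checksum` / `needs_warning` are hypothetical helpers:

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumption: an illustrative strongest-first ordering, not a defined registry.
PREFERENCE = ("sha512", "sha384", "sha256", "sha1", "md5")


def choose_checksum(checksums: Dict[str, bytes],
                    supported: Iterable[str]) -> Optional[Tuple[str, bytes]]:
    """Return the strongest (name, digest) pair the user agent supports, or None."""
    available = set(supported)
    for name in PREFERENCE:
        if name in checksums and name in available:
            return name, checksums[name]
    return None


def needs_warning(choice: Optional[Tuple[str, bytes]],
                  warn_if_none_supported: bool = False) -> bool:
    """Warn only when nothing can be verified and the page asked for a warning."""
    return choice is None and warn_if_none_supported
```

In Python, `hashlib.algorithms_available` could serve as the `supported` set, since its names use the same lowercase spelling as the example.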

If the integrity was successfully verified, the user agent is encouraged to indicate this to the user. This should be displayed as informational text (so the user knows they do not have to verify the integrity manually) and must not create a false impression of security, e.g. that the file is free of viruses (similar to the green lock icon formerly shown in the URL bar for HTTPS sites).

If the integrity check fails, the user must be informed that the file may be corrupted, may have been modified by an attacker, or that the site is incorrectly configured. The user agent is encouraged to advise the user to contact the site administrator. The user agent must offer the user two options: deleting the file (preferred) and keeping the file. Unlike what is described in the WHATWG wiki, it should not use the term "Quarantine", since on most (if not all) operating systems that would just be another folder. User agents are encouraged to place the downloaded content in the OS's "Downloads" folder only once the user has accepted to keep the file. Otherwise the user might see the file in the "Downloads" folder and open it before noticing the user agent's warning.
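A simplified sketch of the staging idea: the body is first written to a private temporary file and only moved into the Downloads folder if it verified or the user explicitly chose to keep it. A real user agent would write the stream to disk as it arrives; `stage_download` and `resolve_download` are hypothetical names, and the system temp directory stands in for whatever private location a browser would use:

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_download(data: bytes) -> Path:
    """Write the download to a private staging file, not the Downloads folder."""
    fd, name = tempfile.mkstemp(prefix="pending-download-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return Path(name)


def resolve_download(staged: Path, downloads_dir: Path, filename: str,
                     verified: bool, user_keeps: bool) -> Optional[Path]:
    """Move the file into Downloads only if it verified or the user chose to keep it."""
    if verified or user_keeps:
        target = downloads_dir / filename
        shutil.move(str(staged), str(target))
        return target
    staged.unlink()  # deleting is the preferred outcome after a failed check
    return None
```

The file therefore never appears in the Downloads folder while the warning is still pending.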

Hopefully this comment is useful and not too intrusive. I tried to write down my thoughts as precisely as possible. Any feedback is welcome :)

tdelmas · 2020-03-15T15:29:47Z

@annevk What are the blocking points on this issue? What needs to be discussed to move it forward?

It is an important security issue for all websites using mirrors/CDNs for downloads.

There is no workaround for it (VLC tried using JS to download the file into memory and compute the checksum, but that has a lot of drawbacks: browser compatibility is terrible, it requires CDNs to add CORS headers, and it doesn't work well with large files).

mozfreddyb · 2020-03-16T15:19:57Z

> Given that the download attribute works in terms of navigation at the moment this actually seems even harder. Perhaps there is some way to decouple it from navigation, but that would be quite a major change to implementations.

@annevk, wasn't download respecified as based on fetch?

khuguenin · 2020-03-16T17:47:11Z

We wrote an article (https://serval.unil.ch/resource/serval:BIB_9BD511E5C0D0.P001/REF) on checksum verification recently and suggested extending SRI to handle downloads. We wrote an explainer: https://github.com/checksum-lab/checksum-lab.github.io/blob/master/README.markdown
One issue with the download attribute on links (mentioned above) is that it is restricted to same-origin links, which is the case where checksums make the least sense (https://www.w3schools.com/tags/att_a_download.asp).

mozfreddyb · 2020-03-17T08:08:31Z

I can answer parts of my own question to annevk from above. Downloading a hyperlink is specified in HTML.

@khuguenin same-origin or cors-same-origin, no? It would suffice if the CDN/mirror sent a header of `access-control-allow-origin: *`, which many CDNs do and already have to do for SRI with scripts/styles.
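A deliberately simplified sketch of the check this header would satisfy for a non-credentialed request: the response passes if `Access-Control-Allow-Origin` is `*` or exactly the requesting origin. It ignores credentials mode, preflights, and the other details of the Fetch standard's CORS check; `cors_allows` is a hypothetical helper:

```python
from typing import Mapping


def cors_allows(response_headers: Mapping[str, str], request_origin: str) -> bool:
    """Simplified CORS check for a non-credentialed request."""
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed == "*" or allowed == request_origin
```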

tdelmas · 2020-03-17T09:36:48Z

@mozfreddyb I think requiring CORS would reduce the use of checksums, because not all mirrors/CDNs support it. If the download is "fire and forget" and the original page has no way to know whether the download is complete or valid, then I do not see a reason to require CORS. (Also, if the mirrors/CDNs do have CORS, JavaScript could already do the checksum itself today.)

mozfreddyb · 2020-03-17T10:27:42Z

How do we ensure the download is (and remains) unobservable? I see there's the request's initiator set to download in the spec, but I'm not entirely sure that it can not be forged. I'd like to hear an expert's opinion here (@annevk, probably :))

devd · 2020-03-17T19:38:34Z

I feel like we can start with spec'ing with CORS; that's gonna be hard enough. Let's not increase difficulty level to max.


annevk · 2020-03-20T13:23:51Z

What HTML says about downloads isn't entirely in line with implementations. Basically, navigation can result in a download (Content-Disposition) so it's all handled there. The download attribute is an additional input to the navigation algorithm to force downloads. I don't remember the crucial differences unfortunately, but any change here would be rather involved I'm afraid.

jb-wisemo · 2022-11-16T14:00:49Z

This feature should not be postponed or redefined for things other than specifying the uncorrupted hash of download.

Accordingly, this reduces to the following simple changes to the SRI specification:

- The integrity attribute (as already specified) is valid for any HTML element that specifies a URI with any protocol.
- The CORS requirement in the SRI specification shall not apply to any resource that would not otherwise be checked by the rules in the CORS specification. Downloads and alternative URLs are just examples of this, but so are subresources downloaded with other protocols such as FTP and TFTP.

Note that nothing in the SRI specification and concept depends on whether the user agent uses the "fetch" specification.

As a logical consequence, the following would all apply:

Specifying integrity for an ordinary page link shall cause the loading of the linked page to fail with an appropriate error (not warning) if the page doesn't match. CORS does not (by default) apply to these links. This is useful for having a trusted document delivered in an off-web secure way (such as S/MIME e-mail) refer to stable documents online. This link hashing can be chained to unlimited depth as long as the author avoids dependency loops (e.g. a.html specifies the hash of b.html, which specifies the hash of c.html, which specifies the hash of a.html).
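A small sketch of why such chains must be acyclic: each page can only embed the hash of a page whose final bytes already exist, so documents have to be produced leaf-first. The `integrity` attribute on a plain navigation link is the hypothetical extension proposed here, and `sri` is an illustrative helper:

```python
import base64
import hashlib


def sri(data: bytes) -> str:
    """SRI-style sha384 integrity string for a document's bytes."""
    return "sha384-" + base64.b64encode(hashlib.sha384(data).digest()).decode()


# Leaf-first: c.html is final before b.html embeds its hash, and so on.
c_html = b"<p>Stable leaf document</p>"
b_html = f'<a href="c.html" integrity="{sri(c_html)}">c</a>'.encode()
a_html = f'<a href="b.html" integrity="{sri(b_html)}">b</a>'.encode()

# Any edit to c.html changes its hash, so the link in b.html would no longer match.
assert sri(c_html + b" ") not in b_html.decode()
# A loop (c.html linking back to a.html) cannot be built: a.html's bytes depend on
# c.html's hash, which would in turn depend on a.html's hash.
```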

Specifying integrity for a download link (with or without the download attribute) shall cause the download to fail with an appropriate error (not warning) if the file doesn't match. This is useful for any download provided via a CDN or other third-party server. CORS does not (by default) apply to these links.

Specifying integrity for an image, sound, video, applet, script or font that doesn't match shall result in a failed subresource download (broken image symbol etc.). CORS does (by default) apply to these.

Alternative URIs in IMG tags etc. are not subject to the generic integrity attribute (it wouldn't match), but new attributes could be introduced to specify their hash values. For many of these, CORS does (by default) apply, but conceivably, new extensions to HTML could introduce alternative URIs for things to which CORS does not (by default) apply.

Alternative documents available via HTTP or other content negotiation mechanism will need their own enhancement of the SRI specification, perhaps by providing the hash of a list/tree of resource hashes where that list/tree is provided in the negotiation server response. However the basic specification for URIs that return a stable byte stream should not wait for such enhancements.

devd added the SRI-next label Mar 27, 2017

devd added this to the v2 milestone Mar 27, 2017

BigBlueHat mentioned this issue Mar 21, 2018: "secure urls - poor man's ipfs" beakerbrowser/beaker#43 (Closed)

mozfreddyb added the feature-request label Jul 2, 2019

mlissner mentioned this issue Apr 18, 2022: "Apply subresource integrity to <img> tags" #113 (Open)

adrelanos mentioned this issue Jun 2, 2023: "check integrity of downloaded files" rndme/download#120 (Open)

adrelanos mentioned this issue Jun 14, 2023: "check integrity of downloaded files" jimmywarting/StreamSaver.js#321 (Open)

jayaddison mentioned this issue Nov 29, 2023: "SRI: Integrity enforcement on downloads" w3c/webappsec#497 (Open)
