This repository has been archived by the owner on Nov 15, 2017. It is now read-only.

Support file:// protocol #303

Closed

gorhill opened this issue May 26, 2014 · 6 comments

Comments

@gorhill
Owner

gorhill commented May 26, 2014

Currently HTTPSB supports only the http:// or https:// protocols for the matrix. file:// is ignored, and http resources pulled from a file://-based page are redirected through the chromium-behind-the-scene scope.

So, investigate all the details of supporting the file:// protocol (from which no hostname can be extracted, hence no scoping, etc.). The main idea is to mash all file://-based requests into a specific built-in scope, just as all orphan requests are mashed into the chromium-behind-the-scene matrix.
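For illustration, a minimal sketch of that idea, assuming a scope is keyed by the hostname of the page making the request (the function and scope names here are hypothetical, not HTTPSB's actual code):

```js
// Hypothetical sketch: derive a scope key from the URL of the page a
// request belongs to. file:// URLs carry no hostname, so all such
// requests get mashed into one built-in pseudo-scope, the same way
// orphan requests fall into the behind-the-scene scope.
var BEHIND_THE_SCENE_SCOPE = 'chromium-behind-the-scene';
var FILE_SCHEME_SCOPE = 'file-scheme';

function scopeKeyFromPageURL(pageURL) {
    if (typeof pageURL !== 'string' || pageURL === '') {
        return BEHIND_THE_SCENE_SCOPE; // orphan request
    }
    if (pageURL.lastIndexOf('file://', 0) === 0) {
        return FILE_SCHEME_SCOPE; // no hostname, no per-site scoping
    }
    var match = /^https?:\/\/([^\/:]+)/.exec(pageURL);
    return match !== null ? match[1] : BEHIND_THE_SCENE_SCOPE;
}
```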

@pdfernhout

Thanks for creating an issue about adding full file:// support in response to my question about HTTPSB on the Chrome Web Store, entitled "Whitelisting file:// URL after disabling JavaScript" (unfortunately I could find no direct URL to the question). In thinking more on this, here are two minor changes to the GUI, short of full file:// support, that would have helped with my confusion.

For file:// or any other protocol which HTTPSB does not currently handle, it might be good if the HTTPSB matrix popup said that the specific protocol is not handled. That window might then also include some text suggesting a look at the global JavaScript settings to control that behavior. As it is now, the matrix popup says "No net traffic seen for this tab so far." I understand how that might be correct from HTTPSB's point of view. Still, it led me to believe that HTTPSB might start doing things for the page if only I could get the whitelisting set up correctly (when global JavaScript was turned off). The matrix popup also continues to say that after the file:// page has loaded remote resources, which is again confusing. Such a change seems easier to implement than full file:// support, and it might also help avoid confusion with any other protocol if all unsupported protocols said the same thing.

For whitelisting under "Ubiquitous rules" in the HTTPSB dashboard settings, it says: "One rule per line. A rule can be a plain hostname, or an Adblock Plus-compatible filter. Lines prefixed with ‘#’ will be ignored." However, it was not clear to me from this whether the "@@" required for Adblock Plus whitelist rules was required (or maybe optional) when specifying a rule in that specific text area. This is because such blacklist and whitelist rules otherwise live in the same file for Adblock Plus, whereas for HTTPSB the whitelist rules go in a dedicated area. So I was trying to add rules with and without the "@@", which made testing twice as hard in a case where something did not work as expected for a new user. It was also not clear to me when an asterisk wildcard, as in Adblock Plus, would be required, given the first part of that help text suggests I could just put in a hostname (so, two perhaps conflicting ways of entering rules). Also, it was not clear whether I needed to include the file:// protocol in this case, so I tried it with and without as well. Since there was no hostname for file://, I also tried a leading asterisk and parts of the file name as well as the full filename. I tried leading pipe "|" characters. I also tried with and without things at the end, like partial paths with an asterisk, and also $script and $document at the end.

My point is that there is a large number of combinations possible in making such rules when they don't work as expected, as in this case with file:// support. So, here is a suggestion (perhaps it needs a new issue) to reduce some of this ambiguity for new users and to keep them from wandering off into the Adblock Plus help forums. :-) It might be useful to have a link on the settings page near those blacklist and whitelist text areas, either to an external web page or to an internal popup help box, with some examples and an explanation of what they do. Just a few examples of what works (and maybe what doesn't, like adding "file://" or "ftp://" or whatever) might go a long way toward avoiding confusion here. Alternatively, at the very least, the short help text on the page could explain whether "@@" is required for whitelist rules. Also, if you do add file:// support, you might mention in linked help on the settings page how the whitelist and blacklist work for it.

Still, if it had been clear from the GUI that file:// was not supported, as in the previous suggestion, then I would not have been trying all these variations with whitelisting, and in that case the current advice on that page might be good enough with just a little experimentation.

Regarding lumping all file:// requests together as a common scope, that makes sense to me under Chrome, given Chrome seems to use a common file:// origin as well for things like IndexedDB (where two different local HTML files can both access the same local data). So it would fit my expectations there. This is in contrast to Firefox, which has a much more restrictive approach: it almost tries to make every specific file its own security domain, and creates different IndexedDB databases for every local HTML file that uses them. Firefox's main concern there was people opening downloaded HTML files and having them do unexpected things. However, that is a significant usability tradeoff in Firefox for any set of local web pages that work together as a local app (and there are Firefox Bugzilla discussions at length on that). There are some complex nuances currently in Firefox I don't fully understand about how it lumps domains together based on following internal links within file:// HTML pages versus via bookmarks, perhaps a leftover from when Firefox was considering lumping files in part of a directory subtree into the same security domain. Chrome's simpler policy is much easier to understand, and so has a security benefit from that point of view, in that at least you know what to expect and what to worry about -- although it is true that opening any downloaded HTML file in Chrome then puts all your local IndexedDB data at risk. Surprisingly, Firefox even restricts access down to the query string on files in some cases, depending on how you follow a link, as I reported here (again trying to get the same local web page to work well and running into a stumbling block): https://bugzilla.mozilla.org/show_bug.cgi?id=1005634

Please take these suggestions all with a grain of salt. You might have better ideas to address the root causes, or these ideas might add too much clutter to the GUI, like you mentioned in one response on the Chrome Web Store. HTTPSB does what it does well; too much clutter or extra features like file:// support might just be a distraction and so possibly a security risk by adding more code with minimal value. Just reporting better in the GUI about what HTTPSB does for unsupported protocols might be all most users need. Anyway, just trying to help make a great project even better!

@pdfernhout

I've thought some more about this issue, and while it is more work, and might be too cluttery, below is another approach in line with HTTPSB's mission of letting the user decide whom to trust regarding file:// URLs. After that possibility, I include some more general thoughts on what a "same-origin policy" might mean for files in an HTTPSB authorization context -- which is what led to the design of the specific approach. There may well be other, better approaches, and simplicity might also be better; these are just thoughts exploring this issue and various options, as a sort of "whitepaper" on the topic.

=== One possible general approach -- nested scopes based on directories down to each file

One possible approach would be to have a series of nested scopes from file:// down through each directory (and possibly subdirectories) to the specific file as a scope. These scopes would actually fit into the matrix approach, as the scopes would make up the initial rows of the matrix, with additional rows being subsequently loaded resources from the web. For comparison, even for files in one global scope, NoScript breaks out these additional web requests separately.

For example, consider the case of my opening my single-page app file on a Chromebook, which loads RequireJS at startup from another file and then may load web resources later. Here is the current URL for that: file:///home/chronos/user/Downloads/Pointrel20140331/source/PointrelBootstrapLoader.html

The nested scopes (each within the previous) would be:

  • file://
  • home
  • chronos
  • user
  • Downloads
  • Pointrel20140331
  • source
  • PointrelBootstrapLoader.html
  • require.js (the extra loaded file, at the same level as the previous file)

In practice, at least on a Chromebook, you might be able to assume the "file:///home/chronos/user/Downloads/" part, in which case the scopes might be displayed as these four (a sketch of deriving such scopes follows the list):

  • Pointrel20140331
  • Pointrel20140331/source
  • Pointrel20140331/source/PointrelBootstrapLoader.html
  • Pointrel20140331/source/require.js
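A hypothetical sketch of how such nested scopes could be derived from a file:// URL (no such function exists in HTTPSB; this just makes the lists above concrete):

```js
// Split a file:// URL into a list of nested scopes, coarsest first
// (the scheme itself) and finest last (the file itself).
function nestedScopesFromFileURL(fileURL) {
    var path = fileURL.replace(/^file:\/\//, '');
    var segments = path.split('/').filter(function(s) { return s !== ''; });
    var scopes = ['file://'];
    var prefix = 'file://';
    for (var i = 0; i < segments.length; i++) {
        prefix += '/' + segments[i];
        scopes.push(prefix);
    }
    return scopes;
}

// nestedScopesFromFileURL('file:///home/chronos/user/Downloads/' +
//     'Pointrel20140331/source/PointrelBootstrapLoader.html')
// -> ['file://', 'file:///home', ..., 'file:///home/chronos/user' +
//     '/Downloads/Pointrel20140331/source/PointrelBootstrapLoader.html']
```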

Enabling a scope would allow anything in that directory or any subdirectory. Perhaps there could be a more fine-grained option for only the directory and not its subdirectories, but in practice I don't know if it would be used, and it would clutter the interface, since directories would then have three options (disallow, allow, allow with subdirectories).

So, for my local single-page app, I would most likely choose to allow the top-level directory "Pointrel20140331" and all its subdirectories. However, it would still work if I enabled just "Pointrel20140331/source". So, one click there in an HTTPSB matrix view to make that green, and the app would work locally.

Regarding requested local files: if I enabled just the specific file for the page, then I'd expect the require.js load to fail. However, I can see an argument that the request should succeed since it is made from a trusted file; that is more the Firefox model of implied trust. After the app runs, more items would appear for web resources it tries to load, which would also fail until I enable them, as NoScript does. Again, though, one might argue the requests should succeed if I trusted the file that made them, or trusted an entire directory that file was in. Still, what I would expect is that requests for files or web resources fail unless specifically authorized in some way. That is, authorizing a directory would only authorize loading and running JavaScript in the files it contains, not in web resources those files load. It seems to me that would be easier to implement, and less at risk of bugs from logic about following resources.

However, I do see one big complexity in this, which may have been your original concern. When you authorize a web resource loaded by a file, where is that information stored? Is it stored for the file itself, or for some directory along the nested scopes? I don't know what is best, but one answer is probably to attach that information to the finest-level scope that applies and that has been specifically authorized (so, for HTTPSB, perhaps green as authorized vs. light green as implied authorization). In practice, people would probably most often authorize an entire directory rather than a file, and so all authorizations would apply to that directory. Still, if you download a specific file into a big pile of downloaded files, you might just authorize that specific file if you want to use it.
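One way to picture that lookup, as a hypothetical sketch building on the one above: walk the nested scopes from finest to coarsest and attach (or find) authorizations at the first explicitly authorized level. Again, all names are made up for illustration:

```js
// Return the finest scope enclosing fileURL that the user explicitly
// authorized, or null if none is; rules for web resources loaded by
// the file would then be stored under the returned scope key.
function finestAuthorizedScope(fileURL, authorizedScopes) {
    var scopes = nestedScopesFromFileURL(fileURL); // earlier sketch
    for (var i = scopes.length - 1; i >= 0; i--) { // finest first
        if (authorizedScopes.hasOwnProperty(scopes[i])) {
            return scopes[i];
        }
    }
    return null; // not authorized: the load fails
}
```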

Presumably, people could also disallow specific files, or even specific directories or directory subtrees, although I would think that would probably never be used: if you don't trust a file sitting alongside trusted files, you would either delete it or move it to another directory.

Paths would need to be checked for attempts to get outside the current directory using "..". I don't know what to do about symbolic links. Probably they should be followed, since most likely they would have been made by the user? However, what if you unzip something that creates a symbolic link to somewhere private (if that is possible)?
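Checking for ".." could amount to normalizing the path before any scope prefix-matching, so a request cannot climb out of an authorized directory. A small sketch under that assumption (symbolic links, by contrast, are invisible at this level: the extension only ever sees URLs, not resolved filesystem paths):

```js
// Normalize a file path, resolving '.' and '..' segments; return null
// if '..' would climb above the root. Prefix-matching a normalized
// path against an authorized directory then cannot be escaped.
function normalizedPathOrNull(path) {
    var out = [];
    var segments = path.split('/');
    for (var i = 0; i < segments.length; i++) {
        var s = segments[i];
        if (s === '' || s === '.') { continue; }
        if (s === '..') {
            if (out.length === 0) { return null; }
            out.pop();
        } else {
            out.push(s);
        }
    }
    return '/' + out.join('/');
}
```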

Still, as a security challenge: if you have authorized a directory, what happens if code in another authorized directory tries to jump over and load libraries from the first one? This may be a security risk. But if the alternative is to just allow all file:// pages, then the presence of some remaining risk does not necessarily make this a worse approach than allowing everything. The user is, after all, explicitly granting permissions to these directories by name, and so presumably could manage this risk.

=== A larger analysis of the issues of scope and same-origin policy for files and directories

The standard security domain for web pages under a "same-origin" policy, intended to prevent cross-site scripting attacks, is usually the combination of scheme, hostname, and port, as in "http://www.example.com:8080". There can be subtleties about which schemes are equivalent when they nest, and which ports are equivalent when they are not explicit.
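For reference, a toy sketch of extracting that origin triple (hypothetical code; a real implementation would also normalize implicit default ports such as 80 and 443):

```js
// Reduce a URL to its (scheme, host, port) origin; returns null for
// URLs with no extractable hostname, such as file:/// URLs.
function originOf(url) {
    var m = /^([^:]+):\/\/([^\/:]+)(?::(\d+))?/.exec(url);
    if (m === null) { return null; }
    return m[1] + '://' + m[2] + (m[3] ? ':' + m[3] : '');
}

// originOf('http://www.example.com:8080/a/b') -> 'http://www.example.com:8080'
// originOf('file:///home/user/page.html')     -> null
```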

The main issue with files is that the resources are not so neatly categorized into one specific domain. So, what we are discussing here is a sort of "same-origin policy" for arbitrary files and directories and the resources they load, and how a specific policy could reasonably be implemented in HTTPSB. Related references:
http://en.wikipedia.org/wiki/Same-origin_policy
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://code.google.com/p/browsersec/wiki/Part2
http://www.w3.org/Security/wiki/Same_Origin_Policy

Firefox's same-origin policy for files is somewhat complex (maybe too complex?) but worth at least looking at to see what could be involved. Still, I feel where Firefox goes wrong is in assuming that the browser should try to carry authorization along based on how you are clicking around in the browser. Firefox needs to do that, though, because it has no facility like HTTPSB's for letting a user easily say a page is allowed when it is not working, and also because of other things like trying to keep IndexedDB databases per-origin. So, by trying to be "automatic", and thus needing to reason about how implied authorization should flow from user actions, the Firefox rules get complicated quickly. I still like Firefox's aspirations with regard to privacy and security; I just feel their approach and implementation may be problematic for files.
https://developer.mozilla.org/en-US/docs/Same-origin_policy_for_file:_URIs

I had previously read through the Firefox Bugzilla discussions related to supporting IndexedDB and security domains, and done some thinking of my own about that. Here are the most obvious security domains for file-based HTML pages that I can think of, from the perspective of someone interested in writing HTML5/CSS3/JavaScript apps that run from local files instead of a web server, rather than as true browser-specific apps.

Options include:

  • Use a scope of the file:// level itself. As above, this is what NoScript supports. Chrome also lumps all IndexedDB references together. Based on those examples, one could argue this is good enough. It is easy to understand, and easy to implement. It has the downside, though, that any downloaded web page can access the IndexedDB resources (and maybe other resources) of any other downloaded web page. It is common for users to download some web page for off-line reference or archiving and then open it locally later.
  • Use a scope of the specific file itself. This is the other end of the security-restriction spectrum. (Well, you could even restrict down to the query string sometimes, as Firefox does, but as above I feel that is a bug.) When I reflect on this, I still feel it is OK that Chrome allows IndexedDB database access across all downloaded files to better support local HTML apps, vs. Firefox's database scope per local page. However, it still might be good to only allow a downloaded page to use JavaScript features like IndexedDB when you have chosen to give that specific downloaded page special authorization. For me, writing a single-page app, this seems acceptable. However, since even my single-page app loads an external library, I can wonder whether that library should be allowed by default. Somehow it almost feels like an external library should not be loadable without explicit authorization (which is more trouble to manage, though). However, an app like TiddlyWiki shows what is possible without loading any external libraries. And I could bundle RequireJS inside my HTML page, I suppose.
  • Use a scope of the specific directory only (and all the files in that specific directory). As some people point out on the Firefox Bugzilla discussions, there is a use case where you want to give users a set of HTML files and scripts that represent a complex application. This is instead of putting all those files up on a webserver somewhere. There might be privacy reasons for wanting the data local, or there might be practical reasons like the application is used on laptops in rural areas where there is no reliable network access or wireless access is expensive. Typically these apps may use IndexedDB, WebSQL, or localStorage to share data across these pages. When you install such an app, you know where you put it and you know you trust it. You might not trust extra resources it downloads though, but presumably these could be downloaded to another directory or a subdirectory. Whether such trust should extend to loaded web resources is debatable, as above; the most secure answer is to require additional authorization for such resources.
  • Scope of a directory and all its subdirectories. In practice, it is common to have a directory with an index page and then have subdirectories that might have more HTML files or local JavaScript libraries. So, in practice, users who want to trust a directory probably also want to trust all the subdirectories under it. Downloaded untrusted content could presumably be put in another directory outside that hierarchy.
  • Scope of a specific well-defined "application". This is what Chrome is doing with local apps (including HTTPSB as an example). You download the app from the Chrome Store (or wherever) and authorize it to do specific things. This is another solution for running single-page or multi-page local apps. The problem here is that such apps need to be written with a manifest specifically for Chrome. You can't just download a dynamic web page (like for TiddlyWiki) and expect it to work locally as a Chrome app. If only well-defined apps are supported, you can't just unzip some multi-page app and expect it to work on any browser (to the extent the browser supports it given other security restrictions like with Firefox and IndexedDB).
  • Scope of a declared "application" within the HTML pages. I don't think anyone has a standard for doing this. But in theory, pages could announce they belong to some security scope representing some app. So, if you download the different pages at different times and put them in different storage locations, they could still work together. This may sound like a security issue, and it no doubt is, but in practice this is what Chrome supports with IndexedDB, given any downloaded page can access the same database if it knows its name.
  • Scope of where the web page was downloaded from. This is another logical possibility. I don't know of any browser that tracks enough information to make this possible. However, one could imagine a browser doing so. Then you would make a security choice of whether the downloaded pages could be authorized as a group, or whether they might inherit the authorizations of the site they were downloaded from. While no browser may support this, if you have the scope-by-directory model above, you can get some of this by downloading one site's files all to the same directory and then authorizing that directory.

There may be other logical scopes. These are just the ones I can think of right now. The possible solution of hierarchical scopes I outline at the start is sort of a mix of these different ideas. Still, a lot of complexity runs the risk of introducing security holes, so I still feel it is acceptable to avoid the issue of file access and just give the user better GUI feedback about what HTTPSB supports, referring users to the Chrome JavaScript settings. But if HTTPSB is to have the best support for file:// URLs, these are all things to at least consider. And even if it were workable as I outlined (which would take further review), I don't know if in practice the GUI for the matrix popup would be too cluttery if it tried to show all the nested security scopes related to a file. Somehow I feel it would look OK, though. Again, just lumping everything under "file://" is also a reasonable approach, and such an approach could be refined later.

@gorhill
Owner Author

gorhill commented May 27, 2014

Err... I will admit that I read only the first paragraph, sorry.

I will just go with file:// mashed up into one hard-coded scope. I just checked, and this is how NoScript and RequestPolicy work too. Anything more than this at this point is way overboard IMO.

@pdfernhout

Makes sense. Please let me know if I can help test at some point.

@gorhill
Owner Author

gorhill commented May 28, 2014

Actually, I will try to extend the solution to other non-http-like schemes: data:, opera:, etc. -- well, anything that can be seen by the webRequest API.
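A hypothetical sketch of that generalization, extending the earlier scope-key idea: any scheme visible to the webRequest API but lacking a hostname collapses into its own per-scheme built-in scope (names invented for illustration):

```js
// Map any observable URL to a scope key: hostname for http(s),
// otherwise a per-scheme pseudo-scope like 'file-scheme' or
// 'data-scheme'.
function scopeKeyFromURL(url) {
    var m = /^([a-z][a-z0-9+.-]*):/.exec(url);
    if (m === null) { return 'behind-the-scene'; }
    if (m[1] === 'http' || m[1] === 'https') {
        var h = /^https?:\/\/([^\/:]+)/.exec(url);
        return h !== null ? h[1] : 'behind-the-scene';
    }
    return m[1] + '-scheme';
}
```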

@gorhill
Owner Author

gorhill commented May 31, 2014

Fixed with 0285672 (though there is no way inline JavaScript can be blocked for file://; maybe browser settings can address that, I didn't try. In any case it will be for users to deal with this: HTTPSB won't play with the browser's JavaScript settings).
