Support file:// protocol #303
Thanks for creating an issue on this about adding full file:// support, in response to my question about HTTPSB on the Chrome Web Store entitled "Whitelisting file:// URL after disabling JavaScript" (unfortunately I could find no direct URL to the question). Thinking more on this, here are two minor changes to the GUI, short of full support for file://, that would have helped with my confusion.

First, for file:// or any other protocol which HTTPSB does not currently handle, the HTTPSB matrix popup window could say that the specific protocol is not handled. That window might then also include some text suggesting the global JavaScript settings as the place to control that behavior. As it is now, the matrix popup says "No net traffic seen for this tab so far." I understand how that might be correct from HTTPSB's point of view. Still, it led me to believe that HTTPSB might start doing things for the page if only I could get the whitelisting set up correctly (with global JavaScript turned off). The matrix popup also continues to say that after the file:// page has loaded remote resources, which is again confusing. Such a change seems easier to implement than full file:// support, and it might also help avoid confusion with any other protocol if all unsupported protocols said the same thing.

Second, for whitelisting under "Ubiquitous rules" in the HTTPSB dashboard settings, the help text says: "One rule per line. A rule can be a plain hostname, or an Adblock Plus-compatible filter. Lines prefixed with '#' will be ignored." However, it was not clear to me from this whether the "@@" required for Adblock Plus whitelistings was required (or maybe optional) when specifying a rule in that particular text area. This is because in Adblock Plus, blacklist and whitelist rules otherwise live in the same file, whereas in HTTPSB the whitelist rules go in a separate area.
So I was trying to add rules both with and without the "@@", which made testing twice as hard when, as a new user, something did not work as expected. It was also not clear to me when a wildcard asterisk, as in Adblock Plus, would be required, given that the first part of the help text suggests I could just put in a hostname (so, two perhaps conflicting ways of entering rules). It was also unclear whether I needed to include the file:// protocol in this case, so I tried it with and without as well. Since there is no hostname for file://, I also tried a leading asterisk, and parts of the file name as well as the full filename. I tried leading pipe "|" characters. I also tried with and without things at the end, like partial paths with an asterisk, and also $script and $document. My point is that there is a large number of possible combinations when making such rules and they don't work as expected, as in this case with file:// support.

So, here is a suggestion (which perhaps needs a new issue) to reduce some of this ambiguity for new users and to keep them from wandering off into the Adblock Plus help forums. :-) It might be useful to have a link on the settings page near the blacklist and whitelist text areas, either to an external web page or to an internal popup help box, with some examples and an explanation of what they do. Just a few examples of what works (and maybe what doesn't, like adding "file://" or "ftp://" or whatever) might go a long way toward avoiding confusion on this. Alternatively, at the very least, the short help text on the page could explain whether "@@" is required for whitelist rules. Also, if you do add file:// support, the linked help on the settings page could mention how the whitelist and blacklist work for it.
Still, if it had been clear from the GUI that file:// was not supported, as in the previous suggestion, then I would not have been trying all these variations with whitelisting, and in that case the current advice on that page might be good enough with just a little experimentation.

Regarding lumping all file:// requests together as a common scope: that makes sense to me under Chrome, given that Chrome seems to use a common file:// origin as well for things like IndexedDB (where two different local HTML files can both access the same local data). So it would fit my expectations there. This is in contrast to Firefox, which has a much more restrictive approach: it almost tries to make every specific file its own security domain, and it creates different IndexedDB databases for every local HTML file that uses them. Firefox's main concern there was people opening downloaded HTML files and having them do unexpected things. However, that is a significant usability tradeoff in Firefox for any set of local web pages that work together as a local app (and there are lengthy Firefox Bugzilla discussions on that). There are some complex nuances in Firefox I don't fully understand about how it lumps domains together based on following internal links within file:// HTML pages versus via bookmarks, perhaps a leftover from when Firefox was considering lumping files in part of a directory subtree into the same security domain. Chrome's simpler policy is much easier to understand and so has a security benefit from that point of view, in that at least you know what to expect and what to worry about -- although it is true that opening any downloaded HTML file in Chrome then puts all your local IndexedDB data at risk.
Surprisingly, Firefox even restricts access down to the query string on files in some cases, depending on how you follow a link, as I reported here (again trying to get the same local web page to work well and running into a stumbling block): https://bugzilla.mozilla.org/show_bug.cgi?id=1005634

Please take these suggestions with a grain of salt. You might have better ideas to address the root causes, or these ideas might add too much clutter to the GUI, as you mentioned in one response on the Chrome Web Store. HTTPSB does what it does well; too much clutter or extra features like file:// support might just be a distraction, and so possibly a security risk by adding more code with minimal value. Just reporting better in the GUI about what HTTPSB does for unsupported protocols might be all most users need. Anyway, just trying to help make a great project even better!
I've thought some more about this issue, and while it is more work, and might be too cluttered, below is another approach in line with HTTPSB's mission of supporting the user in deciding whom to trust regarding file:// URLs. After that possibility, I have included some more general thoughts on what "same-origin policy" might mean for files in an HTTPSB authorization context -- which led to the design of this specific possible approach. There may well be other, better approaches, and simplicity might also be better; these are just thoughts exploring the issue and various options as a sort of "whitepaper" on this topic.

=== One possible general approach -- nested scopes based on directories down to each file

One possible approach would be to have a series of nested scopes from file:// down through each directory (and possibly subdirectories) to the specific file as a scope. These scopes would actually fit into the matrix approach, as the scopes would make up the initial rows of the matrix, with additional rows being subsequently loaded resources from the web. For comparison, even for files in one global scope, NoScript breaks out these additional web requests separately.

For example, consider the case of opening my single-page app file on a Chromebook, which loads RequireJS at startup from another file and then may load web resources later. Here is the current URL for that: file:///home/chronos/user/Downloads/Pointrel20140331/source/PointrelBootstrapLoader.html

The nested scopes (each within the previous) would be:
In practice, at least on a Chromebook, you might be able to assume the "file:///home/chronos/user/Downloads/" part, in which case the scopes might be displayed as these four:
Enabling a scope would allow anything in that directory or any subdirectory. Perhaps there could be a more fine-grained option for only the directory and not the subdirectories, but in practice I don't know if it would be used, and it would clutter the interface, since each directory would then have three options (disallow, allow, allow with subdirectories). So, for my local single-page app, I would most likely choose to allow the top-level directory "Pointrel20140331" and all its subdirectories. However, it would still work if I enabled only "Pointrel20140331/source". So, one click there in an HTTPSB matrix view to make that green, and the app would work locally.

Regarding requested local files: if I enabled just the specific file for the page, then I'd expect the require.js load to fail. However, I can see an argument that the request should succeed since it is made from a trusted file. That is more the Firefox model of implied trust. After the app runs, more items would appear for the web resources it tries to load, which would also fail until I enable them, as NoScript does. Again, though, one might argue the requests should succeed if I trusted the file that made them, or trusted an entire directory that file was in. Somehow, what I would expect is that requests for files or web resources would fail without being specifically authorized in some way. I would expect that authorizing a directory would only authorize loading and running JavaScript in those files, not in web resources they load. It seems to me that would be easier to implement, and less at risk of bugs from logic about following resources.

However, I do see one big complexity in this, which may have been your original concern. When you authorize a web resource loaded by a file, where is that information stored? Is it stored for the file itself, or for some directory along the nested scopes?
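As a rough illustration of the nested-scope idea above (a minimal sketch, not HTTPSB code -- the scope string format and the function names here are my own invention), deriving the scope chain from a file:// URL and checking a request against a set of authorized scopes might look like this, including normalization of ".." path segments:

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import urlparse, unquote

def nested_scopes(file_url):
    """Derive the nested scopes for a file:// URL, from the scheme root
    down through each directory to the specific file."""
    # normpath collapses ".." segments so a request cannot escape an
    # authorized directory via something like "dir/../../elsewhere"
    path = posixpath.normpath(unquote(urlparse(file_url).path))
    parts = PurePosixPath(path).parts
    scopes = ["file:///"]
    prefix = ""
    for i, part in enumerate(parts[1:]):      # parts[0] is the leading "/"
        prefix += "/" + part
        is_file = (i == len(parts) - 2)       # the last component is the file
        # directory scopes end with "/"; the final file scope does not
        scopes.append("file://" + prefix + ("" if is_file else "/"))
    return scopes

def is_authorized(file_url, authorized_scopes):
    """Allow a request if any enclosing scope has been authorized."""
    return any(s in authorized_scopes for s in nested_scopes(file_url))
```

With, say, the single authorized scope "file:///home/chronos/user/Downloads/Pointrel20140331/", any file under that directory (such as the require.js load) would pass, while a stray downloaded HTML file elsewhere would not.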
I don't know what is best, but probably one answer is to attach that information to the finest-level scope that applies and which has been specifically authorized (so, for HTTPSB, perhaps green as authorized vs. light green as implied authorization). In practice, people would probably most often authorize an entire directory rather than a file, and so all authorizations would apply to that directory. Still, if you download a specific file into a big pile of downloaded files, you might just authorize that specific file if you want to use it. Presumably, people could also disallow specific files, or even specific directories or directory subtrees, although I would think that would probably never be used, since if you don't trust a file sitting alongside trusted files you would either delete it or move it to another directory.

Paths would need to be checked for attempts to get outside the current directory using "..". I don't know what to do about symbolic links. Probably they should be followed, since most likely they would be made by the user? However, what if you unzip something that makes a symbolic link somewhere private (if that is possible)? Still, as a security challenge: if you have authorized a directory, what happens if code in another authorized directory tries to jump over and load libraries from that directory? This may be a security risk. But if the alternative is to just allow all file:// pages, then the presence of some remaining risk does not necessarily make this a worse approach than allowing everything. The user is, after all, granting permissions explicitly and by name to these directories, and so presumably could manage this risk.

=== A larger analysis of the issues of scope and same-origin policy for files and directories

The standard security domain for web pages under a "same-origin" policy, used to prevent cross-site scripting attacks, is usually the combination of scheme, hostname, and port, as in "http://www.example.com:8080".
There can be subtleties about what counts as equivalent schemes if they nest, and also about equivalent ports if they are not explicit. The main issue with files is that the resources are not so neatly categorized into one specific domain. So, what we are discussing here is a sort of "same-origin policy" for arbitrary files and directories and the resources they load, and how a specific policy could be implemented reasonably in HTTPSB.

Related references: the Firefox same-origin policy for files is somewhat complex (maybe too complex?) but worth at least looking at to see what could be involved. Still, I feel where Firefox goes wrong is in assuming that the browser should try to carry authorization along based on how you are clicking around in the browser. Firefox needs to do that, though, because it does not have a facility like HTTPSB for letting a user easily allow a page that is not working, and also because of other things like trying to keep IndexedDB databases somehow per-origin. So, by trying to be "automatic", and thus needing to reason about how implied authorization should flow based on user actions, the Firefox rules get complicated quickly. I still like Firefox's aspirations in regard to privacy and security; I just feel their approach and implementation may be problematic for files.

I had previously read through the Firefox Bugzilla discussions related to supporting IndexedDB and security domains, and done some thinking on my own about that. Here are the most obvious security domains for file-based HTML pages that I can think of, from the perspective of someone interested in writing HTML5/CSS3/JavaScript apps that run from local files instead of a web server, and instead of being true browser-specific apps. Options include:
There may be other logical scopes; these are just the ones I can think of right now. The possible solution of hierarchical scopes I outlined at the start is sort of a mix of these different ideas. Still, a lot of complexity runs the risk of introducing security holes, so I still feel it is acceptable to avoid the issue of file access and just give the user better GUI feedback about what HTTPSB supports, referring users to the Chrome JavaScript settings. But if HTTPSB is to have the best support for file:// URLs, these are all things to at least consider. And even if it is workable as I outlined (which would take further review), I don't know whether in practice the GUI for the matrix popup would be too cluttered if it tried to show all the nested security scopes related to a file. Somehow I feel it would look OK, though. Again, just lumping everything under "file://" is also a reasonable approach, and such an approach could be refined later.
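As an aside, the degeneracy that makes the usual (scheme, hostname, port) origin tuple unusable for file:// URLs can be seen directly with Python's standard urllib.parse (the URLs here are just illustrative examples):

```python
from urllib.parse import urlparse

# An ordinary web URL yields a full (scheme, hostname, port) origin tuple.
web = urlparse("https://www.example.com:8080/page.html")
print((web.scheme, web.hostname, web.port))
# → ('https', 'www.example.com', 8080)

# A file:// URL has an empty network location: no hostname, no port,
# so there is nothing to scope on besides the scheme and the path.
local = urlparse("file:///home/chronos/user/Downloads/app.html")
print((local.scheme, local.hostname, local.port))
# → ('file', None, None)
```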
Err... I will admit that I read only the first paragraph, sorry. I will just go with |
Makes sense. Please let me know if I can help test at some point.
Actually, I will try to extend the solution to other non-http-like schemes: |
Fixed with 0285672 (though there is no way inline javascript can be blocked for |
Currently HTTPSB supports only the http:// or https:// protocols for the matrix. file:// is ignored, and http resources pulled from a file://-based page are redirected through the chromium-behind-the-scene scope.

So investigate all the details in supporting the file:// protocol (from which no hostname can be extracted, thus no scoping, etc.). Main idea is to mash up all file://-based requests into a specific built-in scope, just like all orphan requests are mashed up into the chromium-behind-the-scene matrix.