The proxy can learn when a website has been visited #7
Hi @vtoubiana Thanks for the feedback, we'll look into these scenarios carefully and explore potential options. One thing I would like to clarify:
The "Privacy-preserving search result link prefetching" feature is not launched yet. Through this GitHub repo and engagement with the community, we hope to refine and generalize the proposal. So, thank you again for playing a part by sharing such thoughtful feedback!
Hi @KenjiBaheux, Thank you for your answer. Regarding the fact that the feature is not launched, maybe the Privacy Whitepaper should be updated to reflect that. Indeed, the first paragraph of the document says: "This document does not cover features that are still under development, such as features in the beta, dev and canary channel and active field trials, or Android apps on Chrome OS if Play Apps are enabled." That being said, I'm glad that it was mentioned so that we can start this conversation :). Best, Vincent
Sorry for the delay and thanks again Vincent for sharing these scenarios. Here is our thinking about how to address the underlying concern:
Let us know if you have any further concerns.
I always thought I understood how private prefetch proxy works in Chrome. With the comment above, I'm confused again :)
Can you explain how that would help directly or indirectly with privacy? For example, let's say google.com inserts prefetches for foo.com and bar.com (in that sequence). If bar.com is fetched but foo.com is not, it's a clear signal to Google that the user has visited foo.com in the past. Are you suggesting that the browser's heuristics around data usage or battery may somehow result in bar.com being prefetched but not foo.com?
I think adding noise is definitely helpful, but I think it only handles a small subset of the problems. Let's say Google inserts prefetches for foo.com, bar.com, and baz.com; the user has cookies for foo.com, and Chrome is configured to do 2 prefetches. In this example, Chrome would send a credential-less prefetch for foo.com, and it would still prefetch bar.com and baz.com. The prefetch for baz.com, along with the knowledge that Chrome is configured to do 2 prefetches, is sufficient to reveal to Google that the user has browsed foo.com in the past, even though Chrome did a noisy prefetch for foo.com. This also seems vulnerable to repeated attacks: Google can try the prefetch multiple times, and a single missing prefetch would be sufficient to leak the user's browsing history.
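The two examples above can be made concrete with a small simulation. This is only a sketch of the reasoning in this comment: the function names, the `budget` parameter, and the assumption that a decoy (credential-less) request does not count against the prefetch budget are illustrative choices, not a description of Chrome's actual implementation.

```python
def observed_prefetches(candidates, cookie_domains, budget, noisy):
    """Return the domains for which the proxy sees a prefetch request.

    Hypothetical model: the browser walks the candidate list in order and
    stops once `budget` usable prefetches have been made. Domains with
    cookies are either skipped (noisy=False) or fetched as a decoy that
    does not count against the budget (noisy=True).
    """
    seen = []
    usable = 0
    for domain in candidates:
        if usable >= budget:
            break
        if domain in cookie_domains:
            if noisy:
                seen.append(domain)  # decoy request, response is discarded
            continue
        seen.append(domain)
        usable += 1
    return seen


def proxy_view(candidates, seen, budget):
    """What an observer that knows `candidates` and `budget` can deduce.

    Returns (skipped, extra): candidates that were passed over before the
    last observed request, and the number of requests beyond the budget.
    Either a non-empty `skipped` or a positive `extra` hints at cookies.
    """
    if not seen:
        return [], 0
    last = candidates.index(seen[-1])
    skipped = [d for d in candidates[:last] if d not in seen]
    extra = max(0, len(seen) - budget)
    return skipped, extra


candidates = ["foo.com", "bar.com", "baz.com"]
cookies = {"foo.com"}

skip_seen = observed_prefetches(candidates, cookies, budget=2, noisy=False)
noisy_seen = observed_prefetches(candidates, cookies, budget=2, noisy=True)

print(skip_seen, proxy_view(candidates, skip_seen, 2))
print(noisy_seen, proxy_view(candidates, noisy_seen, 2))
```

Under these assumptions, the skip policy leaks the exact domain (foo.com is missing), while the noisy policy still leaks that one of the requested domains has cookies, because three requests appear where only two were budgeted.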
That makes sense. Would be good to still make the protocol private in the meantime. I think the feature would benefit from the PING privacy review. There are other Chrome features that are already under review there (https://lists.w3.org/Archives/Public/public-privacy/2021JanMar/thread.html).
Are there any updates here? Did you get a chance to take an in-depth look at this?
Sorry for the long delay! There are two distinct scenarios and it’s helpful to consider them separately. In one case, the attacker controls both the speculation rules on the website and also the network the user is on (i.e., the evil hostel scenario in the first post). In the other case, the concern is that the referring website operator and the proxy operator collude to learn the user history, with the Google scenario from the first post being a specific case.
In the former, the user agent (i.e., Chrome) can’t know if the user is being exposed to the evil hostel attack, which means the user agent needs to have robust countermeasures. In our initial deployment model, we will not allow other websites to initiate private prefetches while we gauge interest from the community and iterate on a viable model. Before moving forward with opening up the proxy to third-party sites, we’ll need to address the evil hostel concern.
In the latter scenario where Google controls the speculation rules and the proxy, the user agent has the benefit that it can know exactly how the data is being used. The Chrome whitepaper and privacy policy will describe how information learned by the proxy is used before launch (it takes some time for those to be published), and we will follow that description. In the meantime before those documents are published, this issue describes our initial experiment. It clarifies that no user identifier is sent on requests to the proxy, and that any information learned by the proxy is used solely to facilitate anonymous prefetching and is not linked to other information from the user’s Google account.
With that said, the upcoming experiment will give us a better understanding of the bandwidth vs performance improvement tradeoff. This knowledge will help inform our discussions about mechanisms to mitigate this theoretical side-channel.
The concern we’re trying to balance is that sending prefetch requests when we know we can’t use the response consumes both user and publisher bandwidth, but does not improve the user experience.
Based on our experiments, prefetching introduced much less than 1% of additional network traffic. We are now making prefetch requests, without cookies, in all cases where Google doesn't already know whether the user has visited the site before (Google may already know, for example, for users with Sync enabled). If the user did have a cookie but we did not send it, we will not use the prefetched resource, because we can't know whether the response would have been different had we sent the cookie.
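A minimal sketch of why this policy closes the side channel discussed earlier in the thread, assuming a simplified model in which the proxy only observes which domains are requested. The `Prefetch` type and both function names are hypothetical and exist only for illustration; the case where Google already knows the visit history is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prefetch:
    domain: str
    sent_cookies: bool  # always False under the policy described above
    usable: bool        # whether the browser may serve the prefetched response


def plan_prefetches(candidates, cookie_domains):
    """Always request every candidate without cookies.

    The response is marked unusable when the browser holds a cookie for the
    domain, since the server might have answered differently with it.
    """
    return [
        Prefetch(domain=d, sent_cookies=False, usable=d not in cookie_domains)
        for d in candidates
    ]


def proxy_observation(plan):
    """The proxy sees the requested domains, not the browser-local `usable` flag."""
    return [p.domain for p in plan]


candidates = ["foo.com", "bar.com"]
with_cookie = plan_prefetches(candidates, {"foo.com"})
without_cookie = plan_prefetches(candidates, set())

# Same observable request sequence in both cases; only the local decision differs.
print(proxy_observation(with_cookie) == proxy_observation(without_cookie))
print([p.usable for p in with_cookie], [p.usable for p in without_cookie])
```

In this model the cost of the policy is the bandwidth spent on the foo.com request whose response is discarded, which is the tradeoff the experiment above measured.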
We're exploring using Chrome's private prefetch proxy (see discussion here). However, it's unclear to us whether we need to update our site's privacy policy to account for the Google proxy learning about users' past visits. Is there any guidance that you can provide to web developers?
The proxy does not learn about the user's past visits. The Google proxy does not learn anything that Google does not already know from serving the search results to the user.
Hi,
My understanding is that when the browser already has cookies for a given domain, it'll block any attempt to prefetch resources from that domain. Hence, if an entity controls:
It can learn that a website has already been visited by the browser if it sees no prefetch request for the most likely prefetch candidate.
For instance, on Google Search, that would most likely be the top search result. Hence, if the proxy knows which search results page was visited and sees no request for the top search result, it can learn that the browser has cookies stored for that site. My understanding is that this is how search result prefetching has worked since Chrome 87: https://www.google.com/chrome/privacy/whitepaper.html#netpredict
Another problematic case is when an entity that can force the browser to visit a search results page can also monitor traffic (e.g., a hostel access point that can "pop-under the login page" some Google search result pages and then listen to the traffic). If the entity sees no traffic to the CONNECT proxy, it can assume that the browser blocked the request because it already had a resource for that domain.
I don't know if I'm clear enough and/or if I missed something in the way this works.
Best regards,
Vincent