Scaling issues with non-automated verification process #48

Open
krgovind opened this issue Aug 10, 2021 · 2 comments
Comments

@krgovind
Collaborator

Opening this issue to capture feedback from TAG member, @rhiaro:


Great to see expansion on this, but I'm still concerned about how this would work in practice. I acknowledge that it's (rightfully) still very open and subject to evolution. But I'm imagining either:

* there is one list that all UAs have come to agreement on policy on, and they all pull from the same list. I see the practicality of this, but it's dependent on UAs agreeing policy (not a guarantee) and for something with such a broad scope (any site on the web can play) feels far too much centralisation.
* sites have to submit their set for approval to *every* UA, which doesn't seem realistic. Even with just mainstream UAs right now, assuming they all implement FPS, I can see site admins submitting requests to MS, Apple, Mozilla and Google, but not bothering with Opera, Samsung, let alone Vivaldi, Brave, Tor... And what about mobile browsers? Whether or not requests need to be submitted to eg. Firefox mobile separately from Firefox desktop may not be obvious to site admins.
* the middle ground: some UAs agree policy and share a list, other UAs might fork the list, others might use the list but take liberties with how they implement it (eg. customise it). Besides fragmentation issues, this becomes confusing for site owners with regards to knowing where to submit their sets for verification.

Basically I'm just struggling to imagine how the verification process to determine whether sites are legitimately in the same set can reasonably scale to the whole web with this level of centralisation, and without [automation](https://github.com/privacycg/first-party-sets/issues/43).

_Originally posted by @rhiaro in https://github.com/privacycg/first-party-sets/pull/45#discussion_r656032925_
@krgovind
Collaborator Author

krgovind commented Oct 1, 2021

Hi @rhiaro - I'd like to bring your attention to this expanded proposal for the UA policy, which we published since your feedback on PR #45; I hope it answers your questions.

  • there is one list that all UAs have come to agreement on policy on, and they all pull from the same list. I see the practicality of this, but it's dependent on UAs agreeing policy (not a guarantee) and for something with such a broad scope (any site on the web can play) feels far too much centralisation.

Our proposal is indeed that all UAs come to agreement on the policy and pull from the same list (see Responsibilities of the User Agent for more). The policy is inspired by prior art in the ecosystem, such as the definition of "party" in the DNT specification, which was published as a W3C Candidate Recommendation in 2016; so we are hopeful that we can at least come to rough consensus on the principles, and then work through smaller, more specific objections as they come up.

You also pointed out concerns about centralization and scalability.

Regarding the centralization concern, I was curious whether you think the challenges here are substantially different from those of the lists used elsewhere in the web platform, such as the Public Suffix List or the HSTS Preload list? Or perhaps the feedback is based on issues you've seen with those lists? The "Responsibilities of the User Agent" section that I cited above intends to address some of the existing challenges with those lists.

Regarding the scalability concern, we hope that our expanded UA policy proposal alleviates this. Specifically, if you look at the section on the responsibilities of the enforcement entity, we are proposing that the entity doesn't have to check every single request manually, but only performs random spot checks. Technical consistency checks will indeed be performed on each request, however; and to the extent possible, we would like to automate as many checks as possible.

If I understand correctly, I believe that the TAG's preference is for a technical-only mechanism. In fact, that is exactly where we started with the original version of this proposal. However, we eventually introduced the UA policy to address concerns around the potential for abuse, which other browsers pointed out to us in #6 and #7. A technical-only mechanism was also contrasted with the analogous Disconnect Entities list that is currently used by Firefox and Edge in their default Tracking Protection modes. This list is curated based on a policy that is documented here, and you will see some similarities in how tracking is defined (emphasis mine): "the collection of data regarding a particular user's or device's activity across multiple websites or applications that aren't owned by the data collector, and the retention, use or sharing of that data."

We hope to strike a balance between scalability and abuse-resistance by having acceptances primarily based on self-attestations and technical checks, along with supplemental accountability measures such as a publicly auditable log, random spot checks, and a mechanism for users and civil society to report potentially invalid or policy-violating sets. We think that the public self-attestations will play an important role in deterring abuse because, as footnote #1 in this section points out, "[Public] Misrepresentations about an entity's ownership/control of a site that lead to the collection of user data outside of the First Party Sets policy would be enforceable in the same way that misrepresentations or misleading statements in privacy policies are."

dmarti added a commit to dmarti/first-party-sets that referenced this issue Jan 13, 2022
Add IEE role in surveys of users to check that they understand
common identity.

(It would be impractical to leave this to the browser and site
author, especially in cases where the browser and site author
have a business relationship that would be influenced by FPS
validity or invalidity.)

Refs WICG#43 WICG#48 WICG#64 WICG#76
@krgovind
Collaborator Author

@rhiaro - I think my previous answer is due for an update. We significantly updated this proposal based on ecosystem feedback (summary in #92).

  • Regarding the concern about whether different UAs can come to an agreement on sharing the same set: note that the update now requires the use of either requestStorageAccess or requestStorageAccessForOrigin.
    • The former is already shipping in a few major browsers, and relies on user-agent-defined heuristics/allowlists/other logic to determine how the browser mediates this request for access to unpartitioned cookies. We propose that the FPS list be incorporated into this user-agent-defined logic. We anticipate user agents that don't support FPS will fall back on their own set of heuristics/allowlists.
    • requestStorageAccessForOrigin allows for similar user-agent-defined logic, and we are currently incubating that proposal within the PrivacyCG, where we hope to align with non-FPS-supporting browsers on how to avoid prompt spam where First-Party Sets are not used (privacycg/requestStorageAccessFor#2).
    • Note that we intend to make the list available for use by any user agents that intend to implement FPS; so user agents may also choose to use FPS to augment their user-agent-defined logic, such as improving user prompts by displaying FPS-declared information.
  • Regarding scaling to the web, we have further eliminated the need for spot checks by completely automating the process of validating sets. These detailed Submission Guidelines explain the process of submitting a set. We rely on a combination of technical checks (e.g., verifying administrative control over sites by looking for .well-known files, checking that a given list of sites are ccTLD variants of each other by comparing against the known list of ccTLDs on the Public Suffix List, placing a numeric limit of 3 domains on the associated subset, etc.).
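
The automated technical checks described in the bullet above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual submission-validation code: the constant names, the tiny stand-in ccTLD set, and the limit of 3 associated domains are assumptions drawn only from the description above, and a real validator would additionally fetch each site's .well-known resource over HTTPS to confirm administrative control.

```python
# Hypothetical sketch of automated set-validation checks (not the real
# Submission Guidelines code). KNOWN_CCTLDS stands in for the full ccTLD
# list derived from the Public Suffix List.

MAX_ASSOCIATED_DOMAINS = 3          # numeric limit on the associated subset
KNOWN_CCTLDS = {"de", "fr", "jp", "uk"}  # illustrative subset only


def is_cctld_variant(domain_a: str, domain_b: str) -> bool:
    """True if the domains share the same leading label(s) and differ
    only in their TLD, where at least one TLD is a known ccTLD."""
    label_a, _, tld_a = domain_a.rpartition(".")
    label_b, _, tld_b = domain_b.rpartition(".")
    if label_a != label_b:
        return False
    return tld_a in KNOWN_CCTLDS or tld_b in KNOWN_CCTLDS


def validate_set(primary: str, associated: list[str]) -> list[str]:
    """Run the purely technical checks and return a list of errors.
    (A real validator would also verify .well-known files here.)"""
    errors = []
    if len(associated) > MAX_ASSOCIATED_DOMAINS:
        errors.append(
            f"associated subset exceeds limit of {MAX_ASSOCIATED_DOMAINS}"
        )
    return errors
```

For example, `is_cctld_variant("example.com", "example.de")` would pass, while a set declaring four associated domains would be rejected by the numeric limit without any human review.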

Please let us know if you see any unresolved issues with the latest version of the proposal.
