Commit the threat model from #422 #424

Closed
wants to merge 12 commits into from
205 changes: 205 additions & 0 deletions anti-tracking-threat-model.md
@@ -0,0 +1,205 @@
# Anti-tracking Threat Model

WebKit and other browser engines are trying to reduce websites' ability to track
users across the web. We don't want Web Packaging to make this effort any more
difficult.

This document contains a proposed threat model for the best-case outcome of that anti-tracking effort. We do not yet claim there is consensus about any of:

1. Whether the [attacker capabilities](#attacker-capabilities) are plausible.
1. Whether it will eventually be possible to frustrate the [attacker goals that
we want to frustrate](#attacker-goals-that-we-want-to-frustrate).
1. Whether the mitigations actually mitigate the attacks.
1. Whether the costs of the mitigations are worth the benefits.
1. Probably other things.

Nonetheless, we feel it's important to start with a concrete proposal in mind
when discussing how to evolve both this threat model and the Web Packaging
proposals.

One argument against restricting web packaging is, “you can track users in so
many ways so why care about this one?” For example, any link can convey a
user-id hidden in the URL. Because of the effort mentioned above to reduce
tracking abilities, this document assumes that browser engines will succeed in
adding all the limits and restrictions on existing technologies necessary to
eliminate tracking through them. It then analyzes what restrictions on web
packaging are necessary to prevent it from undoing that progress.

The success of new web technologies, including signed packages, relies on better
security and privacy guarantees than what we've had in the past. We want
progress in this space, not the status quo.

## The Actors

* **The user.** This is the human who relies on the user agent to protect their
privacy.
* **The user agent.** This is the web browser that tries to protect the user's
privacy.
* **Distributor.** An entity that delivers a signed package to either the user or another distributor.
* **AdTech or `adtech.example`.** This is a distributor that the user also
engages with as a first-party site and that has a financial interest in
1. knowing what the user does on other websites to augment its rich profile
of the user and
2. individual targeting of ads, based on its rich profile of the user.
* **Publisher.** An entity that owns a domain and publishes content there. The
publisher may not actually author the content, but they put their name on it.
* **News or `news.example`.** This is a publisher for a news website which
wants its articles to be served as signed packages with the user agent's URL
bar showing `news.example`.

## Use Cases

See https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html
for use cases. It's possible some mitigations will sacrifice some use cases, but
those mitigations should call out the use cases they break.

## Attacker Capabilities

1. AdTech has significant first-party traffic which means most users have an
`adtech.example` cookie holding a unique ID, even in browsers with
multi-keyed caches.
1. AdTech can convince News to let them create packages that get signed as
`news.example`, in any of several alternative ways.

This seems like an implausible capability at first glance, but publishers
routinely give CDNs this ability to terminate TLS traffic, and a
generally-trusted AdTech could convince publishers that it's merely doing
what a good CDN would, but more cheaply.

1. News acquires a `news.example` [exchange-signing
certificate](https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html#cross-origin-cert-req)
and gives its private key to AdTech.
1. News acquires a `news.example` exchange-signing certificate and uses it to
authorize a short-lived key owned by AdTech, using a system like
[Delegated
Credentials](https://tools.ietf.org/html/draft-ietf-tls-subcerts-03).
1. News tells a CA that AdTech has permission to receive exchange-signing
certificates for `news.example`. This is the model CDNs usually use.
1. News hosts a signing service that signs packages given to News by AdTech.
This resembles the CDN "Split-TLS" model. News would expect AdTech to
fetch News's content, optimize it in some way, and send the result back to
News for signing on the fly.
1. `adtech.example` can serve a link to `news.example` and expect users to click
on it and then browse around `news.example`.
1. This link can point to a resource that redirects to `news.example`.
1. This link can point to a signed exchange or web package containing
content signed by `news.example`'s certificate.
1. AdTech can make any number of signatures with private keys it controls.

See my comment above on certificates vs private keys.


## Attacker Non-capabilities

1. AdTech cannot manipulate the DNS system on a per-user basis.
1. There is a limit to the complexity AdTech can convince News to add to its
serving infrastructure. For example, News is willing to ignore unknown query
parameters and fragments, but is not willing to ignore unexpected path
segments. TODO: Can we describe this limit any more precisely?

## Attacker goals that we want to frustrate

1. AdTech wants to augment its profile of the user while the user reads articles
on `news.example`.
1. AdTech wants to use its rich profile of the user to influence the content of
ads or articles on `news.example`.
1. More abstractly, AdTech wants to transfer its unique ID for a user to
the JavaScript environment created for `news.example`.
1. AdTech doesn't want the user agent or external auditors to be able to detect
that AdTech is tracking users.

# Attacks and their Mitigations

This section lists potential ways AdTech might achieve its unwanted goals, along
with proposed or already-adopted mitigations to frustrate those goals. Not all
of the attacks use Web Packaging, so that we can explore the threat model's
implications for the rest of the web platform.

## Sign identifying information into the package

### The Attack

1. AdTech convinces News to let them create packages that get signed as
`news.example`.
1. When a user clicks a link from `adtech.example` to
`https://adtech.example/news.example.sxg`, they send identifying information
to the `adtech.example` server. This could be cookies or a user ID encoded in
the query, path, or even hostname.
1. Instead of signing `news.example`'s content directly, AdTech embeds the
user's identity in that content and signs the result on the fly.
Collaborator

With Server Timing, we have response headers that are directly readable from JS, so embedding the UID through them might be feasible even without on-the-fly signing. So we need to make sure that the distributor cannot add arbitrary non-signed headers to the internal response, as well as that the external response Server Timing headers are not exposed to the navigated page.
I believe that's already the case, tbh, but IMO it's worthwhile to explicitly note that.

Member Author

It's a security requirement that the distributor can't add response headers to the post-redirect response. https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html#signed-headers mentions that idea, although it's in a section nobody's implementing. The application/signed-exchange format just can't represent such unsigned post-redirect response headers.

I don't think Server-Timing response headers on a 303 response get exposed to the post-redirect javascript. So I'm not sure there's really anything to say here.

1. This successfully transfers the unique ID to the `news.example` JS
environment, where it can be picked up by advertising code there.
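
The attack steps above can be sketched as follows. This is a minimal illustration, not the signed-exchange format: HMAC-SHA256 stands in for the real package signature, and `SIGNING_KEY`, `sign`, and `personalized_package` are hypothetical names invented for this example. The point it shows is that embedding a user ID before signing yields a distinct signature per user, which is the property the mitigations below try to make expensive or detectable.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical stand-in for the news.example signing key that AdTech may use.
SIGNING_KEY = b"news.example signing key (illustrative only)"


def sign(payload: bytes) -> str:
    """Stand-in for a package signature (assumption: HMAC instead of a real
    exchange signature, purely to keep the sketch stdlib-only)."""
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def personalized_package(article: bytes, user_id: str) -> Tuple[bytes, str]:
    """Embed the user's ID in the content, then sign the result on the fly."""
    body = article + b"\n<script>window.uid = '" + user_id.encode() + b"';</script>"
    return body, sign(body)


article = b"<h1>Headline</h1>"
body_alice, sig_alice = personalized_package(article, "alice-123")
body_bob, sig_bob = personalized_package(article, "bob-456")
```

Because the signed bytes differ per user, `sig_alice` and `sig_bob` differ as well, so a publisher that must enumerate its valid signatures would have to list one per user.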

### Proposed Mitigations

#### Preflight to publisher

1. The server responding with the signed package is required to send the
signature up front. This does not prevent any attacks but increases the
user-visible latency, which AdTech experiences as a cost.
2. The user agent makes an ephemeral, cookie-less preflight request to
`news.example` to get the signature and then validates the package from
`adtech.example` against that signature.

Both the
[fully-offline](https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html#fully-offline-use)
use case and [cryptographic
agility](https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html#crypto-agility)
require `news.example` to be able to declare that multiple signatures are
currently valid. This requires AdTech to send all of its signatures to
`news.example` and for News to list them all in the now-quite-large preflight
response. Having enough signatures that each one identifies a user would be
detectable by clients and might be enough cost that News wouldn't be willing
to do it.
3. We add a signed time stamp to the package signature to avoid AdTech telling
News to get signatures from the `adtech.example` backend and send personalized
signatures back as preflight responses. With such time stamps, the user agent
can decide to not accept signatures younger than, say, one minute. For this to
work we need signed, official time.

TODO: Figure out how this blocks the attack.
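
One possible shape for the user agent's side of the preflight check is sketched below. It assumes the preflight response can be modeled as a mapping from each currently valid signature to its signed time stamp, and it assumes a size threshold beyond which the list looks like per-user signatures. `MAX_SIGNATURES`, `MIN_SIGNATURE_AGE`, and `accept_package` are hypothetical names and values, not part of any specification, and the sketch does not resolve the TODO above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Assumed policy values, chosen only for illustration.
MAX_SIGNATURES = 16
MIN_SIGNATURE_AGE = timedelta(minutes=1)


def accept_package(package_signature: str,
                   preflight: Dict[str, datetime],
                   now: datetime) -> bool:
    """Decide whether to accept a package fetched from the distributor.

    `preflight` maps each signature the publisher currently declares valid
    to that signature's signed time stamp.
    """
    # A very long list suggests one signature per user.
    if len(preflight) > MAX_SIGNATURES:
        return False
    signed_at = preflight.get(package_signature)
    # The package must match a signature the publisher listed.
    if signed_at is None:
        return False
    # Reject signatures that are too young to have been made in advance.
    return now - signed_at >= MIN_SIGNATURE_AGE


now = datetime(2019, 6, 1, 12, 0, tzinfo=timezone.utc)
preflight = {
    "sig-old": now - timedelta(hours=1),
    "sig-fresh": now - timedelta(seconds=10),
}
```

With these inputs, `sig-old` is accepted while `sig-fresh` and any unlisted signature are rejected.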

#### Public signature repository

Another potential mitigation would be some kind of public repository of
signatures to check against.

This does not prevent any attacks, but could make them detectable.

#### Make package loads stateless

When requesting a signed package:

I've only given casual thought to how a webpage can instruct the browser to restrict a load because it is for a signed package. Two ideas: 1) a /.well-known/ location or 2) a special HTTP redirect similar to an upgrade.

Member Author

I don't think we need to write down all the details in this document, but I'm thinking of:

  1. In https://wicg.github.io/webpackage/loading.html#mp-http-network-or-cache-fetch 20.2, add

    • httpRequest’s credentials mode is "omit"
    • httpRequest’s method is GET
    • etc.

    to the constraints on "setting response to httpRequest’s stashed exchange's response."

  2. Add an attribute to <a> tags to let them cause credential-less fetches, and similarly anywhere else we want to enable signed packages where crossorigin isn't sufficient. There may be a more ergonomic way to do this with /.well-known/ or origin policy.

If we wind up thinking this is the right mitigation for this problem, I think we'll propose and discuss the new credentials="omit" mechanism in a separate repository, since it could also be useful independent of web packaging.


1. The request to the distributor must be credential-less, i.e. the [credentials
mode](https://fetch.spec.whatwg.org/#concept-request-credentials-mode) must
be `"omit"`. This prevents AdTech from learning the user's identity from its
first-party cookies.
1. It must be an HTTP GET request to prevent, for example, a POST request body
from including a user ID.
1. The request URL on the distributor must not have a query string. Fragments
aren't sent to the server and would be blocked after the redirect, if
necessary, by anti-tracking measures that are independent of web packages.
1. The path of the package request must be the same as the path on the target
domain to prevent the distributor from encoding a user ID in the path.
Collaborator

It also seems like the distributor can use the referrer in order to include a UID (e.g. redirect through a unique path, and have a permissive ReferrerPolicy). Mitigation can be to enforce stricter ReferrerPolicy on redirects or in general.

Member Author

As I'm saying in #424 (comment), I think we'll need to block that communication route on all redirects, so it's not special to web packages. Do you want me to add a separate attack describing it below?


Note that this still allows AdTech to encode a user ID in the *signed* path and
inject JavaScript to decode it into local variables and `pushState()` the URL
back to what the publisher's JavaScript expects.

This also still allows AdTech to encode a user ID into the hostname.
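
The four request constraints in this section can be sketched as a single check. This is an illustration under assumptions, not proposed spec text: `PackageRequest` and `is_stateless` are hypothetical names, and the request is modeled as a plain record rather than a real Fetch request.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class PackageRequest:
    method: str
    credentials_mode: str  # Fetch credentials mode: "omit", "same-origin", "include"
    distributor_url: str   # where the package is fetched from
    target_url: str        # the publisher URL the package claims to represent


def is_stateless(req: PackageRequest) -> bool:
    """Return True if the request satisfies the four constraints listed above."""
    # Fragments are never sent to the server, so ignore them when
    # looking for a query string.
    without_fragment = req.distributor_url.split("#", 1)[0]
    distributor_path = urlsplit(req.distributor_url).path
    target_path = urlsplit(req.target_url).path
    return (
        req.credentials_mode == "omit"
        and req.method.upper() == "GET"
        and "?" not in without_fragment
        and distributor_path == target_path
    )


target = "https://news.example/2019/story.html"
ok = PackageRequest("GET", "omit", "https://adtech.example/2019/story.html", target)
with_query = PackageRequest("GET", "omit", "https://adtech.example/2019/story.html?uid=123", target)
with_path_id = PackageRequest("GET", "omit", "https://adtech.example/uid123/2019/story.html", target)
with_host_id = PackageRequest("GET", "omit", "https://uid123.adtech.example/2019/story.html", target)
```

Note that `with_host_id` passes this check: none of the four constraints looks at the hostname, which is the remaining channel described above.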

## ORIGIN frame and shared connections

### The Attack

Sketch: AdTech gets a certificate that covers both `adtech.example` and
`news.example`. They associate the user's ID with the HTTP/2 connection, use the
ORIGIN frame to convince the client to make a `news.example` request over that
connection, and return a `Set-Cookie` header with the user's ID.

## AdTech subdomain

### The Attack

Sketch: AdTech convinces News to point `*.adtech.news.example` to AdTech's
servers, perhaps by offering News higher rates on ads. AdTech has users click on
a link to `userid.adtech.news.example` and returns a `Set-Cookie` header setting
a user-id cookie for all of `news.example`, followed by a redirect to the real
URL.
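
The reason this sketch works is the cookie domain-matching rule: a host may set a cookie whose `Domain` attribute is a parent domain of that host. A simplified version of the rule from RFC 6265 is shown below. It deliberately omits the IP-address and public-suffix checks that real user agents apply, and `domain_matches` is a name chosen for this illustration.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match.

    Assumption: IP-address hosts and public-suffix rejection are left out.
    """
    host = host.lower()
    # A leading dot in the Domain attribute is ignored.
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


# The AdTech-controlled subdomain may set a cookie for all of news.example.
subdomain_can_set = domain_matches("userid.adtech.news.example", "news.example")
# AdTech's own domain may not.
adtech_can_set = domain_matches("adtech.example", "news.example")
```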