Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback from Mozilla #158

Closed
annevk opened this issue May 14, 2019 · 29 comments
Closed

Feedback from Mozilla #158

annevk opened this issue May 14, 2019 · 29 comments

Comments

@annevk
Copy link
Member

annevk commented May 14, 2019

We'd like to support reporting beyond CSP's report-uri in Firefox, but are uncomfortable with the design of the Reporting API as-is. Our proposal is that for v1 of the feature its functionality is strictly equivalent to that of CSP's report-uri. More specifically:

  • Remove the storage functionality and tie reporting policies to documents and workers directly. The lifetime of any reporting URL will then match that of the document or worker, equivalent to CSP. This avoids introducing new tracking vectors and adding significant complexity to v1 of this feature. Our suggestion is that Origin Policy/Manifest is flushed out further for any and all origin-wide policies, provided the tracking issues can be solved.
  • Remove the failover functionality. This isn't needed for a v1 of the API and goes beyond what we offer for equivalent APIs elsewhere.
  • Switch from JSON to Structured Headers for HTTP while we are making breaking changes. There appears to be quite broad support for making Structured Headers a thing, whereas JSON headers do not have that support.

This would enable Firefox to ship this functionality and start generating reports for various existing and new features (e.g., we've gotten feedback that reporting for Cross-Origin-Opener-Policy is rather essential).

@igrigorik
Copy link
Member

That's great news, glad you guys are investigating adding support! Would love to work together to address any questions or issues.

To start, a bit of context.. The intent and goal of Reporting is to provide shared infrastructure to different report generators. As of today, that already includes Network Error Logging, Feature Policy, CSP3, Web Package, and there is probably a few more that I'm forgetting.

With that in mind, functionality such as ability to register a report policy is a must have requirement for NEL and ditto for failover: these requirements were informed and motivated by long history of production experience with Chrome's "domain reliability" precursor. Further, I'll note that there are currently some features defined by NEL (e.g. sampling rate) that we're hearing developers asking us to move to Reporting for use cases like CSP as well — "we don't need every single violation report".

Which is to say, based on already existing integrations, I don't think we can simply drop all of that functionality on the floor. That said, perhaps we can define a path where Firefox is able to incrementally adopt a subset of said functionality? E.g. as a start, enable CSP reporting via Reporting.

@annevk
Copy link
Member Author

annevk commented May 14, 2019

From those, it seems that NEL has very domain-specific requirements. It also seems it has some kind of same-site requirement that the others do not share if I'm reading things correctly. It's not immediately clear to me how a shared abstraction makes sense if document policies work for the others?

And yeah, part of trying to move this along is unblocking whatwg/html#4622 and our Feature Policy work (though given the scope of recently discussed changes there the exact integration with the Reporting API might well change) which could both use this.

@johnwilander
Copy link

We (Apple) would like to express our support for removing the Storage part. For such a functionality to be feasible in browsers with modern privacy protections, partitioning of the storage would have to be integrated which might defeat its intended purpose.

@dcreager
Copy link
Member

dcreager commented Jun 26, 2019

I'd be in favor of letting Origin Policy take over the responsibility of maintaining the Reporting and NEL policies for an origin, using caching headers on the Origin Policy to take the place of our max_age fields. Though until it's ready, I'd want there to still be some language that allows Chrome's current behavior.

In the meantime, I think the current spec includes enough flexibility to let Mozilla and Apple implement something less persistent. For instance, the Garbage collection section currently allows UAs to drop endpoints and reports after deadlines that can be arbitrarily small. Chrome's current implementation keeps its cache purely in memory, which means that Reporting and NEL policies don't persist across browser restarts. (Though there's ongoing work to change that.)

Would you all find it acceptable to add clarifying language that (a) a user agent is free to drop an endpoint or report from the cache at any time (making it more explicitly okay to implement "not really caches"), and (b) we want to eventually migrate the storage responsibilities over to Origin Policy or something similar?

(Note that there would still need to be some kind of cache for the reports themselves, since those are batched up and sent at some point in the future, and not as a blocking or immediate part of servicing the request that generates the reports.)

@clelland
Copy link
Contributor

clelland commented Jul 5, 2019

I agree that in practice there's a need for an out-of-document cache of some sort, especially to enable things like NEL and crash reporting. The spec doesn't need to enforce that though.

I think that framing the reporting mechanism itself as best-effort, non-guaranteed delivery helps here. If agents are permitted to drop reports for any reason -- including not sending reports that may impact user privacy -- then implementers should have leeway to experiment in that space.

Are the concerns around privacy any broader than what has been spelled out in the spec? (Specifically in 11.1 and 11.5)

Could we call out in a spec note that the requirements on the reporting cache specifically allow for a per-document or per-origin cache, that it may persist only in RAM, or may be destroyed along with the document, and that its only purpose is to enable batching of reports as the user agent sees fit, and to increase reliability by allowing failed deliveries to be retried?

@annevk
Copy link
Member Author

annevk commented Jul 15, 2019

I think it's problematic if we have a shared feature that works wildly different across user agents. Given the existing market dynamics that will result in a set of bug reports against Firefox and Safari, basically (at best).

Also, when I look at the complete feature set and what we'd like the difference is quite stark. The way I view it is that a document should be able to declare zero or more "report endpoints", each with a "name" and "url".

I'd much rather support this via a new header that can use the structured value syntax (as likely recommended by the HTTP WG) than something custom.

@igrigorik
Copy link
Member

@annevk do you have any suggestions for how we could accomplish what you've described while still enabling NEL + crash reporting to exist, without having to rebuild their own version of Reporting?

@annevk
Copy link
Member Author

annevk commented Aug 8, 2019

I'm not sure why crashes could not be per-document. We have been thinking about OOM reporting at least and we don't think we need any kind of origin-wide store for that. We can use the header that came with the response that created the agent cluster that OOM'd.

For NEL I'm not sure as it requires addressing the tracking issue somehow (perhaps restrict to top-level + cookie lifetime), but I don't see why having separate registration syntax is prohibitive. It can reuse all of the other infrastructure as far as I can tell.

@igrigorik
Copy link
Member

Re, crashes: yeah, I think you're right. I'm trying to remember why I originally included that in the set, but the reason escapes me at the moment.

For NEL I'm not sure as it requires addressing the tracking issue somehow (perhaps restrict to top-level + cookie lifetime).

Hmm, interesting.

Re, cooking lifetime: there might be an interesting parallel here to where we landed for CH based on discussions at IETF 109 (see httpwg/http-extensions#878), where we opted to drop explicit lifetime in favor of implicit opt-in registration and persistence. Perhaps there is a similar (same?) argument to be made here? /cc @yoavweiss @martinthomson

Re, top-level: can you elaborate on this one a bit more?

I don't see why having separate registration syntax is prohibitive. It can reuse all of the other infrastructure as far as I can tell.

Well, the point of Reporting was to eliminate the need for different registration syntax between features, so there's that. However, I think the larger issue here is related to failover, retries, etc. In your original comment you noted that it's not necessary for v1 of API, but that is functionality that (at least when NEL was designed and as informed by "domain reliability" implementation experience in Chrome) is necessary for NEL.. I think that's the one we need to unpack and come to consensus on.

@martinthomson
Copy link
Member

I've personally come to the conclusion that lifetimes on persistence no longer need to be signaled by sites. Any expiration that the site needs to maintain can be opaque to the browser. Where there are space pressures (like cookies), the browser needs discretion on how to drop state, and for that expiration times don't really help.

@annevk
Copy link
Member Author

annevk commented Aug 21, 2019

To be clear, I don't really know how to solve NEL, it seems about as hard as Origin Policy and we haven't landed on anything concrete there thus far.

I'd love to unblock generalized per-document reporting (i.e., beyond CSP) and the general reporting format though as there are many features that could make use of that in a privacy-preserving manner.

@clelland
Copy link
Contributor

I certainly don't see a good way to accomplish the goals of NEL without going to something like Origin Policy (which would at least make all of the potential problems explicit ones)

If we were to use this spec to define a common infrastructure for more ephemeral reporting, but leave the algorithms open for use by NEL, would that unblock you?

(I'd like to keep the option open for batching and retrying out-of-band report delivery, but that's a different issue than the tracking concerns around NEL)

@annevk
Copy link
Member Author

annevk commented Aug 21, 2019

Well, not really, see #158 (comment). But also, the nature of batching and out-of-band delivery can definitely have privacy implications that CSP's model does not have.

@dcreager
Copy link
Member

But also, the nature of batching and out-of-band delivery can definitely have privacy implications that CSP's model does not have.

Can you elaborate on this part? The report batching respects origin boundaries — each report upload will only ever contain reports about a single origin. (It has to, to be able to participate in the CORS preflight check.) And Chrome's implementation currently throws away all undelivered reports whenever the network settings change, for instance, so that the delay between report generation and delivery doesn't leak any information. There's a note about that here, but we could strengthen that language to make it a requirement instead of a recommendation.

@annevk
Copy link
Member Author

annevk commented Aug 21, 2019

If you batch A nested in top-level B with top-level A you have some kind of tracking vector. If you deliver them sometime after the users closes all As you tell A something about when the user is using the browser.

@clelland
Copy link
Contributor

Can we at least batch per-document? If a single page view triggers several deprecation reports, for instance, it seems only positive for the browser to wait a minute, and then deliver them in a single HTTP request (gaining advantages from compression and fewer TCP round-trips) rather than delivering each in a single request as they occur.

Slightly better, and hopefully still unproblematic would be batching reports from the same origin within an agent cluster -- if they can script each other, or even postMessage, then they can already infer any information about the user that a batch report delivery could provide.

@annevk
Copy link
Member Author

annevk commented Aug 29, 2019

I think that's probably fine, with the latest possible delivery time matching Fetch's keepalive.

@clelland
Copy link
Contributor

As a concrete proposal for moving forward on this, we discussed the following steps at TPAC
(see rough slides here):

  1. Remove persistent storage of reporting configuration from this spec
  2. Restrict batching to just reports from same-origin documents within the same agent cluster
  3. Remove max_age and include_subdomains from Report-To configuration
  4. Use a different configuration for Network Error Logging (probably Origin Policy, but that's an issue for NEL, not Reporting)
  5. Attempt to switch to SH (see Switch reporting to Structured Headers #177)

@jpchase
Copy link
Collaborator

jpchase commented Sep 19, 2019

As a concrete proposal for moving forward on this, we discussed the following steps at TPAC
(see rough slides here):

  1. Remove persistent storage of reporting configuration from this spec
  2. Restrict batching to just reports from same-origin documents within the same agent cluster
  3. Remove max_age and include_subdomains from Report-To configuration
  4. Use a different configuration for Network Error Logging (probably Origin Policy, but that's an issue for NEL, not Reporting)

I thought the suggestion was to pull out persistent storage of configuration into a separate doc for incubation, and later decide whether to merge back into Reporting (from the TPAC notes). This sounds like it's making persistent storage a thing only for the NEL spec?

@clelland
Copy link
Contributor

No, this is definitely not making persistence NEL-only. The building blocks for persistent reporting are likely to go into a couple of different places:

  • Reporting cache (not associated with any document), max_age, and any other delivery considerations for out-of-document reports, such as more advanced batching and failover, privacy consideration for persistence, etc. will all move into a separate document in this repo (as you describe)
  • The concept of persistent origin-wide configuration (separate from reporting specifically) is the domain of origin policy.
  • NEL's configuration in particular will be described in NEL, using concepts from Reporting and from Origin Policy.
  • Other persistent reporting specs may similarly want to describe their own configuration mechanisms in their own spec, using the same building blocks that NEL uses.

clelland added a commit that referenced this issue Dec 18, 2019
This change removes the concepts which are only relevant to persistent
out-of-band reporting specs, such as Network Error Logging, and
explicitly defines reporting mechansims for documents, where the
generated reports are queued with the document itself, and thereby
inherit its lifetime. Generic reporting concepts are extracted into
another section so that they can be used by other specs.

Addresses issues raised in #158.
@clelland
Copy link
Contributor

I've split the reporting spec into two parts (#191); take a look at the pull request and feel free to comment there (or here) about the structure. Roughly, Reporting covers the core concepts, ReportingObserver, and what I've labeled "document-centered reporting", which was the original request in #158 (comment).

The other spec is currently "Network Reporting", and extends Reporting by adding back the reporting cache, and defining different cache and delivery algorithms. To keep them separate, configuration for these reports is handled exclusively through Origin Policy.

@annevk
Copy link
Member Author

annevk commented Jan 13, 2020

From #158 (comment) I see 1 and 3 in that PR (and I guess 4, did not look at that document). I don't really see batching described or 5. And wasn't failover also going to be an extension of sorts?

@clelland
Copy link
Contributor

5 was on a separate branch; I've published that as https://w3c.github.io/reporting/structured-headers/ (based on the branch at https://github.com/w3c/reporting/tree/structured-headers). I should probably merge those together, but didn't want to conflate all of the issues.

You're right about failover -- I didn't pull that out yet. I'll work on that today, to see what the end result looks like.

Restricted batching, I believe, is implied in this version by the fact that reports are owned by documents, and not pooled together across documents.

@annevk
Copy link
Member Author

annevk commented Jan 13, 2020

Ah okay, but I thought we had agreed that a limited form of batching was agreeable? But yeah, maybe it's best to build that on top of this as defining that might be somewhat involved.

@clelland
Copy link
Contributor

I raised the question of removing failover / load balancing completely in #196; the most extreme version of that would seem to remove the need for everything other than a single URL to define an endpoint, but I want to get some feedback on whether that's going to far for existing use cases.

I'd still like to be able to batch across same-origin documents, in the same agent cluster, for sure. It seems like a net efficiency win, without any real issues. Does that feel like the sort of thing that can be added on top?

@annevk
Copy link
Member Author

annevk commented Jan 13, 2020

Same origin within an agent cluster, though maybe even same origin + same top-level origin within a user agent.

@clelland
Copy link
Contributor

Double-keying like that could work as well

clelland added a commit that referenced this issue Mar 3, 2020
* Remove persistent non-document reporting from spec.

This change removes the concepts which are only relevant to persistent
out-of-band reporting specs, such as Network Error Logging, and
explicitly defines reporting mechanisms for documents, where the
generated reports are queued with the document itself, and thereby
inherit its lifetime. Generic reporting concepts are extracted into
another section so that they can be used by other specs.

This removes endpoint weight and priority from the basic reporting spec,
and creates a 'network reporting endpoint' concept for network
reporting. Endpoint groups are similarly moved to network reporting, as
there is no longer any need to group endpoints otherwise. Retries and
failover are removed from basic reporting.

Addresses issues raised in #158.
@clelland
Copy link
Contributor

clelland commented Mar 4, 2020

Structured headers is now in the main spec text as well, as of 7da9f80.

I think that addresses all of the concerns raised in this issue. I'll look at adding some text re: batching within an agent cluster in a separate issue.

@annevk
Copy link
Member Author

annevk commented Mar 6, 2020

Thanks for all your work here @clelland! Much appreciated! And indeed, let's close this and use more specific issues for further work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants