Privacy TF Minutes - 25 Aug 2021

  • Chair: Dan
  • Scribe: Jeffrey
  • Present: Dan, Jeffrey, Wendy, Don, Pete, Amy, Christine, Tess
  • Regrets: Robin

Pull Requests

Pete: Goal is to capture that there's information that I don't want to share, but that isn't generally sensitive and doesn't add to identifiability.

Christine: Looking at same-site recognition, that's focused on a user who visits the same site more than once, but there's risk to users who have attributes identified within a single visit.

Jeffrey: this is an important half of #27; I've started working on the other half but don't have anything to share yet.

Don: Profiling based on negative space? A site might have a bunch of users in a community of practice, and get a subscriber list from titles read by experts. Experts might be willing to be profiled, but that reveals that everyone else is a non-expert. Profiling is more of a spooky-action-at-a-distance problem than just "Peter is a furry".

Dan: Love the "Peter is a furry" example. It seems like a proxy for marginalized communities. For Don, how does your negative-space example contribute to protecting someone marginalized?

Don: There are in-person events that usually require air travel. If you see someone lives in Chicago but has attended a conference in Hawaii, you can guess they're not on the no-fly list. If someone's never on an attendee list for a conference that needs air travel, you can guess they're on a no-fly list.

Tess: Or you see they frequently attend Europe + Asia conferences but none in the U.S. Or they omit Saudi Arabia but visit nearby countries; it reveals something about them, even though they never revealed anything directly.

Christine: In case it's causing a problem, "profiling" may not be the right word, but we do need a word. And it needs to be different from "same-site recognition". I may also be misunderstanding Don, but I see this thing as both recognizing positive and negative things. It's about what you can infer.

Don: Yes, that makes sense. This is a really good section, and it'd be good to add the negative-space example, to emphasize that it's not just about people choosing to fill in information. It can be about others choosing to reveal enough that the site can infer things about the person.

Dan: Could just add another example bullet point that calls out lack-of-information.

Jeffrey: all of this discussion is reminding me of Sherlock Holmes, where the fact that he can infer so many things about a person is a privacy invasion. We're expecting most people we interact with to miss a lot of things and not infer everything they can about us. We might call this inappropriate inference. It would be nice if we could write down something about the limits. It's clear you can go too far, but how does a site make decisions about how far to go? Recommendation engines are an example: a lot are harmful because they make the wrong recommendations, but the fundamental idea of a bookseller looking at the books you read and recommending similar ones is appropriate. How do we help sites draw a line between appropriate and inappropriate inference?

Christine: Thanks for bringing up that we need recommendations. The text at the moment gives a description of the category. As with all categories, it's helpful to give recommendations. Not sure what they are yet.

Dan: I saw a news article/Twitter thread recently where people pointed out that when they used Uber, as soon as their battery dropped below 10%, the price jumped. This is price discrimination, which is included in the text here. Providing access to a useful API has an unintended privacy-related consequence. You don't necessarily know you're exposing the data, and may expect that you're not exposing it.
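
A minimal sketch of the mechanism Dan describes, assuming a hypothetical quoteWithBatterySignal helper and invented pricing logic; only navigator.getBattery() is the real Battery Status API entry point, and several browsers have since restricted it:

```ts
// Hypothetical sketch: the pricing logic is invented for illustration;
// navigator.getBattery() is the real (and now restricted) API.
async function quoteWithBatterySignal(basePrice: number): Promise<number> {
  if (!("getBattery" in navigator)) return basePrice; // API unavailable
  const battery = await (navigator as any).getBattery();
  // battery.level is a float in [0, 1]. Below 10%, the user may be
  // desperate for a ride: the unintended signal the API leaks.
  return battery.level < 0.1 ? basePrice * 1.2 : basePrice;
}
```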

Tess: Or if you are, you expect that it's not used to change the price. Is price discrimination always bad? The W3C charges different membership dues for different-size organizations, which is a form of price discrimination.

Dan: The W3C's price list is public.

Tess: Another example is you do UA detection, detect that they're on Mac or iOS, infer they have more money than someone on Android or ChromeOS. Will we be overstepping if we make normative statements about price discrimination?
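
A minimal sketch of the inference Tess describes; the platform-to-affluence mapping is exactly the speculative (and questionable) step under discussion, not an established technique:

```ts
// Hypothetical sketch: reading an Apple platform as a proxy for higher
// income is the kind of inference being questioned, not a recommendation.
function inferAffluenceFromUA(ua: string = navigator.userAgent): boolean {
  return /Macintosh|iPhone|iPad/.test(ua);
}
```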

Don: Can use location data: if someone lives close to an office supply store, offer them lower mail-order prices.

Wendy: Some of this discussion illustrates that we're not setting categorical rules here, and maybe we should be including some factors that help analyze things. Does transparency mitigate some of the concerns about unwanted profile building? Some concerns go beyond strict privacy concerns to equity and fairness.

Pete: I think it's categorically wrong to say "these are sensitive vs these are not". What matters is user agency. It's neither good nor bad to share the size of an organization, but it's bad to go through tax returns to infer this. Book recommendations are ok, but it's bad to get the information from a data broker.

Dan: The text itself is in service of highlighting principles... a battery status API is not intrinsically harmful, but developers of such an API might not understand the unintended consequences, or that they need additional privacy considerations around it.

Christine: It's also in some cases about collecting without consent. e.g. publications that use WebRTC not for its intended purpose but to find the local IP address of people coming to a site.
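
A sketch of the classic WebRTC local-IP probe Christine mentions; RTCPeerConnection is the real API, and modern browsers mitigate this by masking host candidates with mDNS names, so the raw local address is usually no longer exposed:

```ts
// Sketch of the classic local-IP leak: ICE candidate gathering
// historically embedded the machine's local address in host candidates.
function probeLocalAddresses(onAddress: (addr: string) => void): void {
  const pc = new RTCPeerConnection({ iceServers: [] });
  pc.createDataChannel(""); // any channel triggers ICE candidate gathering
  pc.onicecandidate = (event) => {
    if (!event.candidate) return; // null signals gathering is complete
    const sdp = event.candidate.candidate;
    // Candidate line format: "candidate:<foundation> <component>
    // <protocol> <priority> <address> <port> typ <type> ..."
    if (/typ host/.test(sdp)) onAddress(sdp.split(" ")[4]);
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}
```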

Pete: Christine and I could have another conversation this week or next, but folks should also send us concrete suggestions for things to revise.

Dan: I like the "unwanted profiling" term. We already have a TAG finding on "unsanctioned tracking" which might be relevant.

Pete: I have a callout in the text saying this is related to but distinct from browser fingerprinting.

Dan: Unsanctioned tracking isn't just about fingerprinting. Christine's IP address example helped prompt the unsanctioned tracking finding.

Dan: Are people ok with the name?

Jeffrey: possibly hard to tell what is "unwanted"? "Unexpected" or maybe "inappropriate" might make it easier to convince websites that they've stepped past a boundary. I don't feel particularly strongly though.

Dan: People might ask "unwanted by whom".

Pete: Don't like "inappropriate". Like "unexpected" or "unintentional".

Dan: Taking this to next week.

Jeffrey: an implementation of Christine's issue. When I was doing it, I noticed that profiling is very similar to correlation and singling out is very similar to identification, so I wonder if we can combine them and generalise instead of adding extra bullet points

Dan: We just talked about "unwanted profiling", so it'd be good to have a definition of "profiling".

Christine: Correlation is more focused on combining data than building profiles. It's a little different.

Christine: Would be fine with adding "singling out" to "identification", but they are different, and we might want to keep them separate. Most privacy laws focus on information about an identifiable individual. There's interest in looking at cases where a "red apple" can be singled out even if it can't be identified.

Jeffrey: would it make sense to broaden identification to cover cases where it doesn't tie to a physical identity? or do we need to keep it as-is in order to tie into existing data protection laws?

Christine: Would be fine with broadening identification.

Jeffrey: we could say it's still identification even if it can't be tied to a real-world identity, even if it's just identifying someone as person 73

Christine: Our principles will hopefully go above and beyond what laws require. We're writing what we think is the right approach for privacy on the web.

Dan: So expand the statement about identification instead of adding singling-out. But we do want a separate definition of profiling if we're going to talk about unwanted profiling.

Christine: Could imagine unwanted profiling happening with just one piece of information, e.g. the WebRTC local IP address.

Jeffrey: isn't the problem with the IP address that it's correlated with a location? Or that it allows you to correlate something else?

Christine: I see it as an input to correlation

Jeffrey: I was thinking the harm is in that second step of correlating the information and inferring extra things based on it that the user didn't want inferred

Christine: I wonder if even just knowing that fact -- there can be one piece of information that's harmful to the user.

Dan: The low battery thing.

Jeffrey: harmful because it's correlated with the user being desperate to find a ride

Dan: not correlated, that's inferred

Jeffrey: it's also the case that we mostly control what web browsers do; that's the lever we have. Web browsers can intervene by preventing the piece of information from being disclosed, but we can't prevent inference. If inference is the harm, all we can remove is the information gathering

Don: And we don't know if it's the last piece of information that the person doing inference needs to learn something harmful.

Dan: Consensus seems to be to merge singling-out into identification, but to keep profiling separate from correlation.

Jeffrey: I might try to write something about the privacy harm being in the inference, but the lever we have is eliminating the profiling and information gathering, so we need to be looking at both

Dan: where would that text go?

Jeffrey: not sure yet

Dan: I think that's good. Maybe 2.9 is the right place?

Jeffrey: more broadly, I'm noticing that we have privacy principles, behavioural levers, and things browsers can enforce, and they're distinct notions. In the identity section I kind of described them explicitly, and I think we should be looking to do that more

Dan: Should we be highlighting things that websites should enforce? e.g. Christine's point about data collection. That's not a requirement we can put on anybody, but it's still worth calling out.

Christine: We should eventually have a discussion about what guidance we're comfortable providing in that sphere, as part of this document. We're still formulating categories of privacy risks and vulnerabilities, and we eventually need to recommend what web design and browsers can do to mitigate those risks. But down at the website level is more complicated. Let's note that we need to have that conversation, but do it after we've worked on the text more.

Dan: Opening an issue: #39.

Don: This is an ongoing issue that's come up in the First Party Sets discussion in the Privacy CG. There are a lot of really complex corporate ownership structures. example.com Ireland, example.com LLC, operating companies in other locations. Putting analysis of corporate ownership into a privacy policy seems to be a lot of extra overhead. There's the concept of a "controller" under GDPR, where it's the person who's ultimately responsible for decisions made with user data, and that seems more relevant. Shouldn't just drag GDPR things into privacy principles, but it's a good place to start from.

Dan: Had a lot of discussions in the TAG about FPS, and here about companies and entities and organizations. Discussion always comes up about A.com vs. A.co.uk. Also FB + WhatsApp, where the German regulator said they're not allowed to share data even though they're the same company. I'd like to have more nuance than just sharing common owners. I like the phrase "legal accountability", but how does it match with anything else? Christine + Wendy?

Christine: We've had lots of discussions, and I need to go look at the text for how we're using it. At this point, I'm confused. Give us a week to think about it.

Don: WICG/first-party-sets#49 is an example scenario. In the USA there are mortgage-backed securities that are all registered under a single company so they don't have to change the registration when they trade. The more ownership gets incorporated into browser policy, the more that the usual suspects will circumvent it with complex ownership structures.

Christine: Agree that we shouldn't be using ownership. Do we use "data controller" or some other word? We need to get the concept right. Both for Don's reasons and because we need to think about the entity's responsibility/accountability from a privacy point of view.

Wendy: Another challenge is that there are many jurisdictions, and their rules conflict. Sometimes their rules are intuitive, and at times not. We should learn from descriptions in GDPR and other relevant regulation, but we shouldn't encode them as the Way Things Should Be. Recognize that different sites will have to do different things in different places. We can recommend a best practice that's the high-water mark, but we'll still see variations.

Dan: I think it would be wrong to use the GDPR's language in this case, since that's one specific implementation. Don's phrase is more generic. It's specific enough but not tied to one legal framework.

Jeffrey: I'm between accountability and control; I'm not sure which one. Accountability implies that someone is holding them accountable; control talks about who is making decisions. I feel like control might be the better term, but I'm not confident of that

Dan: The text does refer to a common controller. The only change Don has suggested here is changing "common owners" to "legal accountability", which feels like a positive thing. Christine wants time to check it against the rest of the text?

Christine: yes

Dan: Leaving this one for next week.

Issues

Tess: Probably need to wait for Robin.

Wendy: Nothing new since last week.

Jeffrey: we concluded to distinguish "pseudonymous" in general from pseudonymous with respect to a particular entity. Robin may have volunteered to take a stab at that

Close issues?

Dan: #25 ?

Jeffrey: There's useful text here for the Introduction, even though the Abstract has been fixed.

Dan: #22 ? This is bookmarked as an issue, but we may have discussed it. We seem to have consensus that we are referencing contextual integrity.

Christine: Want more time to look at the section about it.


Strong disagreement about whether Granny Smith apples are delicious.

Wendy loves Cortland apples.