GDPR Meta Issue #3954

davidfischer · 2018-04-16T18:07:41Z

The GDPR comes into effect on May 25, 2018, and Read the Docs is going to use this as an opportunity to get our house in order. Read the Docs currently does not plan to do anything different for EU citizens than for anybody else. We want to respect user privacy as much as possible, so we're going to apply the stricter protections mandated by the GDPR to everybody.

It is unclear precisely what it means to be in compliance with the GDPR (if you are a lawyer with expertise in this subject, let us know!) but we are not going to use that as an excuse to throw up our hands and do nothing. Some of its provisions are clear enough.

The goal of this issue is to frame the discussion around the GDPR and how it applies to Read the Docs and to communicate what we are doing around data protections and privacy. This issue will be edited as more things are identified.

PERSONAL DATA

While Read the Docs tries not to collect much personal information about users, we do collect some. Specifically, we collect at least:

Names and emails when somebody creates an account
Logged-in users can tie their account to 3rd party code hosting services like GitHub and Bitbucket
Names and emails exist in code repositories we have synced in order to build the docs. These code repositories are public. Does that affect things?
IPs in our web server log files
IPs are collected when users click on ads to combat ad fraud

We do not collect any data that is "considered sensitive" under the GDPR.

VARIOUS TASKS

These are things that I'm committing to by the May 25 deadline. This is a living list and should link to other issues where possible.

Get a privacy policy in place (No privacy policy #2602). This means mentioning all personal data we collect, when they are collected, the reasons for collection, and how long they are stored.
Ensure all user data is deleted when a user deletes their account
Limit the retention period of personally identifiable data in web server logs (see the comment on logs below)
Remove/anonymize/pseudonymize IPs collected for advertising (see the comment on advertising below)
Enumerate our list of partners with whom we share any data, verify we are sharing only as much as necessary, and verify their compliance
Update internal data policies

QUESTIONS

Do we have the appropriate level of consent for the data we collect?
What do we need to do to make sure we aren't inadvertently collecting data on minors?
Is our cookie policy defensible (which cookies for which reasons and for how long)? Do we need an explicit "cookie agreement" (this obviously depends on the first question here)?
Do we need a "Data Protection Officer" or is just having an open line of communication through this public issue tracker sufficient? We have a team for privacy issues available at privacy@readthedocs.org.
Can users edit and see all their personal data? Do we need a way to extract it? For example, if it is just a name and email, we probably don't need to do anything additional here. Users can control their data in their dashboard. We collect so little data that extraction means copy/pasting their name and email.

LINKS

EDIT HISTORY

2018-05-22: Added Moz's blog on GDPR and online marketing
2018-05-18: handle advertising
2018-05-18: answer a few questions
2018-05-02: note about not collecting any sensitive data
2018-04-26: changes around plan for web server logging
2018-04-18: added EFF DNT guide
2018-04-16: fix typos

davidfischer · 2018-04-26T17:46:10Z

Our current plan with respect to logs is to:

Retain only 10 days of logs which will be encrypted on rotation (days 2-10 will be encrypted). This will apply to .org as well as documentation sites. IPs and user agents will be present in the logs. 10 days is the limit of EFF's DNT policy which we are working toward from a compliance perspective.
Have a separate log for POST/PUT/DELETE/PATCH requests for .org only which will not have personally identifiable data in it (no IPs) but is retained for 90 days.
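The two rules in this plan can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production logging setup: the function names, the log line layout, and the choice to leave the user agent out of the write log are all hypothetical. Only the 10-day window and the list of write methods come from the plan above.

```python
from datetime import date, timedelta

RETENTION_DAYS = 10  # retention window from the plan above
WRITE_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def expired_logs(log_dates, today, retention_days=RETENTION_DAYS):
    """Return the dates of rotated logs that fall outside the retention window.

    Today's log counts as day 1, so a log is expired once it is
    `retention_days` or more days old.
    """
    cutoff = today - timedelta(days=retention_days)
    return [d for d in log_dates if d <= cutoff]


def write_log_entry(method, path, status, timestamp):
    """Build a line for the separate write-request log.

    The line deliberately has no IP. Leaving out the user agent as well
    is an assumption of this sketch.
    """
    if method.upper() not in WRITE_METHODS:
        return None  # reads are not recorded in this log
    return f"{timestamp} {method.upper()} {path} {status}"
```

A cleanup job could call `expired_logs` once a day and delete the files for the returned dates, while the write log is fed only from `write_log_entry` so that no IP ever reaches the 90-day file.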

davidfischer · 2018-05-02T16:57:56Z

As of today, we are only keeping 10 days of logs.

davidfischer · 2018-05-02T20:02:59Z

Our privacy policy PR is here: #3978

davidfischer · 2018-05-02T20:25:39Z

Here's the code that governs when a user deletes their account: https://github.com/rtfd/readthedocs.org/blob/dc96c6d/readthedocs/profiles/views.py#L197-L213

It looks like this does in fact delete the user model (where the name and email are) and it does cascade to their social account connections (GitHub/Bitbucket) as well as their user profile (which doesn't have anything personal).

This does not immediately delete documentation build artifacts or version control checkouts. These are public code repositories so they are probably not very sensitive but ideally they eventually get deleted.
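For readers unfamiliar with cascading deletes, here is a generic illustration using SQLite from the Python standard library. It is not the Read the Docs code linked above: the table and column names are hypothetical, and the real application relies on its own models rather than this schema. It only shows the general behavior where deleting a user row also removes the rows that reference it.

```python
import sqlite3

# Hypothetical schema for illustration only; not the Read the Docs models.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE social_accounts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    provider TEXT
);
CREATE TABLE profiles (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
INSERT INTO users VALUES (1, 'Example User', 'user@example.com');
INSERT INTO social_accounts VALUES (1, 1, 'github'), (2, 1, 'bitbucket');
INSERT INTO profiles VALUES (1, 1);
"""


def row_counts(conn):
    """Count the rows in each table of the sketch schema."""
    tables = ("users", "social_accounts", "profiles")
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


def delete_user_demo():
    """Delete the single user and return row counts before and after."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    before = row_counts(conn)
    conn.execute("DELETE FROM users WHERE id = ?", (1,))
    conn.commit()
    after = row_counts(conn)
    conn.close()
    return before, after
```

Anything stored outside the database, such as build artifacts or checkouts on disk, is not covered by a cascade like this and needs its own cleanup.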

davidfischer · 2018-05-04T18:04:32Z

In our WIP privacy policy (#3978), I have detailed all the 3rd parties with whom we share data and what is shared.

davidfischer · 2018-05-04T18:22:29Z

Currently Read the Docs can set the following 1st party cookies with the following durations:

CSRF cookie for all users (1 year)
Login cookie for logged in users only (2 weeks)
GA cookies for all users (up to 2 years)
Stripe cookies when visiting donation/subscription pages only (1 year)

The only 3rd party cookie I could find was a session cookie set by New Relic:

New Relic session cookie on all pages (session - deleted on browser close)

The CSRF and login cookies are definitely exempt from requiring a cookie agreement based on information here. Arguably the CSRF cookie should have a shorter timeframe but that's a separate issue.

Because the New Relic cookie is a session cookie, it may be exempt.
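The difference between a session cookie and a persistent one comes down to whether an expiry is set. The sketch below uses Python's standard `http.cookies` module to show that difference. The cookie names and values are hypothetical; only the two-week duration is taken from the list above.

```python
from datetime import timedelta
from http.cookies import SimpleCookie

TWO_WEEKS = int(timedelta(weeks=2).total_seconds())


def build_cookie(name, value, max_age=None):
    """Return the value of a Set-Cookie header for one cookie.

    With max_age=None no expiry attribute is emitted, which makes it a
    session cookie that the browser drops when it closes.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# Hypothetical cookie names, used only for illustration.
session_cookie = build_cookie("monitoring_session", "abc")
login_cookie = build_cookie("login", "abc", max_age=TWO_WEEKS)
```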

csadorf · 2018-05-08T17:57:04Z

Do we need to obtain consent from users for each individual project if the project uses a Google Analytics Tracking ID?

davidfischer · 2018-05-08T18:47:30Z

@csadorf Docs authors will not need to do anything. I'm aiming to avoid a specific cookie/consent notice on docs sites and that will probably involve some changes. Any necessary changes though will be made in the Read the Docs codebase and rolled out to all docs sites automatically.

From a cookie standpoint, GA sets 1st party cookies which are not currently compliant, but with some changes they may be. By default, the longest-lived cookie lasts 2 years, which is definitely unacceptable without consent. However, all of this is configurable and I should have this dialed in within the next couple of weeks. GA can be run with a session cookie or even with no cookies whatsoever. In the latter case, you'll lose things like the ability to differentiate new vs. returning visitors but all the rest of the data is there.

As for sharing personally identifiable data, I don't believe anything is legally required since Read the Docs is already instructing GA to anonymize IPs (the only personally identifiable data under the GDPR that is shared). I do think we can do better, though.

Ideally, I think the solution is to proxy GA requests on Read the Docs' servers before sending to GA to anonymize data and generate a non-personal client ID in order to differentiate new vs. returning users. I think this solves the problem of sharing personal data and of the privacy complaints of visiting a docs site resulting in a request to google-analytics.com. It will result in tens of millions of extra requests though so it needs to be worked out.
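One step such a proxy would need is truncating the visitor's IP before anything is forwarded. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the prefix lengths (keeping the first 24 bits of an IPv4 address and the first 48 bits of an IPv6 address) are assumptions of this sketch, not something decided in this issue.

```python
import ipaddress


def anonymize_ip(raw_ip, v4_prefix=24, v6_prefix=48):
    """Zero out the host part of an IP address.

    The prefix lengths are assumptions of this sketch: the last octet of
    an IPv4 address and the last 80 bits of an IPv6 address are dropped.
    """
    ip = ipaddress.ip_address(raw_ip)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

With these defaults, every address in the same /24 (or /48 for IPv6) maps to the same value, so the truncated address still supports coarse geography but no longer identifies a single connection.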

We are also in the process of not loading GA at all for people who have Do Not Track enabled (#4046) so that might affect things as well.
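The Do Not Track check itself is small. Browsers that have it enabled send a `DNT: 1` request header, so a server-side check could look roughly like the sketch below. This is not the implementation in #4046, and the function name is hypothetical.

```python
def should_load_analytics(headers):
    """Return False when the request carries a `DNT: 1` header.

    `headers` is any mapping of header names to values. Header names are
    compared case-insensitively.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"
```

A template or view could consult this before emitting the analytics snippet, so that visitors with Do Not Track enabled never make a request to the analytics provider.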

csadorf · 2018-05-08T19:07:09Z

@davidfischer Thank you very much for clarifying.

davidfischer · 2018-05-18T17:58:36Z

With respect to a data protection officer, we have created a small team internally to handle privacy-related things. The email is privacy@readthedocs.org.

davidfischer · 2018-05-18T22:59:49Z

With respect to advertising, when somebody clicks an ad, we store some data to prevent fraud, handle billing, and to report aggregated statistics to advertisers (more on that below). We store a user agent, an anonymized version of a user's IP address, and a client ID which will change periodically per user but will be unique for a limited period of time. We believe this is in line with the GDPR and it is acceptable from a Do Not Track perspective.

We do not share personally identifiable information with advertisers, such as users' IP addresses, or possibly identifying info like user agents. We do not share even the anonymized IP address. We may share aggregated data (e.g. a pie chart of countries where users clicked on the ad, % mobile vs. desktop, etc.).
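One way to get a client ID that is stable for a limited period and then changes is to derive it from a keyed hash that includes the current date. The sketch below is an assumption about how such an ID could be built, not a description of what Read the Docs does: the inputs (anonymized IP and user agent), the daily rotation period, the secret key, and the function name are all hypothetical.

```python
import hashlib
import hmac
from datetime import date


def rotating_client_id(anonymized_ip, user_agent, day, secret):
    """Derive a client ID that stays the same within one day only.

    Assumptions of this sketch: the ID is built from the anonymized IP
    and the user agent, rotates daily, and is keyed with a server-side
    secret (bytes) so it cannot be recomputed by outsiders.
    """
    message = f"{day.isoformat()}|{anonymized_ip}|{user_agent}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:32]
```

Because the date is part of the hashed message, clicks from the same visitor can be grouped within a day for fraud checks, but IDs from different days cannot be linked to each other without the secret and the original inputs.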

davidfischer · 2018-05-21T21:50:13Z

The privacy policy went live today: https://docs.readthedocs.io/en/latest/privacy-policy.html

davidfischer · 2018-05-22T18:58:59Z

Moz published a pretty good blog post yesterday regarding the GDPR and online marketing. Here's a brief summary:

They believe Google Analytics is good to go as long as IP anonymization is on (RTD has it on)
Email newsletters must be opt-in. No list buying/sharing. (our newsletter is double opt-in)
The privacy policy must be in plain language (ours is pretty plain)
No vague cookie statements like "We use cookies to give you a better experience and by using this site".

davidfischer · 2018-05-31T19:08:14Z

We published our blog post on the GDPR and merged the somewhat related Do Not Track PR. As a result, I think we can close this.

If issues related to compliance arise, we are committed to addressing them as separate action items.

davidfischer self-assigned this Apr 16, 2018

davidfischer mentioned this issue Apr 20, 2018: Draft Privacy Policy #3978 (merged)

davidfischer mentioned this issue May 1, 2018: Do Not Track support #4046 (merged)

davidfischer mentioned this issue May 25, 2018: GDPR post readthedocs/blog#39 (merged)

davidfischer closed this as completed May 31, 2018

choldgraf mentioned this issue Feb 14, 2022: remove analytics jupyter/jupyter.github.io#408 (merged)
