Ensure that analytics collection is compliant with GDPR #4226
Hi @gar1t, thanks for the suggestion, but I am going to have to disagree on this note. Analytics are essential to helping us develop
If I develop an app with Gradio and deploy it to users in Europe, I must comply with privacy regulations, including the General Data Protection Regulation (GDPR). The GDPR requires a privacy policy that is easily accessible to visitors of the website if the site collects any personally identifiable data, which Google Analytics does. Could you point me to the privacy policy of the organization responsible for the Google Tag Manager ID UA-156449732-1? I can't seem to find it in the documentation. Similarly, the GDPR mandates that all cookies be explicitly opt-in unless they are required for the site to function. I don't see this functionality implemented by the framework either. Is there documentation somewhere stating that users must implement this themselves, or that they must set the environment variable to disable the behavior?
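The opt-in rule described above can be pictured as a consent gate: cookies the site needs to function are always allowed, while non-essential ones (such as analytics cookies) are set only after the visitor has explicitly agreed. This is a minimal illustration of the rule as stated, not Gradio code; the cookie names, `ESSENTIAL_COOKIES`, `CookieJar` and the `consent_given` flag are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cookie names; "session" stands in for a cookie the app needs to function.
ESSENTIAL_COOKIES = {"session"}


@dataclass
class CookieJar:
    """Collects the cookies a response would set, honoring explicit opt-in consent."""

    consent_given: bool = False  # must be set by the visitor's explicit choice, never assumed
    cookies: dict[str, str] = field(default_factory=dict)

    def set_cookie(self, name: str, value: str) -> bool:
        """Set the cookie if it is essential or the visitor opted in; return whether it was set."""
        if name in ESSENTIAL_COOKIES or self.consent_given:
            self.cookies[name] = value
            return True
        return False


jar = CookieJar()
jar.set_cookie("session", "abc123")  # allowed: required for the site to work
jar.set_cookie("_ga", "GA1.1.42")    # refused: the visitor has not opted in
print(sorted(jar.cookies))           # ['session']
```

The key design point is the default: `consent_given` starts as `False`, so forgetting to ask for consent results in no tracking rather than silent tracking.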
I see, yes, I think we'll need to look into this. Reopening the issue and renaming it with this focus.
(GRADIO_ANALYTICS_ENABLED env var assumed to be True if unset)
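The default described in the title can be sketched as follows. This illustrates the stated semantics, not Gradio's source; it assumes the variable is compared against the literal string "True", so an unset variable counts as consent (opt-out rather than opt-in). The function name is hypothetical.

```python
import os


def analytics_enabled_opt_out() -> bool:
    # Unset -> default "True" -> analytics on: the user is opted in without being asked.
    # Assumption: only the exact string "True" enables analytics.
    return os.getenv("GRADIO_ANALYTICS_ENABLED", "True") == "True"


os.environ.pop("GRADIO_ANALYTICS_ENABLED", None)
print(analytics_enabled_opt_out())  # True: nothing was specified, yet analytics run

os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
print(analytics_enabled_opt_out())  # False
```

Under this assumed string comparison, values such as "1" or "true" would also disable analytics, which is one reason the issue argues the convention is surprising.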
There are scenarios where you have to explain every outgoing network connection. In those scenarios it is extremely inconvenient to have to find out whether a library is using telemetry, since there may be several layers of software between what you installed and the library that phones home. I second the request to make telemetry opt-in.
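One way to find out which code opens which connection, without reading every layer of dependencies, is a Python audit hook (Python 3.8+): the standard library raises a `socket.connect` audit event, with the socket and the target address, before each connection attempt. A minimal sketch that records those addresses; the local listener exists only so the example is self-contained, and the list and function names are hypothetical.

```python
import socket
import sys

connections: list[tuple] = []


def log_connections(event: str, args: tuple) -> None:
    # "socket.connect" is raised with (socket_object, address) before connecting.
    if event == "socket.connect":
        connections.append(args[1])


# Audit hooks cannot be removed once installed, so install this at program start.
sys.addaudithook(log_connections)

# Demonstration: connect to a throwaway listener on localhost.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
address = server.getsockname()
client = socket.create_connection(address)
client.close()
server.close()

print(connections)  # includes the listener's ('127.0.0.1', <port>) address
```

This only sees connections made through Python's `socket` module; native code that opens sockets directly bypasses the hook, so it complements rather than replaces a network-level audit.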
I am also shocked to discover that Gradio is collecting user data without any warning. Users need to know exactly what data is being collected. For example, are block labels or info being collected too? Function names, parameter names, input or output names? Element IDs or class names? I need to know exactly what is being collected to make sure that no sensitive user info or credit card details are sent to a third party. And I want to know exactly how to turn it off; otherwise you could be liable for any damage to users' privacy. It did not help when I added this to the env variables:
Yes, GDPR compliance is something we still need to work on (I think we can reasonably get to it this summer). For now, let me provide an answer here:
In terms of what analytics we collect, the relevant source file is here. When a Blocks/Interface is created, we collect:
data = {
    "mode": self.mode,
    "custom_css": self.css is not None,
    "theme": self.theme.name,
    "is_custom_theme": is_custom_theme,
    "version": get_package_version(),
    "ip_address": ip_address,
}
When a Blocks/Interface is launched, we collect:
data = {
    "launch_method": "browser" if inbrowser else "inline",
    "is_google_colab": self.is_colab,
    "is_sharing_on": self.share,
    "share_url": self.share_url,
    "enable_queue": True,
    "server_name": server_name,
    "server_port": server_port,
    "is_space": self.space_id is not None,
    "mode": self.mode,
    "version": get_package_version(),
    "is_kaggle": blocks.is_kaggle,
    "is_sagemaker": blocks.is_sagemaker,
    "using_auth": blocks.auth is not None,
    "dev_mode": blocks.dev_mode,
    "show_api": blocks.show_api,
    "show_error": blocks.show_error,
    "title": blocks.title,
    "inputs": blocks.input_components
    if blocks.mode == "interface"
    else inputs_telemetry,
    "outputs": blocks.output_components
    if blocks.mode == "interface"
    else outputs_telemetry,
    "targets": targets_telemetry,
    "blocks": blocks_telemetry,
    "events": events_telemetry,
    "is_wasm": wasm_utils.IS_WASM,
    "ip_address": ip_address,
}
Here, e.g.
Why not go the optional route and let users/developers choose to send data if they are interested in sharing it? With always-on telemetry, you get a lot of unorganized data, including test installs, abandoned projects, and other data you might not want to use to prioritize features. If you provide a clear "send_usage.py" command that shows a quick preview of what is being sent, and tell people "use this and we'll prioritize what YOU want", you'll get more useful feedback. And since we're talking about developers, you can probably even get people to fill out free-form text fields explaining what they use and why, and what they are missing. I understand the interest in how a project is being used, but the telemetry route always implies "we have to get it sneakily or we won't get it at all", even though many developers are happy to communicate with the upstream projects that provide the basic infrastructure they build on.
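The opt-in flow proposed above could look roughly like this: build the payload, show the user exactly what would be sent, and send only on an explicit "yes". The script name `send_usage.py` comes from the comment; the payload fields, `collect_usage`, and the `send`/`ask` callbacks are hypothetical, and nothing here is a Gradio API.

```python
"""send_usage.py (hypothetical): preview usage data and send it only with explicit consent."""
import json
from typing import Callable


def collect_usage() -> dict:
    # Hypothetical, deliberately small payload; a real tool would document each field.
    return {"version": "x.y.z", "mode": "blocks", "features": ["queue", "auth"]}


def send_usage(
    payload: dict,
    send: Callable[[dict], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Print the exact payload, then call `send` only if the user answers 'y'."""
    print(json.dumps(payload, indent=2, sort_keys=True))
    answer = ask("Send this usage report? [y/N] ").strip().lower()
    if answer == "y":
        send(payload)
        return True
    return False  # anything other than an explicit "y" sends nothing
```

Run interactively, `send_usage(collect_usage(), send=upload)` (with `upload` being whatever hypothetical function performs the network request) prints the report and waits for an answer on the terminal; injecting `ask` keeps the consent step testable.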
Also, many downstream projects disable it for their users. Two of the most popular projects using Gradio, stable-diffusion-webui and text-generation-webui, set both
- Disable (turn OFF) Gradio's Google Analytics setting. Relevant issue: gradio-app/gradio#4226; documentation: https://www.gradio.app/docs/gradio/blocks#initialization
- Show the app name in the browser tab, to make the tab easier to identify.
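A downstream launcher that wants analytics off regardless of Gradio's default can set the environment variable itself before Gradio is imported. A minimal sketch, assuming (as the issue title states) that Gradio treats an unset `GRADIO_ANALYTICS_ENABLED` as "True"; the Blocks initialization docs referenced above also describe an `analytics_enabled` constructor argument, shown only as a comment so the sketch runs without Gradio installed. `force_analytics_off` is a hypothetical helper name.

```python
import os


def force_analytics_off() -> None:
    # Overwrite rather than setdefault: a "True" inherited from the shell
    # should not silently re-enable telemetry for this launcher's users.
    os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"


force_analytics_off()
# import gradio as gr                        # to be safe, import only after the variable is set
# demo = gr.Blocks(analytics_enabled=False)  # explicit constructor argument as a second guard
```

Setting the variable in the launcher process also covers any child processes started from it, since they inherit `os.environ`.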
Thanks, folks. We considered various ways we could collect Google Analytics or equivalent data in a GDPR-compliant way. Ultimately, we decided to ditch Google Analytics altogether, as part of #8263. That PR also details the telemetry we have in place when a Gradio app is launched. My understanding is that Gradio is still not fully GDPR-compliant, since we load Google Fonts, but we have a separate issue tracking that: #7968
Is your feature request related to a problem? Please describe.
I was surprised to see that an open source project running locally as a demo was sending analytics data to Google.
Describe the solution you'd like
enable_analytics should be an option that defaults to not tracking analytics. A user should explicitly opt into this behavior rather than having to opt out.
Additional context
The default that opts users in is implemented here:
https://github.com/gradio-app/gradio/blob/v3.30.0/gradio/blocks.py#L686-L687
Apart from the unconventional use of True as an environment variable value to signal true/yes/on (it's more common to use 1), environment variables should default to "user did not specify this value".
I think it's perfectly acceptable to enable the collection of data by way of a public server (e.g. a hosted version of a Gradio app), but IMO it is not acceptable to track data from an app running locally without an explicit opt-in from the user.
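The parsing rules argued for above (common truthy spellings such as "1", and "unset" meaning "not specified") can be sketched as a tri-state lookup: an explicit argument wins, then the environment variable, and otherwise analytics stay off. The accepted spellings and the names `env_flag` and `resolve_analytics` are assumptions for illustration, not Gradio behavior.

```python
import os
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def env_flag(name: str) -> Optional[bool]:
    """Return True/False if the variable holds a recognized value, else None."""
    raw = os.environ.get(name)
    if raw is None:
        return None  # the user did not specify this value
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    return None  # unrecognized spelling: treat as unspecified rather than guessing


def resolve_analytics(explicit: Optional[bool] = None) -> bool:
    """Explicit argument > environment variable > default of no tracking (opt-in)."""
    if explicit is not None:
        return explicit
    from_env = env_flag("GRADIO_ANALYTICS_ENABLED")
    return from_env if from_env is not None else False
```

With this scheme, both `GRADIO_ANALYTICS_ENABLED=1` and `GRADIO_ANALYTICS_ENABLED=True` opt in, while leaving the variable unset never does; a hosted deployment that wants data collection would opt in explicitly.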