
Ensure that analytics collection is compliant with GDPR #4226

Closed
1 task done
gar1t opened this issue May 15, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@gar1t

gar1t commented May 15, 2023

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.

I was surprised to see that an open source project running locally as a demo was sending analytics data to Google.

Describe the solution you'd like

analytics_enabled should be an option that defaults to not tracking analytics. A user should explicitly opt into this behavior rather than having to opt out.

Additional context

The current opt-out default (analytics enabled unless the user disables them) is implemented here:

https://github.com/gradio-app/gradio/blob/v3.30.0/gradio/blocks.py#L686-L687

Apart from the unconventional use of True as an environment variable value to signal true/yes/on (it's more common to use 1), an unset environment variable should be treated as "user did not specify this value".
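The convention described above can be sketched as a small parser: an unset variable means "not specified", common truthy spellings (1, true, yes, on) count as an explicit opt-in, and anything else counts as no. This is a minimal sketch of the proposal, not gradio's code; the helper names `env_flag` and `analytics_opted_in` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Spellings commonly accepted as "true/yes/on" for environment variables.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[bool]:
    """Hypothetical helper: parse a boolean environment variable.

    Returns None when the variable is unset ("user did not specify this value"),
    True for common truthy spellings, and False for anything else.
    """
    if env is None:
        env = os.environ
    raw = env.get(name)
    if raw is None:
        return None
    return raw.strip().lower() in _TRUTHY


def analytics_opted_in(env: Optional[Mapping[str, str]] = None) -> bool:
    # Opt-in default: an unset variable is treated as "do not track".
    return env_flag("GRADIO_ANALYTICS_ENABLED", env) is True
```

Under this sketch, `analytics_opted_in({})` is False and only an explicit value such as `GRADIO_ANALYTICS_ENABLED=1` turns tracking on, which is the opposite of the current default.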

I think it's perfectly acceptable to enable data collection on a public server (e.g. a hosted version of a gradio app), but IMO it is not acceptable to track data from an app running locally without an explicit opt-in from the user.

@abidlabs
Member

abidlabs commented May 16, 2023

Hi @gar1t, thanks for the suggestion, but I have to disagree here. Analytics are essential to helping us develop gradio and understand how developers use it. We collect analytics because they provide the clearest signal on component/feature usage, which helps us prioritize issues related to commonly used features of the library and understand the implications of deprecating less-used ones. For a library with over 2,000 issues that is heavily informed by the community, losing these analytics, or even switching to opt-in analytics (which tend to be more biased in practice), would be a big loss for the development team. Because gradio is open source, the specific analytics we collect are fully transparent. I do understand that in certain settings even collecting these analytics is problematic, which is why we make it straightforward to disable them, either with the GRADIO_ANALYTICS_ENABLED environment variable or by setting analytics_enabled=False. If the parameter is False, or the environment variable is set to anything other than "True", we do not send any analytics.

abidlabs closed this as not planned on May 16, 2023
@hannesj

hannesj commented May 16, 2023

If I develop an app with Gradio and deploy it to users in Europe, I must comply with privacy regulations, including the General Data Protection Regulation (GDPR). The GDPR requires a privacy policy that is easily accessible to visitors of the website if the site collects any personal data, which Google Analytics does. Could you point me to the privacy policy of the organization responsible for the Google Tag Manager ID UA-156449732-1? I can't find it in the documentation.

Similarly, EU rules (the ePrivacy Directive, applying the GDPR's standard of consent) require cookies to be explicitly opt-in unless they are strictly necessary for the site to function. I don't see the framework implementing this either. Is there documentation stating that users must implement this themselves, or that they must set the environment variable to disable the behavior?

@abidlabs
Member

abidlabs commented May 16, 2023

I see, yes, I think we'll need to look into this. Reopening the issue and renaming it to reflect this focus.

abidlabs reopened this on May 16, 2023
abidlabs changed the title from "Analytics is an opt-out scheme rather than opt-in (GRADIO_ANALYTICS_ENABLED env var assumed to be True if unset)" to "Ensure that analytics collection is compliant with GDPR" on May 16, 2023
abidlabs added the enhancement (New feature or request) label on May 16, 2023
@allo-

allo- commented Mar 3, 2024

There are scenarios where you have to account for every outgoing network connection. In those cases it is extremely inconvenient to have to find out whether a library uses telemetry, since there may be several layers of software between what you installed and the library that phones home. I second the request to make telemetry opt-in.

@peeter2

peeter2 commented Mar 13, 2024

I am also shocked to discover that gradio collects user data without any warning. Users need to know exactly what data is being collected. For example, are block labels or info being collected too? Function names, parameter names, input or output names? Element IDs or class names? I need to know exactly what is being collected to make sure that no sensitive user info or credit card details are sent to a third party.

And I want to know exactly how to turn it off; otherwise you could be liable for any damage to users' privacy.

It did not help when I added this to the env variables:
analytics_enabled=False

@akx
Copy link
Contributor

akx commented Mar 13, 2024

@peeter2 The environment variable is GRADIO_ANALYTICS_ENABLED; set it to anything but True. analytics_enabled is the kwarg you'd give to gr.Blocks().

But the point stands – analytics needs to be much more clearly documented and made GDPR compliant. On that note, @abidlabs, any news on that front?

@abidlabs
Member

abidlabs commented Mar 13, 2024

Yes, GDPR compliance is something we still need to work on (I think we can reasonably get to it this summer). In the meantime, here is how to disable analytics and what we collect:

  • You can turn off analytics collection by setting analytics_enabled=False in your Blocks/Interface constructor,
  • or you can set the GRADIO_ANALYTICS_ENABLED environment variable to "False" (or anything but "True", as @akx mentioned above).
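The two switches above can be modeled as a single check. This is a sketch of the rule as described in this thread (analytics are sent only when the keyword argument is not False and GRADIO_ANALYTICS_ENABLED, if set, is exactly "True"); it is not gradio's source, and the function name `should_send_analytics` is hypothetical.

```python
import os
from typing import Mapping, Optional


def should_send_analytics(
    analytics_enabled: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical model of the opt-out rule described in this thread."""
    if env is None:
        env = os.environ
    # An unset variable is treated as "True", which is what makes this opt-out.
    env_allows = env.get("GRADIO_ANALYTICS_ENABLED", "True") == "True"
    # Passing analytics_enabled=False to Blocks/Interface also disables analytics.
    kwarg_allows = analytics_enabled is not False
    return env_allows and kwarg_allows
```

Because only the exact string "True" counts as enabled in this model, values such as 1 or lowercase true also disable analytics, which is the naming quirk raised in the original report.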

In terms of what analytics we collect, the relevant source file is here.

When a Blocks/Interface is created, we collect:

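    # Excerpt: self, is_custom_theme, get_package_version and ip_address
    # come from the surrounding gradio code, so this is not standalone.
    data = {
        "mode": self.mode,
        "custom_css": self.css is not None,
        "theme": self.theme.name,
        "is_custom_theme": is_custom_theme,
        "version": get_package_version(),
        "ip_address": ip_address,
    }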

When a Blocks/Interface is launched, we collect:

    data = {
        "launch_method": "browser" if inbrowser else "inline",
        "is_google_colab": self.is_colab,
        "is_sharing_on": self.share,
        "share_url": self.share_url,
        "enable_queue": True,
        "server_name": server_name,
        "server_port": server_port,
        "is_space": self.space_id is not None,
        "mode": self.mode,
        "version": get_package_version(),
        "is_kaggle": blocks.is_kaggle,
        "is_sagemaker": blocks.is_sagemaker,
        "using_auth": blocks.auth is not None,
        "dev_mode": blocks.dev_mode,
        "show_api": blocks.show_api,
        "show_error": blocks.show_error,
        "title": blocks.title,
        "inputs": blocks.input_components
        if blocks.mode == "interface"
        else inputs_telemetry,
        "outputs": blocks.output_components
        if blocks.mode == "interface"
        else outputs_telemetry,
        "targets": targets_telemetry,
        "blocks": blocks_telemetry,
        "events": events_telemetry,
        "is_wasm": wasm_utils.IS_WASM,
        "ip_address": ip_address,
    }

Here, inputs_telemetry, for example, is the list of input components. As I mentioned earlier, these analytics let us measure usage of both the library itself and individual components/features, so that we can prioritize our efforts as a relatively small team.

@allo-

allo- commented Mar 13, 2024

Why not go the opt-in route and let users/developers send data if they are interested in sharing it? With always-on telemetry, you get a lot of unorganized data, including test installs, abandoned projects, and other data you might not want to use to prioritize features.

If you give a clear "send_usage.py" command that shows a quick preview of what's being sent, and tell people "use this and we'll prioritize what YOU want", you'll get more useful feedback. And since we're talking about developers, you can probably even get people to fill out free-form text fields with some reasons for what they use and why, and what they are missing.

I understand the interest in how a project is being used, but the telemetry route always implies "we have to get it sneakily or we won't get it", even though many developers are happy to communicate with the upstream projects that provide the basic infrastructure they build on.
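The preview-then-send flow suggested above could look roughly like this: the exact payload is shown first, and nothing is sent without an explicit "yes". This is a minimal sketch of the idea, not an existing gradio feature; `preview_and_send` and `cli_confirm` are hypothetical names, and the actual transport is left to the injected `send` callable.

```python
import json
from typing import Any, Callable, Dict


def preview_and_send(
    payload: Dict[str, Any],
    confirm: Callable[[str], bool],
    send: Callable[[Dict[str, Any]], None],
) -> bool:
    """Hypothetical opt-in flow: show the exact payload, send only on consent."""
    preview = json.dumps(payload, indent=2, sort_keys=True)
    if not confirm(preview):
        return False  # Declining is the default; nothing leaves the machine.
    send(payload)
    return True


def cli_confirm(preview: str) -> bool:
    # Print the preview; anything other than an explicit "y" counts as "no".
    print(preview)
    return input("Send this usage report? [y/N] ").strip().lower() == "y"
```

Keeping `confirm` and `send` as parameters makes the consent step testable and keeps the network call out of the default path.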

@allo-

allo- commented Mar 13, 2024

Also, many downstream projects disable it for their users. Two of the most popular projects using gradio, stable-diffusion-webui and text-generation-webui, set both GRADIO_ANALYTICS_ENABLED=False and analytics_enabled=False by default. This means you won't get analytics even from users who wouldn't mind sharing them, because the developers of these projects do mind. If you don't want to use dark patterns that make telemetry harder to disable, you need other ways to get feedback from these projects.

haruharu-1105 added a commit to haruharu-1105/AMeThyst that referenced this issue May 11, 2024
- Disable (turn OFF) Gradio's Google Analytics setting.
Relevant issue
gradio-app/gradio#4226
Documentation
https://www.gradio.app/docs/gradio/blocks#initialization

- Show the app name in the browser.
Changed to make the tab easier to identify.
@abidlabs
Copy link
Member

Thanks folks. We considered various ways to collect Google Analytics (or equivalent) data in a GDPR-compliant way. Ultimately, we decided to drop Google Analytics altogether as part of #8263. That PR also details the telemetry we have in place when a Gradio app is launched.

My understanding is that Gradio is still not fully GDPR-compliant because we load Google Fonts, but we have a separate issue tracking that: #7968.

6 participants