Allow access to metalink/countme fields for external DNF Count Me implementations #1068

Closed
travier opened this issue Oct 23, 2020 · 13 comments

@travier

travier commented Oct 23, 2020

I'm currently working on implementing DNF Count Me support in rpm-ostree.

Unfortunately, we cannot use the current support from libdnf in rpm-ostree for several reasons:

  • In the common use case, rpm-ostree systems do not fetch repository metadata regularly. There is thus no repository data to use for computing the Count Me window.
  • In the common use case, we do not need to fetch any metadata at all as we will not use it on the system and just need to report to the mirror that we are a running system.
  • The current API does not give direct access to the URL configured as metalink nor to the countme option, preventing an external implementation from reading those values using this library.

Would you be open to a PR extending the libdnf API to allow external implementations of the Count Me logic?

Thanks!

@dmnks
Contributor

dmnks commented Nov 2, 2020

This is a valid request.

Indeed, we don't expose this functionality from libdnf, since it's been designed from the beginning in such a way that it blends nicely with the regular HTTP metadata traffic. That means, rather than ship a dedicated systemd timer to send a separate GET request just for the countme flag itself, we simply bundle the flag with the next metalink request the user (or the makecache timer) would sooner or later make anyway. This makes it seamless and non-intrusive for the user. The assumption has been that most (workstation) systems would be updated fairly regularly. That's also why this feature was implemented in the DNF stack in the first place.
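For illustration, here is a minimal Python sketch of the "bundling" idea: instead of a separate request, the flag rides along as an extra `countme=<bucket>` query parameter on the metalink URL that would be fetched anyway. It assumes the flag is a plain query parameter and that the bucket is one of the four age buckets mentioned later in this thread; `add_countme_flag` is a hypothetical helper, not a libdnf function, and the example URL is only illustrative.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_countme_flag(metalink_url: str, bucket: int) -> str:
    """Return metalink_url with a countme=<bucket> query parameter appended.

    Hypothetical helper: it only rewrites the URL; the actual fetch is left
    to the regular metadata download that would happen anyway.
    """
    if not 1 <= bucket <= 4:
        raise ValueError("bucket must be in the range 1..4")
    parts = urlsplit(metalink_url)
    flag = urlencode({"countme": bucket})
    # Keep the existing query string (repo=..., arch=...) and append the flag.
    query = f"{parts.query}&{flag}" if parts.query else flag
    return urlunsplit(parts._replace(query=query))


print(add_countme_flag(
    "https://mirrors.fedoraproject.org/metalink?repo=fedora-33&arch=x86_64", 1))
# prints https://mirrors.fedoraproject.org/metalink?repo=fedora-33&arch=x86_64&countme=1
```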

In fact, we do ship the makecache systemd timer with DNF in all Fedora variants, which is enabled by default and goes off once per hour, which translates into a couple of actual metalink requests per week (depending on some DNF config options and the actual uptime of the system).

That being said, the need to issue an out-of-band HTTP request just for the countme flag in specific cases seems completely reasonable. For that, I think the easiest way would be for libdnf to expose a new method for an out-of-band countme request that doesn't store the metalink on disk but just performs the HTTP request itself. You could then ship a systemd timer that calls the method at regular intervals (you could even use the "Persistent=true" option (see systemd.timer(5)) to make it "catch up" with missed events right after system start-up).
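A rough sketch of what such an out-of-band method could look like, assuming it only needs to issue one GET against the countme-tagged metalink URL and throw the response away (nothing is cached or written to disk). `countme_oneshot` and its injectable `fetch` parameter are hypothetical; scheduling would be left to the systemd timer.

```python
import urllib.request
from typing import Callable, Optional


def _http_get_status(url: str) -> int:
    """Plain HTTP GET; the body (the metalink XML) is read and discarded."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


def countme_oneshot(metalink_url: str, bucket: int,
                    fetch: Optional[Callable[[str], int]] = None) -> int:
    """Send a single out-of-band countme request and return the HTTP status.

    Hypothetical API: `fetch` can be swapped out (e.g. in tests); by default
    a real HTTP request is made and its response is not stored anywhere.
    """
    sep = "&" if "?" in metalink_url else "?"
    url = f"{metalink_url}{sep}countme={bucket}"
    return (fetch or _http_get_status)(url)
```

Injecting `fetch` keeps the URL-building logic testable without network access.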

Another solution would be to just expose addCountmeFlag() directly, but that would require you as the API consumer to deal with the librepo handle. Shouldn't be difficult, but you would probably end up duplicating some of the code we already have in libdnf.

I'll have a closer look this week to see what can be done.

@dmnks dmnks self-assigned this Nov 2, 2020
@travier
Author

travier commented Nov 3, 2020

Thanks for the detailed answer!

You could then ship a systemd timer that calls the method at regular intervals (you could even use the "Persistent=true" option (see systemd.timer(5)) to make it "catch up" with missed events right after system start-up).

Yes, this is the current plan I'm working on.

Another solution would be to just expose addCountmeFlag() directly, but that would require you as the API consumer to deal with the librepo handle. Shouldn't be difficult, but you would probably end up duplicating some of the code we already have in libdnf.

My current PoC is indeed replicating both the repo configuration parsing logic and the "Count Me" logic to figure out the window and store the timestamp. The best option for us would be something that does not need root access and either does the request itself or gives us the URL to perform the request.
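The window logic being replicated can be sketched roughly as follows, assuming weekly windows aligned to a fixed point in time (here Monday 00:00 UTC, i.e. four days after the Thursday UNIX epoch) rather than to the previous counting event, plus a small persisted state holding the first and last counted windows. The JSON state format, the names and the alignment are assumptions for illustration, not libdnf's actual cookie format.

```python
import json
from pathlib import Path

WINDOW = 7 * 24 * 3600   # one week, in seconds
OFFSET = 4 * 24 * 3600   # assumption: 1970-01-05 00:00 UTC, a Monday


def window_start(now: float) -> int:
    """Start (UNIX time) of the fixed weekly window that contains `now`."""
    return int(now - ((now - OFFSET) % WINDOW))


def check_window(state: dict, now: float) -> bool:
    """Return True if this window has not been counted yet, updating `state`.

    `state` holds `first_window` (first counted window, used later for the
    age bucket) and `last_window` (the most recently counted window).
    """
    current = window_start(now)
    if state.get("last_window") == current:
        return False  # already counted this week
    state.setdefault("first_window", current)
    state["last_window"] = current
    return True


def load_state(path: Path) -> dict:
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, ValueError):
        return {}  # missing or corrupt state: start over


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state))
```

A caller would load the state, call `check_window(state, time.time())`, send the request only if it returns `True`, and then save the state. Keeping `check_window` free of I/O makes it easy to test.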

@travier
Author

travier commented Jan 14, 2021

Quick update: we merged the initial external implementation in rpm-ostree while we figure out how to proceed here: coreos/rpm-ostree#2372

I would like to add another feature/option request: a way to explicitly ask libdnf not to perform Count Me requests, even when the countme value is set to true in the repository configs. This would be useful in our case, as we already make those requests directly and don't want libdnf duplicating them when someone overlays a package on an rpm-ostree system.

@wgwoods

wgwoods commented Jan 14, 2021

As the person currently responsible for the server-side countme data collection: are you sure libdnf is the right place for a general client API for counting systems?

The DNF countme feature was (as noted above) specifically designed to avoid adding new or out-of-band requests. That's partly because those might constitute user tracking under GDPR etc. and require consent/opt-in from the user. Adding countme=X inline to normal requests was a good approach - we're directly measuring normal libdnf client behavior, but gathering only anonymous data, which allows us to still keep it as an on-by-default, opt-out system.

(The downside of an anonymous, opt-out system is that we don't - actually can't - identify individual users or systems. So we can't tell the difference between (e.g.) 1000 well-behaved client systems behind the same NAT gateway and one misconfigured/malicious system sending 1000 fake countme requests. We also can't get useful realtime or daily data - at best we get aggregate data once per week. It's a census, not a survey.)

The usual approach elsewhere would be to have an opt-in API/service (countme-harder? :D) which could gather richer, more accurate data about client systems. Something like that could get more detailed data about common hardware platforms, cloud/hypervisor usage, installed packages, etc. It could also provide fresher, realtime-ish data (e.g. on release day). But AFAIK we don't have anything like that in Fedora/CentOS at the moment, so it's kinda moot.

This compromise solution - just let other clients piggyback on DNF countme - kinda has the downsides of both approaches:

  1. We aren't measuring the client's normal behavior; clients send fake requests for repos they're not actually using (which we then have to account for in the statistics),
  2. we don't know if sending out-of-band requests specifically for data-collection purposes would require user consent, and
  3. at best we'll still only get fuzzy aggregate data, once per week.

...still, if everyone's OK with that, it's definitely better than nothing.

@dmnks
Contributor

dmnks commented Jan 14, 2021

The DNF countme feature was (as noted above) specifically designed to avoid adding new or out-of-band requests. That's partly because those might constitute user tracking under GDPR etc. and require consent/opt-in from the user. Adding countme=X inline to normal requests was a good approach - we're directly measuring normal libdnf client behavior, but gathering only anonymous data, which allows us to still keep it as an on-by-default, opt-out system.

This is true, and thanks for bringing it up, Will. We had numerous discussions about the best privacy-aware yet useful way of gathering these statistics before settling on the final solution, which involves a countme=X parameter that's simply added to the existing metalink requests, once per week, where X is the "age bucket" (4 in total) that the system falls into.
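To make the "age bucket" idea concrete, here is a small sketch that maps the number of weeks elapsed since a system's first counted window to one of four buckets. The boundaries used (under 1 week, under 4 weeks, under 26 weeks, older) are hypothetical placeholders chosen for illustration, not the values libdnf actually uses.

```python
from bisect import bisect_right

WINDOW = 7 * 24 * 3600  # one week, in seconds

# Hypothetical exclusive upper bounds, in weeks since the first counted
# window, for buckets 1..3; anything at or past the last bound is bucket 4.
BUCKET_BOUNDS = (1, 4, 26)


def age_bucket(first_window: int, current_window: int) -> int:
    """Map a system's age (in whole weekly windows) to a bucket in 1..4."""
    weeks = (current_window - first_window) // WINDOW
    return bisect_right(BUCKET_BOUNDS, weeks) + 1
```

Only the bucket number would leave the machine; the timestamps it is derived from stay local.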

In my previous comment, I was somewhat fine with the idea of exposing this functionality, but now that I got back to it after some time and having read Will's comment, I have changed my mind. I'd rather not allow other clients to send such out-of-band requests either. It would defeat the original design and our strict intention not to track and not to allow third parties to track.

The usual approach elsewhere would be to have an opt-in API/service (countme-harder? :D ) which could gather richer, more accurate data about client systems. Something like that could get more detailed data about common hardware platforms, cloud/hypervisor usage, installed packages, etc. It could also provide fresher, realtime-ish data (e.g. on release day). But AFAIK we don't have anything like that in Fedora/CentOS at the moment, so it's kinda moot.

Indeed. This is going to be off-topic, and I think I already shared this with Will a while ago, but there is one interesting way to obtain much richer statistics by using a method called Differential Privacy, more specifically a technique similar to RAPPOR (from Google) that combines Randomized response with a Bloom filter, which together allow for encoding arbitrary strings in a privacy-strong way, on the client, so that the data that leaves the client machine is only meaningful in the statistical sense.

It's basically privacy-aware "telemetry" (however bizarre that word combination may sound 😄) that allows for a transparent quantification of the privacy risk to end users, which AIUI can be pretty small if done right, and might be acceptable to at least a portion of the userbase. There's a lot to be researched about this topic if one is interested (the RAPPOR paper in particular is pretty dense but interesting, and there's also a well-written blog post series), but the idea behind it is pretty simple and elegant. In any case, if we ever get to anything like this (and most likely not within DNF, as Will noted), it would have to be opt-in for sure.
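As a rough illustration of the core RAPPOR idea, here is a heavily simplified sketch: the client hashes a string into a small Bloom filter and then randomly flips bits ("permanent randomized response"), yet the server can still estimate aggregate bit counts across many reports. Real RAPPOR adds a second, per-report randomization step and per-cohort hash functions; the filter size, hash count and `f` value here are arbitrary choices for the example.

```python
import hashlib
import random
from typing import List, Sequence


def bloom_encode(value: str, num_bits: int = 32, num_hashes: int = 2) -> List[int]:
    """Set up to `num_hashes` bits of a `num_bits`-wide Bloom filter for `value`."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def randomize(bits: Sequence[int], f: float, rng: random.Random) -> List[int]:
    """Permanent randomized response: each bit becomes 1 with probability f/2,
    0 with probability f/2, and keeps its true value with probability 1 - f."""
    out = []
    for b in bits:
        r = rng.random()
        if r < f / 2:
            out.append(1)
        elif r < f:
            out.append(0)
        else:
            out.append(b)
    return out


def estimate_true_ones(observed_ones: int, num_reports: int, f: float) -> float:
    """Unbiased server-side estimate of how many reports truly had a bit set,
    using E[observed] = true * (1 - f) + num_reports * f / 2."""
    return (observed_ones - num_reports * f / 2) / (1 - f)
```

No single randomized report says much about its sender, but summing many reports and applying `estimate_true_ones` recovers the population-level counts.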

This compromise solution - just let other clients piggyback on DNF countme - kinda has the downsides of both approaches:

1. We aren't measuring the client's normal behavior; clients send fake requests for repos they're not actually using (which we then have to account for in the statistics),

2. we don't know if sending out-of-band requests specifically for data-collection purposes would require user consent, and

3. at best we'll still only get fuzzy aggregate data, once per week.

...still, if everyone's OK with that, it's definitely better than nothing.

Could you please clarify a bit? How would the clients "piggyback"? :)

@dmnks
Contributor

dmnks commented Jan 14, 2021

I would like to add another feature/option request: a way to explicitly ask libdnf not to perform Count Me requests, even when the countme value is set to true in the repository configs. This would be useful in our case, as we already make those requests directly and don't want libdnf duplicating them when someone overlays a package on an rpm-ostree system.

In addition to my above comment, I apologize for not responding sooner to your original request, or for not paying more attention to it from the beginning. The implementation you have now looks alright, at first glance at least: while it does these "dreaded" out-of-band requests just for the purpose of countme, it does so in a randomized fashion.

One of the concerns I always had was that, by specifying which request the countme flag is included in (i.e. the first one in that particular week), that fact alone could be seen as an "information leak". So I added this (perhaps silly) check that ensures that the addition of the flag is randomized over a fixed number of requests (currently hardcoded as 4). However, with a timer, randomizing the time when it fires fulfills the same purpose. It still isn't fully random, though: if you were using the Persistent=yes option in the timer file (which it seems you're not), systemd would execute the timer right after bootup if the previous iteration was missed. But that's just me being ridiculously pedantic 😄.
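The randomized check described here can be sketched as follows, assuming a budget drawn uniformly from 1 to 4 when a new window starts and decremented on every metalink request, with the flag attached to the request that exhausts it. Each of the first four requests in a window is then equally likely to carry the flag. The class and method names are hypothetical; this is not libdnf's code.

```python
import random
from typing import Optional

BUDGET = 4  # number of requests the flag is spread over ("hardcoded as 4")


class CountmeBudget:
    """Decide which of the first BUDGET requests in a window gets the flag."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.remaining: Optional[int] = None  # None: draw a fresh budget
        self.counted = False                  # flag already sent this window?

    def new_window(self) -> None:
        self.remaining = None
        self.counted = False

    def on_request(self) -> bool:
        """Return True if this request should carry the countme flag."""
        if self.counted:
            return False
        if self.remaining is None:
            self.remaining = self.rng.randint(1, BUDGET)  # uniform in 1..4
        self.remaining -= 1
        if self.remaining == 0:
            self.counted = True
            return True
        return False
```

Because the flagged position is drawn uniformly, an observer cannot infer "this was the first request of the week" from the presence of the flag.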

We were initially considering a similar timer-based approach in DNF, but that still seemed too "intrusive" in a sense, esp. when compared to just adding the flag to the regular metalink requests that would normally occur anyway. In your case, though, it's a bit different - as you noted in the beginning, there are no such regular updates normally done on an rpm-ostree system. So it seems reasonable to have such a timer there, I think.

@travier
Author

travier commented Jan 14, 2021

Better counting and reporting (with hardware info, etc.) for Fedora CoreOS is tracked in coreos/fedora-coreos-tracker#86. We initially discussed working on that, but it is a much bigger and harder topic, so I went with replicating the Count Me mechanism as a first step.

However, with a timer, randomizing the time when it fires fulfills the same purpose (it still isn't fully random, though, since if you were using the Persistent=yes option in the timer file (which you're not it seems), systemd would execute the timer right after bootup if the previous iteration was missed - but that's just me being ridiculously pedantic 😄 ).

Thanks for raising that. I will take a look at it.

In your case, though, it's a bit different - as you noted in the beginning, there are no such regular updates normally done on an rpm-ostree system. So it seems reasonable to have such a timer there, I think.

Yes, our other goal with this timer is to make it easy to disable for users that do not want to be counted. We will update the documentation and post announcements on how to disable counting before turning it on by default.

@dmnks
Contributor

dmnks commented Jan 15, 2021

Better counting and reporting (with hardware info, etc.) for Fedora CoreOS is tracked in coreos/fedora-coreos-tracker#86.

I wasn't aware of this, thanks for sharing.

Thanks for raising that. I will take a look at it.

Generally, the question goes like this:

Could a series of weekly countme requests form a pattern that could help a bad guy fingerprint a particular system? Basically, could there be "side-channel" leakage based on the timing of those requests?

As an extreme example, if a system sent a countme request at roughly the same time every Tuesday (because the machine is set up to wake up automatically at a certain time, or because the user maintains a very strict and precise work schedule 😄), the fact that the countme flag is a once-a-week event per system makes it much more special (less likely to occur) and thus easier to distinguish from the other normal requests logged on that server in that time period. Now, correlating a number of such hits over a longer time span (of months) is just a step away from correlating IP addresses (if they change over time) to one physical system, hence tracking it.

That said, this is probably a very far-fetched scenario, as there could be easier ways to fingerprint users, based on scarce data (e.g. a specific set of packages being regularly updated) that's already part of the URLs nowadays. Plus, there's the whole TLS layer at play (at least when fetching from Fedora repositories).

But it's not difficult to reduce the risk of such fingerprinting based on the timing of countme requests, by adding a random component to it, which is what both implementations are currently doing, so that's fine.
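A minimal sketch of that random component, in the spirit of a timer with a one-week randomized delay: pick a uniformly random moment inside the next weekly window, so the send time is decoupled from a fixed weekly routine such as "every Tuesday morning". The window alignment and the function name are assumptions for illustration.

```python
import random

WINDOW = 7 * 24 * 3600   # one week, in seconds
OFFSET = 4 * 24 * 3600   # assumption: windows start Mondays at 00:00 UTC


def next_send_time(now: float, rng: random.Random) -> float:
    """Uniformly random point in the weekly window after the one containing `now`."""
    current_start = now - ((now - OFFSET) % WINDOW)
    next_start = current_start + WINDOW
    return next_start + rng.random() * WINDOW  # random() is in [0, 1)
```

Spreading the request uniformly over the whole week means its timing carries little information about when the machine or its user is usually active.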

Also, I guess that going to such lengths in the case of countme helps communicate to the community that we indeed take privacy very seriously in Fedora, especially since anything connected to "user data gathering" always raises a lot of concerns in the open-source community, not to mention the Fedora community.

Yes, our other goal with this timer is to make it easy to disable for users that do not want to be counted. We will update the documentation and post announcements on how to disable counting before turning it on by default.

Sounds good!

@travier
Author

travier commented Jan 18, 2021

Yes, our other goal with this timer is to make it easy to disable for users that do not want to be counted. We will update the documentation and post announcements on how to disable counting before turning it on by default.

Sounds good!

Which is also why we need the following:

I would like to add another feature/option request: a way to explicitly ask libdnf not to perform Count Me requests, even when the countme value is set to true in the repository configs. This would be useful in our case, as we already make those requests directly and don't want libdnf duplicating them when someone overlays a package on an rpm-ostree system.

Currently, if users disable the timer but overlay a package on top of the base image, the Count Me logic will trigger and their system will be reported, which is not ideal. The instructions to disable Count Me support are thus unfortunately not as simple as "disable this timer".

@travier
Author

travier commented Jan 19, 2021

So I've re-read the description of Persistent=yes and I don't think it matters much in our case, where we essentially set a one-week randomized delay for timer triggers: https://github.com/coreos/rpm-ostree/blob/master/src/app/rpm-ostree-countme.timer#L6-L10. Please let me know if I missed something!

@dmnks
Copy link
Contributor

dmnks commented Feb 15, 2021

Is the expectation that the system is up and running 24/7? If not, the problem is, if the system happens to be off at the next (randomly scheduled) time of the event, it won't be counted. The Persistent=yes option "fixes" that by making sure that, if the previous iteration was missed, the event fires as soon as the system is booted up again.

That's what I was referring to above, though - the time of counting is then directly related to the time the system boots/wakes up and therefore is not truly "random" anymore (you're "leaking" the fact that the system has just booted up for the first time in the given week). However, I'm not sure how big of a problem that is in reality - it might as well not be worth obsessing over :)
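The effect described here can be illustrated with a toy model. It is deliberately simplified and does not try to reproduce systemd's exact timer semantics (for example, how a randomized delay interacts with a catch-up run): given the randomly scheduled time and the periods when the machine is running, the event either fires on schedule, is missed, or, with catch-up enabled, fires at the next boot, which ties the request time to the boot time.

```python
from typing import List, Optional, Tuple


def fire_time(scheduled: float,
              uptime: List[Tuple[float, float]],
              persistent: bool) -> Optional[float]:
    """Return when the countme event fires in this toy model, or None if missed.

    `uptime` is a sorted list of (boot, shutdown) intervals when the machine runs.
    """
    for boot, shutdown in uptime:
        if boot <= scheduled < shutdown:
            return scheduled  # machine was up: fires at the random time
        if persistent and boot > scheduled:
            return boot       # catch-up: fires at boot, so timing reveals the boot
    return None               # missed: no interval covers it and no catch-up applies
```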

(Sorry for the late reply again, btw)

@travier
Copy link
Author

travier commented Feb 16, 2021

Indeed, you're right. This does not matter for Fedora CoreOS where systems usually are up 24/7 but IoT & Silverblue won't be. Will add that. Thanks!

@travier
Copy link
Author

travier commented Mar 22, 2021

Closing in favor of #1174.

@travier travier closed this as completed Mar 22, 2021
travier added a commit to travier/rpm-ostree that referenced this issue Apr 13, 2021
Make sure that we do not use the internal Count Me logic in DNF in
rpm-ostree as we have our own external implementation that is aware of
the different behavior regarding repo handling.

See also the discussions in:
  - rpm-software-management/libdnf#1174
  - rpm-software-management/libdnf#1068
  - coreos#2671
travier added a commit to travier/rpm-ostree that referenced this issue Apr 14, 2021
Make sure that we do not use the internal Count Me logic in DNF in
rpm-ostree as we have our own external implementation that is aware of
the different behavior regarding repo handling.

See also the discussions in:
  - rpm-software-management/libdnf#1174
  - rpm-software-management/libdnf#1068
  - coreos#2671

Also remove the corresponding note in the docs, which is not needed anymore.
cgwalters pushed a commit to coreos/rpm-ostree that referenced this issue Apr 16, 2021
Make sure that we do not use the internal Count Me logic in DNF in
rpm-ostree as we have our own external implementation that is aware of
the different behavior regarding repo handling.

See also the discussions in:
  - rpm-software-management/libdnf#1174
  - rpm-software-management/libdnf#1068
  - #2671

Also remove the corresponding note in the docs, which is not needed anymore.