Allow access to metalink/countme fields for external DNF Count Me implementations #1068
Comments
This is a valid request. Indeed, we don't expose this functionality from libdnf, since it's been designed from the beginning in such a way that it blends nicely with the regular HTTP metadata traffic. That means, rather than ship a dedicated systemd timer to send a separate GET request just for the countme flag itself, we simply bundle the flag with the next metalink request the user (or makecache timer) would sooner or later make anyway. This makes it seamless and non-intrusive for the user. The assumption has been that most (workstation) systems would be updated sort of regularly. That's also why this feature was implemented in the DNF stack in the first place. In fact, we do ship the makecache systemd timer with DNF in That being said, the need to issue an out-of-band HTTP request just for the countme flag in specific cases seems completely reasonable. For that, I think the easiest way would be for libdnf to expose a new method for an out-of-band countme request that doesn't store the metalink on disk but just performs the HTTP request itself. You could then ship a systemd timer that calls the method at regular intervals (you could even use the "Persistent=true" option (see Another solution would be to just expose I'll have a closer look this week to see what can be done. |
Thanks for the detailed answer!
Yes, this is the current plan I'm working on.
My current PoC is indeed replicating both the repo configuration parsing logic and the "Count Me" logic to figure out the window and store the timestamp. The best option for us would be something that does not need root access and either does the request itself or gives us the URL to perform the request. |
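For readers following along, the "figure out the window and store the timestamp" step could look roughly like the sketch below. It assumes weekly windows aligned to Monday 00:00 UTC; the constants and the function names `window_index` and `should_count` are illustrative assumptions, not taken from libdnf or rpm-ostree.

```python
# Assumption: counting windows are one week long and start on a Monday (UTC).
WINDOW_SECONDS = 7 * 24 * 3600
# Assumption: 1970-01-05 00:00 UTC (a Monday) is used as the alignment point.
EPOCH_OFFSET = 4 * 24 * 3600


def window_index(timestamp):
    """Return the number of the weekly window that contains `timestamp`."""
    return (int(timestamp) - EPOCH_OFFSET) // WINDOW_SECONDS


def should_count(now, last_counted):
    """Return True if no request has been recorded in the window containing `now`.

    `last_counted` is the stored timestamp of the previous countme request,
    or None if the system has never been counted.
    """
    if last_counted is None:
        return True
    return window_index(now) > window_index(last_counted)
```

A client would call `should_count(time.time(), stored_timestamp)` before each candidate request and persist the new timestamp only after the request succeeds.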
Quick update: we merged the initial external implementation in rpm-ostree while we figure out how to proceed here: coreos/rpm-ostree#2372. I would like to add another feature/option request: a way to explicitly tell libdnf not to perform Count Me requests, even when the countme value is set to true in the repository configs. This would be useful in our case, as we already do those requests directly and don't want libdnf duplicating them when someone overlays a package on an rpm-ostree system. |
As the person currently responsible for the server-side The DNF (The downside of an anonymous, opt-out system is that we don't - actually can't - identify individual users or systems. So we can't tell the difference between (e.g.) 1000 well-behaved client systems behind the same NAT gateway and one misconfigured/malicious system sending 1000 fake The usual approach elsewhere would be to have an opt-in API/service ( This compromise solution - just let other clients piggyback on DNF
...still, if everyone's OK with that, it's definitely better than nothing. |
This is true, and thanks for bringing it up, Will. We have had numerous discussions about the best privacy-aware yet useful way of gathering these statistics, before settling for the final solution that involves a In my previous comment, I was somewhat fine with the idea of exposing this functionality, but now that I got back to it after some time and having read Will's comment, I have changed my mind. I'd rather not allow other clients to send such out-of-band requests either. It would defeat the original design and our strict intention not to track and not to allow third parties to track.
Indeed. This is going to be off-topic, and I think I already shared this with Will a while ago, but there is one interesting way to obtain much richer statistics by using a method called Differential Privacy, more specifically a technique similar to RAPPOR (from Google) that combines Randomized response with a Bloom filter, which together allow for encoding arbitrary strings in a privacy-strong way, on the client, so that the data that leaves the client machine is only meaningful in the statistical sense. It's basically privacy-aware "telemetry" (however bizarre that word combination may sound 😄 ) that allows for a transparent quantification of the privacy risk to end users, which AIUI can be pretty small if done right, and might be acceptable to at least a portion of the userbase. There's a lot to be researched about this topic if one is interested (the RAPPOR paper in particular is pretty dense but interesting, and there's also a well-written blog post series), but the idea behind it is pretty simple and elegant. In any case, if we ever get to anything like this (and most likely not within DNF as Will noted), it would have to be opt-in for sure.
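To make the idea a bit more concrete, here is a heavily simplified sketch of combining a Bloom filter with randomized response on the client. It is not RAPPOR itself (which uses separate permanent and instantaneous randomization steps), and the sizes, the noise probability, and all function names are assumptions chosen purely for illustration.

```python
import hashlib
import random

NUM_BITS = 32    # assumed Bloom filter size
NUM_HASHES = 2   # assumed number of hash functions


def bloom_bits(value, num_bits=NUM_BITS, num_hashes=NUM_HASHES):
    """Encode a string as a Bloom filter bit list."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        # Derive independent hash functions by prefixing the hash index.
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def randomized_response(bits, noise_prob, rng):
    """Replace each bit with a fair coin toss with probability `noise_prob`."""
    noisy = []
    for bit in bits:
        if rng.random() < noise_prob:
            noisy.append(rng.randint(0, 1))
        else:
            noisy.append(bit)
    return noisy


def encode_report(value, noise_prob=0.5, rng=None):
    """Produce the noisy bit list that would leave the client machine."""
    rng = rng or random.SystemRandom()
    return randomized_response(bloom_bits(value), noise_prob, rng)
```

A single report reveals little about the underlying string, while a server aggregating many reports can still estimate how often each bit was truly set, because the noise probability is known.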
Could you please clarify a bit? How would the clients "piggyback"? :) |
In addition to my above comment, I apologize for not responding sooner to your original request, or just paying more attention to it from the beginning. The implementation you have now looks alright, at first glance at least - while it does these "dreaded" out-of-band requests just for the purpose of countme, it does it in a randomized fashion. One of the concerns I always had was that, by specifying which request the countme flag is included in (i.e. the first one in that particular week), that fact alone could be seen as "information leak". So I added this (perhaps silly) check that ensures that the addition of the flag is randomized over a fixed number of requests (hardcoded as 4 currently). However, with a timer, randomizing the time when it fires fulfills the same purpose (it still isn't fully random, though, since if you were using the We were initially considering a similar timer-based approach in DNF, but that still seemed too "intrusive" in a sense, esp. when compared to just adding the flag to the regular metalink requests that would normally occur anyway. In your case, though, it's a bit different - as you noted in the beginning, there are no such regular updates normally done on an rpm-ostree system. So it seems reasonable to have such a timer there, I think. |
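As an illustration of the randomization described above, the sketch below picks, at the start of each window, which of the next few requests will carry the flag. This is only one way such a check could work and is not the actual libdnf code; the constant `BUDGET` mirrors the "4" mentioned in the comment, and the function names are hypothetical.

```python
import random

BUDGET = 4  # fixed number of requests over which the flag is randomized


def pick_flagged_request(rng, budget=BUDGET):
    """Choose which request (1-based) in the current window carries the flag."""
    return rng.randint(1, budget)


def flags_for_window(num_requests, rng, budget=BUDGET):
    """Return one boolean per request: True where the countme flag is attached.

    If fewer requests happen than the chosen position, no flag is sent
    in that window under this sketch.
    """
    target = pick_flagged_request(rng, budget)
    return [position == target for position in range(1, num_requests + 1)]
```

With this approach an observer cannot assume that the flagged request is the first metalink request of the week, at the cost of occasionally missing a count on systems that make very few requests.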
Better counting and reporting (with hardware info, etc.) for Fedora CoreOS is tracked in coreos/fedora-coreos-tracker#86. We initially discussed working on that, but it is a much bigger and harder topic, so I went with replicating the Count Me mechanism as a first step.
Thanks for raising that. I will take a look at it.
Yes, our other goal with this timer is to make it easy to disable for users who do not want to be counted. We will update the documentation and post announcements on how to disable counting before turning it on by default. |
I wasn't aware of this, thanks for sharing.
Generally, the question goes like this: Could a series of weekly countme requests form a pattern that could help a bad guy fingerprint a particular system? Basically, could there be "side-channel" leakage based on the timing of those requests? As an extreme example, if a system would send a countme request at roughly the same time every Tuesday (because the machine is set up to wake up automatically at a certain time, or because the user maintains a very strict and precise work schedule 😄 ), the fact that the countme flag is a one-in-a-week event per one system makes it much more special (less likely to occur) and thus easier to distinguish from the other normal requests logged on that server in that time period. Now, correlating a number of such hits over a longer time span (of months) is just a step away from correlating IP addresses (if they change over time) to one physical system, hence tracking it. That said, this is probably a very far-fetched scenario, as there could be easier ways to fingerprint users, based on scarce data (e.g. a specific set of packages being regularly updated) that's already part of the URLs nowadays. Plus, there's the whole TLS layer at play (at least when fetching from Fedora repositories). But it's not difficult to reduce the risk of such fingerprinting based on the timing of countme requests, by adding a random component to it, which is what both implementations are currently doing, so that's fine. Also, I guess that going to such lengths in the case of countme helps communicate to the community that, indeed, we take privacy very seriously in Fedora. Especially since anything connected to "user data gathering" always raises a lot of concerns in the open-source, not to mention Fedora, community.
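For a timer-based client, the "random component" could be as simple as drawing the send time uniformly from within the window. The sketch below assumes a one-week window; `jittered_fire_time` and its parameters are hypothetical names, not part of either implementation.

```python
import random

WEEK_SECONDS = 7 * 24 * 3600  # assumed window length


def jittered_fire_time(window_start, rng, max_jitter=WEEK_SECONDS):
    """Return a send time drawn uniformly from
    [window_start, window_start + max_jitter) in whole seconds."""
    return window_start + rng.randrange(max_jitter)
```

Because each week's offset is drawn independently, the weekly requests from one system no longer line up at the same time of day, which makes them harder to correlate.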
Sounds good! |
Which is also why we need the following:
Currently, if users disable the timer but overlay a package on top of the base image, then the Count Me logic will trigger and they will report their system, which is not ideal. The instructions to disable Count Me support are thus unfortunately not as simple as "disable this timer". |
So I've read again the description for |
Is the expectation that the system is up and running 24/7? If not, the problem is, if the system happens to be off at the next (randomly scheduled) time of the event, it won't be counted. The That's what I was referring to above, though - the time of counting is then directly related to the time the system boots/wakes up and therefore is not truly "random" anymore (you're "leaking" the fact that the system has just booted up for the first time in the given week). However, I'm not sure how big of a problem that is in reality - it might as well not be worth obsessing over :) (Sorry for the late reply again, btw) |
Indeed, you're right. This does not matter for Fedora CoreOS where systems usually are up 24/7 but IoT & Silverblue won't be. Will add that. Thanks! |
Closing in favor of #1174. |
Make sure that we do not use the internal Count Me logic in DNF in rpm-ostree as we have our own external implementation that is aware of the different behavior regarding repo handling. See also the discussions in: - rpm-software-management/libdnf#1174 - rpm-software-management/libdnf#1068 - coreos#2671
Make sure that we do not use the internal Count Me logic in DNF in rpm-ostree as we have our own external implementation that is aware of the different behavior regarding repo handling. See also the discussions in: - rpm-software-management/libdnf#1174 - rpm-software-management/libdnf#1068 - coreos#2671 Also remove the corresponding note in the docs, which is not needed anymore.
I'm currently working on implementing DNF Count Me support in rpm-ostree.
Unfortunately, we cannot use the current support from libdnf in rpm-ostree for several reasons:
Would you be open to a PR extending the libdnf API to allow external implementations of the Count Me logic?
Thanks!