[FEA] Add implementation of snapping mechanism #46
Hi Dan, I see this is still in draft. Are you planning on working on this further to finish the implementation? I'm not sure if there is great value in the snapping mechanism, owing to its granular output set (especially for small epsilon), but it may be worthwhile having for use as a reference nonetheless. Note that we have since implemented defences (#47, from this paper) against the floating point vulnerability that the snapping mechanism seeks to resolve.
Hi Naoise, Sorry for the delay, I was dealing with some hardware issues and then I was away on a wee break now that that sort of thing is possible again. While away, I noticed your paper and wanted to reach out and ask if there is still value in this PR, but you have pre-empted that question. I'm happy to finish this PR, if only as a reference implementation. I can also add a warning pointing out that a better alternative exists.
On a side note: interesting work in your paper. I still need to fully familiarise myself with it, but I had wanted to look at the effect of random floating point numbers on the sampling of the Gaussian distribution as well. Implementing the snapping mechanism was a way to better understand some of the work up to this point. I was only just starting and was planning to delve deeper after I was back, so it's no great loss for me that you beat me to it, but it's cool to know there was definitely something there.
First draft implementation of the snapping mechanism. This addresses a vulnerability in the Laplace mechanism and its derivatives, stemming from floating-point numbers. The mechanism was proposed as a solution to this vulnerability by Ilya Mironov. Paper link: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/10/lsbs.pdf
71ac493 to 0b0768b
Codecov Report
@@            Coverage Diff             @@
##             main      #46      +/-   ##
==========================================
+ Coverage   99.60%   99.61%   +0.01%
==========================================
  Files          33       34       +1
  Lines        2515     2599      +84
==========================================
+ Hits         2505     2589      +84
  Misses         10       10
6fa6b0e to dc8d191
Hi Dan,
I've made a first run-through of the code for syntax purposes and proposed a few changes. I will run through the code next week to double-check its implementation versus the Mironov paper.
Additionally, can you add some tests to check the behaviour of the special functions within the mechanism (i.e., _get_nearest_power_of_2, _round_to_nearest_power_of_2)?
Let me know if you have any questions. Thanks for all your hard work on this!
Hi @naoise-h, Thank you for looking over this. I have made the changes you asked for. I changed the implementation to compute the effective epsilon and use that, so the mechanism is epsilon-DP for the given epsilon.
I do have a small issue with the different floating point types supported by numpy. I use the machine epsilon of the basic float type to compute the effective value of epsilon, and I cast to double type in _get_nearest_power_of_2. This works fine for Python's float type, but on machines with support for 96- or 128-bit floats, where np.longdouble could be used, this might cause strange behaviour. How should I best address that possibility, if at all?
Re: checking the implementation against the paper, a few points that might be of interest. Scaling to the sensitivity is not fully defined in the paper, so I scale the inputs and bounds before applying the mechanism to make the rounding step easier to reason about. I think the scaling is consistent and makes sense, but let me know if there is anything that I can improve. I also align the implementation with that of the LaplaceTruncated mechanism, allowing for arbitrary bounds, from which I compute a symmetric bound that is then used as per the paper. The value is also scaled and offset in the same way as the bounds, and this process is reversed after the mechanism is applied.
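For readers following along, the kind of power-of-two rounding that helpers named _get_nearest_power_of_2 and _round_to_nearest_power_of_2 suggest can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: the function names smallest_power_of_2_at_least and round_to_multiple are hypothetical, inputs are assumed to be positive, finite 64-bit floats, and ties are resolved by Python's round() (to the even multiple), which may differ from the tie rule the PR uses.

```python
import math


def smallest_power_of_2_at_least(x):
    """Return the smallest power of two that is >= x (hypothetical helper).

    Assumes x is a positive, finite Python float.
    """
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be a positive, finite float")
    # frexp gives x == mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(x)
    if mantissa == 0.5:
        # x is already an exact power of two (x == 2**(exponent - 1)).
        return x
    return math.ldexp(1.0, exponent)


def round_to_multiple(value, base):
    """Round value to the closest multiple of base (hypothetical helper).

    base is assumed to be a power of two, so value / base only shifts the
    exponent. Ties follow Python's round(), i.e. to the even multiple.
    """
    return round(value / base) * base


print(smallest_power_of_2_at_least(3.0))
print(round_to_multiple(5.3, 4.0))
```

Using frexp instead of log2 avoids relying on the accuracy of a floating-point logarithm near exact powers of two.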
87de49e to 4332ca6
Tests were failing due to math.nextafter only being introduced in Python 3.9, so it was replaced with np.nextafter.
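As a small illustration of that substitution (a sketch, not the code from this commit): np.nextafter is available regardless of Python version and returns the next representable float in the given direction, matching math.nextafter where the latter exists. The helper names next_float_up and next_float_down are hypothetical.

```python
import math
import sys

import numpy as np


def next_float_up(x):
    """Next representable 64-bit float above x (hypothetical helper)."""
    return float(np.nextafter(x, np.inf))


def next_float_down(x):
    """Next representable 64-bit float below x (hypothetical helper)."""
    return float(np.nextafter(x, -np.inf))


# On Python 3.9+, the standard-library function gives the same answer.
if sys.version_info >= (3, 9):
    assert next_float_up(1.0) == math.nextafter(1.0, math.inf)

print(next_float_up(1.0) - 1.0)
```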
4332ca6 to f40e5cd
I have added a few more requested changes below, hopefully they all make sense. As for the points you raise:
- To deal with the different floating point types, we can require the input to be a float (as a check in _check_all), and throw an error if the input is a higher-precision float (or attempt to cast it as a float).
- As for the sensitivity problem, scaling to unit sensitivity is the right way to go. As it's just a pre- and post-processing step, it won't affect the DP guarantee, so all good there.
- Would there be value in letting the user specify bound instead of lower and upper? Would that reduce complexity?
One last comment is that there are two warnings being thrown in the tests (link). Can the code causing these be fixed?
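The scaling discussed above (arbitrary lower/upper bounds reduced to a symmetric bound at unit sensitivity, then reversed afterwards) might look something like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: the function names to_unit_scale and from_unit_scale are hypothetical, and the midpoint-offset convention is one plausible reading of the description in this thread.

```python
def to_unit_scale(value, lower, upper, sensitivity):
    """Shift and scale value into a symmetric, unit-sensitivity range.

    Returns (scaled_value, symmetric_bound, midpoint). Hypothetical helper.
    """
    midpoint = (lower + upper) / 2
    # Half-width of the interval, expressed in units of the sensitivity.
    bound = (upper - lower) / (2 * sensitivity)
    scaled = (value - midpoint) / sensitivity
    # Clamp into [-bound, bound], as a truncated mechanism would.
    scaled = min(max(scaled, -bound), bound)
    return scaled, bound, midpoint


def from_unit_scale(scaled, midpoint, sensitivity):
    """Reverse the scaling and offset applied by to_unit_scale."""
    return scaled * sensitivity + midpoint


scaled, bound, midpoint = to_unit_scale(7.0, lower=0.0, upper=10.0, sensitivity=2.0)
# A unit-sensitivity mechanism would be applied to `scaled` here.
print(from_unit_scale(scaled, midpoint, sensitivity=2.0))
```

Because both steps are deterministic and independent of the data beyond the value itself, they act as pre- and post-processing around the mechanism, which is the reason given above for the DP guarantee being unaffected.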
diffprivlib/mechanisms/snapping.py
Outdated
    if not (isinstance(epsilon, float) or isinstance(epsilon, np.float64)):
        warnings.warn("The snapping mechanism expects epsilon to be a double precision floating-point number for "
                      "precise rounding; epsilon will be cast to 64-bit float", DiffprivlibCompatibilityWarning)
My apologies, I meant for the float check to be on the input value, not epsilon. This can be checked in _check_all(value). Also, it may be best to do a quick sanity check like float(value) != value, since it should still be possible to input an integer, etc. Something like this:
    def _check_all(self, value):
        super()._check_all(value)
        if float(value) != value:
            warnings.warn()
        return True
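A standalone, runnable variant of that sanity check is sketched below. It is illustrative only: the function name cast_to_float_with_warning and the warning text are hypothetical, UserWarning stands in for the library's own warning class, and a Fraction is used as a stand-in for any input that a 64-bit float cannot hold exactly.

```python
import warnings
from fractions import Fraction


def cast_to_float_with_warning(value):
    """Cast value to a Python float, warning if the cast changes it.

    Hypothetical helper: integers and ordinary floats pass silently, while
    values that need more precision than a 64-bit float trigger a warning.
    """
    as_float = float(value)
    if as_float != value:
        warnings.warn(
            "value is not exactly representable as a 64-bit float; "
            "it will be cast",
            UserWarning,
        )
    return as_float


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cast_to_float_with_warning(3)               # exact: no warning
    cast_to_float_with_warning(Fraction(1, 3))  # inexact: warns
print(len(caught))
```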
Yes, this isn't quite ready.
The reason I'm checking epsilon is that the two methods that depend on floating-point implementation details, _get_nearest_power_of_2 and effective_epsilon, operate on epsilon (or values derived from it), not the input value.
The other thing I was considering is to cast all values to np.longdouble, which is system dependent, and adapt the code to work with whatever precision that provides. It would complicate _get_nearest_power_of_2 but would potentially lower the impact of the machine epsilon on the mechanism.
0cc7c62 to 5da802e
Hi @naoise-h, Thanks for all the feedback. I pushed a set of changes that should resolve the latest batch of comments.
I also changed the implementation to query the mantissa size of the floating point type the system provides, which should make things more robust and easier to adjust in the future, should there be any need for it.
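One way to query those properties with NumPy is np.finfo, sketched below. This is not necessarily how the PR does it; the helper name float_type_summary is hypothetical, and the np.longdouble result depends on the platform (on some systems it is identical to a 64-bit double).

```python
import numpy as np


def float_type_summary(dtype):
    """Summarise the precision of a floating-point type (hypothetical helper)."""
    info = np.finfo(dtype)
    return {
        "mantissa_bits": info.nmant,    # explicit bits in the mantissa
        "eps": float(info.eps),         # gap between 1.0 and the next float up
        "epsneg": float(info.epsneg),   # gap between 1.0 and the next float down
    }


print(float_type_summary(np.float64))
print(float_type_summary(np.longdouble))  # platform dependent
```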
It would reduce complexity slightly, but not by a huge amount as scaling to sensitivity would still need to be performed. I think having it be consistent with LaplaceTruncated is good. Of course, if a single bound is what the user wants, the mechanism can just be instantiated with
Happy new year @naoise-h, This PR is ready to review, if you have the time.
Thanks for all your changes, and your very valuable contribution to diffprivlib.
Thank you @naoise-h, I appreciate all your feedback on this. It's been fun!