[FEA] Add implementation of snapping mechanism #46

Merged: 13 commits, Jan 17, 2022

danrr
Contributor

@danrr danrr commented Jul 12, 2021

First draft implementation of the snapping mechanism, which addresses a vulnerability
in the Laplace mechanism and its derivatives that stems from floating-point arithmetic.
Ilya Mironov proposed the mechanism as a solution to this vulnerability.

Paper link: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/10/lsbs.pdf
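
For readers new to the mechanism, its steps can be sketched roughly as follows: clamp the input to [-B, B], add Laplace-style noise built from a random sign and the logarithm of a uniform sample, round to the nearest multiple of Λ (the smallest power of two at or above the noise scale λ), and clamp again. This is only an illustration, not the code in this PR: all function names are hypothetical, the uniform sample does not follow the paper's exact sampling procedure, and `math.log` is not guaranteed to be exactly rounded, so the sketch should not be relied on for real privacy guarantees.

```python
import math
import random


def clamp(x, bound):
    """Restrict x to the symmetric interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def smallest_power_of_2_at_least(x):
    """Smallest power of two >= x, for positive finite x."""
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return math.ldexp(1.0, exponent - 1 if mantissa == 0.5 else exponent)


def round_to_multiple(x, step):
    """Round x to the nearest multiple of step, with ties going towards +infinity."""
    quotient = x / step
    whole = math.floor(quotient)
    if quotient - whole >= 0.5:
        whole += 1
    return whole * step


def snapping_sketch(value, scale, bound, rng=None):
    """Illustrative snapping step for a unit-sensitivity query (assumed simplification)."""
    rng = rng or random.SystemRandom()
    sign = rng.choice((-1.0, 1.0))
    uniform = 1.0 - rng.random()  # lies in (0, 1], so the logarithm is defined
    noisy = clamp(value, bound) + sign * scale * math.log(uniform)
    step = smallest_power_of_2_at_least(scale)
    return clamp(round_to_multiple(noisy, step), bound)
```

With `scale=1.0` and `bound=10.0`, every output of `snapping_sketch` is an integer between -10 and 10, which shows how the rounding step restricts the output set.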

@naoise-h naoise-h linked an issue Jul 15, 2021 that may be closed by this pull request
@stefano81 stefano81 requested a review from naoise-h September 22, 2021 21:24
@naoise-h
Member

Hi Dan,

I see this is still in draft. Are you planning on working on this further to finish the implementation? I'm not sure if there is great value in the snapping mechanism, owing to its granular output set (especially for small epsilon), but it may be worthwhile having for use as a reference nonetheless.

Note that we have since implemented defences (#47, from this paper) against the floating point vulnerability that the snapping mechanism seeks to resolve.
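
To make the granularity concern concrete: in Mironov's construction the output is rounded to a multiple of Λ, the smallest power of two at or above the noise scale λ. Assuming unit sensitivity, so that λ = 1/ε, the sketch below (with a hypothetical helper name that is not part of diffprivlib) computes the spacing of the output grid for a given epsilon.

```python
import math


def output_grid_step(epsilon):
    """Spacing of the snapping output grid for a unit-sensitivity query (assumed setup)."""
    scale = 1.0 / epsilon  # Laplace scale, lambda = 1 / epsilon
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2**exponent
    # Smallest power of two that is >= scale.
    return math.ldexp(1.0, exponent - 1 if mantissa == 0.5 else exponent)


for eps in (1.0, 0.1, 0.01):
    print(eps, output_grid_step(eps))
```

Under these assumptions the step grows from 1 at ε = 1 to 16 at ε = 0.1 and 128 at ε = 0.01, so a query bounded to a few hundred units has only a handful of possible outputs at small epsilon.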

@danrr
Contributor Author

danrr commented Sep 28, 2021

Hi Naoise,

Sorry for the delay, I was dealing with some hardware issues and then I was away on a wee break now that that sort of thing is possible again.

While away, I noticed your paper and wanted to reach out and ask if there is still value in this PR, but you have pre-empted that question. I'm happy to finish this PR, if only as a reference implementation. I can also add a warning pointing out that a better alternative exists.

On a side note: interesting work in your paper. I still need to fully familiarise myself with it, but I was wanting to look at the effect of random floating point numbers on the sampling of the Gaussian distribution as well. Implementing the snapping mechanism was a way to better understand some of the work up to this point. I was only just starting and was planning to delve deeper after I was back, so no great loss for me that you beat me to it but it's cool to know there was definitely something there.

@danrr danrr force-pushed the implement-snapping-mechanism branch from 71ac493 to 0b0768b Compare October 27, 2021 14:31
@danrr danrr marked this pull request as ready for review October 27, 2021 14:33
@codecov

codecov bot commented Oct 27, 2021

Codecov Report

Merging #46 (5da802e) into main (90b319a) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #46      +/-   ##
==========================================
+ Coverage   99.60%   99.61%   +0.01%     
==========================================
  Files          33       34       +1     
  Lines        2515     2599      +84     
==========================================
+ Hits         2505     2589      +84     
  Misses         10       10              
Impacted Files Coverage Δ
diffprivlib/mechanisms/__init__.py 100.00% <100.00%> (ø)
diffprivlib/mechanisms/snapping.py 100.00% <100.00%> (ø)

@danrr danrr force-pushed the implement-snapping-mechanism branch from 6fa6b0e to dc8d191 Compare October 28, 2021 09:15
Member

@naoise-h naoise-h left a comment

Hi Dan,

I've made a first run-through of the code for syntax purposes and proposed a few changes. I will run through the code next week to double-check its implementation versus the Mironov paper.

Additionally, can you add some tests to check the behaviour of the special functions within the mechanism (i.e., _get_nearest_power_of_2, _round_to_nearest_power_of_2)?

Let me know if you have any questions. Thanks for all your hard work on this!

@danrr
Contributor Author

danrr commented Nov 14, 2021

Hi @naoise-h,

Thank you for looking over this. I have made the changes you asked for. I changed the implementation to compute the effective epsilon and use that, so the mechanism is epsilon-DP for the given epsilon. I do have a small issue with the different floating-point types supported by numpy. I use the machine epsilon of the basic float type to compute the effective value of epsilon, and I cast to double type in _get_nearest_power_of_2. This works fine for Python's float type, but on machines with support for 96- or 128-bit floats, where np.longdouble could be used, this might cause strange behaviour. How should I best address that possibility, if at all?

Re: checking the implementation against the paper, a few points that might be of interest. Scaling to the sensitivity is not fully defined in the paper, so I scale the inputs and bounds before applying the mechanism to make the rounding step easier to reason about. I think the scaling is consistent and makes sense, but let me know if there is anything that I can improve.

I also align the implementation with that of the LaplaceTruncated mechanism, allowing for arbitrary bounds, which I then use to compute a symmetric bound, which is then used as per the paper. The value is also scaled and offset in the same way as the bounds, and this process is reversed after the mechanism is applied.
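
One way to picture the scaling described above is as an affine change of units: shift by the midpoint of the bounds, divide by the sensitivity, apply the mechanism with a symmetric bound, then undo both steps. The sketch below assumes that reading. The function names are hypothetical and the PR's actual code may differ in detail.

```python
def to_unit_scale(value, lower, upper, sensitivity):
    """Map value and bounds [lower, upper] to a unit-sensitivity, symmetric setting."""
    midpoint = (lower + upper) / 2
    bound = (upper - lower) / (2 * sensitivity)  # symmetric bound B after scaling
    scaled = (value - midpoint) / sensitivity
    return scaled, bound, midpoint


def from_unit_scale(scaled, midpoint, sensitivity):
    """Reverse the scaling and offset once the mechanism has been applied."""
    return scaled * sensitivity + midpoint


# A value of 7 within bounds [0, 10] and sensitivity 2:
scaled, bound, midpoint = to_unit_scale(7.0, 0.0, 10.0, 2.0)
restored = from_unit_scale(scaled, midpoint, 2.0)
```

Here the scaled value is 1.0 with symmetric bound 2.5, and reversing the transform returns 7.0. Because the transform is fixed before and after the noise is added, it acts only as pre- and post-processing around the mechanism.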

@danrr danrr force-pushed the implement-snapping-mechanism branch from 87de49e to 4332ca6 Compare November 14, 2021 22:10
Tests were failing due to math.nextafter only being introduced
in Python 3.9, so it was replaced with np.nextafter.
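
As a small illustration of the compatibility point in this commit message, the sketch below picks whichever `nextafter` is available. It is an assumed pattern for illustration only; the PR itself simply switched to `np.nextafter`.

```python
import math
import sys

try:
    nextafter = math.nextafter  # only available from Python 3.9 onwards
except AttributeError:
    import numpy as np  # fallback for older interpreters

    nextafter = np.nextafter

# The next representable double above 1.0 differs from it by the machine epsilon.
above_one = nextafter(1.0, 2.0)
gap = above_one - 1.0
```

On IEEE 754 double-precision platforms `gap` equals `sys.float_info.epsilon`, i.e. 2**-52.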
@danrr danrr force-pushed the implement-snapping-mechanism branch from 4332ca6 to f40e5cd Compare November 16, 2021 09:29
Member

@naoise-h naoise-h left a comment

I have added a few more requested changes below, hopefully they all make sense. As for the points you raise:

  1. To deal with the different floating point types, we can require the input to be a float (as a check in _check_all), and throw an error if the input is a higher-precision float (or attempt to cast it as a float).
  2. As for the sensitivity problem, scaling to unit sensitivity is the right way to go. As it's just a pre- and post-processing step, it won't affect the DP guarantee, so all good there.
  3. Would there be value in letting the user specify bound instead of lower and upper? Would that reduce complexity?

One last comment is that there are two warnings being thrown in the tests (link). Can the code causing these be fixed?

Comment on lines 52 to 54
if not isinstance(epsilon, (float, np.float64)):
    warnings.warn("The snapping mechanism expects epsilon to be a double precision floating-point number for "
                  "precise rounding; epsilon will be cast to 64-bit float", DiffprivlibCompatibilityWarning)
Member

My apologies, I meant for the float check to be on the input value, not epsilon. This can be checked in _check_all(value). Also, it may be best to do a quick sanity check like float(value) != value, since it should still be possible to input an integer, etc. Something like this:

def _check_all(self, value):
    super()._check_all(value)
    if float(value) != value:
        warnings.warn()

    return True
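
A standalone version of the suggested check, runnable outside the class, might look like the sketch below. The function name, the warning text, and the use of `RuntimeWarning` in place of the library's own warning class are all assumptions made for illustration.

```python
import warnings


def check_value(value):
    """Warn when value cannot be represented exactly as a Python float."""
    if float(value) != value:
        warnings.warn(
            "value is not exactly representable as a 64-bit float",  # assumed wording
            RuntimeWarning,  # stand-in for the library's warning class
        )
    return True


check_value(3)      # integers that fit in a double pass silently
check_value(0.5)    # ordinary floats pass silently
```

An integer such as `2**53 + 1`, which a double cannot hold exactly, would trigger the warning, while small integers and regular floats would not.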

Contributor Author

Yes, this isn't quite ready.

The reason I'm checking epsilon is that the two methods that depend on floating-point implementation details, _get_nearest_power_of_2 and effective_epsilon, operate on epsilon (or values derived from it), not on the input value.

The other thing I was considering is casting all values to np.longdouble, which is system-dependent, and adapting the code to work with whatever precision that provides. It would complicate _get_nearest_power_of_2 but would potentially lower the impact of the machine epsilon on the mechanism.

@danrr danrr force-pushed the implement-snapping-mechanism branch from 0cc7c62 to 5da802e Compare December 3, 2021 14:14
@danrr
Contributor Author

danrr commented Dec 3, 2021

Hi @naoise-h,

Thanks for all the feedback. I pushed a set of changes that should resolve the latest batch of comments.

> To deal with the different floating point types, we can require the input to be a float (as a check in _check_all), and throw an error if the input is a higher-precision float (or attempt to cast it as a float).

I noticed that _check_epsilon_delta casts epsilon to float, and this would happen even if it was an np.longdouble, so I think it would be consistent to just work with the standard double-precision float type and not worry about triple- or quad-precision floats. This also saves me from having to rewrite the bit manipulation code, as struct is not aware of higher-precision floats. I removed the warning about floating-point types as a consequence. The downside is that a user wouldn't be able to get slightly better accuracy by using higher-precision floats, but if someone wants accuracy, they probably shouldn't be using the snapping mechanism in the first place.

I did change the implementation to query the mantissa size of the floating point type the system provides, which should make things more robust and easier to adjust in the future, should there be any need for it.
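
The kind of bit manipulation discussed here can be sketched with the standard library alone. The example below is not the PR's code: the helper names are hypothetical, and it assumes Python's float is an IEEE 754 binary64 value, which is what the 8-byte `d` format in `struct` encodes. It queries the mantissa width from `sys.float_info` and uses it to find the smallest power of two at or above a positive, normal, finite float.

```python
import struct
import sys

# Explicitly stored mantissa bits (52 for IEEE 754 binary64; mant_dig counts the hidden bit).
MANTISSA_BITS = sys.float_info.mant_dig - 1


def float_to_bits(x):
    """Reinterpret a double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def power_of_2_at_least(x):
    """Smallest power of two >= x, valid for positive, normal, finite x only."""
    bits = float_to_bits(x)
    mantissa_mask = (1 << MANTISSA_BITS) - 1
    if bits & mantissa_mask == 0:
        return x  # mantissa is zero, so x is already a power of two
    exponent_only = bits & ~mantissa_mask  # clear the mantissa: largest power of two <= x
    return bits_to_float(exponent_only + (1 << MANTISSA_BITS))  # bump the exponent by one
```

Since `struct` has no format code for extended or quad precision, an approach like this is tied to 64-bit doubles, which matches the decision above to standardise on that type.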

> Would there be value in letting the user specify bound instead of lower and upper? Would that reduce complexity?

It would reduce complexity slightly, but not by a huge amount as scaling to sensitivity would still need to be performed. I think having it be consistent with LaplaceTruncated is good. Of course, if a single bound is what the user wants, the mechanism can just be instantiated with lower=-bound, upper=bound.

@danrr
Contributor Author

danrr commented Jan 12, 2022

Happy new year @naoise-h,

This PR is ready to review, if you have the time.

Member

@naoise-h naoise-h left a comment

Thanks for all your changes, and your very valuable contribution to diffprivlib.

@naoise-h naoise-h changed the title from "Add implementation of snapping mechanism" to "[FEA] Add implementation of snapping mechanism" Jan 17, 2022
@naoise-h naoise-h merged commit e7990d2 into IBM:main Jan 17, 2022
@danrr danrr deleted the implement-snapping-mechanism branch January 17, 2022 16:28
@danrr
Contributor Author

danrr commented Jan 17, 2022

Thank you @naoise-h, I appreciate all your feedback on this. It's been fun!

Development

Successfully merging this pull request may close these issues.

Floating-point privacy vulnerabilities
2 participants