Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: RFC: Create type SecureString #24738

Closed
wants to merge 2 commits into from
Closed

WIP: RFC: Create type SecureString #24738

wants to merge 2 commits into from

Conversation

omus
Copy link
Member

@omus omus commented Nov 24, 2017

Part of the problem in #24731 is that when working with structures which try to securely wipe themselves when finalized you end up wiping the underlying data once the first struct has been finalized. In an ideal world we would only securely wipe the data once the last reference has been removed.

SecureString is a new type which allows other structures to reference secure data which will only be securely wiped once the SecureString is no longer referenced or explicitly wiped. The new string type also allows me to stop using deepcopy to work around unwanted securezero! calls which avoids having to duplicating sensitive information.

@omus omus added security System security concerns and vulnerabilities strings "Strings!" labels Nov 24, 2017
@omus
Copy link
Member Author

omus commented Nov 24, 2017

Note that this PR also addresses the same issue that was addressed in #24731. The solution proposed here is definitely more developer friendly and contains less gotchas when developing new code around secure strings.


function SecureString(str::AbstractString)
s = new(str)
finalizer(securezero!, s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand a design based on finalizers, since finalizers may take a long time to call. Don't you need to zero the data as soon as it falls out of scope?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best practise would be to securely zero the data as soon it is no longer in use. The finalizer is used mostly as a fail-safe to ensure that at the very least the data is zeroed when the the instance is garbage collected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make sure to mention this in the docstring for SecureString

```
"""
mutable struct SecureString <: AbstractString
string::String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector{UInt8}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I need to switch this to use non-immutable memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nice thing I have noticed about using String is that when doing SecureString("password") the string literal memory will also be wiped out when running securezero!(::SecureString).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is also true when using Vector{UInt8} internally as vector can access the memory of the original String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is also true when using Vector{UInt8} internally as vector can access the memory of the original String

Yes, but (1) you're not supposed to use such a vector to mutate a string, (2) we plan to fix this to least make it much harder to mutate a string via a vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wiping string literals is definitely not a good idea. It basically breaks your program. If secure data is put in a string literal there's nothing we can do about it.

@omus omus changed the title Create type SecureString WIP: Create type SecureString Nov 24, 2017
@omus
Copy link
Member Author

omus commented Nov 24, 2017

I don't understand a design based on finalizers, since finalizers may take a long time to call. Don't you need to zero the data as soon as it falls out of scope?

After giving this some more thought I think there are two approaches we can take when dealing with secure data:

  1. Explicitly copy and wipe data: whenever we allocate secure memory we explicitly wipe the data soon as would fall out of scope. If multiple data structures need to reference the same secure data they would each need to have duplicate copies of the secure data to ensure there copy is not wiped out accidentally.
  2. Rely on finalizers: secure memory is only wiped when there are no references to the secure memory and the finalizer is called. If multiple data structures need to reference the same secure data they can all reference the same secure memory without duplication.

The main issue with explicitly copying and wiping data is that it is error prone it is easy to forget to wipe the data. Additionally making duplicate copies of secure data seems like a bad solution as it increases the chances for exposure.

Disadvantages of relying on finalizers is the finalizers may take a long time to call. This could be mitigated with some kind of reference counting approach.

@omus
Copy link
Member Author

omus commented Nov 24, 2017

My end goal here is to find an approach where we can have secure string data without having it be harder to use than any other String type.

@stevengj
Copy link
Member

I just think that if you're worried about secure data being in memory for too long, then a finalizer-based approach is inherently insufficient. See also #11207.

@omus
Copy link
Member Author

omus commented Dec 6, 2017

I did some experimentation with trying to implement reference count and some nice results with using convert(::Type{SecureString}, s::SecureString) = ... but unfortunately without a way to count dereferences this methodology is dead without lower level support.

@omus
Copy link
Member Author

omus commented Dec 6, 2017

The current implementation is not dissimilar to just using String with securezero! but the new type introduces three advantages:

  1. Container types that use SecureString do not need to implement their own finalizer to ensure that sensitive data is wiped. Additionally, keeping the shredding finalizer in one place means that we don't have to worry about garbage collection accidentally wiping data early (Fix LibGit2 securezero! issue #24731).
  2. Internally the zeroed memory is overwriting data in a mutable Vector{UInt8} container instead of overwriting immutable string data. (Since passed in Strings are still currently zeroed this point isn't very strong)
  3. Using the SecureString type in another container clearly indicates that the container will hold secure information and you may need to use deepcopy to maintain your own copy.

I've also introduced a new exported function shred! (named after the command line program) for SecureString rather than use securezero! as the new function name is clear and state any implementation details in the function name.

@omus omus changed the title WIP: Create type SecureString RFC: Create type SecureString Dec 6, 2017
@stevengj
Copy link
Member

stevengj commented Dec 6, 2017

Again, I'm skeptical that anyone who needs this functionality should be relying on finalizers.

@omus
Copy link
Member Author

omus commented Dec 6, 2017

Again, I'm skeptical that anyone who needs this functionality should be relying on finalizers.

The finalizer is just used as a failsafe. Best practise is to use shred! when the sensitive data is no longer required.

@StefanKarpinski
Copy link
Member

The name SecureString is a bit vague – it makes one think that it's extra careful to prevent buffer overflows or something. How about calling this SecretString since it's for holding secrets?

@omus
Copy link
Member Author

omus commented Dec 6, 2017

The name SecureString is used in C#. I'm okay with SecretString but I think ConfidentialString might be a better name if we're moving in that direction

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Dec 6, 2017

I can't tell from a quick perusal if this is the case, but SecretString/SecureString probably shouldn't have any convert methods to other string types so that the chance of accidentally copying the data somewhere else is minimized.

I do like the approach here: having a separate type allows us to limit the behaviors that one can do with this type – such as conversion and copying. Just using String and relying on the user not to copy and remember to securezero! a secret can't give nearly as strong guarantees. With a type you can be sure that the only things that are done with a string are creating it, passing it around in places that are designed to hold secrets and treated accordingly, and that ultimately it is cleaned up – ideally explicitly, but one way or the other.

To that end, I wonder if this shouldn't be a completely opaque data type instead of a subtype of AbstractString – after all any usage of a secret in a place that's not explicitly designed to handle secret data carefully is potentially a leak. So the type would be something like this:

struct Secret
    data::Vector{UInt8}
end

It would have no methods except constructors and shred!. If you want the data, you have to reach in and grab it explicitly. Methods for safely getting secret data like prompting the user and saving the result straight into a Secret object or reading the contents of a file straight into one could be provided as well.

@omus
Copy link
Member Author

omus commented Dec 6, 2017

It would have no methods except constructors and shred!

Overall I like this idea. I think we may want a few additional methods like write(io::IO, ::Secret). I'll experiment with this later and see how it works out in practise.

@StefanKarpinski
Copy link
Member

I think that doing write(io, secret.data) explicitly might be better – that way it's explicit everywhere code might be using / exposing secret data.

@yurivish
Copy link
Contributor

yurivish commented Dec 7, 2017

Given the importance of properly handling confidential data, another idea might be to require the user to explicitly shred the secret when they're done with it, and to complain if one is ever left un-shredded.

From this perspective, cleaning up the memory "at some future point" with a finalizer feels like a convenience that could mask a bug – better to throw an error instead.

(This also feels like a place where Rust-like ownership or Rc semantics would help clarify who owns the data and/or to deterministically ensure that it gets dropped. Maybe in Julia 3.0!)

@StefanKarpinski
Copy link
Member

I like that idea but I'm not sure if we can actually raise an error in a finalizer. Which task gets the error in that case? I also vaguely recall that printing in a finalizer is a problem.

@nalimilan
Copy link
Member

Would it make sense/be possible to use mlock to ensure the secret data is never swapped to disk?

@stevengj
Copy link
Member

stevengj commented Dec 7, 2017

Currently, exceptions caught during finalization print something to stderr rather than throwing "normally" to the caller.

@StefanKarpinski
Copy link
Member

Printing a warning seems like it might be good. Of course, if only the GC has a reference to a secret then it's pretty safe – after all, malicious code doesn't have a reference to it either. Sure, proactively "shredding" the data is better, but there's nothing that wrong with GC shredding it either.

@omus
Copy link
Member Author

omus commented Dec 11, 2017

My current approach is to print a warning if the data has not been shredded and then proceed to shred the data.

@StefanKarpinski
Copy link
Member

That sounds like the best option.

@StefanKarpinski
Copy link
Member

Sorry, didn't mean to close and reopen.

@omus
Copy link
Member Author

omus commented Dec 13, 2017

I've done a bunch of work on this but I'm not sure I can have it ready for the feature freeze. The good news is that while implementing this I've uncovered and fixed some issues I'll make some PRs for yet.

@omus
Copy link
Member Author

omus commented Dec 13, 2017

How would people feel if at the very least I try to get the rename of securezero! to shred! in before the freeze?

@StefanKarpinski
Copy link
Member

This is not a public API, right? If it's not public, you can always change it after the feature freeze.

@Keno
Copy link
Member

Keno commented May 30, 2018

Bump on this, it would be nice to stop all the CI failures due to the runtime incorrectly overwriting immutable memory.

@omus
Copy link
Member Author

omus commented May 31, 2018

Did most of the rebase. Will hopefully push something for tomorrow morning.

omus added 2 commits May 31, 2018 12:55
Originally used `isequal` to deal with `Nullable`
@omus omus changed the title RFC: Create type SecureString WIP: RFC: Create type SecureString May 31, 2018
@omus omus force-pushed the cv/securestring branch from eb79d97 to 3818ace Compare May 31, 2018 17:57
@omus
Copy link
Member Author

omus commented May 31, 2018

Finished the rebase. SecureString itself needs a bunch of work and currently doesn't have the limited method set as suggested by @StefanKarpinski.

I've tested the LibGit2 usage of SecureString which is currently working but some additional effort will be needed to properly secure the SecureString implementation. I'll try to do this work yet but my time is limited so if others want to help out I'm all for it.

@Keno Keno added this to the 0.7 milestone Jun 1, 2018
return s
end

isshredded(s::SecureString) = sum(s.data) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want all(s.data .== 0); as there's a (minute) possibility that this can overflow to precisely zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather all(iszero, s.data).

@staticfloat
Copy link
Member

I like the direction of this PR, but I think we basically want to get rid of lines 38-52 in strings/secure.jl.

With regards to the "right API" for dealing with these objects, I don't think having ss.data be the right way to get data out of a SecureString object is what we want; I think it would be better to have a read(::SecureString) function that returns a String instead, (named so as to mirror the write(::SecureString, ::String) method we use to put data within it. In either case, what we don't want is for users to be able to just generically treat a SecureString as if it were a String; we want the ergonomics to explicitly require the user to deal with the fact that it's a SecureString, so as long as the functions defined don't overlap with String methods, we should be good to go.

@mbauman
Copy link
Member

mbauman commented Jun 4, 2018

I'm in agreement with the calls for read and write APIs. In fact, I think this would be much easier to use safely as an IO object — perhaps SecureBuffer a la IOBuffer? I've already started work on transitioning the APIs; I'll have a full report tomorrow about what this could look like.

@omus
Copy link
Member Author

omus commented Jun 20, 2018

Succeeded by: #27565

@omus omus closed this Jun 20, 2018
@omus omus deleted the cv/securestring branch June 20, 2018 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
security System security concerns and vulnerabilities strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants