proposal: x/crypto/argon2: add API variants to support a buffer pool #36634
Comments
In general, that sounds like a legit request to me. Unfortunately, we cannot just use an internal pool transparently. What we could add is an argon2.New(...) that returns a closure with e.g. a sync.Pool inside. Calling code would use this as follows:
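A minimal sketch of what such a closure-based API and its calling code could look like. argon2.New and deriveInto are hypothetical names, not part of the real x/crypto/argon2 API; deriveInto merely delegates to the existing IDKey, since the internal block buffer cannot actually be injected today:

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/crypto/argon2"
)

// deriveInto stands in for an internal IDKey variant that would use buf as its
// working memory. No such variant exists, so it just delegates to IDKey, which
// still allocates internally.
func deriveInto(buf []uint64, password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte {
	_ = buf
	return argon2.IDKey(password, salt, time, memory, threads, keyLen)
}

// New returns a derivation closure that checks a buffer out of a sync.Pool
// around each call; with a real deriveInto, sequential calls would reuse the
// same large buffer instead of reallocating it.
func New(time, memory uint32, threads uint8, keyLen uint32) func(password, salt []byte) []byte {
	pool := &sync.Pool{
		// One Argon2 block is 1 KiB (128 uint64s); memory is given in KiB.
		New: func() interface{} { return make([]uint64, int(memory)*128) },
	}
	return func(password, salt []byte) []byte {
		buf := pool.Get().([]uint64)
		defer pool.Put(buf)
		return deriveInto(buf, password, salt, time, memory, threads, keyLen)
	}
}

func main() {
	derive := New(1, 32*1024, 4, 32) // 32 MiB of working memory per call
	key := derive([]byte("hunter2"), []byte("0123456789abcdef"))
	fmt.Printf("%x\n", key)
}
```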
/cc @FiloSottile |
I'd want to manage the buffer pool in my code. Roughly:
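A minimal sketch of a caller-managed API in that spirit, assuming the Blockset type referenced later in the thread. All names are illustrative, and the method delegates to the existing argon2.IDKey because the internals that would consume the buffer are not exported:

```go
package main

import (
	"fmt"

	"golang.org/x/crypto/argon2"
)

// Blockset is a caller-owned working buffer, sized for one derivation.
type Blockset []uint64

// NewBlockset allocates working memory for the given memory parameter (KiB);
// one Argon2 block is 1 KiB, i.e. 128 uint64s.
func NewBlockset(memory uint32) Blockset {
	return make(Blockset, int(memory)*128)
}

// IDKey is like argon2.IDKey, but would run inside b instead of allocating.
// This stand-in ignores b and delegates to the real function.
func (b Blockset) IDKey(password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte {
	_ = b
	return argon2.IDKey(password, salt, time, memory, threads, keyLen)
}

func main() {
	bs := NewBlockset(32 * 1024) // 32 MiB, owned and reused by the caller
	key := bs.IDKey([]byte("hunter2"), []byte("0123456789abcdef"), 1, 32*1024, 4, 32)
	fmt.Printf("%x\n", key)
}
```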
|
@networkimprov Any specific reason for that - since that sounds like a significantly more complex API?! |
Well your argon2.New() is essentially 2 methods; I've added one more, plus a named type. And generally x.New() returns an object. There are many ways one might manage a buffer pool, not just sync.Pool; we can't lock in one scheme. Also some apps would need to expand or shrink the pool. |
There is now a prototype implementation of this approach.
I don't see why this should be an issue. For instance, ioutil.Discard uses a sync.Pool internally. Applications that require this behavior can still achieve it by calling argon2.New again - e.g. after invoking f more than n times or after t seconds. |
ioutil.Discard must look like any io.Writer; that it uses a sync.Pool is an implementation detail for an object fulfilling io.Writer. Our topic doesn't have a predefined contract.

Re shrink/expand pool "by calling argon2.New again after invoking f ... more than n times or after t seconds": that creates another sync.Pool. Now I have busy & idle f() widgets; how do I pick an idle one? EDIT: f() can crash with OOM, unless New() allocates all pool buffers, causing a quantum leap in RAM use.

I want to see where I allocate memory and where I can block. And many apps already have a pool of resources which get allocated to connections. A package like this provides a toolkit, not an engine.

In my model, we can drop type Blockset:
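A sketch of that revised model: plain functions taking an explicit buffer, no named type. NewBuffer and IDKeyBuf are hypothetical names, and IDKeyBuf delegates to the real argon2.IDKey here:

```go
package main

import (
	"fmt"

	"golang.org/x/crypto/argon2"
)

// NewBuffer allocates the working memory for one derivation with the given
// memory parameter (KiB); how to pool or limit these is up to the caller.
func NewBuffer(memory uint32) []uint64 {
	return make([]uint64, int(memory)*128) // 1 KiB per Argon2 block
}

// IDKeyBuf would be argon2.IDKey running inside buf instead of allocating.
// No such function exists in x/crypto/argon2, so this stand-in delegates.
func IDKeyBuf(buf []uint64, password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte {
	_ = buf
	return argon2.IDKey(password, salt, time, memory, threads, keyLen)
}

func main() {
	buf := NewBuffer(32 * 1024) // the app decides where this lives and when to reuse it
	key := IDKeyBuf(buf, []byte("hunter2"), []byte("0123456789abcdef"), 1, 32*1024, 4, 32)
	fmt.Printf("%x\n", key)
}
```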
|
Sorry, but I'm not able to understand the relations here. Why is it not acceptable to implement (a potential) argon2.New on top of a sync.Pool? The pool would be an implementation detail of the returned function. When re-assigning or dropping that function, its pool and buffers become garbage and are eventually collected. |
Maybe I misunderstood the proposal, but AFAICS your initial request was about a (potential) performance problem (w.r.t. memory allocation) - for example a request handler that hashes a password. The fact that Argon2 is a memory-hard function eventually becomes a performance problem when that request handler becomes a "hot" path (high traffic). The implementation you've shown does not contain a solution for the fundamental problem: the total memory consumed by concurrent derivations.
Let's assume a request handler that hashes a password on every login. An implementation that uses your proposed caller-managed buffers would look roughly like this:
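A sketch of such a handler under the caller-managed-buffer proposal, assuming the hypothetical IDKeyBuf variant (which delegates to the real argon2.IDKey) and an application-owned sync.Pool:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"

	"golang.org/x/crypto/argon2"
)

// IDKeyBuf stands in for a buffer-reusing variant of argon2.IDKey; it does not
// exist in the package and simply delegates here.
func IDKeyBuf(buf []uint64, password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte {
	_ = buf
	return argon2.IDKey(password, salt, time, memory, threads, keyLen)
}

// The application has to own and size the pool itself.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]uint64, 32*1024*128) }, // 32 MiB
}

func loginHandler(w http.ResponseWriter, r *http.Request) {
	password := []byte(r.FormValue("password"))
	salt := []byte("0123456789abcdef") // per-user salt in a real system

	buf := bufPool.Get().([]uint64)
	defer bufPool.Put(buf)
	key := IDKeyBuf(buf, password, salt, 1, 32*1024, 4, 32)

	fmt.Fprintf(w, "%x", key) // demo only; don't echo derived keys
}

func main() {
	http.HandleFunc("/login", loginHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```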
That looks unreasonably more complex to me than the following code - since I'm not able to see why it is necessary to tightly control the allocations to (significantly) reduce the GC pressure:
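Presumably something along these lines - a handler that simply calls the existing argon2.IDKey per request and lets the GC reclaim the working memory:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/crypto/argon2"
)

func loginHandler(w http.ResponseWriter, r *http.Request) {
	password := []byte(r.FormValue("password"))
	salt := []byte("0123456789abcdef") // per-user salt in a real system

	// A fresh 32 MiB working buffer is allocated inside IDKey on every request
	// and becomes garbage right after it returns.
	key := argon2.IDKey(password, salt, 1, 32*1024, 4, 32)

	fmt.Fprintf(w, "%x", key) // demo only; don't echo derived keys
}

func main() {
	http.HandleFunc("/login", loginHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```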
Also, you cannot avoid a potential OOM panic in the general case (i.e. without limiting the number of concurrent requests). |
Is there a benchmark showing that the allocation is significant next to the password hashing work? If yes, why can't we use a global sync.Pool? It gets drained in two GC cycles, and most applications will use a consistent buffer size (and if not they will tend to grow to the max size, which is fine?). |
Basically my point. However, a global sync.Pool would have to cope with callers using different buffer sizes (memory parameters). |
Couldn't we just have an array of 32 pools internally, one for each power of two and use the appropriate one transparently? (Conceptually, that is. I think we can have even fewer since people are unlikely to need 1 byte buffers, or 4 gig buffers) |
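A minimal sketch of that size-class idea - one sync.Pool per power-of-two buffer capacity; the helper names are illustrative:

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// One pool per power-of-two size class; pools[i] holds buffers of capacity 1<<i.
var pools [64]sync.Pool

// getBuf returns a buffer of length n, reusing a pooled one when available.
func getBuf(n int) []uint64 {
	i := bits.Len(uint(n - 1)) // index of the smallest power of two >= n
	if v := pools[i].Get(); v != nil {
		return v.([]uint64)[:n]
	}
	return make([]uint64, n, 1<<i)
}

// putBuf returns a buffer to its size-class pool for later reuse.
func putBuf(b []uint64) {
	c := cap(b)
	if c == 0 || c&(c-1) != 0 {
		return // only round power-of-two capacities go back into a pool
	}
	pools[bits.Len(uint(c))-1].Put(b[:0])
}

func main() {
	b := getBuf(32 * 1024 * 128) // working memory for a 32 MiB derivation
	fmt.Println(len(b), cap(b))
	putBuf(b) // the next getBuf of a similar size reuses this allocation
}
```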
An implementation of the idea mentioned above shows no significant performance improvement compared to always allocating the buffers. However, this is an isolated benchmark, and it's possible that continuously allocating larger chunks of memory causes performance degradation in a more complex system...
@balasanjay That sounds like a potential solution, too - with the advantage that we don't need a new API. |
That will cause an unnecessary memory spike: first pool1, then pool1 + pool2, then pool2.

@FiloSottile I'm concerned about memory consumption and OOM panics, not allocation performance. For example, using 32 MB of memory per hash calc, 1000 concurrent logins consume 32 GB. That's not very many logins.
When the pool is exhausted, it waits; otherwise it could OOM. Note that sync.Pool.New is optional, and only useful if you don't care about OOM, or limit pool use by external means. |
Yes, for applications that want to shrink the pool explicitly - which I assume is an edge case + should not be a problem...
If you're not concerned about the alloc performance/GC pressure, then I don't understand why you want to block when allocating the memory for Argon2 rather than just limit the number of parallel requests - the effect will be the same AFAICS, right?! |
1000 concurrent logins will always consume 32GB. All we can do is try to help with sequential logins that happen between GC cycles, but I would expect the GC to notice the memory pressure and start being pretty aggressive. Have we seen it be a problem in practice? |
It causes a spike on expand as well. Also f() needs to return an error when its pool is exhausted:
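A sketch of such an error-returning variant: a hypothetical NewLimited pre-allocates all of its buffers up front and fails fast when they are all in use (deriveInto again stands in for an internal IDKey variant and just delegates):

```go
package main

import (
	"errors"
	"fmt"

	"golang.org/x/crypto/argon2"
)

var ErrPoolExhausted = errors.New("argon2: buffer pool exhausted")

// deriveInto stands in for an internal IDKey variant that would reuse buf;
// it delegates to the real IDKey, which still allocates internally.
func deriveInto(buf []uint64, password, salt []byte, time, memory uint32, threads uint8, keyLen uint32) []byte {
	_ = buf
	return argon2.IDKey(password, salt, time, memory, threads, keyLen)
}

// NewLimited pre-allocates n working buffers; the returned f reports an error
// rather than waiting (or allocating) when all n buffers are in use.
func NewLimited(n int, time, memory uint32, threads uint8, keyLen uint32) func(password, salt []byte) ([]byte, error) {
	bufs := make(chan []uint64, n)
	for i := 0; i < n; i++ {
		bufs <- make([]uint64, int(memory)*128) // 1 KiB per block
	}
	return func(password, salt []byte) ([]byte, error) {
		select {
		case buf := <-bufs:
			defer func() { bufs <- buf }()
			return deriveInto(buf, password, salt, time, memory, threads, keyLen), nil
		default:
			return nil, ErrPoolExhausted
		}
	}
}

func main() {
	f := NewLimited(4, 1, 32*1024, 4, 32) // at most 4 in-flight derivations
	key, err := f([]byte("hunter2"), []byte("0123456789abcdef"))
	if err != nil {
		fmt.Println("busy:", err)
		return
	}
	fmt.Printf("%x\n", key)
}
```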
Ah; so I'd need a custom pool (e.g. via buffered channel) to avoid OOM. Since memory is the factor limiting logins, it makes sense to use a memory pool to implement a concurrency limit. There may not be much reason to enhance the API if the pool would be internal. |
Therefore, I guess: proposal declined / abandoned - since rate limiting has to be done by the application itself?! |
Your API proposal is abandoned? I proposed one that's flexible and idiomatic, which can use a sync.Pool for an unlimited pool or a buffered channel for a limited one. |
@networkimprov The only thing we can do here is help reduce the number of sequential allocations. If you want to limit the amount of memory used, you have to limit the number of concurrent calls of argon2.IDKey. If you want to limit the amount of memory used for argon2 to e.g. 4 GB and use 32 MB per derivation, then you can handle 128 requests concurrently. |
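That limit can be enforced today without any API change, e.g. with a semaphore around the existing argon2.IDKey. A minimal sketch; the 128 slots correspond to the 4 GB / 32 MB example above:

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/crypto/argon2"
)

// 4 GiB budget / 32 MiB per derivation => at most 128 concurrent derivations.
var sem = make(chan struct{}, 128)

// hashPassword blocks until one of the 128 slots is free, keeping the total
// Argon2 working memory near the 4 GiB budget.
func hashPassword(password, salt []byte) []byte {
	sem <- struct{}{}
	defer func() { <-sem }()
	return argon2.IDKey(password, salt, 1, 32*1024, 4, 32)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := hashPassword([]byte("hunter2"), []byte("0123456789abcdef"))
			fmt.Printf("login %d: %x\n", i, key)
		}(i)
	}
	wg.Wait()
}
```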
Given that the time recommendation varies between 0.5 s (argon2 devs) and 1 s (libsodium devs), if a derivation can't be started immediately, it should probably fail immediately? No waiting. |
@renthraysk can you give references and/or elaborate? Why would deferring the computation of a password hash for logins (i.e. not the original one at registration) make the hash less secure?
@networkimprov I was just thinking with regard to response times: what's the point of waiting up to a second to start the argon2 derivation, which will then take another 1 s itself? So if we don't wait, the code for limiting the number of argon2 derivations that are in flight becomes simpler. |
@networkimprov Do we agree that the application has to deal with limiting the amount of (concurrently) allocated memory - such that no argon2 API change is necessary?! If so, this proposal can be closed. |
@renthraysk you can't reject login requests just because there's a spike in concurrent ones. Is that what you suggested, or did I miss something? |
@networkimprov |
If the system has the capacity to handle another logged-in user, disallowing login seems wrong. Better to post a note on the login page: "Due to pervasive criminality on the Internet, your login may take up to 10 minutes to complete. While you're waiting, please phone your legislative representative and request that your nation begin performing customs inspections on inbound international Internet traffic." :-) |
@aead why does your benchmark in #36634 (comment) show no difference in allocations? It looks like nothing was pooled. Can you post your code? |
@networkimprov It does. It reduces the amount of allocated memory dramatically. You may be referring to the number of allocations per operation: the pooled benchmark usually does fewer allocations per operation. The difference is just one allocation per operation, but that one is the large 32 MB buffer - which is exactly what I would expect from using a pool. |
Could we benchmark this with concurrent argon2 ops and allocations of random size amidst them? Memory fragmentation would produce different results. I wouldn't expect better performance from a pool in this benchmark. |
Feel free to do so. At the moment it's pure speculation whether the memory allocation is significant next to the hashing. As long as there is no benchmark proving it, I would consider this proposal not needed, because it is either out of scope (see app rate limiting) or there is no evidence for a performance problem caused by memory allocations. |
I'll test this in my app when I have a chance.

@gopherbot please add Proposal-Hold

EDIT: cc @dmitshur re gopherbot failure (also tried the label in quotes)
Proposal labels are restricted from Gopherbot. Not a bug. |
We don't add new API to fix performance problems that aren't measured. If you want to bring this proposal back, you need both a clear performance win and an experience report showing it matters in practice. |
I just helped someone who had a cgroup-constrained process OOM, so this is a second-hand experience report. Currently, limiting concurrent argon2 logins doesn't prevent creating a lot of garbage. In this case, the programmer added a mutex around the argon2 usage, but still saw process memory use climb 500 MB with a few clicks of the login button. As far as I know, there is no current mechanism for limiting Go max heap size (enforcing GC to run, instead of OOMing), and for that reason a pool of some sort sounds like a good idea to me, at least in the interim. See also: #29696 |
@tv42 any chance we could see a reproducer for that behavior? |
@networkimprov I have no source code to give; I'd have to write it from scratch, just like anyone else. |
@aead your changes show quite interesting levels of improvement in bytes/op - not sure whether this couldn't just be made the default argon2 API?
|
argon2.Key() & .IDKey() appear to thrash memory by allocating a (typically) large buffer and using it once.
https://github.com/golang/crypto/blob/master/argon2/argon2.go#L157
This might not scale well where every login on a high-traffic site calls this API to check a password.
Could API variants be added which take a config struct that optionally contains a buffer, so we can use a buffer pool?
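A sketch of what such a variant could look like. Config and IDKeyConfig are hypothetical names, and the stand-in delegates to the existing argon2.IDKey because the package does not currently accept an external buffer:

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/crypto/argon2"
)

// Config is a sketch of the proposed options struct.
type Config struct {
	Time    uint32
	Memory  uint32 // in KiB
	Threads uint8
	KeyLen  uint32
	Buffer  []uint64 // optional working memory; nil means allocate internally
}

// IDKeyConfig stands in for the proposed variant; a real implementation would
// run inside cfg.Buffer when it is non-nil. This sketch delegates to IDKey.
func IDKeyConfig(password, salt []byte, cfg Config) []byte {
	_ = cfg.Buffer
	return argon2.IDKey(password, salt, cfg.Time, cfg.Memory, cfg.Threads, cfg.KeyLen)
}

// The caller owns the pool and its sizing policy.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]uint64, 32*1024*128) }, // 32 MiB
}

func main() {
	buf := bufPool.Get().([]uint64)
	defer bufPool.Put(buf)
	key := IDKeyConfig([]byte("hunter2"), []byte("0123456789abcdef"), Config{
		Time: 1, Memory: 32 * 1024, Threads: 4, KeyLen: 32, Buffer: buf,
	})
	fmt.Printf("%x\n", key)
}
```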
@aead @mark-rushakoff @bradfitz