
Substantial memory allocations in DefaultPasswordValidator (HIBP) #2354

Closed
6 tasks done
harnash opened this issue Apr 2, 2022 · 3 comments
Labels
bug Something is not working.

Comments

harnash (Contributor) commented Apr 2, 2022

Preflight checklist

Describe the bug

It seems that the fetch() method in selfservice/strategy/password/validator.go allocates memory without limits. Looking briefly at the code, I can see that it caches a counter for each password validated against the HIBP API. This could be problematic in the long run, since all validated password hashes are stored in memory indefinitely, causing steady memory growth and eventually exhausting all available memory.
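For illustration only, the growth pattern described above looks roughly like this (a simplified sketch of an unbounded map cache, not the actual Kratos code; the type and field names are invented):

```go
package main

import "sync"

// hibpCache illustrates the problem: an in-memory cache keyed by password
// hash with no eviction. Every unique validated password adds an entry,
// so memory usage grows for as long as the instance runs.
type hibpCache struct {
	mu     sync.Mutex
	counts map[string]int64 // password hash -> breach count, never evicted
}

func (c *hibpCache) remember(hash string, count int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[hash] = count
}
```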

I believe this implementation stores these values to save calls to the API, which is good. However, we should limit the amount of data stored in this "cache" so that it does not "leak" memory and put unnecessary memory pressure on the host. In a previous app with a similar caching concept we used https://pkg.go.dev/github.com/hashicorp/golang-lru?utm_source=godoc and it worked well. We could also implement something simpler, but I'm not sure that would be reliable with a larger set of records. A bounded cache along these lines is sketched below.
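A minimal sketch of a bounded cache using hashicorp/golang-lru (the size limit, key layout, and placeholder hash are assumptions for illustration, not Kratos code):

```go
package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru"
)

func main() {
	// Keep at most 10k entries; once the limit is reached, the least
	// recently used entry is evicted, so memory usage stays bounded.
	cache, err := lru.New(10_000)
	if err != nil {
		panic(err)
	}

	// Key: the password hash used for the HIBP lookup (placeholder here),
	// value: the breach count returned by the API.
	cache.Add("example-password-hash", int64(3))

	if count, ok := cache.Get("example-password-hash"); ok {
		fmt.Println("cached breach count:", count)
	}
}
```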

Another thing: while this local cache is fine for local testing, where there is only one instance of Kratos, in production with multiple instances it might not be as effective, since requests are spread across instances and each instance has to populate its own cache individually.

Let me know what you think. This is not a critical bug, but it should be fixed rather quickly to avoid unnecessary memory-leak hunting :-)

Reproducing the bug

  1. Run a Kratos instance for a substantial amount of time (24hrs+)
  2. Generate substantial traffic to Kratos endpoints (login/registration)
  3. Observe memory usage and notice that it steadily increases over time

Relevant log output

Here is the 24-hour graph: https://pasteboard.co/9Iw7qL5NHZb5.png (24+1 instances; the biggest graph shows cumulative memory usage across all instances).

Relevant configuration

No response

Version

0.8.0

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Kubernetes

Additional Context

No response

harnash added the bug label Apr 2, 2022
aeneasr (Member) commented Apr 2, 2022

Are you using Argon2 or BCrypt for hashing?

It is, however, entirely possible that the cache is at fault here. It's astonishing that there's that much data being cached! Improving the cache is a good idea; we can probably use ristretto, or rather https://github.com/outcaste-io/ristretto (which is maintained), as we usually use that lib across all projects :)
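For reference, a ristretto-based cache could look roughly like this (a sketch assuming the outcaste-io fork keeps the dgraph-io/ristretto API; the cost limits and key layout are assumptions, not a proposed Kratos implementation):

```go
package main

import (
	"fmt"

	"github.com/outcaste-io/ristretto"
)

func main() {
	// Cost-based bounded cache: MaxCost caps the total cost of cached
	// entries, so the cache cannot grow without limit.
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e6,     // keys to track access frequency for (assumed value)
		MaxCost:     1 << 24, // total cost budget for cached entries (assumed value)
		BufferItems: 64,      // recommended default
	})
	if err != nil {
		panic(err)
	}

	// Cache the HIBP breach count under the password hash (placeholder key).
	cache.Set("example-password-hash", int64(3), 1)
	cache.Wait() // Set is asynchronous; wait so the value is visible below

	if count, found := cache.Get("example-password-hash"); found {
		fmt.Println("cached breach count:", count)
	}
}
```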

harnash (Contributor, Author) commented Apr 3, 2022

We are using our custom (legacy) hasher, which is based on BCrypt and AES-256. I ran the profiler, and it seems that this method is causing excessive allocations.

I can take a look at ristretto to see whether it helps with the memory allocation, and if it does, I will create a PR with the proposed implementation.

harnash (Contributor, Author) commented Apr 8, 2022

Our initial tests on PROD look promising. We will leave it running over the weekend, and if memory usage is still reasonable on Monday, we will create a PR with these changes.
