Document Argon2 configuration best practices #572

Closed
rauanmayemir opened this issue Jul 13, 2020 · 15 comments · Fixed by #803
Labels
corp/m4 Up for M4 at Ory Corp. good first issue A good issue to tackle when being a novice to the project. help wanted We are looking for help on this one.

@rauanmayemir

Describe the bug

I've been trying to set up kratos v0.4.4 with selfservice-ui-node locally in minikube. Although it was slow, I managed to register and verify my identity.

However, trying to log in simply hangs the service. Occasionally it makes the whole minikube unresponsive, so I have to shut it down completely and restart.

I caught the liveness status updates, which give a clue about what happened:

  Warning  Unhealthy       114s (x2 over 116s)  kubelet, minikube  Readiness probe failed: Get http://172.17.0.11:15021/healthz/ready: dial tcp 172.17.0.11:15021: connect: connection refused
  Warning  Unhealthy       113s                 kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       19s                  kubelet, minikube  Readiness probe failed: Get http://172.17.0.11:15020/app-health/kratos/readyz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       11s (x2 over 21s)    kubelet, minikube  Liveness probe failed: Get http://172.17.0.11:15020/app-health/kratos/livez: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       10s                  kubelet, minikube  Readiness probe failed: HTTP probe failed with statuscode: 500

It seems like the kratos pod choked on the login request and even stopped responding to health requests, so k8s just restarted the pod.

Here's what was in the kratos logs from the time when I opened the selfservice UI at https://auth.ips.test (this time k8s wasn't able to restart the pod and simply hung):

time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=b943ee28-e868-441c-85c6-f0d61d048d9c
time=2020-07-13T11:48:32Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:max-age=0 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b943ee28-e868-441c-85c6-f0d61d048d9c] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:48:32Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:max-age=0 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b943ee28-e868-441c-85c6-f0d61d048d9c] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=b943ee28-e868-441c-85c6-f0d61d048d9c status=401 text_status=Unauthorized took=24.236895ms
time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=682f0670-744c-489f-aae5-fadb8cf61bce
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=682f0670-744c-489f-aae5-fadb8cf61bce status=302 text_status=Found took=20.368698ms
time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=b3723016-7c19-4698-90fc-57ae736702df
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=b3723016-7c19-4698-90fc-57ae736702df status=200 text_status=OK took=5.518439ms
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=3398ca0d-2c26-447b-a32a-c7998ae65d9a
time=2020-07-13T11:48:33Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 referer:https://auth.ips.test/auth/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:3398ca0d-2c26-447b-a32a-c7998ae65d9a] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:48:33Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 referer:https://auth.ips.test/auth/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:3398ca0d-2c26-447b-a32a-c7998ae65d9a] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=3398ca0d-2c26-447b-a32a-c7998ae65d9a status=401 text_status=Unauthorized took=975.21µs
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=6a67651c-1383-40b5-8a4f-c3883524b028
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=6a67651c-1383-40b5-8a4f-c3883524b028 status=302 text_status=Found took=8.778967ms
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=7d75d569-531d-46c2-87b8-639a2119927d request_id=19186ea2-0f4d-41c6-8246-af087193a004
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=7d75d569-531d-46c2-87b8-639a2119927d request_id=19186ea2-0f4d-41c6-8246-af087193a004 status=200 text_status=OK took=2.801678ms
time=2020-07-13T11:48:51Z level=info msg=started handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0
time=2020-07-13T11:49:15Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55576 request=/sessions/whoami request_id=b8751665-bd9b-4ea4-a238-949a3e4d8ac8
time=2020-07-13T11:49:15Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:no-cache referer:https://auth.ips.test/.ory/kratos/public/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b8751665-bd9b-4ea4-a238-949a3e4d8ac8] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55576 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:49:15Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:no-cache referer:https://auth.ips.test/.ory/kratos/public/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b8751665-bd9b-4ea4-a238-949a3e4d8ac8] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55576 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:49:15Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55576 request=/sessions/whoami request_id=b8751665-bd9b-4ea4-a238-949a3e4d8ac8 status=401 text_status=Unauthorized took=2.029955ms
time=2020-07-13T11:49:29Z level=info msg=completed handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0 status=302 text_status=Found took=37.664910136s
time=2020-07-13T11:49:29Z level=info msg=started handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55848 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0
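
The `took=` field in the log above is what exposes the stall: the password POST completed after 37.664910136s. As a minimal sketch for spotting such requests in a longer log, the following Python parses the Go-style duration values. The helper names `parse_go_duration` and `slow_requests` are made up for this illustration and are not part of Kratos.

```python
import re

# Seconds per unit suffix as printed in the logs (assumed to be Go-style
# durations). Both the micro sign and the Greek mu are accepted.
_UNIT_SECONDS = {
    "ns": 1e-9, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "us": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|\u00b5s|\u03bcs|us|ms|s|m|h)")
_TOOK = re.compile(r"request_id=(\S+).*\btook=(\S+)")


def parse_go_duration(text):
    """Convert a duration such as '37.664910136s' or '1m2.5s' to seconds."""
    parts = _PART.findall(text)
    if not parts:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def slow_requests(lines, threshold_seconds=1.0):
    """Return (request_id, seconds) for completed requests above the threshold."""
    found = []
    for line in lines:
        match = _TOOK.search(line)
        if match is None:
            continue  # not a "completed handling request" line
        seconds = parse_go_duration(match.group(2))
        if seconds > threshold_seconds:
            found.append((match.group(1), seconds))
    return found
```

Feeding the log lines above into `slow_requests` with the default one-second threshold would single out request `f19362c3-1ed9-4dda-a8af-2e3fe068f3a0`.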

Reproducing the bug

Here's my config:

helm-kratos-values.yaml

replicaCount: 1

image:
  repository: oryd/kratos
  tag: v0.4.3-alpha.1-sqlite
  pullPolicy: IfNotPresent

service:
  admin:
    enabled: true
    type: ClusterIP
    port: 80
    annotations: {}
  public:
    enabled: true
    type: ClusterIP
    port: 80
    annotations: {}

kratos:
  development: true
  autoMigrate: true

  config:
    dsn: "postgres://connection"
    secrets:
      cookie:
        - omitted

    identity:
      default_schema_url: file:///etc/config/identity.traits.schema.json

    courier:
      smtp:
        connection_uri: smtp://uri/
        from_address: notifications@example.com
    serve:
      public:
        base_url: https://auth.ips.test/.ory/kratos/public
      admin:
        base_url: http://ips-auth-kratos-admin.default.svc.cluster.local/

    selfservice:
      default_browser_return_url: https://auth.ips.test/
      strategies:
        password:
          enabled: true

      flows:
        error:
          ui_url: https://auth.ips.test/error

        settings:
          ui_url: https://auth.ips.test/settings
          privileged_session_max_age: 15m

        recovery:
          enabled: true
          ui_url: https://auth.ips.test/recovery

        verification:
          enabled: true
          ui_url: https://auth.ips.test/verify
          after:
            default_browser_return_url: https://auth.ips.test

        logout:
          after:
            default_browser_return_url: https://auth.ips.test/auth/login

        login:
          ui_url: https://auth.ips.test/auth/login
          request_lifespan: 10m

        registration:
          request_lifespan: 10m
          ui_url: https://auth.ips.test/auth/registration
          after:
            password:
              hooks:
                - hook: session

I manually updated the helm-generated configmap to include /etc/config/identity.traits.schema.json, using the default schema from the latest tagged kratos release.

auth-selfservice-ui.yaml

apiVersion: v1
kind: Service
metadata:
  name: ips-auth-selfservice
  labels:
    app: auth
spec:
  ports:
    - port: 80
      targetPort: 3000
      name: http
  selector:
    app: auth
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ips-auth-selfservice
spec:
  selector:
    matchLabels:
      app: auth
  replicas: 1
  template:
    metadata:
      labels:
        app: auth
    spec:
      containers:
        - name: kratos
          image: oryd/kratos-selfservice-ui-node:v0.4.4-alpha.1
          env:
            - name: SECURITY_MODE
              value: "cookie"
            - name: BASE_URL
              value: "https://auth.ips.test"
            - name: KRATOS_PUBLIC_URL
              value: "http://ips-auth-kratos-public.default.svc.cluster.local"
            - name: KRATOS_ADMIN_URL
              value: "http://ips-auth-kratos-admin.default.svc.cluster.local"
          ports:
            - containerPort: 3000
---

Environment

- Kubernetes: v1.18.3
- Istio: v1.6.5

@aeneasr
Member

aeneasr commented Jul 13, 2020

You're probably allocating too many resources for Argon2. Running Istio in Minikube is already a performance sink and adding Argon2 to the mix could make your VM unresponsive.

@rauanmayemir
Author

I did not realize Argon2 is that expensive. 😄
I'll try to adjust it.

@aeneasr
Member

aeneasr commented Jul 13, 2020

Depends on the config but the defaults are pretty high: https://github.com/ory/kratos/blob/master/driver/configuration/provider_viper.go#L111-L115 (4GB RAM with 4 iterations with 2*CPU parallelism)

@rauanmayemir
Author

I've tried to adjust the config and it's still hanging:

hashers:
  argon2:
    memory: 524288
    iterations: 2
    parallelism: 1
    salt_length: 16
    key_length: 16

This is my minikube config:

{
    "cpus": 4,
    "dashboard": true,
    "kubernetes-version": "1.18.3",
    "memory": 8192,
    "vm-driver": "virtualbox"
}

I'm just shocked this could be that resource-heavy.

@aeneasr
Member

aeneasr commented Jul 14, 2020

Try iterations: 1. Also make sure that the spike really comes from Kratos, and keep in mind that you're running Istio in VirtualBox/Minikube without meeting its minimum requirements. Running applications that consume significant memory/CPU (such as password hashing) on Istio on a below-minimum-resource machine is going to cause issues.

Start minikube with 16384 MB of memory and 4 CPUs. This example uses Kubernetes version 1.17.5. You can change the version to any Kubernetes version supported by Istio by altering the --kubernetes-version value:

https://istio.io/latest/docs/setup/platform-setup/minikube/

@aeneasr
Member

aeneasr commented Jul 14, 2020

By the way, you're still using 512MB RAM, and on a machine that is already over-utilized by Istio. In our quickstart, we have dialed everything down quite a lot to make sure that it runs everywhere:

https://github.com/ory/kratos/blob/master/contrib/quickstart/kratos/email-password/.kratos.yml#L58-L64

I wouldn't recommend doing that in prod though.
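
The 512MB figure follows from the `memory` value if it is read as KiB, which is an assumption here based on the numbers in this thread (524288 KiB = 512 MiB). A small sketch of that conversion; `format_argon2_memory` is a hypothetical helper, not a Kratos function:

```python
def format_argon2_memory(memory_kib):
    """Render an Argon2 memory value (assumed to be in KiB) as a readable size."""
    size = float(memory_kib)
    for unit in ("KiB", "MiB", "GiB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at GiB, the largest unit handled here.
        if size < 1024 or unit == "GiB":
            return f"{size:g} {unit}"
        size /= 1024


print(format_argon2_memory(524288))           # the adjusted config above
print(format_argon2_memory(4 * 1024 * 1024))  # the 4GB default mentioned earlier
```

Under that assumption the adjusted config still asks for 512 MiB per hash, compared with 4 GiB for the defaults.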

@rauanmayemir
Author

I changed the config to:

memory: 65536
iterations: 1
parallelism: 1
salt_length: 16
key_length: 16

But it's still crashing. Will try it later on the actual k8s cluster. I acknowledge that istio is super resource-hungry.

@rauanmayemir
Author

This works fine on beefier hardware.

@aeneasr
Member

aeneasr commented Jul 15, 2020

Ok, can I close this then or do you need further clarification? :)

@alsuren
Contributor

alsuren commented Aug 17, 2020

Sorry for reviving an old thread, but it seems the most appropriate one.

https://tools.ietf.org/html/draft-irtf-cfrg-argon2-10#section-4 ("Parameter Choice") says:

We suggest the following settings:

   o  Backend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 4 cores -- Argon2id with 8 lanes and 4 GiB of RAM.

   o  Key derivation for hard-drive encryption, that takes 3 seconds on
      a 2 GHz CPU using 2 cores - Argon2id with 4 lanes and 6 GiB of
      RAM.

   o  Frontend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 2 cores - Argon2id with 4 lanes and 1 GiB of RAM.

I would say that Kratos is mostly used for Frontend server authentication, but its default parameters are tuned for Backend server authentication. Would you accept a patch which tunes the parameters appropriately, or should I just tune them for my cluster and post my parameters on this thread?
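
To make the comparison concrete, here is a minimal sketch that writes the three suggestions quoted above as data and converts one of them into the `memory`/`parallelism` keys used in the configs in this thread. It assumes that `memory` is expressed in KiB and that the draft's "lanes" correspond to `parallelism`; the quoted text gives no iteration count, so none is set. `Argon2Profile` and `to_config` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argon2Profile:
    name: str
    lanes: int             # "lanes" in the draft
    memory_gib: int        # RAM per hash
    cores: int             # cores assumed by the draft's timing
    target_seconds: float  # hashing time on a 2 GHz CPU


DRAFT_PROFILES = (
    Argon2Profile("backend server authentication", 8, 4, 4, 0.5),
    Argon2Profile("hard-drive key derivation", 4, 6, 2, 3.0),
    Argon2Profile("frontend server authentication", 4, 1, 2, 0.5),
)


def to_config(profile):
    """Map a profile to config keys, assuming memory is given in KiB."""
    return {
        "memory": profile.memory_gib * 1024 * 1024,
        "parallelism": profile.lanes,
    }


frontend = DRAFT_PROFILES[2]
print(to_config(frontend))
```

For the frontend profile this yields a `memory` of 1048576 and a `parallelism` of 4, a quarter of the memory of the backend profile.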

@aeneasr
Member

aeneasr commented Aug 21, 2020

All good, we should definitely document that and maybe also change the defaults used in the demo. If you're up for a PR @alsuren please go ahead :)

@aeneasr aeneasr reopened this Aug 21, 2020
@aeneasr aeneasr changed the title Login action hangs the pod and makes the cluster unresponsive Document Argon2 configuration best practices Aug 21, 2020
@aeneasr aeneasr added docs good first issue A good issue to tackle when being a novice to the project. help wanted We are looking for help on this one. labels Aug 21, 2020
@aeneasr aeneasr added this to the unplanned milestone Aug 21, 2020
@aeneasr aeneasr added the corp/m2 Up for M2 at Ory Corp. label Aug 21, 2020
@aeneasr
Member

aeneasr commented Aug 21, 2020

@tricky42 #572 (comment) is relevant to you probably

@aikoven

aikoven commented Sep 3, 2020

I have a similar problem where the Kratos process becomes unresponsive. My Argon2 config is:

argon2:
  parallelism: 1
  memory: 65536
  iterations: 1
  salt_length: 16
  key_length: 16

Still, sometimes (not every time) when I try to log in, the Kratos process starts consuming 4+ CPU cores and 4GB+ of memory, and then the login request times out.

Example log:

time=2020-09-03T10:39:59Z level=info msg=completed handling request method=POST name=public#https://my-domain/.ory/kratos/public remote=172.30.134.6 request=/self-service/browser/flows/login/strategies/password?request=f1959ca9-8288-4936-a4dd-bd18ebaca97c request_id=5145e17fbf79383a92b437d852ccf889 status=302 text_status=Found took=52.018030695s

I'm running Kratos in Kubernetes, and while that request lasts, the pod becomes unready.

@aeneasr
Member

aeneasr commented Sep 3, 2020

Make sure to have allocated enough CPU and memory limits!
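
As a rough way to size those limits, the sketch below multiplies the Argon2 `memory` setting by the number of logins that might hash at the same time. It assumes `memory` is in KiB and that each concurrent hash allocates the full amount; `peak_hashing_memory_mib` and its `baseline_mib` parameter are hypothetical names for illustration.

```python
def peak_hashing_memory_mib(memory_kib, concurrent_logins, baseline_mib=0):
    """Worst-case memory in MiB if all concurrent logins hash simultaneously.

    baseline_mib is whatever the process needs besides hashing; it has to be
    measured for a given deployment and is not derived here.
    """
    if memory_kib <= 0 or concurrent_logins <= 0:
        raise ValueError("memory_kib and concurrent_logins must be positive")
    return baseline_mib + (memory_kib * concurrent_logins) / 1024


print(peak_hashing_memory_mib(65536, 10))
print(peak_hashing_memory_mib(4 * 1024 * 1024, 2))
```

With `memory: 65536`, ten simultaneous logins would need about 640 MiB for hashing alone, while just two concurrent hashes at the 4GB default would need 8192 MiB.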

@aikoven

aikoven commented Sep 3, 2020

Thanks, I'll try it.

I was concerned by the kubectl top pod output, which showed 4+ CPU cores in use. But that may be due to overall node CPU saturation.

@aeneasr aeneasr removed the corp/m2 Up for M2 at Ory Corp. label Oct 7, 2020
aeneasr pushed a commit that referenced this issue Nov 6, 2020
This patch adds the new command "hashers argon2 calibrate" which allows one to pick the desired hashing time for password hashing and then chooses the optimal parameters for the hardware the command is running on:

```
$ kratos hashers argon2 calibrate 500ms
Increasing memory to get over 500ms:
    took 2.846592732s in try 0
    took 6.006488824s in try 1
  took 4.42657975s with 4.00GB of memory
[...]
Decreasing iterations to get under 500ms:
    took 484.257775ms in try 0
    took 488.784192ms in try 1
  took 486.534204ms with 3 iterations
Settled on 3 iterations.

{
  "memory": 1048576,
  "iterations": 3,
  "parallelism": 32,
  "salt_length": 16,
  "key_length": 32
}
```

Closes #723
Closes #572
Closes #647
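
The general idea of such a calibration can be sketched as a search over parameters driven by measured hashing time. The following Python is not the implementation of the command; it is a simplified illustration that loosely follows the two phases named in the output above (grow memory until hashing is slower than the target, then reduce iterations until it is faster). The function names, starting values, and the 4 GiB cap are assumptions, and `memory` is assumed to be in KiB.

```python
import time


def timed(hash_once):
    """Wrap a hashing callable so that it returns the elapsed seconds."""
    def measure(memory_kib, iterations):
        started = time.perf_counter()
        hash_once(memory_kib, iterations)
        return time.perf_counter() - started
    return measure


def calibrate(measure, target_seconds, start_memory_kib=65536,
              max_memory_kib=4 * 1024 * 1024, start_iterations=4):
    """Search for parameters whose measured time stays under the target."""
    memory, iterations = start_memory_kib, start_iterations
    # Phase 1: double memory until hashing exceeds the target or the cap is hit.
    while measure(memory, iterations) <= target_seconds and memory < max_memory_kib:
        memory = min(memory * 2, max_memory_kib)
    # Phase 2: lower iterations until hashing is back under the target.
    while iterations > 1 and measure(memory, iterations) > target_seconds:
        iterations -= 1
    # Phase 3: if a single iteration is still too slow, give memory back.
    while memory > start_memory_kib and measure(memory, iterations) > target_seconds:
        memory //= 2
    return {"memory": memory, "iterations": iterations}


# Synthetic cost model standing in for a real Argon2 call (purely illustrative).
def fake_cost(memory_kib, iterations):
    return memory_kib * iterations * 1e-7


print(calibrate(fake_cost, 0.5))
```

In real use, `measure` would be `timed(...)` around an actual Argon2 call, and each measurement would be repeated a few times, as the "try 0" and "try 1" lines in the output suggest.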
@aeneasr aeneasr added the corp/m4 Up for M4 at Ory Corp. label Dec 8, 2020