
Report on response times consistency over daily KCB runs on AWS environment #992

andyuk1986 opened this issue Sep 27, 2024 · 1 comment

andyuk1986 commented Sep 27, 2024

The goal is to collect and analyze the response time numbers over the daily KCB runs on the AWS environment, to understand whether the numbers are consistent, how much fluctuation there is and what could cause it, and whether any additional actions/improvements are needed from our side based on the analyzed data.

We chose to analyze the 50th percentile response time of the cpuUsageLoginTest benchmark. The response times were taken for the Browser to Log In Endpoint and Browser posts correct credentials Gatling tests, to compare response times with and without DB involvement/password hashing.
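For context, the 50th percentile is simply the median of the per-request response times from a run. A minimal sketch of the computation (the sample values below are made up for illustration; the real numbers come from the Gatling results stored in Horreum):

```python
import statistics

# Hypothetical per-request response times (ms) from one daily run;
# the real values come from the Gatling/Horreum results.
response_times_ms = [51, 53, 56, 54, 57, 55, 53, 52, 58, 56]

# 50thPctRespTime is the median of the samples.
p50 = statistics.median(response_times_ms)
print(f"50thPctRespTime: {p50:.0f}ms")
```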

Metrics comparison for test without DB usage

Please find below the metrics comparison for the cpuUsageLoginTest -> Browser to Log In Endpoint test, which doesn't use the DB during the run.
[image: metrics comparison for Browser to Log In Endpoint]

The data has been captured starting from 12.09.2024, i.e. after the lambda issue was fixed with this commit. As you can see, the 50thPctRespTime holds a consistent value of 56ms; then there is a slight drop[1] starting from the 19.09.2024 build, after which the number is again consistently 53ms.

Metrics comparison for test with DB usage

Please find below the metrics comparison for the cpuUsageLoginTest -> Browser posts correct credentials test, which works with the DB and password hashing.

[image: metrics comparison for Browser posts correct credentials]

Here you can see that the 50thPctRespTime mostly fluctuates between 115ms and 120ms, a range of about 5ms. Looking at these numbers, the fluctuation is about ±3%.
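As a rough check of that figure, the relative fluctuation can be computed from the half-range of the observed values relative to their midpoint; a sketch using the range stated above:

```python
# Observed range of the daily 50th percentile for the DB-backed test (ms).
lo, hi = 115.0, 120.0

mid = (lo + hi) / 2                          # midpoint of the range, 117.5ms
fluctuation_pct = (hi - lo) / 2 / mid * 100  # half-range relative to midpoint

# Prints ~±2.1%, consistent with the "about ±3%" stated above.
print(f"~±{fluctuation_pct:.1f}% around {mid:.1f}ms")
```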

Since the metric is consistent (there is no comparable fluctuation) for the test without DB involvement, we can assume that the fluctuation here is caused by DB usage and password hashing.

In conclusion, based on the data above, we can say that running KCB on the AWS environment yields consistent, reliable response time numbers, with around ±3% fluctuation in certain cases.

[1] There is an issue reported here, as a performance drop was spotted starting from the 19.09.2024 daily runs.

Response time difference in A/A setup

The second part of this initiative was to collect and compare response time numbers in an A/A (active/active) setup, to understand how much the response times of the two sites differ when one site is located closer to the database than the other.
The data was collected over the period 27.09.2024 - 05.11.2024 (with some pauses). The 50th percentile response time was taken every day from the "Authentication Code SLO" Grafana dashboard for all pods of both sites. The screenshot below shows part of the retrieved and calculated data:

[screenshot: per-pod response times per site, with Diff and AVG DIFF columns]

The screenshot shows the response time numbers of the specified requests for all pods on both sites, along with the average response time for each request on each site. The Diff column shows the response time difference between the sites per request per day, and the AVG DIFF column shows the average difference for each request between the sites. Please note that the spikes which occurred a few times during this period were NOT taken into account in the AVG DIFF calculation.
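A minimal sketch of how the Diff and AVG DIFF columns can be derived, with spike days excluded (all values and the spike threshold below are illustrative, not the actual dataset):

```python
# Hypothetical daily p50 response times (ms) for one request on each site.
site_a = [118, 121, 117, 250, 119, 120]  # 250ms stands in for a spike
site_b = [110, 108, 111, 109, 107, 112]

SPIKE_THRESHOLD_MS = 200  # illustrative cutoff for filtering out spikes

# Diff column: per-day difference between the two sites.
diffs = [a - b for a, b in zip(site_a, site_b)]

# AVG DIFF column: average daily difference, skipping days where either
# site spiked (spikes were not taken into account, as noted above).
clean = [d for d, a, b in zip(diffs, site_a, site_b)
         if a < SPIKE_THRESHOLD_MS and b < SPIKE_THRESHOLD_MS]
avg_diff = sum(clean) / len(clean)

print(f"Diff per day: {diffs}")
print(f"AVG DIFF (spikes excluded): {avg_diff:.1f}ms")
```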

As you can see, the response time difference is in the range of 6-13ms, depending on the request. The numbers show that the authenticate and logout requests are the ones which work heavily with the database, and for those the response time difference is 13ms.

One more thing I noticed is that starting from 10.10.2024 the overall response time numbers increased: e.g. for /realms/{realm}/login-actions/authenticate the average was 110-123ms before, and 130-143ms after 10.10.2024 (a rough way to quantify such a shift is sketched after the screenshot). In general, though, the average difference between the sites remained the same.
You can find the screenshot with all the numbers below:
[screenshot: full A/A response time dataset]
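A rough way to quantify the shift mentioned above is to compare the average p50 before and after the cutoff date; a sketch with illustrative values (the real series comes from the Grafana dashboard):

```python
from datetime import date

# Hypothetical daily p50 values (ms) for /realms/{realm}/login-actions/authenticate.
series = [
    (date(2024, 10, 7), 116), (date(2024, 10, 8), 120), (date(2024, 10, 9), 118),
    (date(2024, 10, 11), 135), (date(2024, 10, 12), 139), (date(2024, 10, 14), 141),
]

cutoff = date(2024, 10, 10)
before = [v for d, v in series if d < cutoff]
after = [v for d, v in series if d >= cutoff]

print(f"avg before {cutoff}: {sum(before) / len(before):.0f}ms")
print(f"avg after  {cutoff}: {sum(after) / len(after):.0f}ms")
```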

kami619 commented Oct 1, 2024

@andyuk1986 great writeup, Anna. I came up with a Python script so community users can leverage the data from the daily runs, as they don't have insight into the Horreum data. Should we think about hosting a community Horreum instance so folks can look at these reports on their own, giving more visibility into the Keycloak project's performance characteristics in general?
