
Report on response times consistency over daily KCB runs on AWS environment #992

andyuk1986 opened this issue Sep 27, 2024 · 1 comment

andyuk1986 commented Sep 27, 2024

The goal is to collect and analyze the response time numbers over the daily KCB runs on the AWS environment, to understand whether the numbers are consistent, how much fluctuation there is and what could cause it, and whether any additional actions/improvements are needed from our side based on the analyzed data.

We chose to analyze the 50th percentile response time of the cpuUsageLoginTest benchmark. The response times were taken for the Browser to Log In Endpoint and Browser posts correct credentials Gatling tests, to compare response times with and without DB involvement/password hashing.
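For context, the 50th percentile is simply the median of the per-request response times from a run. A minimal sketch of the computation (the sample values below are made up for illustration; the real numbers come from the Gatling results stored in Horreum):

```python
import statistics

# Hypothetical per-request response times (ms) from one daily run;
# the real values come from the Gatling/Horreum results.
response_times_ms = [51, 53, 56, 54, 57, 55, 53, 52, 58, 56]

# 50thPctRespTime is the median of the samples.
p50 = statistics.median(response_times_ms)
print(f"50thPctRespTime: {p50:.0f}ms")
```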

Metrics comparison for test without DB usage

Please find below the metrics comparison for the cpuUsageLoginTest -> Browser to Log In Endpoint test, which doesn't use the DB during the run.
[image: metrics comparison for Browser to Log In Endpoint]

The data has been captured starting from 12.09.2024, i.e. after the lambda issue was fixed with this commit. As you can see, the 50thPctRespTime holds a consistent value of 56ms; then there is a slight drop[1] starting from the 19.09.2024 build, after which the number is again consistently 53ms.

Metrics comparison for test with DB usage

Please find below the metrics comparison for the cpuUsageLoginTest -> Browser posts correct credentials test, which works with the DB and password hashing.

[image: metrics comparison for Browser posts correct credentials]

Here you can see that the 50thPctRespTime mostly fluctuates between 115ms and 120ms, a range of about 5ms. Looking at these numbers, the fluctuation is about ±3%.
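As a rough check of that figure, the relative fluctuation can be computed from the half-range of the observed values relative to their midpoint; a sketch using the range stated above:

```python
# Observed range of the daily 50th percentile for the DB-backed test (ms).
lo, hi = 115.0, 120.0

mid = (lo + hi) / 2                          # midpoint of the range, 117.5ms
fluctuation_pct = (hi - lo) / 2 / mid * 100  # half-range relative to midpoint

# Prints ~±2.1%, consistent with the "about ±3%" stated above.
print(f"~±{fluctuation_pct:.1f}% around {mid:.1f}ms")
```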

Since the metric is consistent (there is no comparable fluctuation) for the test without DB involvement, we can assume that the fluctuation here is caused by DB usage and password hashing.

In conclusion, based on the data above, we can say that running KCB on the AWS environment yields consistent, reliable response time numbers, with around ±3% fluctuation in certain cases.

[1] There is an issue reported here, as a performance drop was spotted starting from the 19.09.2024 daily runs.

Response time difference in A/A setup

The second part of this initiative was to collect and compare response time numbers in an A/A (active/active) setup, to understand how much the response times of the two sites differ when one site is located closer to the database than the other.
The data was collected over the period 27.09.2024 - 05.11.2024 (with some pauses). The 50th percentile response time was taken every day from the "Authentication Code SLO" Grafana dashboard for all pods of both sites. The screenshot below shows part of the retrieved and calculated data:

[screenshot: per-pod response times per site, with Diff and AVG DIFF columns]

The screenshot shows the response time numbers of the specified requests for all pods on both sites, along with the average response time for each request on each site. The Diff column shows the response time difference between the sites per request per day, and the AVG DIFF column shows the average difference for each request between the sites. Please note that the spikes which occurred a few times during this period were NOT taken into account in the AVG DIFF calculation.
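A minimal sketch of how the Diff and AVG DIFF columns can be derived, with spike days excluded (all values and the spike threshold below are illustrative, not the actual dataset):

```python
# Hypothetical daily p50 response times (ms) for one request on each site.
site_a = [118, 121, 117, 250, 119, 120]  # 250ms stands in for a spike
site_b = [110, 108, 111, 109, 107, 112]

SPIKE_THRESHOLD_MS = 200  # illustrative cutoff for filtering out spikes

# Diff column: per-day difference between the two sites.
diffs = [a - b for a, b in zip(site_a, site_b)]

# AVG DIFF column: average daily difference, skipping days where either
# site spiked (spikes were not taken into account, as noted above).
clean = [d for d, a, b in zip(diffs, site_a, site_b)
         if a < SPIKE_THRESHOLD_MS and b < SPIKE_THRESHOLD_MS]
avg_diff = sum(clean) / len(clean)

print(f"Diff per day: {diffs}")
print(f"AVG DIFF (spikes excluded): {avg_diff:.1f}ms")
```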

As you can see, the response time difference is in the range of 6-13ms, depending on the request. The numbers show that the authenticate and logout requests are the ones which work heavily with the database, and for those the response time difference is 13ms.

One more thing I noticed is that starting from 10.10.2024 the overall response time numbers increased: e.g. for /realms/{realm}/login-actions/authenticate the average was 110-123ms before, and 130-143ms after 10.10.2024 (a rough way to quantify such a shift is sketched after the screenshot). In general, though, the average difference between the sites remained the same.
You can find the screenshot with all the numbers below:
[screenshot: full A/A response time dataset]
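A rough way to quantify the shift mentioned above is to compare the average p50 before and after the cutoff date; a sketch with illustrative values (the real series comes from the Grafana dashboard):

```python
from datetime import date

# Hypothetical daily p50 values (ms) for /realms/{realm}/login-actions/authenticate.
series = [
    (date(2024, 10, 7), 116), (date(2024, 10, 8), 120), (date(2024, 10, 9), 118),
    (date(2024, 10, 11), 135), (date(2024, 10, 12), 139), (date(2024, 10, 14), 141),
]

cutoff = date(2024, 10, 10)
before = [v for d, v in series if d < cutoff]
after = [v for d, v in series if d >= cutoff]

print(f"avg before {cutoff}: {sum(before) / len(before):.0f}ms")
print(f"avg after  {cutoff}: {sum(after) / len(after):.0f}ms")
```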

kami619 commented Oct 1, 2024

@andyuk1986 great writeup, Anna. I came up with a Python script so community users can leverage the data from the daily runs, as they don't have insight into the Horreum data. Should we think about hosting a community Horreum instance so folks can look at these reports on their own, giving more visibility into the Keycloak project's performance characteristics in general?
