Hello,
We have noticed that the `GET /kafkacruisecontrol/load` endpoint ignores the `start`, `end` and `time` parameters under some conditions (which we have not yet been able to identify).

This can be easily reproduced using `cccli`. For instance, retrieving the cluster load for a given 1-hour time window does not always return the same results. The first time, the output shows the correct average across every dimension for the 1-hour time window.
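To take `cccli` out of the picture, the same windowed request can be built by hand and sent with any HTTP client. The Python sketch below assumes that `start` and `end` are epoch timestamps in milliseconds and that the endpoint accepts `json=true`; `build_load_url` and `last_hour_window` are hypothetical helper names, and the host is the one from our `cccli` invocation. Computing the window once and reusing the same two timestamps for every request ensures that repeated queries really ask for the identical window.

```python
import time
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_load_url(host: str, start_ms: int, end_ms: int, json_output: bool = True) -> str:
    """Build a GET /kafkacruisecontrol/load URL for a fixed time window.

    Assumption: start/end are epoch milliseconds and `json` is an accepted parameter.
    """
    if start_ms >= end_ms:
        raise ValueError("start must be earlier than end")
    params = {
        "start": start_ms,
        "end": end_ms,
        "allow_capacity_estimation": "false",
        "json": "true" if json_output else "false",
    }
    return f"http://{host}/kafkacruisecontrol/load?{urlencode(params)}"


def last_hour_window(now_ms: Optional[int] = None) -> Tuple[int, int]:
    """Return (start_ms, end_ms) for the hour ending at now_ms (default: current time)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - 60 * 60 * 1000, now_ms


if __name__ == "__main__":
    # Pin the window once so that every repeated request uses the same start/end.
    start, end = last_hour_window()
    print(build_load_url("kafka-dev-cruise-control-headless:9090", start, end))
```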
However, after waiting a bit, running the same query outputs different values. We suspect these values correspond to the cluster load for the default time window, effectively ignoring the `start` and `end` parameters. I believe the default corresponds to the time window between the earliest available timestamp and the current time.

In fact, running the same command without the `start` and `end` arguments returns the same values as the previous command:

```
$ cccli -a kafka-dev-cruise-control-headless:9090 load
Starting long-running poll of http://kafka-dev-cruise-control-headless:9090/kafkacruisecontrol/load?allow_capacity_estimation=False
HOST BROKER RACK DISK_CAP(MB) DISK(MB)/_(%)_ CORE_NUM CPU(%) NW_IN_CAP(KB/s) LEADER_NW_IN(KB/s) FOLLOWER_NW_IN(KB/s) NW_OUT_CAP(KB/s) NW_OUT(KB/s) PNW_OUT(KB/s) LEADERS/REPLICAS
-, 10000,us-east-1a, 1192092.000, 10931.121/00.92, 1, 11.381, 97656.000, 386.458, 762.798, 195312.000, 1155.897, 3446.797, 62/185
-, 10001,us-east-1b, 1192092.000, 10931.121/00.92, 1, 10.193, 97656.000, 343.802, 805.454, 195312.000, 1032.157, 3446.797, 58/185
-, 10002,us-east-1c, 1192092.000, 10931.121/00.92, 1, 10.482, 97656.000, 418.996, 730.259, 195312.000, 1258.743, 3446.797, 65/185
```
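To tell the two cases apart automatically when polling, one can compare each windowed response with a default-window response fetched at about the same time. The sketch below parses the comma-separated broker rows of the `cccli load` output shown above (assuming exactly that 14-column layout) and flags a windowed response whose metrics are identical to the default-window ones. `parse_load_rows`, `same_load` and `classify` are hypothetical helpers, not part of `cccli`.

```python
import math
from typing import Dict

# 0-based column positions in the comma-separated cccli `load` rows shown above.
# Assumption: every broker row has exactly this 14-field layout.
BROKER, CPU, LEADER_NW_IN, FOLLOWER_NW_IN, NW_OUT = 1, 6, 8, 9, 11
NUM_FIELDS = 14

Snapshot = Dict[int, Dict[str, float]]


def parse_load_rows(text: str) -> Snapshot:
    """Map broker id -> a few load metrics for every broker row in the output."""
    snapshot: Snapshot = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # The banner, the header and blank lines do not have 14 comma-separated fields.
        if len(fields) != NUM_FIELDS or not fields[BROKER].isdigit():
            continue
        snapshot[int(fields[BROKER])] = {
            "cpu_pct": float(fields[CPU]),
            "leader_nw_in_kbs": float(fields[LEADER_NW_IN]),
            "follower_nw_in_kbs": float(fields[FOLLOWER_NW_IN]),
            "nw_out_kbs": float(fields[NW_OUT]),
        }
    return snapshot


def same_load(a: Snapshot, b: Snapshot, rel_tol: float = 1e-9) -> bool:
    """True if both snapshots cover the same brokers with (nearly) identical metrics."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[broker][m], b[broker][m], rel_tol=rel_tol)
               for broker in a for m in a[broker])


def classify(windowed: Snapshot, default_window: Snapshot) -> str:
    """Label a windowed response by whether it merely repeats the default-window load."""
    if not windowed or not default_window:
        return "no broker rows parsed"
    if same_load(windowed, default_window):
        return "default window (start/end apparently ignored)"
    return "requested window"


if __name__ == "__main__":
    sample = (
        "-, 10000,us-east-1a, 1192092.000, 10931.121/00.92, 1, 11.381, 97656.000, "
        "386.458, 762.798, 195312.000, 1155.897, 3446.797, 62/185\n"
    )
    snap = parse_load_rows(sample)
    print(classify(snap, snap))
```

An exact match is only a heuristic: both snapshots must be taken close together, because the default window keeps moving with the current time, and a structured (JSON) response would be a sturdier input than the text table.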
Periodically running a command with the `start` and `end` (or `time`) parameters inconsistently returns one result or the other.

Plotting this in a graph, we can confirm that the cluster load oscillates between the two time windows:
In blue we can see the live system metrics, while in purple we see the cluster load as reported by the Cruise Control endpoint.
After cutting Kafka traffic in half, we can see that the Cruise Control load reflects the change after a delay (which makes sense, as it is not live data but the accumulated average over the last time window). What is not expected, however, is that the values show "waves". From observation, we suspect that the low points of the waves correspond to querying the load within the 1-hour time window, as they converge with the system metric after that time. The high points of the waves take longer to converge (approximately 4 hours), which we suspect is the default time window.
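The "wave" hypothesis can be sanity-checked with simple arithmetic. Assuming one sample per minute and an idealised step in which traffic drops from 100 to 50 (arbitrary units) at minute 0, the sketch below computes the average that a 1-hour window and a 4-hour window would report over time. `window_average` and `wave_table` are hypothetical helpers, and Cruise Control's real aggregation of samples into metric windows will not be this smooth.

```python
from typing import List, Tuple


def window_average(t: int, window: int, before: float = 100.0, after: float = 50.0) -> float:
    """Mean of the per-minute samples in [t - window, t), where traffic is `before`
    for minutes < 0 and `after` from minute 0 on (an idealised halving of traffic)."""
    if window <= 0:
        raise ValueError("window must be positive")
    samples = [before if minute < 0 else after for minute in range(t - window, t)]
    return sum(samples) / len(samples)


def wave_table(times: List[int]) -> List[Tuple[int, float, float]]:
    """(minute, 1-hour window average, 4-hour window average) for each minute."""
    return [(t, window_average(t, 60), window_average(t, 240)) for t in times]


if __name__ == "__main__":
    for minute, one_hour, four_hours in wave_table([0, 30, 60, 120, 240]):
        print(f"t={minute:>3} min  1h window: {one_hour:6.2f}  4h window: {four_hours:6.2f}")
```

At t = 60 min the 1-hour window already reports 50 while the 4-hour window still reports 87.5, and the two only agree again at t = 240 min. If successive responses alternate between the two windows, the plotted series jumps between these two values, which would match the low points converging after about 1 hour and the high points after about 4 hours.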
Could you help me understand why this is happening and how to prevent it? Thank you very much!