HotSpot method: interpretability of the PS score as a measure of root-cause confidence #5
Hello, although I know a bit of Chinese, I'm in no way fluent, so I will answer in English. Following the ripple effect property, we know that:
So the above are properties of the true root cause. The problem now is to find which set of elements is the root cause. To do this, we need to search through sets of elements and measure how likely each one is to be the root cause (HotSpot uses the PS score for this). What is done in HotSpot is
The key idea is that a root cause in multi-dimensional data like this will affect all of its descendant elements evenly. This is what the PS score (and GPS in Squeeze, NPS in AutoRoot, and partly the risk score in RiskLoc) tries to measure. I hope the above helps a bit in understanding. If you are interested in this work, consider starring the GitHub repository.
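To make the PS idea concrete, here is a minimal Python sketch. It assumes the HotSpot definitions as we understand them; they are not spelled out in this thread.

- Under the ripple effect, a candidate's total deviation is split among its descendant leaves in proportion to their forecasts. This gives the expected leaf values `a`.
- The score is `PS = max(1 - ||v - a|| / ||v - f||, 0)`, where `v` are the actual leaf values and `f` the forecast leaf values.

The leaf keys, the data and the helper names (`is_descendant`, `ripple_expected`, `potential_score`) are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical leaf keys of the form (province, isp); "*" acts as a wildcard.
Leaf = Tuple[str, str]


def is_descendant(leaf: Leaf, candidate: Leaf) -> bool:
    """A leaf descends from a candidate if every non-'*' attribute matches."""
    return all(c == "*" or c == l for c, l in zip(candidate, leaf))


def ripple_expected(forecast: Dict[Leaf, float], actual: Dict[Leaf, float],
                    candidate: Leaf) -> Dict[Leaf, float]:
    """Leaf values expected if `candidate` were the root cause (assumed rule).

    The candidate's total deviation is spread over its descendant leaves in
    proportion to their forecasts; all other leaves keep their forecast."""
    desc = [leaf for leaf in forecast if is_descendant(leaf, candidate)]
    f_s = sum(forecast[leaf] for leaf in desc)
    v_s = sum(actual[leaf] for leaf in desc)
    expected = dict(forecast)
    if f_s == 0:
        return expected
    for leaf in desc:
        expected[leaf] = forecast[leaf] - (f_s - v_s) * forecast[leaf] / f_s
    return expected


def potential_score(forecast: Dict[Leaf, float], actual: Dict[Leaf, float],
                    candidate: Leaf) -> float:
    """PS = max(1 - ||v - a|| / ||v - f||, 0), Euclidean distance over all leaves."""
    leaves = list(forecast)
    a = ripple_expected(forecast, actual, candidate)
    v = [actual[k] for k in leaves]
    d_va = math.dist(v, [a[k] for k in leaves])
    d_vf = math.dist(v, [forecast[k] for k in leaves])
    return max(1 - d_va / d_vf, 0.0) if d_vf > 0 else 0.0


forecast = {("A", "x"): 100.0, ("A", "y"): 50.0, ("B", "x"): 80.0, ("B", "y"): 70.0}
# (A, *) drops by 20%, spread evenly over its two leaves; B is normal.
actual = {("A", "x"): 80.0, ("A", "y"): 40.0, ("B", "x"): 80.0, ("B", "y"): 70.0}

for cand in [("A", "*"), ("A", "x"), ("*", "x")]:
    print(cand, round(potential_score(forecast, actual, cand), 2))
# ('A', '*') 1.0
# ('A', 'x') 0.55
# ('*', 'x') 0.28
```

The results line up with the ripple effect:

- (A, *) scores 1.0 because it reproduces the even 20% drop exactly.
- (A, x) explains only part of the deviation.
- (*, x) also predicts a drop in (B, x) that did not happen, so it scores lowest.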
Thanks, but there is a situation in reality where S decreases by 20% but e does not necessarily decrease by 20%, so the ripple effect has certain limitations. Do you know in what scenarios the ripple effect is suitable?
I assume e is a leaf element of S? Since S decreases by 20%, that 20% needs to come from somewhere, and that somewhere is the leaf elements of S (since together they make up S). For S to have a forecast error of 20%, the leaf elements as an aggregate (i.e., together) must also have a forecast error of 20%, due to the nature of the multi-dimensional problem. If S is the root cause of an anomaly, then the forecast error will be evenly distributed among its leaf elements, following the ripple effect. If the forecast error is more randomly distributed among the leaf elements, then it's less likely that S is the root cause. This is also the assumption behind the ripple effect. So it's suitable in situations where you believe that prediction errors in the root cause elements will be evenly distributed (in practice this seems to work quite well). In practice, I found that the most difficult step is getting accurate forecast values for all leaf elements. Since these are usually quite fine-grained, they don't have much data, and their forecasts are often inaccurate. This can skew the results.
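The point that S's 20% must come from its leaf elements can be checked numerically. Because S's value is the sum of its leaves, S's relative drop equals the forecast-weighted average of the leaves' relative drops. The ripple effect adds the stronger assumption that every leaf has the same relative drop.

The sketch below uses made-up numbers for a hypothetical S with two leaves. The function names are illustrative only.

```python
# Hypothetical forecasts for the two leaf elements of S, so f(S) = 100.
forecast = [60.0, 40.0]


def relative_drop(f: float, v: float) -> float:
    """Relative drop (forecast - actual) / forecast."""
    return (f - v) / f


def aggregate_drop(forecast, actual) -> float:
    """Relative drop of S, computed from the summed leaf values."""
    return relative_drop(sum(forecast), sum(actual))


def weighted_leaf_drop(forecast, actual) -> float:
    """Forecast-weighted average of the leaves' relative drops."""
    total = sum(forecast)
    return sum(f / total * relative_drop(f, v) for f, v in zip(forecast, actual))


even = [48.0, 32.0]    # each leaf drops 20%: consistent with the ripple effect
uneven = [40.0, 40.0]  # leaf drops of about 33% and 0%: same total, not even

for actual in (even, uneven):
    print([round(relative_drop(f, v), 2) for f, v in zip(forecast, actual)],
          round(aggregate_drop(forecast, actual), 2),
          round(weighted_leaf_drop(forecast, actual), 2))
# [0.2, 0.2] 0.2 0.2
# [0.33, 0.0] 0.2 0.2
```

Both actuals give S the same 20% drop, so the aggregate alone cannot tell them apart. Only the even split matches the ripple effect, which is why an uneven pattern like the second one makes S a less likely root cause.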
Thanks for your answer. For example, the following figure shows that province=Beijing is the root cause. The KPI corresponding to province=Beijing has dropped by 40%. The first sample (Province=Beijing, ISP=Mobile) has dropped by 60%, while the second sample (Province=Beijing, ISP=Unicom) does not change, so the ripple effect does not hold here.
Actually, I would say that it does work; however, the true root cause is not Beijing. In the example, (Beijing, Unicom) is normal, so it does not make much sense to say that the whole of (Beijing, *) is abnormal. Instead, the root cause that best explains the anomaly should be (Beijing, Mobile). Note that both (Shanghai, Mobile) and (Guangdong, Mobile) are normal, so the root cause won't be (*, Mobile). So, even if (Beijing, *) has dropped 40% and should by itself be considered abnormal, the location of the problem is actually the Mobile ISP in Beijing.
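The Beijing example can be checked with a small sketch. The figure's actual values are not available here, so the numbers below are hypothetical. They were chosen only to match the stated percentages:

- a 40% drop for (Beijing, *);
- a 60% drop for (Beijing, Mobile);
- no change for (Beijing, Unicom);
- normal values for (Shanghai, Mobile) and (Guangdong, Mobile).

The helper names and the 5% tolerance in `fits_ripple` are illustrative assumptions, not part of HotSpot.

```python
# Hypothetical values matching the stated percentages (original figure not available).
forecast = {
    ("Beijing", "Mobile"): 200.0, ("Beijing", "Unicom"): 100.0,
    ("Shanghai", "Mobile"): 150.0, ("Guangdong", "Mobile"): 250.0,
}
actual = {
    ("Beijing", "Mobile"): 80.0, ("Beijing", "Unicom"): 100.0,
    ("Shanghai", "Mobile"): 150.0, ("Guangdong", "Mobile"): 250.0,
}


def descendants(candidate):
    """Leaves matching the candidate; '*' matches any value."""
    return [k for k in forecast
            if all(c == "*" or c == a for c, a in zip(candidate, k))]


def relative_change(keys):
    """Relative change (actual - forecast) / forecast of the summed leaves."""
    f = sum(forecast[k] for k in keys)
    v = sum(actual[k] for k in keys)
    return (v - f) / f


def fits_ripple(candidate, tol=0.05):
    """True if every descendant leaf changes by about the candidate's own ratio."""
    target = relative_change(descendants(candidate))
    return all(abs(relative_change([k]) - target) <= tol
               for k in descendants(candidate))


for cand in [("Beijing", "*"), ("Beijing", "Mobile"), ("*", "Mobile")]:
    print(cand, round(relative_change(descendants(cand)), 2), fits_ripple(cand))
# ('Beijing', '*') -0.4 False
# ('Beijing', 'Mobile') -0.6 True
# ('*', 'Mobile') -0.2 False
```

(Beijing, Mobile) passes trivially because it is a single leaf. What makes it the best explanation is that it also accounts for the entire deviation, since every other leaf matches its forecast. A score such as PS combines both aspects.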
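Hello, the PS method uses RE (the ripple effect) to measure the confidence that a candidate is the root cause. How should the principle behind the PS method be understood?
Many people's interpretation is similar to the following:
If an attribute value is the root cause, then the change in that attribute value and the changes in its samples follow the ripple effect;
If the change in an attribute value and the changes in its samples follow the ripple effect, then that attribute value is the root cause.
Is this understanding correct?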