Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add half-open and fix degrade-bug #553

Closed
wants to merge 8 commits into from
Closed

add half-open and fix degrade-bug #553

wants to merge 8 commits into from

Conversation

CalvinKirs
Copy link

@CalvinKirs CalvinKirs commented Mar 7, 2019

#154

Describe what this PR does / why we need it

原有的异常数以及异常比例这里会出现问题。位于同一个滑动窗口当降级关闭之后无论如何都会再次进入降级。
您好,目前来看的话限流这里是用总的传递数来进行计算的。 (这样是最合理的而且目前也确实是这样做的)。
基于rt这里也已经完善,是拿最近一次请求的rt与计数进行比较的。另外这里的话清除的话只会清除当前时间窗口的一个异常室温总和,最小为1个RT(经过多个时间窗口,每次刚进入时间窗口进进入准降级状态),最大为5个RT(持续一个窗口的一个准降级-降级的RT总和)。
异常数目这里还是原有的逻辑,降级关闭之后放进一个请求,依赖于这个请求的结果来进行是否开关。降级打开之前会对分钟秒级别的纬度减去各自所属的时间窗口的值。如果是新的时间窗口则不做操作。
我这里觉得降级关闭之后应该是一个新的开始,所以-1个异常数 里是可以接受的。不这么做的话异常数目这里位于同一个时间窗口降级关闭之后始终是要再次打开的。因为数已经达到了阈值,必须去减

Does this pull request fix one issue?

Describe how you did it

Describe how to verify it

ut

Special notes for reviews

@codecov-io
Copy link

codecov-io commented Mar 7, 2019

Codecov Report

Merging #553 into master will increase coverage by 1.29%.
The diff coverage is 24.52%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #553      +/-   ##
============================================
+ Coverage     39.09%   40.39%   +1.29%     
- Complexity     1223     1293      +70     
============================================
  Files           275      279       +4     
  Lines          8674     8925     +251     
  Branches       1159     1194      +35     
============================================
+ Hits           3391     3605     +214     
+ Misses         4855     4852       -3     
- Partials        428      468      +40
Impacted Files Coverage Δ Complexity Δ
...sp/sentinel/slots/statistic/data/MetricBucket.java 77.35% <0%> (-18%) 23 <0> (ø)
...a/csp/sentinel/slots/statistic/base/LongAdder.java 31.91% <0%> (-2.13%) 10 <0> (-1)
...a/com/alibaba/csp/sentinel/node/StatisticNode.java 50.43% <29.62%> (-5.75%) 28 <0> (+1)
.../csp/sentinel/slots/block/degrade/DegradeRule.java 50.41% <34%> (-5.15%) 20 <3> (+3)
...p/sentinel/slots/statistic/metric/ArrayMetric.java 60.37% <5.55%> (-7%) 30 <1> (+1)
...ntinel/slots/block/degrade/DegradeRuleManager.java 21.27% <0%> (-5.12%) 7% <0%> (ø)
...csp/sentinel/slots/block/flow/FlowRuleManager.java 60.86% <0%> (-1.36%) 7% <0%> (ø)
...l/slots/block/flow/param/ParamFlowRuleManager.java 75% <0%> (-1.2%) 7% <0%> (ø)
...el/slots/block/authority/AuthorityRuleManager.java 63.79% <0%> (-1.12%) 9% <0%> (ø)
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 044cdbb...0bc146f. Read the comment docs.

@sczyh30 sczyh30 added the to-review To review label Mar 7, 2019
@sczyh30
Copy link
Member

sczyh30 commented Mar 8, 2019

For reviewers: some previous discussions can be found here: #525

@sczyh30 sczyh30 added the size/L Indicate a PR that changes 100-499 lines. label Mar 11, 2019
@CalvinKirs
Copy link
Author

CarpenterLee大佬还是建议不要剔除这些数据。我目前想到的一种做法是这些记录不剔除,但是会记录下来,逻辑判断的时候做逻辑删除。直到该时间统计的窗口过期。基于秒基于分钟。都需要做统计。有点儿重。

@CalvinKirs
Copy link
Author

还有另一种做法,就是,既然统计数据要保证百分百的准确性,那么OK,熔断这里自己备份一份数据,需要记录qps,异常数,异常比率。达到熔断的时候和Hy一样进行直接reset。这样两者之间互不相关。

@CarpenterLee
Copy link
Contributor

Thanks for your contribution!

For the reasons listed in #525, I agree that it's necessary to support HALF-OPEN state. But as for the code, improvement may be needed.

Sentinel statistical module is carefully designed. The Node and StatisticNode in your current implementation is not well enough, such as adding LongAdder in StatisticNode. As you could see, amost all statistical work is done in ArrayMetric.

You could refine the current implementation and explain your design in more detail. And the following principles should be abided:

  1. accuracy, never modify the past statistics;
  2. coherence, follow the current statistical pattern in ArrayMetric and MetricBucket.

@CalvinKirs
Copy link
Author

CalvinKirs commented May 13, 2019

谢谢你的贡献!

由于#525中列出的原因,我同意支持HALF-OPEN状态是必要的。但至于代码,可能需要改进。

Sentinel统计模块经过精心设计。该NodeStatisticNode在当前的实现不是不够好,如加入LongAdder StatisticNode。正如您所看到的,所有的统计工作都已完成ArrayMetric

您可以优化当前的实现并更详细地解释您的设计。并且应遵守以下原则:

  1. 准确性,绝不修改过去的统计数据;
  2. 连贯性,按照目前的统计模式ArrayMetricMetricBucket

我不是很倾向于单独熔断的数据于此 因为两者虽然从表面上看是一个概念 但是意义完全不一样。我这里统计只是异常的一个数据而言,一个 变量足矣,如果采用滑动窗口的话显得多余且有点儿浪费

所以个人觉得放于StatisticNode更合适,毕竟数据源自于ArrayMetric,但是它属于更上一层的数据统计而非,所以没有必要再给他加一层数据源。或者我这样讲更清楚,我们可以把ArrayMetric当作一个数据源头,那么既然数据源头已经有了,我们在StatisticNode相当于做了一层包装,所以于此我觉得更为合适

@pouchrobot
Copy link

ping @CalvinKirs
Conflict happens after merging a previous commit.
Please rebase the branch against master and push it back again. Thanks a lot.

@sczyh30 sczyh30 added the area/circuit-breaking Issues or PRs related to circuit breaking label Aug 3, 2019
@bert82503
Copy link

bert82503 commented Aug 3, 2019

反向思考下,Sentinel从2012年诞生,在阿里内部使用了4年之久,我觉得这个问题你们应该遇到过了,也想到过半开启模式,但最终为何没采取这种方式呢?能否普及下相关历史,或你们的想法与考量
@sczyh30 @CarpenterLee

@sczyh30
Copy link
Member

sczyh30 commented Aug 3, 2019

@edwardlee03 Some more details of the circuit breaking statistics should be well designed. Discussions are welcomed!

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sczyh30 sczyh30 removed the to-review To review label Jul 30, 2020
@sczyh30
Copy link
Member

sczyh30 commented Jul 30, 2020

Resolved in #1490. Thanks for your contribution!

@sczyh30 sczyh30 closed this Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/circuit-breaking Issues or PRs related to circuit breaking size/L Indicate a PR that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants