Problem with the canal_instance_traffic_delay metric #4454

yourse007 · 2022-10-14T06:49:49Z

I have searched the issues of this repository and believe that this is not a duplicate.
I have checked the FAQ of this repository and believe that this is not a duplicate.

environment

canal version 1.1.4, 1.1.5
mysql version 5.7

Issue Description

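canal_instance_traffic_delay is computed as currentTimestamp - localExecTime.
localExecTime is updated only when a valid binlog event or a heartbeat is received.
A valid binlog event is one that remains after filtering by filter.regex.

In some scenarios this metric keeps rising, which gives the false impression that the data is delayed.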

Steps to reproduce

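Scenario:
The mysql instance has schema A and schema B. filter.regex is configured for schema A only, but only schema B has business traffic.
Symptom:
The mysql master keeps sending binlog events for schema B, but the instance filters all of them out, so localExecTime is not updated.
The mysql master also sends no heartbeat events during this time, so localExecTime is never updated.
As a result, canal_instance_traffic_delay keeps rising, even though there is actually no delay between the canalInstance and the mysql master.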
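
To make the scenario easier to reason about, here is a minimal model of the calculation described above. It is a sketch only, not canal's actual code: the class `TrafficDelayModel` and its methods are hypothetical, and an exact schema-name match stands in for filter.regex.

```java
// Hypothetical model of the metric; none of these names exist in canal.
final class TrafficDelayModel {
    private final String watchedSchema; // stands in for filter.regex
    private long localExecTime;         // ms, execute time of the last event that passed the filter

    TrafficDelayModel(String watchedSchema, long startMillis) {
        this.watchedSchema = watchedSchema;
        this.localExecTime = startMillis;
    }

    /** A binlog event arrives; only events that pass the filter update localExecTime. */
    void onBinlog(String schema, long executeTimeMillis) {
        if (watchedSchema.equals(schema)) {
            localExecTime = executeTimeMillis;
        }
    }

    /** A heartbeat from the master also refreshes localExecTime. */
    void onHeartbeat(long nowMillis) {
        localExecTime = nowMillis;
    }

    /** Delay as described in this issue: currentTimestamp - localExecTime. */
    long delay(long nowMillis) {
        return nowMillis - localExecTime;
    }
}
```

With only schema B traffic, every call to `onBinlog("B", ...)` is ignored, so `delay(now)` grows with wall-clock time even though the events are being received promptly.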

In addition, the heartBeat entry built in AbstractEventParser#buildHeartBeatTimeTask has no effect: it is dropped directly at the sink stage and is not used to update localExecTime either.

Expected behaviour

In the scenario above there is no delay between the canalInstance and the master, so canal_instance_traffic_delay should not keep rising.

Actual behaviour

Proposed solutions

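Two ideas:

1. Use the binlog executeTime from before filtering to update localExecTime.
2. In the MysqlDetectingTimeTask mechanism, periodically build an entry of type heartBeat with eventType=MHEARTBEAT, to simulate the effect of a mysql master heartbeat.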
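
Both ideas can be sketched on a simplified model. This is an illustration under stated assumptions, not a patch against canal: `DelayFixSketch` and its methods are hypothetical names, and an exact schema-name match again stands in for filter.regex.

```java
// Hypothetical sketch of the two ideas; none of these names exist in canal.
final class DelayFixSketch {
    private final String watchedSchema; // stands in for filter.regex
    private long localExecTime;         // ms

    DelayFixSketch(String watchedSchema, long startMillis) {
        this.watchedSchema = watchedSchema;
        this.localExecTime = startMillis;
    }

    /**
     * Idea 1: record executeTime BEFORE the filter is applied.
     * Returns true if the event passes the filter and would go on to the sink.
     */
    boolean onBinlog(String schema, long executeTimeMillis) {
        localExecTime = Math.max(localExecTime, executeTimeMillis);
        return watchedSchema.equals(schema);
    }

    /**
     * Idea 2: a periodic task injects a synthetic heartbeat that refreshes
     * localExecTime, imitating a heartbeat from the mysql master.
     */
    void onSyntheticHeartbeat(long nowMillis) {
        localExecTime = Math.max(localExecTime, nowMillis);
    }

    long delay(long nowMillis) {
        return nowMillis - localExecTime;
    }
}
```

One difference between the two: idea 1 still reflects real replication lag, because it uses the master's executeTime, while a synthetic heartbeat stamped with the local clock only shows that the periodic task itself is running.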



jackila · 2022-11-16T02:40:23Z

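There is another way to handle this: improve how EntryCollector gathers the metric. When reading latestExecTime, also read latestInterval.
If now - latestExecTime > MASTER_HEARTBEAT_PERIOD_SECONDS * 1000, use now - latestExecTime.
If now - latestExecTime < MASTER_HEARTBEAT_PERIOD_SECONDS * 1000, use latestInterval.

This addresses your problem above, and also the fact that the current approach is not accurate enough: when there is no data, the reported delay can grow to as much as MASTER_HEARTBEAT_PERIOD_SECONDS.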
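
A rough sketch of that rule, under some assumptions: the class `IntervalAwareDelay` is hypothetical, latestInterval is taken to mean the lag measured when the last event arrived (receive time minus execute time), and the heartbeat period is a constructor parameter rather than canal's real constant. The comment does not say what to do when the two sides are exactly equal; this sketch falls back to latestInterval in that case.

```java
// Hypothetical sketch; the meaning of latestInterval is an assumption.
final class IntervalAwareDelay {
    private final long heartbeatPeriodMillis;
    private long latestExecTime; // ms, execute time of the last event seen
    private long latestInterval; // ms, lag measured when that event arrived

    IntervalAwareDelay(long heartbeatPeriodSeconds) {
        this.heartbeatPeriodMillis = heartbeatPeriodSeconds * 1000L;
    }

    /** Record both the execute time and the lag observed on arrival. */
    void onEvent(long executeTimeMillis, long receivedAtMillis) {
        latestExecTime = executeTimeMillis;
        latestInterval = receivedAtMillis - executeTimeMillis;
    }

    /**
     * Within one heartbeat period, silence is normal, so report the last
     * measured lag. Beyond it, report the time since the last event.
     */
    long delay(long nowMillis) {
        long sinceLastExec = nowMillis - latestExecTime;
        return sinceLastExec > heartbeatPeriodMillis ? sinceLastExec : latestInterval;
    }
}
```

For example, with a 15-second period (an illustrative value), an event executed at t=1000 ms and received at t=1200 ms yields a delay of 200 ms for any reading taken within the next 15 seconds, instead of a value that climbs until the next heartbeat.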

agapple · 2022-11-16T06:12:18Z

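First, please confirm whether this is a problem with the older versions. As I recall, this was already fixed, see #2616.

Current mechanism: by default, after filtering, the transaction begin and commit events in the binlog are let through according to a certain policy, for example every 5 seconds or every 8192 empty events. These events trigger the cursor position advance and the delay status update.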
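
The policy described above can be sketched as a small gate. This is not the code from #2616: the class `EmptyTransactionGate` is hypothetical, and the two thresholds are constructor parameters, with 5 seconds and 8192 events used only because they are the example values mentioned here.

```java
// Hypothetical sketch of a "let some empty transactions through" policy.
final class EmptyTransactionGate {
    private final long intervalMillis;      // e.g. 5000
    private final long emptyEventThreshold; // e.g. 8192
    private long lastPassMillis;
    private long emptyCount;

    EmptyTransactionGate(long intervalMillis, long emptyEventThreshold, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.emptyEventThreshold = emptyEventThreshold;
        this.lastPassMillis = startMillis;
    }

    /**
     * Called for each transaction whose row events were all filtered out.
     * Returns true when its begin/commit events should be let through so that
     * the cursor position and the delay state can be updated.
     */
    boolean shouldPass(long nowMillis) {
        emptyCount++;
        boolean enoughEvents = emptyCount >= emptyEventThreshold;
        boolean enoughTime = nowMillis - lastPassMillis >= intervalMillis;
        if (enoughEvents || enoughTime) {
            emptyCount = 0;
            lastPassMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

Note that a gate like this only fires when some event arrives. If the master sends nothing at all, no begin/commit event is available to let through.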

jackila · 2022-11-17T07:13:54Z

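Letting empty transaction headers through works for the general case. But if a database stays in a hung state for a long time (tested locally), wouldn't the problem in this issue still occur?

Still, I think that is just an edge case.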
