Added debug log of AllocatedIPCount of ippool #3926
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #3926   +/-   ##
=======================================
  Coverage   80.83%   80.84%
=======================================
  Files          51       51
  Lines        4514     4516    +2
=======================================
+ Hits         3649     3651    +2
  Misses        699      699
  Partials      166      166
Flags with carried forward coverage won't be shown.
pkg/gcmanager/scanAll_IPPool.go
Outdated
@@ -371,6 +379,18 @@ func (s *SpiderGC) executeScanAll(ctx context.Context) {
}
}
}

if *pool.Status.AllocatedIPCount != int64(tempAllocatedIPCount) {
This will definitely introduce a new bug.
Every update to an IPPool is based on its resource version, so this count must be computed against the same resource version.
While the IP usage is being counted above, the resource version changes as soon as one IP is successfully GC'd. If you then write back a usage count that was computed against the old resource version, and an IP is allocated or released concurrently, or IPs are added to the pool in the meantime, the stale count you flush in is wrong.

Doesn't UpdateAllocatedIPCount check apierrors.IsConflict(err) on update? If the resource version has changed, it logs that and the update fails.

Think about it again: is the version you counted the usage against the same version you are writing back to?

Indeed, they are not the same.
e1cda70 to d2cedea
pkg/ippoolmanager/ippool_manager.go
Outdated
func (im *ipPoolManager) UpdateAllocatedIPCount(ctx context.Context, poolName string, allocatedIPCount *int64) error {
logger := logutils.FromContext(ctx)

ipPool, err := im.GetIPPoolByName(ctx, poolName, constant.IgnoreCache)
What use is this version? It is not the version from when the status was counted, and who knows how much the count has changed in between. You have not understood what the version is for.

The whole approach of this PR is wrong.

Corrected.
pkg/gcmanager/scanAll_IPPool.go
Outdated
@@ -352,6 +359,7 @@ func (s *SpiderGC) executeScanAll(ctx context.Context) {

GCIP:
if flagGCIPPoolIP {
tempAllocatedIPCount--
This code is more complicated than it needs to be. Can't you just count the number of entries in the allocated IPs and be done?
There is no need for all the incrementing and decrementing.

Agreed.

We need to find the root cause of the wrong usage count. GC does not solve the real problem; it only repairs things after the fact. These are two separate concerns.
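The reviewer's suggestion to count the entries directly, rather than incrementing and decrementing a temporary counter, could look roughly like the sketch below. It assumes the allocated IPs are stored as a JSON-encoded map keyed by IP address, which the thread does not confirm; `allocation` and `countAllocated` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allocation is a hypothetical per-IP record; the field is illustrative.
type allocation struct {
	Owner string `json:"owner"`
}

// countAllocated derives the allocated-IP count from the serialized record
// map itself, so there is no separate counter to drift out of sync.
func countAllocated(raw *string) (int64, error) {
	if raw == nil || *raw == "" {
		return 0, nil
	}
	records := map[string]allocation{}
	if err := json.Unmarshal([]byte(*raw), &records); err != nil {
		return 0, fmt.Errorf("unmarshal allocated IPs: %w", err)
	}
	return int64(len(records)), nil
}

func main() {
	raw := `{"10.0.0.1":{"owner":"pod-a"},"10.0.0.2":{"owner":"pod-b"}}`
	n, err := countAllocated(&raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("allocated:", n)
}
```

Because the count is a pure function of the records, it can be recomputed at any point without tracking which GC steps succeeded.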
d2cedea to 42941c7
test/e2e/reclaim/reclaim_test.go
Outdated
@@ -586,6 +586,23 @@ var _ = Describe("test ip with reclaim ip case", Label("reclaim"), func() {
// Delete Pod
Expect(frame.DeletePod(podName, namespace)).To(Succeed(), "Failed to delete pod %v/%v\n", namespace, podName)
GinkgoWriter.Printf("succeed to delete pod %v/%v\n", namespace, podName)

// Check whether the dirty IP data is recovered successfully and whether the AllocatedIPCount decreases and meets expectations?
The test case manually sets AllocatedIPCount to an incorrect value. The expectation is that, once the new improvement is in place, allocating an IP, releasing an IP, or a GC scan-all reclaiming an abnormal IP will each correct the AllocatedIPCount value.
The IPPool status should be robust.
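The behavior this test case expects can be sketched in isolation: if every allocate and release recomputes the counter from the records, a manually corrupted value is repaired by the next operation. The `pool` type and its methods below are hypothetical and are not the project's actual implementation.

```go
package main

import "fmt"

// pool is a hypothetical, in-memory stand-in for an IPPool status.
type pool struct {
	allocated        map[string]string // IP -> owner
	allocatedIPCount int64
}

// allocate records the IP and recomputes the counter from the records.
func (p *pool) allocate(ip, owner string) {
	p.allocated[ip] = owner
	p.allocatedIPCount = int64(len(p.allocated))
}

// release removes the IP and recomputes the counter from the records.
func (p *pool) release(ip string) {
	delete(p.allocated, ip)
	p.allocatedIPCount = int64(len(p.allocated))
}

func main() {
	p := &pool{allocated: map[string]string{"10.0.0.1": "pod-a"}, allocatedIPCount: 1}

	// Simulate the test's manual corruption of the counter.
	p.allocatedIPCount = 42

	p.allocate("10.0.0.2", "pod-b")
	fmt.Println("after allocate:", p.allocatedIPCount)

	p.allocatedIPCount = -7
	p.release("10.0.0.1")
	fmt.Println("after release:", p.allocatedIPCount)
}
```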
42941c7 to e7ee784
e7ee784 to f59f549
Signed-off-by: ty-dc <tao.yang@daocloud.io>
f59f549 to a9069ca
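(1) Can this log really help debug the root cause of the problem?
At present no log tells us when the AllocatedIPCount of an ippool first becomes abnormal. AllocatedIPCount++ and AllocatedIPCount-- appear in only two places in the code. Adding this log, so that the current AllocatedIPCount is printed whenever the IPPool allocates or releases an IP, shows at which allocation or release the count was recorded incorrectly.
Think again: can this line of log really be used for production monitoring?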
Thanks for contributing!
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #3771
Special notes for your reviewer: