-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove vcache for vald agent due to vcache delete timing control failure and time ordered concurrent vector queue called vqueue #1028
Conversation
[CHATOPS:HELP] ChatOps commands.
|
4dcc4a1
to
2d94bcb
Compare
Codecov Report
@@ Coverage Diff @@
## master #1028 +/- ##
==========================================
- Coverage 15.01% 14.42% -0.60%
==========================================
Files 497 495 -2
Lines 28703 28398 -305
==========================================
- Hits 4310 4096 -214
+ Misses 24123 24046 -77
+ Partials 270 256 -14
Continue to review full report at Codecov.
|
3e44b48
to
051e19d
Compare
…ure and time ordered concurrent vector queue called vqueue Signed-off-by: kpango <kpango@vdaas.org>
e364393
to
f4a5911
Compare
Signed-off-by: kpango <kpango@vdaas.org>
…che to vqueue Signed-off-by: kpango <kpango@vdaas.org>
Signed-off-by: kpango <kpango@vdaas.org>
…ontrol-failure-and-add-concurrent-queue
Signed-off-by: kpango <kpango@vdaas.org>
…ontrol-failure-and-add-concurrent-queue
WithInsertBufferSize(100), | ||
} | ||
|
||
func WithErrGroup(eg errgroup.Group) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
exported function WithErrGroup
should have comment or be unexported (golint)
} | ||
} | ||
|
||
func WithInsertBufferSize(size int) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
exported function WithInsertBufferSize
should have comment or be unexported (golint)
} | ||
} | ||
|
||
func WithDeleteBufferSize(size int) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
exported function WithDeleteBufferSize
should have comment or be unexported (golint)
} | ||
} | ||
|
||
func WithInsertBufferPoolSize(size int) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
exported function WithInsertBufferPoolSize
should have comment or be unexported (golint)
} | ||
} | ||
|
||
func WithDeleteBufferPoolSize(size int) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
exported function WithDeleteBufferPoolSize
should have comment or be unexported (golint)
}) | ||
dup := make(map[string]bool, len(uii)/2) | ||
dl := make([]int, 0, len(uii)/2) | ||
for i, idx := range uii { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
only one cuddle assignment allowed before range statement (wsl)
dup[idx.uuid] = true | ||
} | ||
} | ||
sort.Sort(sort.Reverse(sort.IntSlice(dl))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
expressions should not be cuddled with blocks (wsl)
func (v *vqueue) RangePopInsert(ctx context.Context, f func(uuid string, vector []float32) bool) { | ||
if v.finalizing.Load().(bool) { | ||
for !v.finalizing.Load().(bool) { | ||
time.Sleep(time.Millisecond * 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
mnd: Magic number: 100, in detected (gomnd)
// | ||
|
||
// Package vqueue manages the vector cache layer for reducing FFI overhead for fast Agent processing. | ||
package vqueue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
package should be vqueue_test
instead of vqueue
(testpackage)
// | ||
|
||
// Package vqueue manages the vector cache layer for reducing FFI overhead for fast Agent processing. | ||
package vqueue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [golangci] reported by reviewdog 🐶
package should be vqueue_test
instead of vqueue
(testpackage)
Signed-off-by: kpango <kpango@vdaas.org>
Seems okay to merge but please ask to the others. |
Signed-off-by: kpango <kpango@vdaas.org>
log.Info("create index delete phase finished") | ||
n.gc() | ||
log.Info("create index insert phase started") | ||
n.vq.RangePopInsert(ctx, func(uuid string, vector []float32) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it is correct:
The current logic is process all the delete request first, and insert request later
For example we have 2 request, reqA and reqB
reqA: insert vecA request, time t+0
reqB: delete vecA request, time t+10
If they are processing the same vector, even the reqB comes later then reqA, the final result will be the vecA is inserted, but maybe it is not correct.
Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's why I remove the Insert Queue here.
https://github.com/vdaas/vald/pull/1028/files#diff-d8459a99ea4f8ec96e01405d5f55171070363986154ad61e269c7911dbfd91a9R304-R327
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see... I think it is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…urrent-queue Signed-off-by: kpango <kpango@vdaas.org>
Signed-off-by: kpango kpango@vdaas.org
Description:
We are currently experiencing a bug where the Vald agent is unable to ensure the integrity of vector Delete and Insert when performing high frequency Upsert and Delete in a multi-threaded environment.
The Vald Agent was using Vector Cache = vcache to reduce the overhead of calling FFI inline in user requests, and also to aggregate FFI calls during CreateIndex.
However, since vcache uses a sync.Map-like implementation and does not support ordered loops, depending on the timing, the old vcache may not be deleted and may be loaded during the next CreateIndex.
In addition, Insert Vector Cache (IVC) and Delete Vector Cache (DVC) compare the timestamps of execution plans with the same UUID and delete unnecessary caches to avoid duplication of FFIs.
I believe that this reciprocal cache deletion logic further expands the garbage data and the Delete and Insert operations were not executed.
To solve this problem, we need to removed the vcache and added a cache layer to ensure orderliness.
So, I implemented TOCVQ (Time Ordered Concurrent Vector Queue) to avoid this problem, always calculating the execution plan from the head of the queue (operating on the old time axis) and calling FFI calls to strongly guarantee the consistency of the results.
Related Issue:
How Has This Been Tested?:
Environment:
Types of changes:
Changes to Core Features:
Checklist: