-
-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add garbage collection for text type #104
Add garbage collection for text type #104
Conversation
Adds a function to collect garbage of RGATreeSplit that occurs when editing text.
Adds a function to collect garbage of RGATreeSplit that occurs when editing rich text.
Garbage collection synchronization test added for text type elements. This is because garbage collection information must be synchronized between different clients.
@dc7303 Thanks for contributing. I'll do a code review soon. |
@mojosoeun recently implemented the GC feature in the JS SDK, so if possible, it would be nice to review this PR together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sending PR. 👍
I left a short review before looking at the purge algorithm.
We hope to group the package into 3 parts. Standard library, 3rd party library, internal package.
Change editedTextElementMapByCreatedAt to removedNodeTextElementMapByCreatedAt. This is to cache only when deleted nodes exist. For this, this feature compares the offset of RGATreeSpltPos.
Codecov Report
@@ Coverage Diff @@
## master #104 +/- ##
==========================================
+ Coverage 56.37% 57.46% +1.08%
==========================================
Files 27 27
Lines 2519 2560 +41
==========================================
+ Hits 1420 1471 +51
+ Misses 1007 995 -12
- Partials 92 94 +2
Continue to review full report at Codecov.
|
func (s *RGATreeSplit) purge(node *RGATreeSplitNode) { | ||
if node.prev != nil { | ||
node.prev.next = node.next | ||
if node.next != nil { | ||
node.next.prev = node.prev | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dc7303 I have a question. is the purpose of the purge function to remove a specific node from the LinkedList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! 👍
When the text is modified, it does not remove the node for the deleted text, just tombstone marking. (Tombstone is removedAt
)
This means that there are persistent nodes for deleted text, which can be a waste of memory.
We need to remove the tombstone marked nodes after synchronizing with other clients to fix this problem.
When executing garbage collection for large size of text garbage test
, try to modify RGATreeSplit.cleanupRemovedNodes
as shown below. Then you can see that the size of the object before and after the modification is different.
// rga_tree_split.go
func (s *RGATreeSplit) cleanupRemovedNodes(ticket *time.Ticket) int {
count := 0
for _, node := range s.removedNodeMap {
if node.removedAt != nil && ticket.Compare(node.removedAt) >= 0 {
s.treeByIndex.Delete(node.indexNode)
// s.purge(node)
s.treeByID.Remove(node.id)
delete(s.removedNodeMap, removedNodeMapKey(node.createdAt().Key()))
count++
}
}
return count
}
In addition, you can check the state of the tree through the Text.AnnotatedString()
method within doc.Update()
.
t.Run("garbage collection for text test", func(t *testing.T) {
...
// Example
err = doc.Update(func(root *proxy.ObjectProxy) error {
text := root.GetText("k1")
fmt.Println(text.AnnotatedString())
return nil
}, "edit text k1")
}
// Before s.purge(node) comment processing
// [0:0:00:0 ][2:1:00:0 Hi][1:2:00:5 ][2:2:00:0 j][2:3:00:0 ane]
// After s.purge(node) comment processing
// [2:1:00:0 Hi]{1:2:00:0 Hello}[1:2:00:5 ][2:2:00:0 j][2:3:00:0 ane]{1:3:00:0 m}{1:3:00:1 ario}{1:2:00:6 world}
I found out that the garbage collection for large size of text garbage test
test was wrong thanks to your comments. Thanks to you, I modified this test. Thank you so much! 😄
Corrected because the size of the object could not be output properly.
The coverage measurement method of 'Go cover' is basically only measured within the package being tested. So I added 'root_test.go' and wrote test code in it to increase the coverage score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work. 👍
When I check the algorithm, I think it is okay to delete nodes marked with a tombstone. As the editing progresses, the root node begins to split. When we need to find for RGATreeSplitNode
for the given location(offset
) outside the RGATreeSplit, it is not possible to know exactly how nodes are splits, so we use the approximate location by findFloorNode
. Even if the node the given location is deleted, it will ok because we use the approximate location.
It would be nice to check detailed tests while editing the actual examples in the JS SDK.
Please merge master and fix the errors caused by the recently introduced linters.
We can use the ID createdAt and offset in Node.id to get the unique value that can identify the node.
Thanks for the nice explanation. I think it would be nice if we could present this well in the RGATreeSplit or GC design documents we will work on later. 😄 |
Fixed an error notified by Lint after master merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
Now, I think we need to implement it in the JS SDK and do various tests using examples.
I left a brief comment.
Sort the import groups. Co-authored-by: Youngteac Hong <susukang98@gmail.com>
Connect adjacent insNext and insPrev when running purge. In #104, GC was also introduced for text nodes marked with tombstones. However, in release(), the link between adjacent insPrev and insNext is missing, so this commit fixes it. `insPrev` and `insNext` are used to remember other nodes connected to the insert. For example, when abc is divided into a, b, c, it looks like this: [abc] divided to [a]<->[b]<->[c]. Even if we completely delete b, I think it should keep the relationship of [a] <-> [c]. And disconnecting `node.prev` and `node.next` is missing when purging a node from RGATreeList. So this commit added it.
Adds a function to collect garbage of RGATreeSplit that occurs when editing Text and RichText. Garbage collection synchronization test added for text type elements. This is because garbage collection information must be synchronized between different clients. Co-authored-by: Youngteac Hong <susukang98@gmail.com>
) Connect adjacent insNext and insPrev when running purge. In yorkie-team#104, GC was also introduced for text nodes marked with tombstones. However, in release(), the link between adjacent insPrev and insNext is missing, so this commit fixes it. `insPrev` and `insNext` are used to remember other nodes connected to the insert. For example, when abc is divided into a, b, c, it looks like this: [abc] divided to [a]<->[b]<->[c]. Even if we completely delete b, I think it should keep the relationship of [a] <-> [c]. And disconnecting `node.prev` and `node.next` is missing when purging a node from RGATreeList. So this commit added it.
What type of PR is this?
What this PR does / why we need it:
There may be RGASplitNode removed when the text type is modified. And this can accumulate and result in wasted memory.
So I implemented a garbage collector for text types.
I will share this PR and continue to analyze and improve it.
Please give us honest feedback.
Which issue(s) this PR fixes:
Fixes #58
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation: