Add garbage collection for text type #104

dc7303 · 2020-11-17T14:44:44Z

What type of PR is this?

/kind feature

What this PR does / why we need it:
When a text element is edited, the RGATreeSplitNode nodes for the deleted text are not removed. They accumulate and waste memory.
So I implemented a garbage collector for text types.

I will share this PR and continue to analyze and improve it.
Please give us honest feedback.

Which issue(s) this PR fixes:

Fixes #58

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Additional documentation:

Adds a function to collect garbage of RGATreeSplit that occurs when editing text.

Adds a function to collect garbage of RGATreeSplit that occurs when editing rich text.

Garbage collection synchronization test added for text type elements. This is because garbage collection information must be synchronized between different clients.

hackerwins · 2020-11-17T22:49:35Z

@dc7303 Thanks for contributing. I'll do a code review soon.

hackerwins · 2020-11-22T04:27:30Z

@mojosoeun recently implemented the GC feature in the JS SDK, so if possible, it would be nice to review this PR together.

hackerwins

Thank you for sending this PR. 👍

I left a short review before looking at the purge algorithm.

pkg/document/json/root.go

api/converter/converter_test.go

pkg/document/json/rga_tree_split.go

Group the imports into three parts: standard library, third-party libraries, and internal packages.

Change editedTextElementMapByCreatedAt to removedNodeTextElementMapByCreatedAt. This caches a text element only when it has deleted nodes. To detect that, the change compares the offsets of RGATreeSpltPos.

codecov · 2020-12-06T12:56:06Z

Codecov Report

Merging #104 (151a901) into master (c4276a1) will increase coverage by 1.08%.
The diff coverage is 89.79%.

@@            Coverage Diff             @@
##           master     #104      +/-   ##
==========================================
+ Coverage   56.37%   57.46%   +1.08%     
==========================================
  Files          27       27              
  Lines        2519     2560      +41     
==========================================
+ Hits         1420     1471      +51     
+ Misses       1007      995      -12     
- Partials       92       94       +2

| Impacted Files | Coverage Δ |
|---|---|
| pkg/document/json/rga_tree_split.go | `77.53% <86.11%> (+4.43%)` ⬆️ |
| pkg/document/json/rich_text.go | `65.21% <100.00%> (+2.99%)` ⬆️ |
| pkg/document/json/root.go | `34.61% <100.00%> (+19.05%)` ⬆️ |
| pkg/document/json/text.go | `52.94% <100.00%> (+4.45%)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c4276a1...151a901.

mojosoeun · 2020-12-18T14:23:00Z

pkg/document/json/rga_tree_split.go

+func (s *RGATreeSplit) purge(node *RGATreeSplitNode) {
+	if node.prev != nil {
+		node.prev.next = node.next
+		if node.next != nil {
+			node.next.prev = node.prev
+		}


@dc7303 I have a question: is the purpose of the purge function to remove a specific node from the linked list?

Thanks for the review! 👍

When text is modified, the node for the deleted text is not removed; it is only marked with a tombstone (the tombstone is removedAt).
As a result, nodes for deleted text persist, which wastes memory.
To fix this, we need to remove the tombstoned nodes after synchronizing with other clients.
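The tombstone idea above can be sketched in a few lines of Go. This is only a minimal illustration, not the code in this PR: `ticket`, `text`, `remove`, and `collect` are hypothetical names, the logical timestamp is reduced to an integer, and `minSynced` is assumed to be a time that every client has already seen.

```go
package main

import (
	"fmt"
	"strings"
)

// ticket is a simplified stand-in for a logical timestamp (hypothetical).
type ticket int

// node is a simplified text node; removedAt == nil means the node is live.
type node struct {
	id        string
	value     string
	removedAt *ticket
}

// text keeps every node plus an index of tombstoned nodes for GC.
type text struct {
	nodes   []*node
	removed map[string]*node
}

func newText(values ...string) *text {
	t := &text{removed: map[string]*node{}}
	for i, v := range values {
		t.nodes = append(t.nodes, &node{id: fmt.Sprintf("n%d", i), value: v})
	}
	return t
}

// remove marks a node with a tombstone instead of deleting it.
func (t *text) remove(id string, at ticket) {
	for _, n := range t.nodes {
		if n.id == id && n.removedAt == nil {
			n.removedAt = &at
			t.removed[id] = n
		}
	}
}

// collect physically drops tombstones removed at or before minSynced
// and returns how many nodes were purged.
func (t *text) collect(minSynced ticket) int {
	kept := make([]*node, 0, len(t.nodes))
	count := 0
	for _, n := range t.nodes {
		if n.removedAt != nil && *n.removedAt <= minSynced {
			delete(t.removed, n.id)
			count++
			continue
		}
		kept = append(kept, n)
	}
	t.nodes = kept
	return count
}

// String concatenates the values of live nodes only.
func (t *text) String() string {
	var sb strings.Builder
	for _, n := range t.nodes {
		if n.removedAt == nil {
			sb.WriteString(n.value)
		}
	}
	return sb.String()
}

func main() {
	doc := newText("Hello", " ", "world")
	doc.remove("n0", 3)
	doc.remove("n2", 7)
	fmt.Printf("visible=%q nodes=%d\n", doc.String(), len(doc.nodes))
	fmt.Println("purged:", doc.collect(5), "nodes left:", len(doc.nodes))
}
```

In this sketch the node removed at time 7 survives `collect(5)`, because some client may not have seen that removal yet.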

To see the effect, run the "garbage collection for large size of text garbage test" with RGATreeSplit.cleanupRemovedNodes modified as shown below. The size of the object then differs before and after the modification.

// rga_tree_split.go
func (s *RGATreeSplit) cleanupRemovedNodes(ticket *time.Ticket) int {
	count := 0
	for _, node := range s.removedNodeMap {
		if node.removedAt != nil && ticket.Compare(node.removedAt) >= 0 {
			s.treeByIndex.Delete(node.indexNode)
			// s.purge(node)
			s.treeByID.Remove(node.id)
			delete(s.removedNodeMap, removedNodeMapKey(node.createdAt().Key()))
			count++
		}
	}
	return count
}

In addition, you can check the state of the tree through the Text.AnnotatedString() method within doc.Update().

t.Run("garbage collection for text test", func(t *testing.T) {
	...
	// Example
	err = doc.Update(func(root *proxy.ObjectProxy) error {
		text := root.GetText("k1")
		fmt.Println(text.AnnotatedString())
		return nil
	}, "edit text k1")
})

// Before commenting out s.purge(node)
// [0:0:00:0 ][2:1:00:0 Hi][1:2:00:5 ][2:2:00:0 j][2:3:00:0 ane]
// After commenting out s.purge(node)
// [2:1:00:0 Hi]{1:2:00:0 Hello}[1:2:00:5 ][2:2:00:0 j][2:3:00:0 ane]{1:3:00:0 m}{1:3:00:1 ario}{1:2:00:6 world}

Thanks to your comments, I found out that the "garbage collection for large size of text garbage test" was wrong, and I have fixed it. Thank you so much! 😄

Fixed the test because it did not output the size of the object properly.

By default, Go's cover tool only measures coverage within the package under test. So I added root_test.go and wrote tests in it to raise the coverage score.

hackerwins

Thank you for your work. 👍

Having checked the algorithm, I think it is okay to delete nodes marked with a tombstone. As editing progresses, the root node begins to split. When code outside RGATreeSplit needs to find the RGATreeSplitNode for a given location (offset), it cannot know exactly how the nodes have been split, so we look up an approximate location with findFloorNode. Even if the node at the given location has been deleted, this is okay because we use the approximate location.
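The floor lookup described here can be illustrated with a small sketch. It assumes node IDs are ordered by creation time and then by offset, and it uses a sorted slice with `sort.Search` in place of the tree that the real code uses; `nodeID` and `findFloor` are hypothetical names, not the PR's API.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeID identifies a split node by the creation time of the original
// insertion and the offset of this piece inside it (simplified).
type nodeID struct {
	createdAt int
	offset    int
}

// less orders IDs by createdAt first, then by offset.
func (a nodeID) less(b nodeID) bool {
	if a.createdAt != b.createdAt {
		return a.createdAt < b.createdAt
	}
	return a.offset < b.offset
}

// findFloor returns the greatest ID in the sorted slice that is <= target.
// The boolean is false when no such ID exists.
func findFloor(ids []nodeID, target nodeID) (nodeID, bool) {
	// i is the index of the first ID strictly greater than target.
	i := sort.Search(len(ids), func(i int) bool { return target.less(ids[i]) })
	if i == 0 {
		return nodeID{}, false
	}
	return ids[i-1], true
}

func main() {
	// One insertion created at time 1, split into pieces at offsets 0, 5, 6.
	ids := []nodeID{{1, 0}, {1, 5}, {1, 6}}
	target := nodeID{1, 8} // a position inside the last piece

	floor, _ := findFloor(ids, target)
	fmt.Println("before purge:", floor)

	// Purge the piece at offset 6; the lookup falls back to a neighbor.
	ids = ids[:2]
	floor, _ = findFloor(ids, target)
	fmt.Println("after purge: ", floor)
}
```

The caller never needs to know the exact split boundaries: a position inside a purged piece resolves to the nearest earlier piece that still exists.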

It would be nice to run detailed tests while editing the actual examples in the JS SDK.

Please merge master and fix the errors caused by the recently introduced linters.

pkg/document/json/rga_tree_split.go

api/converter/converter_test.go

pkg/document/document_test.go

We can use createdAt and offset in Node.id to get a unique value that identifies the node.
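As a rough illustration of why the pair is enough, the sketch below builds a map key from both fields. The names `nodeID` and `key` and the string format are assumptions made for this example only; the real ID type is not shown in this conversation.

```go
package main

import "fmt"

// nodeID is a simplified node identifier. createdAt stands in for the
// key string of the creation ticket (hypothetical format).
type nodeID struct {
	createdAt string
	offset    int
}

// key combines createdAt and offset, so pieces split from the same
// insertion still get distinct keys.
func (id nodeID) key() string {
	return fmt.Sprintf("%s:%d", id.createdAt, id.offset)
}

func main() {
	// Two pieces of one insertion share createdAt but differ in offset.
	hello := nodeID{createdAt: "1:00", offset: 0}
	world := nodeID{createdAt: "1:00", offset: 6}

	removed := map[string]string{}
	removed[hello.key()] = "Hello"
	removed[world.key()] = "world"

	fmt.Println(len(removed), hello.key(), world.key())
}
```

Keying by createdAt alone would make the second piece overwrite the first, which is why the offset is part of the key.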

dc7303 · 2020-12-20T09:50:52Z

> Thank you for your work. 👍
>
> Having checked the algorithm, I think it is okay to delete nodes marked with a tombstone. As editing progresses, the root node begins to split. When code outside RGATreeSplit needs to find the RGATreeSplitNode for a given location (offset), it cannot know exactly how the nodes have been split, so we look up an approximate location with findFloorNode. Even if the node at the given location has been deleted, this is okay because we use the approximate location.
>
> It would be nice to run detailed tests while editing the actual examples in the JS SDK.
>
> Please merge master and fix the errors caused by the recently introduced linters.

Thanks for the nice explanation. I think it would be nice if we could present this well in the RGATreeSplit or GC design documents we will work on later. 😄

Fixed the errors reported by the linter after merging master.

hackerwins

LGTM. 👍

Now, I think we need to implement it in the JS SDK and do various tests using examples.

I left a brief comment.

pkg/document/json/root_test.go

Sort the import groups. Co-authored-by: Youngteac Hong <susukang98@gmail.com>

Connect adjacent insNext and insPrev when running purge.

In #104, GC was also introduced for text nodes marked with tombstones. However, in release(), the link between the adjacent insPrev and insNext was missing, so this commit fixes it.

`insPrev` and `insNext` remember the other nodes that came from the same insertion. For example, when abc is split into a, b, and c, it looks like this: [abc] becomes [a]<->[b]<->[c]. Even if b is completely deleted, the relationship [a]<->[c] should be kept.

Disconnecting `node.prev` and `node.next` was also missing when purging a node from RGATreeList, so this commit adds it.
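The relinking described in this commit message could look roughly like the sketch below. It is a simplified stand-in, not the merged code: the `node` type, `purge`, and `chain` are hypothetical, and each neighbor is checked independently so that the sketch also works for nodes at either end of a chain.

```go
package main

import "fmt"

// node is a simplified split node that sits on two doubly linked chains.
type node struct {
	value            string
	prev, next       *node // order in the text sequence
	insPrev, insNext *node // pieces that came from the same insertion
}

// purge unlinks n from both chains and reconnects its neighbors.
func purge(n *node) {
	if n.prev != nil {
		n.prev.next = n.next
	}
	if n.next != nil {
		n.next.prev = n.prev
	}
	if n.insPrev != nil {
		n.insPrev.insNext = n.insNext
	}
	if n.insNext != nil {
		n.insNext.insPrev = n.insPrev
	}
	// Drop the purged node's own references so it holds nothing alive.
	n.prev, n.next, n.insPrev, n.insNext = nil, nil, nil, nil
}

// chain builds nodes linked on both chains, as if one insertion had
// been split into the given pieces.
func chain(values ...string) []*node {
	nodes := make([]*node, len(values))
	for i, v := range values {
		nodes[i] = &node{value: v}
		if i > 0 {
			p := nodes[i-1]
			p.next, nodes[i].prev = nodes[i], p
			p.insNext, nodes[i].insPrev = nodes[i], p
		}
	}
	return nodes
}

func main() {
	ns := chain("a", "b", "c") // [a]<->[b]<->[c]
	purge(ns[1])               // [a]<->[c]
	fmt.Println(ns[0].next.value, ns[0].insNext.value, ns[2].prev.value, ns[2].insPrev.value)
}
```

After `purge(ns[1])`, both the sequence links and the insertion links of `a` and `c` point at each other, which is the [a]<->[c] relationship the commit message asks to preserve.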

Adds a function to collect garbage of RGATreeSplit that occurs when editing Text and RichText. Garbage collection synchronization test added for text type elements. This is because garbage collection information must be synchronized between different clients. Co-authored-by: Youngteac Hong <susukang98@gmail.com>

dc7303 added 5 commits November 17, 2020 22:37

Add garbage collection for text

bdfea21

Add garbage collection for rich text

ec47b84

Add client test for GC

c3e7277

Change removedNodeMap to removedNodesMap

6d88de7

Fix textSize variable in large size of text garbage test

5feaf71

dc7303 requested a review from hackerwins November 17, 2020 14:45

dc7303 changed the title ~~Feature/garbage collection~~ Add garbage collection for text type Nov 17, 2020

hackerwins self-assigned this Nov 17, 2020

hackerwins requested a review from mojosoeun November 22, 2020 04:26

hackerwins requested changes Nov 22, 2020

pkg/document/json/root.go (outdated, resolved)

api/converter/converter_test.go (outdated, resolved)

pkg/document/json/rga_tree_split.go (outdated, resolved)

dc7303 added 3 commits November 22, 2020 14:36

Change removedNodesMap to removedNodeMap

70afac9

Fix group the package in converter_test

65cc0ab

Fix editedTextElementMapByCreatedAt property

d16bfc2

mojosoeun reviewed Dec 18, 2020

Fix test case to output memory status

18a8ae4

dc7303 assigned dc7303 and mojosoeun Dec 19, 2020

Add garbage collect test for TextElement

5ce761c

hackerwins requested changes Dec 19, 2020

pkg/document/json/rga_tree_split.go (outdated, resolved)

api/converter/converter_test.go (outdated, resolved)

pkg/document/document_test.go (outdated, resolved)

hackerwins removed their assignment Dec 19, 2020

dc7303 added 2 commits December 20, 2020 18:15

Cleanup import code syntax

b52d78d

Remove removedNodeMapKey type

1eefbb3

dc7303 added 2 commits December 20, 2020 21:37

Merge branch 'master' into feature/garbage-collection

648778d

Clean up code with lint error

1cc149a

dc7303 assigned hackerwins Dec 20, 2020

hackerwins approved these changes Dec 21, 2020

pkg/document/json/root_test.go (outdated, resolved)

Cleanup root_test.go

151a901

hackerwins merged commit 76faebc into yorkie-team:master Dec 21, 2020

hackerwins mentioned this pull request Dec 24, 2020

Connect adjacent insNext and insPrev when running purge #123

Merged

2 tasks

mojosoeun mentioned this pull request Feb 9, 2021

Garbage collection for Text and RichText yorkie-team/yorkie-js-sdk#137

Merged

2 tasks
