Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add garbage collection for text type #104

Merged
merged 15 commits into from
Dec 21, 2020

Conversation

dc7303
Copy link
Member

@dc7303 dc7303 commented Nov 17, 2020

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind feature

What this PR does / why we need it:
There may be RGASplitNode removed when the text type is modified. And this can accumulate and result in wasted memory.
So I implemented a garbage collector for text types.

I will share this PR and continue to analyze and improve it.
Please give us honest feedback.

Which issue(s) this PR fixes:

Fixes #58

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Adds a function to collect garbage of RGATreeSplit
that occurs when editing text.
Adds a function to collect garbage of RGATreeSplit
that occurs when editing rich text.
Garbage collection synchronization test added for text type elements.
This is because garbage collection information
must be synchronized between different clients.
@dc7303 dc7303 requested a review from hackerwins November 17, 2020 14:45
@dc7303 dc7303 changed the title Feature/garbage collection Add garbage collection for text type Nov 17, 2020
@hackerwins
Copy link
Member

@dc7303 Thanks for contributing. I'll do a code review soon.

@hackerwins hackerwins self-assigned this Nov 17, 2020
@hackerwins hackerwins requested a review from mojosoeun November 22, 2020 04:26
@hackerwins
Copy link
Member

@mojosoeun recently implemented the GC feature in the JS SDK, so if possible, it would be nice to review this PR together.

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sending PR. 👍

I left a short review before looking at the purge algorithm.

pkg/document/json/root.go Outdated Show resolved Hide resolved
api/converter/converter_test.go Outdated Show resolved Hide resolved
pkg/document/json/rga_tree_split.go Outdated Show resolved Hide resolved
We hope to group the package into 3 parts.
Standard library, 3rd party library, internal package.
Change editedTextElementMapByCreatedAt to
removedNodeTextElementMapByCreatedAt.
This is to cache only when deleted nodes exist.
For this, this feature compares the offset of RGATreeSpltPos.
@codecov
Copy link

codecov bot commented Dec 6, 2020

Codecov Report

Merging #104 (151a901) into master (c4276a1) will increase coverage by 1.08%.
The diff coverage is 89.79%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #104      +/-   ##
==========================================
+ Coverage   56.37%   57.46%   +1.08%     
==========================================
  Files          27       27              
  Lines        2519     2560      +41     
==========================================
+ Hits         1420     1471      +51     
+ Misses       1007      995      -12     
- Partials       92       94       +2     
Impacted Files Coverage Δ
pkg/document/json/rga_tree_split.go 77.53% <86.11%> (+4.43%) ⬆️
pkg/document/json/rich_text.go 65.21% <100.00%> (+2.99%) ⬆️
pkg/document/json/root.go 34.61% <100.00%> (+19.05%) ⬆️
pkg/document/json/text.go 52.94% <100.00%> (+4.45%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4276a1...151a901. Read the comment docs.

Comment on lines +551 to +556
func (s *RGATreeSplit) purge(node *RGATreeSplitNode) {
if node.prev != nil {
node.prev.next = node.next
if node.next != nil {
node.next.prev = node.prev
}
Copy link
Member

@mojosoeun mojosoeun Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dc7303 I have a question. is the purpose of the purge function to remove a specific node from the LinkedList?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! 👍

When the text is modified, it does not remove the node for the deleted text, just tombstone marking. (Tombstone is removedAt)
This means that there are persistent nodes for deleted text, which can be a waste of memory.
We need to remove the tombstone marked nodes after synchronizing with other clients to fix this problem.

When executing garbage collection for large size of text garbage test, try to modify RGATreeSplit.cleanupRemovedNodes as shown below. Then you can see that the size of the object before and after the modification is different.

// rga_tree_split.go
func (s *RGATreeSplit) cleanupRemovedNodes(ticket *time.Ticket) int {
	count := 0
	for _, node := range s.removedNodeMap {
		if node.removedAt != nil && ticket.Compare(node.removedAt) >= 0 {
			s.treeByIndex.Delete(node.indexNode)
			// s.purge(node)
			s.treeByID.Remove(node.id)
			delete(s.removedNodeMap, removedNodeMapKey(node.createdAt().Key()))
			count++
		}
	}

	return count
}

In addition, you can check the state of the tree through the Text.AnnotatedString() method within doc.Update().

	t.Run("garbage collection for text test", func(t *testing.T) {
		
		...

		// Example
		err = doc.Update(func(root *proxy.ObjectProxy) error {
			text := root.GetText("k1")
			fmt.Println(text.AnnotatedString())
			return nil
		}, "edit text k1")
	}

// Before s.purge(node) comment processing
//     [0:0:00:0 ][2:1:00:0 Hi][1:2:00:5  ][2:2:00:0 j][2:3:00:0 ane]
// After s.purge(node) comment processing
//     [2:1:00:0 Hi]{1:2:00:0 Hello}[1:2:00:5  ][2:2:00:0 j][2:3:00:0 ane]{1:3:00:0 m}{1:3:00:1 ario}{1:2:00:6 world}

I found out that the garbage collection for large size of text garbage test test was wrong thanks to your comments. Thanks to you, I modified this test. Thank you so much! 😄

Corrected because the size of the object could not be output properly.
The coverage measurement method of 'Go cover' is basically only measured
within the package being tested. So I added 'root_test.go' and wrote test
code in it to increase the coverage score.
Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work. 👍

When I check the algorithm, I think it is okay to delete nodes marked with a tombstone. As the editing progresses, the root node begins to split. When we need to find for RGATreeSplitNode for the given location(offset) outside the RGATreeSplit, it is not possible to know exactly how nodes are splits, so we use the approximate location by findFloorNode. Even if the node the given location is deleted, it will ok because we use the approximate location.

It would be nice to check detailed tests while editing the actual examples in the JS SDK.

Please merge master and fix the errors caused by the recently introduced linters.

pkg/document/json/rga_tree_split.go Outdated Show resolved Hide resolved
api/converter/converter_test.go Outdated Show resolved Hide resolved
pkg/document/document_test.go Outdated Show resolved Hide resolved
@hackerwins hackerwins removed their assignment Dec 19, 2020
We can use the ID createdAt and offset in Node.id to get the unique
value that can identify the node.
@dc7303
Copy link
Member Author

dc7303 commented Dec 20, 2020

Thank you for your work. 👍

When I check the algorithm, I think it is okay to delete nodes marked with a tombstone. As the editing progresses, the root node begins to split. When we need to find for RGATreeSplitNode for the given location(offset) outside the RGATreeSplit, it is not possible to know exactly how nodes are splits, so we use the approximate location by findFloorNode. Even if the node the given location is deleted, it will ok because we use the approximate location.

It would be nice to check detailed tests while editing the actual examples in the JS SDK.

Please merge master and fix the errors caused by the recently introduced linters.

Thanks for the nice explanation. I think it would be nice if we could present this well in the RGATreeSplit or GC design documents we will work on later. 😄

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

Now, I think we need to implement it in the JS SDK and do various tests using examples.

I left a brief comment.

pkg/document/json/root_test.go Outdated Show resolved Hide resolved
Sort the import groups.

Co-authored-by: Youngteac Hong <susukang98@gmail.com>
@hackerwins hackerwins merged commit 76faebc into yorkie-team:master Dec 21, 2020
hackerwins added a commit that referenced this pull request Dec 24, 2020
Connect adjacent insNext and insPrev when running purge.

In #104, GC was also introduced for text nodes marked with tombstones. 
However, in release(), the link between adjacent insPrev and insNext is 
missing, so this commit fixes it.

`insPrev` and `insNext` are used to remember other nodes connected to 
the insert. For example, when abc is divided into a, b, c, it looks like
 this: [abc] divided to [a]<->[b]<->[c]. Even if we completely delete b,
 I think it should keep the relationship of [a] <-> [c].

And disconnecting `node.prev` and `node.next` is missing when purging a 
node from RGATreeList. So this commit added it.
jeonjonghyeok pushed a commit to jeonjonghyeok/yorkie that referenced this pull request Aug 4, 2022
Adds a function to collect garbage of RGATreeSplit that occurs when
editing Text and RichText.

Garbage collection synchronization test added for text type elements.
This is because garbage collection information must be synchronized 
between different clients.

Co-authored-by: Youngteac Hong <susukang98@gmail.com>
jeonjonghyeok pushed a commit to jeonjonghyeok/yorkie that referenced this pull request Aug 4, 2022
)

Connect adjacent insNext and insPrev when running purge.

In yorkie-team#104, GC was also introduced for text nodes marked with tombstones. 
However, in release(), the link between adjacent insPrev and insNext is 
missing, so this commit fixes it.

`insPrev` and `insNext` are used to remember other nodes connected to 
the insert. For example, when abc is divided into a, b, c, it looks like
 this: [abc] divided to [a]<->[b]<->[c]. Even if we completely delete b,
 I think it should keep the relationship of [a] <-> [c].

And disconnecting `node.prev` and `node.next` is missing when purging a 
node from RGATreeList. So this commit added it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage collection for Text and RichText
3 participants