
Document Removal


Document removal is essential for reclaiming resources. Without it, accumulated documents and their associated data keep consuming storage.

Yorkie implements this feature by dividing the document removal process into three steps:

  1. Implement the basic document removal API by setting the RemovedAt date in docInfo.
  2. Decide whether the ListDocuments API, used by the Dashboard and the CLI (Admin), should include removed documents.
  3. Implement a housekeeping background process that physically removes documents (is_removed = true) and their associated data.

There are also some requirements to consider:

  • Users should be able to remove documents through the API to optimize their systems.
  • Document removal should propagate among peers.

This document explains Step 1: Implementing the basic document removal API with soft removal.


This document explains the document removal mechanism, focusing on soft (logical) removal. Hard (physical) removal through housekeeping is out of scope.

Proposal Details

State transition of Document

With the introduction of document removal, a new state, Removed, is added. Below is the new state transition diagram of the document.

 ┌──────────┐ Attach ┌──────────┐ Remove ┌─────────┐
 │ Detached ├───────►│ Attached ├───────►│ Removed │
 └──────────┘        └─┬─┬──────┘        └─────────┘
           ▲           │ │     ▲
           └───────────┘ └─────┘
              Detach     PushPull

This state diagram shows which state transitions are possible, and therefore what a document can do in a given state. For example, a Detached document cannot be removed, so it cannot move directly to the Removed state.

As the diagram shows, an Attached document can now move to the Removed state. Keep in mind that only an Attached document can perform Remove.
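The transitions in the diagram can be sketched as a small state machine. This is only an illustration: the Status type, the Next function, the action strings, and ErrInvalidTransition are hypothetical names chosen for this sketch, not Yorkie's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical stand-in for the document states in the diagram.
type Status int

const (
	Detached Status = iota
	Attached
	Removed
)

// ErrInvalidTransition is a hypothetical error for transitions that the
// diagram does not allow.
var ErrInvalidTransition = errors.New("invalid state transition")

// Next returns the state reached by applying action to s, following the
// diagram: Attach (Detached -> Attached), Detach (Attached -> Detached),
// PushPull (Attached -> Attached), and Remove (Attached -> Removed).
// Any other combination is rejected and the state is left unchanged.
func Next(s Status, action string) (Status, error) {
	switch {
	case s == Detached && action == "Attach":
		return Attached, nil
	case s == Attached && action == "Detach":
		return Detached, nil
	case s == Attached && action == "PushPull":
		return Attached, nil
	case s == Attached && action == "Remove":
		return Removed, nil
	}
	return s, ErrInvalidTransition
}

func main() {
	// An Attached document can be removed.
	s, err := Next(Attached, "Remove")
	fmt.Println(s == Removed, err)

	// A Detached document cannot be removed.
	_, err = Next(Detached, "Remove")
	fmt.Println(errors.Is(err, ErrInvalidTransition))
}
```

In this sketch, Removed has no outgoing transitions, which matches the diagram: once a document is removed, none of the listed actions apply to it.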

How It Works

The overall document removal process looks like this:

  1. The client calls the document removal API (RemoveDocument) with IsRemoved = true in the request ChangePack.
  2. In the PushPull of RemoveDocument (or other APIs), the server checks IsRemoved. If IsRemoved = true, it sets the RemovedAt date in docInfo to logically (softly) remove the document.
  3. The server sets IsRemoved to true in the response ChangePack to inform the client of the removal.
  4. After a certain period of time (DocumentRemoveDuration), the document is physically removed from the database.

Soft removal, which corresponds to Steps 1 to 3 above, is the first stage of document removal.

Step 1: The client calls the document removal API (RemoveDocument) with IsRemoved = true in the request ChangePack


Lines 599 to 638 in dcf4cfb

// Remove removes the given document.
func (c *Client) Remove(ctx context.Context, doc *document.Document) error {
	if c.status != activated {
		return ErrClientNotActivated
	}

	attachment, ok := c.attachments[doc.Key()]
	if !ok {
		return ErrDocumentNotAttached
	}

	pbChangePack, err := converter.ToChangePack(doc.CreateChangePack())
	if err != nil {
		return err
	}
	pbChangePack.IsRemoved = true

	res, err := c.client.RemoveDocument(ctx, &api.RemoveDocumentRequest{
		DocumentId: attachment.docID.String(),
		ChangePack: pbChangePack,
	})
	if err != nil {
		return err
	}

	pack, err := converter.FromChangePack(res.ChangePack)
	if err != nil {
		return err
	}

	if err := doc.ApplyChangePack(pack); err != nil {
		return err
	}
	if doc.Status() == document.StatusRemoved {
		delete(c.attachments, doc.Key())
	}

	return nil
}

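The first step takes place on the client, which validates its own state and the document's state before sending the removal request.

As discussed earlier, only an Attached document can perform Remove and move to the Removed state. The first few lines of the code above enforce this: Remove returns ErrClientNotActivated if the client is not activated, and ErrDocumentNotAttached if the document is not attached to the client.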

Step 2: In the PushPull of RemoveDocument (or other APIs), check IsRemoved. If IsRemoved = true, set the RemovedAt date in docInfo to logically (softly) remove the document.

// PushPull stores the given changes and returns accumulated changes of the
// given document.
// CAUTION(hackerwins, krapie): docInfo's state is constantly mutating as they are
// constantly used as parameters in subsequent subroutines.
func PushPull(
	ctx context.Context,
	be *backend.Backend,
	project *types.Project,
	clientInfo *database.ClientInfo,
	docInfo *database.DocInfo,
	reqPack *change.Pack,
) (*ServerPack, error) {
	start := gotime.Now()
	defer func() {
		// ... (body not shown in this excerpt)
	}()

	// TODO: Changes may be reordered or missing during communication on the network.
	// We should check the change.pack with checkpoint to make sure the changes are in the correct order.
	initialServerSeq := docInfo.ServerSeq

	// 01. push changes: filter out the changes that are already saved in the database.
	cpAfterPush, pushedChanges := pushChanges(ctx, clientInfo, docInfo, reqPack, initialServerSeq)

	// 02. pull pack: pull changes or a snapshot from the database and create a response pack.
	respPack, err := pullPack(ctx, be, clientInfo, docInfo, reqPack, cpAfterPush, initialServerSeq)
	if err != nil {
		return nil, err
	}

	if err := clientInfo.UpdateCheckpoint(docInfo.ID, respPack.Checkpoint); err != nil {
		return nil, err
	}

	// 03. store pushed changes, docInfo and checkpoint of the client to DB.
	if len(pushedChanges) > 0 || reqPack.IsRemoved {
		if err := be.DB.CreateChangeInfos(
			// ... (arguments not shown in this excerpt)
		); err != nil {
			return nil, err
		}
	}

	// ... (rest of the function not shown in this excerpt)
}

The second step is to check the IsRemoved flag in the PushPull() function called by the RemoveDocument API. If IsRemoved is true, PushPull() calls be.DB.CreateChangeInfos() even when there are no pushed changes, so that the document is logically (softly) removed.

// CreateChangeInfos stores the given changes and doc info.
func (c *Client) CreateChangeInfos(
	ctx context.Context,
	projectID types.ID,
	docInfo *database.DocInfo,
	initialServerSeq int64,
	changes []*change.Change,
	isRemoved bool,
) error {
	encodedDocID, err := encodeID(docInfo.ID)
	if err != nil {
		return err
	}

	var models []mongo.WriteModel
	for _, cn := range changes {
		encodedOperations, err := database.EncodeOperations(cn.Operations())
		if err != nil {
			return err
		}

		models = append(models, mongo.NewUpdateOneModel().SetFilter(bson.M{
			"doc_id":     encodedDocID,
			"server_seq": cn.ServerSeq(),
		}).SetUpdate(bson.M{"$set": bson.M{
			"actor_id":   encodeActorID(cn.ID().ActorID()),
			"client_seq": cn.ID().ClientSeq(),
			"lamport":    cn.ID().Lamport(),
			"message":    cn.Message(),
			"operations": encodedOperations,
		}})) // closing line restored; any chained options are not shown in this excerpt
	}

	// TODO(hackerwins): We need to handle the updates for the two collections
	// below atomically.
	if len(changes) > 0 {
		if _, err = c.collection(colChanges).BulkWrite(
			// ... (arguments not shown in this excerpt)
		); err != nil {
			return fmt.Errorf("bulk write changes: %w", err)
		}
	}

	now := gotime.Now()
	updateFields := bson.M{
		"server_seq": docInfo.ServerSeq,
		"updated_at": now,
	}
	if isRemoved {
		updateFields["removed_at"] = now
	}

	res, err := c.collection(colDocuments).UpdateOne(ctx, bson.M{
		"_id":        encodedDocID,
		"server_seq": initialServerSeq,
	}, bson.M{
		"$set": updateFields,
	})
	if err != nil {
		return fmt.Errorf("update document: %w", err)
	}
	if res.MatchedCount == 0 {
		return fmt.Errorf("%s: %w", docInfo.ID, database.ErrConflictOnUpdate)
	}
	if isRemoved {
		docInfo.RemovedAt = now
	}

	return nil
}

In be.DB.CreateChangeInfos(), if the isRemoved flag is true, removed_at is set to the current time in the stored document, and RemovedAt is set to the same time in docInfo. This is all that soft removal does: it records the removal date in RemovedAt. If isRemoved is false, RemovedAt is left unset.

With this mechanism, softly (logically) removed documents can be filtered out by checking whether RemovedAt is set, and later selected for hard (physical) removal.
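The filtering idea above can be sketched with an in-memory store. This is a minimal illustration under stated assumptions: DocInfo here is a simplified stand-in for database.DocInfo, and Store, SoftRemove, and List are hypothetical names that do not come from the Yorkie code base. A zero RemovedAt is assumed to mean "not removed".

```go
package main

import (
	"fmt"
	"time"
)

// DocInfo is a simplified, hypothetical stand-in for database.DocInfo.
// A zero RemovedAt means the document has not been removed.
type DocInfo struct {
	ID        string
	Key       string
	RemovedAt time.Time
}

// IsRemoved reports whether the document has been softly removed.
func (d *DocInfo) IsRemoved() bool {
	return !d.RemovedAt.IsZero()
}

// Store is a hypothetical in-memory replacement for the documents collection.
type Store struct {
	docs []*DocInfo
}

// SoftRemove marks the document with the given ID as removed by setting
// RemovedAt. The record itself stays in the store. It returns false if no
// document with that ID exists.
func (s *Store) SoftRemove(id string) bool {
	for _, d := range s.docs {
		if d.ID == id {
			d.RemovedAt = time.Now()
			return true
		}
	}
	return false
}

// List returns the stored documents, skipping softly removed ones unless
// includeRemoved is true.
func (s *Store) List(includeRemoved bool) []*DocInfo {
	var out []*DocInfo
	for _, d := range s.docs {
		if includeRemoved || !d.IsRemoved() {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	s := &Store{docs: []*DocInfo{{ID: "1", Key: "a"}, {ID: "2", Key: "b"}}}
	s.SoftRemove("1")

	// Number of documents without and with removed ones included.
	fmt.Println(len(s.List(false)), len(s.List(true)))
}
```

Because the record is only marked, a later housekeeping pass could select the same documents with the inverse of this filter.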

Step 3: Set the response ChangePack's IsRemoved to true to inform the client of the document removal.

// RemoveDocument removes the given document.
func (s *yorkieServer) RemoveDocument(
	ctx context.Context,
	req *api.RemoveDocumentRequest,
) (*api.RemoveDocumentResponse, error) {
	actorID, err := time.ActorIDFromBytes(req.ClientId)
	if err != nil {
		return nil, err
	}

	pack, err := converter.FromChangePack(req.ChangePack)
	if err != nil {
		return nil, err
	}

	docID := types.ID(req.DocumentId)
	if err := docID.Validate(); err != nil {
		return nil, err
	}

	if err := auth.VerifyAccess(ctx, s.backend, &types.AccessInfo{
		Method:     types.RemoveDocument,
		Attributes: auth.AccessAttributes(pack),
	}); err != nil {
		return nil, err
	}

	if pack.HasChanges() {
		locker, err := s.backend.Coordinator.NewLocker(
			packs.PushPullKey(projects.From(ctx).ID, pack.DocumentKey),
		)
		if err != nil {
			return nil, err
		}

		if err := locker.Lock(ctx); err != nil {
			return nil, err
		}
		defer func() {
			if err := locker.Unlock(ctx); err != nil {
				// ... (error handling not shown in this excerpt)
			}
		}()
	}

	clientInfo, err := clients.FindClientInfo(
		// ... (arguments not shown in this excerpt)
	)
	if err != nil {
		return nil, err
	}

	docInfo, err := documents.FindDocInfo(
		// ... (arguments not shown in this excerpt)
	)
	if err != nil {
		return nil, err
	}

	if err := clientInfo.RemoveDocument(docInfo.ID); err != nil {
		return nil, err
	}

	pulled, err := packs.PushPull(ctx, s.backend, projects.From(ctx), clientInfo, docInfo, pack)
	if err != nil {
		return nil, err
	}

	pbChangePack, err := pulled.ToPBChangePack()
	if err != nil {
		return nil, err
	}

	return &api.RemoveDocumentResponse{
		ChangePack: pbChangePack,
	}, nil
}

After the soft (logical) removal of the document is done, the server informs the client by setting IsRemoved to true in the response ChangePack.

Once docInfo's RemovedAt date is set, the responses of all other APIs also carry IsRemoved = true. This is because these APIs share PushPull() to perform document operations, and PushPull() sets IsRemoved to true in the response whenever RemovedAt is set in docInfo.

As a result, every other peer currently using the removed document learns about the removal from the IsRemoved flag in its next API response. This is how document removal propagates among peers.
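The propagation described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: Pack, Server, and Peer are hypothetical types, the server holds a single document, and a boolean stands in for "RemovedAt is set in docInfo".

```go
package main

import "fmt"

// Pack is a simplified, hypothetical stand-in for a ChangePack.
type Pack struct {
	IsRemoved bool
}

// Server is a hypothetical in-memory server holding a single document.
type Server struct {
	removed bool // stands in for "RemovedAt is set in docInfo"
}

// PushPull sketches the shared code path: a request carrying IsRemoved
// marks the document as removed, and every response reports the current
// removal state of the document.
func (s *Server) PushPull(req Pack) Pack {
	if req.IsRemoved {
		s.removed = true
	}
	return Pack{IsRemoved: s.removed}
}

// Peer is a hypothetical client that tracks whether its copy of the
// document is removed.
type Peer struct {
	removed bool
}

// Sync sends an ordinary request and applies the response.
func (p *Peer) Sync(s *Server) {
	p.apply(s.PushPull(Pack{}))
}

// Remove sends a removal request and applies the response.
func (p *Peer) Remove(s *Server) {
	p.apply(s.PushPull(Pack{IsRemoved: true}))
}

// apply marks the local copy as removed when the response says so.
func (p *Peer) apply(res Pack) {
	if res.IsRemoved {
		p.removed = true
	}
}

func main() {
	server := &Server{}
	a, b := &Peer{}, &Peer{}

	// Peer a removes the document; peer b has not talked to the server yet.
	a.Remove(server)
	fmt.Println(a.removed, b.removed)

	// Peer b learns about the removal from its next response.
	b.Sync(server)
	fmt.Println(b.removed)
}
```

The point of the sketch is that no separate notification channel is involved: a peer finds out about the removal only when it next calls an API that goes through the shared path.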

New Key - ID Mapping in Document

With the introduction of document removal, the Key - ID mapping of documents has changed.

Before document removal, Key and ID had a one-to-one (1:1) relationship. After document removal, the relationship is one-to-many (1:N): a Key can be reused to create a new document, so several documents can share the same Key while having different IDs.

Because of this, API requests must use the document ID instead of the document Key to identify a document properly.
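A small sketch shows why the Key alone no longer identifies a document. Everything here is hypothetical and simplified: Doc, Registry, FindOrCreate, and Remove are illustrative names, IDs are plain counters, and the rule that a Key is reused only after its previous document is removed is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// Doc is a simplified, hypothetical document record.
type Doc struct {
	ID      string
	Key     string
	Removed bool
}

// Registry is a hypothetical in-memory index of documents.
type Registry struct {
	nextID int
	byID   map[string]*Doc
	byKey  map[string][]*Doc // one Key can map to many IDs
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		byID:  make(map[string]*Doc),
		byKey: make(map[string][]*Doc),
	}
}

// FindOrCreate returns the active (not removed) document for key. If every
// document with that key has been removed, or none exists, it creates a new
// document with a fresh ID under the same key.
func (r *Registry) FindOrCreate(key string) *Doc {
	for _, d := range r.byKey[key] {
		if !d.Removed {
			return d
		}
	}
	r.nextID++
	d := &Doc{ID: strconv.Itoa(r.nextID), Key: key}
	r.byID[d.ID] = d
	r.byKey[key] = append(r.byKey[key], d)
	return d
}

// Remove softly removes the document with the given ID. It returns false
// if the ID is unknown or the document is already removed.
func (r *Registry) Remove(id string) bool {
	d, ok := r.byID[id]
	if !ok || d.Removed {
		return false
	}
	d.Removed = true
	return true
}

func main() {
	r := NewRegistry()

	first := r.FindOrCreate("report")
	r.Remove(first.ID)
	second := r.FindOrCreate("report")

	// Same Key, different IDs: only the ID tells the two documents apart.
	fmt.Println(first.Key == second.Key, first.ID != second.ID)
	fmt.Println(len(r.byKey["report"]))
}
```

In this sketch, Remove takes an ID rather than a Key, mirroring the reasoning above: a request addressed by Key could not say which of the documents sharing that Key it refers to.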