
rethink about log compaction #7162

Closed
xiang90 opened this issue Jan 15, 2017 · 4 comments
Comments

@xiang90
Contributor

xiang90 commented Jan 15, 2017

Right now we compact the raft log every 100,000 entries.

So we keep at most 100,000 entries in memory.

Keeping more entries in memory is good for fast follower recovery. If a follower dies and restarts while lagging the leader by fewer than 100,000 entries, the leader can catch it up by sending entries, without triggering a snapshot send. Sending a snapshot is usually more expensive than sending entries.

However, a fixed count of 100,000 can be dangerous and cause OOM. We assume each entry is around 1KB, so 100,000 entries would only be about 100MB. But the maximum entry size is 1MB; in that case, 100,000 entries cost 100GB.

I propose that we also take entry size into consideration when deciding when to compact.
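A minimal sketch of what a size-aware trigger could look like. The names and thresholds below are made up for illustration and are not etcd's actual snapshot/compaction configuration:

```go
package main

import "fmt"

// Illustrative thresholds only; not etcd's actual configuration knobs.
const (
	maxInMemoryEntries = 100_000           // current fixed count trigger
	maxInMemoryBytes   = 100 * 1024 * 1024 // proposed additional ~100MB size cap
)

// shouldCompact is a hypothetical trigger that fires when either the entry
// count or the total entry size exceeds its limit, so a run of large
// (up to 1MB) entries compacts long before 100,000 of them accumulate.
func shouldCompact(entryCount, totalBytes int) bool {
	return entryCount >= maxInMemoryEntries || totalBytes >= maxInMemoryBytes
}

func main() {
	// 200 entries of 1MB each already exceed the 100MB size cap.
	fmt.Println(shouldCompact(200, 200*1024*1024)) // true
	// 50,000 entries of ~1KB each stay under both limits.
	fmt.Println(shouldCompact(50_000, 50*1024*1024)) // false
}
```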

@mitake
Contributor

mitake commented Jan 16, 2017

This seems interesting and important. logcabin configures its trigger based on both the size and the number of entries not yet covered by a snapshot (although a snapshot is taken only when all of the conditions are satisfied): https://github.com/logcabin/logcabin/blob/master/Server/StateMachine.cc#L593

I also think that even when the number of entries is small, replaying them on a revived follower can take a long time if they contain many puts. Because of the parallelism-unfriendly (nondeterminism-unfriendly) nature of state machine replication, even replay cannot exploit multiple cores. Maybe

  • more accurate estimation of replay cost based on various parameters (e.g. size, the number of puts in a Txn, etc.; a rough sketch follows below)
  • parallelizing replay by analyzing the dependency relations between entries (this won't be easy at all)

would be helpful for the stable operation of an etcd cluster and increase its availability?
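A rough, purely illustrative sketch of the first idea, a parameterized replay-cost estimate. The entry shape and the weights are assumptions, not anything logcabin or etcd actually implements:

```go
package main

import "fmt"

// raftEntry is a simplified stand-in for a log entry; the fields and the
// cost weights below are assumptions made for illustration only.
type raftEntry struct {
	sizeBytes int
	numPuts   int // number of put operations carried in the entry's Txn
}

const (
	costPerByte = 1    // relative cost of decoding/applying one byte
	costPerPut  = 4096 // relative cost of one put against the backend
)

// estimateReplayCost weights each entry by its size and by the number of
// puts it carries, so a compaction (snapshot) trigger could fire once the
// accumulated replay cost, rather than the raw entry count, gets too high.
func estimateReplayCost(entries []raftEntry) int {
	cost := 0
	for _, e := range entries {
		cost += e.sizeBytes*costPerByte + e.numPuts*costPerPut
	}
	return cost
}

func main() {
	small := []raftEntry{{sizeBytes: 1024, numPuts: 1}}
	bigTxn := []raftEntry{{sizeBytes: 1 << 20, numPuts: 128}}
	fmt.Println(estimateReplayCost(small))  // cheap to replay
	fmt.Println(estimateReplayCost(bigTxn)) // far more expensive, despite being a single entry
}
```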

@heyitsanthony heyitsanthony added this to the unplanned milestone Jan 17, 2017
@xiang90 xiang90 self-assigned this Jan 31, 2017
@xiang90
Contributor Author

xiang90 commented Jan 31, 2017

@mitake I assigned this to both you and me. I assume you are interested in this one :)

@mitake
Contributor

mitake commented Feb 1, 2017

@xiang90 sure, of course. Thanks!

@stale

stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

Development

No branches or pull requests

4 participants