-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added doc for leases and the lease lifecycle. #1218
Conversation
docs/leases-and-lease-lifecycle.md
Outdated
|
||
In KCL, a lease provides a temporal assignment between one Kinesis shard and an assigned worker. | ||
Leases are persistent for the duration of shard processing (detailed later). | ||
However, lease assignment is transient -- leases may be "stolen" by other workers in the same KCL application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lease assignment is transient is a confusing statement, because the assignment has a consistent logic, it is the ownership that is transient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the assignment has a consistent logic
Pedantically, yes and no. "Yes", the programmed logic doesn't change, yet "no" in that X leases divvied up between Y workers will not produce consistent assignments due to intentional shuffling by KCL.
Per offline consensus, will reword to:
However, the worker that is processing a lease may change since leases may be "stolen" by other workers in the same KCL application.
docs/leases-and-lease-lifecycle.md
Outdated
## Lease Table | ||
|
||
To persist metadata about lease state (e.g., last read checkpoint, current assigned worker), KCL creates a lease table in [DynamoDB][dynamodb]. | ||
Each KCL application will have its own distinct lease table that transcludes the application name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great verbage, but "transcludes" I don't know will be accessible. I would suggest something more like includes.
We can keep this but I think most will have to look it up in the dictionary
docs/plantuml/lease-shard-sync.puml
Outdated
end | ||
|
||
PSS->PSS: runShardSync() | ||
opt if not required to sync shards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a leader not be required to sync shards? Is this if there are no shards, and if so can we say that explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some examples:
- KCL is a multi-threaded app, and the stream could be purged at some time before, or during, the
runShardSync()
- the known shard list is complete (i.e., no holes in the hash range)
- the known shard list is incomplete, but the holes don't cross the threshold to trigger a sync
- some future edit that may add, or remove, another condition that affects sync logic
if so can we say that explicitly
The more specific/nuanced documentation is, the more likely it will suffer bitrot with an implementation change. We should balance documentation to convey ideas vs. repacking the implementation as English.
Update pending to provide some additional details, yet not a full enumeration.
docs/leases-and-lease-lifecycle.md
Outdated
![Abridged sequence diagram of the Shard Sync process. | ||
Listed participants are the Scheduler, LeaseCoordinator, PeriodicShardSyncManager, ShardSyncTask, | ||
Lease Table (DDB), LeaseRefresher, LeaseSynchronizer, HierarchicalShardSyncer, and ShardDetector. | ||
](images/lease-shard-sync.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This image is hard to read and small. I don't know if we can increase text size. Also it introduces a lot of new terms that the customer maynot be familiar with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This image is hard to read and small.
Agreed, and this is after creating the abridged version that omits several classes/calls. It's likely a safe assumption that KCL users know how to click-into an image, or zoom-in via their browsers.
it introduces a lot of new terms that the customer maynot be familiar with
Anything specific to call out? Code links are provided immediately after so any reader can dive deeper.
docs/leases-and-lease-lifecycle.md
Outdated
* In general, leases spend the majority of their life in this state. | ||
1. `SHARD_END`: The associated shard is `SHARD_END` and all records have been processed by KCL for the shard. | ||
1. `DELETION`: Since there are no more records to process, KCL will delete the lease from the lease table. | ||
* Deletion will only be triggered after all parents of a child shard have converged to `SHARD_END`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not clear which is getting deleted here from reading the sentence. Maybe we should say a lease is deleted only after the child shard(s) has checkpointed atleast once. So its not just converging to SHARD_END along with other co-parent, but also that the child lease must have started processing and checkpoined atleast once.
docs/leases-and-lease-lifecycle.md
Outdated
* First time starting KCL with an empty lease table. | ||
* Stream mutations (i.e., split, merge) that create child shards. | ||
* In multi-stream mode, dynamic discovery of a new stream. | ||
1. `CREATION`: Leases are created 1:1 for each discovered shard, and initialized at the configured initial position. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only for root leases they are created with configured initial position, all other leases for mutations of shards have TRIM_HORIZON ?
docs/leases-and-lease-lifecycle.md
Outdated
|
||
Lease syncing is a complex responsibility owned by the "leader" host in a KCL application. | ||
By invoking the [ListShards API][list-shards], KCL will identify the shards for the configured stream(s). | ||
This process is scheduled at a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a section on this below, but I would recommend to add it here so it doesnt seem like ListShards happens periodically but only when the auditor determines leases are missing, othterwise childshard leases are created when parent reaches SHARD_END, or something along those lines ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid readers incorrectly assuming that ListShards
is invoked every interval, rewording (per offline consensus):
This process is scheduled at a configurable interval so KCL can self-identify new shards introduced via stream mutations.
... to:
This process is scheduled at a configurable interval so KCL can decide whether it should query for potentially-new shards.
docs/leases-and-lease-lifecycle.md
Outdated
1. `LeaseRefresher` invokes a DDB `scan` against the lease table which has a cost proportional to the number of leases. | ||
1. Frequent balancing may cause high lease turn-over which incurs DDB `write` costs, and potentially redundant work for stolen leases. | ||
1. High `maxLeasesToStealAtOneTime` may cause churn. | ||
* For example, worker `B` steals multiple leases from worker `A` creating a numerical imbalance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always only steals to balance, but wont steal max if max is not needed to balance ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. If applicable, we can reintroduce this language later w/ a better example.
+ minor code cleanup
private final ShardInfo shardInfo; | ||
private final ShardDetector shardDetector; | ||
|
||
StreamIdentifier streamIdentifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes intended ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Per Lombok @Value:
all fields are made private and final by default, and setters are not generated
Explicit private final
are unnecessary/redundant, and get flagged by IntelliJ as such.
+ decomposed shard sync UML into two separate diagrams (initialization, loop)
Issue #, if available:
N/A
Description of changes:
Added doc for leases and the lease lifecycle.
Rendered @ https://github.com/stair-aws/amazon-kinesis-client/blob/lease-doc/docs/lease-lifecycle.md
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
edit: fixed rendered link since the file was renamed.