Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📚 CDK docs: made ordering requirement more clear for stream_slices #5627

Merged
merged 1 commit into from
Aug 25, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/connector-development/cdk-python/stream-slices.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@ When a stream is being read incrementally, Slices can be used to control when st

When slicing is enabled, a state message will be output by the connector after reading every slice. Slicing is completely optional and is provided as a way for connectors to checkpoint state in a more granular way than basic interval-based state checkpointing. Slicing is typically used when reading a large amount of data or when the underlying data source imposes strict rate limits that make it difficult to re-read the same data over and over again. This being said, interval-based checkpointing is compatible with slicing with one difference: intervals are counted within a slice rather than across all records. In other words, the counter used to determine if the interval has been reached \(e.g: every 10k records\) resets at the beginning of every slice.

The relationship between records in a slice is up to the developer, but slices are typically used to implement date-based checkpointing, for example to group records generated within a particular hour, day, or month etc.
The relationship between records in a slice is up to the developer, but the list of slices must be yielded in ascending order, using the cursor field as context for the ordering. This is to ensure that the state can't be updated to a timestamp that is ahead of other slices yet to be processed. Slices are typically used to implement date-based checkpointing, for example to group records generated within a particular hour, day, or month etc.

Slices can be hard-coded or generated dynamically \(e.g: by making a query\).

The only restriction imposed on slices is that they must be described with a list of `dict`s returned from the `Stream.stream_slices()` method, where each `dict` describes a slice. The `dict`s may have any schema, and are passed as input to each stream's `read_stream` method. This way, the connector can read the current slice description \(the input `dict`\) and use that to make queries as needed.
An important restriction imposed on slices is that they must be described with a list of `dict`s returned from the `Stream.stream_slices()` method, where each `dict` describes a slice. The `dict`s may have any schema, and are passed as input to each stream's `read_stream` method. This way, the connector can read the current slice description \(the input `dict`\) and use that to make queries as needed. As described above, this list of dicts must be in appropriate ascending order based on the cursor field.

### Use cases

Expand Down