Commit
33647: changefeed: add experimental support for cloud storage sinks r=mrtracy a=danhhz

The data files are named `<timestamp>_<topic>_<schema_id>_<uniquer>.<ext>`.

`<timestamp>` is truncated to some bucket size, specified by the required sink param `bucket_size`. Bucket size is a tradeoff between the number of files and the end-to-end latency of data being resolved.

`<topic>` corresponds to one SQL table.

`<schema_id>` changes whenever the SQL table schema changes, which allows us to guarantee to users that _all entries in a given file have the same schema_.

`<uniquer>` is used to keep nodes in a cluster from overwriting each other's data and should be ignored by external users.

`<ext>` implies the format of the file: currently the only option is `ndjson`, which means a text file conforming to the "Newline Delimited JSON" spec.

Each record in the data files is a value; keys are not included, so the `envelope` option must be set to `row`, which is the default. Within a file, records are not guaranteed to be sorted by timestamp. A duplicate of some record might exist in a different file or even in the same file.

The resolved timestamp files are named `<timestamp>.RESOLVED`. This is carefully done so that we can offer the following external guarantee: at any given time, if the files are iterated in lexicographic filename order, then encountering any filename containing `RESOLVED` means that everything before it is finalized (and thus can be ingested into some other system and deleted, included in Hive queries, etc.). A typical user of cloudStorageSink would periodically do exactly this.

Still TODO: writing out data schemas, Avro support, and bounding memory usage. Eliminating duplicates would be great, but may not be immediately practical.

Partially completes #28675

Release note (enterprise change): `CHANGEFEED`s now experimentally support writing to cloud storage, for easy use with analytics databases.

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Showing 10 changed files with 764 additions and 132 deletions.