cdc: Metamorphic roachtests #111066
Labels
A-cdc: Change Data Capture
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-cdc

Implement metamorphic roachtests to increase feature coverage confidence.
Some of the ideas are:

Jira issue: CRDB-31748

Comments

cc @cockroachdb/cdc
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Nov 16, 2023
Prior to this commit, roachtest/cdc relied solely on periodic checks of changefeed status and latency. This patch takes the first step toward introducing a metamorphic testing framework. Since there is not yet a way to evaluate output file correctness directly, the new approach runs two changefeeds with different configurations, retrieves their output files from the roachtest, and compares their data outputs. Because the changefeed output may contain duplicates, the test follows these steps:

1. Create two empty tables with the same schema as the workload tables.
2. Convert the parquet data to datums.
3. Execute `UPSERT` statements on the tables with the datums to eliminate duplicates.
4. Confirm that the two tables have identical content by checking their fingerprints.

Limitations of this approach include:

- It only works for parquet files for now. (A round-trip conversion is guaranteed between the parquet data format and datums; other data formats are more complicated.)
- INSERT is the only operation involved.
- Due to the large file size, the test randomly selects one target table for the changefeeds.
- Currently, the two changefeeds use the same configuration. We plan to change this soon, following a discussion to determine the specific configurations that will be randomized.

Part of: cockroachdb#111066

Release note: None
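Purely for illustration (this is not the patch's actual code), here is a minimal sketch of the de-duplication step described above. The `decodeParquetRows` helper, the `scratch_a`/`scratch_b` table names, the two-column `(id, payload)` schema, the file paths, and the connection string are all hypothetical stand-ins; the technique of interest is that `UPSERT`, being keyed on the primary key, makes re-applied duplicate changefeed events converge to a single row per key.

```go
// Hypothetical sketch: dedup changefeed parquet output into a scratch table via UPSERT.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

// row is one event decoded from a changefeed parquet output file.
type row struct {
	ID      int64
	Payload string
}

// decodeParquetRows stands in for the parquet-to-datum conversion, which is
// out of scope here; it returns hard-coded rows, including a duplicate.
func decodeParquetRows(path string) ([]row, error) {
	_ = path
	return []row{{ID: 1, Payload: "a"}, {ID: 1, Payload: "a"}, {ID: 2, Payload: "b"}}, nil
}

// loadIntoScratchTable replays decoded events into table. Because UPSERT is
// keyed on the primary key, duplicate events converge to one row per key.
func loadIntoScratchTable(ctx context.Context, db *sql.DB, table string, files []string) error {
	stmt := fmt.Sprintf(`UPSERT INTO %s (id, payload) VALUES ($1, $2)`, table)
	for _, f := range files {
		rows, err := decodeParquetRows(f)
		if err != nil {
			return err
		}
		for _, r := range rows {
			if _, err := db.ExecContext(ctx, stmt, r.ID, r.Payload); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	ctx := context.Background()
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Two empty scratch tables with the same (illustrative) schema as the workload table.
	for _, tbl := range []string{"scratch_a", "scratch_b"} {
		if _, err := db.ExecContext(ctx, fmt.Sprintf(
			`CREATE TABLE IF NOT EXISTS %s (id INT PRIMARY KEY, payload STRING)`, tbl)); err != nil {
			log.Fatal(err)
		}
	}
	// One (made-up) output file per changefeed; the real test walks the output directories.
	if err := loadIntoScratchTable(ctx, db, "scratch_a", []string{"feed_a/part-0.parquet"}); err != nil {
		log.Fatal(err)
	}
	if err := loadIntoScratchTable(ctx, db, "scratch_b", []string{"feed_b/part-0.parquet"}); err != nil {
		log.Fatal(err)
	}
}
```

In the real test the schema comes from the workload tables and the rows come from the changefeeds' parquet output; only the UPSERT-based convergence is the point of this sketch.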
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Feb 15, 2024
craig bot pushed a commit that referenced this issue on Feb 16, 2024
114504: roachtest/tests: introduce metamorphic testing to cdc r=jayshrivastava a=wenyihu6

Prior to this commit, roachtest/cdc relied solely on periodic checks of changefeed status and latency. This patch takes the first step toward introducing a metamorphic testing framework. Since there is not yet a way to evaluate output file correctness directly, the new approach runs two changefeeds with different configurations, retrieves their output files from the roachtest, and compares their data outputs. Because the changefeed output may contain duplicates, the test follows these steps:

1. Create two empty tables with the same schema as the workload tables.
2. Convert the parquet data to datums.
3. Execute `UPSERT` statements on the tables with the datums to eliminate duplicates.
4. Confirm that the two tables have identical content by checking their fingerprints.

Limitations of this approach include:

- It only works for parquet files for now. (A round-trip conversion is guaranteed between the parquet data format and datums; other data formats are more complicated.)
- INSERT is the only operation involved.
- Due to the large file size, the test randomly selects one target table for the changefeeds.
- Currently, the two changefeeds use the same configuration. We plan to change this soon, following a discussion to determine the specific configurations that will be randomized.

Part of: #111066

Release note: None

Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>
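Again purely illustrative rather than the merged test code, here is a minimal sketch of the final verification step (step 4 above). It assumes the hypothetical `scratch_a`/`scratch_b` tables and connection string from the earlier sketch, and it uses CockroachDB's `SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE` statement, assumed here to return one `(index_name, fingerprint)` row per index.

```go
// Hypothetical sketch: compare per-index fingerprints of the two scratch tables.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

// tableFingerprints collects index name -> fingerprint for one table using
// SHOW EXPERIMENTAL_FINGERPRINTS; the (index_name, fingerprint) column shape
// is an assumption of this sketch.
func tableFingerprints(ctx context.Context, db *sql.DB, table string) (map[string]sql.NullInt64, error) {
	rows, err := db.QueryContext(ctx,
		fmt.Sprintf(`SHOW EXPERIMENTAL_FINGERPRINTS FROM TABLE %s`, table))
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	fps := make(map[string]sql.NullInt64)
	for rows.Next() {
		var index string
		var fp sql.NullInt64
		if err := rows.Scan(&index, &fp); err != nil {
			return nil, err
		}
		fps[index] = fp
	}
	return fps, rows.Err()
}

func main() {
	ctx := context.Background()
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	a, err := tableFingerprints(ctx, db, "scratch_a")
	if err != nil {
		log.Fatal(err)
	}
	b, err := tableFingerprints(ctx, db, "scratch_b")
	if err != nil {
		log.Fatal(err)
	}

	// The two changefeeds agree iff every index fingerprint matches.
	if len(a) != len(b) {
		log.Fatalf("index count mismatch: %d vs %d", len(a), len(b))
	}
	for idx, fp := range a {
		if b[idx] != fp {
			log.Fatalf("fingerprint mismatch on index %q: %v vs %v", idx, fp, b[idx])
		}
	}
	fmt.Println("fingerprints match")
}
```

Comparing fingerprints rather than diffing rows keeps the final check cheap even when the scratch tables are large.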
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue on Feb 21, 2024