[SPARK-48850][DOCS][SS][SQL] Add documentation for new options added to State Data Source #47274

Closed
wants to merge 4 commits into from

Contributor

@eason-yuchen-liu eason-yuchen-liu commented Jul 9, 2024

What changes were proposed in this pull request?

In #46944 and #47188, we introduced some new options to the State Data Source. This PR aims to explain these new features in the documentation.

Why are the changes needed?

The documentation website needs to reflect the latest changes.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The API Doc website can be rendered correctly.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the DOCS label Jul 9, 2024
@eason-yuchen-liu eason-yuchen-liu marked this pull request as ready for review July 9, 2024 22:14
@eason-yuchen-liu eason-yuchen-liu marked this pull request as draft July 10, 2024 17:45
@eason-yuchen-liu eason-yuchen-liu changed the title from "[SPARK-48850][DOCS][SS][SQL] Add documentation for snapshot related options in State Data Source" to "[SPARK-48850][DOCS][SS][SQL] Add documentation for new options added to State Data Source" Jul 10, 2024
@eason-yuchen-liu eason-yuchen-liu marked this pull request as ready for review July 10, 2024 22:23
Contributor

@HeartSaVioR HeartSaVioR left a comment

Would you mind building the doc and capturing a snapshot of the change? It would be much easier to review the change together with the markdown/code. Thanks!

<td>readChangeFeed</td>
<td>boolean</td>
<td>false</td>
<td>If set to true, will read the change of state over microbatches. The output table schema will also change. Two columns 'batch_id'(long) and 'change_type'(string) will be appended to the front. Option 'changeStartBatchId' must be specified with this option. Option 'batchId', 'joinSide', 'snapshotStartBatchId', 'snapshotPartitionId' is conflict with this option. An example usage of this option can be found below.</td>
Contributor

The output table schema will also change. Two columns 'batch_id'(long) and 'change_type'(string) will be appended to the front.

We could simply defer to the next section to keep the explanation concise.

is conflict with this option

cannot be used with this option (probably clearer)

An example usage of this option can be found below.

It would probably be better to mention the section name explicitly; a link would be even better.
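
For readers following the thread, here is a minimal sketch of how the option under review might be used from PySpark. It encodes only the rules stated in the quoted description: 'changeStartBatchId' is required, and 'batchId', 'joinSide', 'snapshotStartBatchId', and 'snapshotPartitionId' cannot be combined with 'readChangeFeed'. The helper names `change_feed_options` and `read_change_feed` are hypothetical, and the `statestore` format name and the pass-through of additional options are assumptions that are not stated in this conversation.

```python
from typing import Any, Dict

# Options that, per the description quoted above, cannot be used with readChangeFeed.
CONFLICTING_OPTIONS = ("batchId", "joinSide", "snapshotStartBatchId", "snapshotPartitionId")


def change_feed_options(change_start_batch_id: int, **extra: Any) -> Dict[str, str]:
    """Build reader options for a change-feed read (hypothetical helper).

    changeStartBatchId is a required argument because the description says it
    must be specified together with readChangeFeed.
    """
    conflicts = sorted(name for name in extra if name in CONFLICTING_OPTIONS)
    if conflicts:
        raise ValueError("cannot be used with readChangeFeed: " + ", ".join(conflicts))
    options = {
        "readChangeFeed": "true",
        "changeStartBatchId": str(change_start_batch_id),
    }
    # Any other options are passed through unchanged (assumption: the source accepts them).
    options.update({name: str(value) for name, value in extra.items()})
    return options


def read_change_feed(spark, checkpoint_location: str, change_start_batch_id: int, **extra: Any):
    """Load the state change feed as a DataFrame.

    Assumes `spark` is an active SparkSession and that the State Data Source
    is registered under the format name "statestore".
    """
    return (
        spark.read.format("statestore")
        .options(**change_feed_options(change_start_batch_id, **extra))
        .load(checkpoint_location)
    )
```

Calling `read_change_feed(spark, "<checkpointLocation>", 2)` would then request changes starting at batch 2, while passing `batchId=3` fails early on the client side instead of waiting for the data source to reject the combination.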

@eason-yuchen-liu
Contributor Author

Screenshot 2024-07-11 at 11 43 28 AM
Screenshot 2024-07-11 at 11 43 42 AM

@eason-yuchen-liu
Contributor Author

CI failure is not relevant.

@HeartSaVioR
Contributor

https://github.com/eason-yuchen-liu/spark/runs/27338180115

Looks like it only failed on the protobuf breaking-change check. Not sure why; it probably tried to find the proto file in the changeset. I'll ignore it anyway.

@HeartSaVioR
Contributor

Thanks! Merging to master.

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…to State Data Source

### What changes were proposed in this pull request?

In apache#46944 and apache#47188, we introduced some new options to the State Data Source. This PR aims to explain these new features in the documentation.

### Why are the changes needed?

It is necessary to reflect the latest change in the documentation website.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The API Doc website can be rendered correctly.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47274 from eason-yuchen-liu/snapshot-doc.

Authored-by: Yuchen Liu <yuchen.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…to State Data Source