lightning: support data files with bom header (#40813) #40834
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
/run-build
…813-to-release-6.5
/run-build
/retest-required
/merge
This pull request has been accepted and is ready to merge. Commit hash: de917c3
/retest-required
This is an automated cherry-pick of #40813
What problem does this PR solve?
Issue Number: close #40744
Problem Summary:
What is changed and how it works?
When reading the first block of a CSV / SQL data file, trim the BOM header and advance the processed position by the length of the BOM.
Originally I planned to handle this with a wrapped Reader. However, the processed position in the parser would then not account for the skipped BOM bytes, which leads to incorrect behavior.
For example, when the file has a BOM header, after processing a first line of 10 bytes the parser position should be 10 + 3 = 13, not 10. If it were set to 10, then after processing all the file data the final position would not equal the chunk end offset, which is calculated from the file size. This termination if clause would then never be hit:
tidb/br/pkg/lightning/restore/restore.go
Lines 2722 to 2725 in 4686338
So the BOM is trimmed when reading data blocks instead. This way the processed position in the block parser can be updated at the same time.
Check List
Tests
Side effects
Documentation
Release note