BulkCopy not raising exception on big datasets #475
Comments
Hi. Also, if you are able to reproduce it - does the issue reproduce on previous versions, e.g. 6.8.1?
I have tried to reproduce the issue in a test and it doesn't seem to appear with a synthetic test - a serialization exception is always correctly propagated. Can you provide more context please?
I will try to build a demo for you.
Please try version 7.5.0 to see if it fixes the issue for you
Nope, still not working correctly; here is a demo to replicate this problem. Let me know - maybe I am doing something wrong.
You can see it by setting a breakpoint at the log line and looking at what data got inserted into ClickHouse. In my case there is some data inserted. I think you can close this issue as not able to replicate; I have no clue what setup causes this situation on my machine.
I think there may just be a misunderstanding here, which means I'll need to make the docs clearer.
ClickHouse is not transactional (other than experimental features), and hence the BulkCopy adapter isn't either. If you have a large set of data and serialization (or insertion) fails for one of the batches, it is expected that some of the batches may have made it through already, and it falls on the higher-level code to handle the situation. I.e. in this particular case, 20k rows get into the database before the exception - and that's by design. If you need some semblance of atomicity, the data needs to go as one 'batch' - so you can try setting the batch size accordingly.
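For context, a minimal sketch of what that looks like in calling code, assuming the ClickHouseBulkCopy shape of recent 7.x releases (DestinationTableName, BatchSize, InitAsync, WriteToServerAsync). The table name and the rowCount parameter are placeholders, and exact signatures may differ between versions:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using ClickHouse.Client.ADO;
using ClickHouse.Client.Copy;

public static class BulkCopyExample
{
    // rows: the data to insert, one object[] per row; rowCount: total number of rows.
    public static async Task CopyAsync(string connectionString, IEnumerable<object[]> rows, int rowCount)
    {
        using var connection = new ClickHouseConnection(connectionString);
        await connection.OpenAsync();

        var bulkCopy = new ClickHouseBulkCopy(connection)
        {
            DestinationTableName = "my_db.my_table", // placeholder
            // Make the whole dataset a single batch so that a serialization or
            // insert error cannot leave a partially written table behind.
            BatchSize = rowCount
        };

        await bulkCopy.InitAsync(); // loads destination column metadata (7.x)

        try
        {
            await bulkCopy.WriteToServerAsync(rows);
        }
        catch (Exception ex)
        {
            // With smaller batches, earlier batches may already be in the table
            // at this point - this is where higher-level code has to reconcile
            // (e.g. write to a staging table and swap only on success).
            Console.Error.WriteLine($"Bulk copy failed: {ex.Message}");
            throw;
        }
    }
}
```

Setting BatchSize to cover the whole dataset trades memory for the all-or-nothing behaviour described above; with smaller batches, the catch block is where higher-level reconciliation would go.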
I was copying data from MSSQL to ClickHouse. My table was 2M rows. Each time I ran the copy I was missing some rows. I was not getting any exception during the copy - everything was running nice and smooth (batch size 10k) - but data was missing. I reduced the data to 1k rows, executed it, and this is what I got:
Which explained to me what the problem was. Great, but it took me a long time to realize that I had a wrong column definition in CH. The exception was just not thrown on the big data copy; it was silent.
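For anyone hitting the same silent partial insert, a hedged sketch of the kind of higher-level check the maintainer describes: comparing the source row count against what actually landed in ClickHouse after the copy. The table name is a placeholder, and the count is read through the standard ADO.NET surface that ClickHouseConnection exposes:

```csharp
using System;
using System.Threading.Tasks;
using ClickHouse.Client.ADO;

public static class CopyVerification
{
    // expectedRows: the number of rows read from the source (e.g. MSSQL).
    public static async Task EnsureAllRowsCopiedAsync(string clickHouseConnectionString, long expectedRows)
    {
        using var connection = new ClickHouseConnection(clickHouseConnectionString);
        await connection.OpenAsync();

        using var command = connection.CreateCommand();
        command.CommandText = "SELECT count() FROM my_db.my_table"; // placeholder table

        // count() comes back as an unsigned integer; normalize before comparing.
        var actualRows = Convert.ToInt64(await command.ExecuteScalarAsync());

        if (actualRows != expectedRows)
            throw new InvalidOperationException(
                $"Partial insert detected: expected {expectedRows} rows, found {actualRows}.");
    }
}
```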