Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BulkCopy not raising exception on big datasets #475

Closed
marcinatpayholdingcom opened this issue May 7, 2024 · 10 comments
Closed

BulkCopy not raising exception on big datasets #475

marcinatpayholdingcom opened this issue May 7, 2024 · 10 comments
Assignees
Labels
bug Something isn't working investigation Additional investigation required

Comments

@marcinatpayholdingcom
Copy link

marcinatpayholdingcom commented May 7, 2024

I was copying data from MSSQL to ClickHouse. My table was 2M rows. Each time I was running the copy I was missing some rows. I was not getting any exception during the copy, everything was running nice and smooth (batch size 10k), but the data was missing, I reduced the data to 1k rows executed and this is what I got:

ClickHouse.Client.Copy.ClickHouseBulkCopySerializationException: Error when serializing data
 ---> System.InvalidCastException: Object cannot be cast from DBNull to other types.
   at System.DBNull.System.IConvertible.ToInt16(IFormatProvider provider)
   at ClickHouse.Client.Types.Int16Type.Write(ExtendedBinaryWriter writer, Object value)
   at ClickHouse.Client.Copy.ClickHouseBulkCopy.BulkCopyHttpContent.SerializeBatchAsync(Stream stream)
   --- End of inner exception stack trace ---
   at ClickHouse.Client.Copy.ClickHouseBulkCopy.BulkCopyHttpContent.SerializeBatchAsync(Stream stream)
   at ClickHouse.Client.Copy.ClickHouseBulkCopy.BulkCopyHttpContent.SerializeToStreamAsync(Stream stream, TransportContext context)
   at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask)
   at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at ClickHouse.Client.ADO.ClickHouseConnection.PostContentAsync(String sql, HttpContent httpData, CancellationToken token)
   at ClickHouse.Client.Copy.ClickHouseBulkCopy.SendBatchAsync(BulkCopyHttpContent batchContent, CancellationToken token)
   at ClickHouse.Client.Copy.ClickHouseBulkCopy.WriteToServerAsync(IEnumerable`1 rows, CancellationToken token)
   at DataSync.Data.Clients.Clickhouse.ChClient.<>c__DisplayClass4_0.<<BulkInsertAsync>b__0>d.MoveNext()

Which explained me what is the problem. Grate, but it took me a long time to realize that I have wrong column definition in CH. But the exception was not thrown on big data copy, it was just silent.

@DarkWanderer DarkWanderer self-assigned this May 8, 2024
@DarkWanderer DarkWanderer added the bug Something isn't working label May 8, 2024
@DarkWanderer
Copy link
Owner

DarkWanderer commented May 8, 2024

Hi
Thank you for your report, that would be a serious bug. Can you clarify a bit more on how this appeared - was the exception happening somewhere near the beginning of dataset or near the end?

Also, if you would be able to reproduce it - does the issue reproduce on previous versions, e.g. 6.8.1?

@DarkWanderer
Copy link
Owner

I have tried to reproduce the issue in a test and it doesn't seem to appear with a synthetic test - a serialization exception is always correctly propagated. Can you provide more context please?

@DarkWanderer DarkWanderer added the investigation Additional investigation required label May 9, 2024
@marcinatpayholdingcom
Copy link
Author

marcinatpayholdingcom commented May 9, 2024 via email

@DarkWanderer
Copy link
Owner

Please try version 7.5.0 to see if it fixes the issue for you

@marcinatpayholdingcom
Copy link
Author

Nop, still not working correctly, here is a demo to replicate this problem. Let me know, maybe I am doing something wrong.
https://github.com/marcinatpayholdingcom/clickhouseclient_bug_replication

@DarkWanderer
Copy link
Owner

DarkWanderer commented May 26, 2024

Sorry - took me a while to get my hands on it. However, for me it works perfectly:

Hello, World!

O:\Projects\clickhouseclient_bug_replication\ClickHouse.ClientNotSerializing\bin\Debug\net8.0\ClickHouse.ClientNotSerializing.exe (process 21308) exited with code 0.
Press any key to close this window . . .

Any additional details I am missing? Do you use any special flags or parameters (e.g. UseSessions)? What OS was this reproed on?

Here are specific exceptions which are shown in the debugger:
image

image

@marcinatpayholdingcom
Copy link
Author

marcinatpayholdingcom commented May 26, 2024

I am running this on Windows
this part should throw the serialization exception but it does not:
image

I do not have any special setting, I just run it as it is in the project.
So the problem is, it does not throw, so only part of the data is inserted and chunks with broken data are skipped.

@DarkWanderer
Copy link
Owner

DarkWanderer commented May 26, 2024

For me it is triggered quite perfectly every time
image

Just to double check, when you say "exception is not thrown", have you checked it in the debugger by putting a breakpoint on the logger._LogError line, or did you check only by looking at the logs? Because your test project does not output any logs even in case of error

@marcinatpayholdingcom
Copy link
Author

marcinatpayholdingcom commented May 26, 2024

by setting the brekpoin at the log line and by seeing what data got inserted into Clickhouse.
When you get the serialization exception that means that no data got inserted into Clickhouse in the first attempt, am I right?

In my case there is some data inserted.

I think you can close this issue as not able to replicate. I have no clue what setup causes this situation on my machine.
First I thought that it was because of the original project I am working on, maybe I did something wrong. But when I created this demo and things are still failing incorrectly then I give up.

@DarkWanderer
Copy link
Owner

I think there may be just a misunderstanding here, which means I'll need to make docs more clear

When you get the serialization exception that means that no data got inserted into Clickhouse in the first attempt, am I right?

ClickHouse is not transactional (other than experimental features) - and hence BulkCopy adapter isn't as well. If you have a large set of data and the serialization (or insertion) has failed for one of the batches, it is expected that some of the batches may have made it through already, and it falls on the higher-level code to handles the situation. I.e. in this particular case, 20k of rows get into database before exception - and that's by design.

If you need to have some semblance of atomicity, the data needs to go as one 'batch' - so you can try setting BatchSize to Int32.MaxValue (reference). It would be slow and memory-consuming, but you would get a binary result.

@DarkWanderer DarkWanderer closed this as not planned Won't fix, can't repro, duplicate, stale May 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working investigation Additional investigation required
Projects
None yet
Development

No branches or pull requests

2 participants