Error on JSON import #19719
Comments
Does it help to set |
Indeed, the flag solved the problem! |
@inkrement Hello, could you estimate the number of characters in a single JSON document in your dataset? Maybe you have very large strings or simply many fields. I see a potential bug in the code, but it could appear only on extremely big JSONs. |
Hi! The documents should be rather small (max. 30 fields and max. 1000 chars each). However, I have already noticed some broken documents (some other processes interfered with the output). If something is broken, does the CH parser detect it directly at the end of the current JSON document, or could it keep reading? As the single-threaded version works fine, I guess this has to do with the distribution of lines/documents. |
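The size estimate above can be checked against the actual file rather than assumed. The following is a minimal sketch, assuming a gzip-compressed file with one JSON object per line; the function name `summarize_jsonl` is hypothetical, and Python's `json` module is used only as a stand-in, so it will not necessarily agree with the ClickHouse parser on every edge case.

```python
import gzip
import json


def summarize_jsonl(path):
    """Return (longest line in characters, largest top-level field count).

    Assumes a gzip-compressed file with one JSON object per line.
    """
    max_line_chars = 0
    max_fields = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore empty lines
            max_line_chars = max(max_line_chars, len(line))
            try:
                doc = json.loads(line)
            except json.JSONDecodeError:
                continue  # broken lines still count towards the length
            if isinstance(doc, dict):
                max_fields = max(max_fields, len(doc))
    return max_line_chars, max_fields
```

An unexpectedly long line in the result would hint at documents that were merged or never terminated.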
@inkrement Thanks! The parallelized version of the parsers consists of several pieces, and one of them is
It detects the line in the document where the error occurred, but your case is unusual. |
Just to be sure: Are we talking about JSON or JSONEachRow? |
JSONEachRow, because JSON is not supported for input. |
It reproduces easily
|
OK, thanks! In this case, I guess my JSON is broken. Thanks anyway (I'll close the issue for now). |
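For anyone hitting the same problem, broken documents can be located before importing. This is a minimal sketch, assuming a gzip-compressed JSON-lines file; the helper name `find_broken_lines` and its `limit` parameter are hypothetical, and the check uses Python's `json` module, not the ClickHouse parser.

```python
import gzip
import json


def find_broken_lines(path, limit=10):
    """Return up to `limit` (line_number, error_message) pairs for lines
    that do not parse as a single JSON value."""
    broken = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore empty lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                broken.append((lineno, exc.msg))
                if len(broken) >= limit:
                    break
    return broken
```

Lines where two writers interleaved their output, for example two objects on one line, are reported as well, because `json.loads` rejects trailing data after the first value.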
Describe the bug
When importing a JSON Lines file (gzip-compressed, around 2 GB), I always get an error message indicating a memory allocation problem. The machine has 512 GB of RAM and most of it is unused, so the error could be related to the configuration or simply be a software bug. I am using the default config, but have not changed it for a year (has something changed in the default values that could be relevant here?). Either way, I am not sure how to fix it. I am using the most recent version (client and server 21.1.2.15) and I can reproduce the error. However, due to copyright issues, I cannot share the original dataset.
I import the file as follows
Error message and/or stacktrace
And after a while, I always receive the following error message: