[Bug] Stream Load: rows are split incorrectly when importing a CSV with many rows #35954
Comments
The CSV-related request headers were set as follows; column f4 is base64-encoded, so its value contains \r\n:
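(The exact header values were not captured in this transcript. Purely as an assumed illustration, a Stream Load request with an enclose character set might look like the sketch below; the host, credentials, database/table names, and separators are all placeholders.)

```python
import requests

# A representative request only; the reporter's real header values were
# not preserved in this thread, so everything below is a placeholder.
with open("data.csv", "rb") as f:
    resp = requests.put(
        # Targeting a BE node directly (default HTTP port 8040) avoids
        # the FE 307 redirect, which some HTTP clients handle poorly.
        "http://be_host:8040/api/example_db/example_tbl/_stream_load",
        auth=("root", ""),
        headers={
            "format": "csv",
            "column_separator": ",",
            "line_delimiter": "\n",
            # f4 holds base64 text containing \r\n, so the field is
            # wrapped in double quotes and enclose is set accordingly.
            "enclose": '"',
            "Expect": "100-continue",
        },
        data=f,
    )
print(resp.json())
```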
With the same batch of data, loading 10 rows per batch works fine, but loading about 10,000 rows per batch (roughly 64 MB) fails: one of the rows gets truncated at line N.
PR #34364 should fix this problem; try version 2.1.3 and see.
OK, thanks a lot. I'll give it a try.
My problem doesn't seem to be the same as that PR's: that one reproduces every time regardless of data volume, while mine depends on the batch file size. If I split the large file (about 64 MB) into a few smaller files and load those instead, the error goes away (see the sketch below).
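For reference, a minimal sketch of that workaround: splitting one large CSV into smaller chunk files without cutting through quoted fields. Python's csv reader is quote-aware, so a \r\n inside an enclosed field stays within one record. The file name and 10,000-row chunk size are assumptions taken from this thread:

```python
import csv

def split_csv(path: str, rows_per_chunk: int = 10_000) -> None:
    """Split `path` into quote-safe chunk files of `rows_per_chunk` rows each."""
    with open(path, newline="") as src:
        reader = csv.reader(src)  # quote-aware: \r\n inside quotes stays in one record
        chunk, part = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                write_chunk(path, part, chunk)
                chunk, part = [], part + 1
        if chunk:
            write_chunk(path, part, chunk)

def write_chunk(path: str, part: int, rows: list) -> None:
    with open(f"{path}.part{part}", "w", newline="") as dst:
        # QUOTE_ALL keeps every field enclosed, matching the enclose setting
        csv.writer(dst, quoting=csv.QUOTE_ALL).writerows(rows)

split_csv("data.csv")
```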
@heartdance Did you try the new version and it still fails? That bug is easily triggered by large files.
Yes, the same problem is still there.
I checked the BE logs and found the following errors:
stream_load.cpp:349] append body content failed, errmsg=[INTERNAL_ERROR]cancelled: closed, id=...
stream_load_executor.cpp:100] fragment execute failed, err_msg=[DATA_QUALITY_ERROR]too many filtered rows, id=...
please add my wechat: Faith_xzc
I suspect Doris splits the file during loading to parallelize it, and the split logic doesn't account for the enclose character. I've run into a similar problem.
There is a bug where data containing newlines inside the enclose characters can be split incorrectly; PR #38347 fixed it recently.
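To make the suspected mechanism concrete, here is a minimal illustration (not Doris's actual splitting code) of how scanning a buffer for \n without tracking the enclose state breaks a quoted row, while an enclose-aware scan keeps it intact:

```python
data = b'1,"line1\nline2",3\n4,"x",6\n'

# Naive split: every "\n" is treated as a record boundary.
naive_rows = data.split(b"\n")
print(len(naive_rows))  # 4 pieces (incl. a trailing empty one): row 1 is cut in two

# Enclose-aware split: only a "\n" outside quotes ends a record.
rows, start, in_quotes = [], 0, False
for i, b in enumerate(data):
    if b == ord('"'):
        in_quotes = not in_quotes
    elif b == ord("\n") and not in_quotes:
        rows.append(data[start:i])
        start = i + 1
print(len(rows))  # 2 records, as intended
```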
Search before asking
Version
doris-2.1.2-rc04
What's Wrong?
The data contains newline characters, but they are wrapped in quotes and enclose: " is set. With few rows there is no error, but once the row count grows the rows are split incorrectly and the load fails with: Reason: actual column number in csv file is less than schema column number: 97, schema column number: 102
What You Expected?
The number of rows should not affect CSV parsing. I suspect a buffer issue, or something similar, is preventing the data from being parsed correctly.
How to Reproduce?
No response
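(The reporter left this blank. Based on the description above, data that should exercise the bug can be sketched as follows; the column layout, file name, and row count are assumptions:)

```python
import base64, csv, os

with open("repro.csv", "w", newline="") as f:
    w = csv.writer(f, quoting=csv.QUOTE_ALL)
    for i in range(10_000):
        # Base64-encode random bytes and hard-wrap with \r\n, the way
        # MIME encoders do, so each quoted field spans several lines.
        raw = base64.b64encode(os.urandom(96)).decode()
        blob = "\r\n".join(raw[j:j + 64] for j in range(0, len(raw), 64))
        w.writerow([i, f"name{i}", blob])
```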
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct