[Bug] [connector-doris] Importing data that contains % fails #4242

Closed
feng-kui opened this issue Mar 1, 2023 · 12 comments · Fixed by #4880

Comments

feng-kui commented Mar 1, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When the value of a varchar field contains %, the stream load import fails (the older version 2.1.3 did not have this problem). The error is:

{"timestamp":"2023-03-01T02:44:08.106+00:00","status":500,"error":"Internal Server Error","path":"/api/example_db/table20/_stream_load"}

SeaTunnel Version

2.3.0

SeaTunnel Config

source {
  Hive {
    table_name = "dw.test"
    metastore_uri = "thrift://x.x.x.x:9083"
    result_table_name = "tmp_table"
  }

}

transform {

  sql {
    sql = "select '1' id,'a%' name from tmp_table"
  }
}

sink {
  Doris {
    nodeUrls=["header-01:8031"]
    database="test"
    table="test"
    username="root"
    password="123"
    batch_max_rows=1000
    max_retries=0
    labelPrefix="fix_test_"
    sink.properties.format="JSON"
    sink.properties.strip_outer_array = "true"
  }
}


### Running Command

```shell
start-seatunnel-spark-connector-v2.sh -m yarn -e client -c test.conf
```

### Error Exception

{"timestamp":"2023-03-01T02:44:08.106+00:00","status":500,"error":"Internal Server Error","path":"/api/example_db/test/_stream_load"}


### Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

feng-kui added the bug label Mar 1, 2023

zhaifb (Contributor) commented Mar 3, 2023

I am hitting a similar error:
{"timestamp":"2023-03-01T02:44:08.106+00:00","status":500,"error":"Internal Server Error","path":"/api/example_db/table20/_stream_load"}

How did you work out that the % in the data was the cause? How can I see the detailed error message?

Hisoka-X (Member) commented Mar 4, 2023

@CalvinKirs Hi Calvin, do you have any insight into this? Is it a SeaTunnel problem or a Doris problem?

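feng-kui (Author) commented Mar 7, 2023

> I am hitting a similar error: {"timestamp":"2023-03-01T02:44:08.106+00:00","status":500,"error":"Internal Server Error","path":"/api/example_db/table20/_stream_load"}
>
> How did you work out that the % in the data was the cause? How can I see the detailed error message?

I tried the data batch by batch until I narrowed it down to the failing records, then ran test syncs on the field values that looked suspicious until I found the culprit.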
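
The batch-by-batch narrowing described here can be automated as a bisection. The following is only a minimal sketch, not something from the thread: `FailingRecordFinder`, `findFailingIndex`, and the `loadsOk` predicate are hypothetical names, and the predicate stands in for an actual stream load attempt. It assumes the full batch fails and that a failing sub-batch always contains at least one bad record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FailingRecordFinder {
    /**
     * Returns the index of one record that makes a load fail.
     * Assumption: loadsOk returns false for the whole list.
     */
    public static <T> int findFailingIndex(List<T> records, Predicate<List<T>> loadsOk) {
        int lo = 0;
        int hi = records.size(); // invariant: records[lo, hi) contains a failing record
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (!loadsOk.test(records.subList(lo, mid))) {
                hi = mid; // the first half already fails on its own
            } else {
                lo = mid; // the first half loads, so the failure is in the second half
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1,a", "2,b", "3,c", "4,%");
        // Stand-in for a real load attempt: pretend any row containing '%' is rejected.
        Predicate<List<String>> loadsOk =
                batch -> batch.stream().noneMatch(r -> r.contains("%"));
        System.out.println(rows.get(findFailingIndex(rows, loadsOk))); // prints 4,%
    }
}
```

Each step halves the suspect range, so a batch of 1000 rows needs about ten load attempts instead of one per row.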

TyrantLucifer (Member) commented

@CalvinKirs @zy-kkk PTAL

CalvinKirs (Member) commented

Which version of Doris are you using?

CalvinKirs (Member) commented

apache/doris#17267

feng-kui (Author) commented

> Which version of Doris are you using?

Doris is version 1.2.1.

feng-kui (Author) commented

> apache/doris#17267

I filed that one as well. SeaTunnel 2.1.3 does not have this problem (writing to the same Doris cluster).

github-actions commented

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions bot added the stale label Apr 26, 2023
666fishsun commented

@feng-kui Hi, is this still unsolved? Does it still fail with the latest version of SeaTunnel?

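zhaifb (Contributor) commented Jun 2, 2023

@CalvinKirs @zy-kkk @TyrantLucifer We found that this problem is caused by Spring MVC in the Doris FE decoding requests whose content type is application/x-www-form-urlencoded.

We changed the request header to a content type other than application/x-www-form-urlencoded, and that fixed it.

We checked the Doris documentation and the Flink Doris connector and found nothing that requires passing a Content-Type, yet the Spring MVC validation fails. When no Content-Type is passed, it defaults to application/x-www-form-urlencoded, which triggers the problem.
Explicitly passing application/json solves it; we have not run into any problems so far.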
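
To illustrate why a bare % is a problem for form-style decoding, here is a minimal sketch using the JDK's `java.net.URLDecoder`. This is an assumption for illustration only: the thread does not show which decoder the Doris FE or Spring MVC actually invokes, and `PercentDecodeDemo` and `decodesCleanly` are hypothetical names. It shows the general class of failure, where % must be followed by two hex digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {
    /** Returns true if the text survives percent-decoding, false if the decoder rejects it. */
    public static boolean decodesCleanly(String body) {
        try {
            URLDecoder.decode(body, "UTF-8");
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for an incomplete or non-hex escape such as a trailing '%'.
            return false;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decodesCleanly("1,a"));   // true: nothing to decode
        System.out.println(decodesCleanly("4,%"));   // false: '%' with no hex digits after it
        System.out.println(decodesCleanly("4,%25")); // true: %25 is the escape for '%'
    }
}
```

A payload that is treated as opaque data, for example because it is declared as application/json, never goes through this decoding step, which is consistent with the fix described above.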

curl -XPUT --location-trusted -u root: -H "label:test3" -H "timeout:10000" -H "format: csv" -H "columns: id,name" -H "column_separator:," -H "Content-type: application/json" -H "Expect: 100-continue" http://172.18.116.60:8030/api/test/table20/_stream_load -d '1,a
2,b
3,c
4,%'
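
Note that curl's `-d` option sends `Content-Type: application/x-www-form-urlencoded` unless a header overrides it, which is why the explicit `Content-type` header matters in the command above. The same idea in Java might look like the sketch below. It is not the SeaTunnel connector's code: `StreamLoadRequestSketch` and `prepare` are hypothetical names, the host is a placeholder, and the sketch only prepares the request headers, leaving out authentication, redirect handling, and sending the body.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamLoadRequestSketch {
    /** Prepares (but does not send) a stream load PUT with an explicit Content-Type. */
    public static HttpURLConnection prepare(String feHostPort, String db, String table, String label) {
        try {
            URL url = new URL("http://" + feHostPort + "/api/" + db + "/" + table + "/_stream_load");
            // openConnection() only creates the object; no network I/O happens here.
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("label", label);
            conn.setRequestProperty("format", "csv");
            conn.setRequestProperty("column_separator", ",");
            // Set explicitly so the request is not treated as application/x-www-form-urlencoded.
            conn.setRequestProperty("Content-Type", "application/json");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "fe-host:8030" is a placeholder address, not a value from the thread.
        HttpURLConnection conn = prepare("fe-host:8030", "test", "table20", "test3");
        System.out.println(conn.getRequestMethod() + " " + conn.getURL());
        System.out.println("Content-Type: " + conn.getRequestProperty("Content-Type"));
    }
}
```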

zhaifb (Contributor) commented Jun 2, 2023

@feng-kui SeaTunnel 2.1.3 works because the Content-Type it specifies is not application/x-www-form-urlencoded. @CalvinKirs
