
[feature]:support spark connector sink data to doris #6256

Merged
merged 6 commits into from
Aug 16, 2021

Conversation

Kyofin
Contributor

@Kyofin Kyofin commented Jul 17, 2021

Proposed changes

Support the Spark connector writing a DataFrame to Doris.

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

Further comments

@morningman morningman added area/spark-connector Issues or PRs related to Spark connector kind/feature Categorizes issue or PR as related to a new feature. labels Jul 17, 2021

val buffer = ListBuffer[String]()
partition.foreach(row => {
val rowString = row.toSeq.mkString("\t")
Member

Do we need to consider null values here?
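The concern is that serializing row fields directly would emit the literal string "null" for null fields. A minimal null-safe sketch in Java (the class and method names are illustrative, not the connector's actual code; it assumes the stream load input is tab-separated and that Doris's default convention of `\N` for SQL NULL applies):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class RowSerializer {
    // Doris stream load treats "\N" as SQL NULL in tab-separated input by
    // default, so null fields must be mapped explicitly; a plain toString
    // would otherwise produce the literal string "null".
    static final String NULL_LITERAL = "\\N";

    public static String toTsv(Object[] fields) {
        return Arrays.stream(fields)
                .map(f -> f == null ? NULL_LITERAL : f.toString())
                .collect(Collectors.joining("\t"));
    }
}
```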

Contributor

@hf200012 hf200012 left a comment


  1. It is recommended to add a configurable number of retries for failed loads.
  2. It is recommended to fetch the list of live BE nodes through FE, and then connect directly to a BE to execute the stream load, choosing the BE by round-robin or another strategy.
  3. It is recommended to trigger the stream load on two conditions: the record-count threshold you have already implemented, and a time interval.
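The third suggestion (flush on either a record-count threshold or a time interval) can be sketched as follows. This is a minimal illustration, not the connector's actual code; the class name and the injected `nowMs` clock parameter are assumptions made to keep the logic deterministic and testable:

```java
public class BufferedLoader {
    private final StringBuilder buffer = new StringBuilder();
    private final int maxRows;
    private final long maxIntervalMs;
    private int rowCount = 0;
    private long lastFlushMs;
    private int flushCount = 0; // exposed for testing only

    public BufferedLoader(int maxRows, long maxIntervalMs, long startMs) {
        this.maxRows = maxRows;
        this.maxIntervalMs = maxIntervalMs;
        this.lastFlushMs = startMs;
    }

    // Buffer one TSV row; flush when either the row-count threshold or the
    // elapsed time interval is reached, whichever comes first.
    public void add(String row, long nowMs) {
        buffer.append(row).append('\n');
        rowCount++;
        if (rowCount >= maxRows || nowMs - lastFlushMs >= maxIntervalMs) {
            flush(nowMs);
        }
    }

    private void flush(long nowMs) {
        // A real implementation would submit buffer.toString() via stream load here.
        flushCount++;
        buffer.setLength(0);
        rowCount = 0;
        lastFlushMs = nowMs;
    }

    public int getFlushCount() { return flushCount; }
}
```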

}

public void load(String value) throws StreamLoadException {
LoadResponse loadResponse = loadBatch(value);
Contributor

It is recommended to add a maximum number of retries for failed loads, to avoid failures caused by short-term network jitter.
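A bounded-retry wrapper around the load call could look like the sketch below. The interface, class name, and the linear-backoff values are illustrative assumptions, not connector defaults:

```java
public class RetryingLoader {
    public interface Batch { void send(String value) throws Exception; }

    // Retry a stream load a bounded number of times so that transient
    // network jitter does not fail the whole Spark task.
    public static void loadWithRetry(Batch sender, String value,
                                     int maxRetries, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                sender.send(value);
                return;
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(backoffMs * (attempt + 1)); // linear backoff
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```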


private LoadResponse loadBatch(String value) {
Calendar calendar = Calendar.getInstance();
String label = String.format("audit_%s%02d%02d_%02d%02d%02d_%s",
Contributor

It is recommended to prefix the label with spark_connector_ so that loads from this connector are easy to distinguish.
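A label generator along the suggested lines might look like this sketch (class name, timestamp pattern, and the UUID suffix are illustrative choices; stream load labels must be unique per database, so some uniqueness component is needed):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

public class LabelGenerator {
    // A spark_connector_ prefix makes loads from this connector easy to
    // spot in SHOW LOAD output; timestamp plus UUID keeps labels unique.
    public static String newLabel() {
        String ts = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        return String.format("spark_connector_%s_%s", ts, UUID.randomUUID());
    }
}
```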

package org.apache.doris.spark.sql

object DorisOptions {
val beHostPort="beHostPort"
Contributor

The FE address should be configured here instead. The list of live BE nodes can be obtained through FE, and during stream load a BE can be selected by round-robin or another strategy to execute the load directly, avoiding pressure on the FE.
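The round-robin part of this suggestion can be sketched as below. How the live BE list is fetched from FE is deliberately left abstract (the exact FE endpoint is connector-internal and not shown in this thread); the class assumes the list has already been retrieved:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class BackendSelector {
    // backends holds live BE host:port pairs, assumed to have been fetched
    // from FE beforehand; rotating across them spreads stream-load traffic
    // so the FE is not a bottleneck for the data path.
    private final List<String> backends;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public BackendSelector(List<String> backends) {
        this.backends = backends;
    }

    // Round-robin selection; AtomicInteger keeps it safe across Spark task threads.
    public String next() {
        int i = Math.floorMod(cursor.getAndIncrement(), backends.size());
        return backends.get(i);
    }
}
```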

Contributor

@morningman morningman left a comment


LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 15, 2021
@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit b13e512 into apache:master Aug 16, 2021
songchuangyuan pushed a commit to songchuangyuan/incubator-doris that referenced this pull request Aug 20, 2021
support spark connector write dataframe to doris
@morningman morningman mentioned this pull request Oct 10, 2021
songchuangyuan pushed a commit to songchuangyuan/incubator-doris that referenced this pull request Oct 18, 2021
support spark connector write dataframe to doris
Labels
approved Indicates a PR has been approved by one committer. area/spark-connector Issues or PRs related to Spark connector kind/feature Categorizes issue or PR as related to a new feature. reviewed
4 participants