
[feature]:support spark connector sink data to doris #6256

Merged
merged 6 commits into from
Aug 16, 2021

Conversation

Kyofin
Contributor

@Kyofin Kyofin commented Jul 17, 2021

Proposed changes

Support the Spark connector writing a DataFrame to Doris.

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

Further comments

@morningman morningman added area/spark-connector Issues or PRs related to Spark connector kind/feature Categorizes issue or PR as related to a new feature. labels Jul 17, 2021

val buffer = ListBuffer[String]()
partition.foreach(row => {
val rowString = row.toSeq.mkString("\t")
Member

Do we need to consider null values here?
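The concern is that serializing row fields directly would emit the literal string "null" for null fields. A minimal null-safe sketch in Java (the class and method names are illustrative, not the connector's actual code; it assumes the stream load input is tab-separated and that Doris's default convention of `\N` for SQL NULL applies):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class RowSerializer {
    // Doris stream load treats "\N" as SQL NULL in tab-separated input by
    // default, so null fields must be mapped explicitly; a plain toString
    // would otherwise produce the literal string "null".
    static final String NULL_LITERAL = "\\N";

    public static String toTsv(Object[] fields) {
        return Arrays.stream(fields)
                .map(f -> f == null ? NULL_LITERAL : f.toString())
                .collect(Collectors.joining("\t"));
    }
}
```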

Contributor

@hf200012 hf200012 left a comment


  1. It is recommended to add a configurable number of retries for failed loads.
  2. It is recommended to fetch the list of live BE nodes through FE, and then connect directly to a BE to execute the stream load, choosing the BE by round-robin or another strategy.
  3. It is recommended to trigger the stream load on two conditions: the record-count threshold you have already implemented, and a time interval.
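The third suggestion (flush on either a record-count threshold or a time interval) can be sketched as follows. This is a minimal illustration, not the connector's actual code; the class name and the injected `nowMs` clock parameter are assumptions made to keep the logic deterministic and testable:

```java
public class BufferedLoader {
    private final StringBuilder buffer = new StringBuilder();
    private final int maxRows;
    private final long maxIntervalMs;
    private int rowCount = 0;
    private long lastFlushMs;
    private int flushCount = 0; // exposed for testing only

    public BufferedLoader(int maxRows, long maxIntervalMs, long startMs) {
        this.maxRows = maxRows;
        this.maxIntervalMs = maxIntervalMs;
        this.lastFlushMs = startMs;
    }

    // Buffer one TSV row; flush when either the row-count threshold or the
    // elapsed time interval is reached, whichever comes first.
    public void add(String row, long nowMs) {
        buffer.append(row).append('\n');
        rowCount++;
        if (rowCount >= maxRows || nowMs - lastFlushMs >= maxIntervalMs) {
            flush(nowMs);
        }
    }

    private void flush(long nowMs) {
        // A real implementation would submit buffer.toString() via stream load here.
        flushCount++;
        buffer.setLength(0);
        rowCount = 0;
        lastFlushMs = nowMs;
    }

    public int getFlushCount() { return flushCount; }
}
```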

}

public void load(String value) throws StreamLoadException {
LoadResponse loadResponse = loadBatch(value);
Contributor

It is recommended to add a maximum number of retries for failed loads, to avoid failures caused by short-term network jitter.
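A bounded-retry wrapper around the load call could look like the sketch below. The interface, class name, and the linear-backoff values are illustrative assumptions, not connector defaults:

```java
public class RetryingLoader {
    public interface Batch { void send(String value) throws Exception; }

    // Retry a stream load a bounded number of times so that transient
    // network jitter does not fail the whole Spark task.
    public static void loadWithRetry(Batch sender, String value,
                                     int maxRetries, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                sender.send(value);
                return;
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(backoffMs * (attempt + 1)); // linear backoff
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```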


private LoadResponse loadBatch(String value) {
Calendar calendar = Calendar.getInstance();
String label = String.format("audit_%s%02d%02d_%02d%02d%02d_%s",
Contributor

It is recommended to prefix the label with spark_connector_ so that loads from this connector are easy to distinguish.
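A label generator along the suggested lines might look like this sketch (class name, timestamp pattern, and the UUID suffix are illustrative choices; stream load labels must be unique per database, so some uniqueness component is needed):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

public class LabelGenerator {
    // A spark_connector_ prefix makes loads from this connector easy to
    // spot in SHOW LOAD output; timestamp plus UUID keeps labels unique.
    public static String newLabel() {
        String ts = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        return String.format("spark_connector_%s_%s", ts, UUID.randomUUID());
    }
}
```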

package org.apache.doris.spark.sql

object DorisOptions {
val beHostPort="beHostPort"
Contributor

The FE address should be configured here instead. The list of live BE nodes can be obtained through FE, and during stream load a BE can be selected by round-robin or another strategy to execute the load directly, avoiding pressure on the FE.
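The round-robin part of this suggestion can be sketched as below. How the live BE list is fetched from FE is deliberately left abstract (the exact FE endpoint is connector-internal and not shown in this thread); the class assumes the list has already been retrieved:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class BackendSelector {
    // backends holds live BE host:port pairs, assumed to have been fetched
    // from FE beforehand; rotating across them spreads stream-load traffic
    // so the FE is not a bottleneck for the data path.
    private final List<String> backends;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public BackendSelector(List<String> backends) {
        this.backends = backends;
    }

    // Round-robin selection; AtomicInteger keeps it safe across Spark task threads.
    public String next() {
        int i = Math.floorMod(cursor.getAndIncrement(), backends.size());
        return backends.get(i);
    }
}
```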

Contributor

@morningman morningman left a comment


LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 15, 2021
@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit b13e512 into apache:master Aug 16, 2021
songchuangyuan pushed a commit to songchuangyuan/incubator-doris that referenced this pull request Aug 20, 2021
support spark connector write dataframe to doris
@morningman morningman mentioned this pull request Oct 10, 2021
songchuangyuan pushed a commit to songchuangyuan/incubator-doris that referenced this pull request Oct 18, 2021
support spark connector write dataframe to doris
Labels
approved Indicates a PR has been approved by one committer. area/spark-connector Issues or PRs related to Spark connector kind/feature Categorizes issue or PR as related to a new feature. reviewed
4 participants