Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2852] Add support for Kafka as source/sink on Nexmark #5019

Merged

Conversation

aromanenko-dev
Copy link
Contributor

Allows to use Kafka as as source/sink for Nexmark benchmark.
Based on original implementation of #3937 by @vectorijk

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand:
    • What the pull request does
    • Why it does it
    • How it does it
    • Why this approach
  • Each commit in the pull request should have a meaningful subject line and body.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@aromanenko-dev
Copy link
Contributor Author

R: @echauchot

@echauchot echauchot self-requested a review April 4, 2018 13:59
@echauchot
Copy link
Contributor

Thanks @aromanenko-dev !

@echauchot
Copy link
Contributor

Thanks @aromanenko-dev !
Please reword commit to include ticket number and meaningfull title, beware with authorship

Copy link
Contributor

@echauchot echauchot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
some small changes + one new small feature + include my PR

/**
* Send {@code events} to Kafka.
*/
private void sinkEventsToKafka(PCollection<Event> events) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please wire this method up and add a COMBINED mode similar to what is done in Pub/Sub?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I think we should refactor the whole COMBINED mode:

  • See NexmarkLauncher#createSource: it does a switch on the source type to configure sink.
  • it sends synthetic events to sink when in COMBINED mode but NexmarkUtils#COMBINED states that combine modes is for "Both publish and consume, but as separate jobs".
    Once refactored to something more coherent, implement it for kafka.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is important in the connection with MOM:

  • have the ability to keep a track of the generated events that lead to a benchmark result
  • be able to read events from a topic
  • write benchmark results to topic

* Send {@code events} to Kafka.
*/
private void sinkEventsToKafka(PCollection<Event> events) {
PTransform<PCollection<byte[]>, PDone> io = KafkaIO.<Long, byte[]>write()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment that I had on the previous PR. IMHO it is very wired code: it explicitely uses PTransform in place of Write transform and also it specifies a key and the associated coder whereas there is no key in the input PCollection. To be quicker I submited a PR to your repo so that you could include the fix in that PR branch. See aromanenko-dev#2

throw new RuntimeException("Missing --bootstrapServers");
}

KafkaIO.Read<Long, byte[]> io = KafkaIO.<Long, byte[]>read()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename io to read

private PCollection<Event> sourceEventsFromKafka(Pipeline p) {
NexmarkUtils.console("Reading events from Kafka Topic %s", options.getKafkaSourceTopic());

if (Strings.isNullOrEmpty(options.getBootstrapServers())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use checkArgument

checkArgument(!Strings.isNullOrEmpty(options.getBootstrapServers()),
"Missing --bootstrapServers");

PTransform<PCollection<String>, PDone> io = KafkaIO.<Long, String>write()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above See my PR #2 on your repo

@aromanenko-dev aromanenko-dev force-pushed the BEAM-2852-nexmark-kafka-source-sink branch 2 times, most recently from 2e744e5 to 8f7d724 Compare April 5, 2018 09:59
@echauchot
Copy link
Contributor

@aromanenko-dev please update gradle build to reflect maven one, squash this commit with code style one and run "Run Java PreCommit" to launch gradle build. LGTM at green lights of gradle build

@aromanenko-dev aromanenko-dev force-pushed the BEAM-2852-nexmark-kafka-source-sink branch from ca3436c to 6df0add Compare April 10, 2018 13:27
@aromanenko-dev
Copy link
Contributor Author

Run Java PreCommit

@echauchot echauchot merged commit 506fb32 into apache:master Apr 10, 2018
@echauchot
Copy link
Contributor

echauchot commented Apr 10, 2018

Merging this PR, we will takle COMBINED mode (refactoring this mode in pub/sub and apply it to kafka) in another PR. I opened a ticket for that: https://issues.apache.org/jira/browse/BEAM-4048

@aromanenko-dev aromanenko-dev deleted the BEAM-2852-nexmark-kafka-source-sink branch April 10, 2018 14:27
@vectorijk
Copy link
Contributor

Thanks! @aromanenko-dev

@aromanenko-dev
Copy link
Contributor Author

@vectorijk Thank you for your initial work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants