[BEAM-3371] Enable running integration tests on Spark #6244
Conversation
BTW: what is the reason that the filesToStage option is duplicated in
Run Java PreCommit
Created issues for failing tests:
Force-pushed from a96baa6 to 6bdb72e
@iemejia rebased and fixed the spotless issue. Could you take a look?
@iemejia ping :) Alternatively, (if you're very busy), please suggest some other reviewer.
You are absolutely right @lgajowy, I should have gotten to this review sooner but I was a bit loaded these last days. Asking @aromanenko-dev to take a look (thanks Alexey!).
Thanks! I added a couple of minor notes.
List<String> result = PipelineResources.prepareFilesForStaging(filesToStage, temporaryLocation);

assertThat(result, is(empty()));
Could you add another entry to the filesToStage list that is an existing path, and assert that only the non-existent one is removed?
ok, good idea.
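A test along those lines could look like the following sketch. Note this is an illustration only: dropNonExistent is a hypothetical stand-in for PipelineResources.prepareFilesForStaging, reduced to the behaviour being discussed (filtering the staging list down to paths that actually exist).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StagingFilterSketch {

  // Hypothetical stand-in for PipelineResources.prepareFilesForStaging:
  // keeps only the entries that exist on the local filesystem.
  static List<String> dropNonExistent(List<String> filesToStage) {
    List<String> kept = new ArrayList<>();
    for (String path : filesToStage) {
      if (new File(path).exists()) {
        kept.add(path);
      }
    }
    return kept;
  }

  public static void main(String[] args) throws IOException {
    File existing = File.createTempFile("staged", ".jar");
    existing.deleteOnExit();

    List<String> filesToStage =
        Arrays.asList(existing.getAbsolutePath(), "/definitely/not/there.jar");
    List<String> result = dropNonExistent(filesToStage);

    // Only the non-existent entry is removed; the real file survives.
    if (result.size() != 1 || !result.get(0).equals(existing.getAbsolutePath())) {
      throw new AssertionError("expected only the missing path to be dropped: " + result);
    }
    System.out.println("kept: " + result);
  }
}
```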
Hi @aromanenko-dev - I applied the suggestions. PTAL.
Retest this please
@aromanenko-dev I amended it a little bit and left a comment (PTAL).
@mxm @aromanenko-dev thanks! I added tests for Flink. Because the Spark runner invokes the code in the run() method and creates the SparkContext there (but does not delete it later?), I had some trouble creating similar tests for Spark, so I didn't submit them in this PR. Maybe we can do this later. I got the following error when I ran similar tests:
I tried to reuse the context (with
@mxm @aromanenko-dev could you take a look again?
Thanks @lgajowy looks good for the Flink side.
@@ -435,7 +451,7 @@ public void visitPrimitiveTransform(TransformHierarchy.Node node) {
   protected <TransformT extends PTransform<? super PInput, POutput>>
       TransformEvaluator<TransformT> translate(
           TransformHierarchy.Node node, TransformT transform, Class<TransformT> transformClass) {
-    //--- determine if node is bounded/unbounded.
+    // --- determine if node is bounded/unbounded.
unnecessary change
reverted.
@@ -49,6 +63,7 @@ public void shouldRecognizeAndTranslateStreamingPipeline() {
         .apply(
             ParDo.of(
                 new DoFn<Long, String>() {
+
unnecessary change
reverted.
@Test
public void shouldNotPrepareFilesToStageWhenFlinkMasterIsSetToLocal() throws IOException {
  FlinkPipelineOptions options = testPreparingResourcesToStage("[local]");
These 3 similar tests above could be a single parameterised test, but it's not a big deal for now.
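As a rough illustration of the parameterisation idea, without pulling in JUnit's Parameterized runner: the three local-mode master values could drive one loop over a shared check. shouldPrepareFilesToStage here is a hypothetical predicate written for this sketch, not the runner's actual API.

```java
import java.util.Arrays;
import java.util.List;

public class FlinkMasterStagingSketch {

  // Hypothetical predicate mirroring the behaviour under test: staging
  // preparation is skipped when flinkMaster points at a local execution mode.
  static boolean shouldPrepareFilesToStage(String flinkMaster) {
    List<String> localMasters = Arrays.asList("[local]", "[auto]", "[collection]");
    return !localMasters.contains(flinkMaster);
  }

  public static void main(String[] args) {
    // One parameterised loop instead of three near-identical test methods.
    for (String master : Arrays.asList("[local]", "[auto]", "[collection]")) {
      if (shouldPrepareFilesToStage(master)) {
        throw new AssertionError("expected no staging for master " + master);
      }
    }
    // A remote master address should still trigger staging.
    if (!shouldPrepareFilesToStage("example-host:6123")) {
      throw new AssertionError("expected staging for a remote master");
    }
    System.out.println("local masters skip staging, remote masters do not");
  }
}
```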
Since the Flink side is ok (thanks to @mxm for the review) and the other part is fine for me as well, it LGTM. Let's wait for green tests and I think we can merge it after.
@lgajowy Could you squash all commits before?
Run Java PreCommit
Directories are now packaged into .jar files and placed in a temporary location. They can be staged from there later by the runner.
… in Flink runner. Add unit tests
Force-pushed from 12d621d to dd0d74b
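To make the packaging step concrete (directories on the classpath become jars in a temporary location before staging), here is a minimal sketch. zipDirectory is a hypothetical helper written for this illustration, not necessarily the PR's actual implementation.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DirToJarSketch {

  // Hypothetical helper: package a classpath directory into a single .jar
  // placed in a temporary location, so the runner can stage one file.
  static File zipDirectory(File dir, File temporaryLocation) throws IOException {
    File jar = File.createTempFile(dir.getName(), ".jar", temporaryLocation);
    Path root = dir.toPath();
    try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(jar));
        Stream<Path> files = Files.walk(root)) {
      for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
        // Entry names are relative to the directory root, as in a normal jar.
        out.putNextEntry(new ZipEntry(root.relativize(p).toString()));
        Files.copy(p, out);
        out.closeEntry();
      }
    }
    return jar;
  }

  public static void main(String[] args) throws IOException {
    File tmp = Files.createTempDirectory("staging").toFile();
    File classes = Files.createTempDirectory("classes").toFile();
    Files.write(new File(classes, "marker.txt").toPath(), "hello".getBytes());

    File jar = zipDirectory(classes, tmp);
    if (!jar.exists() || jar.length() == 0) {
      throw new AssertionError("expected a non-empty jar");
    }
    System.out.println("packaged " + classes.getName() + " into " + jar.getName());
  }
}
```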
@aromanenko-dev I rewrote the history, squashing the commits that were irrelevant. Initially, this PR was all about Spark (BEAM-3371), but I also added one commit related to Flink (BEAM-3370), which adds the tests. All this in light of the recent dev list discussion and docs. Let me know if this looks good to you too - I have never really received feedback about the history shape, although I try to care about it. It looks like we (committers especially) should be more aware of the git history, so this PR is a good chance to actually get feedback and practice this. :)
What is the status of this PR?
@aaltay @aromanenko-dev suggested some additional manual checks of what happens if we run this using spark-submit. I didn't test that yet - I should be able to do it later this week.
Run Flink ValidatesRunner
Run Spark ValidatesRunner
@lgajowy I performed testing on my side - I created a fat jar with a basic WordCount pipeline using mvn artifacts built from the branch of this PR and ran it on my virtual YARN/Spark cluster. I didn't see any issues with that, so I think it LGTM and we can merge it.
@aromanenko-dev thanks a lot! :)
This PR enables running IOIT on the Spark runner. One can do that by setting up a remote Spark cluster (Spark master) and running this command:
I experienced some difficulties with the TFRecordIOIT, ParquetIOIT and XmlIOIT tests:
java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
java.io.FileNotFoundException: No files found for spec: PREFIX_1534520885787*
Those issues are of a different nature than the one described in BEAM-3371 and have to be tackled separately.
@iemejia Could you take a look when you're back? There's no need to rush - I'll be off for 2 weeks now.
CC: @pabloem
Follow this checklist to help us incorporate your contribution quickly and easily:
Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.
Post-Commit Tests Status (on master branch)