The LoadTests Go GBK Flink Batch job is flaky #30507

Open
github-actions bot opened this issue Mar 5, 2024 · 6 comments

Comments

@github-actions
Contributor

github-actions bot commented Mar 5, 2024

The LoadTests Go GBK Flink Batch job is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_LoadTests_Go_GBK_Flink_Batch.yml?query=is%3Afailure+branch%3Amaster to see the logs.

@volatilemolotov
Contributor

I tried increasing the timeout to almost 12h, but the job still times out:
https://github.com/volatilemolotov/beam/actions/runs/8627370677/job/23647207749

@github-actions github-actions bot added this to the 2.59.0 Release milestone Aug 20, 2024
@github-actions github-actions bot reopened this Aug 23, 2024
Contributor Author

Reopening since the workflow is still flaky.

@damccorm damccorm removed this from the 2.59.0 Release milestone Aug 23, 2024
@liferoad liferoad self-assigned this Nov 19, 2024
@liferoad
Collaborator

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
	at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:207)
	at org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:181)
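
The trace above reports error=2 (ENOENT, "No such file or directory"), which suggests that the `docker` executable is not on the PATH of the JVM process running `DockerCommand`. A minimal pre-flight sketch for the machine that has to launch the SDK harness containers, assuming a POSIX shell; `require_cmd` is a hypothetical helper name and is not part of the Beam repo:

```shell
# Hypothetical pre-flight helper (an assumption, not part of Beam):
# succeed if the named executable can be found on PATH, otherwise
# print a message to stderr and return non-zero.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "required command not found on PATH: $1" >&2
  return 1
}

# Example use before submitting a job with --environment_type=DOCKER:
#   require_cmd docker || exit 1
```

This only checks the machine where it is run; if the workers are separate hosts, the same check would have to run there.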

@liferoad
Collaborator

liferoad commented Nov 21, 2024

Tested this locally:

./gradlew :sdks:go:test:load:run -PloadTest.mainClass=group_by_key -Prunner=FlinkRunner -PloadTest.args='--influx_namespace=flink --influx_measurement=go_batch_gbk_1 --input_options="{\"num_records\":200000000,\"key_size\":1,\"value_size\":9}" --iterations=1 --fanout=1 --parallelism=5 --endpoint=localhost:8099 --environment_type=DOCKER --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk:latest --runner=FlinkRunner'
2024/11/21 21:58:01 Failed to execute job:      connecting to job service
failed to dial server at localhost:8099
        caused by:
context deadline exceeded
panic: Failed to execute job:   connecting to job service
        failed to dial server at localhost:8099
                caused by:
        context deadline exceeded

goroutine 1 [running]:
github.com/apache/beam/sdks/v2/go/pkg/beam/log.Fatalf({0x234e280, 0x3bc7c60}, {0x21193ff?, 0x3bc7c60?}, {0xc00078ff28?, 0x0?, 0x0?})
        /usr/local/google/home/xqhu/Dev/beam/sdks/go/pkg/beam/log/log.go:162 +0x7d
main.main()
        /usr/local/google/home/xqhu/Dev/beam/sdks/go/test/load/group_by_key/group_by_key.go:98 +0x3c9

> Task :sdks:go:test:load:run FAILED

FAILURE: Build failed with an exception.
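
The "context deadline exceeded" while dialing localhost:8099 most likely means nothing was listening on the job service port when the test started. A small sketch that waits for the port before launching Gradle, assuming `bash` is installed; `wait_for_port` is a hypothetical helper and the timeout values are arbitrary:

```shell
# Hypothetical helper (an assumption, not part of Beam): poll until
# host:port accepts a TCP connection, or give up after roughly
# timeout_s seconds.
wait_for_port() {
  host=$1
  port=$2
  timeout_s=$3
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # /dev/tcp is a bash feature, so the probe runs in an explicit bash process.
    if bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    elapsed=$((elapsed + 1))
    sleep 1
  done
  echo "nothing is listening on ${host}:${port} after ${timeout_s}s" >&2
  return 1
}

# Example use:
#   wait_for_port localhost 8099 60 || exit 1
```

An open port only shows that some process accepts connections there; it does not prove that the process is a healthy Beam job server.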

@liferoad
Collaborator

image

liferoad added a commit that referenced this issue Nov 22, 2024
From #30507 (comment), try to use the default machine types for Flink with more memory.
liferoad added a commit that referenced this issue Nov 22, 2024
From #30507 (comment), try to use the default machine types for Flink with more memory.
@liferoad
Collaborator

liferoad commented Nov 23, 2024

Steps to run a local test:

  1. Run the local Flink cluster:
wget https://downloads.apache.org/flink/flink-1.17.2/flink-1.17.2-bin-scala_2.12.tgz
tar zxvf flink-1.17.2-bin-scala_2.12.tgz
cd flink-1.17.2
./bin/start-cluster.sh
  2. Run the job server:
docker run --net=host gcr.io/apache-beam-testing/beam_portability/beam_flink1.17_job_server --flink-master=localhost:8081
  3. Run a Go test:
./gradlew :sdks:go:test:load:run -PloadTest.mainClass=group_by_key -Prunner=FlinkRunner -PloadTest.args='--influx_namespace=flink --influx_measurement=go_batch_gbk_1 --input_options="{\"num_records\":200,\"key_size\":1,\"value_size\":9}" --iterations=1 --fanout=1 --parallelism=1 --endpoint=localhost:8099 --environment_type=DOCKER --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk --runner=PortableRunner'
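
To scale the local run up toward the CI workload (200 records here versus 200000000 in the earlier attempt) without re-typing the escaped JSON each time, the argument string from the Go test command can be generated. A rough sketch, where `gbk_load_args` is a hypothetical helper and every flag value is copied from the command above:

```shell
# Hypothetical helper (an assumption, not part of Beam): print the
# -PloadTest.args value used in the Go test step, with the record
# count and parallelism filled in from the two arguments.
gbk_load_args() {
  num_records=$1
  parallelism=$2
  # In an unquoted here-document, \" is kept literally, so the output
  # contains the same backslash-escaped quotes as the original command.
  cat <<EOF
--influx_namespace=flink --influx_measurement=go_batch_gbk_1 --input_options="{\"num_records\":${num_records},\"key_size\":1,\"value_size\":9}" --iterations=1 --fanout=1 --parallelism=${parallelism} --endpoint=localhost:8099 --environment_type=DOCKER --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk --runner=PortableRunner
EOF
}

# Example use:
#   ./gradlew :sdks:go:test:load:run -PloadTest.mainClass=group_by_key \
#     -Prunner=FlinkRunner -PloadTest.args="$(gbk_load_args 200 1)"
```

Increasing `num_records` step by step this way may help locate the size at which the job starts to stall or time out.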
