Flaky-test: Failed to get tenant admin data for tenant public #15652

Closed
lhotari opened this issue May 18, 2022 · 13 comments

Comments

@lhotari
Member

lhotari commented May 18, 2022

There's a common problem that makes tests flaky: the error "Failed to get tenant admin data for tenant public" occurs when creating a namespace.

example failure

2022-05-18T06:44:24,018 - INFO  - [docker-java-stream--110056506:DockerUtils$2@252] - DOCKER.exec(ReaderMessagingTest-cpgsi-pulsar-broker-1:/pulsar/bin/pulsar-admin namespaces create public/ns-ivorhkkv --clusters ReaderMessagingTest-cpgsi): STDOUT: 2022-05-18T06:44:24,012+0000 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/namespaces/public/ns-ivorhkkv] Failed to perform http put request: javax.ws.rs.InternalServerErrorException: HTTP 500 Failed to get data from /admin/policies/public
  2022-05-18T06:44:24,079 - INFO  - [docker-java-stream--110056506:DockerUtils$2@252] - DOCKER.exec(ReaderMessagingTest-cpgsi-pulsar-broker-1:/pulsar/bin/pulsar-admin namespaces create public/ns-ivorhkkv --clusters ReaderMessagingTest-cpgsi): STDERR: --- An unexpected error occurred in the server ---
  
  Message: Failed to get data from /admin/policies/public
  
  Stacktrace:
  
  org.apache.pulsar.metadata.api.MetadataStoreException: Failed to get data from /admin/policies/public
  	at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:88)
  	at org.apache.pulsar.broker.resources.TenantResources.getTenant(TenantResources.java:62)
  	at org.apache.pulsar.broker.web.PulsarWebResource.validateClusterForTenant(PulsarWebResource.java:401)
  	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$validatePolicies$89(NamespacesBase.java:2068)
  	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
  	at org.apache.pulsar.broker.admin.impl.NamespacesBase.validatePolicies(NamespacesBase.java:2068)
  	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$internalCreateNamespace$3(NamespacesBase.java:136)
  	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
  	at java.base/java.util.concurrent.CompletableF
  2022-05-18T06:44:24,079 - INFO  - [docker-java-stream--110056506:DockerUtils$2@252] - DOCKER.exec(ReaderMessagingTest-cpgsi-pulsar-broker-1:/pulsar/bin/pulsar-admin namespaces create public/ns-ivorhkkv --clusters ReaderMessagingTest-cpgsi): STDERR: uture.postComplete(CompletableFuture.java:510)
  	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
  	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$existsFromStore$8(ZKMetadataStore.java:315)
  	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  	at java.base/java.lang.Thread.run(Thread.java:833)
  Caused by: java.util.concurrent.TimeoutException
  	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960)
  	at java.base/java.util.concurr
@lhotari
Member Author

lhotari commented May 18, 2022

Similar problem in a completely different test.

2022-05-18T07:03:48,574+0000 [metadata-store-12-1] ERROR org.apache.pulsar.broker.web.PulsarWebResource - Failed to get tenant admin data for tenant public
org.apache.pulsar.metadata.api.MetadataStoreException: Failed to get data from /admin/policies/public
	at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:88) ~[org.apache.pulsar-pulsar-broker-common-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.resources.TenantResources.getTenant(TenantResources.java:62) ~[org.apache.pulsar-pulsar-broker-common-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.web.PulsarWebResource.validateClusterForTenant(PulsarWebResource.java:401) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$validatePolicies$89(NamespacesBase.java:2068) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.validatePolicies(NamespacesBase.java:2068) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$internalCreateNamespace$3(NamespacesBase.java:136) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$existsFromStore$8(ZKMetadataStore.java:315) ~[org.apache.pulsar-pulsar-metadata-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2095) ~[?:?]
	at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:83) ~[org.apache.pulsar-pulsar-broker-common-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	... 17 more
2022-05-18T07:03:48,578+0000 [metadata-store-12-1] ERROR org.apache.pulsar.broker.admin.v2.Namespaces - [null] Failed to create namespace public/ns-vxxptgiy
java.util.concurrent.CompletionException: org.apache.pulsar.broker.web.RestException: HTTP 500 Failed to get data from /admin/policies/public
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:722) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$existsFromStore$8(ZKMetadataStore.java:315) ~[org.apache.pulsar-pulsar-metadata-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.apache.pulsar.broker.web.RestException: HTTP 500 Failed to get data from /admin/policies/public
	at org.apache.pulsar.broker.web.PulsarWebResource.validateClusterForTenant(PulsarWebResource.java:408) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$validatePolicies$89(NamespacesBase.java:2068) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.lang.Iterable.forEach(Iterable.java:75) ~[?:?]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.validatePolicies(NamespacesBase.java:2068) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at org.apache.pulsar.broker.admin.impl.NamespacesBase.lambda$internalCreateNamespace$3(NamespacesBase.java:136) ~[org.apache.pulsar-pulsar-broker-2.11.0-SNAPSHOT.jar:2.11.0-SNAPSHOT]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
	... 10 more

@lhotari lhotari changed the title from "Flaky-test: ReaderMessagingTest.testReaderReconnectAndRead" to "Flaky-test: Failed to get tenant admin data for tenant public" May 18, 2022
@lhotari
Member Author

lhotari commented May 18, 2022

@Technoboy- Have you seen these errors? I'm asking because the changes in #15518 seem to be in the same area where these timeouts happen.

@lhotari
Member Author

lhotari commented May 18, 2022

#15603 might be one of the changes to check as well.

I'm running an experiment where I revert both #15603 and #15518 to see if that makes a difference.

Experiment is in my own fork, lhotari#70

@lhotari
Member Author

lhotari commented May 18, 2022

Yet another flaky failure, this one in @wolfstudy's PR #15628: https://github.com/apache/pulsar/runs/6463528996?check_suite_focus=true

@Technoboy- Technoboy- self-assigned this May 18, 2022
@Technoboy-
Contributor

Yes, this is related to #15518.
There is one place in createNamespace that still calls a sync method. I will fix it later.
Thanks @lhotari for creating this issue.
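
For readers following the stack traces above: they show a timed `CompletableFuture.get` being invoked from a callback that is itself running on the `metadata-store-12-1` thread. The sketch below is a minimal illustration of why that pattern can time out, assuming a single-threaded executor stands in for the metadata store thread. Every name in it (`SyncCallInCallbackSketch`, `readAsync`, `readSync`, the paths, the 200 ms timeout) is hypothetical and is not Pulsar code; the actual fix in d2c0fc8 is not shown in this thread and may look different.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncCallInCallbackSketch {

    // Hypothetical async read: the result is produced on the store thread.
    static CompletableFuture<String> readAsync(ExecutorService store, String path) {
        return CompletableFuture.supplyAsync(() -> "data:" + path, store);
    }

    // Hypothetical sync wrapper: blocks the caller until the async read
    // finishes or the (assumed) 200 ms timeout expires.
    static String readSync(ExecutorService store, String path) throws Exception {
        return readAsync(store, path).get(200, TimeUnit.MILLISECONDS);
    }

    // Problem pattern: a sync read issued from a callback on the store thread.
    static String blockingInCallback() throws Exception {
        ExecutorService store = Executors.newSingleThreadExecutor();
        try {
            return readAsync(store, "/namespace")
                    // thenApplyAsync(..., store) pins the callback to the store
                    // thread, mirroring a callback fired on the metadata thread.
                    .thenApplyAsync(ns -> {
                        try {
                            // The only store thread is busy running this callback,
                            // so the nested read stays queued until the timeout.
                            return readSync(store, "/tenant");
                        } catch (TimeoutException e) {
                            return "timeout";
                        } catch (Exception e) {
                            throw new IllegalStateException(e);
                        }
                    }, store)
                    .get(5, TimeUnit.SECONDS);
        } finally {
            store.shutdownNow();
        }
    }

    // Non-blocking alternative: chain the second read instead of waiting for it.
    static String composedAsync() throws Exception {
        ExecutorService store = Executors.newSingleThreadExecutor();
        try {
            return readAsync(store, "/namespace")
                    .thenCompose(ns -> readAsync(store, "/tenant"))
                    .get(5, TimeUnit.SECONDS);
        } finally {
            store.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(blockingInCallback()); // prints "timeout"
        System.out.println(composedAsync());      // prints "data:/tenant"
    }
}
```

In the sketch, the blocking variant can never succeed because the thread that would serve the nested read is the one waiting for it, while the composed variant releases the store thread between the two reads.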

@Technoboy-
Contributor

Make a fix here d2c0fc8

@lhotari
Member Author

lhotari commented May 18, 2022

Make a fix here d2c0fc8

@Technoboy- looks good, is that already in a PR?

@Technoboy-
Contributor

Make a fix here d2c0fc8

@Technoboy- looks good, is that already in a PR?

It is included in #15605.

@lhotari
Member Author

lhotari commented May 18, 2022

@Technoboy- It looks like #15605 is very flaky since you had to re-run tests. Did you investigate the failures?

I reverted a few PRs which were dependent in my experiment lhotari#70 and all tests passed on the first run attempt without any re-running. Perhaps I was just lucky...

@Technoboy-
Contributor

@Technoboy- It looks like #15605 is very flaky since you had to re-run tests. Did you investigate the failures?

I reverted a few PRs which were dependent in my experiment lhotari#70 and all tests passed on the first run attempt without any re-running. Perhaps I was just lucky...

I re-ran the tests only once, and that was for an old flaky test. I also closed and reopened 4 patches to test the fix, and they all passed.
