fix: Fix flaky test test_acl_revoke_pub_sub_while_subscribed #3768

Merged
chakaz merged 1 commit into main from chakaz/pubsub-acl on Sep 24, 2024


chakaz
Collaborator

@chakaz chakaz commented Sep 23, 2024

The reason it failed is that, in some rare cases, the subscriber did not
get the first few messages of the publisher. This is likely due to
timing of subscribe and publish, in different connections / threads.

Given Pub/Sub has very weak guarantees, it's probably ok as is, so I
just added a sleep to get the test to pass always.

Fixes #3678
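
To illustrate the suspected race, here is a minimal in-memory sketch. `ToyBroker` and `run_scenario` are hypothetical names made up for this illustration; they are not Dragonfly or redis-py code. The sketch assumes only that Pub/Sub is fire-and-forget, so anything published before the subscription takes effect is dropped rather than buffered.

```python
class ToyBroker:
    """Hypothetical broker: delivers only to subscribers registered right now."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def publish(self, data):
        # Fire-and-forget: nothing is kept for subscribers that join later.
        for inbox in self.subscribers:
            inbox.append(data)
        return len(self.subscribers)


def run_scenario(subscription_lands_before):
    """Publish message0..message9; the subscription takes effect just before
    the message whose index is `subscription_lands_before`."""
    broker = ToyBroker()
    inbox = []
    for i in range(10):
        if i == subscription_lands_before:
            inbox = broker.subscribe()
        broker.publish(f"message{i}")
    return inbox


print(run_scenario(0)[0])  # message0
print(run_scenario(4)[0])  # message4
```

When the subscription lands late, the first message the subscriber sees is not `message0`, which is the shape of the failure in the test.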

async with async_timeout.timeout(10):
    while total_msgs != 10:
        try:
            res = await channel.get_message(ignore_subscribe_messages=True, timeout=5)
            if res is None:
Contributor

so get_message can return None even if it doesn't timeout 😱

Collaborator Author

yep, I've seen it, but it still doesn't explain everything.
See the failure, it shows receiving message4 while expecting message0:

2024-09-23T10:58:07.4929895Z >                   assert res["data"] == f"message{total_msgs}"
2024-09-23T10:58:07.4930555Z E                   AssertionError: assert equals failed
2024-09-23T10:58:07.4931314Z E                     'message4'  'message0'

(unprintable coloring characters removed)

Contributor

wow
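
A receive loop that tolerates `None` results from `get_message` could look roughly like the sketch below. `StubPubSub` and `collect` are hypothetical names introduced only so the example runs without a server, and `asyncio.wait_for` is swapped in for `async_timeout` to keep it standard-library only. This is not the test's actual code.

```python
import asyncio


class StubPubSub:
    """Hypothetical stand-in for a pubsub object; not the redis-py class."""

    def __init__(self, replies):
        self._replies = list(replies)

    async def get_message(self, ignore_subscribe_messages=False, timeout=0.0):
        await asyncio.sleep(0)  # yield to the event loop, as a network read would
        if self._replies:
            return self._replies.pop(0)
        return None


async def collect(channel, expected, overall_timeout=10.0):
    """Gather `expected` data payloads, polling again whenever None comes back."""
    received = []

    async def loop():
        while len(received) < expected:
            res = await channel.get_message(ignore_subscribe_messages=True, timeout=5)
            if res is None:
                continue  # nothing delivered this round; not treated as a failure
            received.append(res["data"])

    # The overall deadline bounds the loop even if messages never arrive.
    await asyncio.wait_for(loop(), timeout=overall_timeout)
    return received


replies = [None, {"data": "message0"}, None, {"data": "message1"}]
print(asyncio.run(collect(StubPubSub(replies), expected=2)))
# ['message0', 'message1']
```

Skipping `None` only covers empty polls. It does not recover messages that were published before the subscription took effect.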

@chakaz chakaz changed the title from "WIP WIP WIP DO NOT REVIEW" to "fix: Fix flaky test test_acl_revoke_pub_sub_while_subscribed" on Sep 24, 2024
@chakaz chakaz requested a review from kostasrim September 24, 2024 08:06
Contributor

@kostasrim kostasrim left a comment

Good work!

@@ -707,6 +710,10 @@ async def subscribe_worker(channel: aioredis.client.PubSub):
    subscriber_obj = subscriber.pubsub()
    await subscriber_obj.subscribe("channel")

    # There's a rare timing issue if we don't wait here, but given the weak guarantees of Pub/Sub,
    # that's probably OK.
    await asyncio.sleep(1)
Contributor

@kostasrim kostasrim Sep 24, 2024

If you look back at my PR that added the logs, I had the exact same suspicion. I added the asyncio.sleep on line 698 hoping it would help the producer side (so that the subscriber would then get all the messages). Little did I know that we also needed one here, because I assumed that the subscribe above would be enough to receive all of the messages. Oh well 🤷 :)

Collaborator Author

Yes, I agree that if it were a different command, requiring such a sleep would have been a bug. But Pub/Sub has very weak guarantees.
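
For comparison, one commonly used alternative to a fixed sleep is to wait for the subscribe confirmation before letting the publisher start. The sketch below assumes the client surfaces that confirmation as a message with `type == "subscribe"` when `ignore_subscribe_messages` is False, which is how redis-py reports it. `AckingPubSub` and `subscribe_and_confirm` are hypothetical names used only to make the example runnable, and whether this approach would have removed the flakiness in this test has not been checked.

```python
import asyncio


class AckingPubSub:
    """Hypothetical stand-in that queues a subscribe confirmation message."""

    def __init__(self):
        self._pending = []

    async def subscribe(self, name):
        # Assumption: subscribe() returns before the confirmation is read.
        self._pending.append({"type": "subscribe", "channel": name, "data": 1})

    async def get_message(self, ignore_subscribe_messages=False, timeout=0.0):
        await asyncio.sleep(0)  # yield to the event loop
        while self._pending:
            msg = self._pending.pop(0)
            if ignore_subscribe_messages and msg["type"] == "subscribe":
                continue
            return msg
        return None


async def subscribe_and_confirm(pubsub, name, overall_timeout=5.0):
    """Subscribe, then block until the confirmation message is observed."""
    await pubsub.subscribe(name)

    async def wait_for_ack():
        while True:
            msg = await pubsub.get_message(ignore_subscribe_messages=False, timeout=1)
            if msg is not None and msg["type"] == "subscribe":
                return msg

    return await asyncio.wait_for(wait_for_ack(), timeout=overall_timeout)


ack = asyncio.run(subscribe_and_confirm(AckingPubSub(), "channel"))
print(ack["type"])  # subscribe
```

The sleep in the merged change is simpler. The confirmation approach trades that simplicity for not depending on a hard-coded delay.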

@chakaz chakaz merged commit 9aadc0c into main Sep 24, 2024
12 checks passed
@chakaz chakaz deleted the chakaz/pubsub-acl branch September 24, 2024 08:47
Development

Successfully merging this pull request may close these issues.

test_acl_revoke_pub_sub_while_subscribed failed
2 participants