Misc test flakyness fixes #5233

Merged
koivunej merged 5 commits into main from misc_test_fixes on Sep 11, 2023

Member

@koivunej koivunej commented Sep 7, 2023

Assorted flakiness fixes from #5198; these tests might not be flaky on main.

Migrate some tests from neon_simple_env to plain neon_env_builder with initial_tenant, which makes the flakiness easier to understand. (I did not understand the flakiness of test_timeline_create_break_after_uninit_mark.)

test_download_remote_layers_api is flaky because we have no atomic "wait for WAL, checkpoint, wait for upload and do not receive any more WAL".
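
A toy model of that race, as a sketch only: ToyPageserver and its fields are hypothetical stand-ins, not the real pageserver or test fixture API. It shows how running the three steps separately lets late WAL slip in between the checkpoint and the upload.

```python
from dataclasses import dataclass


@dataclass
class ToyPageserver:
    """Hypothetical stand-in for one timeline; not the real API."""

    ingested_lsn: int = 0  # last WAL position received
    on_disk_lsn: int = 0   # last position flushed by a checkpoint
    uploaded_lsn: int = 0  # last position present in remote storage

    def ingest(self, upto: int) -> None:
        self.ingested_lsn = max(self.ingested_lsn, upto)

    def checkpoint(self) -> None:
        self.on_disk_lsn = self.ingested_lsn

    def upload(self) -> None:
        self.uploaded_lsn = self.on_disk_lsn


def racy_sequence(ps: ToyPageserver, late_wal: int) -> bool:
    """Run the steps one by one; WAL may arrive in between them."""
    ps.ingest(100)       # step 1: wait for WAL up to LSN 100
    ps.checkpoint()      # step 2: checkpoint
    ps.ingest(late_wal)  # more WAL arrives after the checkpoint
    ps.upload()          # step 3: wait for upload
    # The test wants everything ingested to be in remote storage.
    return ps.uploaded_lsn == ps.ingested_lsn


print(racy_sequence(ToyPageserver(), late_wal=100))  # True: no late WAL
print(racy_sequence(ToyPageserver(), late_wal=120))  # False: remote lags
```

In this model the sequence only holds when nothing is ingested after the checkpoint, which is what an atomic operation would have to guarantee.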

test_tenant_size fixes are just boilerplate which should have always existed; we should wait for the tenant to be active. The same applies to test_timeline_delete.
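
The "wait for the tenant to be active" boilerplate is roughly a polling loop. A minimal sketch, assuming a hypothetical get_state callable and an "Active" state string; neither is taken from the actual test fixtures:

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    attempts: int = 10,
    interval: float = 0.1,
) -> None:
    """Poll get_state() until it reports "Active" or attempts run out."""
    state = "<never polled>"
    for _ in range(attempts):
        state = get_state()
        if state == "Active":
            return
        time.sleep(interval)
    raise TimeoutError(f"tenant not active after {attempts} polls, last state: {state}")


# Usage with a fake tenant that becomes active on the third poll.
fake_states = iter(["Loading", "Attaching", "Active"])
wait_until_active(lambda: next(fake_states), interval=0)
print("tenant is active")
```

Raising with the last observed state makes a timeout easier to diagnose than a bare assertion failure later in the test.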

test_timeline_size_post_checkpoint often fails for me because it reads zero from metrics. Give it a few attempts.
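
"A few attempts" could look like the following sketch. read_metric is a hypothetical callable standing in for whatever the test uses to read the metric; the attempt count and interval are arbitrary choices for illustration.

```python
import time
from typing import Callable


def read_nonzero_metric(
    read_metric: Callable[[], float],
    attempts: int = 3,
    interval: float = 0.5,
) -> float:
    """Re-read a metric a few times, returning the first non-zero value."""
    for attempt in range(attempts):
        value = read_metric()
        if value != 0:
            return value
        if attempt + 1 < attempts:
            time.sleep(interval)  # no sleep after the final attempt
    raise AssertionError(f"metric still zero after {attempts} attempts")


# Usage with a fake metric that reads zero twice before reporting a size.
fake_readings = iter([0.0, 0.0, 8192.0])
print(read_nonzero_metric(lambda: next(fake_readings), interval=0))  # 8192.0
```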

@github-actions

github-actions bot commented Sep 7, 2023

1644 tests run: 1571 passed, 0 failed, 73 skipped (full report)


Code coverage (full report)

  • functions: 53.1% (7601 of 14319 functions)
  • lines: 81.4% (44732 of 54934 lines)

The comment gets automatically updated with the latest test results
64a1749 at 2023-09-11T08:20:45.328Z :recycle:

@koivunej
Member Author

koivunej commented Sep 7, 2023

  • test_delete_tenant_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-noop-tenant-delete-before-create-local-mark-False]: release

and

  • test_delete_tenant_exercise_crash_safety_failpoints[Check.RETRY_WITH_RESTART-noop-tenant-delete-before-create-local-mark-False]: release

Looks like a missing ... wait? I'll handle it later; it seems I will be collecting these forever-PRs permanently if I always want to fix the next flaky test.

It seems to have been marked as flaky in the previous test run, so it needs more wait_tenant_become_active.

@koivunej koivunej marked this pull request as ready for review September 7, 2023 15:33
@koivunej
Member Author

koivunej commented Sep 7, 2023

f7a0114 was green, but cloud e2e failed; I don't want to wait for it.

@koivunej koivunej enabled auto-merge (squash) September 7, 2023 15:40
@koivunej koivunej disabled auto-merge September 7, 2023 16:57
@koivunej koivunej enabled auto-merge (squash) September 7, 2023 17:40
@koivunej koivunej requested a review from hlinnaka September 7, 2023 17:40
@koivunej
Member Author

koivunej commented Sep 7, 2023

Do we now have a requirement that branches need to be up to date? I needed to rebase but there were no conflicts.

@arpad-m
Member

arpad-m commented Sep 7, 2023

> Do we now have a requirement that branches need to be up to date? I needed to rebase but there were no conflicts.

From my understanding, the message is not there because of a requirement for PRs, but to expose the button, so that CI improvements reach contributor PRs more quickly. It's faster to press the button than to ask contributors to update, or to push the update yourself manually. See this Slack message.

@koivunej koivunej merged commit a55a78a into main Sep 11, 2023
30 checks passed
@koivunej koivunej deleted the misc_test_fixes branch September 11, 2023 08:42
koivunej added a commit that referenced this pull request Sep 16, 2023
The test is still flaky, perhaps more after #5233, see #3831.

Do one more `timeline_checkpoint` *after* shutting down safekeepers
*before* shutting down pageserver. Put more effort into not compacting
or creating image layers.
3 participants