CI is extremely flaky when running Presto/Hive tests #17750

Closed
suddjian opened this issue Dec 14, 2021 · 4 comments · Fixed by #17782

Comments

@suddjian
Member

suddjian commented Dec 14, 2021

These tests are the same ones that run against other analytics DBs, but for some reason when we run them against Presto/Hive we get false positives (or are they false negatives? 🤔) at a much greater rate. These tests also seem to take much longer to run, which may be related.

If we can get to the bottom of this it will massively improve the experience of contributing to Superset.

How to reproduce the bug

Run CI a few times.

Expected results

CI should pass if the PR hasn't introduced any bugs.

Actual results

It fails a lot. Often multiple times in a row, with different tests.

Additional context

I want to start putting together a list of the tests that have failed, to see if there is some sort of pattern. Feel free to add new entries to this list (only if you are certain that it really is a false positive!). Please also link to a relevant line of the GitHub Action where the test failed.

@suddjian suddjian added the #bug Bug report label Dec 14, 2021
@suddjian suddjian removed the #bug Bug report label Dec 14, 2021
@zhaoyongjie
Member

zhaoyongjie commented Dec 15, 2021

Here is a small clue for this issue. I will try updating the images or increasing the Java heap memory to fix it.

https://github.com/apache/superset/runs/4530486125?check_suite_focus=true#step:9:5105

@rusackas
Member

rusackas commented Dec 15, 2021

Just one more case:

In this PR, I've had repeated failures of test-postgres-presto (3.8) on the test test_get_dashboard_view__user_access_with_dashboard_permission. The error I see (here) is:

ERROR at setup of TestDashboardRoleBasedSecurity.test_get_dashboard_view__user_access_with_dashboard_permission

I'm on my third run now... wish me luck 🤞

@rusackas
Member

Dropping a link to the Slack thread where there's more discussion on this.

@zhaoyongjie
Member

zhaoyongjie commented Dec 16, 2021

I found that some large SQL INSERT statements cause a Presto null-pointer error. Submitting the same SQL through presto-cli or JDBC works without any problems.

(superset) yongjie.zhao@:incubator-superset$ pytest -s --durations=0 tests/integration_tests/security_tests.py
....
....
0.03s teardown tests/integration_tests/security_tests.py::TestDatasources::test_get_user_datasources_gamma_with_schema
0.03s call     tests/integration_tests/security_tests.py::TestSecurityManager::test_raise_for_access_query
0.03s call     tests/integration_tests/security_tests.py::TestSecurityManager::test_raise_for_access_table
0.02s call     tests/integration_tests/security_tests.py::TestDatasources::test_get_user_datasources_admin
0.01s call     tests/integration_tests/security_tests.py::TestDatasources::test_get_user_datasources_gamma
0.01s call     tests/integration_tests/security_tests.py::TestDatasources::test_get_user_datasources_gamma_with_schema
0.01s call     tests/integration_tests/security_tests.py::TestSecurityManager::test_can_access_table
0.01s teardown tests/integration_tests/security_tests.py::TestRolePermission::test_public_sync_role_builtin_perms

(69 durations < 0.005s hidden.  Use -vv to show these durations.)
================================================================== short test summary info ==================================================================
FAILED tests/integration_tests/security_tests.py::TestRolePermission::test_public_sync_role_data_perms - sqlalchemy.orm.exc.NoResultFound: No row was foun...
=================================================== 1 failed, 45 passed, 13 warnings in 235.82s (0:03:55) ===================================================
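
For anyone collecting failures to look for a pattern, here is a minimal sketch of how the `FAILED` / `ERROR` lines in pytest's short test summary could be tallied across several CI runs. It assumes the job logs have already been saved as plain text; the names `tally_failures` and `SUMMARY_LINE` are made up for illustration and are not part of Superset or pytest.

```python
import re
from collections import Counter

# Matches pytest short-summary lines such as:
#   FAILED tests/foo.py::TestBar::test_baz - SomeError: ...
# Requiring "::" in the id skips lines like "ERROR at setup of ...".
SUMMARY_LINE = re.compile(r"(?:FAILED|ERROR) (\S+::\S+)")


def tally_failures(logs):
    """Count, for each test id, how many logs report it as failed.

    `logs` is an iterable of strings, one per CI run (hypothetical input;
    how the logs are downloaded is left out of this sketch).
    """
    counts = Counter()
    for log in logs:
        seen_in_this_log = set()
        for line in log.splitlines():
            match = SUMMARY_LINE.search(line)
            if match:
                seen_in_this_log.add(match.group(1))
        # Count each test at most once per run.
        counts.update(seen_in_this_log)
    return counts
```

Calling `tally_failures(list_of_log_strings).most_common()` then lists the most frequently failing tests first, which might make it easier to see whether the failures cluster around particular test classes or fixtures.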
