-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix part of #5312, part of #59: Introduce better script execution support #5313
Fix part of #5312, part of #59: Introduce better script execution support #5313
Conversation
These issues were found after I started using a new development environment.
ProfileAndDeviceIdFragmentTest had been updated to use a newer fragment initialization pattern, but that's no longer needed and seems to be causing what appears to be timing discrepancies between local dev and CI.
The issue ultimately arose from test parameters being initialized after they're needed in the launched UI. This type of change was tried earlier in the branch, but reverted since it didn't seem necessary. It is, however, necessary when there are environment differences (e.g. local vs. CI) or when running certain tests individually. Due to the difficulty in finding this issue, ActivityScenarioRule has been added as a prohibited pattern in the static regex checks (along with ActivityTestRule since that's deprecated and discouraged, anyway).
The test was suffering from some proto encoding inconsistencies that seem to occur between some development machines vs. on CI. The fix improves the test's robustness by extracting the raw encoded string, verifying that the other outputs in the intent message correctly correspond to that string, and that the string (as a parsed proto) contains the correct values. As a result, the test no longer depends on a hardcoded encoding value to be present for verification. This does result in a bit more logic than is generally good to have in a test (and it lengthened the test code quite a bit), but it seems necessary in this particular case.
This is a script-specific dispatcher which will allow for better asynchronous support in upcoming PRs (especially for command execution). This change serves to prepare for those changes.
The main change here is ensuring that Bazel 4.0.0 is used & bzlmod disabled in newer versions of Bazel when running operations in a test Bazel environment. This commit also introduces some more timing tweaks on CommandExecutor for some tests, though these only affect very specific tests (as many script tests directly call a script's main() function and thus don't overwrite its executor behavior). This commit attempted to introduce "--batch" mode to runs, but the isolation didn't actually seem to improve stability and, instead, substantially slowed down some of the tests.
…ipt-execution-support Conflicts: scripts/src/java/org/oppia/android/scripts/ci/ComputeAffectedTests.kt scripts/src/javatests/org/oppia/android/scripts/ci/ComputeAffectedTestsTest.kt scripts/src/javatests/org/oppia/android/scripts/common/BazelClientTest.kt
Plus, actually makes use of the new script background dispatcher in CommandExecutorImpl to make the new wiring make more sense (though its real utility will come in a follow-up PR).
…ipt-execution-support
Hi @BenHenning, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 7 days, it will be automatically closed so that others can take up the issue. |
This was done by removing the //testing dependency and, instead, having instrumentation targets depend on the direct module within //testing that they need to build. This module & its corresponding implementation binding (and tests) needed to be moved out of //testing and into their own /firebase package.
Also includes some minor clean-ups within TestBazelWorkspaceTest.
Hi @BenHenning, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 7 days, it will be automatically closed so that others can take up the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BenHenning!
This was pretty clear to read. I had a question, but non-blocking. PTAL.
Hi @BenHenning, this PR is ready to be merged. Please address any remaining comments prior to merging, and feel free to merge this PR once the CI checks pass and you're happy with it. Thanks! |
Latest develop changes merged cleanly, and your comment appeared to be non-blocking so I resolved it @adhiamboperes (please let me know if you do want follow-up work done here). |
@seanlip PTAL for codeowners. I've also set auto-merge in case there are no problems found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for scripts/assets/maven_dependencies.textproto
Unassigning @seanlip since they have already approved the PR. |
Hi @BenHenning, this PR is ready to be merged. Please address any remaining comments prior to merging, and feel free to merge this PR once the CI checks pass and you're happy with it. Thanks! |
) ## Explanation Fixes #5312 This updates the TODO open check script to be locally runnable rather than requiring the developer to manually download the list of issues from GitHub to analyze. This simplifies things and allows the script to be easily run within the ``static_checks.sh`` script. This is being done via interacting directly with GitHub using its RESTful API (see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues) in conjunction with the user's local auth token used to set up their ``gh`` tool (which needs to be set up, hence the changes to the wiki documentation and clear error messages from the new ``GitHubClient`` utility). To further simplify things, a ``regenerate`` mode was added to regenerate the TODO exemptions textproto file (which is helpful for #4929 hence why this comes before that PR). The new command syntax to perform the TODO check is: ```sh bazel run //scripts:todo_open_check -- <path_to_dir_root> <path_to_proto_binary> [regenerate] ``` With a specific example to just perform the check: ```sh bazel run //scripts:todo_open_check -- $(pwd) scripts/assets/todo_open_exemptions.pb ``` And an example to also perform regeneration: ```sh bazel run //scripts:todo_open_check -- $(pwd) scripts/assets/todo_open_exemptions.pb regenerate ``` Some other things specifically to note: - TODO exemptions needed to be updated in this PR due to TODO utility & test changes. The new file was created using the new regenerate functionality. - The TODO check has been added to the end of ``static_checks.sh``. - The GitHub CI workflow was updated to use the new script syntax appropriately. - This is the first time scripts have been updated to integrate with Retrofit, and this setup is going to be reused in the future for other services. - The data model for issues has been updated to better represent the remote data structure. - Moshi is being used along with Retrofit for an easier interaction with GitHub as a remote endpoint. All of this has been wrapped in a new ``GitHubClient``. - ``GitHubClient`` is designed to download all issues regardless of length (whereas before the manual download step was limited to the first 2000 issues of the repository) using pagination. - New tests were added to verify the regenerate flow (and properly set up the mock OkHttp server since the script now relies on an HTTP endpoint to download the GitHub issues itself). - ``GitHubIssue`` is exempted from tests since it's just a basic data structure, so there's no specific logic to test. - ``GitHubService`` is exempted from tests since it's a template to generate code via Retrofit's annotation processor, so there's no specific logic to test. - All scripts proto libraries were updated to use normal Java (rather than Java lite) generation to provide text format support. - The file paths included in ``TodoOpenCheck``'s output has been simplified to be relative to the repository root rather than using absolute paths (for parity with many other app scripts). - Since ``GitHubClient``'s tests required interacting with ``gh``, a new ``FakeCommandExecutor`` was added (along with its own tests) to provide support for orchestrating local utilities. This may be useful in other tests in the future, though some of those script tests intentionally integrate with environment commands like ``git`` and ``bazel``. ## Essential Checklist - [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".) - [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation. - [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide). - [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)). - [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop". - [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)). ## For UI-specific PRs only This is an infrastructure-only PR. --------- Co-authored-by: Adhiambo Peres <59600948+adhiamboperes@users.noreply.github.com> Co-authored-by: Sean Lip <sean@seanlip.org>
## Explanation Fix part of #59 Fix part of #3926 As part of developing downstream PRs for #59, it was discovered that PRs which change a LOT of files (such as #4937) can run into problems where ComputeAffectedTests simply times out trying to compute the entire list of affected targets. **Critical performance and compatibility fixes** There have been past efforts to optimize the affected tests workflows (bucketing, breaking up some of the computations), but it was discovered that the most expensive part of the process is running the ``rbuildfiles`` query to figure out which BUILD files were affected by changed files. It was known this was an expensive step in the past, but it wasn't clear until this PR exactly how to address it. This PR changes the script to now: - Filter ``rbuildfiles`` to only run under first-party targets (since it shouldn't be possible for third-party BUILD files to be affected by first-party changes). This reduces the search graph. - Introduce Bazel command sharding for this step like the script already does for others. This breaks up the command to run on a subset of files multiple times, combining the output rather than running a single command with a large input. It seems that ``rbuildfiles`` does not scale linearly with its input size, so this drastically improves the script's performance. It's thought that this approach is also more logically correct due to more correct sibling matching semantics, but it's a bit hard to reason about ``bazel query`` behavior at times so I can't be 100% confident in this. Nevertheless, the existing tests pass and I haven't seen any issues from using these changes in downstream PRs. Separately, another issue was discovered wherein some commands (including certain Bazel commands) can actually cause ``CommandExecutorImpl`` to soft-lock and always time out. This was due to an issue in the previous implementation wherein it would wait to read a command's stdout until after the timeout has been completed (i.e. it assumed the process would finish). This isn't correct, however: stdout is blocking I/O, and some commands are implemented to only continue execution after their standard output is read. The new implementation makes use of coroutine actors to consume stdout and stderr at the same time as waiting for execution to ensure all commands can continue execution and that they finish within the desired timeout. Note that the new ``ScriptBackgroundCoroutineDispatcher`` was actually introduced (in #5313) specifically to support this new ``CommandExecutorImpl`` implementation, though it has since been found to have lots of other nice benefits by providing scripts with a reliable mechanism for performing asynchronous operations without having to manage their own execution dispatcher. Command execution for Bazel commands has also been updated to time out after 5 minutes rather than the previous 1 minute. Despite the optimizations and robustness improvements above, some commands still take quite some time to run for especially large and complex cases. While this change may result in a slower failure turnaround in cases when commands are soft-locked, it should result in better CI and script robustness in the long-term. **Better support for chained PRs & possibly merge queues** ComputeAffectedTests was also updated to use a merge base commit rather than a reference to the develop branch (where this new commit hash is provided by the CI workflow). The idea behind this is that the merge base commit is: - More logically correct (as ``ComputeAffectedTests`` is meant to run in contexts where a branch wants to be merged into a destination). - Better compatible with chained PRs. This allows for **significantly** better CI performance in chained PR situations since now only the tests relevant to the child PR will be run rather than all tests for the child & its parental hierarchy (see downstream PRs' CI runs to see this working in action). - Hopefully better compatibility with merge queues (#3926). This hasn't been verified, but the flexibility in customizing the destination for affected tests should be the main prerequisite to properly supporting merge queues. **Other changes** ``GitClient`` was updated to have a peace-of-mind check to ensure the base commit (provided as explained in the previous section) matches the merge base of the current branch. This should always be true (except maybe in merge queues--this will need to be verified). Note that this is only a soft warning, not an assertion failure. ``RepositoryFile`` was cleaned up slightly to be a bit more consistent with other directory management approaches done in scripts. I can see this being refactored more in the future. Callsite behavior isn't expected to be affected by these changes. Some script tests were updated to have consistent formatting (which required updating the TODO exemptions). ``TodoIssueResolvedCheckTest`` and ``TodoOpenCheckTest`` also had some of their test file management cleaned up a bit. **A note on testing** These are inherently difficult things to test. I've verified what I could via CI and general observation, but I've also largely relied on existing tests to catch regressions (and many were caught during changes to the script). Since these are mainly implementation and not behavioral changes, I'm comfortable with the level of testing that was done. ## Essential Checklist - [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".) - [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation. - [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide). - [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)). - [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop". - [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)). ## For UI-specific PRs only N/A -- This is an infrastructure-only change. --------- Co-authored-by: Adhiambo Peres <59600948+adhiamboperes@users.noreply.github.com> Co-authored-by: Sean Lip <sean@seanlip.org>
Explanation
Fixes part of #5312
Fixes part of #59
This PR helps prepare for changes coming in #5315 and #4929 (the latter of which is the start of the main upcoming Bazel migration PR chain) by introducing one main scripts-based change:
ScriptBackgroundCoroutineDispatcher
: a Kotlin coroutine dispatcher for executing asynchronous tasks in scripts that also supports proper Java executor service shutdown (so that scripts don't hang). This dispatcher is multi-threaded to help simplify executing large numbers of parallel background tasks.All scripts have been migrated over to running their primary operations within the context of this new dispatcher. Relevant script utilities have been updated to use it, including
CommandExecutor
(though this is mainly a placeholder change for the main executor changes which are coming in #4929).Miscellaneous details to note:
GenerateMavenDependenciesList.kt
andwiki/Updating-Maven-Dependencies.md
. These aren't functionally needed, they were just something I noticed while developing.kotlinx-coroutines-core
was updated from 1.4.1 to 1.4.3 in order to work around StackOverflow in DispatchedContinuation Kotlin/kotlinx.coroutines#2371 which was causing flakiness in one of the new dispatcher tests.testClose_pendingTaskLongerThanCloseTimeout_taskIsNotRun
intentionally takes ~2 seconds to run in order to provide some assurance that, without cancellation, the task would run and the test would fail (this has been manually verified in a few different situations of the dispatcher and/or test changing; some changes won't result in a failure due to how cancellation works internally for executor service & the converted coroutine dispatcher).Note that historically these changes were originally part of #4929, but they were split out so that they could be used by #5315 (which ended up being convenient to include prior to #4929).
Essential Checklist
For UI-specific PRs only
This PR doesn't include any user-facing changes since it only impacts scripts.