Fix part of #5312, part of #59: Introduce better script execution support #5313

BenHenning · 2024-01-19T22:01:22Z

Explanation

Fixes part of #5312
Fixes part of #59

This PR helps prepare for changes coming in #5315 and #4929 (the latter of which is the start of the main upcoming Bazel migration PR chain) by introducing one main scripts-based change: ScriptBackgroundCoroutineDispatcher, a Kotlin coroutine dispatcher for executing asynchronous tasks in scripts that also supports proper Java executor service shutdown (so that scripts don't hang). This dispatcher is multi-threaded to help simplify executing large numbers of parallel background tasks.

All scripts have been migrated over to running their primary operations within the context of this new dispatcher. Relevant script utilities have been updated to use it, including CommandExecutor (though this is mainly a placeholder change for the main executor changes which are coming in #4929).
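To make the dispatcher idea above concrete, here is a minimal sketch of a closable, multi-threaded coroutine dispatcher. It is not the PR's implementation: the class name `ClosableBackgroundDispatcher`, the thread count, and the close timeout are assumptions chosen for illustration, and the block assumes kotlinx-coroutines-core is on the classpath.

```kotlin
import java.io.Closeable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a script dispatcher; the name and defaults are assumptions.
class ClosableBackgroundDispatcher(
  threadCount: Int = 4,
  private val closeTimeoutMs: Long = 5_000
) : CoroutineDispatcher(), Closeable {
  // The default thread factory creates non-daemon threads, so an executor that is
  // never shut down keeps the JVM (and therefore the script) alive.
  private val executor = Executors.newFixedThreadPool(threadCount)
  private val delegate = executor.asCoroutineDispatcher()

  override fun dispatch(context: CoroutineContext, block: Runnable) {
    delegate.dispatch(context, block)
  }

  override fun close() {
    // Let running and queued work finish, but force shutdown if it exceeds the timeout.
    executor.shutdown()
    if (!executor.awaitTermination(closeTimeoutMs, TimeUnit.MILLISECONDS)) {
      executor.shutdownNow()
    }
  }
}

// Runs one background task per value and closes the dispatcher when done.
fun sumSquaresInBackground(values: List<Int>): Int =
  ClosableBackgroundDispatcher().use { dispatcher ->
    runBlocking {
      values.map { value -> async(dispatcher) { value * value } }.awaitAll().sum()
    }
  }
```

Wrapping the script's primary operation in `use { ... }` means the executor is shut down even when the operation throws, which is the property that keeps a script from hanging on exit.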

Miscellaneous details to note:

- A bunch of 'e.g.' typos were fixed in GenerateMavenDependenciesList.kt and wiki/Updating-Maven-Dependencies.md. These aren't functionally needed; they were just something I noticed while developing.
- kotlinx-coroutines-core was updated from 1.4.1 to 1.4.3 in order to work around "StackOverflow in DispatchedContinuation" (Kotlin/kotlinx.coroutines#2371), which was causing flakiness in one of the new dispatcher tests.
- testClose_pendingTaskLongerThanCloseTimeout_taskIsNotRun intentionally takes ~2 seconds to run in order to provide some assurance that, without cancellation, the task would run and the test would fail (this has been manually verified in a few different situations of the dispatcher and/or test changing; some changes won't result in a failure due to how cancellation works internally for executor service & the converted coroutine dispatcher).
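The close-timeout behavior that the last note refers to can be illustrated at the plain executor-service level. The sketch below is an analogy under stated assumptions, not the PR's test: the function name and the timeout are hypothetical, and it uses only `java.util.concurrent` rather than the converted coroutine dispatcher.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

// Returns whether a task still queued at close time ended up running.
fun pendingTaskRunsAfterClose(closeTimeoutMs: Long = 100): Boolean {
  val executor = Executors.newSingleThreadExecutor()
  val firstTaskStarted = CountDownLatch(1)
  val pendingTaskRan = AtomicBoolean(false)

  // Occupies the only worker thread for far longer than the close timeout.
  executor.execute {
    firstTaskStarted.countDown()
    try {
      Thread.sleep(60_000)
    } catch (e: InterruptedException) {
      // Interrupted by shutdownNow(); simply finish.
    }
  }
  // Stays queued behind the long-running task.
  executor.execute { pendingTaskRan.set(true) }

  firstTaskStarted.await()
  executor.shutdown()
  if (!executor.awaitTermination(closeTimeoutMs, TimeUnit.MILLISECONDS)) {
    // Timeout exceeded: drop queued tasks and interrupt the running one.
    executor.shutdownNow()
  }
  executor.awaitTermination(5, TimeUnit.SECONDS)
  return pendingTaskRan.get()
}
```

Without the `shutdownNow()` call the queued task would eventually run, which is why a test of this kind has to wait long enough for that to become observable.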

Note that historically these changes were originally part of #4929, but they were split out so that they could be used by #5315 (which ended up being convenient to include prior to #4929).

Essential Checklist

The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
Any changes to scripts/assets files have their rationale included in the PR explanation.
The PR follows the style guide.
The PR does not contain any unnecessary code changes from Android Studio (reference).
The PR is made from a branch that's not called "develop" and is up-to-date with "develop".
The PR is assigned to the appropriate reviewers (reference).

For UI-specific PRs only

This PR doesn't include any user-facing changes since it only impacts scripts.

The issue ultimately arose from test parameters being initialized after they're needed in the launched UI. This type of change was tried earlier in the branch, but reverted since it didn't seem necessary. It is, however, necessary when there are environment differences (e.g. local vs. CI) or when running certain tests individually. Due to the difficulty in finding this issue, ActivityScenarioRule has been added as a prohibited pattern in the static regex checks (along with ActivityTestRule since that's deprecated and discouraged, anyway).

The test was suffering from some proto encoding inconsistencies that seem to occur between some development machines vs. on CI. The fix improves the test's robustness by extracting the raw encoded string, verifying that the other outputs in the intent message correctly correspond to that string, and that the string (as a parsed proto) contains the correct values. As a result, the test no longer depends on a hardcoded encoding value to be present for verification. This does result in a bit more logic than is generally good to have in a test (and it lengthened the test code quite a bit), but it seems necessary in this particular case.

The main change here is ensuring that Bazel 4.0.0 is used & bzlmod disabled in newer versions of Bazel when running operations in a test Bazel environment. This commit also introduces some more timing tweaks on CommandExecutor for some tests, though these only affect very specific tests (as many script tests directly call a script's main() function and thus don't overwrite its executor behavior). This commit attempted to introduce "--batch" mode to runs, but the isolation didn't actually seem to improve stability and, instead, substantially slowed down some of the tests.

…ipt-execution-support Conflicts: scripts/src/java/org/oppia/android/scripts/ci/ComputeAffectedTests.kt scripts/src/javatests/org/oppia/android/scripts/ci/ComputeAffectedTestsTest.kt scripts/src/javatests/org/oppia/android/scripts/common/BazelClientTest.kt

oppiabot · 2024-01-29T22:49:04Z

Hi @BenHenning, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 7 days, it will be automatically closed so that others can take up the issue.
If you are still working on this PR, please make a follow-up commit within 3 days (and submit it for review, if applicable). Please also let us know if you are stuck so we can help you!

This was done by removing the //testing dependency and, instead, having instrumentation targets depend on the direct module within //testing that they need to build. This module & its corresponding implementation binding (and tests) needed to be moved out of //testing and into their own /firebase package.

oppiabot · 2024-02-27T23:29:34Z

Hi @BenHenning, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 7 days, it will be automatically closed so that others can take up the issue.
If you are still working on this PR, please make a follow-up commit within 3 days (and submit it for review, if applicable). Please also let us know if you are stuck so we can help you!

adhiamboperes

Thanks @BenHenning!

This was pretty clear to read. I had a question, but non-blocking. PTAL.

scripts/src/java/org/oppia/android/scripts/common/BUILD.bazel

oppiabot · 2024-03-07T23:23:51Z

Hi @BenHenning, this PR is ready to be merged. Please address any remaining comments prior to merging, and feel free to merge this PR once the CI checks pass and you're happy with it. Thanks!

BenHenning · 2024-03-14T23:33:28Z

Latest develop changes merged cleanly, and your comment appeared to be non-blocking so I resolved it @adhiamboperes (please let me know if you do want follow-up work done here).

BenHenning · 2024-03-14T23:34:05Z

@seanlip PTAL for codeowners.

I've also set auto-merge in case there are no problems found.

seanlip

LGTM for scripts/assets/maven_dependencies.textproto

oppiabot · 2024-03-15T01:17:25Z

Unassigning @seanlip since they have already approved the PR.

oppiabot · 2024-03-15T01:17:28Z

Hi @BenHenning, this PR is ready to be merged. Please address any remaining comments prior to merging, and feel free to merge this PR once the CI checks pass and you're happy with it. Thanks!

## Explanation
Fixes #5312

This updates the TODO open check script to be locally runnable rather than requiring the developer to manually download the list of issues from GitHub to analyze. This simplifies things and allows the script to be easily run within the ``static_checks.sh`` script.

This is being done via interacting directly with GitHub using its RESTful API (see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues) in conjunction with the user's local auth token used to set up their ``gh`` tool (which needs to be set up, hence the changes to the wiki documentation and clear error messages from the new ``GitHubClient`` utility). To further simplify things, a ``regenerate`` mode was added to regenerate the TODO exemptions textproto file (which is helpful for #4929 hence why this comes before that PR).

The new command syntax to perform the TODO check is:

```sh
bazel run //scripts:todo_open_check -- <path_to_dir_root> <path_to_proto_binary> [regenerate]
```

With a specific example to just perform the check:

```sh
bazel run //scripts:todo_open_check -- $(pwd) scripts/assets/todo_open_exemptions.pb
```

And an example to also perform regeneration:

```sh
bazel run //scripts:todo_open_check -- $(pwd) scripts/assets/todo_open_exemptions.pb regenerate
```

Some other things specifically to note:
- TODO exemptions needed to be updated in this PR due to TODO utility & test changes. The new file was created using the new regenerate functionality.
- The TODO check has been added to the end of ``static_checks.sh``.
- The GitHub CI workflow was updated to use the new script syntax appropriately.
- This is the first time scripts have been updated to integrate with Retrofit, and this setup is going to be reused in the future for other services.
- The data model for issues has been updated to better represent the remote data structure.
- Moshi is being used along with Retrofit for an easier interaction with GitHub as a remote endpoint. All of this has been wrapped in a new ``GitHubClient``.
- ``GitHubClient`` is designed to download all issues regardless of length (whereas before the manual download step was limited to the first 2000 issues of the repository) using pagination.
- New tests were added to verify the regenerate flow (and properly set up the mock OkHttp server since the script now relies on an HTTP endpoint to download the GitHub issues itself).
- ``GitHubIssue`` is exempted from tests since it's just a basic data structure, so there's no specific logic to test.
- ``GitHubService`` is exempted from tests since it's a template to generate code via Retrofit's annotation processor, so there's no specific logic to test.
- All scripts proto libraries were updated to use normal Java (rather than Java lite) generation to provide text format support.
- The file paths included in ``TodoOpenCheck``'s output have been simplified to be relative to the repository root rather than using absolute paths (for parity with many other app scripts).
- Since ``GitHubClient``'s tests required interacting with ``gh``, a new ``FakeCommandExecutor`` was added (along with its own tests) to provide support for orchestrating local utilities. This may be useful in other tests in the future, though some of those script tests intentionally integrate with environment commands like ``git`` and ``bazel``.

## Essential Checklist
- [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
- [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation.
- [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide).
- [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)).
- [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop".
- [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)).

## For UI-specific PRs only
This is an infrastructure-only PR.

---------

Co-authored-by: Adhiambo Peres <59600948+adhiamboperes@users.noreply.github.com>
Co-authored-by: Sean Lip <sean@seanlip.org>
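The pagination behavior described for ``GitHubClient`` in the message above can be sketched independently of Retrofit and Moshi. This is only an illustration: `fetchAllPages` and its `fetchPage` callback are hypothetical names, and stopping at the first empty page is an assumed termination rule rather than something the message states. GitHub's list endpoints number pages from 1, which is why that is the default.

```kotlin
// Collects items from a paged source until a page comes back empty.
// `fetchPage` is a hypothetical callback standing in for one remote request.
fun <T> fetchAllPages(firstPage: Int = 1, fetchPage: (Int) -> List<T>): List<T> {
  val allItems = mutableListOf<T>()
  var page = firstPage
  while (true) {
    val items = fetchPage(page)
    if (items.isEmpty()) break // Assumed stop condition: an empty page means no more data.
    allItems += items
    page++
  }
  return allItems
}
```

In a test, `fetchPage` can be backed by an in-memory map of page number to items, which keeps the paging logic verifiable without a network connection.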

## Explanation
Fix part of #59
Fix part of #3926

As part of developing downstream PRs for #59, it was discovered that PRs which change a LOT of files (such as #4937) can run into problems where ComputeAffectedTests simply times out trying to compute the entire list of affected targets.

**Critical performance and compatibility fixes**

There have been past efforts to optimize the affected tests workflows (bucketing, breaking up some of the computations), but it was discovered that the most expensive part of the process is running the ``rbuildfiles`` query to figure out which BUILD files were affected by changed files. It was known this was an expensive step in the past, but it wasn't clear until this PR exactly how to address it. This PR changes the script to now:
- Filter ``rbuildfiles`` to only run under first-party targets (since it shouldn't be possible for third-party BUILD files to be affected by first-party changes). This reduces the search graph.
- Introduce Bazel command sharding for this step like the script already does for others. This breaks up the command to run on a subset of files multiple times, combining the output rather than running a single command with a large input. It seems that ``rbuildfiles`` does not scale linearly with its input size, so this drastically improves the script's performance. It's thought that this approach is also more logically correct due to more correct sibling matching semantics, but it's a bit hard to reason about ``bazel query`` behavior at times so I can't be 100% confident in this. Nevertheless, the existing tests pass and I haven't seen any issues from using these changes in downstream PRs.

Separately, another issue was discovered wherein some commands (including certain Bazel commands) can actually cause ``CommandExecutorImpl`` to soft-lock and always time out. This was due to an issue in the previous implementation wherein it would wait to read a command's stdout until after the timeout has been completed (i.e. it assumed the process would finish). This isn't correct, however: stdout is blocking I/O, and some commands are implemented to only continue execution after their standard output is read. The new implementation makes use of coroutine actors to consume stdout and stderr at the same time as waiting for execution to ensure all commands can continue execution and that they finish within the desired timeout. Note that the new ``ScriptBackgroundCoroutineDispatcher`` was actually introduced (in #5313) specifically to support this new ``CommandExecutorImpl`` implementation, though it has since been found to have lots of other nice benefits by providing scripts with a reliable mechanism for performing asynchronous operations without having to manage their own execution dispatcher.

Command execution for Bazel commands has also been updated to time out after 5 minutes rather than the previous 1 minute. Despite the optimizations and robustness improvements above, some commands still take quite some time to run for especially large and complex cases. While this change may result in a slower failure turnaround in cases when commands are soft-locked, it should result in better CI and script robustness in the long-term.

**Better support for chained PRs & possibly merge queues**

ComputeAffectedTests was also updated to use a merge base commit rather than a reference to the develop branch (where this new commit hash is provided by the CI workflow). The idea behind this is that the merge base commit is:
- More logically correct (as ``ComputeAffectedTests`` is meant to run in contexts where a branch wants to be merged into a destination).
- Better compatible with chained PRs. This allows for **significantly** better CI performance in chained PR situations since now only the tests relevant to the child PR will be run rather than all tests for the child & its parental hierarchy (see downstream PRs' CI runs to see this working in action).
- Hopefully better compatibility with merge queues (#3926). This hasn't been verified, but the flexibility in customizing the destination for affected tests should be the main prerequisite to properly supporting merge queues.

**Other changes**

``GitClient`` was updated to have a peace-of-mind check to ensure the base commit (provided as explained in the previous section) matches the merge base of the current branch. This should always be true (except maybe in merge queues--this will need to be verified). Note that this is only a soft warning, not an assertion failure.

``RepositoryFile`` was cleaned up slightly to be a bit more consistent with other directory management approaches done in scripts. I can see this being refactored more in the future. Callsite behavior isn't expected to be affected by these changes.

Some script tests were updated to have consistent formatting (which required updating the TODO exemptions). ``TodoIssueResolvedCheckTest`` and ``TodoOpenCheckTest`` also had some of their test file management cleaned up a bit.

**A note on testing**

These are inherently difficult things to test. I've verified what I could via CI and general observation, but I've also largely relied on existing tests to catch regressions (and many were caught during changes to the script). Since these are mainly implementation and not behavioral changes, I'm comfortable with the level of testing that was done.

## Essential Checklist
- [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
- [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation.
- [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide).
- [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)).
- [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop".
- [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)).

## For UI-specific PRs only
N/A -- This is an infrastructure-only change.

---------

Co-authored-by: Adhiambo Peres <59600948+adhiamboperes@users.noreply.github.com>
Co-authored-by: Sean Lip <sean@seanlip.org>
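The soft-lock described in the message above (waiting for a process before reading its output) can be illustrated with a small sketch. It swaps in plain daemon threads for the coroutine actors the message mentions, so it is an analogy rather than the ``CommandExecutorImpl`` approach; `runCommand`, `CommandResult`, and the default timeout are hypothetical names and values.

```kotlin
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread

// Hypothetical result holder for this sketch.
data class CommandResult(val exitCode: Int, val stdout: String, val stderr: String)

fun runCommand(command: List<String>, timeoutMs: Long = 5_000): CommandResult {
  val process = ProcessBuilder(command).start()
  var stdout = ""
  var stderr = ""

  // Start draining both pipes before waiting, so a command that blocks on a
  // full output buffer can keep making progress.
  val outReader = thread(isDaemon = true) {
    stdout = runCatching { process.inputStream.bufferedReader().readText() }.getOrDefault("")
  }
  val errReader = thread(isDaemon = true) {
    stderr = runCatching { process.errorStream.bufferedReader().readText() }.getOrDefault("")
  }

  if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
    process.destroyForcibly()
    error("Command timed out after $timeoutMs ms: $command")
  }

  // join() makes the readers' writes visible to this thread.
  outReader.join()
  errReader.join()
  return CommandResult(process.exitValue(), stdout, stderr)
}
```

For example, `runCommand(listOf("sh", "-c", "echo out; echo err 1>&2"))` captures both streams, assuming a POSIX `sh` is available on the machine running it.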

BenHenning added 14 commits August 22, 2023 01:30

Fix a variety of dev platform-specific issues.

667bf89

These issues were found after I started using a new development environment.

Tidy some things up, and revert fragment test.

fb59232

ProfileAndDeviceIdFragmentTest had been updated to use a newer fragment initialization pattern, but that's no longer needed and seems to be causing what appears to be timing discrepancies between local dev and CI.

Merge branch 'develop' into fix-platform-specific-issues

714f3ea

Merge branch 'develop' into fix-platform-specific-issues

e3d4091

Post-merge fix.

5671673

Merge branch 'develop' into fix-platform-specific-issues

bd1466d

Add ScriptBackgroundCoroutineDispatcher.

000bba3

This is a script-specific dispatcher which will allow for better asynchronous support in upcoming PRs (especially for command execution). This change serves to prepare for those changes.

Post-merge fixes.

0b55bd4

Plus, actually makes use of the new script background dispatcher in CommandExecutorImpl to make the new wiring make more sense (though its real utility will come in a follow-up PR).

Fix BUILD file lint issue.

bd97611

Merge branch 'fix-platform-specific-issues' into introduce-better-scr…

d54829d

…ipt-execution-support

adhiamboperes assigned BenHenning Jan 22, 2024

oppiabot bot added the stale label Jan 29, 2024

oppiabot bot closed this Feb 5, 2024

BenHenning added 12 commits February 6, 2024 22:33

Merge branch 'develop' into fix-platform-specific-issues

211487c

Post-merge fixes.

1a666a9

Add smoke tests for instr. binaries & tests.

a2ee5d3

Some minor refactoring for readability.

d5b9012

Merge branch 'develop' into fix-instrumentation-build-failure

9f50469

Add missing CODEOWNERS line.

8dbf4dc

Remove old files from CODEOWNERS.

9c38057

Merge branch 'fix-instrumentation-build-failure' into fix-platform-sp…

f2ff4e3

…ecific-issues

Add missing tests for TestBlazeWorkspace changes.

fad48ed

Also includes some minor clean-ups within TestBazelWorkspaceTest.

Add missing Firebase auth tests.

e1802c8

Merge branch 'fix-instrumentation-build-failure' into fix-platform-sp…

00c32f6

…ecific-issues

oppiabot bot added the stale label Feb 27, 2024

Merge branch 'develop' into introduce-better-script-execution-support

e0e74d5

oppiabot bot removed the stale label Mar 4, 2024

adhiamboperes approved these changes Mar 7, 2024

scripts/src/java/org/oppia/android/scripts/common/BUILD.bazel

adhiamboperes assigned BenHenning and unassigned adhiamboperes Mar 7, 2024

oppiabot bot added the PR: LGTM label Mar 7, 2024

Merge branch 'develop' into introduce-better-script-execution-support

70df9b3

BenHenning requested a review from seanlip March 14, 2024 23:33

BenHenning assigned seanlip and unassigned BenHenning Mar 14, 2024

BenHenning enabled auto-merge (squash) March 14, 2024 23:34

seanlip approved these changes Mar 15, 2024

Merge branch 'develop' into introduce-better-script-execution-support

51a96ce

oppiabot bot unassigned seanlip Mar 15, 2024

oppiabot bot assigned BenHenning Mar 15, 2024

BenHenning merged commit e2f94e4 into develop Mar 15, 2024
44 checks passed

BenHenning deleted the introduce-better-script-execution-support branch March 15, 2024 01:43

BenHenning restored the introduce-better-script-execution-support branch March 19, 2024 23:59

BenHenning deleted the introduce-better-script-execution-support branch March 20, 2024 00:00

BenHenning mentioned this pull request Mar 20, 2024

Fix #5312: Make todo open check locally runnable #5315

Merged

6 tasks

BenHenning mentioned this pull request May 16, 2024

Fix part of #59, part of #3926: Upgrade compute affected tests #4929

Merged

6 tasks
