-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable SuperPMI collection of a libraries tests run #91101
Enable SuperPMI collection of a libraries tests run #91101
Conversation
Here is win-x64
|
@@ -62,25 +62,34 @@ private void WriteRunScript(string templateContent, string extension) | |||
{ | |||
bool isUnix = extension == ".sh"; | |||
string lineFeed = isUnix ? "\n" : "\r\n"; | |||
string[] newlineSeparator = new string[] { Environment.NewLine }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that I don't use the changes in this file anymore (I did in an intermediate stage). But it seems like a useful change to take anyway, as currently the generated commands don't handle multi-line commands.
@jakobbotsch PTAL |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsCurrently, we have a PMI collection of the libraries tests. A PMI collection doesn't represent actual code run, so doesn't have PGO data and compilations, OSR compilations, and tends to overemphasize generics since it attempts many instantiations that might not occur in practice. Similar to #74961, which enabled a collection of a run coreclr tests, this change enables a collection of a run of libraries tests. We collect two different scenarios: "normal", meaning no configuration switch variables set, and "no_tiered_compilation", meaning we set The changes here are similar to (and sometimes a copy of) the changes in #74961, altered because the process of running the libraries tests is somewhat different in a few ways. The "control flow" is as follows:
The change to CultureInfoCurrentCulture.cs is to fix a problem where the test creates a new clean environment but copies over a few environment variables. It needs to also copy over The collected data is quite large: about 700,000 methods in the "normal" scenario, and 300,000 in the "no_tiered_compilation" scenario, for a total of about 17GB for both.
|
@ericstj This touches the libraries CI test run process. Maybe you or someone you suggest should review? |
I've seen a failure like this multiple times during the "Prepare Testhost" step:
I don't think anything I've done would be involved during this step. If the "Visual Studio 2022 with C++ tools required" is not a red herring, then perhaps some of the machines (but not all?) that we are running on in AzDO/Helix for the internal dnceng instance are misconfigured? Current test run: https://dev.azure.com/dnceng/internal/_build/results?buildId=2252048&view=results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I assume these two new collections effectively subsume libraries.pmi
and libraries_tests.pmi
-- can we remove those collections? Both in the interest of saving download/disk size but also to avoid misleading/disproportionate diffs on changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like either @carlossanlop or @ViktorHofer to have a look over this.
@@ -62,7 +67,7 @@ | |||
<Message Condition="'$(TestArchiveRuntimeFile)' != ''" Importance="High" Text="TestArchiveRuntimeFile: $(TestArchiveRuntimeFile)" /> | |||
|
|||
<!-- Re-invoke MSBuild on this project to create the correlation payload --> | |||
<MSBuild Projects="$(MSBuildProjectFile)" Targets="PrepareCorrelationPayloadDirectory" Properties="Scenarios=$(_Scenarios)" /> | |||
<MSBuild Projects="$(MSBuildProjectFile)" Targets="PrepareCorrelationPayloadDirectory" Properties="$(_PropertiesToPass)" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up passing a lot more global properties than were previously passed. Why do we need to do that? Moreover, why are we even doing an MSBuild invocation rather than just running the target in the same evaluation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a problem to pass through more properties?
I need it to pass through the SuperPmiCollect
property, at least. It seemed simpler and more consistent to just pass through all the ones we pass through on the per-scenario msbuild re-invocation. It's a bit unfortunate that the libraries and runtime versions of this file are (so) different: it (src\tests\Common\helixpublishwitharcade.proj) passes through all the properties.
I can't answer the question about re-invocation. Maybe it provides easier-to-control ordering? The coreclr .proj file does this too (and for more cases, it appears).
######################################################################################################## | ||
# | ||
# Finalize SuperPMI collection: (1) merge all MCH files generated by all Helix jobs, (2) upload MCH file to Azure Storage, (3) upload log files | ||
# Note that all these steps are "condition: always()" because we want to upload as much of the collection | ||
# as possible, even if there were test failures. | ||
# | ||
######################################################################################################## | ||
|
||
- ${{ if eq(parameters.SuperPmiCollect, true) }}: | ||
|
||
# Create required directories for merged mch collection and superpmi logs | ||
- ${{ if ne(parameters.osGroup, 'windows') }}: | ||
- script: | | ||
mkdir -p $(MergedMchFileLocation) | ||
mkdir -p $(SpmiLogsLocation) | ||
displayName: 'Create SuperPMI directories' | ||
condition: always() | ||
- ${{ if eq(parameters.osGroup, 'windows') }}: | ||
- script: | | ||
mkdir $(MergedMchFileLocation) | ||
mkdir $(SpmiLogsLocation) | ||
displayName: 'Create SuperPMI directories' | ||
condition: always() | ||
|
||
- script: $(PythonScript) $(Build.SourcesDirectory)/src/coreclr/scripts/superpmi.py merge-mch -log_level DEBUG -pattern $(MchFilesLocation)$(SuperPmiCollectionName).$(SuperPmiCollectionType)*.mch -output_mch_path $(MergedMchFileLocation)$(SuperPmiCollectionName).$(SuperPmiCollectionType).$(MchFileTag).mch -core_root $(SuperPmiMcsPath) | ||
displayName: 'Merge $(SuperPmiCollectionName)-$(SuperPmiCollectionType) SuperPMI collections' | ||
condition: always() | ||
|
||
- template: /eng/pipelines/common/upload-artifact-step.yml | ||
parameters: | ||
rootFolder: $(MergedMchFileLocation) | ||
includeRootFolder: false | ||
archiveType: $(archiveType) | ||
tarCompression: $(tarCompression) | ||
archiveExtension: $(archiveExtension) | ||
artifactName: 'SuperPMI_Collection_$(SuperPmiCollectionName)_$(SuperPmiCollectionType)_$(osGroup)$(osSubgroup)_$(archType)_$(buildConfig)' | ||
displayName: 'Upload artifacts SuperPMI $(SuperPmiCollectionName)-$(SuperPmiCollectionType) collection' | ||
condition: always() | ||
|
||
# Add authenticated pip feed | ||
- task: PipAuthenticate@1 | ||
displayName: 'Pip Authenticate' | ||
inputs: | ||
artifactFeeds: public/dotnet-public-pypi | ||
onlyAddExtraIndex: false | ||
condition: always() | ||
|
||
# Ensure the Python azure-storage-blob package is installed before doing the upload. | ||
- script: $(PipScript) install --user --upgrade pip && $(PipScript) install --user azure.storage.blob==12.5.0 --force-reinstall | ||
displayName: Upgrade Pip to latest and install azure-storage-blob Python package | ||
condition: always() | ||
|
||
- script: $(PythonScript) $(Build.SourcesDirectory)/src/coreclr/scripts/superpmi.py upload -log_level DEBUG -arch $(archType) -build_type $(buildConfig) -mch_files $(MergedMchFileLocation)$(SuperPmiCollectionName).$(SuperPmiCollectionType).$(MchFileTag).mch -core_root $(Build.SourcesDirectory)/artifacts/bin/coreclr/$(osGroup).x64.$(buildConfigUpper) | ||
displayName: 'Upload SuperPMI $(SuperPmiCollectionName)-$(SuperPmiCollectionType) collection to Azure Storage' | ||
condition: always() | ||
env: | ||
CLRJIT_AZ_KEY: $(clrjit_key1) # secret key stored as variable in pipeline | ||
|
||
- task: CopyFiles@2 | ||
displayName: Copying superpmi.log of all partitions | ||
inputs: | ||
sourceFolder: '$(MchFilesLocation)' | ||
contents: '**/$(SuperPmiCollectionName).$(SuperPmiCollectionType)*.log' | ||
targetFolder: '$(SpmiLogsLocation)' | ||
condition: always() | ||
|
||
- task: PublishPipelineArtifact@1 | ||
displayName: Publish SuperPMI logs | ||
inputs: | ||
targetPath: $(SpmiLogsLocation) | ||
artifactName: 'SuperPMI_Logs_$(SuperPmiCollectionName)_$(SuperPmiCollectionType)_$(osGroup)$(osSubgroup)_$(archType)_$(buildConfig)' | ||
condition: always() | ||
|
||
######################################################################################################## | ||
# | ||
# End of SuperPMI processing | ||
# | ||
######################################################################################################## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes add a lot of complexity to the already non-trivial libraries yml files. Please consider moving this block into another YML and conditionally import it based on the PMI boolean.
# Convenience variables | ||
- name: buildConfig | ||
value: ${{ parameters.buildConfig }} | ||
- name: archType | ||
value: ${{ parameters.archType }} | ||
- name: osGroup | ||
value: ${{ parameters.osGroup }} | ||
- name: osSubgroup | ||
value: ${{ parameters.osSubgroup }} | ||
- name: buildConfigUpper | ||
${{ if eq(parameters.buildConfig, 'debug') }}: | ||
value: 'Debug' | ||
${{ if eq(parameters.buildConfig, 'release') }}: | ||
value: 'Release' | ||
${{ if eq(parameters.buildConfig, 'checked') }}: | ||
value: 'Checked' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly happy about defining these variables here. We don't need any of those in src/libraries and some of those might already be defined. I would prefer if you find a way to avoid them.
In general, as the SuperPMI logic is completely optional it would be good to not increase the YML complexity. Can you move this into a feature YMl template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little unfortunate these aren't already defined at a higher level (base-job.yml?) -- they are available in the coreclr side.
I think it makes sense to remove one or both. We did the same thing when we added the coreclr_tests run collection: we removed the corresponding PMI collection. Now, these are the last remaining PMI collections. An argument could be made that PMI generates unique code patterns not otherwise seen due to its attempt to aggressively instantiate generics. If we decide that's not interesting enough, for the cost, we can drop the PMI collections. @AndyAyersMS: opinions? |
src/libraries/sendtohelixhelp.proj
Outdated
<ItemGroup Condition=" '$(TargetOS)' == 'windows' and '$(SuperPmiCollect)' == 'true' "> | ||
<!-- Set variables needed by the test wrapper scripts to do SuperPMI collection --> | ||
<HelixPreCommand Include="set spmi_enable_collection=1" /> | ||
<!-- spmi_collect_dir can point to any temporary directory. We choose %HELIX_WORKITEM_PAYLOAD%\spmi_collect for convenience, and | ||
because we know it can be used, but %TEMP%\spmi_collect might be better. --> | ||
<HelixPreCommand Include="set spmi_collect_dir=%HELIX_WORKITEM_PAYLOAD%\spmi_collect" /> | ||
<HelixPreCommand Include="if not exist %spmi_collect_dir% mkdir %spmi_collect_dir%" /> | ||
<HelixPreCommand Include="set spmi_core_root=%HELIX_CORRELATION_PAYLOAD%\coreclr" /> | ||
</ItemGroup> | ||
|
||
<ItemGroup Condition=" '$(TargetOS)' == 'windows' and '$(SuperPmiCollect)' == 'true' "> | ||
<!-- Merge all the per-test generated .MC files as a post-step. Superpmi.py needs superpmi, mcs, and the JIT to do processing. --> | ||
<HelixPostCommand Include="echo on" /> | ||
<HelixPostCommand Include="set spmi_core_root=%HELIX_CORRELATION_PAYLOAD%\coreclr" /> | ||
<HelixPostCommand Include="set spmi_collection_name=$(SuperPmiCollectionName)" /> | ||
<HelixPostCommand Include="set spmi_collection_type=$(SuperPmiCollectionType)" /> | ||
<HelixPostCommand Include="set spmi_collection_mch_file_tag=$(TargetOS).$(TargetArchitecture).$(Configuration)" /> | ||
<HelixPostCommand Include="set spmi_superpmi_py=%HELIX_CORRELATION_PAYLOAD%\spmi_scripts\superpmi.py" /> | ||
<HelixPostCommand Include="set spmi_upload_dir=%HELIX_WORKITEM_UPLOAD_ROOT%" /> | ||
<HelixPostCommand Include="if not exist %spmi_upload_dir% mkdir %spmi_upload_dir%" /> | ||
<HelixPostCommand Include="set spmi_output_base_name=%spmi_collection_name%.%spmi_collection_type%.%spmi_collection_mch_file_tag%" /> | ||
<HelixPostCommand Include="set spmi_finalmch=%spmi_upload_dir%\%spmi_output_base_name%.mch" /> | ||
<HelixPostCommand Include="set spmi_log_file=%spmi_upload_dir%\%spmi_output_base_name%.log" /> | ||
<HelixPostCommand Include="%HELIX_PYTHONPATH% %spmi_superpmi_py% collect -log_level DEBUG -core_root %spmi_core_root% --skip_cleanup --clean --ci --skip_collection_step --skip_toc_step -temp_dir %spmi_collect_dir% -output_mch_path %spmi_finalmch% -log_file %spmi_log_file%" /> | ||
</ItemGroup> | ||
|
||
<ItemGroup Condition=" '$(TargetOS)' != 'windows' and '$(SuperPmiCollect)' == 'true' "> | ||
<!-- Set variables needed by the test wrapper scripts to do SuperPMI collection --> | ||
<HelixPreCommand Include="export spmi_enable_collection=1" /> | ||
<HelixPreCommand Include="export spmi_collect_dir=$HELIX_WORKITEM_PAYLOAD/spmi_collect" /> | ||
<HelixPreCommand Include="mkdir -p $spmi_collect_dir" /> | ||
<HelixPreCommand Include="export spmi_core_root=$HELIX_CORRELATION_PAYLOAD/coreclr" /> | ||
</ItemGroup> | ||
|
||
<ItemGroup Condition=" '$(TargetOS)' != 'windows' and '$(SuperPmiCollect)' == 'true' "> | ||
<!-- Merge all the per-test generated .MC files as a post-step. Superpmi.py needs superpmi, mcs, and the JIT to do processing. --> | ||
<HelixPostCommand Include="export spmi_core_root=$HELIX_CORRELATION_PAYLOAD/coreclr" /> | ||
<HelixPostCommand Include="export spmi_collection_name=$(SuperPmiCollectionName)" /> | ||
<HelixPostCommand Include="export spmi_collection_type=$(SuperPmiCollectionType)" /> | ||
<HelixPostCommand Include="export spmi_collection_mch_file_tag=$(TargetOS).$(TargetArchitecture).$(Configuration)" /> | ||
<HelixPostCommand Include="export spmi_superpmi_py=$HELIX_CORRELATION_PAYLOAD/spmi_scripts/superpmi.py" /> | ||
<HelixPostCommand Include="export spmi_upload_dir=$HELIX_WORKITEM_UPLOAD_ROOT" /> | ||
<HelixPostCommand Include="mkdir -p $spmi_upload_dir" /> | ||
<HelixPostCommand Include="export spmi_output_base_name=$spmi_collection_name.$spmi_collection_type.$spmi_collection_mch_file_tag" /> | ||
<HelixPostCommand Include="export spmi_finalmch=$spmi_upload_dir/$spmi_output_base_name.mch" /> | ||
<HelixPostCommand Include="export spmi_log_file=$spmi_upload_dir/$spmi_output_base_name.log" /> | ||
<HelixPostCommand Include="$HELIX_PYTHONPATH $spmi_superpmi_py collect -log_level DEBUG -core_root $spmi_core_root --skip_cleanup --clean --ci --skip_collection_step --skip_toc_step -temp_dir $spmi_collect_dir -output_mch_path $spmi_finalmch -log_file $spmi_log_file" /> | ||
</ItemGroup> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same feedback as for the YML change. In general, the sendtohelix and sendtohelixhelp projects are already quite complex. I would prefer if we can move this optional feature into an optional import, i.e. an msbuild targets file.
Leave only new libraries run collection, for testing new code, and 'realworld' to make sure I didn't break anything fundamental.
Disk space on AzDO machines is running out during collection. Print disk space to see what we've got. E.g., for linux-x64, the Helix work items create 25.5GB, and the merged .mch is 12GB. In total, the correlation payload is 1.8GB and all downloaded workitems are 29GB. The total space needed including merged .mch is 44GB.
It needs to copy `SuperPMIShim*` environment variables when creating a new "empty" environment. (It already copies `DOTNET_*` variables.)
Put HelixPreCommand and HelixPostCommand specification in sendtohelix-superpmi-collect.targets file. Note that the order of these commands should not be sensitive (that is, it should not matter if they are added before or after other Helix commands).
b468dc3
to
a71c37f
Compare
@ViktorHofer @ericstj I extracted some SuperPMI-specific logic to separate .yml and .targets files. PTAL. The current test run is: https://dev.azure.com/dnceng/internal/_build/results?buildId=2254732&view=results (but it keeps hitting the "no VS2022 build tools" error (like every other pipeline)) |
value: 'Debug' | ||
${{ if eq(parameters.buildConfig, 'release') }}: | ||
value: 'Release' | ||
${{ if eq(parameters.buildConfig, 'checked') }}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still wonder why we need this here. We don't have a checked configuration in libraries. Also, please move this and other variables that are only needed by the superpmi YML file into the YML file, i.e. PythonScript
and others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just being consistent with the definition in the coreclr files.
Also, please move this and other variables that are only needed by the superpmi YML file into the YML file, i.e. PythonScript and others.
I don't believe that's possible: the superpmi YML file is a "steps" template and as such can't define variables. (These here are not actual YML variables definitions either, they are "variables" parameters to a template that later injects them in a "variables:" section)
Add superpmi-collect-variables.yml "variables template". These are used by the superpmi-postprocess-step.yml file, as well as other SuperPMI collection code.
``` The 'stages' parameter is not a valid StageList. /eng/pipelines/libraries/run-test-job.yml (Line: 31, Col: 1): Unexpected value 'variables' ```
@ViktorHofer I was able to extract the setting of superpmi variables. I was not able to make it work to extract out the naming of the job (templates are very particular and annoyingly limited). PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reacting to my feedback and cleaning things up.
The test failure in Depth1Test looks like a variant of #90777. |
Test in internal pipeline with these changes: https://dev.azure.com/dnceng/internal/_build/results?buildId=2255715&view=results |
With dotnet#91101 we have a SuperPMI collection of the libraries tests being run, so the existing PMI-based collection is someone duplicative: remove the PMI collection. It's arguable that the PMI collection of the libraries themselves is also duplicative, but we can decide whether to remove that one separately.
With #91101 we have a SuperPMI collection of the libraries tests being run, so the existing PMI-based collection is someone duplicative: remove the PMI collection. It's arguable that the PMI collection of the libraries themselves is also duplicative, but we can decide whether to remove that one separately.
Currently, we have a PMI collection of the libraries tests. A PMI collection doesn't represent actual code run, so doesn't have PGO data and compilations, OSR compilations, and tends to overemphasize generics since it attempts many instantiations that might not occur in practice.
Similar to #74961, which enabled a collection of a run coreclr tests, this change enables a collection of a run of libraries tests.
We collect two different scenarios: "normal", meaning no configuration switch variables set, and "no_tiered_compilation", meaning we set
DOTNET_TieredCompilation=0
.Because the amount of data collected is so large, we create each of these scenarios as a separate process, and a separate resulting .mch file. (If done all at once, we end up running out of disk space on the Azure DevOps machines that collect all the per-Helix collections and merge them into the single, large resulting .mch file.)The changes here are similar to (and sometimes a copy of) the changes in #74961, altered because the process of running the libraries tests is somewhat different in a few ways.
The "control flow" is as follows:
RunTests.cmd/sh
file, and is activated (enabled superpmi collection) by the Helix "pre" commands mentioned above.The change to CultureInfoCurrentCulture.cs is to fix a problem where the test creates a new clean environment but copies over a few environment variables. It needs to also copy over
SuperPMIShim*
variables.The collected data is quite large: about 700,000 methods in the "normal" scenario, and 300,000 in the "no_tiered_compilation" scenario, for a total of about 17GB for both.