Resolve dual 'source of truth' for publishing info #4101

Closed
1 task
mmitche opened this issue Feb 7, 2024 · 55 comments
Labels
area-infra Source-build infrastructure and reporting Epic Groups multiple user stories. Can be grouped under a theme.

Comments

@mmitche
Member

mmitche commented Feb 7, 2024

Right now we have two sources of truth for publishing information. We have the standard arcade publishing routines that involve Publishing.props, shipping/non-shipping info, relative paths for blobs, etc. Then we have the source-build method, which involves a couple of targets (GetCategorizedIntermediateNupkgContents). This method isn't the source of truth for the product at the moment, and also doesn't preserve a lot of interesting information like relative blob path layouts or what assets are produced by a repo leg (it assumes zip files and nupkgs). We could add that kind of info in there, but we would just end up duplicating existing info. It also doesn't generate manifests that are usable by downstream infra like the staging pipelines.

This needs to get resolved. Ideally, we would use the same infra for both VMR and individual repo builds. We would go through the same standard arcade paths for publishing, generating the same manifest data, between VMR and individual repo builds.

So basically, when the arcade repo builds, it runs the normal Publish.proj, creating a manifest using PushToAzureDevOpsArtifacts with the outputs that arcade says it produced. The same goes for runtime, aspnetcore, etc., and this would include the manifest. When we get to the end of the VMR build, we now have one manifest for each repo built in each vertical. These get merged together just like any other build.
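
As a rough illustration of the merge step described above, the sketch below combines per-repo manifests into one asset list for a vertical. The `Asset` and `Manifest` shapes and the `merge_manifests` helper are hypothetical; they are not the Arcade manifest format or any existing API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    # Hypothetical shape: a package id or relative blob path plus a version.
    name: str
    version: str
    non_shipping: bool = False


@dataclass
class Manifest:
    # One manifest per repo built in a vertical (assumed structure).
    repo: str
    assets: list = field(default_factory=list)


def merge_manifests(manifests):
    """Merge per-repo manifests into a single list of assets.

    Raises ValueError if two manifests claim the same (name, version) pair,
    since that would make the merged output ambiguous.
    """
    merged = {}
    for manifest in manifests:
        for asset in manifest.assets:
            key = (asset.name, asset.version)
            if key in merged:
                owner = merged[key][0]
                raise ValueError(
                    f"{key} is published by both {owner} and {manifest.repo}"
                )
            merged[key] = (manifest.repo, asset)
    # Dicts preserve insertion order, so assets come out in build order.
    return [asset for _, asset in merged.values()]
```

Failing on duplicates is one possible design choice; a real merge might instead need to tolerate identical assets produced by more than one vertical.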

The important bits here are:

  • The VMR needs to be able to build what MSFT builds today, and so it should probably lean on the existing sources of truth to do that.
  • The VMR needs to produce outputs that are deployable through the same mechanisms we use to deploy assets today (repo publishing and .NET staging+release).
  • The publishing process must remain source-only compatible.

Work Items

@mmitche mmitche converted this from a draft issue Feb 7, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added area-infra Source-build infrastructure and reporting untriaged labels Feb 7, 2024
@mmitche mmitche added the Epic Groups multiple user stories. Can be grouped under a theme. label Feb 7, 2024
@mmitche mmitche moved this to Ready in .NET Unified Build Feb 7, 2024
@mmitche
Member Author

mmitche commented Feb 7, 2024

@NikolaMilosavljevic

@mmitche
Member Author

mmitche commented Feb 7, 2024

Other notes. The way we do things today leads to differences between VMR and individual builds around how repos are finding upstream input artifacts. For instance: https://github.com/dotnet/installer/blob/main/src/redist/targets/GenerateLayout.targets#L264

Because VMR builds don't really "publish" the outputs of the repo build based on the layout the repo specifies, they just copy them over, a repo downstream will need to check the flag for VMR build, then alter the relative path that it's looking in.

Here's how I generally envision this works:

  1. Inner build runs the normal publish step. This gathers up the artifacts in ItemsToPushToBlobFeed and whatnot, and calls PushToAzureDevOpsArtifacts.
     • We rename this task to PushToBuildStorage to reflect its purpose better. It takes a flag that indicates whether it should do an AzDO push (note that this is a vso comment based upload today, not an actual upload), or whether it's running in a VMR build.
     • If in a VMR build, we push to a specified shared location on disk (blob-feed). This location preserves the desired relative blob path, manifest files, etc.
  2. Downstream consumer builds use the shared location, like they do today, but in the same way they use the external shared locations today (via correct relative paths).
  3. The end of each VMR vertical gathers all manifest files from all builds and merges them, producing the output of the vertical. This uses the standard AzDO push like we do today (if running in AzDO).
  4. The end of the full VMR build does a BAR upload with all vertical manifests as input.
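
To make the VMR branch of step 1 concrete, here is a minimal sketch of a publish step that copies artifacts to a shared on-disk location while preserving each relative blob path, then writes a manifest next to them. The function name, the `manifests/<repo>.json` location, and the JSON shape are all assumptions for illustration; the real task is an MSBuild task and its manifest format is not shown in this thread.

```python
import json
import shutil
from pathlib import Path


def publish_to_build_storage(items, shared_root, repo, vmr_build=True):
    """Copy artifacts into a shared location, keeping their relative blob paths.

    items: list of (source_path, relative_blob_path) pairs.
    Returns the path of the manifest that was written (hypothetical format).
    """
    if not vmr_build:
        # The AzDO path (logging-command based upload) is out of scope here.
        raise NotImplementedError("only the VMR on-disk path is sketched")

    shared_root = Path(shared_root)
    published = []
    for source, relative_blob_path in items:
        destination = shared_root / relative_blob_path
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)
        published.append(relative_blob_path)

    # Assumed layout: one manifest per repo under a "manifests" folder.
    manifest_path = shared_root / "manifests" / f"{repo}.json"
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps({"repo": repo, "assets": published}, indent=2))
    return manifest_path
```

A downstream repo could then resolve an input by joining the shared root with the same relative blob path it would use against the external location.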

The main difference between how repos move artifacts between themselves and this is just that the former has one extra step we don't need:

Repo automation:

Publish To AzDO artifacts -> Goes to Shipping, NonShipping and AssetManifests artifacts -> Call maestro publishing based on default channels -> Gather Build artifacts and push to AzDO feeds + blob storage --> Downstream repos consume based on dep updates

In the VMR case, if you wanted to match this exactly, you'd do:

Publish To Build artifacts -> Goes to Shipping, NonShipping and AssetManifests artifact folders on disk -> Call VMR publishing -> Gather Build artifacts and push to another location on disk --> Downstream repos consume based on those locations

But we don't really need the additional intermediate storage location and so a couple steps are degenerate. We can have Publish To Build artifacts do more work:

Publish To Build artifacts -> Goes to assets folders and feeds on disk, with generated manifests in assets --> Downstream repos consume based on those locations

@ViktorHofer
Member

> But we don't really need the additional intermediate storage location and so a couple steps are degenerate. We can have Publish To Build artifacts do more work:

A couple of weeks ago, when Nikola implemented the removal of the intermediate packages in the VMR, we decided that it shouldn't be the inner repo's responsibility to lay the files out to a specific location inside the VMR. Instead, we created the manifest file that the VMR orchestrator then reads to copy the files to the shared location.

Just to provide context... I think what you propose makes sense, and that it should be fine if the Publish step in the inner repo copies the artifacts directly into that shared location.

And I'm all for merging these two different publishing paths. Maintaining both is cumbersome.

@NikolaMilosavljevic
Member

@mmitche publishing is conditioned to not run in product or inner-build. Were you thinking of enabling this in VMR, in inner-, or outer- repo build? If outer, we'd still be using GetCategorizedIntermediateNupkgContents, which we need to preserve for our current intermediate model of flowing artifacts between repos.

@mmitche
Member Author

mmitche commented Feb 7, 2024

> @mmitche publishing is conditioned to not run in product or inner-build. Were you thinking of enabling this in VMR, in inner-, or outer- repo build? If outer, we'd still be using GetCategorizedIntermediateNupkgContents, which we need to preserve for our current intermediate model of flowing artifacts between repos.

Yeah, I was thinking of enabling publishing in the inner builds and product builds. Essentially no diff in when publishing is run. I think you still need the nupkg categorization like you say (for repo source build) to correctly divide up the artifacts for the intermediates. But I think this can be done within the confines of the existing publishing infra.

Brainstorming:

So in repo-source build you don't want to publish the inner-source artifacts directly (since they collide with the officially built MSFT ones), you want to wrap them and publish those. I wonder whether you could integrate the categorization into the manifest artifact data though, then use that to create intermediates like we do today.

For example, let's say runtime produces 50 nupkgs. Inner source build runs, using the publishing infra to identify those 50 nupkgs and produce a manifest like it would with any normal build. But we disable the push to AzDO part and just generate the manifest and a temporary layout on disk. This would be something like that change of "PushToAzureDevOpsArtifacts->PushToBuildArtifacts", where outer-source build would just specify that things get published to disk and don't emit VSO logging commands. This would be very similar to how it would work in the full product build (copy to a specified shared location). Then we add manifest data for those artifacts which indicates the category. This would fit in exactly with the way the repos work today with their other types of categorization (e.g. nonshipping vs. shipping). The repo knows this categorization today, it just lives in outer vs. inner SB. So we move it and use the extensible ManifestArtifactData to indicate the category:

https://github.com/dotnet/aspnetcore/blob/main/eng/Publishing.props#L68-L72

      <ItemsToPushToBlobFeed Include="@(_ChecksumsToPublish)">
        <ManifestArtifactData>NonShipping=true</ManifestArtifactData>
        <PublishFlatContainer>true</PublishFlatContainer>
        <RelativeBlobPath>$(_UploadPathRoot)/Runtime/$(_PackageVersion)/%(Filename)%(Extension)</RelativeBlobPath>
      </ItemsToPushToBlobFeed>

Maybe for this artifact this goes to:

      <ItemsToPushToBlobFeed Include="@(_ChecksumsToPublish)">
        <ManifestArtifactData>NonShipping=true;IntermediateCategory=crossgen</ManifestArtifactData>
        <PublishFlatContainer>true</PublishFlatContainer>
        <RelativeBlobPath>$(_UploadPathRoot)/Runtime/$(_PackageVersion)/%(Filename)%(Extension)</RelativeBlobPath>
      </ItemsToPushToBlobFeed>

So inner-source build is using the normal publish infra and let's say the shared location specified for publishing is artifacts\foo. The artifacts and manifest go there. Then outer-source build comes along and instead of doing the classification, it just reads the manifest from inner source build and creates the source build intermediates based on the manifest and classification. It will know the shared location to find them too. It then uses the normal publishing infra like it does today to push to the azure devops build storage.
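
The "read the manifest and group by classification" step could look roughly like the sketch below. It parses the `Key=value;Key=value` string shown in the snippets above and groups artifact names by the proposed `IntermediateCategory` key. The helper names and the `"default"` fallback category are assumptions for illustration only.

```python
from collections import defaultdict


def parse_artifact_data(value):
    """Parse a 'Key=value;Key=value' ManifestArtifactData string into a dict."""
    result = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        key, _, val = pair.partition("=")
        result[key.strip()] = val.strip()
    return result


def group_by_intermediate_category(artifacts, default="default"):
    """Group artifact names by their IntermediateCategory value.

    artifacts: iterable of (file_name, manifest_artifact_data) pairs.
    Artifacts without a category land in the (hypothetical) default group.
    """
    groups = defaultdict(list)
    for file_name, data in artifacts:
        category = parse_artifact_data(data).get("IntermediateCategory", default)
        groups[category].append(file_name)
    return dict(groups)
```

With this shape, the outer build would create one intermediate per key in the returned dict instead of running its own classification.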

Then, in full orchestrated mode, the main differences end up pretty minimal from repo source build:

  • Outer VMR build specifies a shared location for the inner-publish that is the shared blob-feed location
  • Outer repo builds don't create any intermediates (like they don't today)
  • Outer repo builds publish using the standard infra, but this is a no-op because there are no intermediates.

@NikolaMilosavljevic
Copy link
Member

That should work. Would the orchestrator still read the new manifest and copy the artifacts, or would the outer repo build copy them to local storage during publish?

@mmitche
Member Author

mmitche commented Feb 8, 2024

I think the orchestrator doesn't have to re-copy the artifacts. I think it would just tell the outer repo where the common publish location is, and outer repo would pass to inner repo. But, I think it would read the manifest to determine the versions of outputted artifacts.

This could then replace the logic today that determines versions for *PackageVersions.props. Rather than reading the output directory, read the manifests of the builds that a repo mentions as its dependencies. So even if msbuild builds before roslyn, unless we identify that msbuild should flow to roslyn, we don't actually read the manifests and generate PackageVersions.props. This would then fit rather nicely with the need to only update versions based on what a repo depends on for @mthalman's multithreaded work.
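
A minimal sketch of that idea, assuming each repo's manifest can be reduced to a package-to-version mapping and that dependencies are declared per repo. The function name, both input shapes, and the version strings used in any example call are made up for illustration.

```python
def package_versions_for(repo, dependencies, manifests):
    """Collect package versions only from manifests of declared upstream repos.

    dependencies: dict mapping a repo name to the list of repos it depends on.
    manifests: dict mapping a repo name to {package_id: version}.
    Repos that were built earlier but are not declared dependencies are ignored.
    """
    versions = {}
    for upstream in dependencies.get(repo, []):
        versions.update(manifests.get(upstream, {}))
    return versions
```

For example, if msbuild has already been built but roslyn only declares arcade as a dependency, calling `package_versions_for("roslyn", ...)` returns arcade's packages and nothing from msbuild.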

Again, this is mostly brainstorming. Needs to be designed and tested. I can kind of see how it would fit together though and get rid of a lot of the custom SB infra in the process.

@NikolaMilosavljevic
Member

Few more notes for brainstorming.

1 - Source-build package consumption expects both shipping and non-shipping packages in the same folder. So, we either 1) change that and introduce shipping and non-shipping feeds, or 2) allow new publishing process to copy both package types to the same output location.

2 - Outer build is producing a symbols package by harvesting all PDBs from inner-repo artifacts (obj folders). We need to include this package in repo manifest, and publish to local storage (VMR-provided location). This process needs to be moved to the inner-build and out of source-build infra - perhaps a new project, that could run in source-only if needed.

3 - Outer build is currently filtering out all *.symbols.nupkg - I think these should not be filtered anymore, as publishing to public feeds would happen after successful VMR build, and we want to publish everything we publish today from individual repo builds.

4 - Outer build is currently building a list of non-shipping packages and including it in the Intermediate package. The list was used to filter-out non-shipping packages during VMR's prebuilt detection. As we do not use Intermediate packages in VMR anymore, this list isn't used and all related code should be removed.

5 - There is already a simple target, that's currently blocked from running in inner build, that copies artifacts to source-build storage - PublishToSourceBuildStorage - https://github.com/dotnet/arcade/blob/e9a8e07465adf515a595e2afde2ffe893e973838/src/Microsoft.DotNet.Arcade.Sdk/tools/Publish.proj#L121-L125 This target should likely be rolled into new task i.e. "Publish To Build artifacts".

@ViktorHofer
Member

> 1 - Source-build package consumption expects both shipping and non-shipping packages in the same folder. So, we either 1) change that and introduce shipping and non-shipping feeds, or 2) allow new publishing process to copy both package types to the same output location.

For a second I wondered if NuGet respects subfolders in local feeds as well but it doesn't. I think introducing a second local feed for the shipping vs non-shipping folder makes sense. It would make what I'm currently doing for #4104 much cleaner.

> 2 - Outer build is producing a symbols package by harvesting all PDBs from inner-repo artifacts (obj folders). We need to include this package in repo manifest, and publish to local storage (VMR-provided location). This process needs to be moved to the inner-build and out of source-build infra - perhaps a new project, that could run in source-only if needed.

In general it would be good to not rely on the outer repo build at all for publishing. The MSFT build doesn't have that concept. That would help with eventually collapsing the outer and the inner repo build.

> 3 - Outer build is currently filtering out all *.symbols.nupkg - I think these should not be filtered anymore, as publishing to public feeds would happen after successful VMR build, and we want to publish everything we publish today from individual repo builds.

Exactly.

> 4 - Outer build is currently building a list of non-shipping packages and including it in the Intermediate package. The list was used to filter-out non-shipping packages during VMR's prebuilt detection. As we do not use Intermediate packages in VMR anymore, this list isn't used and all related code should be removed.

Interesting. Can you please point me to that code in question? It might help with the above issue that I'm currently working on.

@mmitche
Member Author

mmitche commented Feb 8, 2024

> For a second I wondered if NuGet respects subfolders in local feeds as well but it doesn't. I think introducing a second local feed for the shipping vs non-shipping folder makes sense. It would make what I'm currently doing for #4104 much cleaner.

Agreed. That maps very nicely onto what we do for individual repo builds.

> 4 - Outer build is currently building a list of non-shipping packages and including it in the Intermediate package. The list was used to filter-out non-shipping packages during VMR's prebuilt detection. As we do not use Intermediate packages in VMR anymore, this list isn't used and all related code should be removed.

Prebuilt detection should just be able to read the manifest data for non-shipping vs. shipping. I think we would include the manifest in the intermediate packages.

@NikolaMilosavljevic
Member

NikolaMilosavljevic commented Mar 5, 2024

OK - by using relative paths for publishing assets, we get something like this in Linux build - previously all these files were in a single directory:

├── Private.SourceBuilt.Artifacts..fedora.39-x64.tar.gz
├── Runtime
│   └── 9.0.0-preview.2.24123.1
│       ├── dotnet-apphost-pack-9.0.0-preview.2.24123.1-fedora.39-x64.tar.gz
│       ├── dotnet-crossgen2-9.0.0-preview.2.24123.1-fedora.39-x64.tar.gz
│       ├── dotnet-nethost-9.0.0-preview.2.24123.1-fedora.39-x64.tar.gz
│       ├── dotnet-runtime-9.0.0-preview.2.24123.1-fedora.39-x64.tar.gz
│       ├── dotnet-runtime-internal-9.0.0-preview.2.24123.1-fedora.39-x64.tar.gz
│       └── dotnet-runtime-symbols-fedora.39-x64-9.0.0-preview.2.24123.1.tar.gz
├── Sdk
│   ├── 9.0.100-preview.2.24123.3
│   │   ├── dotnet-toolset-internal-9.0.100-preview.2.24123.3.zip
│   │   └── dotnet-toolset-langpack-9.0.100-preview.2.24123.3.zip
│   └── 9.0.100-preview.3.24123.1
│       ├── dotnet-sdk-9.0.100-preview.3.24123.1-fedora.39-x64.tar.gz
│       ├── dotnet-sdk-9.0.100-preview.3.24123.1-fedora.39-x64.tar.gz.sha512
│       ├── fedora.39_x64_Release_version_badge.svg
│       ├── productCommit-fedora.39-x64.json
│       └── productCommit-fedora.39-x64.txt
├── aspnetcore
│   └── Runtime
│       └── 9.0.0-preview.2.24121.1
│           ├── aspnetcore-runtime-9.0.0-preview.2.24121.1-fedora.39-x64.tar.gz
│           ├── aspnetcore-runtime-composite-9.0.0-preview.2.24121.1-fedora.39-x64.tar.gz
│           ├── aspnetcore-targeting-pack-9.0.0-preview.2.24121.1-fedora.39-x64.tar.gz
│           └── aspnetcore_base_runtime.version
└── dotnet-symbols-all--fedora.39-x64.tar.gz

It's a bit confusing for consumption by distro maintainers. There are also two versioned sub-directories under Sdk. The first one is for artifacts produced by sdk repo, the second was created during publishing of installer artifacts.

Please ignore the issue with missing version in PSB artifacts and symbols archive - working on a fix.

@mmitche
Member Author

mmitche commented Mar 5, 2024

It does match what we ship to customers though (this is the dotnetcli layout). What happens to the sdk outputs in the current layout?

@NikolaMilosavljevic
Member

> It does match what we ship to customers though (this is the dotnetcli layout). What happens to the sdk outputs in the current layout?

OK, that makes sense. Today, all these files are in root assets directory, no sub-directories.

@mmitche
Member Author

mmitche commented Mar 5, 2024

The dual directory is also point in time for .NET 9, since the installer+sdk repo merge will happen eventually and then there will be one dir for the sdk.

@NikolaMilosavljevic
Member

We could preserve the flat assets layout in source-only build, without introducing new task parameter (i.e. we could use existing PublishFlatContainer parameter).

We would need to keep the source-only conditioning in installer, i.e. https://github.com/dotnet/installer/blob/f1db5daa0c86badfc626905ec0963b0b33fe23fb/src/redist/targets/GenerateLayout.targets#L113

@mmitche
Member Author

mmitche commented Mar 5, 2024

I think PublishFlatContainer is not about directory flatness but about blobs vs. packages (but not sure on this).

We could also just take the final outputs and put them to a flat directory in the VMR, but let the individual repo flow use the correct layout.

@NikolaMilosavljevic
Member

> I think PublishFlatContainer is not about directory flatness but about blobs vs. packages (but not sure on this).
>
> We could also just take the final outputs and put them to a flat directory in the VMR, but let the individual repo flow use the correct layout.

Perhaps we could just copy the SDK to assets root, after all repos finish building. SDK, related symbols archive and PSB archive would be in root, and those are the key source-build artifacts.
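
One way to picture the "copy final outputs to a flat directory" option is the sketch below, which walks a nested layout and copies every file into a single root, refusing to proceed if two files share a name. The function name and the decision to fail on collisions are assumptions; a real implementation would likely copy only the key artifacts (SDK, symbols archive, PSB archive) rather than everything.

```python
import shutil
from pathlib import Path


def flatten_assets(layout_root, flat_root):
    """Copy all files under layout_root into flat_root without subdirectories.

    Assumes flat_root is not located inside layout_root.
    Raises ValueError before copying anything if two files share a name.
    Returns the sorted list of file names that were copied.
    """
    layout_root, flat_root = Path(layout_root), Path(flat_root)

    # First pass: detect name collisions so a failure leaves no partial copy.
    seen = {}
    for source in sorted(p for p in layout_root.rglob("*") if p.is_file()):
        if source.name in seen:
            raise ValueError(f"{source.name} exists at {seen[source.name]} and {source}")
        seen[source.name] = source

    # Second pass: copy into the flat directory.
    flat_root.mkdir(parents=True, exist_ok=True)
    for name, source in seen.items():
        shutil.copy2(source, flat_root / name)
    return sorted(seen)
```

Because the versioned layout stays untouched, individual repo flow can keep using the relative blob paths while source-build consumers read from the flat directory.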

@ViktorHofer
Member

ViktorHofer commented Mar 15, 2024

This is mostly completed aside from runtime and windowsdesktop. For those two repositories I realized that while we now unify publishing inside the VMR, outside they continue to use prepare-artifacts.proj and don't pass the -publish action in when building the repo. Prepare-artifacts.proj is a separate code path and does more than what the current PRs implement.

They generate at least checksum files. The windowsdesktop prepare-artifacts.proj should be easy to remove (maybe it's really just the checksum generation that is missing). The runtime one will take more time.

I think the remaining work for those two repositories can be done in follow-ups, but we shouldn't close this issue until that work is completed. Note that this doesn't need to be done for aspnetcore (which also generates checksums) as they already use Arcade's Publish.proj with the Publishing.props extension point.

@ViktorHofer
Member

With dotnet/windowsdesktop#4251, windowsdesktop's publishing infrastructure is now unified.

deployment-tools and runtime remain.

@ViktorHofer
Member

ViktorHofer commented Mar 19, 2024

Deployment-tools ✅ dotnet/deployment-tools#348

I don't plan to work on runtime, that's much more difficult. I think we should get back to it when we have signing support on non-Windows. Meanwhile, this issue should be parked.

@ViktorHofer
Member

Just filed #4239. Closing the epic.

@github-project-automation github-project-automation bot moved this from In Progress to Done in .NET Source Build Mar 19, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in .NET Unified Build Mar 19, 2024