Add cross-product matrix strategy #20

Closed
mikeharder opened this issue Oct 19, 2018 · 26 comments

Comments

@mikeharder

mikeharder commented Oct 19, 2018

By "cross-product matrix", I mean the scenario where you want to use the matrix strategy on a cross-product of multiple dimensions. For example, say you want to run tests on 2 OS (Linux, Windows) and 2 versions of Python (3.6, 3.7). Currently, you'd need to manually create a matrix with all the combinations:

strategy:
  matrix:
    Linux_Python36:
      VM_IMAGE: 'ubuntu-16.04'
      PYTHON_VERSION: '3.6'
    Linux_Python37:
      VM_IMAGE: 'ubuntu-16.04'
      PYTHON_VERSION: '3.7'
    Windows_Python36:
      VM_IMAGE: 'vs2017-win2016'
      PYTHON_VERSION: '3.6'
    Windows_Python37:
      VM_IMAGE: 'vs2017-win2016'
      PYTHON_VERSION: '3.7'

While this isn't too bad with a small number of dimensions and a small number of variables in each dimension, it can quickly become unmaintainable as the dimensions and variables grow. It would be nice if there was a way to express this more succinctly, something like:

strategy:
  cross-product:
    PYTHON_VERSION: [ '3.6', '3.7' ]
    VM_IMAGE: [ 'ubuntu-16.04', 'vs2017-win2016' ]
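For illustration, here is a minimal Python sketch of the expansion a `cross-product` key like this would have to perform. The `cross-product` syntax is only a proposal, and the helper name and the choice to build job names by joining values with `_` are assumptions, not anything Azure Pipelines provides:

```python
from itertools import product


def expand_cross_product(dimensions):
    """Expand {variable: [values]} into {job_name: {variable: value}}.

    Hypothetical helper: each job name joins that combination's values with "_".
    """
    names = list(dimensions)
    matrix = {}
    # product() yields every combination; the last dimension varies fastest.
    for combo in product(*(dimensions[name] for name in names)):
        matrix["_".join(combo)] = dict(zip(names, combo))
    return matrix


if __name__ == "__main__":
    dims = {
        "PYTHON_VERSION": ["3.6", "3.7"],
        "VM_IMAGE": ["ubuntu-16.04", "vs2017-win2016"],
    }
    for job_name, variables in expand_cross_product(dims).items():
        print(job_name, variables)
```

With these two dimensions it yields the same four combinations as the hand-written matrix, but generated names like `3.6_ubuntu-16.04` contain dots and hyphens, which may not be valid job names; that is the naming question picked up in the replies.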
@vtbassmatt
Member

The concept is good. A handful of issues to think on.

I need to predictably generate job names (in the current unrolled syntax, Windows_Python37 is the job name) for downstream dependencies and accessing output variables. I don't love "Job1" - "JobN" but that seems to be the most straightforward. We could define and document what order we generate them in.
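Numbered names only work for downstream references if the generation order is fixed. A Python sketch of one possible documented rule (dimensions in the order written, values in the order listed, last dimension varying fastest); the rule and the helper name are assumptions for illustration:

```python
from itertools import product


def numbered_jobs(dimensions):
    """Name every combination Job1..JobN in a fixed order.

    Assumed order: dimensions as written, values as listed, last dimension fastest.
    """
    names = list(dimensions)
    combos = product(*(dimensions[name] for name in names))
    return {
        f"Job{index}": dict(zip(names, combo))
        for index, combo in enumerate(combos, start=1)
    }


if __name__ == "__main__":
    jobs = numbered_jobs({
        "PYTHON_VERSION": ["3.6", "3.7"],
        "VM_IMAGE": ["ubuntu-16.04", "vs2017-win2016"],
    })
    for name, variables in jobs.items():
        print(name, variables)
```

One trade-off of this scheme: almost any edit to the value lists (anything other than appending to the first dimension) shifts the numbers of existing jobs, which would silently break `dependsOn` references and output-variable lookups that use the old numbers.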

I also need to handle exceptions as a subkey of cross-product. So the syntax might end up more like:

strategy:
  cross-product:
    vars:
      PYTHON_VERSION: [ '3.6', '3.7', '2.7' ]
      VM_IMAGE: [ 'ubuntu-16.04', 'vs2017-win2016' ]
    except:
    - PYTHON_VERSION: '2.7'  # skip Python 2.7 on Windows
      VM_IMAGE: 'vs2017-win2016'
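A Python sketch of how an `except` list could be applied after the cross product is built. It assumes an entry drops every combination whose variables match all of the keys the entry lists, so a partial entry would drop more than one combination; the matching rule and the helper name are assumptions:

```python
from itertools import product


def expand_with_except(variables, excepts):
    """Cross product of `variables`, minus combinations matched by an `except` entry.

    Assumed rule: an entry matches when every key it lists has the given value.
    """
    names = list(variables)
    jobs = []
    for combo in product(*(variables[name] for name in names)):
        entry = dict(zip(names, combo))
        matched = any(
            all(entry.get(key) == value for key, value in rule.items())
            for rule in excepts
        )
        if not matched:
            jobs.append(entry)
    return jobs


if __name__ == "__main__":
    jobs = expand_with_except(
        {
            "PYTHON_VERSION": ["3.6", "3.7", "2.7"],
            "VM_IMAGE": ["ubuntu-16.04", "vs2017-win2016"],
        },
        # skip Python 2.7 on Windows
        [{"PYTHON_VERSION": "2.7", "VM_IMAGE": "vs2017-win2016"}],
    )
    for job in jobs:
        print(job)  # five of the six combinations
```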

@mikeharder
Author

Overall this looks great. In combination with #4, it should be perfect for our current requirements.

A few questions about the job name:

  1. Would it make sense to allow optionally specifying a format string for the job name (example "{0}_{1}"), to override the default?
  2. How would dependsOn work? With the current matrix strategy, it appears that dependsOn just references the base job name:
- job: 'Test'
  strategy:
    matrix:
      Python36:
        python.version: '3.6'
      Python37:
        python.version: '3.7'

- job: 'Publish'
  dependsOn: 'Test'

Would cross-product work the same, so another job can depend on the entire cross product with a single dependency?
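A sketch of the format-string idea from question 1, in Python: a user-supplied format string whose positional fields refer to the dimensions in the order they are written. The helper and its duplicate-name check are hypothetical; the check matters because a format that leaves out a dimension would map several combinations onto the same job name:

```python
from itertools import product


def name_jobs(dimensions, name_format):
    """Name each combination with `name_format`; {0}, {1}, ... are dimension values.

    Hypothetical helper. Raises ValueError if two combinations get the same name.
    """
    names = list(dimensions)
    jobs = {}
    for combo in product(*(dimensions[name] for name in names)):
        job_name = name_format.format(*combo)
        if job_name in jobs:
            raise ValueError(
                f"format {name_format!r} gives duplicate job name {job_name!r}"
            )
        jobs[job_name] = dict(zip(names, combo))
    return jobs


if __name__ == "__main__":
    jobs = name_jobs({"os": ["Linux", "Windows"], "python": ["36", "37"]}, "{0}_Python{1}")
    print(sorted(jobs))
```

With that format the sketch reproduces the four hand-written names from the original post, `Linux_Python36` through `Windows_Python37`.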

@vtbassmatt
Member

Hmm, I need to think more about dependsOn. I thought you could "reach into" the matrix and depend on a single resulting job, but if not, then no reason to introduce that here. (In fact, this would likely be implemented as syntactic sugar on top of matrix, so we'd get the exact same level of support.)

Regarding a name: we have restrictions on what you can name jobs (most obvious one, no spaces) but wouldn't want to restrict values that way. I guess if we added some new functions - maybe make_job_name(string) - that you could manipulate the values with, that might cover it.
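A possible shape for the `make_job_name(string)` function floated above, sketched in Python. It is hypothetical, and it assumes job names may contain only letters, digits and underscores:

```python
import re


def make_job_name(value):
    """Hypothetical make_job_name: replace anything outside A-Z, a-z, 0-9, _ with _.

    The allowed character set is an assumption about job-name rules.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", value)


if __name__ == "__main__":
    for raw in ["3.6", "vs2017-win2016", "Windows Server 2019"]:
        print(raw, "->", make_job_name(raw))
```

Distinct values can collapse to the same name (`3.6` and `3-6` both become `3_6`), so a generator would still need to check the resulting names for uniqueness.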

@mikeharder
Author

A default job name would be sufficient for our needs -- I was just throwing out the idea of a format string. I also agree this should be syntactic sugar on top of matrix to minimize the number of fundamental concepts.

@jbergstroem

Shameless bump! This would be a very good quality-of-life feature.

@sylveon

sylveon commented Apr 17, 2019

This would also be very useful for compiling programs using multiple architectures and build configurations:

strategy:
  cross-product:
    PLATFORM: [ "ARM", "ARM64", "x86", "x64" ]
    CONFIGURATION: [ "Debug", "Release" ]

@t-eckert

t-eckert commented May 7, 2019

@vtbassmatt I'm going to subscribe to this. I ran into this (like you did) when moving Click to Azure Pipelines.

@DonJayamanne

DonJayamanne commented May 10, 2019

I know this may not be what everyone is after, however I've gone with the following approach:

parameters:
  jobs: []
  pythonVersions: ["3.7", "3.6", "3.5", "2.7"]
  operatingSystems: ["....."]

jobs:
- job: xxx
  strategy:
    matrix:
      ${{ each job in parameters.jobs }}:
        ${{ each py in parameters.pythonVersions }}:
            ${{ each os in parameters.operatingSystems }}:
              ${{ format('{0}{1}', py, os) }}:

You still get to shape the matrix from wherever the template is used by passing parameters like this (this overrides the defaults set up in the template):

- template: xyz.yaml
  parameters:
    pythonVersions: [...]
    operatingSystems: [...]

This approach has allowed us to generate over 100 jobs using a small, simple for loop, soon to grow to around 180!

@AsValeO

AsValeO commented Sep 16, 2019

@vtbassmatt

Hmm, I need to think more about dependsOn. I thought you could "reach into" the matrix and depend on a single resulting job, but if not, then no reason to introduce that here...

Is this not available now?
For example this matrix job builds for windows and linux:

- job: Build
  strategy:
    matrix:
      Windows:
        os: 'windows'
      Linux:
        os: 'linux'

and then I want to deploy only the Windows result in the next stage:

- stage: Deploy
  dependsOn: Build Windows

causes

"Stage Deploy depends on unknown stage Build Windows."

same for Job:

"Stage Build job JobAfterBuild depends on unknown job Build Windows."

Right now, with dependsOn: Build, the Deploy stage waits for the whole matrix to complete.

@janpio

janpio commented Oct 6, 2019

Thanks @DonJayamanne, that looks awesome. Can you maybe share a project where something like this is in use?

As an additional data point here: GitHub Actions (which seems to be based on Azure Pipelines a bit) has native support for this: https://help.github.com/en/articles/workflow-syntax-for-github-actions#example-9 (second example)

@vtbassmatt
Member

We made some changes to the parser to support this in Actions. It's still on my radar for Azure Pipelines as well.

@JustinGrote

JustinGrote commented Oct 10, 2019

I spent too much time overthinking this and trying to get @DonJayamanne's example to work, and ended up just using a PowerShell script to generate the matrix. Here's an example for building PowerShell on every OS, without trying to build Windows PowerShell on non-Windows systems:

$os = @(
    'windows-latest'
    'vs2017-win2016'
    'ubuntu-latest'
    'macOS-latest'
)

$psversion = @(
    'pwsh'
    'pwsh-preview'
    'powershell'
)

$exclude = 'ubuntu-latest-powershell','macOS-latest-powershell'

$entries = @{}
foreach ($osItem in $os) {
    foreach ($psverItem in $psversion) {
        $entries."$osItem-$psverItem" = @{os=$osItem;psversion=$psverItem}
    }
}

$exclude.foreach{
    $entries.Remove($PSItem)
}

$entries.Keys | Sort-Object | ForEach-Object {
    "      $PSItem`:"
    "        os: $($entries[$PSItem].os)"
    "        psversion: $($entries[$PSItem].psversion)"
}

Output

      macOS-latest-pwsh:
        os: macOS-latest
        psversion: pwsh
      macOS-latest-pwsh-preview:
        os: macOS-latest
        psversion: pwsh-preview
      ubuntu-latest-pwsh:
        os: ubuntu-latest
        psversion: pwsh
      ubuntu-latest-pwsh-preview:
        os: ubuntu-latest
        psversion: pwsh-preview
      vs2017-win2016-powershell:
        os: vs2017-win2016
        psversion: powershell
      vs2017-win2016-pwsh:
        os: vs2017-win2016
        psversion: pwsh
      vs2017-win2016-pwsh-preview:
        os: vs2017-win2016
        psversion: pwsh-preview
      windows-latest-powershell:
        os: windows-latest
        psversion: powershell
      windows-latest-pwsh:
        os: windows-latest
        psversion: pwsh
      windows-latest-pwsh-preview:
        os: windows-latest
        psversion: pwsh-preview

@vtbassmatt GitHub Actions does make this so much easier; I would love to see this backported to Azure Pipelines.

@JustinGrote

@vtbassmatt Update check: six months ago you said this was integrated into GitHub Actions and on the roadmap for Azure Pipelines. Any progress update?

@JustinGrote

Also @DonJayamanne, do you have a concrete example of your method? I've taken a second pass and can't get it to work at all; it appears each is for iterative addition, and you still have to specify every job individually. I checked your repositories and don't see any with Azure DevOps pipeline YAML.

@vtbassmatt
Member

@JustinGrote no progress on this one unfortunately.

@JustinGrote

JustinGrote commented Feb 25, 2020

After a lot of trial and error, here's a concrete example of @DonJayamanne's method working. It just echoes from PowerShell, but it gives you the idea. Next I'm going to try working exclusions in with "if".

matrix.yml

parameters:
  platform: ['x86','x64']
  test: ['test1','test2']
  testlevel2: ['test1','test2']

jobs:
  - ${{ each platform in parameters.platform }}:
    - ${{ each test in parameters.test }}:
      - ${{ each testlevel2 in parameters.testlevel2 }}:
        - job: 
          displayName: ${{ platform }}_${{ test }}_${{ testlevel2 }}
          steps:
          - task: PowerShell@2
            displayName: '${{ platform }}-${{ test }}-${{ testlevel2 }}'
            inputs:
              targetType: inline
              script: echo '${{ platform }}-${{ test }}-${{ testlevel2 }}'

azure-pipelines.yml

jobs:
  - template: matrix.yml

Result


@nedrebo

nedrebo commented Mar 4, 2020

I have put up a repository with a basic example of a workaround here: https://github.com/nedrebo/parameterized-azure-jobs

@JustinGrote

@nedrebo Nice! I have a pretty comprehensive PowerShell one here as well (this link may disappear, as it points to a temporary feature branch):
https://github.com/JustinGrote/PowerCD/blob/ci/azure-pipelines.yml
https://dev.azure.com/justingrote/Github/_build/results?buildId=710&view=results

@vtbassmatt
Member

FWIW we linked to @nedrebo's example in the official docs: https://docs.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema#matrix (it's a note near the bottom of that section)

@stale

stale bot commented Apr 25, 2020

In order to consolidate to fewer feedback channels, we've moved suggestions and issue reporting to Developer Community. Sorry for any confusion resulting from this move.

@altendky

How about an actual link to the corresponding issue over there?

@stale

stale bot commented Apr 26, 2020

In order to consolidate to fewer feedback channels, we've moved suggestions and issue reporting to Developer Community. Sorry for any confusion resulting from this move.

@altendky

Yes, but where is this issue covered? It's like I'm talking to a send-only bot... :]

@Saibamen

I moved it into DC: https://developercommunity.visualstudio.com/idea/1008351/add-cross-product-matrix-strategy.html

Use the Follow button on the right to track it.

@stale

stale bot commented Apr 28, 2020

In order to consolidate to fewer feedback channels, we've moved suggestions and issue reporting to Developer Community. Sorry for any confusion resulting from this move.

@stale stale bot closed this as completed Apr 29, 2020
@Saibamen

Answer from MSFT:

Heng Liu [MSFT]
2 days ago
I have the same issue.

I have a matrix of two jobs. E.g., the first job runs 10 tasks and the second one runs only 5 of those 10, so the second job always finishes much sooner than the first one.

There is a successor job that depends on the matrix. I wish it could depend on just the second job of the matrix, so that the successor job could start earlier.

However, I can't find a way to achieve that. The successor job has to wait for the whole matrix to complete.

Second answer:

Heng Liu [MSFT]
2 days ago
If there is a way to set dependsOn for an individual matrix job, would you please give me an example? Thanks!

We have a matrix of two jobs. The first one runs full sets of tasks, the second one only runs a part of that.

We would like to make the successor job depend on the second job, which finishes sooner. However, we can't find a way to do that and have to wait for the whole matrix to finish, which makes our build runs longer.

ghost pushed a commit to Azure/azure-sdk-tools that referenced this issue Feb 19, 2021
This PR is a port of functionality that is currently duplicated across the net/java/python repositories. The intent was to settle on an implementation before moving it to the /eng/common/scripts directory. After merge to this location, I'll update the net/java/python and js repos to point to the `eng/common/scripts` location and remove the scripts from `eng/scripts`.

Here is the PR text used against the other repos for reference:

This adds scripts, docs and samples supporting dynamic, cross-product matrix generation for azure pipeline jobs.
It aims to replicate the [cross-product matrix functionality in github actions](https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#example-running-with-more-than-one-version-of-nodejs),
but also adds some additional features like sparse matrix generation, cross-product includes and excludes, parameter grouping and matrix filters.
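The PR text doesn't describe how the sparse mode picks jobs. One simple strategy that keeps every value of every parameter in at least one job is to walk all dimensions in lockstep and wrap the shorter ones. The Python sketch below shows that idea; it is an illustration under that assumption, not the azure-sdk-tools algorithm:

```python
def sparse_matrix(dimensions):
    """Return max(len(values)) jobs in which every value appears at least once.

    Job i takes value i % len(values) from each dimension, so shorter
    dimensions wrap around instead of being fully crossed.
    """
    names = list(dimensions)
    size = max(len(values) for values in dimensions.values())
    return [
        {name: dimensions[name][i % len(dimensions[name])] for name in names}
        for i in range(size)
    ]


if __name__ == "__main__":
    jobs = sparse_matrix({
        "Agent": ["ubuntu-18.04", "windows-2019", "macOS-10.15"],
        "JavaTestVersion": ["1.8", "1.11"],
        "AZURE_TEST_HTTP_CLIENTS": ["okhttp", "netty"],
    })
    for job in jobs:
        print(job)
```

For the three agents, two Java versions and two HTTP clients in the example config, this produces 3 jobs instead of the 12 in the full cross product, at the cost of never testing most pairings.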

This functionality is made possible by the ability for the azure pipelines yaml to take a [dynamic variable as an input
for a job matrix definition](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#multi-job-configuration) (see the code sample at the bottom of the linked section).
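As a sketch of that dynamic-matrix pattern: a generator step computes the matrix and publishes it as JSON through an output variable using the `##vso[task.setvariable]` logging command, and a later job consumes it with an expression along the lines of `matrix: $[ dependencies.<generator job>.outputs['<step name>.matrix'] ]`, as in the docs sample. The Python below only builds the logging command; the variable name `matrix` and the `_`-joined job names are assumptions:

```python
import json
from itertools import product


def matrix_logging_command(dimensions, variable="matrix"):
    """Build a logging command that publishes the cross-product matrix as JSON.

    The output variable name and the "_"-joined job names are illustrative.
    """
    names = list(dimensions)
    matrix = {
        "_".join(combo): dict(zip(names, combo))
        for combo in product(*(dimensions[name] for name in names))
    }
    payload = json.dumps(matrix, separators=(",", ":"))
    return f"##vso[task.setvariable variable={variable};isOutput=true]{payload}"


if __name__ == "__main__":
    print(matrix_logging_command({
        "Platform": ["x86", "x64"],
        "Configuration": ["Debug", "Release"],
    }))
```

Printing the returned string from a named script step (for example a hypothetical `python generate_matrix.py`) is what makes the agent register the output variable.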

See the README.md file for more details on the config file syntax and usage, as well as implementation details.

The tests (`test-matrix-functions.tests.ps1`) contain a lot of detail on expected data structures at various processing stages. The `test-matrix-functions.ps1` file could perhaps be split up or use some more organization, so let me know if it's hard to navigate.

Example:
```
{
  "displayNames": {
    "true": "TestFromSource"
  },
  "matrix": {
    "Agent": {
      "ubuntu-18.04": { "OSVmImage": "MMSUbuntu18.04", "Pool": "azsdk-pool-mms-ubuntu-1804-general" },
      "windows-2019": { "OSVmImage": "MMS2019", "Pool": "azsdk-pool-mms-win-2019-general" },
      "macOS-10.15": { "OSVmImage": "macOS-10.15", "Pool": "Azure Pipelines" }
    },
    "JavaTestVersion": [ "1.8", "1.11" ],
    "AZURE_TEST_HTTP_CLIENTS": [ "okhttp", "netty" ]
  },
  "include": [
    {
      "Agent": {
          "ubuntu-18.04": { "OSVmImage": "MMSUbuntu18.04", "Pool": "azsdk-pool-mms-ubuntu-1804-general" }
      },
      "JavaTestVersion": "1.11",
      "AZURE_TEST_HTTP_CLIENTS": "netty",
      "TestFromSource": true
    }
  ]
}
```
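A rough Python sketch of how a config shaped like the example above could be expanded. The semantics are guesses from the example, not the azure-sdk-tools implementation: an object-valued parameter such as `Agent` is treated as a set of named groups whose keys become job-name parts and whose variables are merged into the job; `displayNames` swaps a value's JSON text (e.g. `true`) for a readable name part; and each `include` entry is expanded the same way and added to the full cross product (excludes and filters are left out):

```python
import json
from itertools import product


def label(value, display_names):
    """Name part for a scalar value; displayNames may override it (assumed)."""
    text = value if isinstance(value, str) else json.dumps(value)
    return display_names.get(text, text)


def expand_parameter(name, value, display_names):
    """Return (name_part, variables) pairs for one parameter."""
    if isinstance(value, dict):
        # Object-valued parameter: each key names a group of variables.
        return [(key, dict(group)) for key, group in value.items()]
    values = value if isinstance(value, list) else [value]
    return [(label(v, display_names), {name: v}) for v in values]


def add_jobs(jobs, parameters, display_names):
    """Cross every parameter and add the resulting jobs to `jobs`."""
    choices = [expand_parameter(n, v, display_names) for n, v in parameters.items()]
    for combo in product(*choices):
        variables = {}
        for _, group in combo:
            variables.update(group)
        jobs["_".join(part for part, _ in combo)] = variables


def expand_config(config):
    """Full cross product of config["matrix"], plus each config["include"] entry."""
    display_names = config.get("displayNames", {})
    jobs = {}
    add_jobs(jobs, config["matrix"], display_names)
    for entry in config.get("include", []):
        add_jobs(jobs, entry, display_names)
    return jobs
```

With the example config this gives the 12 cross-product jobs plus one extra job named `ubuntu-18.04_1.11_netty_TestFromSource`.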

Sparse matrix job generation in a pipeline: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=705622&view=results

![image](https://user-images.githubusercontent.com/1020379/106040177-151e1f80-60a8-11eb-823c-2af96b5e84aa.png)

Related discussion: microsoft/azure-pipelines-yaml#20