Hey,

I ran into a couple of issues with merging spec durations:

- The durations from each machine are written out together with all the other spec durations, including those that haven't changed in this run, even though the plugin only prints the durations for the current run to the console.
- The "average" strategy is not always optimal:
  - If you merge 3 files where only the middle one has a change in a certain spec's duration, the result won't be the average of the "two" distinct durations; it's effectively a division by 4, not by 2. This follows from issue no. 1: because every machine writes out durations for all specs, even specs it didn't run, the merge utility sees the same stale duration for a given spec on every machine that didn't run it.
  - Sometimes (I'd say most times) you don't want an average at all, but simply the most recent duration. With averaging, the stored duration approaches the real runtime of the spec, but that can take a while; taking the most recent one (with a threshold defined) lands roughly at the same value in one single "jump". That lack of "memory" can be a good thing: if, for instance, you added or removed a bunch of tests in a spec file, the runtime changes significantly, and with averaging it would take a bunch of updates until the stored duration gets reasonably close again.
I'll elaborate on each:
1 - All spec durations are written out
The plugin prints to the console only the durations for the specs that ran on the current machine, while the timings file it writes afterwards contains durations for all specs, even specs E-G that haven't run on machine no. 2. This isn't a problem by itself, but it causes an issue when merging the timings from different machines using the average strategy.
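As an illustration (all durations here are invented, and I'm assuming the per-machine files use the same `{ "durations": [{ "spec", "duration" }] }` shape as `timings.json`), the file written by machine no. 2 would still contain entries for specs E-G, carrying their previously known durations:

```json
{
  "durations": [
    { "spec": "A.spec.ts", "duration": 21042 },
    { "spec": "B.spec.ts", "duration": 33589 },
    { "spec": "C.spec.ts", "duration": 40123 },
    { "spec": "D.spec.ts", "duration": 15877 },
    { "spec": "E.spec.ts", "duration": 18250 },
    { "spec": "F.spec.ts", "duration": 27391 },
    { "spec": "G.spec.ts", "duration": 12703 }
  ]
}
```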
2 - Average strategy is not always optimal
2.1 - "Average" is not exactly the average
Simplifying the above example, let's assume we're running our tests on 3 machines, so we end up with 3 written cypress-timings-machine-*.json files.
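Here's a reconstruction of their contents that is consistent with the calculations below (the A.spec.ts values are made up; B.spec.ts changed only on the second machine and C.spec.ts only on the third):

cypress-timings-machine-0.json:

```json
{
  "durations": [
    { "spec": "A.spec.ts", "duration": 21042 },
    { "spec": "B.spec.ts", "duration": 33589 },
    { "spec": "C.spec.ts", "duration": 40123 }
  ]
}
```

cypress-timings-machine-1.json:

```json
{
  "durations": [
    { "spec": "A.spec.ts", "duration": 22376 },
    { "spec": "B.spec.ts", "duration": 35123 },
    { "spec": "C.spec.ts", "duration": 40123 }
  ]
}
```

cypress-timings-machine-2.json:

```json
{
  "durations": [
    { "spec": "A.spec.ts", "duration": 21042 },
    { "spec": "B.spec.ts", "duration": 33589 },
    { "spec": "C.spec.ts", "duration": 38835 }
  ]
}
```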
If we look at B.spec.ts and merge its durations using the current average code (cypress-split/src/timings.js, lines 102 to 121 in 7a2cf41), we first do (33589 + 35123) / 2, which gives 34356, and then merge in the last file: (34356 + 33589) / 2 = 33972.5, while the real average of the two distinct durations is 34356. The stale 33589 ends up with a weight of 3/4 and the new 35123 with only 1/4, which is the "division by 4" mentioned above. It may not look like much, but for specs that take 2+ minutes this can mean a few seconds of difference, and that can tip the scale on how the tests are distributed across the machines. Not by a lot, but still.
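This isn't the actual timings.js code, just a minimal sketch of that sequential pairwise merge, enough to reproduce the numbers:

```js
// Sequential pairwise averaging: each file's duration is folded into
// the running result, halving the weight of everything seen so far.
function mergeAverage(durationsMs) {
  return durationsMs.reduce((merged, next) => (merged + next) / 2)
}

// B.spec.ts: the changed duration (35123) sits in the middle file,
// so the stale 33589 is counted twice and gets a 3/4 weight.
console.log(mergeAverage([33589, 35123, 33589])) // 33972.5, not 34356

// C.spec.ts: the change sits in the last file (per the reconstruction above),
// so the result happens to equal the true average of the two distinct values.
console.log(mergeAverage([40123, 40123, 38835])) // 39479 === (40123 + 38835) / 2
```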
If we calculate the same for C.spec.ts, we'll see that it actually is the average of 40123 and 38835, because there the changed duration sits in the last file, so the first merge averages two identical values. A.spec.ts, however, would behave like B rather than C. It's somewhat inconsistent: the result depends on which machine's file happens to carry the change.
2.2 - Taking the most recent duration of a spec
Again, with the above example: if we instead just take the duration of the most recent run, we avoid the averaging issue altogether. One could also argue that the most recent duration is closer to the "real" runtime of the spec than the average over all runs.
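A sketch of what this could look like (hypothetical, not an existing cypress-split option; the threshold keeps tiny run-to-run fluctuations from rewriting the timings file):

```js
// "Most recent" merge with a threshold: keep the stored duration unless
// the newest run differs from it by more than the threshold.
function mergeMostRecent(storedMs, newestMs, thresholdMs = 1000) {
  return Math.abs(newestMs - storedMs) > thresholdMs ? newestMs : storedMs
}

// B.spec.ts jumps to the new runtime in a single update instead of converging slowly:
console.log(mergeMostRecent(33589, 35123)) // 35123
// A small fluctuation is ignored:
console.log(mergeMostRecent(33589, 33750)) // 33589
```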