Calculating multiple TopK accuracies is slow/inefficient #744

Closed
rauhs opened this issue Aug 27, 2018 · 4 comments · Fixed by #5395
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · P2 (Priority of the issue for triage purpose: needs to be fixed at some point) · perf (Performance and Benchmarking related) · up-for-grabs (A good issue to fix if you are trying to contribute to the project)

Comments

@rauhs
Contributor

rauhs commented Aug 27, 2018

When calculating the top-k accuracy for multi-class classifiers, the current evaluation code is pretty slow, especially if multiple Top-K accuracies are desired (this redoes a lot of the work unnecessarily).

The current implementation first sorts the scores (O(N log N)) and then searches for the index of the correct label.

A more efficient algorithm would be to just calculate the rank of the correct label (O(N), no extra memory needed) and keep track of the seen ranks (0 being the best case, i.e. a correct prediction). The Top-k accuracy can then easily be returned for any k, possibly changing the API to return a vector of Top-k accuracies.

One issue to discuss: What happens when the score is equal for multiple values? There would be a "best-case" and "worst-case" top-k accuracy.
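For illustration, a minimal sketch of the rank-based idea covering both cases (the method name and signature here are hypothetical, not existing ML.NET API):

    // Sketch only: compute the rank of the correct label in a single O(N) pass,
    // tracking a best-case and a worst-case rank for tied scores.
    static (int BestCase, int WorstCase) RankOfCorrectLabel(float[] scores, int correctIndex)
    {
      var correctScore = scores[correctIndex];
      var strictlyHigher = 0; // classes scored strictly above the correct label
      var tied = 0;           // classes tied with the correct label (excluding itself)
      for (var i = 0; i < scores.Length; i++)
      {
        if (i == correctIndex) continue;
        if (scores[i] > correctScore) strictlyHigher++;
        else if (scores[i] == correctScore) tied++;
      }
      // Best case: ties resolve in favor of the correct label; worst case: every tie ranks above it.
      return (strictlyHigher, strictlyHigher + tied);
    }

A rank of 0 means the top prediction was correct. Counting how often each rank occurs across the evaluation set and taking a cumulative sum over those counts then yields the Top-k accuracy for any k.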

@Ivanidzo4ka added the good first issue, up-for-grabs, and question labels Oct 19, 2018
@GalOshri added the need info label Oct 25, 2018
@GalOshri
Contributor

Thank you for filing this issue! Let's continue the discussion here regarding how to best improve this.

@rauhs
Contributor Author

rauhs commented Oct 26, 2018

Sure! Let me expand a bit. I actually have this implemented in "user space" already and replaced ML.NET's evaluator, since evaluating Top-1 to Top-5 often took MUCH longer than training the model. Now it's super quick and I can get any Top-k accuracy. Code:

// License: MIT

    /// <summary>
    /// Evaluates the model + data (either test or train) and returns some statistics about it. Eg:
    /// - Top-1 ... Top-k accuracy
    /// </summary>
    public PerformanceNew EvaluateModelNew(TIn[] data, int upToTopK = 5)
    {
      var labelsOfScores = this.Learner.ScoreLabelNames; // Old: TryGetScoreLabels
      // Invert the array (so label => idx) so we can do fast lookups of the correct label to the index in the scores array
      var labelMappingIndexTable = labelsOfScores.Select((v, i) => new { Key = i, Value = v }).ToDictionary(o => o.Value, o => o.Key);

      // This holds the seen ranks of the correct labels. Eg:
      // a[0] = 4023; // 4023 correct predictions (rank 0)
      // a[1] = 1022; // incorrect, though, only 1-off (second highest probability)
      // a[2] = 842; // Two off! :/
      // etc...
      // Note: We need +1 capacity when we have classes that we've never seen. They'll be at the end.
      var seenRanks = new long[labelsOfScores.Length + 1];
      var seenRanksBestCase = new long[labelsOfScores.Length + 1];

      var predictions = this.Learner.PredictMany(data);
      var numUnseenClasses = 0;

      foreach (var prediction in predictions)
      {
        // Find the index of the correct label
        var wasKnownLabel = labelMappingIndexTable.TryGetValue(prediction.OriginalLabel, out var idxCorrect);
        if (!wasKnownLabel)
          numUnseenClasses += 1;

        // Get the probability assigned to the CORRECT label (best case: it's the highest probability):
        var correctProba = !wasKnownLabel ? 0 : prediction.GetScore[idxCorrect];

        // Find the rank of the *correct* label (in Scores[]). If 0 => Good, correct. And the lower the better.
        // The rank will be from 0 to N. (Not N-1).
        // Problem: What if we have probabilities that are equal to the correct prediction (eg, .6 .1 .1 .1 .1).
        // This actually happens a lot with some models.
        // Note: People who are interested in TopK accuracy will often present the predictions sorted (from highest probability down).
        // OrderBy is stable in .NET, so finding the rank that would occur during such a sort would be another (third) Top-k accuracy measure.
        // Note: the higher the probability, the lower the rank (the top probability wins), so this is the other way around from other rank applications.
        
        // We subtract 1 here since the predicted one is included and we are 0-based here:
        var correctRankWorstCase = !wasKnownLabel ? prediction.GetScore.Length : prediction.GetScore.Count(score => score >= correctProba) - 1;
        var correctRankBestCase = !wasKnownLabel ? prediction.GetScore.Length : prediction.GetScore.Count(score => score > correctProba);
        seenRanks[correctRankWorstCase] += 1;
        seenRanksBestCase[correctRankBestCase] += 1;
      }
      return new PerformanceNew
      {
        // Note: All unseen classes are not part of the accuracy (similar to how other implementations handle this),
        // so we subtract the unseen classes. This makes sense since in the end we will train the model on ALL data and this case will be unlikely.
        Accuracies = seenRanks.Take(upToTopK).Select(l => l / (float)(data.Length - numUnseenClasses)).CumulativeSum().ToArray(),
        AccuraciesBestCase = seenRanksBestCase.Take(upToTopK).Select(l => l / (float)(data.Length - numUnseenClasses)).CumulativeSum().ToArray(),
      };
    }
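
The CumulativeSum extension used above isn't shown in the snippet; a minimal sketch of such a helper (one possible implementation, not necessarily the original) could be:

    using System.Collections.Generic;

    public static class EnumerableExtensions
    {
      // Running sum: element k-1 of the result is the sum of the first k inputs,
      // which turns per-rank fractions into cumulative Top-k accuracies.
      public static IEnumerable<float> CumulativeSum(this IEnumerable<float> source)
      {
        var sum = 0f;
        foreach (var value in source)
        {
          sum += value;
          yield return sum;
        }
      }
    }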

Complexity should be O(num_instances * num_classes), assuming map lookup is roughly constant.

Problems:

  • The OrderBy used today is stable, i.e. the correct label will end up at a predictable index (this is noted in the comments above as well). I think most people who are interested in Top-K accuracy will sort the given class probabilities, and many learners (depending very much on the cost function) will produce classes with identical probabilities; hinge loss, for example, is much worse in that regard. So the problem is: for some learners there is a pretty big difference between the worst-case and best-case Top-k accuracy, though for many they're identical.
  • The API changes: the output is not a single number (the requested Top-k for a given k) but an array of accuracies.
  • Not sure if TopK should even be a parameter. The last step (Select + CumulativeSum) is a tiny portion of the runtime.

Let me know if anything is unclear.

@artidoro
Contributor

@rauhs I agree that there would be a complexity improvement with your suggested change! @Zruty0 do you think that TopK should remain a parameter?

@Zruty0
Contributor

Zruty0 commented Oct 27, 2018

I think that, if we go forward and implement the evaluator change, we should just have top-K accuracies for all K between 1 and the number of classes. Thanks @rauhs for the code sample!
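
For illustration, once the per-rank counts are collected (as in the sample above), Top-K accuracy for every K falls out of a single cumulative pass; this is a sketch only, with hypothetical names rather than the eventual evaluator API:

    // Sketch only: turn per-rank counts into Top-k accuracy for every k = 1..seenRanks.Length.
    static float[] TopKAccuracies(long[] seenRanks, long numEvaluated)
    {
      var accuracies = new float[seenRanks.Length];
      long correctWithinK = 0;
      for (var k = 1; k <= seenRanks.Length; k++)
      {
        correctWithinK += seenRanks[k - 1]; // predictions whose correct label had rank < k
        accuracies[k - 1] = correctWithinK / (float)numEvaluated;
      }
      return accuracies;
    }

A caller can then read the Top-k accuracy for any k as accuracies[k - 1] in O(1).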

@Zruty0 removed the need info label Oct 29, 2018
@wschin assigned and unassigned wschin Dec 20, 2018
@codemzs added the P2 label Jun 30, 2019
@antoniovs1029 added the enhancement and perf labels and removed the question label Jan 10, 2020
@ghost locked as resolved and limited conversation to collaborators Mar 29, 2022