Calculating multiple TopK accuracies is slow/inefficient #744

Closed
rauhs opened this issue Aug 27, 2018 · 4 comments · Fixed by #5395
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · P2 (Priority of the issue for triage purpose: needs to be fixed at some point) · perf (Performance and Benchmarking related) · up-for-grabs (A good issue to fix if you are trying to contribute to the project)

Comments

@rauhs
Contributor

rauhs commented Aug 27, 2018

When calculating the top-k accuracy for multi-class classifiers, the current evaluation code is pretty slow, especially if multiple Top-K accuracies are desired (this redoes a lot of the work unnecessarily).

The current implementation first sorts the scores (O(N log N)) and then searches for the index of the correct label.

A more efficient algorithm would be to just calculate the rank of the correct label (O(N), no extra memory needed) and keep track of the seen ranks (0 being the best case, i.e. a correct prediction). The Top-k accuracy can then easily be returned for any k, possibly changing the API to return a vector of Top-k accuracies.

One issue to discuss: What happens when the score is equal for multiple values? There would be a "best-case" and "worst-case" top-k accuracy.
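For illustration, a minimal sketch of the rank-based idea covering both cases (the method name and signature here are hypothetical, not existing ML.NET API):

    // Sketch only: compute the rank of the correct label in a single O(N) pass,
    // tracking a best-case and a worst-case rank for tied scores.
    static (int BestCase, int WorstCase) RankOfCorrectLabel(float[] scores, int correctIndex)
    {
      var correctScore = scores[correctIndex];
      var strictlyHigher = 0; // classes scored strictly above the correct label
      var tied = 0;           // classes tied with the correct label (excluding itself)
      for (var i = 0; i < scores.Length; i++)
      {
        if (i == correctIndex) continue;
        if (scores[i] > correctScore) strictlyHigher++;
        else if (scores[i] == correctScore) tied++;
      }
      // Best case: ties resolve in favor of the correct label; worst case: every tie ranks above it.
      return (strictlyHigher, strictlyHigher + tied);
    }

A rank of 0 means the top prediction was correct. Counting how often each rank occurs across the evaluation set and taking a cumulative sum over those counts then yields the Top-k accuracy for any k.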

@Ivanidzo4ka added the good first issue, up-for-grabs, and question labels Oct 19, 2018
@GalOshri added the need info label Oct 25, 2018
@GalOshri
Contributor

Thank you for filing this issue! Let's continue the discussion here regarding how to best improve this.

@rauhs
Contributor Author

rauhs commented Oct 26, 2018

Sure! Let me expand a bit. I actually have this implemented in "user space" already and replaced ML.NET's evaluator, since evaluating Top-1 to Top-5 often took MUCH longer than training the model. Now it's super quick and I can get any Top-k accuracy. Code:

// License: MIT

    /// <summary>
    /// Evaluates the model + data (either test or train) and returns some statistics about it. Eg:
    /// - Top-1 ... Top-k accuracy
    /// </summary>
    public PerformanceNew EvaluateModelNew(TIn[] data, int upToTopK = 5)
    {
      var labelsOfScores = this.Learner.ScoreLabelNames; // Old: TryGetScoreLabels
      // Invert the array (so label => idx) so we can do fast lookups of the correct label to the index in the scores array
      var labelMappingIndexTable = labelsOfScores.Select((v, i) => new { Key = i, Value = v }).ToDictionary(o => o.Value, o => o.Key);

      // This holds the seen ranks of the correct labels. Eg:
      // a[0] = 4023; // 4023 correct predictions (rank 0)
      // a[1] = 1022; // incorrect, though, only 1-off (second highest probability)
      // a[2] = 842; // Two off! :/
      // etc...
      // Note: We need +1 capacity when we have classes that we've never seen. They'll be at the end.
      var seenRanks = new long[labelsOfScores.Length + 1];
      var seenRanksBestCase = new long[labelsOfScores.Length + 1];

      var predictions = this.Learner.PredictMany(data);
      var numUnseenClasses = 0;

      foreach (var prediction in predictions)
      {
        // Find the index of the correct label
        var wasKnownLabel = labelMappingIndexTable.TryGetValue(prediction.OriginalLabel, out var idxCorrect);
        if (!wasKnownLabel)
          numUnseenClasses += 1;

        // Get the probability assigned to the CORRECT label (best case: it's the highest probability):
        var correctProba = !wasKnownLabel ? 0 : prediction.GetScore[idxCorrect];

        // Find the rank of the *correct* label (in Scores[]). If 0 => Good, correct. And the lower the better.
        // The rank will be from 0 to N. (Not N-1).
        // Problem: What if we have probabilities that are equal to the correct prediction (eg, .6 .1 .1 .1 .1).
        // This actually happens a lot with some models.
        // Note: People who are interested in TopK accuracy will often present the predictions sorted (from highest probability down).
        // OrderBy is stable in .NET, so finding the rank that would occur during such a sort would be another (third) Top-k accuracy measure.
        // Note: the higher the probability, the lower the rank (the top probability wins), so this is the other way around from other rank applications.
        
        // We subtract 1 here since the predicted one is included and we are 0-based here:
        var correctRankWorstCase = !wasKnownLabel ? prediction.GetScore.Length : prediction.GetScore.Count(score => score >= correctProba) - 1;
        var correctRankBestCase = !wasKnownLabel ? prediction.GetScore.Length : prediction.GetScore.Count(score => score > correctProba);
        seenRanks[correctRankWorstCase] += 1;
        seenRanksBestCase[correctRankBestCase] += 1;
      }
      return new PerformanceNew
      {
        // Note: All unseen classes are not part of the accuracy (similar to how other implementations handle this),
        // so we subtract the unseen classes. This makes sense since in the end we will train the model on ALL data and this case will be unlikely.
        Accuracies = seenRanks.Take(upToTopK).Select(l => l / (float)(data.Length - numUnseenClasses)).CumulativeSum().ToArray(),
        AccuraciesBestCase = seenRanksBestCase.Take(upToTopK).Select(l => l / (float)(data.Length - numUnseenClasses)).CumulativeSum().ToArray(),
      };
    }
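
The CumulativeSum extension used above isn't shown in the snippet; a minimal sketch of such a helper (one possible implementation, not necessarily the original) could be:

    using System.Collections.Generic;

    public static class EnumerableExtensions
    {
      // Running sum: element k-1 of the result is the sum of the first k inputs,
      // which turns per-rank fractions into cumulative Top-k accuracies.
      public static IEnumerable<float> CumulativeSum(this IEnumerable<float> source)
      {
        var sum = 0f;
        foreach (var value in source)
        {
          sum += value;
          yield return sum;
        }
      }
    }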

Complexity should be O(num_instances * num_classes), assuming map lookup is roughly constant.

Problems:

  • The OrderBy used today is stable, i.e. the correct label will end up at a predictable index (this is noted in the comments above as well). I think most people who are interested in Top-K accuracy will sort the given class probabilities, and many learners (depending very much on the cost function) will produce classes with identical probabilities; hinge loss, for example, is much worse in that regard. So the problem is: for some learners there is a pretty big difference between the worst-case and best-case Top-k accuracy, though for many they're identical.
  • The API changes: the output is not a single number (the requested Top-k for a given k) but an array of accuracies.
  • Not sure if TopK should even be a parameter. The last step (Select + CumulativeSum) is a tiny portion of the runtime.

Let me know if anything is unclear.

@artidoro
Contributor

@rauhs I agree that there would be a complexity improvement with your suggested change! @Zruty0 do you think that TopK should remain a parameter?

@Zruty0
Contributor

Zruty0 commented Oct 27, 2018

I think that, if we go forward and implement the evaluator change, we should just have top-K accuracies for all K between 1 and the number of classes. Thanks @rauhs for the code sample!
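
For illustration, once the per-rank counts are collected (as in the sample above), Top-K accuracy for every K falls out of a single cumulative pass; this is a sketch only, with hypothetical names rather than the eventual evaluator API:

    // Sketch only: turn per-rank counts into Top-k accuracy for every k = 1..seenRanks.Length.
    static float[] TopKAccuracies(long[] seenRanks, long numEvaluated)
    {
      var accuracies = new float[seenRanks.Length];
      long correctWithinK = 0;
      for (var k = 1; k <= seenRanks.Length; k++)
      {
        correctWithinK += seenRanks[k - 1]; // predictions whose correct label had rank < k
        accuracies[k - 1] = correctWithinK / (float)numEvaluated;
      }
      return accuracies;
    }

A caller can then read the Top-k accuracy for any k as accuracies[k - 1] in O(1).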

@Zruty0 removed the need info label Oct 29, 2018
@wschin assigned and unassigned wschin Dec 20, 2018
@codemzs added the P2 label Jun 30, 2019
@antoniovs1029 added the enhancement and perf labels and removed the question label Jan 10, 2020
@ghost locked as resolved and limited conversation to collaborators Mar 29, 2022