Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recalculate total score and accuracy during indexing #126

Closed
wants to merge 3 commits into from

Conversation

smoogipoo
Copy link
Contributor

@smoogipoo smoogipoo commented Aug 29, 2022

Prereqs:

I've tested that this indexes the scores with the re-written total_score/accuracy members, but I don't know how to test beyond that. Appreciate guidance/testing by others.

@smoogipoo smoogipoo mentioned this pull request Aug 29, 2022
14 tasks
@notbakaneko
Copy link
Contributor

Is total_score not set or something?

@nanaya
Copy link
Contributor

nanaya commented Aug 29, 2022

legacy total score still need to be indexed (and added to schema).

Also I think there's another problem with having two different total scores as there can only be one sort option... I don't know how to fix it yet (short of having two different indexes).

@peppy suggested adding legacy score flag to the index and if the data contains legacy_total_score, use that for total score and mark the legacy score flag. The sorting then will still be consistent assuming the scores are filtered by the flag. The side effect is legacy and lazer scores can't be displayed on same leaderboard anymore as they have different total score calculation.

Sort field settings in the schema (settings.index.sort.*) may also need to be updated to include the flag ['is_legacy', 'total_score', 'id'] (https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules-index-sorting-conjunctions.html)

@smoogipoo
Copy link
Contributor Author

smoogipoo commented Aug 29, 2022

Yeah, I've yet to add legacy_total_score at all. I kinda had a brain fart that that's actually required for this PR to go through (right?).

In terms of you displaying them on the same leaderbords - you can. It just depends on whether users want to see stable-only scores (use legacy_total_score) or stable+lazer scores (use total_score / the new calcs). Orders will change but that's expected and iirc based on the survey results users think that's fine too.

@nanaya
Copy link
Contributor

nanaya commented Aug 29, 2022

it's elasticsearch thing. The data need to be stored in one specific order so it can be accessed quickly. It's currently sorted by total_score and sorting by any other field will be quite a bit slower.

Looking at how classic score is calculated for lazer scores, it's probably feasible to index them by classic scoring...?

@smoogipoo
Copy link
Contributor Author

smoogipoo commented Aug 29, 2022

Classic scoring matches standardised scoring, it's a constant (beatmap-wide) multiplier on top. It doesn't match osu!stable unfortunately.

@nanaya
Copy link
Contributor

nanaya commented Aug 29, 2022

At any rate, only one set of fields can be sort optimised. Currently it's total_score, id. If legacy_total_score is added then sorting by it will be slower.

Hence the suggestion above to store total score for legacy (data.legacy_total_score) and lazer (calculated) in same field (total_score) with a flag indicating which is which. The side effect is there's no way to fetch both legacy and lazer scores in single query (sorted).

@nanaya
Copy link
Contributor

nanaya commented Aug 29, 2022

The reason I mentioned classic scoring mode is if it makes sense displaying legacy and lazer score in same list, where total score for legacy is using data.legacy_total_score and for lazer is using ComputeScore(ScoringMode.Classic), then storing by those will allow sorting in most cases (since the order shouldn't change between Classic and Standardised as far as I can tell and according to the comment in the function).

The only missing case would be showing legacy score in standardised unless it can just be the inverse of ScoringMode.Classic (something along the line Math.Sqrt(legacy_total_score / ClassicScoreMultiplier) / Math.Max(1, totalBasicHitObjects) * max_score).

@smoogipoo
Copy link
Contributor Author

smoogipoo commented Aug 30, 2022

Yes, ordering scores by classic score or standardised score doesn't change ordering between those two scoring modes. But classic score can't be compared to legacy score since they're not necessarily even on the same magnitude scale. For example mania scores exceed 1M score in "classic" mode, whereas they can't in legacy.

So the options would be to either display everything in standardised/classic scoring mode with the new ordering, or display just legacy scores with the legacy total score (and not display new scores). Eventually everything will be displayed with the standardised/classic scoring mode, so legacy scores are converted and is exactly what the code in this PR does.

The only reason for legacy_total_score to exist is to make sure we don't lose data, because we can't recover it.

So perhaps having two leaderboards, one which is "legacy" and displays just legacy scores, and an "osu!lazer"/"next" leaderboard that displays everything. Is that not possible?/Is the IsLegacy flag enough for that?

@smoogipoo
Copy link
Contributor Author

Also please provide input on how leaderboards should be displayed @peppy

@peppy
Copy link
Member

peppy commented Aug 30, 2022

;Primarily, we are looking to migrate existing scores to the new table format. We are not changing score display.

To make this work short-term, the elastic schema needs a valid index that will allow displaying

  • only lazer scores ordered by lazer total_score (api return to lazer client)
  • only stable score ordered by stable total_score aka legacy_total_score (osu-web and web-10 displays)

As mentioned by @nanaya, two different sort indices cannot exist, so the proposal is to have one total_score field in ES with legacy_total_score (if legacy score) or total_score (if lazer) alongside a new bool which denotes whether a score is legacy or not (also added to the index to allow the dual lookups).


The above is only for ES. legacy_total_score needs to exist in the json regardless (for preservation sake).

@smoogipoo
Copy link
Contributor Author

Alright, sounds fine.

@nanaya
Copy link
Contributor

nanaya commented Aug 30, 2022

@notbakaneko please check if I'm missing something or if there's better way to handle this

@notbakaneko
Copy link
Contributor

notbakaneko commented Aug 30, 2022

So, how many types of ordering are there going to be?

  • classic/standard
  • legacy?

legacy_total_score ordering may not be the same as total_score and some scores might not have a legacy_total_score, but it needs to be sortable on either if both are available, is that correct?

If both are there, then ordering the index by ['is_legacy', 'total_score', 'id'] wouldn't be as useful as it seems as both legacy and standard scores would have the same _id which mean only one of them can exist, unless the _id field updated to include the legacy status or be auto-generated.
Currently, it looks like the preferable option is the have _id include whether a score is legacy or not and store both in same index as separate entries; they could also be stored in a separate index, but given how their schema would likely be the same and when they'd be indexed, it's probably more convenient to manage if they're in the same index.

Which score is being calculated by populateTotalScore? Is anything going to update the DB with this value?

Re: auto-generating, means they can't be accessed directly by /_doc/1, though, and some fields would need to be stored in _source and it makes updating existing indices more complicated for batch processes as everything would have to include a potential lookup and update_by_query to elasticsearch

@notbakaneko
Copy link
Contributor

It may also be worth considering enabling _source for some fields so that web can use those values directly; these were originally disabled because of the additional IO load from reading and writing those segments.

@peppy
Copy link
Member

peppy commented Aug 30, 2022

What is the _id currently consisting of that would cause clashes? I'm not sure I follow.

@nanaya
Copy link
Contributor

nanaya commented Aug 30, 2022

id is solo_scores id which includes both legacy and lazer scores so there's no problem there. And legacy score will only have legacy score in the total score, same with lazer score.

@smoogipoo
Copy link
Contributor Author

smoogipoo commented Aug 30, 2022

I think the problem is that there's two conflicting ideas. Going by peppy's idea:

To make this work short-term, the elastic schema needs a valid index that will allow displaying

* _only_ lazer scores ordered by lazer `total_score` (api return to lazer client)

* _only_ stable score ordered by stable `total_score` aka `legacy_total_score` (`osu-web` and `web-10` displays)

Then the above are non-issues, correct? Because ID is unique to the scoring method of that ID - legacy or non-legacy. I.e. we'll never display osu!lazer+osu!stable scores on the same leaderboard for now.

@notbakaneko
Copy link
Contributor

we'll never display osu!lazer+osu!stable scores on the same leaderboard for now.

Whether they're displayed together or not isn't relevant - is a single SoloScore going to have more than one sort order? i.e. does this case exist?

id total_score legacy_total_score
1 100 101
2 99 100
3 98 102

If it does then they should be stored as separate documents in the index

@nanaya
Copy link
Contributor

nanaya commented Aug 30, 2022

we'll never display osu!lazer+osu!stable scores on the same leaderboard for now.

Whether they're displayed together or not isn't relevant - is a single SoloScore going to have more than one sort order? i.e. does this case exist?

The display is the sort. SoloScore (legacy) will only have legacy_total_score sort and SoloScore (lazer) will only have total_score (computed, now also stored) sort.

@notbakaneko
Copy link
Contributor

What is the _id currently consisting of that would cause clashes? I'm not sure I follow.

_id is the document's identifier in elasticsearch and has to be unique, it's currently set to use the score's id but they're not the same thing. If stable and lazer scores will have a different order when sorted, then the scores should be stored as separate documents, meaning the document's _id needs to be different from the score's id.

@nanaya
Copy link
Contributor

nanaya commented Aug 30, 2022

If stable and lazer scores will have a different order when sorted, then the scores should be stored as separate documents, meaning the document's _id needs to be different from the score's id.

They're already two different scores. Lazer scores come from lazer client (and only have lazer total score), stable scores come from stable client (and only have legacy total score).

@notbakaneko
Copy link
Contributor

Then ['is_legacy', 'total_score', 'id'] should be fine

@smoogipoo
Copy link
Contributor Author

Going to close this PR for now then, reimplementation will differ a bit.

@smoogipoo smoogipoo closed this Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants