add recalculate_replay_hash parameter to pack functions #33

bemxio · 2022-03-16T21:48:16Z

referencing issue #32
everything should work well, feel free to review stuff
just added an if check to pack_replay_data: if the new recalculate_replay_hash parameter is True, it uses hashlib to generate a hash of the current compressed replay events
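A minimal sketch of the parameter approach described above. It assumes replay events are serialized in the osr w|x|y|z, text form and compressed with Python's lzma module using the legacy .lzma container (lzma.FORMAT_ALONE). The Frame tuple type and this standalone pack_replay_data signature are hypothetical; osrparse's real _Packer.pack_replay_data may differ.

```python
import hashlib
import lzma
from typing import List, Optional, Tuple

# Hypothetical frame: (time_delta, x, y, keys), mirroring the osr "w|x|y|z" text format.
Frame = Tuple[int, float, float, int]


def pack_replay_data(
    frames: List[Frame], recalculate_replay_hash: bool = False
) -> Tuple[bytes, Optional[str]]:
    """Compress frames and, if asked, also return an md5 hex digest of the compressed bytes.

    Hypothetical sketch of the approach in this PR, not osrparse's actual code.
    """
    text = "".join(f"{w}|{x}|{y}|{z}," for (w, x, y, z) in frames)
    # Assumption: legacy .lzma container for the compressed replay data.
    compressed = lzma.compress(text.encode("ascii"), format=lzma.FORMAT_ALONE)
    replay_hash = None
    if recalculate_replay_hash:
        # The "if check" from the PR: hash only when the caller asks for it.
        replay_hash = hashlib.md5(compressed).hexdigest()
    return compressed, replay_hash
```

With the flag left at its default, the function returns None in place of a hash, so existing callers see no change in behavior.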

tybug · 2022-03-18T19:07:14Z

hmm, after seeing this I think it's actually better to just expose two methods: Replay.calculate_replay_hash(replay_data: List[ReplayEvent]) and Replay#recalculate_replay_hash. The former returns the replay hash of the given replay events, and the latter sets the replay's replay hash based on Replay.calculate_replay_hash.

This way a consumer can calculate a replay's hash ahead of time, without needing to write it to disk. If a consumer wants a replay's hash to match, they should either call replay.recalculate_replay_hash() or replay.replay_hash = Replay.calculate_replay_hash(replay.replay_data) before calling replay.write_file.
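A rough sketch of the two-method shape proposed above, using a hypothetical minimal Replay and ReplayEvent (osrparse's real classes carry many more fields). The hash recipe inside calculate_replay_hash, md5 over LZMA-compressed frame text, is an assumption carried over from the original patch, not a confirmed osu! algorithm.

```python
import hashlib
import lzma
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReplayEvent:
    # Hypothetical stand-in for osrparse's replay event.
    time_delta: int
    x: float
    y: float
    keys: int


@dataclass
class Replay:
    # Hypothetical minimal Replay with only the fields this sketch needs.
    replay_data: List[ReplayEvent] = field(default_factory=list)
    replay_hash: Optional[str] = None

    @staticmethod
    def calculate_replay_hash(replay_data: List[ReplayEvent]) -> str:
        """Return a hash of the given events without touching any Replay or file.

        Assumption: md5 over LZMA-compressed "w|x|y|z," text.
        """
        text = "".join(f"{e.time_delta}|{e.x}|{e.y}|{e.keys}," for e in replay_data)
        compressed = lzma.compress(text.encode("ascii"), format=lzma.FORMAT_ALONE)
        return hashlib.md5(compressed).hexdigest()

    def recalculate_replay_hash(self) -> None:
        """Set this replay's hash from its current events."""
        self.replay_hash = Replay.calculate_replay_hash(self.replay_data)
```

Because calculate_replay_hash is a staticmethod that only reads its argument, a caller can hash modified events and store the digest elsewhere without ever calling write_file.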

Do you mind making these changes?

bemxio · 2022-03-18T19:28:41Z

sure, i will try to make it today
why do you like the method idea though? just asking out of curiosity

tybug · 2022-03-18T20:17:19Z

Because a reasonable thing for a user to want to do is calculate the new hash of a modified replay, but not necessarily save it to a new file. Maybe the user wants to save the new replay hash to a database or use it in some other way. Without this new method, the only way to calculate the new hash of a modified replay is to write it to a file (and if you then wanted to retrieve that new replay hash, you'd have to re-parse the saved replay).

As for why I prefer calling replay.recalculate_replay_hash() instead of passing a parameter to replay.write_file, the reason is similar: a user may want to update the replay hash of a replay and then go on to use that replay in other ways, without ever writing it to a file.

bemxio · 2022-03-20T14:30:49Z

alright, made the commit, everything seems to work nicely :)
feel free to change some details or suggest changes

tybug · 2022-03-20T16:07:55Z

could you add a test for this? probably just asserting that Replay.calculate_replay_hash matches the existing replay hash of the test replay.

bemxio · 2022-03-20T16:36:49Z

alright, sure
i will try to do it

bemxio · 2022-03-20T17:21:33Z

okay, i've added a test_replay_hash function, should work alright
i had to change the parameter from replay data to just the replay object, because of the RNG seed.
i think it's better to move the function out of the Replay class, since we are passing the replay as a parameter anyway

tybug · 2022-03-20T17:27:12Z

tests/test_replay.py

@@ -74,6 +74,10 @@ def test_replay_id(self):
        # we can parse it properly instead of erroring
        self.assertEqual(self._old_replayid_replay.replay_id, 1127598189)

+    def test_replay_hash(self):
+        for replay in self._replays:
+            self.assertEqual(Replay.calculate_replay_hash(replay), "b06ecaf5fc301545b8b23769b2a20451", "Replay hash is wrong")


why not just compare to replay.replay_hash instead of hardcoding b06ecaf5fc301545b8b23769b2a20451?

bemxio · Mar 20, 2022

with our settings for compressing replay events to LZMA, the hash we calculate doesn't match the replay hash stored in the original replay
unless we want to change the hash in the file, it needs to be hardcoded

tybug · Mar 20, 2022

Isn't the whole point of this change to be able to match the hash produced by osu? We should make the changes necessary to make that happen, or if it's not feasible, then I don't think we should be including the ability to calculate replay hashes at all.

In other words, the following needs to hold for all replays for this PR to have any value:

Replay.calculate_replay_hash(replay) == replay.replay_hash
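One way to check this invariant across a whole test corpus is a small helper that collects every replay whose stored hash disagrees with a candidate function. find_hash_mismatches and HasReplayHash are hypothetical names; in test_replay_hash it could be used as self.assertEqual(find_hash_mismatches(self._replays, Replay.calculate_replay_hash), []) instead of comparing against a hardcoded digest.

```python
from typing import Callable, Iterable, List, Protocol, TypeVar


class HasReplayHash(Protocol):
    # Anything with a stored replay_hash attribute, such as a parsed replay.
    replay_hash: str


R = TypeVar("R", bound=HasReplayHash)


def find_hash_mismatches(replays: Iterable[R], calculate: Callable[[R], str]) -> List[R]:
    """Return every replay whose stored replay_hash differs from calculate(replay).

    An empty list means calculate(replay) == replay.replay_hash held for all of them.
    """
    return [r for r in replays if calculate(r) != r.replay_hash]
```

Returning the offending replays rather than a bare boolean makes a failing test show which files break the invariant.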

bemxio · Mar 20, 2022

> Isn't the whole point of this change to be able to match the hash produced by osu?

the original point was to generate a hash for osu! (or any other osu! client) based on the replay data, in case the client checks the hash itself

even though the current stable version doesn't care, i am not sure about osu!lazer and other clients that support replays

tybug · Mar 20, 2022

But we don't want to just generate any hash, we want to generate the exact same hash that osu! is generating. I'm not even sure if this is possible since it's not documented how the osr hash is generated. If it's not possible for us to match osu's hash, then we shouldn't even try to calculate our own hash - we don't get to decide for ourselves what the replay hash should be. That's up to osu to specify and for us to follow.

bemxio · Mar 20, 2022

> But we don't want to just generate any hash, we want to generate the exact same hash that osu! is generating.

ah okay, i understand you
in that case, we would need to compress replay events into LZMA with the same settings osu! uses, since that's what it generates the hash from later on
i am not sure if it's possible to compress it the same way again, considering we don't know the exact settings osu! uses for compression
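Why the encoder settings matter can be shown with Python's lzma module: the same frame text compressed with two different presets decompresses identically, yet the compressed bytes (and therefore any md5 taken over them) differ. In the .lzma container even the header records the dictionary size, which differs between presets 0 and 1. The frame text is synthetic, and FORMAT_ALONE with presets 0 and 1 is an arbitrary choice for illustration; which settings osu! actually uses is unknown here.

```python
import hashlib
import lzma

# Synthetic frame text in the osr "w|x|y|z," format.
frames = "".join(f"{16 * i}|{i % 512}|{i % 384}|{i % 2}," for i in range(2000)).encode("ascii")

# Same input, two different encoder presets in the legacy .lzma container.
fast = lzma.compress(frames, format=lzma.FORMAT_ALONE, preset=0)
other = lzma.compress(frames, format=lzma.FORMAT_ALONE, preset=1)

# Both decompress to identical replay text...
assert lzma.decompress(fast) == lzma.decompress(other) == frames

# ...but the compressed bytes differ, so an md5 over them differs too.
print(hashlib.md5(fast).hexdigest() == hashlib.md5(other).hexdigest())  # prints False
```

So hashing our own compressed output can only reproduce osu!'s hash if every encoder setting matches, on top of the hashed input being the right one.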

however, still, the point was to generate a hash, in case any client generates it for themselves based on compressed data in our packed replay and compares it to the one in the replay

tybug · Mar 20, 2022

> however, still, the point was to generate a hash, in case any client generates it for themselves based on compressed data in our packed replay and compares it to the one in the replay

That would be good, but how do we know that an md5 hash of the replay data is going to return the hash we want? We need some way to check that we're using the correct hash function: either by finding documentation that says osu uses the md5 of the replay data, or by verifying ourselves that

Replay.calculate_replay_hash(replay) == replay.replay_hash

for all replays, as above. We can't just guess that they're md5 hashing the replay data and hope we get lucky (unless you have a source that says that's what they're doing; it might be).

Maybe osu takes the timestamp of the replay into account for the hash, for instance, and not just the replay data. That's going to be the hard part of this pull request: figuring out exactly how to generate the replay hash (if it's possible at all). As you said, it might be as simple as changing our LZMA compression settings, but I just don't know yet.

bemxio · Mar 20, 2022

ah right, sorry, i forgot that the osu! wiki says the hash "(includes certain properties of the replay)".
well, we can't really do much other than: find out whether someone has documented it, look into the source code of other replay writers in case some of them update the hash correctly, or brute-force our way by testing what could potentially be included in the hashed data

i will try to do some tests on it in my free time. i think a score ID might be included there, since it's kind of an online identifier of the replay, but that's just speculation
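A minimal sketch of the brute-force idea, assuming the hash is md5 over a plain concatenation (no separators) of some ordered subset of replay fields rendered as strings. find_hash_recipe and the field names are hypothetical, and a recipe that matches one replay would still need to satisfy Replay.calculate_replay_hash(replay) == replay.replay_hash for every test replay before it could be trusted.

```python
import hashlib
from itertools import permutations
from typing import Dict, Optional, Tuple


def find_hash_recipe(
    fields: Dict[str, str], stored_hash: str, max_fields: int = 3
) -> Optional[Tuple[str, ...]]:
    """Try md5 over ordered concatenations of candidate fields.

    Returns the first field order whose md5 hex digest equals stored_hash, or None.
    Assumption: plain string concatenation, UTF-8 encoded, no separators.
    """
    for n in range(1, max_fields + 1):
        # Every ordered choice of n distinct field names.
        for names in permutations(sorted(fields), n):
            candidate = "".join(fields[name] for name in names)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == stored_hash:
                return names
    return None
```

The search grows quickly with the number of fields, and separators, other byte encodings, or binary serializations would each multiply the candidate space, so this only makes sense for a short list of plausible fields.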

tybug · Mar 21, 2022

To be honest I think it'll be very hard to figure out what hash algorithm osu! uses, especially because it might be intentionally obfuscated for anticheat purposes. Good luck though, and I'll be interested to hear if you do figure it out.

osrparse/replay.py

tybug · 2022-03-20T17:30:36Z

> i had to change the parameter from replay data to just the replay object, because of the RNG seed.

I actually think what we had before is the expected behavior. Replay.calculate_replay_hash(replay_data: List[ReplayEvent]) should calculate the hash of the given replay data as before, regardless of whether the rng seed is present or not. Then replay.recalculate_replay_hash should append a frame for the replay_seed at the end before calling Replay.calculate_replay_hash, so that the replay hash matches.
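A sketch of the split suggested here, under the same assumptions as the original patch (md5 over LZMA-compressed w|x|y|z, text): calculate_replay_hash hashes exactly the frames it receives, and recalculate_replay_hash appends the RNG seed frame, which the osr format stores as a final -12345|0|0|seed frame, before hashing. The rng_seed field name and the tuple frame type are hypothetical.

```python
import hashlib
import lzma
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical frame: (time_delta, x, y, keys), written as "w|x|y|z," in osr text.
Frame = Tuple[int, float, float, int]

# In the osr format, the RNG seed is stored as a final frame "-12345|0|0|seed".
SEED_FRAME_TIME = -12345


def calculate_replay_hash(replay_data: List[Frame]) -> str:
    """Hash exactly the frames given; no RNG seed frame is added here."""
    text = "".join(f"{w}|{x}|{y}|{z}," for (w, x, y, z) in replay_data)
    compressed = lzma.compress(text.encode("ascii"), format=lzma.FORMAT_ALONE)
    return hashlib.md5(compressed).hexdigest()


@dataclass
class Replay:
    replay_data: List[Frame] = field(default_factory=list)
    rng_seed: Optional[int] = None  # hypothetical field holding the RNG seed
    replay_hash: Optional[str] = None

    def recalculate_replay_hash(self) -> None:
        """Append the seed frame (if any) to a copy of the events, then hash."""
        frames = list(self.replay_data)
        if self.rng_seed is not None:
            frames.append((SEED_FRAME_TIME, 0, 0, self.rng_seed))
        self.replay_hash = calculate_replay_hash(frames)
```

Appending to a copy keeps replay_data unchanged, so calling recalculate_replay_hash twice gives the same result.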

bemxio · 2022-03-20T17:41:10Z

> regardless of whether the rng seed is present or not.

Replay.calculate_replay_hash reuses _Packer.pack_replay_data and then calculates the hash from the packer's compressed data.
i can make it generate the replay data by itself, without the RNG seed check etc.

0b21a5a add recalculate_replay_hash parameter to pack functions
c204c5c add functions to calculate hash instead of parameter

bemxio added 2 commits March 20, 2022 18:16

ea1d51f add test case for replay hash
d45efd6 fix recalculate_replay_hash

tybug reviewed Mar 20, 2022


osrparse/replay.py (resolved)

This pull request was closed.
