Add KeyHasher class and documentation
NikhilBartwal committed Apr 21, 2021
1 parent 2a5eac9 commit 0b808ed
Showing 1 changed file with 33 additions and 0 deletions.
33 changes: 33 additions & 0 deletions src/datasets/keryhash.py
@@ -14,3 +14,36 @@
# limitations under the License.

# Lint as: python3

"""
Hashing function for dataset keys using `hashlib.md5`.

Requirements for the hash function:
- Produces uniformly distributed hash values
- Is adequately fast
- Works with multiple input types (in this case, `str`, `int` or `bytes`)
- Is platform independent (generates the same hash on different OSes and systems)

The hashing function returns a 128-bit integer hash of the given key.
The split name is used as the hash salt so that identical keys in different
splits do not produce identical hashes.
"""

import hashlib
from typing import Union

class KeyHasher(object):
    """KeyHasher class for providing hash using md5"""

    def __init__(self, hash_salt: str):
        # Seed the MD5 state with the salt (the split name); `_as_bytes` is a
        # method, so it must be called through `self`.
        self._split_md5 = hashlib.md5(self._as_bytes(hash_salt))

    def _as_bytes(self, hash_data: Union[str, int, bytes]) -> bytes:
        """
        Returns the input hash_data in its bytes form
        Args:
            hash_data: the hash salt/key to be converted to bytes
        """
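
The diff as shown ends at the `_as_bytes` docstring, so the conversion logic and the key-hashing step are not visible here. The following is a minimal sketch of how the behaviour described in the module docstring could work, not the commit's actual implementation. The names `as_bytes`, `SaltedKeyHasher` and `hash`, the UTF-8 encoding of `str` and `int` inputs, and the `TypeError` on other types are all assumptions made for illustration.

```python
import hashlib
from typing import Union


def as_bytes(hash_data: Union[str, int, bytes]) -> bytes:
    """Hypothetical helper: return `hash_data` as bytes.

    Assumption: bytes pass through unchanged, while str and int are
    rendered as text and UTF-8 encoded (so 1 and "1" hash identically).
    """
    if isinstance(hash_data, bytes):
        return hash_data
    if isinstance(hash_data, (str, int)):
        return str(hash_data).encode("utf-8")
    raise TypeError(f"Unsupported key type: {type(hash_data).__name__}")


class SaltedKeyHasher:
    """Hypothetical stand-in for KeyHasher, salted with the split name."""

    def __init__(self, hash_salt: str):
        # MD5 state pre-loaded with the salt; copied for every key.
        self._split_md5 = hashlib.md5(as_bytes(hash_salt))

    def hash(self, key: Union[str, int, bytes]) -> int:
        # Copy so the salted state is reused without being mutated.
        md5 = self._split_md5.copy()
        md5.update(as_bytes(key))
        # A 32-character hex digest is a 128-bit integer.
        return int(md5.hexdigest(), 16)


train = SaltedKeyHasher("train")
test = SaltedKeyHasher("test")
print(train.hash("example-0") == train.hash("example-0"))
print(train.hash("example-0") == test.hash("example-0"))
```

Copying the pre-salted MD5 state means the salt is hashed only once per split, and because MD5 operates on explicit bytes the result does not depend on the operating system, which matches the platform-independence requirement listed in the docstring.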
