Ignore definition line number of functions for caching #1779

Merged

@lhoestq lhoestq commented Jan 25, 2021

As noticed in #1718, when a function used for processing with map is moved to a different position within its Python file, the change in line number causes the caching mechanism to treat it as a different function. As a result, everything is recomputed.

This is because we were not ignoring the definition line number for such functions (even though we already do so for lambda functions).

For example, this code currently prints False:

from datasets.fingerprint import Hasher

# define once
def foo(x):
    return x

h = Hasher.hash(foo)

# define a second time elsewhere
def foo(x):
    return x

print(h == Hasher.hash(foo))

I changed this by ignoring the line number for all functions.
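
To illustrate the idea outside of the library, here is a minimal standard-library sketch. It is not the code from this PR and does not use the datasets Hasher; the helper names _code_signature and position_independent_hash are hypothetical. It assumes that a function's definition line is carried by its code object's co_firstlineno (and line table), so a hash built only from the bytecode, constants, and names should stay stable when the function is moved.

```python
import hashlib
import types


def _code_signature(code):
    """Collect parts of a code object that do not depend on where it was defined.

    Hypothetical helper: co_firstlineno and the line table are deliberately left out.
    """
    consts = tuple(
        # Nested code objects (inner functions, comprehensions) carry their own
        # line numbers, so they are reduced to a signature recursively.
        _code_signature(c) if isinstance(c, types.CodeType) else repr(c)
        for c in code.co_consts
    )
    return (code.co_code, consts, code.co_names, code.co_varnames, code.co_argcount)


def position_independent_hash(func):
    """Hash a function while ignoring its definition line number (illustrative only)."""
    signature = repr(_code_signature(func.__code__))
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


# define once
def foo(x):
    return x


first_line = foo.__code__.co_firstlineno
h = position_independent_hash(foo)


# define a second time elsewhere
def foo(x):
    return x


second_line = foo.__code__.co_firstlineno
same_hash = h == position_independent_hash(foo)

# The two definitions sit on different lines, yet the hashes are compared
# without taking that difference into account.
print(first_line != second_line, same_hash)
```

This sketch covers only the code object. A real fingerprint would also need to account for things such as default arguments, closures, and globals referenced by the function, which are out of scope here.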

@lhoestq lhoestq merged commit 8a03ab7 into master Jan 26, 2021
@lhoestq lhoestq deleted the ignore-definition-line-number-of-functions-for-caching branch January 26, 2021 10:20