Cloudpickle non-deterministic dump when file is innocuously modified #385

Open
richardwu opened this issue Jun 22, 2020 · 3 comments

@richardwu

richardwu commented Jun 22, 2020

Cloudpickle seems to produce non-deterministic dumps when the source file is "innocuously" modified (e.g., formatting changes outside of the pickled object's definition), whereas dill and pickle produce deterministic dumps.

For example, inserting a blank line anywhere after the definition of the pickled function foo produces a different hash on the first run after the change, then a stable hash on subsequent runs:

import cloudpickle
import dill
import pickle

def foo():
    pass

def get_cpickle():
    return cloudpickle.dumps(foo)

def get_dill():
    return dill.dumps(foo)

def get_pickle():
    return pickle.dumps(foo)

if __name__ == '__main__':
    print('Cpickle:', hash(get_cpickle()))
    print('Dill:', hash(get_dill()))
    print('Pickle:', hash(get_pickle()))

Command:

PYTHONHASHSEED=1 python bad_pickle.py

First run:

Cpickle: -185195056977094428
Dill: 1827482599472099751
Pickle: -2221802750934099445

Second run:

Cpickle: 5072829361071368526
Dill: 1827482599472099751
Pickle: -2221802750934099445

Blank line inserted after print('Cpickle:', ...) (third run):

Cpickle: -185195056977094428
Dill: 1827482599472099751
Pickle: -2221802750934099445

Fourth run:

Cpickle: 5072829361071368526
Dill: 1827482599472099751
Pickle: -2221802750934099445
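
The hashes above are only comparable across runs because PYTHONHASHSEED is fixed: the built-in hash() of a bytes object is salted per process otherwise. A minimal sketch of an alternative, using a SHA-256 digest that does not depend on the hash seed. The helper name payload_digest is hypothetical, and the payload here comes from the standard pickle module purely for illustration; the output of cloudpickle.dumps(foo) or dill.dumps(foo) could be passed in the same way.

```python
import hashlib
import pickle


def payload_digest(payload: bytes) -> str:
    """Return a hex SHA-256 digest of a pickle payload.

    Unlike the built-in hash(), this does not depend on PYTHONHASHSEED,
    so digests can be compared across interpreter runs directly.
    """
    return hashlib.sha256(payload).hexdigest()


# Illustration only: any bytes payload works here, e.g. the result of
# cloudpickle.dumps(foo) from the script above.
example_payload = pickle.dumps(("foo", 1))
print("Digest:", payload_digest(example_payload))
```

With this, the command no longer needs the PYTHONHASHSEED=1 prefix for the printed values to be comparable between runs.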

This was tested on the following versions:

Cpickle version: 1.2.2
Dill version: 0.2.7.1
Python version: 3.6.10 (default, Jan  1 2020, 00:00:00)

My guess is that Cloudpickle's output also depends on some eventually cached version of the source file (e.g., .pyc).

This is also somewhat related to #120.
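
Comparing hashes only shows that two dumps differ, not where. A rough sketch of how the first differing opcode could be located with the standard pickletools module, assuming the dumps are ordinary pickle streams (which should hold for cloudpickle output, since it is loadable with pickle.loads). The helper name first_opcode_difference is hypothetical, and the payloads below are plain pickle.dumps results used for illustration.

```python
import pickle
import pickletools
from itertools import zip_longest


def first_opcode_difference(payload_a: bytes, payload_b: bytes):
    """Return (index, op_a, op_b) for the first differing opcode, or None.

    Each op is a (opcode_name, argument) pair, or None if that payload
    ended before the other one did.
    """
    ops_a = pickletools.genops(payload_a)
    ops_b = pickletools.genops(payload_b)
    for index, (op_a, op_b) in enumerate(zip_longest(ops_a, ops_b)):
        # genops yields (opcode_info, argument, position) triples; the
        # position is ignored so that only the content is compared.
        summary_a = None if op_a is None else (op_a[0].name, op_a[1])
        summary_b = None if op_b is None else (op_b[0].name, op_b[1])
        if summary_a != summary_b:
            return index, summary_a, summary_b
    return None


# Illustration with two small payloads that differ in a single value.
dump_a = pickle.dumps({"x": 1}, protocol=4)
dump_b = pickle.dumps({"x": 2}, protocol=4)
print(first_opcode_difference(dump_a, dump_b))
```

Saving the bytes from the first and second run to disk and passing them to this helper would show which part of the stream changes between runs.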

@richardwu changed the title from "Cloudpickle non-deterministic dump when file is modified" to "Cloudpickle non-deterministic dump when file is innocuously modified" on Jun 22, 2020
@pierreglaser
Member

Thanks for your report. What you're reporting seems plausible, but I cannot reproduce it on my machine (Ubuntu 18, Python 3.6, same cloudpickle version)...

@ogrisel
Contributor

ogrisel commented Jul 20, 2020

For me, inserting a blank line changes the hash for both cloudpickle and dill. But the hash stays consistent across runs when the code is unchanged.
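
One plausible contributor to formatting sensitivity, assuming the function is serialized by value together with its code object, is that code objects record line information. The sketch below only demonstrates that a blank line before a definition shifts co_firstlineno while a blank line after it does not. It does not by itself explain a change caused by a blank line inserted after the function, and compile_foo is a hypothetical helper.

```python
SOURCE = "def foo():\n    pass\n"


def compile_foo(source: str):
    """Compile the given source and return the code object of foo."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["foo"].__code__


original = compile_foo(SOURCE)
shifted = compile_foo("\n" + SOURCE)   # blank line before the definition
trailing = compile_foo(SOURCE + "\n")  # blank line after the definition

# The bytecode is identical in all three cases; only the recorded
# starting line differs when the blank line precedes the definition.
print(original.co_firstlineno, shifted.co_firstlineno, trailing.co_firstlineno)
# → 1 2 1
```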

@ogrisel
Contributor

ogrisel commented Jul 20, 2020

Anyway, having deterministic pickles is probably out of scope for cloudpickle, so I would be in favor of closing this issue.
