Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29708: support SOURCE_DATE_EPOCH env var in py_compile (allow for reproducible builds of python packages) #296

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Doc/library/py_compile.rst
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,10 @@ byte-code cache files in the directory containing the source code.
enum and controls how the generated ``.pyc`` files are invalidated at
runtime.

If the :envvar:`SOURCE_DATE_EPOCH` environment variable is set, the timestamp
entry in the ``.pyc`` file header will be set to this value.
See https://reproducible-builds.org/specs/source-date-epoch/ for more info.

.. versionchanged:: 3.2
Changed default value of *cfile* to be :PEP:`3147`-compliant. Previous
default was *file* + ``'c'`` (``'o'`` if optimization was enabled).
Expand All @@ -71,6 +75,7 @@ byte-code cache files in the directory containing the source code.

.. versionchanged:: 3.7
The *invalidation_mode* parameter was added as specified in :pep:`552`.
Support for :envvar:`SOURCE_DATE_EPOCH` was also added.


.. class:: PycInvalidationMode
Expand Down
8 changes: 8 additions & 0 deletions Lib/py_compile.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,14 @@ def compile(file, cfile=None, dfile=None, doraise=False, optimize=-1,
pass
if invalidation_mode == PycInvalidationMode.TIMESTAMP:
source_stats = loader.path_stats(file)
source_date_epoch = os.environ.get('SOURCE_DATE_EPOCH')
if source_date_epoch:
try:
source_date_epoch = int(source_date_epoch)
except ValueError:
raise ValueError("SOURCE_DATE_EPOCH is not a valid integer")
if source_stats['mtime'] > source_date_epoch:
source_stats['mtime'] = source_date_epoch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we making judgment calls about which one is greater? SOURCE_DATE_EPOCH means I decided I know better than you what the time is, that is the entire point. For example, Arch Linux stores all files in the package archive using the modification time of SOURCE_DATE_EPOCH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH rpm and tar's --clamp-mtime option implement mtime clamping, so that old files keep their timestamps and just timestamps of files created during a build are modified to be SOURCE_DATE_EPOCH
can be tricky to get this right for both cases...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case, the mtime clamp will set the pyc modification time to SOURCE_DATE_EPOCH anyway.

It would result in general files being clamped to a "latest" mtime of SOURCE_DATE_EPOCH and the pyc header timestamps being exactly equal to SOURCE_DATE_EPOCH, so everyone wins.

Copy link
Contributor Author

@bmwiedemann bmwiedemann Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But python does an exact match when checking if it can use pyc files, so old .py files having .pyc files with 'too new' SOURCE_DATE_EPOCH in header would also result in .pyc files not to be used.

I think we might need something like

if source_stats['mtime'] > source_date_epoch or os.environ.get('ALWAYS_SET_MTIME'):
    source_stats['mtime'] = source_date_epoch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think worrying about whether .pyc files will be regenerated based on the static date set in them is getting beyond the scope of reproducible builds. If this is going to have a chance to land in Python 3.7 then let's keep this PR just to the production of .pyc files and not their consumption.

bytecode = importlib._bootstrap_external._code_to_timestamp_pyc(
code, source_stats['mtime'], source_stats['size'])
else:
Expand Down
12 changes: 12 additions & 0 deletions Lib/test/test_py_compile.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,18 @@ def test_bad_coding(self):
self.assertFalse(os.path.exists(
importlib.util.cache_from_source(bad_coding)))

def test_source_date_epoch(self):
testtime = 123456789
with support.EnvironmentVarGuard() as env:
env["SOURCE_DATE_EPOCH"] = str(testtime)
py_compile.compile(self.source_path, self.pyc_path)
self.assertTrue(os.path.exists(self.pyc_path))
self.assertFalse(os.path.exists(self.cache_path))
with open(self.pyc_path, "rb") as f:
f.read(4) # Skip the magic number.
timebytes = f.read(4) # Read timestamp.
self.assertEqual(testtime, int.from_bytes(timebytes, 'little'))

@unittest.skipIf(sys.flags.optimize > 0, 'test does not work with -O')
def test_double_dot_no_clobber(self):
# http://bugs.python.org/issue22966
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
If the SOURCE_DATE_EPOCH environment variable is set,
py_compile will use it to override the timestamps it puts into .pyc files.