Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45020: Freeze the modules imported during startup. #28107

Closed

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Aug 31, 2021

[I'm going to split this PR up.]

Doing this provides some significant performance gains for runtime startup. This change also adds a command-line flag (-X frozen_modules=[on|off]) that allows users to opt out of (or into) using frozen modules. When run in a development environment it defaults to opting out, to avoid contributors wasting time trying to figure out why their changes aren't getting used.

Note that I plan on updating the tests for the frozen modules so they run against the frozen and unfrozen modules, like we do with importlib. We'll also be looking at making frozen modules more like source modules in a separate PR.

There are only a handful of things left to do here before this could be merged:

  • get the "check_generated_files" job passing
  • deal with frozen modules on the WIndows builds
  • sort out PyConfig.stdlib_dir on Windows
  • ensure default "-X frozen_modules=on" for PGO builds
  • problem in test_4_daemon_threads in test_threading (timing out with "-X frozen_modules=on")
  • drop Python/frozen_modules/MANIFEST?

https://bugs.python.org/issue45020

@gvanrossum
Copy link
Member

Another reason not to check the generated .h files into git is that when the marshal format changes in any way, each of these files is included in the diff as changed, which makes the diff look overwhelming (this one, for example, has 151 files!).

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits; the most disturbing part is that it doesn't seem to work on Windows at all (it always believes it's installed).

@@ -480,6 +480,14 @@ Miscellaneous options
objects and pyc files are desired as well as supressing the extra visual
location indicators when the interpreter displays tracebacks. See also
:envvar:`PYTHONNODEBUGRANGES`.
* ``-X frozen_modules`` determines whether or not frozen modules are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little vague on the exact syntax to use on the command line. Do I write

python -X frozen_modules=on

and

python -X frozen_modules=off

???

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clarify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this.

int user_site_directory;
wchar_t *check_hash_pycs_mode;
bool use_frozen_modules;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an unrelated refactoring snuck in? I had expected just one line adding int use_frozen_modules;

Same for the following few files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed.

Makefile.pre.in Outdated
@@ -765,6 +765,671 @@ Python/frozen_modules/zipimport.h: $(srcdir)/Programs/_freeze_module $(srcdir)/L
$(srcdir)/Lib/zipimport.py \
$(srcdir)/Python/frozen_modules/zipimport.h

Python/frozen_modules/abc.h: $(srcdir)/Programs/_freeze_module $(srcdir)/Lib/abc.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you understand when to write $(srcdir)/ before a path and when not? It's definitely needed in shell commands, but it seems it's not used for dependencies. For example I see:

Modules/_math.o: Modules/_math.c Modules/_math.h

and none of the variables defining lists of .o files use it (e.g. MODULE_OBJS).

But usually when .h files are mentioned they do add $(srcdir), e.g. PEGEN_HEADERS or PYTHON_HEADERS.

Is this just not used any more, or does "make" only need it in certain cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean this up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this.

FWIW, I'm not sure what "make" requires so I'll follow the existing patterns. There are cases where we do use $(srcdir) with dependencies but only where the rule is also defined with $(srcdir) or there is no corresponding rule. Neither applies so I'll drop the $srcdir)/ prefix.

}
}
/* We fall back to assuming this is an installed Python. */
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using some printf() calls it looks like on Windows stdlib is NULL when run out of the development tree and we always return true here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the Windows builds aren't right yet.

@ericsnowcurrently ericsnowcurrently marked this pull request as ready for review September 11, 2021 00:29
@@ -137,8 +137,11 @@ def test_good_module_name(self):
self.assertTrue(dialog.entry_ok().endswith('__init__.py'))
self.assertEqual(dialog.entry_error['text'], '')
dialog = self.Dummy_ModuleName('os.path')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.path is an arbitrary dotted module name. I would rather switch.
Change is in two pieces because github would not do it in one.

Suggested change
dialog = self.Dummy_ModuleName('os.path')
dialog = self.Dummy_ModuleName('idlelib.idle')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this change. Thanks!

Comment on lines 140 to 144
with import_helper.CleanImport('os', 'os.path'):
import os.path
filename = dialog.entry_ok()
self.assertEqual(dialog.entry_error['text'], '')
self.assertTrue(filename.endswith('path.py'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with import_helper.CleanImport('os', 'os.path'):
import os.path
filename = dialog.entry_ok()
self.assertEqual(dialog.entry_error['text'], '')
self.assertTrue(filename.endswith('path.py'))
self.assertTrue(dialog.entry_ok().endswith('idle.py'))
self.assertEqual(dialog.entry_error['text'], '')

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to the latest commit, except I was daunted by the many changes in marshal.c and I will review those later.

Comment on lines +75 to +76
with import_helper.frozen_modules(), captured_stdout():
spec = importlib.util.find_spec(modname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What stdout content is being captured?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frozen test module (hello.py) prints out Hello world!.

@@ -185,7 +185,7 @@ class CleanImport(object):
importlib.import_module("foo") # new reference
"""

def __init__(self, *module_names):
def __init__(self, *module_names, usefrozen=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the class docstring to mention that it also disabled frozen modules unless usefrozen=True is passed?

Comment on lines +168 to +189
// To avoid unnecessary duplication during marshaling, all "complex"
// objects get stored in a lookup table that allows them to be
// referenced by later entries using an integer ID. However, this
// optimization is not applied if the refcount is exactly 1.
//
// The problem is that any objects cached during runtime
// initialization will get re-used in the code object during
// compilation. This means some objects will have a refcount > 1,
// even though there is only one instance within the code object.
// This is problematic it this happens inconsistently, such as only
// in non-debug builds (e.g. init_filters() in Python/_warnings.c);
// the generated marshal data we are freezing will be different even
// though the Python code hasn't changed.
//
// We address this by giving marshal the set of objects with
// refcount > 1 that actually only get used once in the code object.
// This mostly consists of interned strings and small integers.
PyObject *refs_before = get_ref_counts();
if (refs_before == NULL) {
return NULL;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we fix this problem in marshal itself? Then everyone (notably all regular, non-frozen PYC files) would benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened https://bugs.python.org/issue45186 to cover this separately.

@ericsnowcurrently
Copy link
Member Author

Reviewed up to the latest commit, except I was daunted by the many changes in marshal.c and I will review those later.

Yeah, sorry about that. I can roll things back if you think it's too much. FWIW, it might be easier to review the 3 marshal-related changes separately:

The first one re-arranges code to make the second and third commits cleaner. None of the changes in marshal.c should have any effect on the behavior of the marshal module and only change the behavior of _freeze_module.c.

@gvanrossum
Copy link
Member

I would much rather review separate PRs than separate commits on a single PR. Review and iteration can be much more focused that way.

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented Sep 13, 2021

Regarding the "address sanitizer" CI failure, this is similar to the earlier failure with the macOS job. When running with frozen modules, test_4_daemon_threads in test_threading is timing out (but only on a non-debug build). This implies some correlation between finalization of frozen modules and of daemon threads, leading to a deadlock or infinite loop. I'm going to take a look at the runtime finalization code to track this down.


[UPDATE] Never mind. The test relies on os.__file__ and runs in a loop. So it's not an issue with runtime finalization.

@ericsnowcurrently ericsnowcurrently marked this pull request as draft September 13, 2021 16:41
@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented Sep 13, 2021

@ericsnowcurrently
Copy link
Member Author

Pretty much everything from this PR has been merged through those other PRs.

@ericsnowcurrently ericsnowcurrently deleted the frozen-startup-modules branch September 15, 2021 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants