Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native support for zipping the swig generated python files #1144

Closed
ddj116 opened this issue May 14, 2021 · 2 comments · Fixed by #1145
Closed

Native support for zipping the swig generated python files #1144

ddj116 opened this issue May 14, 2021 · 2 comments · Fixed by #1145

Comments

@ddj116
Copy link
Contributor

ddj116 commented May 14, 2021

Motivation

During simulation startup, the python input processor imports a lot of files that are not in user space but are generated by swig under the SIM_*/trick/ directory. For small sims, these are just a handful of files, but for large sims, like in the ramtares/ workflow for example, these are thousands of files. Each time a single sim instance starts, the sim must interact with the file system to open all of these files before it even gets to the first executable line of RUN_*/input.py.

On local filesystems, this process happens quite quickly, but Monte-carlo workflows typically necessitate runs be executed in a workspace on a network file system, since runs are not all run on the same machine and copying sims around is cumbersome. This usually means dozens, hundreds, and sometimes thousands of runs all start up near the same wall-clock time, and when each sim must read thousands of files from the network file system, this means potentially a million file open operations hitting the same network all at once.

We have seen extreme network filesystem slowness in AGDL and now FSL consistently over the years, and a recent evaluation of the NFS metrics combine with strace on the sim leads us to believe it is these massive amount of file open()s on the network FS that are contributing to the network slowness.

Reducing the file I/O by zipping SIM_*/trick/

It has recently come to my attention that python understands how to read modules from a .zip file natively, and I recently prototyped this approach to replace all of our 3000 file reads with a single read of trick.zip. Basically this is how it works:

# cd path/to/SIM, build the sim, then...
python -m compileall trick/             # Build all the .pyc files
zip  trick.zip trick/*.{py,pyc}         # zip 'em up
rm -rf trick/                           # Remove the original directory
# Now run, using the zip file
PYTHONPATH="trick.zip" ./S_main... RUN.../input.py

Confirmed via export PYTHONPATH="trick.zip"; strace -f -e trace=%file ./S_main... RUN.../input.py, all individual file reads from trick/ are replaced with the single read of trick.zip.

Complications associated with Trickified libraries -- needing to use PYTHONPATH

The zip by itself isn't enough, you have to adjust sys.path so that the python interpreter can import the modules under the zipped directory. Although TRICK_PYTHON_PATH can typically be adjusted in S_overrides.mk to add the zipped area, that approach does not get the entry into sys.path early enough for Trick to find it in the case where a Trickified library's python files import trick modules like import sim_services. In this scenario, the sim will die on the first input processor line that tries to reference a sim object.

@dbankieris was able to help figure out why this happens, here's his explanation copied from our discussion:

Here's what's going on. IPPython.cpp adds the current working directory (SIM_*) to sys.path, then trick/pymods, then everything in TRICK_PYTHON_PATH. It then imports trick, which is found under SIM_*. SIM_*/trick/__init__.py adds SIM_*/trick to sys.path, which allows imports like import sim_services. When we replace trick with trick.zip, the __init__.py within still adds SIM_*/trick to sys.path, whereas it now needs to add SIM_*/trick.zip/trick instead. If you set PYTHONPATH = SIM_*/trick.zip:SIM_*/trick.zip/trick, then everything works without needing to modify any import statements or Trick itself. As a separate issue, the reason you have to use PYTHONPATH and not TRICK_PYTHON_PATH is because trick/pymods is added before the contents of TRICK_PYTHON_PATH, and there is a trick module in both trick/pymods and trick.jar, which prevents the zipped __init__.py from running, since the trick module is found in pymods first, whereas it would usually be found in SIM_* first. I think the long-term solution is for Trick to produce trick.zip and add trick.zip/trick to sys.path.

The long term solution

In ramtares we are rolling our own solution by modifying our environment PYTHONPATH and S_overrides.mk to manage the creation of trick.zip. But it would be nice if this mechanism was native to Trick. Here I'd like to discuss:

  • Is moving from SIM*/trick/ --> SIM*/trick.zip a change that should apply globally? Should it be an option to Trick? Are there any downsides to ditching the original approach?
  • Can/should this be applied to the framework that Trickifies libraries as well?

FYI @jmpenn @spfennell @alexlin0 @dbankieris

@alexlin0
Copy link
Contributor

I think it's a good idea. One thing to figure out is when a header file changes, and then the resulting python file, how do we compare a file with the file in the zip. I'm sure the team will want to find the minimal work required to work with the zip file.

Anything you are adding to PYTHONPATH I'm ok with IPPython.cpp doing that so it's automatic.

@dbankieris
Copy link
Contributor

I'm working an MR for this right now!

dbankieris added a commit that referenced this issue May 19, 2021
dbankieris added a commit that referenced this issue May 20, 2021
Hide the non-zipped Python modules to indicate to users that changing
them will have no effect on the sim.

Refs #1144
astrophysics referenced this issue in astrophysics/trick Nov 24, 2022
astrophysics referenced this issue in astrophysics/trick Nov 24, 2022
Hide the non-zipped Python modules to indicate to users that changing
them will have no effect on the sim.

Refs #1144
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants