group some modules into isolated venvs… #118
Conversation
- for modules with certain sets of dependencies, delegate to another (isolated) virtual env and place a shell script in the top-level venv
- update some executable names
- remove `fix-pip`
- update documentation
This is merely a starting point. One can easily imagine making better/more groups, or Docker delegation (CLI or REST). Also, maybe there is some way to reduce the writing effort for each module.
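For readers unfamiliar with the mechanism: a delegation stub of this kind could look roughly like the sketch below. The sub-venv path and processor name are placeholders, not the actual values used in this PR.

```sh
#!/bin/sh
# Hypothetical wrapper placed in the top-level venv's bin/ directory.
# It re-executes the identically named processor from the isolated
# sub-venv, so callers never need to know which venv a module lives in.
SUB_VENV=/path/to/ocrd_all/venv/sub-venv-tf1   # placeholder path
exec "$SUB_VENV/bin/$(basename "$0")" "$@"
```

Because such a stub is an ordinary script, it also works on filesystems without symlink support and could later be swapped for a `docker run` or REST call without touching the callers.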
I have successfully tested this inside a Docker container. I first had to re-instate our custom …
Looks great, will test it now.
A non-recursive solution could be based on master...stweil:tf1. It uses symbolic links to run processors in the TF1 venv.
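For comparison, the symlink variant would amount to something like the following sketch (paths and the processor name are placeholders):

```sh
# Hypothetical illustration of the symlink variant: expose a processor
# that is installed in the separate TF1 venv through the main venv's bin/.
ln -s ../../venv-tf1/bin/ocrd-example-processor venv/bin/ocrd-example-processor
```

When invoked through the link, the target script's own shebang (pointing at the TF1 venv's interpreter) is what gets executed, so the processor runs with its isolated dependencies.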
It's not just about separating TF1. There are many more potential and actual incompatibilities (TF <2.2 vs >=2.2, Keras 2.3.* vs 2.4.*, Torch <1.0 vs <1.5 vs >=1.5, opencv-python vs -headless, Pillow ...). Some of these have merely flown under the radar until now. Plus, think about going into thin Docker containers (wrapping CLIs or even a REST API) as the next step. So by merely copying all variables and rules (as in your proposal) we would quickly run into a mess (esp. since you have to drag along any change to them in all the others as well). Plus, symlinks are not supported by all currently relevant FS IIRC.
They are supported, at least on any machine and operating system that I have tried so far. We don't have plans to support Windows without WSL, do we?
Oh, I think I confused HPFS (which doesn't support them but is irrelevant) and HFS+. And yes, I guess at least within WSL even Windows should be fine with symlinks nowadays.
Indeed, there are. Luckily we only have to handle actual incompatibilities. Are there still problems with opencv-python? I thought all of them were using opencv-python-headless by now?
The main point was: there are too many for your kind of solution. The other point was: we don't really know what incompatibilities we already have. They rarely surface as long as no one here uses complex workflows in Docker images and we don't have regression tests for that. Besides, potential incompatibilities can always become actual ones when a module changes in the future.
Code is still compiling but so far it looks good :)
IIUC there is a potential project partner for phase 3 that intends to roll out OCR-D on a pure Windows (Server) environment. We'll continue to target Ubuntu 18.04 as the main platform, and when in doubt, we'll stick to POSIX. However, symlinks seem like a less maintainable approach, and unless there's an advantage to them over the more generalized multi-venv solution here, I'd avoid them.
I noticed that, too, but think that's not feasible, so IMO even trying it is a waste of time. With Docker or a virtual Linux machine there exist solutions which are known to work. And with newer Windows, WSL is also a good solution.
Both @bertsky's solution and mine use multiple venvs. Running a …
One known existing problem is that some prebuilt Tensorflow modules don't exist for every Python version. This is not a problem as long as we stick to Ubuntu 18.04, but for newer Debian / Ubuntu it could help to support different Python versions for the different venvs.
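If it comes to that, pinning a sub-venv to an older interpreter (assuming one is installed) could look like this; the version and package pin below are only an example:

```sh
# Hypothetical sketch: create the TF1 sub-venv with an older interpreter
# that still has prebuilt TF 1.x wheels, independently of the Python
# version used by the main venv.
python3.7 -m venv venv-tf1
venv-tf1/bin/pip install "tensorflow-gpu<2"
```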
Yes, both solutions are multi-venv, but I like that @bertsky's approach is more generic: not just for the TF1/TF2 issue but for possible other dependency incompatibilities. The venv delegation mechanism is the same by and large. Wrapper scripts have the advantage over symlinks that the delegation is explicit in the script, whereas symlinks carry that information only implicitly. If the OS or the filesystem - think a removable FAT or NTFS drive - doesn't support them, this needlessly breaks the deployment IMHO.
I agree with you that Linux, Docker or even WSL are the best option. And we will continue to target Ubuntu 18.04. But there are Windows Server environments out there where Linux in a VM or Docker are not viable for whatever reason. We're not actively supporting that at the moment but if somebody made the effort, I would not want to prevent it with platform-dependent design choices.
Fully agree, it is unfortunate that we cannot support Python 3.8 at the moment. I have been thinking about alleviating this by setting up a PPA or other Debian repo and generating …
Both approaches are identically generic. I have only implemented TF1 as an example, but of course any number of other venvs can be added in the same way, and it would also be possible to use actual files instead of symbolic links when needed, so that's not the essential difference. The difference is what the Makefile looks like and whether it requires recursive make or not.
As we all agree that there will be more venvs needed than two, I think it is important to also have a common understanding of the contents of those different venvs. It should be clearly defined what goes into the primary venv and what is the difference for the sub-venvs. The PR currently uses …

I'd try to put most modules into the primary venv. In my understanding that includes modules without peculiarities and modules which follow best / recommended practice (using current software versions => TF2, using OpenCV headless, ...). So I would not make a primary venv without any TF software and put all TF software into different sub-venvs, and similarly with OpenCV headless or not (where we currently don't have a problem as far as I know).
I think we all agree this is a non-argument (for a few bytes of shell script). We also (more or less) agreed that FS should be able to cope with symlinks.
I argued precisely against that above. @stweil's solution needs to mirror many variables and recipes for each additional sub-venv:
IMO this not only bloats the Makefile, it also makes it less readable and more error-prone.
They are not, IMO: the non-recursive approach cannot yet deal with non-Python (or not purely Python) install targets. It's not trivial to wrap recipes like …
I have explained that above already. The current choices are merely illustrations to get the new mechanism going and fix the most pressing inconsistencies. Headless stands for …
Like I said above, I believe it's better to group all GPU-enabled modules into sub-venvs for now. Not only because they usually have mutually exclusive sets of dependencies, but because we still have no mechanism to make GPUs work in Docker, so grouping them might facilitate future solutions.
I've tested it and unfortunately I cannot get it working. After …
… fails to run. Any ideas what to look for to fix this?
- ensure the target-specific `VIRTUAL_ENV` for modules with clashing dependencies gets exported to the sub-make
- call sub-make with `--always-make` to ensure the nested recipe gets run as well when the outer recipe was needed
- call sub-make with `--assume-old` on the module directory to avoid updating the submodule twice
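Spelled out as a plain shell invocation, the nested call described in these points is roughly equivalent to the following sketch; the module and executable names are placeholders, not the actual targets in this PR:

```sh
# Hypothetical stand-in for the nested make call:
#  - VIRTUAL_ENV is exported so the sub-make installs into the isolated venv
#  - --always-make forces the nested recipe to run whenever the outer one did
#  - -o (--assume-old) on the module directory keeps the sub-make from
#    updating the submodule a second time
export VIRTUAL_ENV="$PWD/venv-tf1"
make --always-make -o ocrd_example_module "$VIRTUAL_ENV/bin/ocrd-example-processor"
```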
Probably need to differentiate between 2.1 (for ocrd_pc_segmentation, maybe others) and 2.2 (ocrd_anybaseocr?). Using …
Fine by me, if that's what it takes. I can only tell for sure once the new run finishes.
Sorry for the noise, but here's the newest hiccup:
Apparently not all modules have been checked out:
Running …
That's strange. The …
I recommend against vfat. For a fresh start, I always do:

```sh
git submodule deinit --all -f
git submodule status | while read stat dir ver; do rmdir $dir; done
rm -fr venv
```
This was a clean checkout, so no Makefile.local or variable overrides. Cannot say what …
That won't work anyway (I tried and failed before).
Thx. I'll try with a fresh checkout, but this is useful documentation for debugging. Perhaps add it to the contributor guide?
That would take substantially more space than just …
I could introduce it as …
BTW your problems could also be explained by that alone. As soon as …
Damn!
TF's handling of GH issues is a joke: "Oh, this issue hasn't been solved in 7 days! Makes us look bad. Better close the issue automatically. Users seriously wounded will come back to our knife anyway."
The space ran out after this; there was at least enough space left for … Cannot reproduce now, though, fortunately:
Maybe a network problem.
Looks like this has since been removed upstream – 4 days ago. How long do they take to update their releases on PyPI? Who knows what other failures are lurking behind this one. Perhaps …
And what if we made that …
:( No space issues this time but again, something went wrong with anybaseocr and typegroups classifier. In fact, only these tools have been installed:
Wrapper scripts for …
Off to another fresh start, again.
Are you sure you didn't just fall into the above …
(due to failing TF dependencies, possibly others – this was too ambitious or too early)

This reverts commit a6e7e4e.
Yes, there was no error that broke …
Thanks for the log. This is proof to me that you did fall into the …
Anyway, since I have reverted the …
No, it was plain …