-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop older Python support, require Python >=3.7 #487
Comments
We use RETURNN also on some Centos 7 / Amazon EC2 Linux machines with python 3.6. It would be nice if support for python 3.6 remained let's say until the end of the year. Regarding the sorted issue: this is not performance critical, so I think it's best to just leave the sorted call there if a deterministic order is required. |
Yea, this whole issue is mostly not about performance at all, but really about cleaning up code. And for that, using Python 3.6 might be problematic for the annotation arguments above (specifically forward declarations). But so maybe we just wait with this code cleaning up effort and dropping earlier Python support until end of year then. So you expect that the Amazon EC2 machines will come with newer Python then? |
Actually the one EC2-Linux machine I checked had python3.7 but that one was ARM based. I'm more concerned about the whole Centos situation. Redhat stopped support for Centos stable releases and now we'll have to decide what distribution we want for our prod environment. For now we stick to Centos 7. Maybe we can start by dropping support for python 2.7? But maybe you prefer to do all changes in one go? |
Does the current style of using rST docstring annotations and type hinting as in https://www.python.org/dev/peps/pep-0484 any restrictions on where a type hinting is not possible? I think it will be quite the effort to change all docstrings and functions, so what is the main reason to do this? Or do you want to do a gradual change, so that only new code uses the new annotation style? |
I think it's easier to do in one go. For only dropping Python 2, there is actually also not too much to be done, this is really minor then. That is why I also hesitate to do just that, because it's really only minor, and having Python 2 support might outweigh the 5 lines of code we safe or so.
Sure, this will be some effort. But I think it's required at some point in time to keep the code style modern. The Python world is slowly moving towards having more and more static typing.
Sure, we have to be pragmatic about this. We would have a rule that all new code must use annotations. And we would slowly change the old code, but these changes would not be a single commit. I think with some automation, we can also get quite far, and then need to check where it is incorrect or broken, which can also be automated (e.g. with help of PyCharm and/or MyPy). |
This comment has been minimized.
This comment has been minimized.
Bevor dropping the support for python2 we should make sure that in the instances where returnn has to talk to rasr (sequence training, Baum-Welch training, feature-extraction) this still works. I.e. finally make rasr compatible with python3 or handle possible incompatibilities of the used pickle protocols. |
I thought it already is compatible with Python 3? What is the state here?
I think there were just problems when we mixed Python 2 and Python 3, although we even resolved these problems with some workarounds. But when we drop Python 2, we can clean this up. But then RASR must support Python 3 as well. Or maybe we could also say that the modules which are imported by RASR (e.g. |
There is a version of RASR that uses python3 instead of python2 and I can port it to the public version if there is demand for that. |
I think this is a must such that we can safely drop Python 2. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
From the points raised in the original post, and also a similar discussion in rwth-i6/returnn_common#43, it looks like Python 3.7 is a good minimum version candidate. This is e.g. required for proper nice type annotations. Note that the current Python stable version is 3.10. Just dropping Python 2 and support earlier Python 3 versions like 3.5 would not really make much sense, and also not really allow us for much cleanup. |
bumping this issue. I find Python Typing essential for a large Python code base. 3.7 would be great. |
About RASR interface, specifically full-sum training, seq training or other exotic things, what is the state, is this now all well tested with Python 3? @Marvin84 @michelwi @SimBe195? About Python 3, which version do we need? @curufinwe? |
I have seen setups ranging from Python 3.5 to Python 3.8, we had recent fixes (recent as in "within the last year") for >3.7 as there were deprecated functions before. Switching to 3.7 will definitely exclude many existing setups that are still in 3.5/3.6, but I think we should not wait too long. Directly going to 3.7 will be a problem for many though I think, but lets wait for @curufinwe reply. |
However, as it was argued before, just dropping Python 2 but keeping Python 3.5 or 3.6 compatibility does not really gain us much. We still cannot use type annotations in a reasonable way. I think this would create just double the effort without much benefit. I think if we drop this, we should directly switch to at least Python 3.7 (or later, maybe better 3.8), and if this is not ready yet, then we still should wait. |
The question is, those existing setups with Python 3.5/3.6, is there any good reason they use such old Python? Or could those setups just be switched to some newer Python? |
in my estimation the main reason is that the OS we use ships with this old version of python people would just not bother installing a custom version. I.e. no good reason.
I think currently new versions of OS are being tested that would ship with python3.8. Once these get deployed we would anyway check compatibility and put in effort to fix problems. So maybe start the process now and then we can also drop support for python<=3.7
I just received confirmation that both sequence training (@christophmluscher) and fullsum recipe (@Marvin84) have been tested with python3.8. Maybe also check with @ZhouW321 if his setups are compatible? |
I'm not really sure what you mean by that. Dropping Python will not be done in a branch or PR. This will be a continuous process, directly in master. It will not be in one go. It will probably take a lot of time. It will be somewhat similar as the transition in i6_core. And we should only start at that point when it really can be dropped. |
I mean to start the transition in i6_core now and not only when the new OS is deployed. |
I would like just to share one use case with python 3.8 and full-sum training, our HiWi Daniel had some additional scientific packages installed, and even though returnn run script was not using his local python this interfered with the training, leading to very high WERs. This was solved once he created a virtual environment where those packages were installed, i.e., by making sure they are not accidentally used. I was very confused about this but we did not spend further time on debugging. |
So you suggest the same here for RETURNN? Not wait until the new OS is deployed, but drop it right now? |
I don't really understand this comment. In what way is this specific to Python 3.8? From your description, I don't think this is in any way specific to the Python version, so not really relevant here. Further, I also don't really understand how some unrelated unused additionally installed packages can in any way influence RETURNN. But this is also not relevant here in this issue but rather should be discussed in a separate issue, if you want to investigate this further. |
no, as I said in the statement that you literally just quoted I suggest this for i6_core (well ok, also for other recipes).
A statement to which I agree. |
Ah ok, I was confused, because this issue here is about RETURNN. And also, I thought for i6_core we already do that? At least I remember that we started to use type annotations (which means mostly that Python >=3.7 is required). I also see f-strings, which are Python >=3.6. So it means, i6_core does anyway not work with the default Ubuntu 16 Python. |
Since no further investigation was done, I cannot express any further thoughts on this. However, your high confidence in underlying the irrelevance of python3.8 is not comforting.
I also wondered about this. Again, I only described the experience 'as-is'. I am not sure where and how you deducted the 'unsued' case. Our intuitive assumption here was that Returnn did rely on local packages as an unexpected behavior. My intention was to answer your questions here, and the scenario I described, IMO, enters into the category of 'testing python version', since he used the same Returnn code with only a different python version and got from 7% WER to 90%. Maybe not directly because of the python version? I am not able to answer this question. |
@Marvin84 Yes I'm very confident that this is not about the Python version, but rather about some mess-up in the system or environment. Esp as you point out that it works after you worked with a virtual environment. |
In which context do you think we should follow-up with more investigation on this? I think this is an important issue. Implementation meeting or is it still rather a Returnn issue? |
First step, open a separate issue on this, and let's continue the discussion there. In that issue, then also add some more details if possible. |
After some discussion with @curufinwe, we agreed that we can drop Python <=3.6 support already right now, i.e. require Python >=3.7. But as I mentioned, this will be a continual effort. Although for all new code, we should directly write it in such a way. Specifically:
I might see if I can do some of the type annotations automatically, by converting our doc strings. I already do similar logic on RETURNN common side where I automatically parse RETURNN code and generate wrappers. |
I think before we start doing this, we should give some notification to everyone and leave them a bit of time (in terms of days, not too long) to test their setups with a new Python version. |
Require Python version >=3.7.
See #1326 for a follow-up.
This is just to start some discussion, specifically what we want to drop. Surely <=Python2, but probably also <Python3.x for some x (e.g. 7). We could also say it might make more sense maybe in 6 months or so. This issue implies then to clean up the code accordingly.
The purpose is to clean up some code where Python 2 vs 3 needs extra separate logic, and also where we could make use of newer syntax. I want to collect some of these arguments (and their needed Python version) here.
Sometimes some things can also be fixed by
from __future__ import ...
, which is ok (but also not for all features).Then there might be practical issues on some systems we use, in some conditions, which just doesn't provide newer Python versions, or would make it very inconvenient. E.g. for some time, RASR was linked to Python 2 by default, and also did not support Python 3. I think that is fixed now? Then Ubuntu 16 has some older Python, but maybe we can just agree to use anyway a newer Python version?
Now some arguments, or relevant features:
Type annotations. Like
def func(a: str) -> Data
or whatever. I think since Python 3.5.However, there is the issue with forward declaration. E.g.:
This requires >=Python 3.10 I think.
Or it requires
from __future__ import annotations
since Python 3.7 (doc).The earlier workaround was to use a string as annotation instead, i.e.
def bla(...) -> "Data"
(referred to as forward declaration). But I think this support was always somewhat limited (e.g. not sure if PyCharm will correctly pick this up), and will also likely go away at some point, or being deprecated, so I would argue to avoid this (because otherwise we will have a similar discussion soon again). But this means we need at least Python 3.7.(There are ongoing discussions around this. E.g. see pydantic #2678 (HN), PEP 649 (python-dev discussion).)
I would also argue, in any case, type inference must work in PyCharm. If that doesn't work, we should not use it.
In any case, I think these type annotations are actually one of the major reasons for us to drop older support.
dict.items()
ordict.keys()
returns a non-deterministic order in earlier Python versions.Because you usually want to have this deterministic, you would use
sorted(dict.items())
.Since Python 3.6, this is now deterministic (using the insertion order) (see here).
So, from Python 3.6 on, you can drop the
sorted
, or explicit usage ofOrderedDict
.We should still be a bit careful here though. Some issues:
assert py_version >= (3,6)
somewhere.sorted
, i.e. sorted by alphabet, or maybe to stay compatible with some old data?list(dict.keys())
would be different (whereassorted
would always be the same). Not sure if this could cause any issues.f-strings, since Python 3.6.
Ubuntu 16.04 (so far still the standard on many of our systems) comes with Python 3.5.
Ubuntu 20.04 comes with Python 3.8.
Ubuntu 22.04 comes with Python 3.10.
typing.Protocol
requires Python >=3.8.Slash (
/
) in function argument list to mark positional-only arguments (doc) requires Python >=3.8.The text was updated successfully, but these errors were encountered: