Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path.rglob fails with *.* on Python 3.11.0a7 #91616

Closed
domdfcoding opened this issue Apr 16, 2022 · 7 comments · Fixed by #91681
Closed

Path.rglob fails with *.* on Python 3.11.0a7 #91616

domdfcoding opened this issue Apr 16, 2022 · 7 comments · Fixed by #91681
Labels
3.11 only security fixes stdlib Python modules in the Lib dir topic-regex type-bug An unexpected behavior, bug, or error

Comments

@domdfcoding
Copy link
Contributor

Bug report

On Python 3.11.0a7, calling Path.rglob("*.*") returns an iterator over no files, whereas on previous versions (including alpha 6) it would correctly return an iterator over all files in the directory and all of its children.

The following code illustrates this problem:

import pathlib, tempfile, pprint

with tempfile.TemporaryDirectory() as tmpdir:
	tmp_path = pathlib.Path(tmpdir)
	(tmp_path / "a.txt").touch()
	(tmp_path / "b.txt").touch()
	(tmp_path / "c").mkdir()
	(tmp_path / "c" / "d.txt").touch()
	pprint.pprint(list(tmp_path.rglob("*.*")))
	pprint.pprint(list(tmp_path.rglob("*.txt")))

Save this as pathlib_rglob.py.

Output:

$ python3.11 pathlib_rglob.py 
[]
[PosixPath('/tmp/tmpngjo2cyn/a.txt'),
 PosixPath('/tmp/tmpngjo2cyn/b.txt'),
 PosixPath('/tmp/tmpngjo2cyn/c/d.txt')]
$ python3.10 pathlib_rglob.py 
[PosixPath('/tmp/tmpodfbqyhx/a.txt'),
 PosixPath('/tmp/tmpodfbqyhx/b.txt'),
 PosixPath('/tmp/tmpodfbqyhx/c/d.txt')]
[PosixPath('/tmp/tmpodfbqyhx/a.txt'),
 PosixPath('/tmp/tmpodfbqyhx/b.txt'),
 PosixPath('/tmp/tmpodfbqyhx/c/d.txt')]

$ # This is Python 3.11.0a6 built from source
$ ~/python311a6/bin/python3.11 pathlib_rglob.py 
[PosixPath('/tmp/tmpobellcrw/a.txt'),
 PosixPath('/tmp/tmpobellcrw/b.txt'),
 PosixPath('/tmp/tmpobellcrw/c/d.txt')]
[PosixPath('/tmp/tmpobellcrw/a.txt'),
 PosixPath('/tmp/tmpobellcrw/b.txt'),
 PosixPath('/tmp/tmpobellcrw/c/d.txt')]

After much debugging I have tracked the issue to #32029
With that change, fnmatch.translate("*.*") gives (?s:(?>.*?\.).*)\Z.
With the change reverted (or on 3.11.0a6) its (?s:(?=(?P<g0>.*?\.))(?P=g0).*)\Z.
I don't know what the difference between those amounts to, but the former doesn't work while the latter does:

>>> import re
>>> re.fullmatch('(?s:(?>.*?\\.).*)\\Z', "a.txt")
>>> re.fullmatch('(?s:(?=(?P<g0>.*?\\.))(?P=g0).*)\\Z', "a.txt")
<re.Match object; span=(0, 5), match='a.txt'>

Your environment

  • CPython versions tested on: 3.10.4, 3.11.0a6, 3.11.0a7, built from git checkouts of the relevant tag.
  • Operating system and architecture: Ubuntu 20.04 amd64
@domdfcoding domdfcoding added the type-bug An unexpected behavior, bug, or error label Apr 16, 2022
@domdfcoding
Copy link
Contributor Author

@tim-one as you made the original change in #32029 could you take a look at this?

@AlexWaygood AlexWaygood added stdlib Python modules in the Lib dir 3.11 only security fixes labels Apr 16, 2022
@ghost

This comment was marked as outdated.

@ghost
Copy link

ghost commented Apr 17, 2022

It's a bug of re module.

@serhiy-storchaka , the creator of PR #31982 (Add support of atomic grouping in re).
This patch fixes the problem. I will submit a PR after thoroughly understanding the problem.

 Modules/_sre/sre_lib.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Modules/_sre/sre_lib.h b/Modules/_sre/sre_lib.h
index 3472e65b87..c73e116143 100644
--- a/Modules/_sre/sre_lib.h
+++ b/Modules/_sre/sre_lib.h
@@ -1350,8 +1350,8 @@ SRE(match)(SRE_STATE* state, const SRE_CODE* pattern, int toplevel)
                when the end of the group, represented by a SUCCESS op
                code, is reached. */
             /* Group Pattern begins at an offset of 1 code. */
-            DO_JUMP(JUMP_ATOMIC_GROUP, jump_atomic_group,
-                    &pattern[1]);
+            DO_JUMP0(JUMP_ATOMIC_GROUP, jump_atomic_group,
+                     &pattern[1]);
 
             /* Test Exit Condition */
             RETURN_ON_ERROR(ret);

@serhiy-storchaka
Copy link
Member

Ah, I see.

>>> re.match(r'(?s:(?>.*?\.).*)\Z', 'a.txt')
<re.Match object; span=(0, 5), match='a.txt'>
>>> re.fullmatch(r'(?s:(?>.*?\.).*)\Z', 'a.txt')

I thought that issues with fullmatch() and atomic grouping were solved, but it is not so.

Use the following tests in your PR:

diff --git a/Lib/test/test_re.py b/Lib/test/test_re.py
index 959582e2f1..a5aa1b358d 100644
--- a/Lib/test/test_re.py
+++ b/Lib/test/test_re.py
@@ -2173,6 +2173,10 @@ def test_fullmatch_possessive_quantifiers(self):
         self.assertIsNone(re.fullmatch(r'a*+', 'ab'))
         self.assertIsNone(re.fullmatch(r'a?+', 'ab'))
         self.assertIsNone(re.fullmatch(r'a{1,3}+', 'ab'))
+        self.assertTrue(re.fullmatch(r'a++b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a*+b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a?+b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a{1,3}+b', 'ab'))
 
         self.assertTrue(re.fullmatch(r'(?:ab)++', 'ab'))
         self.assertTrue(re.fullmatch(r'(?:ab)*+', 'ab'))
@@ -2182,6 +2186,10 @@ def test_fullmatch_possessive_quantifiers(self):
         self.assertIsNone(re.fullmatch(r'(?:ab)*+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab)?+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab){1,3}+', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)++c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)*+c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)?+c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab){1,3}+c', 'abc'))
 
     def test_findall_possessive_quantifiers(self):
         self.assertEqual(re.findall(r'a++', 'aab'), ['aa'])
@@ -2217,6 +2225,10 @@ def test_fullmatch_atomic_grouping(self):
         self.assertIsNone(re.fullmatch(r'(?>a*)', 'ab'))
         self.assertIsNone(re.fullmatch(r'(?>a?)', 'ab'))
         self.assertIsNone(re.fullmatch(r'(?>a{1,3})', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a+)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a*)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a?)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a{1,3})b', 'ab'))
 
         self.assertTrue(re.fullmatch(r'(?>(?:ab)+)', 'ab'))
         self.assertTrue(re.fullmatch(r'(?>(?:ab)*)', 'ab'))
@@ -2226,6 +2238,10 @@ def test_fullmatch_atomic_grouping(self):
         self.assertIsNone(re.fullmatch(r'(?>(?:ab)*)', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?>(?:ab)?)', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?>(?:ab){1,3})', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)+)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)*)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)?)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab){1,3})c', 'abc'))
 
     def test_findall_atomic_grouping(self):
         self.assertEqual(re.findall(r'(?>a+)', 'aab'), ['aa'])

@serhiy-storchaka
Copy link
Member

Make the same change in POSSESSIVE_REPEAT.

@ghost
Copy link

ghost commented Apr 17, 2022

A draft: https://github.com/animalize/cpython/commit/8e00544280f054b663c056c6a99973749efe4b97
I'm carefully reviewing #31982 and match_all/must_advance code.

@ghost
Copy link

ghost commented Apr 19, 2022

@serhiy-storchaka , please review #91681 .

serhiy-storchaka pushed a commit that referenced this issue Apr 19, 2022
…ping or Possessive Quantifiers (GH-91681)

These jumps should use DO_JUMP0() instead of DO_JUMP():
- JUMP_POSS_REPEAT_1
- JUMP_POSS_REPEAT_2
- JUMP_ATOMIC_GROUP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes stdlib Python modules in the Lib dir topic-regex type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants