Path.rglob fails with . on Python 3.11.0a7 #91616

domdfcoding · 2022-04-16T20:46:51Z

Bug report

On Python 3.11.0a7, calling Path.rglob("*.*") returns an iterator over no files, whereas on previous versions (including alpha 6) it would correctly return an iterator over all files in the directory and all of its children.

The following code illustrates this problem:

import pathlib, tempfile, pprint

with tempfile.TemporaryDirectory() as tmpdir:
	tmp_path = pathlib.Path(tmpdir)
	(tmp_path / "a.txt").touch()
	(tmp_path / "b.txt").touch()
	(tmp_path / "c").mkdir()
	(tmp_path / "c" / "d.txt").touch()
	pprint.pprint(list(tmp_path.rglob("*.*")))
	pprint.pprint(list(tmp_path.rglob("*.txt")))

Save this as pathlib_rglob.py.

Output:

$ python3.11 pathlib_rglob.py 
[]
[PosixPath('/tmp/tmpngjo2cyn/a.txt'),
 PosixPath('/tmp/tmpngjo2cyn/b.txt'),
 PosixPath('/tmp/tmpngjo2cyn/c/d.txt')]
$ python3.10 pathlib_rglob.py 
[PosixPath('/tmp/tmpodfbqyhx/a.txt'),
 PosixPath('/tmp/tmpodfbqyhx/b.txt'),
 PosixPath('/tmp/tmpodfbqyhx/c/d.txt')]
[PosixPath('/tmp/tmpodfbqyhx/a.txt'),
 PosixPath('/tmp/tmpodfbqyhx/b.txt'),
 PosixPath('/tmp/tmpodfbqyhx/c/d.txt')]

$ # This is Python 3.11.0a6 built from source
$ ~/python311a6/bin/python3.11 pathlib_rglob.py 
[PosixPath('/tmp/tmpobellcrw/a.txt'),
 PosixPath('/tmp/tmpobellcrw/b.txt'),
 PosixPath('/tmp/tmpobellcrw/c/d.txt')]
[PosixPath('/tmp/tmpobellcrw/a.txt'),
 PosixPath('/tmp/tmpobellcrw/b.txt'),
 PosixPath('/tmp/tmpobellcrw/c/d.txt')]

After much debugging I have tracked the issue to #32029
With that change, fnmatch.translate("*.*") gives (?s:(?>.*?\.).*)\Z.
With the change reverted (or on 3.11.0a6) its (?s:(?=(?P<g0>.*?\.))(?P=g0).*)\Z.
I don't know what the difference between those amounts to, but the former doesn't work while the latter does:

>>> import re
>>> re.fullmatch('(?s:(?>.*?\\.).*)\\Z', "a.txt")
>>> re.fullmatch('(?s:(?=(?P<g0>.*?\\.))(?P=g0).*)\\Z', "a.txt")
<re.Match object; span=(0, 5), match='a.txt'>

Your environment

CPython versions tested on: 3.10.4, 3.11.0a6, 3.11.0a7, built from git checkouts of the relevant tag.
Operating system and architecture: Ubuntu 20.04 amd64

The text was updated successfully, but these errors were encountered:

domdfcoding · 2022-04-16T20:47:45Z

@tim-one as you made the original change in #32029 could you take a look at this?

ghost · 2022-04-17T03:45:05Z

It's a bug of re module.

@serhiy-storchaka , the creator of PR #31982 (Add support of atomic grouping in re).
This patch fixes the problem. I will submit a PR after thoroughly understanding the problem.

 Modules/_sre/sre_lib.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Modules/_sre/sre_lib.h b/Modules/_sre/sre_lib.h
index 3472e65b87..c73e116143 100644
--- a/Modules/_sre/sre_lib.h
+++ b/Modules/_sre/sre_lib.h
@@ -1350,8 +1350,8 @@ SRE(match)(SRE_STATE* state, const SRE_CODE* pattern, int toplevel)
                when the end of the group, represented by a SUCCESS op
                code, is reached. */
             /* Group Pattern begins at an offset of 1 code. */
-            DO_JUMP(JUMP_ATOMIC_GROUP, jump_atomic_group,
-                    &pattern[1]);
+            DO_JUMP0(JUMP_ATOMIC_GROUP, jump_atomic_group,
+                     &pattern[1]);
 
             /* Test Exit Condition */
             RETURN_ON_ERROR(ret);

serhiy-storchaka · 2022-04-17T14:32:32Z

Ah, I see.

>>> re.match(r'(?s:(?>.*?\.).*)\Z', 'a.txt')
<re.Match object; span=(0, 5), match='a.txt'>
>>> re.fullmatch(r'(?s:(?>.*?\.).*)\Z', 'a.txt')

I thought that issues with fullmatch() and atomic grouping were solved, but it is not so.

Use the following tests in your PR:

diff --git a/Lib/test/test_re.py b/Lib/test/test_re.py
index 959582e2f1..a5aa1b358d 100644
--- a/Lib/test/test_re.py
+++ b/Lib/test/test_re.py
@@ -2173,6 +2173,10 @@ def test_fullmatch_possessive_quantifiers(self):
         self.assertIsNone(re.fullmatch(r'a*+', 'ab'))
         self.assertIsNone(re.fullmatch(r'a?+', 'ab'))
         self.assertIsNone(re.fullmatch(r'a{1,3}+', 'ab'))
+        self.assertTrue(re.fullmatch(r'a++b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a*+b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a?+b', 'ab'))
+        self.assertTrue(re.fullmatch(r'a{1,3}+b', 'ab'))
 
         self.assertTrue(re.fullmatch(r'(?:ab)++', 'ab'))
         self.assertTrue(re.fullmatch(r'(?:ab)*+', 'ab'))
@@ -2182,6 +2186,10 @@ def test_fullmatch_possessive_quantifiers(self):
         self.assertIsNone(re.fullmatch(r'(?:ab)*+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab)?+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab){1,3}+', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)++c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)*+c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab)?+c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?:ab){1,3}+c', 'abc'))
 
     def test_findall_possessive_quantifiers(self):
         self.assertEqual(re.findall(r'a++', 'aab'), ['aa'])
@@ -2217,6 +2225,10 @@ def test_fullmatch_atomic_grouping(self):
         self.assertIsNone(re.fullmatch(r'(?>a*)', 'ab'))
         self.assertIsNone(re.fullmatch(r'(?>a?)', 'ab'))
         self.assertIsNone(re.fullmatch(r'(?>a{1,3})', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a+)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a*)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a?)b', 'ab'))
+        self.assertTrue(re.fullmatch(r'(?>a{1,3})b', 'ab'))
 
         self.assertTrue(re.fullmatch(r'(?>(?:ab)+)', 'ab'))
         self.assertTrue(re.fullmatch(r'(?>(?:ab)*)', 'ab'))
@@ -2226,6 +2238,10 @@ def test_fullmatch_atomic_grouping(self):
         self.assertIsNone(re.fullmatch(r'(?>(?:ab)*)', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?>(?:ab)?)', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?>(?:ab){1,3})', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)+)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)*)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab)?)c', 'abc'))
+        self.assertTrue(re.fullmatch(r'(?>(?:ab){1,3})c', 'abc'))
 
     def test_findall_atomic_grouping(self):
         self.assertEqual(re.findall(r'(?>a+)', 'aab'), ['aa'])

serhiy-storchaka · 2022-04-17T14:40:29Z

Make the same change in POSSESSIVE_REPEAT.

ghost · 2022-04-17T15:57:56Z

A draft: https://github.com/animalize/cpython/commit/8e00544280f054b663c056c6a99973749efe4b97
I'm carefully reviewing #31982 and match_all/must_advance code.

ghost · 2022-04-19T05:08:57Z

@serhiy-storchaka , please review #91681 .

…ping or Possessive Quantifiers (GH-91681) These jumps should use DO_JUMP0() instead of DO_JUMP(): - JUMP_POSS_REPEAT_1 - JUMP_POSS_REPEAT_2 - JUMP_ATOMIC_GROUP

domdfcoding added the type-bug An unexpected behavior, bug, or error label Apr 16, 2022

AlexWaygood added stdlib Python modules in the Lib dir 3.11 only security fixes labels Apr 16, 2022

This comment was marked as outdated.

Sign in to view

ghost mentioned this issue Apr 17, 2022

gh-91524: Speed up the regular expression substitution #91525

Merged

AlexWaygood added the topic-regex label Apr 17, 2022

serhiy-storchaka mentioned this issue Apr 22, 2022

gh-91616: re module, fix .fullmatch() mismatch when using Atomic Grouping or Possessive Quantifiers #91681

Merged

serhiy-storchaka closed this as completed Apr 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Path.rglob fails with . on Python 3.11.0a7 #91616

Path.rglob fails with . on Python 3.11.0a7 #91616

domdfcoding commented Apr 16, 2022

domdfcoding commented Apr 16, 2022

This comment was marked as outdated.

ghost commented Apr 17, 2022

serhiy-storchaka commented Apr 17, 2022

serhiy-storchaka commented Apr 17, 2022

ghost commented Apr 17, 2022

ghost commented Apr 19, 2022

Path.rglob fails with *.* on Python 3.11.0a7 #91616

Path.rglob fails with *.* on Python 3.11.0a7 #91616

Comments

domdfcoding commented Apr 16, 2022

domdfcoding commented Apr 16, 2022

This comment was marked as outdated.

ghost commented Apr 17, 2022

serhiy-storchaka commented Apr 17, 2022

serhiy-storchaka commented Apr 17, 2022

ghost commented Apr 17, 2022

ghost commented Apr 19, 2022

Path.rglob fails with . on Python 3.11.0a7 #91616

Path.rglob fails with . on Python 3.11.0a7 #91616