Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecursiveDirectoryIterator misbehaves if the directory gets modified while iterating #2903

Open
caco3 opened this issue Oct 30, 2023 · 17 comments
Labels
bug Documentation contains incorrect information Extension: spl Status: Verified

Comments

@caco3
Copy link

caco3 commented Oct 30, 2023

Description

I am trying to run a simple PHP script on the FreeBSD 12.4 system of my web hoster (hostpoint.ch).
It seems that there is some non-standard behaviour in it.

Preparations

setup a src directory with a lot of files:

mkdir src
for i in {1..500}; do touch src/$i; done

create a php script using following content:

<?php

function getRecursiveDirectoryIterator(?string $folder = null): \RecursiveIteratorIterator {
        return new \RecursiveIteratorIterator(
                new \RecursiveDirectoryIterator($folder, \RecursiveDirectoryIterator::SKIP_DOTS),
                \RecursiveIteratorIterator::CHILD_FIRST
        );
}

$i = 0;

mkdir("dst");


foreach (getRecursiveDirectoryIterator("src") as $path) {
        $i++;
        $newPath = str_replace("src", "dst", $path);
        echo("$i: $path -> $newPath\n");
        rename($path, $newPath);             // <- makes the iterator to break
}

echo("EOF");

?>

Expected output

It is expected that all 500 files get moved from src to dst:

1: src/1 -> dst/1
2: src/2 -> dst/2
3: src/3 -> dst/3
4: src/4 -> dst/4
5: src/5 -> dst/5
6: src/6 -> dst/6
[..]
496: src/496 -> dst/496
497: src/497 -> dst/497
498: src/498 -> dst/498
499: src/499 -> dst/499
500: src/500 -> dst/500

Effective Output

The iterator has 2 glitches, files 95..188 and 285..380 get skipped:

1: src/1 -> dst/1
2: src/2 -> dst/2
3: src/3 -> dst/3
4: src/4 -> dst/4
5: src/5 -> dst/5
6: src/6 -> dst/6
7: src/7 -> dst/7
8: src/8 -> dst/8
9: src/9 -> dst/9
10: src/10 -> dst/10
11: src/11 -> dst/11
12: src/12 -> dst/12
13: src/13 -> dst/13
14: src/14 -> dst/14
15: src/15 -> dst/15
16: src/16 -> dst/16
17: src/17 -> dst/17
18: src/18 -> dst/18
19: src/19 -> dst/19
20: src/20 -> dst/20
21: src/21 -> dst/21
22: src/22 -> dst/22
23: src/23 -> dst/23
24: src/24 -> dst/24
25: src/25 -> dst/25
26: src/26 -> dst/26
27: src/27 -> dst/27
28: src/28 -> dst/28
29: src/29 -> dst/29
30: src/30 -> dst/30
31: src/31 -> dst/31
32: src/32 -> dst/32
33: src/33 -> dst/33
34: src/34 -> dst/34
35: src/35 -> dst/35
36: src/36 -> dst/36
37: src/37 -> dst/37
38: src/38 -> dst/38
39: src/39 -> dst/39
40: src/40 -> dst/40
41: src/41 -> dst/41
42: src/42 -> dst/42
43: src/43 -> dst/43
44: src/44 -> dst/44
45: src/45 -> dst/45
46: src/46 -> dst/46
47: src/47 -> dst/47
48: src/48 -> dst/48
49: src/49 -> dst/49
50: src/50 -> dst/50
51: src/51 -> dst/51
52: src/52 -> dst/52
53: src/53 -> dst/53
54: src/54 -> dst/54
55: src/55 -> dst/55
56: src/56 -> dst/56
57: src/57 -> dst/57
58: src/58 -> dst/58
59: src/59 -> dst/59
60: src/60 -> dst/60
61: src/61 -> dst/61
62: src/62 -> dst/62
63: src/63 -> dst/63
64: src/64 -> dst/64
65: src/65 -> dst/65
66: src/66 -> dst/66
67: src/67 -> dst/67
68: src/68 -> dst/68
69: src/69 -> dst/69
70: src/70 -> dst/70
71: src/71 -> dst/71
72: src/72 -> dst/72
73: src/73 -> dst/73
74: src/74 -> dst/74
75: src/75 -> dst/75
76: src/76 -> dst/76
77: src/77 -> dst/77
78: src/78 -> dst/78
79: src/79 -> dst/79
80: src/80 -> dst/80
81: src/81 -> dst/81
82: src/82 -> dst/82
83: src/83 -> dst/83
84: src/84 -> dst/84
85: src/85 -> dst/85
86: src/86 -> dst/86
87: src/87 -> dst/87
88: src/88 -> dst/88
89: src/89 -> dst/89
90: src/90 -> dst/90
91: src/91 -> dst/91
92: src/92 -> dst/92
93: src/93 -> dst/93
94: src/94 -> dst/94
95: src/189 -> dst/189      <--- file 95..188 got skipped
96: src/190 -> dst/190
97: src/191 -> dst/191
98: src/192 -> dst/192
99: src/193 -> dst/193
100: src/194 -> dst/194
101: src/195 -> dst/195
102: src/196 -> dst/196
103: src/197 -> dst/197
104: src/198 -> dst/198
105: src/199 -> dst/199
106: src/200 -> dst/200
107: src/201 -> dst/201
108: src/202 -> dst/202
109: src/203 -> dst/203
110: src/204 -> dst/204
111: src/205 -> dst/205
112: src/206 -> dst/206
113: src/207 -> dst/207
114: src/208 -> dst/208
115: src/209 -> dst/209
116: src/210 -> dst/210
117: src/211 -> dst/211
118: src/212 -> dst/212
119: src/213 -> dst/213
120: src/214 -> dst/214
121: src/215 -> dst/215
122: src/216 -> dst/216
123: src/217 -> dst/217
124: src/218 -> dst/218
125: src/219 -> dst/219
126: src/220 -> dst/220
127: src/221 -> dst/221
128: src/222 -> dst/222
129: src/223 -> dst/223
130: src/224 -> dst/224
131: src/225 -> dst/225
132: src/226 -> dst/226
133: src/227 -> dst/227
134: src/228 -> dst/228
135: src/229 -> dst/229
136: src/230 -> dst/230
137: src/231 -> dst/231
138: src/232 -> dst/232
139: src/233 -> dst/233
140: src/234 -> dst/234
141: src/235 -> dst/235
142: src/236 -> dst/236
143: src/237 -> dst/237
144: src/238 -> dst/238
145: src/239 -> dst/239
146: src/240 -> dst/240
147: src/241 -> dst/241
148: src/242 -> dst/242
149: src/243 -> dst/243
150: src/244 -> dst/244
151: src/245 -> dst/245
152: src/246 -> dst/246
153: src/247 -> dst/247
154: src/248 -> dst/248
155: src/249 -> dst/249
156: src/250 -> dst/250
157: src/251 -> dst/251
158: src/252 -> dst/252
159: src/253 -> dst/253
160: src/254 -> dst/254
161: src/255 -> dst/255
162: src/256 -> dst/256
163: src/257 -> dst/257
164: src/258 -> dst/258
165: src/259 -> dst/259
166: src/260 -> dst/260
167: src/261 -> dst/261
168: src/262 -> dst/262
169: src/263 -> dst/263
170: src/264 -> dst/264
171: src/265 -> dst/265
172: src/266 -> dst/266
173: src/267 -> dst/267
174: src/268 -> dst/268
175: src/269 -> dst/269
176: src/270 -> dst/270
177: src/271 -> dst/271
178: src/272 -> dst/272
179: src/273 -> dst/273
180: src/274 -> dst/274
181: src/275 -> dst/275
182: src/276 -> dst/276
183: src/277 -> dst/277
184: src/278 -> dst/278
185: src/279 -> dst/279
186: src/280 -> dst/280
187: src/281 -> dst/281
188: src/282 -> dst/282
189: src/283 -> dst/283
190: src/284 -> dst/284
191: src/381 -> dst/381      <--- file 285..380 got skipped
192: src/382 -> dst/382
193: src/383 -> dst/383
194: src/384 -> dst/384
195: src/385 -> dst/385
196: src/386 -> dst/386
197: src/387 -> dst/387
198: src/388 -> dst/388
199: src/389 -> dst/389
200: src/390 -> dst/390
201: src/391 -> dst/391
202: src/392 -> dst/392
203: src/393 -> dst/393
204: src/394 -> dst/394
205: src/395 -> dst/395
206: src/396 -> dst/396
207: src/397 -> dst/397
208: src/398 -> dst/398
209: src/399 -> dst/399
210: src/400 -> dst/400
211: src/401 -> dst/401
212: src/402 -> dst/402
213: src/403 -> dst/403
214: src/404 -> dst/404
215: src/405 -> dst/405
216: src/406 -> dst/406
217: src/407 -> dst/407
218: src/408 -> dst/408
219: src/409 -> dst/409
220: src/410 -> dst/410
221: src/411 -> dst/411
222: src/412 -> dst/412
223: src/413 -> dst/413
224: src/414 -> dst/414
225: src/415 -> dst/415
226: src/416 -> dst/416
227: src/417 -> dst/417
228: src/418 -> dst/418
229: src/419 -> dst/419
230: src/420 -> dst/420
231: src/421 -> dst/421
232: src/422 -> dst/422
233: src/423 -> dst/423
234: src/424 -> dst/424
235: src/425 -> dst/425
236: src/426 -> dst/426
237: src/427 -> dst/427
238: src/428 -> dst/428
239: src/429 -> dst/429
240: src/430 -> dst/430
241: src/431 -> dst/431
242: src/432 -> dst/432
243: src/433 -> dst/433
244: src/434 -> dst/434
245: src/435 -> dst/435
246: src/436 -> dst/436
247: src/437 -> dst/437
248: src/438 -> dst/438
249: src/439 -> dst/439
250: src/440 -> dst/440
251: src/441 -> dst/441
252: src/442 -> dst/442
253: src/443 -> dst/443
254: src/444 -> dst/444
255: src/445 -> dst/445
256: src/446 -> dst/446
257: src/447 -> dst/447
258: src/448 -> dst/448
259: src/449 -> dst/449
260: src/450 -> dst/450
261: src/451 -> dst/451
262: src/452 -> dst/452
263: src/453 -> dst/453
264: src/454 -> dst/454
265: src/455 -> dst/455
266: src/456 -> dst/456
267: src/457 -> dst/457
268: src/458 -> dst/458
269: src/459 -> dst/459
270: src/460 -> dst/460
271: src/461 -> dst/461
272: src/462 -> dst/462
273: src/463 -> dst/463
274: src/464 -> dst/464
275: src/465 -> dst/465
276: src/466 -> dst/466
277: src/467 -> dst/467
278: src/468 -> dst/468
279: src/469 -> dst/469
280: src/470 -> dst/470
281: src/471 -> dst/471
282: src/472 -> dst/472
283: src/473 -> dst/473
284: src/474 -> dst/474
285: src/475 -> dst/475
286: src/476 -> dst/476

Note that changing the iterator to create a list and looping through the list works around this issue:

- foreach (getRecursiveDirectoryIterator("src") as $path) {
+ $fileList = iterator_to_array(getRecursiveDirectoryIterator("src"), true);
+ foreach ($fileList as $path => $fileInfo) {

PHP Version

7.4, 8.0, 8.1

Operating System

FreeBSD 12.4

@caco3 caco3 added bug Documentation contains incorrect information Status: Needs Triage labels Oct 30, 2023
@caco3
Copy link
Author

caco3 commented Oct 30, 2023

See also nextcloud/updater#509 (comment) for further information

@rettichschnidi
Copy link

rettichschnidi commented Oct 30, 2023

FYI: Fixed version is in the example code. Guess that should not be the case.

ps: Thanks a lot for your work on the Nextcloud/Hostpoint/update issue. I'm not a PHP dev, but pretty sure that using a directory iterator while manipulating the underlying folder is a bad idea.

@damianwadley
Copy link
Member

Iterating on a directory's contents while you are actively modifying its contents feels risky just on principle. It would seem alright to automatically "snapshot" one particular directory's contents before returning its data for iteration, but that also means not being able to pick up any changes to the directory - desirable or not. Then the recursive iteration is harder: if you were recursing on ./ and the first directory it found was src, after those files got moved, would you expect that the iterator did or did not subsequently discover them existing in dst?

All in all, the safest course of action would certainly be to make your own snapshot before performing any actions (like with iterator_to_array), then doing any necessary last-minute sanity checks before making each change, but I wouldn't yet discount there being something that could be done here.

@caco3
Copy link
Author

caco3 commented Oct 31, 2023

I: Fixed version is in the example code. Guess that should not be the case.

Thanks, I corrected the example.

ps: Thanks a lot for your work on the Nextcloud/Hostpoint/update issue. I'm not a PHP dev, but pretty sure that using a directory iterator while manipulating the underlying folder is a bad idea.

Thanks, hopefully they will accept my fix. Currently it looks like they do not want to accept it because they officially do not support BSD, see nextcloud/updater#510 (comment)

@damianwadley Thanks for your comment.

Oddly, therere are many examples on the web doing it without iterator_to_array(), and nowhere a note that it is dangerous.
Since I am not enough into iterators, is there a difference between

$iterator = new DirectoryIterator($dir);
foreach ($iterator as $fileinfo) { .. }

and

foreach (new DirectoryIterator($dir) as $fileinfo) { .. }

?
Is the 2nd one save (altough it is not using the iterator_to_array()?

@rettichschnidi
Copy link

Thanks, hopefully they will accept my fix. Currently it looks like they do not want to accept it because they officially do not support BSD, see nextcloud/updater#510 (comment)

Read it and I think they worded it badly and/or got it wrong. But their motivation to understand and fix the underlying issue is sound. And in this case, once it can be shown that there is no (vanilla) documentation stating that their usage of the iterators is save, then I am sure they will accept a fix and everyone, including users on their supported platforms, wins.

@rettichschnidi
Copy link

Oddly, therere are many examples on the web doing it without iterator_to_array(), and nowhere a note that it is dangerous.

Would flip the implied question: Where can one find authoritative documentation stating that it is save to keep using a DirectoryIterator instance once its content got manipulated?

Any chance a PHP developer can help us out here?

@caco3
Copy link
Author

caco3 commented Oct 31, 2023

Dear all

Thanks for your valuable comments and your support on this!

I did more tests:

Original:

foreach (getRecursiveDirectoryIterator("src") as $path) {
[..]

-> 522: src/1000 -> dst/1000 -> bad

Intermediate variable:

$iter = getRecursiveDirectoryIterator("src");
foreach ($iter as $path) {
[..]

-> 522: src/1000 -> dst/1000 -> bad

File list:

$fileList = iterator_to_array(getRecursiveDirectoryIterator("src"), true);
foreach ($fileList as $path => $fileInfo) {
[..]

-> 1000: src/1000 -> dst/1000 -> ok

This answers my question, an intermediate variable ($iter) does not contain the filelist but the iteration object:

print_r($iter);
/*RecursiveIteratorIterator Object
()*/

where as the filelist is a pure array:

print_r($fileList);
/*Array
(
    [src/1] => SplFileInfo Object
        (
            [pathName:SplFileInfo:private] => src/1
            [fileName:SplFileInfo:private] => 1
        )

    [src/2] => SplFileInfo Object
        (
            [pathName:SplFileInfo:private] => src/2
            [fileName:SplFileInfo:private] => 2
        )
[..]*/

So the usage of iterator_to_array() is a must.

@ausi
Copy link

ausi commented Nov 1, 2023

@caco3 can you test if the same issue also happens with readdir()?

$i = 0;
mkdir("dst");
$handle = opendir("src");
while (false !== ($entry = readdir($handle))) {
    $i++;
    $path = "src/$entry";
    $newPath = "dst/$entry";
    echo("$i: $path -> $newPath\n");
    rename($path, $newPath);
}
closedir($handle);
echo("EOF");

@caco3
Copy link
Author

caco3 commented Nov 1, 2023

Yes, same issue:

[..]
95: src/93 -> dst/93
96: src/94 -> dst/94
97: src/189 -> dst/189
98: src/190 -> dst/190
[..]
524: src/1000 -> dst/1000
EOF

@ausi
Copy link

ausi commented Nov 1, 2023

I found an answer to a similar question that suggests that this behavior undefined and depends on the operating system and file system: https://stackoverflow.com/a/39017355/1031606

So i’d guess that PHP is not at fault here, it just behaves the same way as the operating/file system it’s dealing with does.

There is also an article from Apple that explicitly states that this is not supported on their HFS file system: https://web.archive.org/web/20220122122948/https://support.apple.com/kb/TA21420?locale=en_US

@rettichschnidi
Copy link

rettichschnidi commented Nov 1, 2023

So i’d guess that PHP is not at fault here, it just behaves the same way as the operating/file system it’s dealing with does.

The "fault" of PHP is that it does not document the guarantees (not) given. At least I could not find any documentation on this topic.

@caco3
Copy link
Author

caco3 commented Nov 2, 2023

I agree that it just is a lack of documentation. IMO a critical one.
I hope the devs will pick it up and document it for the RecursiveDirectoryIterator() as well as readdir().

@damianwadley
Copy link
Member

I agree that it just is a lack of documentation. IMO a critical one.

If you're satisfied with this being a documentation problem, I can move it to the appropriate place:

@damianwadley damianwadley transferred this issue from php/php-src Nov 2, 2023
@caco3
Copy link
Author

caco3 commented Nov 2, 2023

Yes, please go ahead

@joshtrichards
Copy link

I am trying to run a simple PHP script on the FreeBSD 12.4 system of my web hoster (hostpoint.ch).

@caco3 IIRC you weren't able to reproduce this on FreeBSD stock, but only in the hoster's environment, correct?

NFS was the other known variable:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=57696

@joshtrichards
Copy link

@caco3 does the following behave exactly the same way for you?

function getRecursiveDirectoryIterator(?string $folder = null): \RecursiveIteratorIterator {
        return new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($folder, \RecursiveDirectoryIterator::SKIP_DOTS), \RecursiveIteratorIterator::CHILD_FIRST);
}

$i = 0;

mkdir("dst");

$iterator = getRecursiveDirectoryIterator("src");
$iterator->rewind();
while ($iterator->valid()) {
	$path = $iterator->current();
        $i++;
        $newPath = str_replace("src", "dst", $path);
        echo("$i: $path -> $newPath\n");
	rename($path, $newPath);             // <- makes the iterator to break
	$iterator->next();
}

echo("EOF\n");

@caco3
Copy link
Author

caco3 commented Apr 15, 2024

@caco3 IIRC you weren't able to reproduce this on FreeBSD stock, but only in the hoster's environment, correct?

Honestly I can't remember and don't have the local FreeBSD available anymore.

@caco3 does the following behave exactly the same way for you?

I tested it on my hosters system and yes, still same issue:

286: src/476 -> dst/476
EOF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Documentation contains incorrect information Extension: spl Status: Verified
Projects
None yet
Development

No branches or pull requests

6 participants