Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing the on-disk Phar after mapPhar() reads incorrect files with the phar:// protocol #17125

Open
TimWolla opened this issue Dec 12, 2024 · 7 comments

Comments

@TimWolla
Copy link
Member

Description

The following code:

<?php

$a = new Phar(__DIR__ . '/a.phar');
$a->addFromString(
	'foo.php',
	<<<'EOT'
		<?php
		echo "this is foo.php\n";
		EOT
);
$a->addFromString(
	'bar.php',
	<<<'EOT'
		<?php
		echo "this is bar.php\n";
		EOT
);
$a->setStub(
	<<<'EOT'
		<?php
		Phar::mapPhar('a.phar');

		include 'phar://a.phar/foo.php';
		rename('b.phar', 'a.phar');
		include 'phar://a.phar/bar.php';
		__HALT_COMPILER(); ?>
		EOT
);

$a = new Phar(__DIR__ . '/b.phar');
$a->addFromString(
	'bar.php',
	<<<'EOT'
		<?php
		/* Shift the offsets compared to a.phar. */
		echo "this is bar.php\n";
		EOT
);
$a->addFromString(
	'foo.php',
	<<<'EOT'
		<?php
		echo "this is foo.php\n";
		EOT
);
$a->setStub(
	<<<'EOT'
		<?php
		Phar::mapPhar('b.phar');

		include 'phar://b.phar/foo.php';
		rename('a.phar', 'b.phar');
		include 'phar://b.phar/bar.php';
		__HALT_COMPILER(); ?>
		EOT
);

Running php a.phar resulted in this output:

this is foo.php
PHP Parse error:  Unterminated comment starting line 2 in phar:///*redacted*/a.phar/bar.php on line 2

But I expected this output instead:

this is foo.php
this is bar.php

The reason for this appears to be that the file offsets and lengths are cached during the mapPhar(), but each time the phar is opened with phar://X.phar it is reopened from disk without validating that it's still the same phar / instead of reopening it from a cached file descriptor.

openat(AT_FDCWD, "*redacted*/a.phar", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
lseek(4, 0, SEEK_CUR)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
read(4, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 8192) = 343
read(4, "", 8192)                       = 0
lseek(4, 144, SEEK_SET)                 = 144
read(4, " ?>\r\nX\0\0\0\2\0\0\0\21\0\0\0\1\0\0\0\0\0\0\0\0\0\7\0\0\0b"..., 8192) = 199
lseek(4, 149, SEEK_SET)                 = 149
read(4, "X\0\0\0\2\0\0\0\21\0\0\0\1\0\0\0\0\0\0\0\0\0\7\0\0\0bar.ph"..., 8192) = 194
lseek(4, -8, SEEK_END)                  = 335
read(4, "\3\0\0\0GBMB", 8192)           = 8
lseek(4, -40, SEEK_END)                 = 303
read(4, "\250\267o[\342\272\367\303\221>\361\307kf&\257\376\335\360;\356=`\244\270#\335\353\277\253\345$"..., 8192) = 40
lseek(4, 0, SEEK_SET)                   = 0
read(4, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 8192) = 343
ioctl(3, TCGETS, 0x7ffc358e61c0)        = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
read(3, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 4096) = 343
lseek(4, 272, SEEK_SET)                 = 272
read(4, "<?php\necho \"this is foo.php\\n\";\250"..., 8192) = 71
close(4)                                = 0
write(1, "this is foo.php\n", 16this is foo.php
)       = 16
rename("b.phar", "a.phar")              = 0
newfstatat(AT_FDCWD, "*redacted*/a.phar", {st_mode=S_IFREG|0664, st_size=387, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "*redacted*/a.phar", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=387, ...}) = 0
lseek(4, 0, SEEK_CUR)                   = 0
lseek(4, 241, SEEK_SET)                 = 241
read(4, "<?php\n/* Shift the offsets compa"..., 8192) = 146
close(4)                                = 0
write(2, "PHP Parse error:  Unterminated c"..., 122PHP Parse error:  Unterminated comment starting line 2 in phar://*redacted*/a.phar/bar.php on line 2
) = 122
close(3)                                = 0

This is an issue for self-updating phars where the autoloader needs to load a class after the update has been performed.

PHP Version

PHP 8.3.14

Operating System

Ubuntu 24.04

@nielsdos
Copy link
Member

Right, and if I understand correctly: to fix this we should only use the cached file offsets if the file is still the same. The alternative of unconditionally using the cached offsets together with the cached phar would not work because that will still not allow auto-updating phars to work.

@TimWolla
Copy link
Member Author

The alternative of unconditionally using the cached offsets together with the cached phar would not work because that will still not allow auto-updating phars to work.

Do you mean because of Windows not allowing to delete / rename a file that is still open?

Because my expectation would be that the state of the files within a Phar is internally consistent, independent of whether I load a file before or after the Phar is replaced on disk. It should not silently use a new version (an error would be fine, caching the actual contents would be preferred).

If I want to access the new contents, I would loadPhar() them.

@nielsdos
Copy link
Member

I'm getting confused about this ticket, but I think I'm starting to understand your perspective. Let's make sure we're on the same page.

I understand that there is a bug here in that there's a mismatch between the cached offsets and when the contents are being read, resulting in the contents being garbled.

I saw two possible fixes:

  1. "Detect" that the file has changed and throw away the cached offsets, rely only on the disk contents
  2. Only rely on cached data, regardless of the change on disk

I thought the proper fix would be option 1, but you seem to imply the proper fix is option 2.
I suppose that option 1 is not viable for self-updating phars because the updater still needs to run the old code?

I would have to recheck how this works again internally, but I'm not so sure that the phar file contents are cached (e.g. if a particular file in the phar was never read before).

@TimWolla
Copy link
Member Author

TimWolla commented Dec 16, 2024

I thought the proper fix would be option 1, but you seem to imply the proper fix is option 2.

Yes.

I suppose that option 1 is not viable for self-updating phars because the updater still needs to run the old code?

Yes. At least that would be a reasonable expectation from the user's perspective: They executed Phar::mapPhar('my-app.phar'); which makes the current Phar's contents available as my-app.phar from within the current request. Perhaps the example Phars in my issue were misleadingly constructed by using different names in mapPhar().

Consider the following case for the updater logic. The stub contains:

<?php
spl_autoload_register(static function ($class) {
    include 'phar://my-app.phar/lib/' . str_replace('\\', '/', $class) . '.php';
});

Phar::mapPhar('my-app.phar');
include 'phar://my-app.phar/startup.php';
__HALT_COMPILER();
?>

And then the updater logic does something like this:

$eventDispatcher->dispatch(new UpdateIncoming($newVersion));
$tmp = tempnam();
copy("http://example.com/my-app_{$newVersion}.phar", $tmp);
if (rename($tmp, $_SERVER['argv'][0])) {
  $eventDispatcher->dispatch(new UpdateSuccessful($newVersion)); // triggers the autoloader for `UpdateSuccessful`
} else {
  $eventDispatcher->dispatch(new UpdateFailed($newVersion)); // triggers the autoloader for `UpdateFailed`
}

And the new version of the app changes the UpdateSuccessful::__construct() method to public function __construct($newVersion, $oldVersion);. The updater of the old version certainly can't anticipate this new parameter and it should not need to, because it's including files from phar://my-app.phar which is referring to the currently executed Phar itself.

I would have to recheck how this works again internally, but I'm not so sure that the phar file contents are cached (e.g. if a particular file in the phar was never read before).

They are not. What I was suggesting as a possible solution, without looking at it in detail, was doing something like this:

static function mapPhar($mapAs) {
  $file = open($_SERVER['argv'][0], 'r');
  $offsets = calculateOffsets($file);
  self::$mappingTable[$mapAs] = [
    'offsets' => $offsets,
    'file' => $file,
  ];
}

And then reusing the cached $file handle whenever phar://{$mapAs}/… is used. Care would need to be taken to restore the seek offset to make this transparent to the user.

@TimWolla
Copy link
Member Author

TimWolla commented Dec 16, 2024

I just realize there is a related issue when deleting the Phar:

<?php

$a = new Phar(__DIR__ . '/output.phar');
$a->addFromString(
	'foo.php',
	<<<'EOT'
		<?php
		echo "this is foo.php\n";
		EOT
);
$a->addFromString(
	'bar.php',
	<<<'EOT'
		<?php
		echo "this is bar.php\n";
		EOT
);
$a->setStub(
	<<<'EOT'
		<?php
		Phar::mapPhar('my-app.phar');

		include 'phar://my-app.phar/foo.php';
		unlink(realpath($_SERVER['argv'][0]));
		include 'phar://my-app.phar/bar.php';
		__HALT_COMPILER(); ?>
		EOT
);

results in:

this is foo.php
PHP Warning:  include(phar://my-app.phar/bar.php): Failed to open stream: phar error: Cannot open phar archive "php-src/output.phar" for reading in php-src/output.phar on line 6
PHP Warning:  include(): Failed opening 'phar://my-app.phar/bar.php' for inclusion (include_path='.:/usr/share/php') in php-src/output.phar on line 6

I would expect opening bar.php to work, because phar://my-app.phar does not refer to an on-disk file, but to a “virtual” file in whatever Phar is currently being executed.

@nielsdos
Copy link
Member

nielsdos commented Dec 21, 2024

Thanks for the detailed explanation.

I've analysed this and I know why it doesn't work.
If you remove the include for "phar://a.phar/foo.php" it will suddenly start to work.
This happens because when the include opcode finishes execution it decrements the phar refcount, which will destroy the open file stream here:

php-src/ext/phar/phar.c

Lines 263 to 272 in 26244c7

if (phar->fp && (!(phar->flags & PHAR_FILE_COMPRESSION_MASK) || !phar->alias)) {
/* close open file handle - allows removal or rename of
the file on windows, which has greedy locking
only close if the archive was not already compressed. If it
was compressed, then the fp does not refer to the original file.
We're also closing compressed files to save resources,
but only if the archive isn't aliased. */
php_stream_close(phar->fp);
phar->fp = NULL;
}

This means that the next include, after the rename, will reopen the phar from disk.
I find it hard to think about the consequences of this given the comment. Given the comment it's clear that we can't just remove that code. Perhaps it's sufficient to set a flag on phar archives opened via webPhar or mapPhar and keep their streams open.

EDIT: actually, there is a check for alias which is set by mapPhar, which was introduced by 995ecff, but something seems off here... I'm thinking about it

@nielsdos
Copy link
Member

nielsdos commented Dec 21, 2024

If we solely rely on whether the phar is alias, then the condition should be changed to phar->fp && !phar->alias. This fixes the code in the opening post but introduces two BC breaks:

Phar: bug #13786: "PHP crashes on phar recreate after unlink" [ext/phar/tests/bug13786.phpt]
Bug #77432 (Segmentation fault on including phar file) [ext/phar/tests/bug77432.phpt]

So the flag approach is probably the more correct one. This would look something like the following and would fix the opening post code too while not introducing test regressions:

diff --git a/ext/phar/phar.c b/ext/phar/phar.c
index a4a9b7d2139..2ae06609c4b 100644
--- a/ext/phar/phar.c
+++ b/ext/phar/phar.c
@@ -260,7 +260,7 @@ bool phar_archive_delref(phar_archive_data *phar) /* {{{ */
 		PHAR_G(last_phar) = NULL;
 		PHAR_G(last_phar_name) = PHAR_G(last_alias) = NULL;
 
-		if (phar->fp && (!(phar->flags & PHAR_FILE_COMPRESSION_MASK) || !phar->alias)) {
+		if (phar->fp && (!(phar->flags & (PHAR_FILE_MAPPED | PHAR_FILE_COMPRESSION_MASK)) || !phar->alias)) {
 			/* close open file handle - allows removal or rename of
 			the file on windows, which has greedy locking
 			only close if the archive was not already compressed.  If it
@@ -2329,7 +2329,9 @@ zend_result phar_open_executed_filename(char *alias, size_t alias_len, char **er
 		return FAILURE;
 	}
 
-	if (phar_open_parsed_phar(ZSTR_VAL(fname), ZSTR_LEN(fname), alias, alias_len, 0, REPORT_ERRORS, NULL, 0) == SUCCESS) {
+	phar_archive_data *phar;
+	if (phar_open_parsed_phar(ZSTR_VAL(fname), ZSTR_LEN(fname), alias, alias_len, 0, REPORT_ERRORS, &phar, 0) == SUCCESS) {
+		phar->flags |= PHAR_FILE_MAPPED;
 		return SUCCESS;
 	}
 
@@ -2362,7 +2364,11 @@ zend_result phar_open_executed_filename(char *alias, size_t alias_len, char **er
 		fname = actual;
 	}
 
-	zend_result ret = phar_open_from_fp(fp, ZSTR_VAL(fname), ZSTR_LEN(fname), alias, alias_len, REPORT_ERRORS, NULL, error);
+	zend_result ret = phar_open_from_fp(fp, ZSTR_VAL(fname), ZSTR_LEN(fname), alias, alias_len, REPORT_ERRORS, &phar, error);
+
+	if (ret == SUCCESS) {
+		phar->flags |= PHAR_FILE_MAPPED;
+	}
 
 	if (actual) {
 		zend_string_release_ex(actual, 0);
diff --git a/ext/phar/phar_internal.h b/ext/phar/phar_internal.h
index 2f9ae7c1c84..a32b2a6400f 100644
--- a/ext/phar/phar_internal.h
+++ b/ext/phar/phar_internal.h
@@ -51,6 +51,7 @@
 #define PHAR_FILE_COMPRESSED_NONE  0x00000000
 #define PHAR_FILE_COMPRESSED_GZ    0x00100000
 #define PHAR_FILE_COMPRESSED_BZ2   0x00200000
+#define PHAR_FILE_MAPPED           0x01000000
 
 #define PHAR_SIG_MD5              0x0001
 #define PHAR_SIG_SHA1             0x0002

Please make a sanity check here if you can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants