Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cross-device mv by switching to copy #846

Closed
wants to merge 33 commits into from
Closed

Conversation

seanmorris
Copy link
Contributor

@seanmorris seanmorris commented Dec 6, 2023

Based on exploration started by @sejas WordPress/playground-tools#116

What is this PR doing?

This PR modifies the installAsset procedure to use a copy operation rather than a rename.

What problem is it solving?

Emscripten does not support rename operation across different Filesystems. We need to be able to copy files across such a boundary.

How is the problem addressed?

The BasePHP.cp method has been added to copy files and directories recursively. The installAsset procedure has been modified to use a copy operation rather than a rename.

Testing Instructions

In this project:

These are supporting changes for other libraries, so we'll need to build the project and use npm link to use the local packages with other projects, like playground-tools.

You can start by navigating to the root of the project in a terminal window and running the following:

git checkout sm-cross-device-mv
nvm use 20
npm install
npm run build

Testing @php-wasm/web

Start the dev server:

npm run dev

Navigate to http://localhost:5400/website-server/#{%20%22$schema%22:%20%22https://playground.wordpress.net/blueprint-schema.json%22,%20%22landingPage%22:%20%22/wp-admin/plugins.php%22,%20%22preferredVersions%22:%20{%20%22php%22:%20%228.0%22,%20%22wp%22:%20%22latest%22%20},%20%22steps%22:%20[%20{%20%22step%22:%20%22login%22,%20%22username%22:%20%22admin%22,%20%22password%22:%20%22password%22%20},%20{%20%22step%22:%20%22installPlugin%22,%20%22pluginZipFile%22:%20{%20%22resource%22:%20%22wordpress.org/plugins%22,%20%22slug%22:%20%22classic-editor%22%20},%20%22options%22:%20{%20%22activate%22:%20true%20}%20}%20]%20}

Ensure the classic-editor plugin is installed and active.

Testing @php-wasm/node

Link @wp-playground/blueprints

cd dist/packages/playground/blueprints/
npm link

Link @php-wasm/node

cd dist/packages/php-wasm/node/ 
npm link

In playground-tools

These are supporting changes for other libraries, so we'll need to use playground-tools to test it.

Create b.json

{
	"$schema": "https://playground.wordpress.net/blueprint-schema.json",
	"landingPage": "/wp-admin/plugins.php",
	"preferredVersions": {
		"php": "8.0",
		"wp": "latest"
	},
	"steps": [
		{
			"step": "login",
			"username": "admin",
			"password": "password"
		},
		{
			"step": "installPlugin",
			"pluginZipFile": {
				"resource": "wordpress.org/plugins",
				"slug": "notification"
			},
			"options": {
				"activate": true
			}
		}
	]
}

Then run (again, in playground-tools:

git checkout sm-preventing-memory-leaks
nvm use 20
npm install
npm link @wp-playground/blueprints @php-wasm/node
node dist/packages/wp-now/cli.js start --blueprint=b.json

Navigate to http://localhost:8881/wp-admin/plugins.php

Ensure the notification plugin is installed and active.

@seanmorris seanmorris changed the title [WIP] Cross Device Mv & Polyfills for Custom Event & File [WIP] Cross-Device MV & Polyfills for Custom Event & File Dec 6, 2023
@seanmorris seanmorris changed the title [WIP] Cross-Device MV & Polyfills for Custom Event & File Cross-Device MV & Polyfills for Custom Event & File Dec 6, 2023
@seanmorris seanmorris self-assigned this Dec 6, 2023
@adamziel
Copy link
Collaborator

adamziel commented Dec 7, 2023

The implicit file removal makes me feel icky, even if that's exactly what unix does.

Imagine someone "moving" a directory from an actual disk directory that's mounted inside EmscriptenFS into a Playground directory inside MEMFS. This wouldn't actually move the files – it would delete them from the disk without leaving any copy.

In a way, a lack of support for cross-filesystem mv could actually be a feature here. It protects the files on the disk from being accidentally wiped out.

Another concern I have is I'm not sure how to confirm this works across combinations of windows/linux/mac and different block devices/symlinks/hardlinks/tmpfs etc. I'm pretty confident about primitive operations like cp and rm, but once we start reasoning about more advanced cases such as device type detection then there's a lot of ground to cover. A bad outcome would be not covering all of that and then seeing a large number bug reports a few months later.

Intuitively, this feels like something that should be implemented by Emscripten. And maybe it will be in the upcoming WASMFS and we'll get it for free? Some research on that would surely be helpful. On the Playground side, it feels much easier to just give up and document lack of support for "heterogenous" mv as a known limitation.

I'd like to explore using the Compression Streams API to unzip files as we download them (vs downloading and then unzipping). If that works, we could just write the zip contents directly into wp-content/plugins/<plugin name> without ever going through the tmp directory, meaning no cross-filesystem writing would be required.

Also, a recursive copy(src, dst) function would be useful and would give us all the benefits without any of the downsides. The developers could then lean on an explicit copy and remove calls.

There's a way to unblock WordPress/playground-tools#116 with a way smaller impact surface. Adding a private copy function in blueprints/common.ts would allow the Blueprint steps such as installPlugin to explicitly copy and remove data. Then, later on, the larger Compression Streams refactoring could remove that code without ever exposing a new public method or adding more low-level mv logic.

Oh, and please let's move the polyfills to another PR so 1) they're not blocked by this dicussion and 2) we can reason about them separately.

cc @dmsnell @eliot-akira @sejas – I wonder what do y'all think here

@eliot-akira
Copy link
Collaborator

eliot-akira commented Dec 8, 2023

even if that's exactly what unix does

For curiosity's sake, I looked into how mv works.

To move a file, mv ordinarily simply renames it. However, if renaming does not work because the destination’s file system differs, mv falls back on copying as if by cp -a, then (assuming the copy succeeded) it removes the original.

(From https://www.gnu.org/software/coreutils/manual/html_node/mv-invocation.html)

In the source code for mv, it calls cp with the option move_mode. So the moving logic is mainly in copy.c. Skimming through it, there are a number of situations it handles when moving a file or directory. For example:

  • If source and destination files are the same
  • If either is . or ..
  • If either is a symlink or hardlink
  • If source is a symlink and destination is its referent
  • If source is a directory and destination is a file (not allowed), or vice versa (allowed)
  • "Create a backup of each destination directory in move mode" (And if source or destination is inside /tmp, making sure that the backup doesn't clobber either)
  • Something about preventing writing through a "dangling symlink"
  • "Ensure we don't try to copy through a symlink that was created by a prior call to this function" (I'm guessing it's about when source directory contains a symlink that includes destination)

I don't know how many of these are relevant to our situation, but I can imagine some edge cases that could accidentally result in data loss during recursive copy and remove. So I agree on this point raised:

Another concern I have is I'm not sure how to confirm this works across combinations of windows/linux/mac and different block devices/symlinks/hardlinks/tmpfs etc.

This might be why Emscripten only provides rename and not mv (which we aliased to FS.rename).

On the other hand, if Emscripten doesn't provide mv, and if we provide only partial support on same device/filesystem, then users would be responsible for implementing their own moving logic - which might be even more error prone.

@eliot-akira
Copy link
Collaborator

eliot-akira commented Dec 8, 2023

About the polyfills for CustomEvent and File, since they're about compatibility with Node.js 18, it might be useful to put them in a common place that's convenient for importing from other packages (so they don't have to reach into each other to import the polyfill).

Currently they're in:

  • File - packages/playground/blueprints/src/lib/steps/common.ts
  • CustomEvent - packages/php-wasm/progress/src/lib/custom-event.ts (this PR)

Maybe they can even be exported in a way that external packages like wp-now can use them if needed.

@seanmorris seanmorris changed the title Cross-Device MV & Polyfills for Custom Event & File Cross-Device MV Dec 12, 2023
@seanmorris seanmorris changed the title Cross-Device MV [DRAFT] Cross-Device MV Dec 12, 2023
@seanmorris seanmorris changed the title [DRAFT] Cross-Device MV Cross-Device MV Dec 12, 2023
@seanmorris
Copy link
Contributor Author

seanmorris commented Dec 12, 2023

I've updated this PR to use a copy operation rather than a move.

I've separated the polyfills into this draft PR: #865

@seanmorris seanmorris changed the title Cross-Device MV Fix cross-device mv by switching to Copy Dec 12, 2023
@seanmorris seanmorris changed the title Fix cross-device mv by switching to Copy Fix cross-device mv by switching to copy Dec 12, 2023

const fromStat = FS.stat(fromPath);

const fromMode = parseInt(fromStat.mode.toString(8).substring(0, 1));
Copy link
Collaborator

@adamziel adamziel Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this more human-readable – here are the thoughts I had when I was reading this:

  • What's mode?
  • What's 8?
  • Why does it need to be substring-ed?

Copy link
Contributor Author

@seanmorris seanmorris Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel That's just a way to tell if a link represents a directory or a file. Is this more readable?

const fromIsDir = fromStat.mode & 0o40000;

Copy link
Collaborator

@adamziel adamziel Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update! I'm still a bit confused about what's a mode, what's that flag etc:

		// Emscripten directories will have the sticky-bit set
		// https://linux.die.net/man/1/chmod
		const fromIsDir = fromStat.mode & 0o40000;

Let's be kind to readers of this code and explain the concepts and reasoning in place instead of just pointing them to the linux manual.

Copy link
Contributor Author

@seanmorris seanmorris Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The FILEMODE tells us things about the file, like
// permissions, whether its a file, link, or directory
// as well as its access and exec permissions.
//
// The 15th bit of the FILEMODE is the sticky-bit.
// Emscripten directories will have the sticky-bit set.
//
// We'll use a binary literal (0b...) with a logical
// and (&) to check for that.
const stickyBit = 0b100000000000000;
const sourceIsDir = sourceStat.mode & stickyBit;

@@ -616,6 +616,39 @@ export abstract class BasePHP implements IsomorphicLocalPHP {
this[__private__dont__use].FS.rename(fromPath, toPath);
}

/** @inheritDoc */
@rethrowFileSystemError('Could not copy "{path}"')
cp(fromPath: string, toPath: string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an options arg with a single recursive: bool option to ensure the developers express an explicit intention to copy a directory

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you added a boolean arg. Let's use an options arg instead, like I mentioned in that previous comment. It would have just a single option, like recursive: bool. So you would call it like cp(fromPath, toPath, { recursive: true }). This makes the calls clear, as you don't have to think about what that true value means, and it also makes the API easy to extend in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamziel Ah, I misunderstood. Done.

cp(fromPath: string, toPath: string, options: CpOptions) {

export interface CpOptions {
/**
* If true, recursively copies the directory and all its contents.
* Default: false.
*/
recursive?: boolean;
}

Copy link
Collaborator

@adamziel adamziel Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Note that cp() signature makes the options arg mandatory – let's make it optional like it is in rmdir, otherwise you'll always have to call it with an empty object like cp(fromPath, toPath, {})

// https://linux.die.net/man/1/chmod
const fromIsDir = fromStat.mode & 0o40000;

let toExists: boolean;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be more readable, what do you think?

Suggested change
let toExists: boolean;
let destinationExists: boolean;

}

if (!fromIsDir) {
const file = FS.readFile(fromPath, { encoding: 'binary' });
Copy link
Collaborator

@adamziel adamziel Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't { encoding: 'binary' } the default? Also, let's use functions available on this whenever possible, e.g. there's this.readFileAsBuffer()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const file = this.readFileAsBuffer(sourcePath);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! What about other places where this functions could be applicable over custom-built logic?


if (!fromIsDir) {
const file = FS.readFile(fromPath, { encoding: 'binary' });
FS.writeFile(toPath, file);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if toPath exists and is a directory?

Copy link
Contributor Author

@seanmorris seanmorris Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam In that case, it('cp() should copy a file into a directory', () => {...

if (destinationIsDir) {
FS.writeFile(
joinPaths(destinationPath, basename(sourcePath)),
file
);
} else {
FS.writeFile(destinationPath, file);
}

it('cp() should copy a file into a directory', () => {
const testDirPath1 = testDirPath;
const testDirPath2 = '/__test987654322';
php.mkdir(testDirPath1);
php.mkdir(testDirPath2);
php.writeFile(testDirPath1 + '/1.txt', '1');
php.cp(testDirPath1 + '/1.txt', testDirPath2);
expect(php.fileExists(testDirPath2 + '/1.txt')).toEqual(true);
expect(php.readFileAsText(testDirPath2 + '/1.txt')).toEqual('1');
});

Copy link
Collaborator

@adamziel adamziel Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test only covers a case when oldPath is a file, right? There's also a scenario when it's a directory and that also needs a test.

Comment on lines 260 to 261
* Copies a file or directory in the PHP filesystem to a
* new location.
Copy link
Collaborator

@adamziel adamziel Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's thoroughly explain what this function does in different situations, like when the target path already exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
* Copies a file or directory in the PHP filesystem to a
* new location. The target directory's parent must be a
* valid location in the filesystem prior to copying.
*
* If the target path is a file that already exists,
* it will be overwritten.
*
* If the target path is a directory that already exists,
* the file or directory will be copied into it.
*
* If the target path's parent directory does not exist,
* an error will be thrown.
*
* @param oldPath The file or directory to be copied.
* @param newPath The new, full path to copy the file or directory to.
*/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! As I'm read it, I have additional questions:

The target directory's parent must be a valid location

A valid location means only that it must exist, right? If yes, let's say it directly.

If the target path is a file that already exists, it will be overwritten.

Is that the case only when the source path is a file? Or also when it's a directory?

If the target path is a directory that already exists, the file or directory will be copied into it.

So if the target path is a directory that does not exist yet, will it be created?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder about the semantics here. Right now this function makes an implicit assumption that says:

If the newPath is a directory that already exists, the developer intended to copy the oldPath inside that directory.

I don't think it always holds. Maybe the newPath shouldn't actually be there? I'd rather remove the implicit assumption and make the logic explicit: If newPath is a directory that already exists, that's an error, please remove it explicitly before the recursive copy.

@seanmorris seanmorris requested a review from a team as a code owner December 16, 2023 03:26
@seanmorris
Copy link
Contributor Author

seanmorris commented Dec 16, 2023

The following error was a red herring. The failure was happening directly before it, but the crash doesn't seem to happen once the actual bug has been fixed.

We detected that the Chromium Renderer process just crashed.

This is the equivalent to seeing the 'sad face' when Chrome dies.

This can happen for a number of different reasons:

I wasn't able to run the e2e tests locally until I read a bunch of docs and added the following lines:

"affected": {
"defaultBase": "trunk"
},

Once I added that I was able to run nx affected --target=e2e --configuration=ci --verbose and I determined the error was coming from nx run playground-website:e2e.

Once I zeroed in on that I was able to get the e2e tests passing again.

@seanmorris seanmorris requested a review from adamziel December 16, 2023 22:55
destinationExists = false;
}

if (!sourceIsDir) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if I copy a directory tree like wordpress/wp-content/plugins onto another directory, say /www that already has wordpress/wp-content/plugins in it? It seems like it would duplicate some directories?

@adamziel
Copy link
Collaborator

adamziel commented Dec 18, 2023

I wasn't able to run the e2e tests locally until I read a bunch of docs and added the following lines:

Aha! What was the error? Do you mean the Chrome Crash, or was there another issue? If that was the chrome crash, I wonder why that config update would help 🤔 If that fix be useful for others, you could start an issue to share it with others by updating the docs or the repo setup.

adamziel added a commit that referenced this pull request Dec 20, 2023
…root (#886)

Solves the [error discovered by
@sejas](WordPress/playground-tools#116 (comment))
caused by calling `php.mv(fromPath, toPath)` when `fromPath` is a
mounted local directory and `toPath` is a MEMFS directory. Emscripten
doesn't support such a scenario:

```
Proceeding without the Notification plugin. Could not install it in wp-admin. The original error was: Error: Could not move "/tmp/assets/Notification/notification": Cross-device link.
Error: Could not move "/tmp/assets/Notification/notification": Cross-device link.
    at descriptor.value (/playground-tools/node_modules/@php-wasm/node/index.cjs:67481:17)
```

The solution proposed in this PR replaces a `/tmp` directory with a
randomly-named temporary directory inside `wp-content`. `/tmp` doesn't
necessarily point to a system temp directory and needs to be revisited
anyway. Whether we should use a temporary directory inside `wp-content`
is another matter, but that part may be revisited once the [recursive
cp](#846) feature
is added.
@adamziel adamziel closed this Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants