
[Data Liberation] Add WXR import CLI script #2012

Merged
adamziel merged 9 commits into trunk from add/data-liberation-import-script on Nov 23, 2024


zaerl (Collaborator) commented Nov 21, 2024

Add Data Liberation import script. The script imports every WXR file in a folder into WordPress. This PR also adds the possibility to run PHPUnit inside Playground.

cd packages/playground/data-liberation/bin/import
bash import-wxr.sh /a-folder/with-the/wxr-files-to-import-inside
cd packages/playground/data-liberation
nx run test:wp-phpunit

The importer is also registered as a WP-CLI command in the init action when WP-CLI is loaded, so it can also be run as wp data-liberation your-wxr-file-you-want-to-import.xml.
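Because the importer is exposed as a WP-CLI command, a whole folder can also be imported from a regular shell. The function below is a minimal sketch that assumes only the `wp data-liberation <file>` form quoted above; `import_with_wp_cli` and the `WP_CMD` override (handy for a dry run with `WP_CMD=echo`) are hypothetical names, not part of the plugin. It stops at the first failed import.

```bash
# Hypothetical helper: import every top-level .xml file in a folder with
# the `wp data-liberation <file>` command. WP_CMD defaults to `wp`;
# set WP_CMD=echo for a dry run.
import_with_wp_cli() {
  local dir="$1"
  local wp_cmd="${WP_CMD:-wp}"
  local file found=0

  if [ ! -d "$dir" ]; then
    echo "error: '$dir' is not a directory" >&2
    return 1
  fi

  for file in "$dir"/*.xml; do
    # If nothing matches, the glob stays literal; skip it.
    [ -f "$file" ] || continue
    found=1
    # $wp_cmd is unquoted on purpose so values like "wp --path=/srv/wp" split.
    if ! $wp_cmd data-liberation "$file"; then
      echo "error: import failed for $file" >&2
      return 1
    fi
  done

  if [ "$found" -eq 0 ]; then
    echo "warning: no .xml files found in $dir" >&2
  fi
}
```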

Motivation for the change, related issues

There's no good entry point to running that import right now; we use an ad-hoc code snippet inside the Data Liberation WordPress plugin. This new CLI command will make testing the import easy.

It should also be possible to run the PHPUnit tests in the context of WordPress.

  1. New CLI script
  2. Added PHPUnit run inside Playground
  3. Fix: missing require_once
  4. Fix: wrong method name
  5. Fix: endless loop

Implementation details

This PR consists of six major parts.

The bin/import/import-wxr.sh bash script

This script accepts a folder path: create a folder and put all the WXR files you want to import inside it. The script starts the cli.ts server and mounts the specified folder at /wordpress/wp-content/uploads/import-wxr.
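The argument handling of such a script can be sketched as follows. This is not the actual import-wxr.sh: the `build_import_command` function, the `PLAYGROUND_CLI` variable, and its `npx @wp-playground/cli` default are assumptions, and only the `server`, `--mount=<host>:<vfs>`, and `--blueprint=` shape of the command is modeled.

```bash
# Hypothetical sketch of import-wxr.sh argument handling. PLAYGROUND_CLI
# and its default are assumptions; only the command's shape is modeled.
build_import_command() {
  local folder="$1"
  local blueprint="$2"
  local cli="${PLAYGROUND_CLI:-npx @wp-playground/cli}"
  local mount_point="/wordpress/wp-content/uploads/import-wxr"
  local abs_folder

  if [ -z "$folder" ] || [ ! -d "$folder" ]; then
    echo "usage: import-wxr.sh <folder-with-wxr-files>" >&2
    return 1
  fi

  # Resolve an absolute host path so the mount does not depend on the cwd.
  abs_folder="$(cd "$folder" && pwd)"

  # Printed rather than executed so the sketch can be tested.
  echo "$cli server --mount=$abs_folder:$mount_point --blueprint=$blueprint"
}
```

Printing the command keeps the sketch easy to test; a real script would run it directly, for example with blueprint-import-wxr.json as the Blueprint.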

The bin/import/blueprint-import-wxr.json blueprint

The blueprint enables the Data Liberation plugin, enumerates all the files with the .xml extension inside the mounted folder, and imports each of them using the newly created data_liberation_import function.

The PHP snippet run in the runPHP step uses the wp_visit_file_tree function provided by the plugin:

<?php

// Bootstrap WordPress so the plugin's functions are available.
require_once 'wordpress/wp-load.php';

$upload_dir = wp_upload_dir();

// Walk the mounted folder and import every file with the .xml extension.
foreach ( wp_visit_file_tree( $upload_dir['basedir'] . '/import-wxr' ) as $event ) {
  foreach ( $event->files as $file ) {
    if ( $file->isFile() && pathinfo( $file->getPathname(), PATHINFO_EXTENSION ) === 'xml' ) {
      data_liberation_import( $file->getPathname() ); // Import the WXR.
    }
  }
}
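Before running an import, it can help to preview which files the loop above will pick up. Below is a rough shell counterpart of its filter, assuming `wp_visit_file_tree` descends into subdirectories: regular files whose extension is exactly `xml` (case-sensitive, like the `=== 'xml'` comparison). `list_wxr_files` is a hypothetical helper.

```bash
# Hypothetical helper: list the files the runPHP import loop would import.
# Mirrors its filter: regular files whose extension is exactly "xml"
# (case-sensitive), searched recursively.
list_wxr_files() {
  local dir="$1"

  if [ ! -d "$dir" ]; then
    echo "error: '$dir' is not a directory" >&2
    return 1
  fi

  # Sort with the C locale for a stable, byte-wise order.
  find "$dir" -type f -name '*.xml' | LC_ALL=C sort
}
```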

A new data_liberation_import function

The new simple import function in the plugin runs WP_Stream_Importer and not much more.

The new tests/import/run.sh script

This script runs PHPUnit inside Playground and fails with an error if PHPUnit reports one.
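Assuming "fails with an error" means a non-zero exit status (which is what lets nx mark the `test:wp-phpunit` target as failed), the contract of run.sh boils down to the pattern below. `run_tests` is a hypothetical wrapper; the real script's Playground invocation is not shown here, so the command to run is passed in as arguments.

```bash
# Hypothetical sketch of the run.sh contract: run the given command and
# propagate its exit status so a failing PHPUnit run fails the caller.
run_tests() {
  local status=0
  "$@" || status=$?

  if [ "$status" -eq 0 ]; then
    echo "PHPUnit inside Playground: OK"
  else
    echo "PHPUnit inside Playground: FAILED (exit $status)" >&2
  fi
  return "$status"
}
```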

The new tests/import/blueprint-import.json blueprint

This blueprint runs all the PHPUnit tests in the tests folder inside Playground. It returns success if every test passes and an error if one or more tests fail.

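// Load PHPUnit and the plugin's dependencies through Composer's autoloader.
$base = '/wordpress/wp-content/plugins/data-liberation/';
require $base . 'vendor/autoload.php';

try {
    // Same options as `phpunit --stderr --configuration phpunit.xml`.
    $arguments = [
        '--stderr',
        '--configuration', $base . 'phpunit.xml'
    ];

    $res = (new PHPUnit\TextUI\Application())->run($arguments);

    // A non-zero result means at least one test failed or errored.
    if ( $res !== 0 ) {
        trigger_error('PHPUnit failed', E_USER_ERROR);
    }
} catch (Throwable $e) {
    trigger_error('PHPUnit failed: ' . $e->getMessage(), E_USER_ERROR);
}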

New unit test

The new WPStreamImporterTests class runs a first test using WP_Stream_Importer::create_for_wxr_file. The test can only run inside WordPress, so setUp() checks that it is running in the right environment; otherwise, the test is not run.

Testing Instructions (or ideally a Blueprint)

Import script

Example with one of the preexisting XML files:

cd packages/playground/data-liberation/bin/import
mkdir tmp
cp ../../tests/wxr/small-export.xml tmp
bash import-wxr.sh ./tmp

Then check http://127.0.0.1:9400/wp-admin/edit.php. All the WXR posts should be there.
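To know how many posts to expect on that screen, the `post` items in a WXR file can be counted. This is a rough sketch based on plain text matching rather than XML parsing; it assumes the file wraps `wp:post_type` values in CDATA, as WordPress core exports do, and `count_wxr_posts` is a hypothetical helper.

```bash
# Hypothetical helper: roughly count items of one post type (default
# "post") in a WXR file. Plain text matching, not XML parsing; assumes
# CDATA-wrapped <wp:post_type> values as written by WordPress core.
count_wxr_posts() {
  local file="$1"
  local type="${2:-post}"
  local n

  if [ ! -f "$file" ]; then
    echo "error: '$file' not found" >&2
    return 1
  fi

  # -o prints every match on its own line, so matches sharing a line are
  # still counted; `|| true` keeps "no match" from failing under pipefail.
  n="$( { grep -oF -- "<wp:post_type><![CDATA[$type]]></wp:post_type>" "$file" || true; } | wc -l | tr -d '[:space:]')"
  echo "$n"
}
```

For the example above, run it from bin/import as `count_wxr_posts tmp/small-export.xml`.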

PHPUnit run inside Playground

Run the tests locally:

cd packages/playground/data-liberation
nx run test:phpunit

1188 tests should succeed.

Run PHPUnit inside Playground:

cd packages/playground/data-liberation
nx run test:wp-phpunit

All tests should succeed and output "Successfully ran target test:wp-phpunit for project playground-data-liberation".

zaerl requested a review from adamziel November 21, 2024 10:54
zaerl self-assigned this Nov 21, 2024
zaerl force-pushed the add/data-liberation-import-script branch from 5e80d0f to 9124b25 November 22, 2024 08:43
zaerl force-pushed the add/data-liberation-import-script branch from 9124b25 to fab4f2f November 22, 2024 12:25
zaerl changed the title from "Add Data Liberation import script" to "[Data Liberation] Add import script" Nov 22, 2024
adamziel (Collaborator) commented Nov 22, 2024

This is a great start, Francesco, thank you! What would it take to expand this to run the actual PHPUnit tests in the context of WordPress, similarly to what WordPress core tests do? I'm not saying this would actually be useful at this early stage, but I'm curious; maybe it wouldn't be that heavy of a lift? There's some prior art in this repo if you search for PHPUnit.

zaerl (Collaborator, Author) commented Nov 22, 2024

I have added the possibility to run PHPUnit in Playground and updated the PR description with details. When you have time, please let me know what you think. Thanks.

return false;
}

$is_wp_cli = defined( 'WP_CLI' ) && WP_CLI;

adamziel (Collaborator):

At this point, a dedicated WP_CLI command might make sense. It would only be a thin wrapper. The website and the unit tests would use the same underlying import library with their own dedicated logging facilities.

zaerl (Collaborator, Author):

Yes. I made it this way so that users who only want to use the plugin do not need to have WP-CLI all the time.

},
{
"step": "runPHP",
"code": "<?php require_once 'wordpress/wp-load.php'; $base = '/wordpress/wp-content/plugins/data-liberation/';\nrequire $base . 'vendor/autoload.php';\ntry {\n$arguments = [\n'--stderr',\n'--configuration', $base . 'phpunit.xml'\n];\n$res = (new PHPUnit\\TextUI\\Application())->run($arguments);\nif ( $res !== 0 ) {\ntrigger_error('PHPUnit failed', E_USER_ERROR);\n}\n} catch (Throwable $e) {\ntrigger_error('PHPUnit failed: ' . $e->getMessage(), E_USER_ERROR);\n};"

adamziel (Collaborator):

Cool idea! This will suffice for starters, but here's something if you'd like to take it to the next level: what would it take to go from this to something more like a typical CLI command, e.g. cli --blueprint=... --mount=... run vendor/bin/phpunit --configuration phpunit.xml --stderr?

zaerl (Collaborator, Author):

Great idea. I didn't touch the CLI in this first phase.

adamziel changed the title from "[Data Liberation] Add import script" to "[Data Liberation] Add WXR import CLI script" Nov 23, 2024
adamziel (Collaborator) commented:

Aside from my two notes, this looks good. Thank you, Francesco! A useful next step would be adding a few actual assertions and running that test in CI.

adamziel merged commit 4438d72 into trunk on Nov 23, 2024
10 checks passed
adamziel deleted the add/data-liberation-import-script branch on November 23, 2024 17:18
zaerl (Collaborator, Author) commented Nov 23, 2024

I have added an issue to track the new step for the CLI. Feel free to change it if you want.

adamziel added a commit that referenced this pull request Dec 11, 2024
…2058)

## Description

Adds the Data Liberation WXR importer as an option in the `importWxr`
step. The new importer is turned on by including the `"importer":
"data-liberation"` option:

```json
{
  "steps": [
    {
      "step": "importWxr",
      "file": {
        "resource": "url",
        "url": "https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml"
      },
      "importer": "data-liberation"
    }
  ]
}
```

When the `importer` option is missing or set to "default," nothing
changes in the behavior of the step and it continues using the
https://github.com/humanmade/WordPress-Importer importer.

The new importer:

* Rewrites links in the imported content
* Downloads assets through Playground's CORS proxy
* Parallelizes the downloads
* Communicates progress

This PR is a part of
#1894

## Implementation details

The `importWxr` step fetches and includes the
`data-liberation-core.phar` file. The phar file is built with
[Box](https://box-project.github.io/box/configuration/) and contains the
importer library and its dependencies, namely a subset of the Data
Liberation library, a subset of the Blueprints library, and a few vendor
libraries.

This, unfortunately, means that any changes in the PHP files require
rebuilding the .phar file. Here's how you can do it:

```bash
nx build:phar playground-data-liberation
```
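Because a stale .phar is easy to forget, a small guard can check whether any PHP source is newer than the built file. This is a sketch: `phar_is_stale` is hypothetical, the source folder is passed in because the package layout is not described here, and the .phar path in the example comment is an assumption.

```bash
# Hypothetical guard: succeeds when the .phar is missing or older than
# any .php file under the given source folder, i.e. a rebuild is needed.
phar_is_stale() {
  local src_dir="$1"
  local phar="$2"

  [ -f "$phar" ] || return 0
  # -newer compares modification times; one hit is enough.
  [ -n "$(find "$src_dir" -type f -name '*.php' -newer "$phar" | head -n 1)" ]
}

# Example (both paths are assumptions):
# if phar_is_stale packages/playground/data-liberation/src \
#     packages/playground/data-liberation/dist/data-liberation-core.phar; then
#   nx build:phar playground-data-liberation
# fi
```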

You can also build the entire Data Liberation package as a WordPress
plugin complete with a wp-admin page:

```bash
nx build:plugin playground-data-liberation
```

Both commands will output the built files to
`packages/playground/data-liberation/dist`

The progress updates are a first-class feature of the new importer. The
updated `importWxr` step receives them in real time via a
`post_message_to_js()` call that runs after every import step. It then
passes them on to the progress bar UI.

### Other changes

* **TLS traffic now goes through the CORS proxy.** Since the new
importer uses `AsyncHTTP\Client` which deals with raw sockets,
Playground's [TLS-based network
bridge](#1926)
runs the outbound traffic through a CORS proxy. Technically,
`TCPOverFetchWebsocket` gets the `corsProxy` URL passed to the
`playground.boot()` call.
* A few composer dependencies were forked, downgraded to PHP 7.2 using
Rector, and bundled with this PR to keep the Data Liberation importer
working.

## Remaining work

- [x] PHP 7.2 compatibility. Done by forking and Rector-downgrading
dependencies that were incompatible with PHP 7.2.
- [x] Report the importer's progress on the overall Blueprint progress
bar
- [x] Enqueue the data liberation plugin files for downloading at the
blueprint compilation stage
- [x] Don't eagerly rewrite attachments URLs in `WP_Stream_Importer`.
Exposing this information to the API consumer requires an explicit
decision. Do we rewrite it? Or do we ignore it?
- [x] Fix the TLS errors at the intersection of Playground network
transport and the async HTTP client library
- [x] Separate the markdown importer and its dependencies (md parser,
frontmatter parser, Symfony libraries) from the core plugin
- [x] Ship the importer and its tree-shaken deps (URL parser) as a
minified zip/phar

## Follow-up work

- [ ] Reconsider the `WP_Import_Session` API – do we need so many
verbosely named methods? Can we achieve the same outcomes with fewer
methods?
- [ ] Investigate why there's a significant delay before media downloads
start on PHP 7.2 – 7.4. It's likely a PHP.wasm issue.

## Testing instructions

* Default importer – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20})
and confirm it does what the current `importWxr` step does, that is, it
stays at "Importing content" for a moment, fails to fetch media files
(CORS issues in network tools), but inserts posts and pages.
* Data Liberation – [Open this
link](http://localhost:5400/website-server/#{%20%22plugins%22:%20[],%20%22steps%22:%20[%20{%20%22step%22:%20%22importWxr%22,%20%22importer%22:%20%22data-liberation%22,%20%22file%22:%20{%20%22resource%22:%20%22url%22,%20%22url%22:%20%22https://raw.githubusercontent.com/wpaccessibility/a11y-theme-unit-test/master/a11y-theme-unit-test-data.xml%22%20}%20}%20],%20%22preferredVersions%22:%20{%20%22php%22:%20%228.3%22,%20%22wp%22:%20%226.7%22%20},%20%22features%22:%20{%20%22networking%22:%20true%20},%20%22login%22:%20true%20}),
confirm the import progress is visible and that the content and media
indeed get imported:

![CleanShot 2024-12-08 at 14 54 49@2x](https://github.com/user-attachments/assets/a7da3244-a10f-43d2-8e94-43d305220a7e)
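The testing links above embed the whole Blueprint in the URL fragment, with spaces and double quotes percent-encoded. A hypothetical `blueprint_link` helper can produce such links from a Blueprint file; it collapses whitespace and escapes only `%`, `#`, spaces, and double quotes, which is enough for Blueprints like the ones above but is not a general URL encoder.

```bash
# Hypothetical helper: print a Playground link with a Blueprint embedded in
# the URL fragment. Collapses whitespace, then escapes only %, #, spaces and
# double quotes; this is not a general-purpose URL encoder.
blueprint_link() {
  local base_url="$1"
  local blueprint_file="$2"
  local fragment

  if [ ! -f "$blueprint_file" ]; then
    echo "error: '$blueprint_file' not found" >&2
    return 1
  fi

  # Escape % first so the %XX sequences added afterwards are not re-escaped.
  fragment="$(tr -s '[:space:]' ' ' < "$blueprint_file" |
    sed -e 's/[[:space:]]*$//' -e 's/%/%25/g' -e 's/#/%23/g' \
        -e 's/ /%20/g' -e 's/"/%22/g')"
  printf '%s#%s\n' "$base_url" "$fragment"
}
```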

## Related issues

* #1211 
* #2012 
* #1477 
* #1250 
* #1780