Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IBX-4906: Migrate richtext namespaces #67

Closed
wants to merge 23 commits into from
Closed

Conversation

vidarl
Copy link
Contributor

@vidarl vidarl commented Jan 30, 2023

Question Answer
JIRA issue IBX-4906
Bug/Improvement yes
New feature
Target version 4.3
BC breaks yes
Tests pass yes/no
Doc needed yes/no

Fixes #63.

TODO:

  • Implement feature / fix a bug.
  • Implement tests.
  • Fix new code according to Coding Standards ($ composer fix-cs).
  • Ask for Code Review.

@vidarl vidarl changed the title Ibx 4906 migrate namespaces IBX-4906 migrate richtext namespaces Jan 30, 2023
@vidarl vidarl force-pushed the ibx-4906-migrate-namespaces branch from cea378c to 1ab6c82 Compare January 30, 2023 10:44
use Symfony\Component\Process\PhpExecutableFinder;
use Symfony\Component\Process\Process;

abstract class MultiprocessComand extends Command
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have split up the abstraction of multi processing in a separate class. Something like this could be re-used in many scenarios and could maybe be added to core ?

Or I could merge this class into MigrateNamespacesCommand

* chunks that can be processed by a child processes. In order to do that it will maintain a cursor and call
* createChildProcess() for each chunk.
*/
abstract protected function iterate(): void;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too happy about the naming here though...any better alternatives?

@@ -145,3 +145,10 @@ services:
- '%ibexa.field_type.richtext.converter.edit.xhtml5.resources%'
- '@ibexa.config.resolver'

Ibexa\Bundle\FieldTypeRichText\Command\MigrateNamespacesCommand:
autowire: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good reason for avoiding autowire and autoconfigure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good reason for avoiding autowire and autoconfigure?

No, it simplifies the definition. Less lines less cost. However why do you place Command DIC in fieldtype_services.yaml? It doesn't belong here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, Symfony recommends not using autowire and autoconfigure due to the need to compute during container compilation (which does not affect prod, and dev when there no container changes). We currently do not follow this recommendation, because it does indeed remove a lot of our service definition declarations/code.

Just wanted to clarify.

'stop' => null,
];

$contentAttributeIDs = $this->gateway->getContentObjectAttributeIds($cursor['start'], $limit);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since contentobject_attribute.id is not unique, I found no better way than to fetch every single id in the mother process as I wanted to avoid using offset in the SQLs later on when iterating over the dataset

@vidarl
Copy link
Contributor Author

vidarl commented Jan 30, 2023

I don't think the failing checks can be blamed on me ?

@vidarl
Copy link
Contributor Author

vidarl commented Jan 30, 2023

@ibexa/php-dev : please have a look

@vidarl vidarl force-pushed the ibx-4906-migrate-namespaces branch from 1ab6c82 to 822d2e5 Compare January 30, 2023 11:13

foreach ($contentAttributes as $contentAttribute) {
$contentAttribute['data_text'] = str_replace('xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"', 'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"', $contentAttribute['data_text']);
$contentAttribute['data_text'] = str_replace('xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"', 'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"', $contentAttribute['data_text']);
Copy link
Contributor Author

@vidarl vidarl Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change ezxhtml and ezcustom to ibexaxhtml and ibexacustom while we are at it?

That would maybe require some changes in editor and other places too though ?

composer.json Outdated Show resolved Hide resolved
@alongosz alongosz dismissed their stale review January 30, 2023 12:21

Fixed the issue

@adamwojs adamwojs changed the title IBX-4906 migrate richtext namespaces IBX-4906: Migrate richtext namespaces Jan 30, 2023
src/bundle/Command/MultiprocessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/MultiprocessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/MultiprocessComand.php Outdated Show resolved Hide resolved
src/bundle/Resources/config/fieldtype_services.yaml Outdated Show resolved Hide resolved
@adamwojs
Copy link
Member

adamwojs commented Jan 30, 2023

and code style needs to be fixed here

Comment on lines +30 to +43
$query = $this->connection->createQueryBuilder();
$query->select('count(distinct a.id)')
->from(self::DB_TABLE_CONTENTOBJECT_ATTRIBUTE, 'a')
->where(
$query->expr()->eq(
'a.data_type_string',
':data_type'
)
)
->setParameter(':data_type', self::FIELD_TYPE_IDENTIFIER);

$statement = $query->execute();

return (int) $statement->fetchOne();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly are you trying to count here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counting number of rows in contentobject_attribute with unique ids.

This is then used for for the progressbar so that it can show any usefull information

Copy link
Contributor Author

@vidarl vidarl Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason why I want that number, and not the total number of rows is that I do not want to use the slow offset when iterating over the table. Thus I generate a cursor based on id and we then need to know how many different values of id we are dealing with in order for the progressbar to have any value...

@vidarl vidarl force-pushed the ibx-4906-migrate-namespaces branch from 54a4767 to 0aa233f Compare February 13, 2023 12:56
return $this->cursorStart !== null || $this->cursorStop !== null;
}

public static function migrateNamespaces(string $xmlText)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the migration code here, and not in a converter because our converter interface expects a DOMDocument, and I don't want to waste time on converting back&forth to that, considering we might deal with millions of objects.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is already a static method, couldn't it be placed in the converter? I find it a bit awkward that the converter calls a method on a symfony command and not the other way around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. it looks odd indeed. In your suggestion, we would call a public function on the converter which is not defined it's interface, so that is not perfect either....

I don't have a big preference for either options... @Steveb-p , any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason for keeping it as-is, is that the converter is meant to be temporary. We'll not support on-the-fly conversion of namespaces forever. The command script will be kept though...

@vidarl
Copy link
Contributor Author

vidarl commented Feb 13, 2023

FYI : There seems to be other changes in the markup in 3.x vs 4.x as well. So migration script needs further changes as well : https://issues.ibexa.co/browse/IBX-5056

I am done updating the migration script to support the other changes I found

@vidarl vidarl requested review from adamwojs and alongosz February 13, 2023 14:01
@webhdx webhdx requested a review from a team February 17, 2023 08:57
Copy link
Contributor

@Steveb-p Steveb-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it plain, mixing a command that launches subprocesses and performs an operation (using abstract and then extending a command) is not the correct design, because it effectively causes it mix two different responsibilities (making it harder to test, harder to extend, and so forth).

I assume we have good reason to move the execution to multiple subprocesses, and it has to be done this way to speed up the process?

composer.json Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/AbstractMultiProcessComand.php Outdated Show resolved Hide resolved
src/bundle/Command/MigrateNamespacesCommand.php Outdated Show resolved Hide resolved
vidarl and others added 2 commits February 17, 2023 12:57
Co-authored-by: Paweł Niedzielski <pawel.tadeusz.niedzielski@gmail.com>
@vidarl
Copy link
Contributor Author

vidarl commented Feb 17, 2023

To put it plain, mixing a command that launches subprocesses and performs an operation (using abstract and then extending a command) is not the correct design, because it effectively causes it mix two different responsibilities (making it harder to test, harder to extend, and so forth).

I commented on this in #67 (comment)
In general, I think we should use multi processing more often when we are doing larger batch jobs. It helps for performance and memory footprint. Therefore I made it abstract so it can be reused. However, as mentioned in my comment, I think the abstract would be better suited in core or something. I asked for input on that.

If you don't like this idea, the two classed can easily be squashed into on class only.

I assume we have good reason to move the execution to multiple subprocesses, and it has to be done this way to speed up the process?

Yes, that is the main reason. Still with the multi process approach, this will take hours to run on huge databases.

@alongosz
Copy link
Member

@Steveb-p long-running operation executing many fetch queries tends to leak memory because Doctrine extends PDOStatement. This issue should disappear once we migrate to doctrine/dbal 3x. The workaround we used so far is executing sub-processes so memory is released by the system when a forded process for an iteration finishes its work. I agree that the pattern we've used so far is a bit cumbersome and pollutes too much the actual business logic. I think we in PHP Team need to discuss possible alternatives and propose general refactoring. // cc @webhdx

@vidarl
Copy link
Contributor Author

vidarl commented Feb 17, 2023

FYI : Added support for listing number of field attributes that was updated vs how many was not in 08f79a6

This adds some parsing of output from each subproccess. Could maybe be done a bit cleaner with regexp, but not faster ...

@vidarl vidarl force-pushed the ibx-4906-migrate-namespaces branch from 1259e8a to 5ce46b3 Compare February 17, 2023 14:17
@vidarl vidarl force-pushed the ibx-4906-migrate-namespaces branch from 5ce46b3 to 08f79a6 Compare February 17, 2023 14:21
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 37 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Comment on lines +60 to +64
"config": {
"allow-plugins": {
"*": false
}
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case where we don't have any other allowed plugins, we can simply configure it as:

Suggested change
"config": {
"allow-plugins": {
"*": false
}
},
"config": {
"allow-plugins": false
},

composer.json Outdated Show resolved Hide resolved
/**
* @var string
*/
private mixed $user;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is PHP 8.0 type. Why is it mixed but documented as string anyway? That also uncovers another issue - this command is never checked by any CI / automated tests.

Comment on lines +76 to +78
string $name = null,
PermissionResolver $permissionResolver,
UserService $userService
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct way to build method argument list. If you have a default value for one argument, all consecutive ones need to have default values defined as well. Given the changes to the extending class I think $name can be entirely dropped.

'processes',
null,
InputOption::VALUE_OPTIONAL,
'Number of child processes to run in parallel for iterations, if set to "auto" it will set to number of CPU cores -1, set to "1" or "0" to disable [default: "auto"]',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SonarLint: Split this 182 characters long line (which is greater than 120 authorized).

Comment on lines +178 to +183
$xmlText = str_replace('xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"', 'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"', $xmlText);
$xmlText = str_replace('xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"', 'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"', $xmlText);
$xmlText = str_replace('ezxhtml:class="ez-embed-type-image"', 'ezxhtml:class="ibexa-embed-type-image"', $xmlText);
$xmlText = str_replace('xmlns:ez="http://ez.no/xmlns/ezpublish/docbook"', 'xmlns:ez="http://ibexa.co/xmlns/ezpublish/docbook"', $xmlText);
$xmlText = str_replace('xmlns:a="http://ez.no/xmlns/annotation"', 'xmlns:a="http://ibexa.co/xmlns/annotation"', $xmlText);
$xmlText = str_replace('xmlns:m="http://ez.no/xmlns/module"', 'xmlns:m="http://ibexa.co/xmlns/module"', $xmlText);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that the entire essence of the changes is simple string replacement? Why don't we then execute it on the database directly? This should be 100x faster than pulling data into PHP process and update with new version. There are string replacement functions built-in DBMS-es.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.. I tested this on my db with 72958 richtext rows in ezcontentobject_attribute

cmd script with concurrency=10 takes : 53 seconds
sql update script : 15 seconds.

So it is 3.5x faster, unfortunately not 100x, but that is still significant. So I agree that the migration should be done by simple sql updates.

FYI : The updates I tested with was:

UPDATE ezcontentobject_attribute
SET
    data_text =
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                data_text,
                                'xmlns:m="http://ez.no/xmlns/module"',
                                'xmlns:m="http://ibexa.co/xmlns/module"'
                            ),
                            'xmlns:a="http://ez.no/xmlns/annotation"',
                            'xmlns:a="http://ibexa.co/xmlns/annotation"'
                        ),
                        'xmlns:ez="http://ez.no/xmlns/ezpublish/docbook"',
                        'xmlns:ez="http://ibexa.co/xmlns/ezpublish/docbook"'
                    ),
                    'ezxhtml:class="ez-embed-type-image"',
                    'ezxhtml:class="ibexa-embed-type-image"'
                ),
                'xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"',
                'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"'
            ),
            'xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"',
            'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"'
        )
WHERE
        data_type_string='ezrichtext';

`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct place for the sql script would be https://github.com/ibexa/installer/tree/main/upgrade/db, right ?

$contentAttributes = $this->gateway->getContentObjectAttributes($contentAttributeIdStart, $contentAttributeIdStop);

foreach ($contentAttributes as $contentAttribute) {
//$orgString = $contentAttribute['data_text'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't leave a dead code in the final solution.

Comment on lines +41 to +47
if ($container->hasDefinition('ibexa.richtext.converter.edit.xhtml5')) {
$html5EditConverterDefinition = $container->getDefinition('ibexa.richtext.converter.edit.xhtml5');
$taggedInputServiceIds = $container->findTaggedServiceIds(
'ibexa.field_type.richtext.converter.edit.xhtml5'
);
$this->setConverterDefinitions($taggedInputServiceIds, $html5EditConverterDefinition);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The richtext->xhtml5edit transformation does not currently use the Aggregate converter (only one fixed converter was used), so I had to fix that in order to support on-the-fly conversion

Comment on lines +19 to +23
$xml = $xmlDoc->saveXML();
$xml = MigrateNamespacesCommand::migrateNamespaces($xml);
$xmlDoc->loadXML($xml);

return $xmlDoc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm... No. Either command or on the fly. We're not gonna start supporting multiple approaches. Namespace migration should be one time operation. This impacts the performance of all consumers for the entire system.

Copy link
Contributor Author

@vidarl vidarl Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it will degrade performance for all customers, but the problem is also explained in other comments in this PR:
Upgrading from 3.x to 4.x will for customers with lots of content take hours ( that means hours of downtime, or at least hours where content moderation/editing is not possible).

Migrating the namespaces will add hours of additional downtime for such clients. Thus, it was requested to have the possibility to do this in two steps ( so two shorter periods of downtime instead of one very long period ).

If we refuse to support on-the-fly migration for a short period ( I am thinking one or two sub-releases ), upgrade process will be significant more painful for some customers.

That said, it might be an alternative to have on-the-fly conversions enabled/disabled in config, or making support for it in a separate bundle that can be installed optionally.
The only change needed in richtext bundle for making the latter pretty easy is to keep the Aggregate converter for ichtext->xhtml5edit which this PR already introduces

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not going to accept the solution which degrades performance, while it could be avoided by deterministic one time change. The solution simply needs to work quickly through a simple query.

Upgrading from 3.x to 4.x will for customers with lots of content take hours ( that means hours of downtime, or at least hours where content moderation/editing is not possible).

Live upgrade script needs to be quick. It also should be tested on a fresh snapshot of production database beforehand.

If we refuse to support on-the-fly migration for a short period ( I am thinking one or two sub-releases ),

This is formally not possible due to BC. You cannot change the behavior of such complex Product back and forth within one major release.

Anyway, as @webhdx said we're going to take over this one, so don't worry about it 👍

@@ -145,3 +145,10 @@ services:
- '%ibexa.field_type.richtext.converter.edit.xhtml5.resources%'
- '@ibexa.config.resolver'

Ibexa\Bundle\FieldTypeRichText\Command\MigrateNamespacesCommand:
autowire: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good reason for avoiding autowire and autoconfigure?

No, it simplifies the definition. Less lines less cost. However why do you place Command DIC in fieldtype_services.yaml? It doesn't belong here.

@Steveb-p
Copy link
Contributor

@vidarl PHPStan support has landed in 4.3 version. Please rebase the PR and run:

composer phpstan

and check any errors that are reported.

We usually take special care about any incorrect phpdoc declarations (as pointed out in the comments, there are a few). If there are any errors in the end that you do not expect to fix, then you can use composer phpstan -- --generate-baseline to regenerate the "baseline" file (note the -- separating composer options from phpstan options).

@webhdx
Copy link
Contributor

webhdx commented Mar 1, 2023

@vidarl We decided to takie this PR over and finish it. @ciastektk will work on that.

@vidarl
Copy link
Contributor Author

vidarl commented Mar 3, 2023

Closed due to #77

@vidarl vidarl closed this Mar 3, 2023
@alongosz alongosz deleted the ibx-4906-migrate-namespaces branch July 4, 2023 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants