-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IBX-4906: Migrate richtext namespaces #67
Conversation
cea378c
to
1ab6c82
Compare
use Symfony\Component\Process\PhpExecutableFinder; | ||
use Symfony\Component\Process\Process; | ||
|
||
abstract class MultiprocessComand extends Command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have split up the abstraction of multi processing in a separate class. Something like this could be re-used in many scenarios and could maybe be added to core ?
Or I could merge this class into MigrateNamespacesCommand
* chunks that can be processed by a child processes. In order to do that it will maintain a cursor and call | ||
* createChildProcess() for each chunk. | ||
*/ | ||
abstract protected function iterate(): void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too happy about the naming here though...any better alternatives?
@@ -145,3 +145,10 @@ services: | |||
- '%ibexa.field_type.richtext.converter.edit.xhtml5.resources%' | |||
- '@ibexa.config.resolver' | |||
|
|||
Ibexa\Bundle\FieldTypeRichText\Command\MigrateNamespacesCommand: | |||
autowire: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any good reason for avoiding autowire and autoconfigure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any good reason for avoiding autowire and autoconfigure?
No, it simplifies the definition. Less lines less cost. However why do you place Command DIC in fieldtype_services.yaml
? It doesn't belong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, Symfony recommends not using autowire
and autoconfigure
due to the need to compute during container compilation (which does not affect prod, and dev when there no container changes). We currently do not follow this recommendation, because it does indeed remove a lot of our service definition declarations/code.
Just wanted to clarify.
'stop' => null, | ||
]; | ||
|
||
$contentAttributeIDs = $this->gateway->getContentObjectAttributeIds($cursor['start'], $limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since contentobject_attribute.id
is not unique, I found no better way than to fetch every single id
in the mother process as I wanted to avoid using offset
in the SQLs later on when iterating over the dataset
I don't think the failing checks can be blamed on me ? |
@ibexa/php-dev : please have a look |
1ab6c82
to
822d2e5
Compare
|
||
foreach ($contentAttributes as $contentAttribute) { | ||
$contentAttribute['data_text'] = str_replace('xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"', 'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"', $contentAttribute['data_text']); | ||
$contentAttribute['data_text'] = str_replace('xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"', 'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"', $contentAttribute['data_text']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change ezxhtml
and ezcustom
to ibexaxhtml
and ibexacustom
while we are at it?
That would maybe require some changes in editor and other places too though ?
and code style needs to be fixed here |
Co-authored-by: Adam Wójs <adam@wojs.pl>
$query = $this->connection->createQueryBuilder(); | ||
$query->select('count(distinct a.id)') | ||
->from(self::DB_TABLE_CONTENTOBJECT_ATTRIBUTE, 'a') | ||
->where( | ||
$query->expr()->eq( | ||
'a.data_type_string', | ||
':data_type' | ||
) | ||
) | ||
->setParameter(':data_type', self::FIELD_TYPE_IDENTIFIER); | ||
|
||
$statement = $query->execute(); | ||
|
||
return (int) $statement->fetchOne(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly are you trying to count here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counting number of rows in contentobject_attribute with unique id
s.
This is then used for for the progressbar so that it can show any usefull information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason why I want that number, and not the total number of rows is that I do not want to use the slow offset
when iterating over the table. Thus I generate a cursor based on id
and we then need to know how many different values of id
we are dealing with in order for the progressbar to have any value...
54a4767
to
0aa233f
Compare
return $this->cursorStart !== null || $this->cursorStop !== null; | ||
} | ||
|
||
public static function migrateNamespaces(string $xmlText) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the migration code here, and not in a converter because our converter interface expects a DOMDocument, and I don't want to waste time on converting back&forth to that, considering we might deal with millions of objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is already a static method, couldn't it be placed in the converter? I find it a bit awkward that the converter calls a method on a symfony command and not the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. it looks odd indeed. In your suggestion, we would call a public function on the converter which is not defined it's interface, so that is not perfect either....
I don't have a big preference for either options... @Steveb-p , any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason for keeping it as-is, is that the converter is meant to be temporary. We'll not support on-the-fly conversion of namespaces forever. The command script will be kept though...
I am done updating the migration script to support the other changes I found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To put it plain, mixing a command that launches subprocesses and performs an operation (using abstract and then extending a command) is not the correct design, because it effectively causes it mix two different responsibilities (making it harder to test, harder to extend, and so forth).
I assume we have good reason to move the execution to multiple subprocesses, and it has to be done this way to speed up the process?
Co-authored-by: Paweł Niedzielski <pawel.tadeusz.niedzielski@gmail.com>
I commented on this in #67 (comment) If you don't like this idea, the two classed can easily be squashed into on class only.
Yes, that is the main reason. Still with the multi process approach, this will take hours to run on huge databases. |
@Steveb-p long-running operation executing many fetch queries tends to leak memory because Doctrine extends PDOStatement. This issue should disappear once we migrate to doctrine/dbal 3x. The workaround we used so far is executing sub-processes so memory is released by the system when a forded process for an iteration finishes its work. I agree that the pattern we've used so far is a bit cumbersome and pollutes too much the actual business logic. I think we in PHP Team need to discuss possible alternatives and propose general refactoring. // cc @webhdx |
FYI : Added support for listing number of field attributes that was updated vs how many was not in 08f79a6 This adds some parsing of output from each subproccess. Could maybe be done a bit cleaner with regexp, but not faster ... |
1259e8a
to
5ce46b3
Compare
…:richtext-namespaces script
5ce46b3
to
08f79a6
Compare
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
"config": { | ||
"allow-plugins": { | ||
"*": false | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the case where we don't have any other allowed plugins, we can simply configure it as:
"config": { | |
"allow-plugins": { | |
"*": false | |
} | |
}, | |
"config": { | |
"allow-plugins": false | |
}, |
/** | ||
* @var string | ||
*/ | ||
private mixed $user; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is PHP 8.0 type. Why is it mixed but documented as string anyway? That also uncovers another issue - this command is never checked by any CI / automated tests.
string $name = null, | ||
PermissionResolver $permissionResolver, | ||
UserService $userService |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the correct way to build method argument list. If you have a default value for one argument, all consecutive ones need to have default values defined as well. Given the changes to the extending class I think $name
can be entirely dropped.
'processes', | ||
null, | ||
InputOption::VALUE_OPTIONAL, | ||
'Number of child processes to run in parallel for iterations, if set to "auto" it will set to number of CPU cores -1, set to "1" or "0" to disable [default: "auto"]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SonarLint: Split this 182 characters long line (which is greater than 120 authorized).
$xmlText = str_replace('xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"', 'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"', $xmlText); | ||
$xmlText = str_replace('xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"', 'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"', $xmlText); | ||
$xmlText = str_replace('ezxhtml:class="ez-embed-type-image"', 'ezxhtml:class="ibexa-embed-type-image"', $xmlText); | ||
$xmlText = str_replace('xmlns:ez="http://ez.no/xmlns/ezpublish/docbook"', 'xmlns:ez="http://ibexa.co/xmlns/ezpublish/docbook"', $xmlText); | ||
$xmlText = str_replace('xmlns:a="http://ez.no/xmlns/annotation"', 'xmlns:a="http://ibexa.co/xmlns/annotation"', $xmlText); | ||
$xmlText = str_replace('xmlns:m="http://ez.no/xmlns/module"', 'xmlns:m="http://ibexa.co/xmlns/module"', $xmlText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this mean that the entire essence of the changes is simple string replacement? Why don't we then execute it on the database directly? This should be 100x faster than pulling data into PHP process and update with new version. There are string replacement functions built-in DBMS-es.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.. I tested this on my db with 72958 richtext rows in ezcontentobject_attribute
cmd script with concurrency=10 takes : 53 seconds
sql update script : 15 seconds.
So it is 3.5x faster, unfortunately not 100x, but that is still significant. So I agree that the migration should be done by simple sql updates.
FYI : The updates I tested with was:
UPDATE ezcontentobject_attribute
SET
data_text =
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
data_text,
'xmlns:m="http://ez.no/xmlns/module"',
'xmlns:m="http://ibexa.co/xmlns/module"'
),
'xmlns:a="http://ez.no/xmlns/annotation"',
'xmlns:a="http://ibexa.co/xmlns/annotation"'
),
'xmlns:ez="http://ez.no/xmlns/ezpublish/docbook"',
'xmlns:ez="http://ibexa.co/xmlns/ezpublish/docbook"'
),
'ezxhtml:class="ez-embed-type-image"',
'ezxhtml:class="ibexa-embed-type-image"'
),
'xmlns:ezcustom="http://ez.no/xmlns/ezpublish/docbook/custom"',
'xmlns:ezcustom="http://ibexa.co/xmlns/dxp/docbook/custom"'
),
'xmlns:ezxhtml="http://ez.no/xmlns/ezpublish/docbook/xhtml"',
'xmlns:ezxhtml="http://ibexa.co/xmlns/dxp/docbook/xhtml"'
)
WHERE
data_type_string='ezrichtext';
`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct place for the sql script would be https://github.com/ibexa/installer/tree/main/upgrade/db, right ?
$contentAttributes = $this->gateway->getContentObjectAttributes($contentAttributeIdStart, $contentAttributeIdStop); | ||
|
||
foreach ($contentAttributes as $contentAttribute) { | ||
//$orgString = $contentAttribute['data_text']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't leave a dead code in the final solution.
if ($container->hasDefinition('ibexa.richtext.converter.edit.xhtml5')) { | ||
$html5EditConverterDefinition = $container->getDefinition('ibexa.richtext.converter.edit.xhtml5'); | ||
$taggedInputServiceIds = $container->findTaggedServiceIds( | ||
'ibexa.field_type.richtext.converter.edit.xhtml5' | ||
); | ||
$this->setConverterDefinitions($taggedInputServiceIds, $html5EditConverterDefinition); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The richtext->xhtml5edit transformation does not currently use the Aggregate converter (only one fixed converter was used), so I had to fix that in order to support on-the-fly conversion
$xml = $xmlDoc->saveXML(); | ||
$xml = MigrateNamespacesCommand::migrateNamespaces($xml); | ||
$xmlDoc->loadXML($xml); | ||
|
||
return $xmlDoc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm... No. Either command or on the fly. We're not gonna start supporting multiple approaches. Namespace migration should be one time operation. This impacts the performance of all consumers for the entire system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it will degrade performance for all customers, but the problem is also explained in other comments in this PR:
Upgrading from 3.x to 4.x will for customers with lots of content take hours ( that means hours of downtime, or at least hours where content moderation/editing is not possible).
Migrating the namespaces will add hours of additional downtime for such clients. Thus, it was requested to have the possibility to do this in two steps ( so two shorter periods of downtime instead of one very long period ).
If we refuse to support on-the-fly migration for a short period ( I am thinking one or two sub-releases ), upgrade process will be significant more painful for some customers.
That said, it might be an alternative to have on-the-fly conversions enabled/disabled in config, or making support for it in a separate bundle that can be installed optionally.
The only change needed in richtext bundle for making the latter pretty easy is to keep the Aggregate converter for ichtext->xhtml5edit which this PR already introduces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not going to accept the solution which degrades performance, while it could be avoided by deterministic one time change. The solution simply needs to work quickly through a simple query.
Upgrading from 3.x to 4.x will for customers with lots of content take hours ( that means hours of downtime, or at least hours where content moderation/editing is not possible).
Live upgrade script needs to be quick. It also should be tested on a fresh snapshot of production database beforehand.
If we refuse to support on-the-fly migration for a short period ( I am thinking one or two sub-releases ),
This is formally not possible due to BC. You cannot change the behavior of such complex Product back and forth within one major release.
Anyway, as @webhdx said we're going to take over this one, so don't worry about it 👍
@@ -145,3 +145,10 @@ services: | |||
- '%ibexa.field_type.richtext.converter.edit.xhtml5.resources%' | |||
- '@ibexa.config.resolver' | |||
|
|||
Ibexa\Bundle\FieldTypeRichText\Command\MigrateNamespacesCommand: | |||
autowire: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any good reason for avoiding autowire and autoconfigure?
No, it simplifies the definition. Less lines less cost. However why do you place Command DIC in fieldtype_services.yaml
? It doesn't belong here.
@vidarl PHPStan support has landed in
and check any errors that are reported. We usually take special care about any incorrect phpdoc declarations (as pointed out in the comments, there are a few). If there are any errors in the end that you do not expect to fix, then you can use |
@vidarl We decided to takie this PR over and finish it. @ciastektk will work on that. |
Closed due to #77 |
Fixes #63.
TODO:
$ composer fix-cs
).