Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDO quote bottleneck #13440

Closed
staabm opened this issue Feb 20, 2024 · 7 comments
Closed

PDO quote bottleneck #13440

staabm opened this issue Feb 20, 2024 · 7 comments

Comments

@staabm
Copy link
Contributor

staabm commented Feb 20, 2024

Description

running https://github.com/druidfi/mysqldump-php and the following code:

<?php

error_reporting(E_ALL);

require_once __DIR__ . '/vendor/autoload.php';

use Druidfi\Mysqldump as IMysqldump;

$date = date('Ymd');
$dumpSettings = array(
    'no-data' => false,
    'add-drop-table' => true,
    'single-transaction' => true,
    'lock-tables' => false,
    'add-locks' => true,
    'extended-insert' => true,
    'disable-foreign-keys-check' => true,
    'skip-triggers' => false,
    'add-drop-trigger' => true,
    'databases' => true,
    'add-drop-database' => true,
    'hex-blob' => true,
    'include-tables' => array('my-table'), // add a table name here
);

$dump = new IMysqldump\Mysqldump("mysql:host=$hostname;dbname=$db", $username, $password, $dumpSettings); // add DB credentials
$dump->start("backup$date.sql");

Results in a blackfire profile graph which shows a bottleneck in calling PDO::quote:
grafik
I am using a db-table which contains 1.3 millions rows and contains 16 columns (some varchar, some int)

I guess its kind of expected when running a php tool to dump a db table in which we need to escape every column value which is stringy that we would bottleneck on the escaping function.

I am wondering whether there is some low hanging fruits in PDO::quote regarding use-cases which call the function excessively.

repro steps

PHP Version

PHP 8.1.27

Operating System

ubuntu 22 lts

@Girgias
Copy link
Member

Girgias commented Feb 20, 2024

For MySQL specifically, it might be possibly to optimize the quoter function provided by the driver: mysql_handle_quoter it does seem to be doing some reallocations.

@nielsdos
Copy link
Member

It looks deceivingly simple, I'll give it a shot tonight.

@nielsdos nielsdos self-assigned this Feb 20, 2024
@nielsdos
Copy link
Member

deceivingly simple

These were the right words.

I played with it a bit and found a couple of interesting things. I have to do verifications and some cleanup though.

  • The reallocation is not the bottleneck and has only a tiny impact on performance. Nevertheless it's a good idea to avoid it.
  • The indirect calls in mysqlnd_cset_escape_slashes are expensive. If I'm not mistaken, we can at least skip the indirect call for cset->mb_valid when the character is in the ASCII range (but I'd have to double check that). That already improves performance quite a bit.
  • The condition char_maxlen > 1 && cset->mb_charlen(*escapestr) > 1 looks to me like it can never be true, because we can only execute that if cset->mb_valid(escapestr, end) failed which would mean it's not valid multibyte to begin with... I'd have to double check this but I think this check is dead (codecov also has no coverage of that branch which gives a bit of a hint).
  • It's possible to do the character writes in bulk to newstr by delaying writes until we have to escape. I use a similar technique in my HTML 5 serialization code. This has only a small performance benefit, or at least smaller than I hoped.

That being said, it seems the majority of the time in the sample script is spent doing IO (as expected).
When testing PDO::quote in isolation I got a performance boost with my experimental code of about 2.3x.

@staabm
Copy link
Contributor Author

staabm commented Feb 21, 2024

Thanks for looking into it. 2x faster would be awesome, as the script will be used for multi GB tables in big databases.

Regarding IO and other bottlenecks: I am already working on that

@kamil-tekiela
Copy link
Member

Is your PHP using mysqlnd or libmysql?

@staabm
Copy link
Contributor Author

staabm commented Feb 22, 2024

mysqlnd 8.1.27

grafik

@nielsdos
Copy link
Member

nielsdos commented Mar 7, 2024

Didn't get auto-closed, but this is fixed by the commit series ending in 3ddd341.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants