Performance optimizations #18

Merged 1 commit on Dec 29, 2024

GromNaN (Contributor) commented Nov 28, 2024

  • Import native functions with use function so the compiler can optimize calls to some of them.
  • Use preg_match('/(.).*\1/', $alphabet) to check that the alphabet has no duplicate char; this is faster than splitting it into an array.
  • Avoid splitting strings; access chars directly with $string[$char].
  • Avoid functions that take a callback (array_reduce, array_filter); they are slower than foreach.
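
The duplicate-character check from the second bullet can be sketched as follows. This is a minimal comparison, not the PR's exact code: the function names hasDuplicateCharsSplit and hasDuplicateCharsRegex are hypothetical, and the split-based variant is only an assumed baseline for what the regex replaces.

```php
<?php

// Assumed baseline: split into bytes and compare against the unique set.
function hasDuplicateCharsSplit(string $alphabet): bool
{
    $chars = str_split($alphabet);

    return count(array_unique($chars)) !== count($chars);
}

// Regex from the PR description: capture one byte, then look for the same
// byte again anywhere later in the string. Note that "." does not match a
// newline unless the "s" modifier is added.
function hasDuplicateCharsRegex(string $alphabet): bool
{
    return preg_match('/(.).*\1/', $alphabet) === 1;
}

var_dump(hasDuplicateCharsRegex('abcdef')); // no repeated character
var_dump(hasDuplicateCharsRegex('abcdea')); // "a" appears twice
```

Both helpers work on bytes, so they agree for single-byte alphabets; the regex variant avoids allocating an intermediate array.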

Benchmark

PHPBench code
composer req --dev phpbench/phpbench

In phpbench.json

{
    "$schema": "./vendor/phpbench/phpbench/phpbench.schema.json",
    "runner.bootstrap": "vendor/autoload.php",
    "runner.file_pattern": "*Bench.php",
    "runner.path": "tests",
    "runner.iterations": 3
}

In tests/SqidsBench.php

<?php

namespace Sqids\Benchmark;

use PhpBench\Attributes\ParamProviders;
use PhpBench\Attributes\Revs;
use PhpBench\Attributes\Warmup;
use Sqids\Sqids;

#[Warmup(1)]
final class SqidsBench
{
    private const IDS = [
        'SvIzsqYMyQwI3GWgJAe17URxX8V924Co0DaTZLtFjHriEn5bPhcSkfmvOslpBu' => [0, 0],
        'n3qafPOLKdfHpuNw3M61r95svbeJGk7aAEgYn4WlSjXURmF8IDqZBy0CT2VxQc' => [0, 1],
        'tryFJbWcFMiYPg8sASm51uIV93GXTnvRzyfLleh06CpodJD42B7OraKtkQNxUZ' => [0, 2],
        'eg6ql0A3XmvPoCzMlB6DraNGcWSIy5VR8iYup2Qk4tjZFKe1hbwfgHdUTsnLqE' => [0, 3],
        'rSCFlp0rB2inEljaRdxKt7FkIbODSf8wYgTsZM1HL9JzN35cyoqueUvVWCm4hX' => [0, 4],
        'sR8xjC8WQkOwo74PnglH1YFdTI0eaf56RGVSitzbjuZ3shNUXBrqLxEJyAmKv2' => [0, 5],
        'uY2MYFqCLpgx5XQcjdtZK286AwWV7IBGEfuS9yTmbJvkzoUPeYRHr4iDs3naN0' => [0, 6],
        '74dID7X28VLQhBlnGmjZrec5wTA1fqpWtK4YkaoEIM9SRNiC3gUJH0OFvsPDdy' => [0, 7],
        '30WXpesPhgKiEI5RHTY7xbB1GnytJvXOl2p0AcUjdF6waZDo9Qk8VLzMuWrqCS' => [0, 8],
        'moxr3HqLAK0GsTND6jowfZz3SUx7cQ8aC54Pl1RbIvFXmEJuBMYVeW9yrdOtin' => [0, 9],
        'JSwXFaosANEOuLlYb3jHCBpeSzx7cPRrgf1dNTZqE4nDytU09isA5ahm6kKGvM' => [1_000_000, 2_000_000],
    ];

    #[Revs(1_000)]
    #[ParamProviders('provideSqids')]
    public static function benchEncodeDecode(array $params): void
    {
        foreach (self::IDS as $id => $numbers) {
            $params[0]->encode($numbers);
            $params[0]->decode($id);
        }
    }

    public static function provideSqids(): \Generator
    {
        yield 'default' => [
            new Sqids()
        ];
        yield 'custom blocklist' => [
            new Sqids(blocklist: [
                'JSwXFaosAN',
                'OCjV9JK64o',
                'rBHf',
                '79SM',
                '7tE6',
            ])
        ];
        yield 'no blocklist' => [
            new Sqids(blocklist: [])
        ];
    }
}

Before

    benchEncodeDecode # default.............I2 - Mo2.195ms (±0.47%)
    benchEncodeDecode # custom blocklist....I2 - Mo1.099ms (±0.84%)
    benchEncodeDecode # no blocklist........I2 - Mo929.423μs (±0.87%)

After

    benchEncodeDecode # default.............I2 - Mo994.363μs (±0.35%)
    benchEncodeDecode # custom blocklist....I2 - Mo681.353μs (±0.36%)
    benchEncodeDecode # no blocklist........I2 - Mo552.545μs (±0.21%)
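
Read as ratios, the mode times above correspond to roughly 2.2x (default), 1.6x (custom blocklist) and 1.7x (no blocklist). A small sketch of that arithmetic, with the speedup helper and the $results array being illustrative names only:

```php
<?php

// Mode times copied from the PHPBench output above, in microseconds.
$results = [
    'default'          => ['before' => 2195.0,  'after' => 994.363],
    'custom blocklist' => ['before' => 1099.0,  'after' => 681.353],
    'no blocklist'     => ['before' => 929.423, 'after' => 552.545],
];

// Ratio of the old time to the new time; values above 1 mean "faster".
function speedup(float $before, float $after): float
{
    return $before / $after;
}

foreach ($results as $name => $r) {
    printf("%-17s %.2fx faster\n", $name, speedup($r['before'], $r['after']));
}
```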

vinkla (Collaborator) commented Nov 29, 2024

This looks good to me. I'll leave it to @4kimov, who has deeper knowledge across our language implementations, to merge this.

src/Sqids.php (outdated)

GromNaN (Contributor, Author) left a comment

I can understand that it's a big PR and that you might have doubts about the merge. I've added comments to explain the changes. Given the performance gain, I think it's important to look into it.

throw new InvalidArgumentException('Alphabet must contain unique characters');
}

$minLengthLimit = 255;
if (
!is_int($minLength) ||

The type is already enforced by the argument's type declaration.

Comment on lines -658 to -662
$inRangeNumbers = array_filter($numbers, fn($n) => $n >= 0 && $n <= self::maxValue());
if (count($inRangeNumbers) != count($numbers)) {
throw new InvalidArgumentException(
'Encoding supports numbers between 0 and ' . self::maxValue(),
);

Instead of creating a new array, the exception is thrown directly when there is an invalid number.
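
A minimal sketch of that early-exit validation, assuming a standalone helper: assertNumbersInRange is a hypothetical name, and its $max parameter stands in for self::maxValue() from the removed lines.

```php
<?php

// $max stands in for self::maxValue() from the original code (assumption).
function assertNumbersInRange(array $numbers, int $max = PHP_INT_MAX): void
{
    foreach ($numbers as $n) {
        if ($n < 0 || $n > $max) {
            // Stop at the first offending value; no filtered copy is built.
            throw new InvalidArgumentException(
                'Encoding supports numbers between 0 and ' . $max
            );
        }
    }
}

assertNumbersInRange([0, 1, 2]); // passes silently
```

Compared with array_filter plus count, this avoids both the callback invocation per element and the temporary array.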

Comment on lines -739 to -743
$alphabetChars = str_split($this->alphabet);
foreach (str_split($id) as $c) {
if (!in_array($c, $alphabetChars)) {
return $ret;
}

This split operation is replaced by a more efficient regex.
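
The replacement regex is not quoted in this thread, so the sketch below does not try to reproduce it. It only illustrates the general idea of validating an ID without splitting it, using strspn as a stand-in technique; both function names are hypothetical.

```php
<?php

// Old approach from the removed lines: split both strings, then search.
function idUsesAlphabetSplit(string $id, string $alphabet): bool
{
    $alphabetChars = str_split($alphabet);
    foreach (str_split($id) as $c) {
        if (!in_array($c, $alphabetChars, true)) {
            return false;
        }
    }

    return true;
}

// Stand-in (not the PR's regex): strspn() returns the length of the leading
// run of $id made only of bytes from $alphabet, so the ID is valid when that
// run covers the whole string.
function idUsesAlphabetStrspn(string $id, string $alphabet): bool
{
    return strspn($id, $alphabet) === strlen($id);
}

var_dump(idUsesAlphabetStrspn('abc', 'abcdef')); // every byte is in the alphabet
var_dump(idUsesAlphabetStrspn('abz', 'abcdef')); // "z" is not in the alphabet
```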

Comment on lines +772 to +774
for ($i = 0, $j = strlen($alphabet) - 1; $j > 0; $i++, $j--) {
$r = ($i * $j + ord($alphabet[$i]) + ord($alphabet[$j])) % strlen($alphabet);
[$alphabet[$i], $alphabet[$r]] = [$alphabet[$r], $alphabet[$i]];

Manipulation of individual characters of the string, instead of using an array of chars.
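
For reference, the loop above can be run on its own as in this sketch. The wrapper name shuffleAlphabet is hypothetical, and the swap is written with a temporary variable rather than the destructuring assignment from the diff; the index arithmetic is unchanged.

```php
<?php

// Same loop as the diff above, wrapped in a standalone function.
function shuffleAlphabet(string $alphabet): string
{
    $length = strlen($alphabet);
    for ($i = 0, $j = $length - 1; $j > 0; $i++, $j--) {
        $r = ($i * $j + ord($alphabet[$i]) + ord($alphabet[$j])) % $length;
        // Writing to a string offset replaces a single byte in place,
        // so no array of chars is needed.
        $tmp = $alphabet[$i];
        $alphabet[$i] = $alphabet[$r];
        $alphabet[$r] = $tmp;
    }

    return $alphabet;
}

$shuffled = shuffleAlphabet('abcdefghij');
var_dump($shuffled); // a deterministic permutation of the input
```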

$id = [];
$chars = str_split($alphabet);

$result = $num;

Reuse and modify the variable $num instead of creating a new one.

array_unshift($id, $chars[$this->math->intval($this->math->mod($result, count($chars)))]);
$result = $this->math->divide($result, count($chars));
} while ($this->math->greaterThan($result, 0));
$id = $alphabet[$this->math->intval($this->math->mod($num, strlen($alphabet)))] . $id;

Prepending to the string is the same as using array_unshift on an array.
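
The equivalence can be checked with a simplified sketch. It assumes native integers (% and intdiv) in place of the $this->math helper used in the real code, and the function names toIdWithArray and toIdWithString are hypothetical.

```php
<?php

// Array-based variant, mirroring the removed lines.
function toIdWithArray(int $num, string $alphabet): string
{
    $chars = str_split($alphabet);
    $id = [];
    do {
        array_unshift($id, $chars[$num % count($chars)]);
        $num = intdiv($num, count($chars));
    } while ($num > 0);

    return implode('', $id);
}

// String-based variant: each new digit is put in front of the existing ID,
// and $num is reused instead of a separate $result variable.
function toIdWithString(int $num, string $alphabet): string
{
    $id = '';
    $length = strlen($alphabet);
    do {
        $id = $alphabet[$num % $length] . $id;
        $num = intdiv($num, $length);
    } while ($num > 0);

    return $id;
}

var_dump(toIdWithString(255, '0123456789abcdef')); // base-16 digits of 255
```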

4kimov (Member) commented Dec 22, 2024

Well, to say that these optimizations are welcome would be an understatement. Thank you for taking the time!

A few basic questions as I look at this:

  1. Are there any breaking changes in this PR or in #17 (Optimize performance of blocklist filtering and checking by using Regex)?
  2. With new regex expressions, is there any character that a user might use in the alphabet to mess up regex matching?

GromNaN (Contributor, Author) commented Dec 22, 2024

> 1. Are there any breaking changes in this PR or [Optimize performance of blocklist filtering and checking by using Regex #17](https://github.com/sqids/sqids-php/pull/17)?

After multiple reviews, I don't see any breaking change in this PR.

In #17, there is a negligible change involving the $blocklist property, and possibly a difference for anyone who customized the blocklist without adding all the same leet variations of the words.

> 2. With new regex expressions, is there any character that a user might use in the alphabet to mess up regex matching?

Regexes operate on bytes: a Unicode character is split into its bytes and not treated as a single character. This was already the case with str_split, so Unicode was already unsupported.
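
A short demonstration of that byte-level behavior, assuming UTF-8 encoding. The "u" modifier line is included only for contrast; nothing in this thread says the library uses it.

```php
<?php

$char = "\u{00E9}"; // "é", encoded as two bytes in UTF-8

var_dump(strlen($char));               // counts bytes, not characters
var_dump(count(str_split($char)));     // one array element per byte
var_dump(preg_match('/^.$/', $char));  // byte mode: "." is a single byte
var_dump(preg_match('/^.$/u', $char)); // "u" modifier: "." is a code point
```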

There is no other restriction on the characters accepted by the alphabet. The alphabet is not used as part of the regex, and if it were, I would have used preg_quote to escape special chars. I escaped the blocklist words for this purpose.
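
The blocklist regex from #17 is not shown in this thread, so the following is only a sketch of how preg_quote keeps user-supplied words literal. The helper name buildBlocklistPattern, the alternation shape and the "i" flag are all assumptions.

```php
<?php

// Hypothetical helper: the pattern shape and the "i" flag are assumptions.
function buildBlocklistPattern(array $words): ?string
{
    if ($words === []) {
        return null; // an empty alternation would match every string
    }

    // Escape regex metacharacters and the "/" delimiter in each word.
    $quoted = array_map(
        static fn (string $word): string => preg_quote($word, '/'),
        $words
    );

    return '/' . implode('|', $quoted) . '/i';
}

$pattern = buildBlocklistPattern(['a.b', 'x+y']);
var_dump(preg_match($pattern, 'zzA.Bzz')); // contains "a.b" literally
var_dump(preg_match($pattern, 'zzaXbzz')); // "." is escaped, so no wildcard
```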

4kimov merged commit 9390a85 into sqids:main on Dec 29, 2024
9 checks passed
4kimov (Member) commented Dec 29, 2024

Thanks @GromNaN, great work!
