Support an asterisk in bothify and optimize #701

nineinchnick · 2015-09-13T12:28:39Z

Base::bothify() now replaces an asterisk ('*') with either a random number or a random letter.

I also replaced preg_replace_callback() in numerify() and lexify(), because it is slower than iterating over the range of characters bounded by strpos() and strrpos().

Benchmarks are available here: https://gist.github.com/nineinchnick/84895a560910e3ead9cc

The biggest question is whether we should worry about UTF-8. I guess in the worst case it could replace the wrong byte. There are a few SO questions on how to iterate over UTF-8 strings efficiently.

Should the new helper method Base::replaceWildcard be public?
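
For readers following along, the strpos/strrpos approach described above could look roughly like the sketch below. The function name mirrors the helper mentioned in this description, but the signature and body are assumptions made for illustration, not the code from the patch. It also assumes the callback returns exactly one byte.

```php
<?php
// Hypothetical sketch of a byte-wise wildcard replacement; not the merged code.
function replaceWildcard($string, $wildcard, callable $callback)
{
    // Locate the first occurrence; if there is none, nothing needs replacing.
    $first = strpos($string, $wildcard);
    if ($first === false) {
        return $string;
    }
    // Only the bytes between the first and last occurrence are inspected.
    $last = strrpos($string, $wildcard);
    for ($i = $first; $i <= $last; $i++) {
        if ($string[$i] === $wildcard) {
            // Assumes the callback returns a single-byte string.
            $string[$i] = call_user_func($callback);
        }
    }
    return $string;
}

// Usage: replace every '#' with a random digit.
echo replaceWildcard('ID-###', '#', function () {
    return (string) mt_rand(0, 9);
}), PHP_EOL;
```

The loop never touches bytes outside the first and last placeholder, which is where the speed-up over a full regular-expression pass would come from.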

fzaninotto · 2015-09-14T07:00:11Z

src/Faker/Provider/Base.php

     *
     * @param  string $string String that needs to bet parsed
     * @return string
     */
    public static function bothify($string = '## ??')
    {
+        $string = self::replaceWildcard($string, '*', function () {
+            return mt_rand(0, 1) ? '#' : '?';


Why don't you pass randomDigit() or randomAscii() directly here? That would save one function call for each placeholder.

How does it save a function call? For randomLetter(), it will be called once anyway and randomDigit() is not called at all, there's an optimization already that gets many digits from a big number.

fzaninotto · 2015-09-14T07:01:06Z

Hi, thanks for the patch!

Being UTF-8-safe also explains why the current version may be slower. Although faster, your method drops an important ability (UTF-8 support), which is a blocker. I think preg_replace_callback() with the /u modifier is a must.

nineinchnick · 2015-09-14T07:23:40Z

I've been thinking about it. We can still use strpos to extract a shorter string to pass to preg_replace_callback(). It also allows skipping the call completely in bothify if there are no '#' or '?'.

I'll benchmark that version and the one with custom utf-8 string iteration.
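
A minimal sketch of that idea, assuming a hypothetical helper name (replaceInRange) and signature that do not come from the patch: strpos() and strrpos() narrow the string down to the slice that contains placeholders, and only that slice is passed to preg_replace_callback() with the /u modifier.

```php
<?php
// Hypothetical helper; the name and signature are illustrative assumptions.
function replaceInRange($string, $wildcard, callable $callback)
{
    $first = strpos($string, $wildcard);
    if ($first === false) {
        // No placeholder at all: skip the regular-expression call completely.
        return $string;
    }
    $last = strrpos($string, $wildcard);

    // The slice starts and ends on an ASCII placeholder, so it does not cut
    // a multi-byte character in half.
    $middle = substr($string, $first, $last - $first + 1);
    $middle = preg_replace_callback(
        '/' . preg_quote($wildcard, '/') . '/u',
        function () use ($callback) {
            return call_user_func($callback);
        },
        $middle
    );

    // Reassemble: untouched prefix, processed slice, untouched suffix.
    return substr($string, 0, $first) . $middle . substr($string, $last + 1);
}

echo replaceInRange('żółw #-# ąę', '#', function () {
    return (string) mt_rand(0, 9);
}), PHP_EOL;
```

Whether this is actually faster than running the regular expression over the whole string depends on how much of the string lies outside the placeholders, which is what the benchmark would have to show.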

nineinchnick · 2015-09-14T07:27:57Z

Btw, there should be a unit test that shows it failing for special UTF-8 strings. Maybe I'll try it with minimaxir/big-list-of-naughty-strings.

nineinchnick · 2015-09-14T07:31:17Z

There's an answer on SO that says it should be fine to search for an ASCII char in a UTF-8 string.
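
The reasoning behind that claim is that every byte of a multi-byte UTF-8 sequence is in the range 0x80 to 0xFF, while ASCII characters such as '#', '?' and '*' are below 0x80, so a byte-wise search cannot match in the middle of a character. The sketch below checks this for a sample string; the function name is a hypothetical one chosen for illustration.

```php
<?php
// Hypothetical check: does any multi-byte character contain an ASCII byte?
function hasAsciiByteInsideMultibyte($utf8)
{
    // Split the string into individual UTF-8 characters.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($chars as $char) {
        $length = strlen($char);
        if ($length === 1) {
            continue; // plain ASCII character
        }
        for ($i = 0; $i < $length; $i++) {
            if (ord($char[$i]) < 0x80) {
                return true;
            }
        }
    }
    return false;
}

var_dump(hasAsciiByteInsideMultibyte('Zażółć gęślą jaźń 日本語')); // bool(false)
var_dump(strpos('日本#', '#')); // int(6): two 3-byte characters come first
```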

fzaninotto · 2015-09-14T07:34:16Z

Then you'll need a test to prove it :)

nineinchnick · 2015-09-14T08:01:14Z

Done, and it seems to work. I tested with https://gist.github.com/nineinchnick/917e644df42ccd62db5c
for #, * and ? against every possible UTF-8 char and these files:

fzaninotto · 2015-09-14T08:06:33Z

I mean a unit test in Faker.

nineinchnick · 2015-09-14T08:14:29Z

Done.

fzaninotto · 2015-09-14T08:27:09Z

I meant testing Faker, not PHP. You have to make sure numerify, lexify, and bothify work correctly with UTF-8 strings.

nineinchnick · 2015-09-14T08:29:21Z

I'm not sure what such a test would look like. Should I run it on that big UTF-8 string, replace the single char back to the placeholder and compare?

fzaninotto · 2015-09-14T08:30:36Z

No, just use a sample UTF-8 string with a couple of special chars AND placeholders, and check that the result matches the expected output with a regular expression (using /u).
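
A test of that shape might look like the sketch below. To keep the snippet runnable without Faker or PHPUnit installed, it uses a stand-in function (numerifyStandIn, an assumption made for illustration) in place of the real numerify(); in the actual test suite the call would go to the Faker method instead.

```php
<?php
// Stand-in for the real numerify() so the snippet runs on its own (assumption).
function numerifyStandIn($string)
{
    return preg_replace_callback('/#/u', function () {
        return (string) mt_rand(0, 9);
    }, $string);
}

// Sample input mixing multi-byte characters and placeholders.
$input = 'Ünïcödé ## 日本語 #';
$result = numerifyStandIn($input);

// The special characters must survive untouched and every '#' must be a digit.
if (preg_match('/^Ünïcödé \d{2} 日本語 \d$/u', $result) !== 1) {
    throw new RuntimeException('Unexpected result: ' . $result);
}
echo 'ok', PHP_EOL;
```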

nineinchnick · 2015-09-14T08:36:28Z

Done. Note that it doesn't matter if you use /u or not.

fzaninotto · 2015-09-14T08:38:14Z

Great, thanks!

replace preg_replace_callback with for loops; add support for an aste…

95a330c

…risk in bothify

fzaninotto reviewed Sep 14, 2015
add strpos on utf unit test

a49c4c3

remove strpos unit test; add another utf unit test

9fa1546

fzaninotto added a commit that referenced this pull request Sep 14, 2015

Merge pull request #701 from nineinchnick/bothify-add-asterisk

bedaf7d

Support an asterisk in bothify and optimize

fzaninotto merged commit bedaf7d into fzaninotto:master Sep 14, 2015
