Change Inflector::transliterate to public #10061

Closed
dinhtrung opened this issue Oct 31, 2015 · 14 comments

@dinhtrung
Contributor

To support transliteration, I currently have to install another Composer package (yii2-transliteration).

I just found that simple conversion can be done with our own Inflector::transliterate.

Why don't we change transliterate from protected to public static instead? It would be quite convenient at times.

Thanks,

@samdark
Member

samdark commented Oct 31, 2015

Do you want transliteration only? There's Inflector::slug() for URLs.

@dinhtrung
Contributor Author

Yes, slug can be used to normalize filenames and URLs. However, in some languages (mine is Vietnamese) the meta description or meta keywords should also be transliterated for better SEO. Another use case is generating an ASCII-equivalent version of a name or of content. This is where transliteration comes in handy.

Does it make sense?

@samdark
Member

samdark commented Oct 31, 2015

Better SEO w/ transliterated content? Are you absolutely sure?

Overall, I agree that there could be use cases.

samdark added this to the 2.0.x milestone Oct 31, 2015
samdark added the status:ready for adoption label Oct 31, 2015
@dinhtrung
Contributor Author

I'm not an SEO expert, so I am not sure how effective transliterated keywords are. However, many websites in Vietnam include both transliterated and original meta keywords in the header.

There are also some resources that mention this, and I believe search engines index transliterated content more easily (more similar to English?).

https://moz.com/blog/so-you-want-to-know-about-foreign-language-seo-mozinar-q-a

For my recent project, I use it to convert SMS content. I think that is another use case.

@samdark
Member

samdark commented Oct 31, 2015

SMS content is definitely a good use case.

@SilverFire
Member

Inflector::slug removes all special characters. Example of slugging from the test:

            'עִבְרִית' => 'iberiyt',
            'недвижимость' => 'nedvizimost',

If we make Inflector::transliterate public, intl transliteration returns characters with diacritics, like this:

'ʻibĕriyţ'
'nedvižimostʹ'

Speaking for the Russian word "недвижимость": we don't use diacritics in transliteration, so we usually write zh instead of ž. I'm sure most languages have many special transliteration rules of their own, but how accurately does the PHP intl transliterator represent them?

Is this the expected behavior?

@SilverFire
Member

Found a solution that can be turned on optionally.

BaseInflector::transliterator defaults to Any-Latin; NFKD.

NFKD is one of the normalization forms described on the Unicode Consortium website.

So let's test different common options:

// Sample text mixing Chinese, Ukrainian, Russian, and Hebrew
$string = '获取到后台数据后 Український транслит на PHP! ¿Недвижимость или עִבְרִית?';
// ICU transliterator IDs to compare
$transliterators = ['Any-Latin; NFKD', 'Any-Latin; Latin-ASCII', 'Any-Latin; NFKD; [\u0080-\u7fff] remove'];
foreach ($transliterators as $transliterator) {
    echo "<br>" . transliterator_transliterate($transliterator, $string);
}
huò qǔ dào hòu tái shù jù hòu Ukraí̈nsʹkij translit na PHP! ¿Nedvižimostʹ ili ʻibĕriyţ?
huo qu dao hou tai shu ju hou Ukrainsʹkij translit na PHP! ¿Nedvizimostʹ ili ʻiberiyt?
huo qu dao hou tai shu ju hou Ukrainskij translit na PHP! Nedvizimost ili iberiyt?

No. 1 is the most faithful to the original spelling, but no. 3 is the safest.

Your thoughts?
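
To illustrate why option 3 ends up as plain ASCII, here is a minimal sketch that assumes the intl extension is installed. It reproduces only the last two steps (NFKD, then removing U+0080 to U+7FFF) with Normalizer and preg_replace instead of an ICU transliterator ID, and the function name stripNonAscii is made up for this example.

```php
<?php
// Hypothetical helper for illustration; not part of Yii or the intl extension.
function stripNonAscii(string $text): string
{
    // NFKD splits precomposed letters into a base letter plus combining marks,
    // e.g. U+017E (z with caron) becomes "z" followed by U+030C.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);

    // Drop everything in U+0080..U+7FFF, the same range as the
    // "[\u0080-\u7fff] remove" filter in the third transliterator ID.
    return preg_replace('/[\x{0080}-\x{7FFF}]/u', '', $decomposed);
}

// "nedvižimostʹ" written with escapes so the file encoding does not matter.
echo stripNonAscii("nedvi\u{017E}imost\u{02B9}"), PHP_EOL; // prints "nedvizimost"
```

The base letter survives because it is ASCII after decomposition, while the combining caron and the modifier prime fall inside the removed range.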

@samdark
Member

samdark commented Nov 4, 2015

We should make it an option.

@SilverFire
Member

It's already an option of Inflector. Do you want to make it an option of the transliterate() method?

@samdark
Member

samdark commented Nov 4, 2015

Yes.

@samdark
Member

samdark commented Nov 4, 2015

Probably.

@SilverFire
Member

It's a good idea.
Do you want to predefine the normalization options listed above?

@samdark
Member

samdark commented Nov 4, 2015

Yes.

@SilverFire
Member

I'll handle this. Wait for the PR.
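
The agreed direction (a public method that takes the transliterator as an option, with the three IDs from this thread predefined) could look roughly like the sketch below. This is only an illustration under assumptions: the class name and constant names are hypothetical and do not come from Yii, and the sketch simply throws when the intl extension is missing rather than falling back to anything.

```php
<?php
// Hypothetical sketch; not the actual Yii implementation or API.
final class TransliterateSketch
{
    // Transliterator IDs copied from the comparison in this thread.
    // The constant names are invented for this example.
    const KEEP_DIACRITICS = 'Any-Latin; NFKD';
    const LATIN_ASCII = 'Any-Latin; Latin-ASCII';
    const ASCII_ONLY = 'Any-Latin; NFKD; [\u0080-\u7fff] remove';

    public static function transliterate(string $string, string $transliterator = self::KEEP_DIACRITICS): string
    {
        if (!function_exists('transliterator_transliterate')) {
            // Assumption: this sketch has no fallback without intl.
            throw new RuntimeException('The intl extension is required for this sketch.');
        }

        // transliterator_transliterate() returns false on failure,
        // for example when the ID cannot be parsed.
        $result = transliterator_transliterate($transliterator, $string);
        if ($result === false) {
            throw new InvalidArgumentException("Transliteration failed for ID: $transliterator");
        }

        return $result;
    }
}

// Usage: pick the "safest" variant from the thread for SMS-style content.
echo TransliterateSketch::transliterate('Недвижимость', TransliterateSketch::ASCII_ONLY), PHP_EOL;
```

Keeping the default at 'Any-Latin; NFKD' matches the current BaseInflector::transliterator default mentioned above, so callers who pass no option would see no change in behavior.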
