Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "terms" #2767

Closed
wants to merge 4 commits into from
Closed

Fix "terms" #2767

wants to merge 4 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Sep 1, 2015

"terms" lists that were not fully lowercase cannot be handled. This PR counters this by processing the "terms" list upon retrieval of the translations from Transifex.

At first, I changed ./js/id/presets/preset.js # 49 to include a .toLowerCase() but it made more sense to just make sure the value is already OK in the ./dist/locales/##.json files.

While we're at it, the processing function also removes spaces after commas in the terms list. Those did not seem to affect the behaviour of the app, but Update: Those also put a penalty on the affected terms. As a free extra, removing them saves some bytes on the transmission of locale files and maybe even improves performance a tiny tiny bit, but I'm not sure. Overall, it's just nicer that they are normalized and the effort to do it is not so great since we have to process them all anyway. [Statement irrelevant: they need to be corrected.]

Fixes #2756.

Create a function in update_locales.js that:
* lowercases all "terms" translations in presets (fixes #2756)
* converts comma-space separators into just comma separators (saves a
  little bit of bandwidth and processing power on clients)
Ran `make translations`
Note that make translations has been improved by the previous commit.
@ghost
Copy link
Author

ghost commented Sep 2, 2015

Also, IMO this deserves a release; it fixes a subtle issue that may be confusing to end users. :)
But that's of course totally up to you guys to decide.

to clarify that preset.terms is made lowercase upon retrieval from Transifex
@bagage
Copy link
Contributor

bagage commented Sep 2, 2015

Those did not seem to affect the behaviour of the app

Yes it does. Instead of matching a tag exactly by its term, it will be matching by leventshtein distance for a cost of ... "1 space before everything". And this is, IMO, wrong.
For instance, in French wc (with space) will(?) match first CVC THEN toilets... so yes, it is affecting the behaviour as well.

By the way I think you should do more check on spaces around comma; some common typos are:

value1 , value2 (spaces before AND after)
value1, value2 (double spaces after.. we're talking about typo, right? :))

I think that the script should totally remove (\s)*(,)+(\s)* (hm, not sure about that..)... because we cannot rely on users to make proper formatting.

@ghost
Copy link
Author

ghost commented Sep 2, 2015

You're right, again.

midgard@midgard:~/git/iD/dist/locales$ grep -r " ," . | grep terms
./sv.json:                "terms": "pir ,vågbrytare,hamnarm,hamnpir,kaj"
./sv.json:                "terms": "musikaffär,musikbutik,cd-affär,cd,kassett,vinyl,lp,skivaffär ,skivbutik"
./sv.json:                "terms": "pappersvaror ,pappershandel,papper,anteckningsböcker,grattiskort,kuvert,pennor,kontorsmaterial,kort,papper"
./sv.json:                "terms": "information,informationskällan,turistbyråer ,turistkontor,karta,informationskarta,infokarta,informationstavla"
./fr.json:                "terms": "station de vidange,sanitaires,mobilehome ,camping,toilettes chimiques"
./hu.json:                "terms": "törvényszék,kúria ,ítélőtábla,bírósági hivatal,ítélőszék"
./hu.json:                "terms": "fotólabor ,fotóműhely ,fotólaboratórium"
./hu.json:                "terms": "ruházati bolt ,fehérneműbolt,öltönyáruház,"
./hu.json:                "terms": "hajszobrász,frizérozó,frizuraszerviz,hajvágás,borbély,hajszalon ,mesterfodrász,hajegyenesítés,haj,bajusz"
./es.json:                "terms": "ruta,itinerario,rumbo,dirección,trayecto,peatonal,pie,senderismo ,camino,derrotero,vía"
./ar.json:                "terms": "مطار ,ميناء جوّي"
./ar.json:                "terms": "بار ,حانة ,خمَّارَة ,مَشْرَب"
./ar.json:                "terms": "سوق ,سوق تجارية,ساحة السوق"
./ar.json:                "terms": "كشك التحصيل ,مكتب رسوم العبور"
./ar.json:                "terms": "مَرْأب ,كراج,تصليح سيارات"
./ar.json:                "terms": "طريق ,ممر ,مسلك ,درب"
./ar.json:                "terms": "منارة ,فنارة"
./ar.json:                "terms": "خط الكهرباء ,خط الطاقة"

Spaces at the end don't have as much impact but I'm going to remove them in this PR, thanks.

I also searched for semicolons (;) and found that the Ukrainian translation consistently has weird Latin character sequences (sometimes including semicolons) between the normal Cyrillic letters. I suppose those are some sort of keyboard sequence corresponding to other words. As such, we can't replace semicolons, I'll fix the few translations that erroneously used semicolons instead of commas manually in Transifex.

Can now also handle e.g.:
* ` ,`
* ` ,,  `
* `,   `

Change function definition of deepNormalizeTerms to the convention
within this file
@ghost
Copy link
Author

ghost commented Sep 2, 2015

Best not pull the translations right now: Access has been changed to Allowed access (in an unrelated commit) and Transifex has nullified all those translations.

@bhousel
Copy link
Member

bhousel commented Sep 2, 2015

I agree we should really do a release.. Lets let the translators catch up with "Allowed Access" and try to push a release in few days.

Regarding this change to fix the terms: I like what you're doing, but I think it might be better to make the original fix ./js/id/presets/preset.js because:

  1. The English terms don't pass through make translations
  2. Some organizations or iD forks are loading custom presets via the api, bypassing the .json files.

So our presets code really needs to handle the possibility that the terms list may have bad data in it.

@ghost
Copy link
Author

ghost commented Sep 2, 2015

Shall I leave the code in update-locales.js as well, or omit it? It really doesn't matter that much. NL is 139.8 KiB without the normalization, 139.3 KiB with.

I'll open a new PR when I've modified it. Feel free to close this one.

@bagage
Copy link
Contributor

bagage commented Sep 4, 2015

@bhousel will the incoming release wait for this PR (or at least issue #2756 to be fixed) to be merged? It would be great to be so :).

@bhousel
Copy link
Member

bhousel commented Sep 5, 2015

Shall I leave the code in update-locales.js as well, or omit it? It really doesn't matter that much. NL is 139.8 KiB without the normalization, 139.3 KiB with.

I'll open a new PR when I've modified it. Feel free to close this one.

Yeah, skip the update-locales.js changes to keep things simple. I'll close this PR.

will the incoming release wait for this PR (or at least issue #2756 to be fixed) to be merged? It would be great to be so :).

I think we can wait - Monday is a holiday here in the US.. Let's shoot for Tuesday?

@bhousel bhousel closed this Sep 5, 2015
@ghost ghost deleted the fix-terms branch September 5, 2015 08:32
@ghost ghost mentioned this pull request Sep 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants