Adding transliteration code as a sub project to cogcomp-nlp #563

Merged
merged 74 commits into from
Oct 16, 2017


mayhewsw (Member) commented Oct 5, 2017

This is old code written by Jeff Pasternack in C#, ported by me into Java, and now finally being migrated from GitLab to GitHub.

mayhewsw and others added 30 commits September 22, 2015 17:30
…rest. I'm working in CSPTransliteration right now
…t. Also commented out a lot of code, and am now fixing as needed. STILL BROKEN
…but needs some more debugging to be sure it is the same.

danyaljj (Member) commented Oct 5, 2017

How about a slight modification to the readme?

CogComp Transliteration

Transliteration is the conversion of a given name in the source language (from source script) to a name in the target language (target script), such that the target language name is:

  1. phonemically equivalent to the source name
     मुम्बई → Mumbai
  2. conforms to the phonology of the target language
     नरेन्द्र → ਨਰਿੰਦਰ (नरिन्दर)
  3. matches the user intuition of the equivalent of the source language
     name in the target language, considering the culture and orthographic
     character usage in the target language
     ആലപ്പുഴ (aalappuzha) → Alappuzha

About this package

This is a Java port of Jeff Pasternack's C# code from "Learning Better Transliterations."

To run, look at the examples in TestTransliteration or Runner.

Further reading

You can check out more details on this work:

@inproceedings{PasternackRo09a,
    author = {J. Pasternack and D. Roth},
    title = {Learning Better Transliterations},
    booktitle = {CIKM},
    month = {11},
    year = {2009},
    url = "http://cogcomp.org/papers/PasternackRo09a.pdf",
    funding = {MIAS},
    projects = {TL},
    comment = {State-of-the-art transliteration with unbounded substring-to-substring productions and capable of both discovery and generation.},
}

mayhewsw (Member, Author) commented Oct 9, 2017

@danyaljj this failed, but I don't know why. I can't log in to the TeamCity site.

danyaljj (Member) commented

We have tested this locally, and it seems the failure is due to NER issues. With that in mind, I'm merging this.

@danyaljj danyaljj merged commit a4458ec into CogComp:master Oct 16, 2017