Optimize Greek language support #2658

Merged: 1 commit, Aug 14, 2018

Conversation

giannisdaras
Contributor

Description

This pull request further improves Greek language support with the changes listed below.

Types of change

Greek language support is enhanced by the following changes:

  1. Added a syntax_iterators.py file for noun chunk detection.
  2. Added many more rules to the lemmatizer.
  3. Added more lemmatizer exceptions and put them to use (before this PR they were not included in the init file, so the lemmatizer exceptions were unused).
  4. Added a Greek language Lemmatizer, based on the rule-based technique of the default Lemmatizer, to
    optimize it using language-specific characteristics.
  5. PEP8 and Flake8 checks for all the scripts.
  6. Norm exceptions: removed duplicate keys from the dictionary.
  7. Removed unused imports for cleaner code.
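
To illustrate the general idea behind items 2 to 4, here is a minimal, self-contained sketch of rule-based lemmatization with an exceptions table. It is not the code from this PR: the function name `lemmatize`, the tiny `INDEX`, `EXCEPTIONS` and `RULES` tables, and the fallback behaviour are all assumptions made up for the example.

```python
# Toy data, invented for illustration; NOT the rules or exceptions shipped in this PR.
INDEX = {"δρόμος", "γράφω"}  # known lemmas
EXCEPTIONS = {"είναι": ["είμαι"]}  # irregular forms, looked up first
RULES = [("οι", "ος"), ("ου", "ος"), ("εις", "ω")]  # (suffix, replacement)


def lemmatize(word, index, exceptions, rules):
    """Return candidate lemmas: exceptions first, then suffix rules checked against the index."""
    word = word.lower()
    if word in exceptions:
        return list(exceptions[word])
    candidates = []
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Keep a rewritten form only if it is a known lemma.
            if candidate in index and candidate not in candidates:
                candidates.append(candidate)
    # Fall back to the surface form when no rule yields a known lemma.
    return candidates or [word]


for form in ["δρόμοι", "δρόμου", "γράφεις", "είναι", "και"]:
    print(form, "->", lemmatize(form, INDEX, EXCEPTIONS, RULES))
```

The sketch also shows why item 3 matters: if the exceptions table is never passed in, an irregular form such as "είναι" falls through to the suffix rules and comes back unchanged.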

All in all, I hope this PR significantly improves the quality of Greek language support.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@ines added the labels "enhancement" (Feature requests and improvements) and "lang / el" (Greek language data and models) on Aug 13, 2018
@honnibal
Member

Thanks! Looks great!

@honnibal merged commit fe94e69 into explosion:master on Aug 14, 2018
@steremma

steremma commented May 8, 2019

Awesome work @Eleni170, thanks for the contribution! I know this is an old issue, but in case you are still active: is sentence segmentation supported? I am having some trouble getting it to work:

>>> import spacy
>>> sp = spacy.load("el", disable=["tagger", "ner", "textcat"])
>>> text = "Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;"
>>> for sentence in sp(text).sents:
...     print(sentence)

out: Αυτή είναι η πρώτη πρόταση. Εδώ θα έπρεπε να σπάσει. Δεν έσπασε όμως!! Περίεργο, έτσι δεν είναι;
