This is a curated list of samples on NLP preprocessing. You are welcome to make a pull request to contribute!
-
Extract Word Pair: Extract special word pairs from a collection of word pairs (e.g. in French, Italian, Portuguese, Spanish, or Turkish). Each extracted pair must meet the following requirements:
- Single deletion distance: one word is obtained from the other by deleting exactly one character
- The deletion occurs at the exact center of the word
Example: (bloccare, blocare) and (fellah, felah)
Incorrect example: (vitamină, vitamin) and (maxi, maksi)
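
The two requirements above can be sketched in Python as follows. This is a minimal illustration, not the sample's actual implementation. The function names are hypothetical, and "exact center" is interpreted here as an assumption: the middle index of an odd-length word, or either of the two middle indices of an even-length word, measured on the longer word of the pair.

```python
import unicodedata


def deletion_indices(longer: str, shorter: str) -> list[int]:
    """Return every index i such that removing longer[i] yields shorter."""
    if len(longer) != len(shorter) + 1:
        return []
    return [
        i for i in range(len(longer))
        if longer[:i] + longer[i + 1:] == shorter
    ]


def center_indices(length: int) -> set[int]:
    """Assumed definition of 'center': one middle index for odd lengths,
    the two middle indices for even lengths."""
    if length % 2 == 1:
        return {length // 2}
    return {length // 2 - 1, length // 2}


def is_center_deletion_pair(a: str, b: str) -> bool:
    """True if one word equals the other with a single central character removed."""
    # Normalize so that accented letters such as 'ă' count as one character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    centers = center_indices(len(longer))
    return any(i in centers for i in deletion_indices(longer, shorter))


candidates = [
    ("bloccare", "blocare"),
    ("fellah", "felah"),
    ("vitamină", "vitamin"),
    ("maxi", "maksi"),
]
selected = [pair for pair in candidates if is_center_deletion_pair(*pair)]
```

Under this interpretation, (vitamină, vitamin) is rejected because the deleted character is at the end of the word, and (maxi, maksi) is rejected because no single deletion turns one word into the other.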
-
Extract Word Pair (Cognate): Extract special word pairs from a collection of word pairs (e.g. in French, Italian, Portuguese, or Spanish). Each extracted pair must meet the following requirements:
- The two words form a cognate pair
- Single deletion distance: one word is obtained from the other by deleting exactly one character
Example: (veni, venir) and (rosmarin, romarin)
Incorrect example: (hanorac, anorak) and (msingur, singolo)
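
A rough Python sketch of this filter is shown below. Cognate status cannot be derived from spelling alone, so the sketch assumes the caller supplies a predicate (here the hypothetical `is_cognate` argument) backed by some external resource such as an etymological lexicon. All names are illustrative, and the cognate labels in the demo data are placeholders rather than verified linguistic facts.

```python
from typing import Callable, Iterable


def is_single_deletion(a: str, b: str) -> bool:
    """True if deleting exactly one character from the longer word gives the shorter."""
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) != len(shorter) + 1:
        return False
    return any(
        longer[:i] + longer[i + 1:] == shorter
        for i in range(len(longer))
    )


def extract_cognate_pairs(
    pairs: Iterable[tuple[str, str]],
    is_cognate: Callable[[str, str], bool],
) -> list[tuple[str, str]]:
    """Keep pairs that are cognates (per the supplied predicate)
    and differ by a single deletion."""
    return [
        (a, b) for a, b in pairs
        if is_cognate(a, b) and is_single_deletion(a, b)
    ]


candidates = [
    ("veni", "venir"),
    ("rosmarin", "romarin"),
    ("hanorac", "anorak"),
    ("msingur", "singolo"),
]

# Placeholder labels: every candidate is marked as a cognate so that the demo
# isolates the effect of the single-deletion check. Not real lexicon data.
hypothetical_cognates = {frozenset(pair) for pair in candidates}


def in_lexicon(a: str, b: str) -> bool:
    return frozenset((a, b)) in hypothetical_cognates


selected = extract_cognate_pairs(candidates, in_lexicon)
```

Keeping the cognate check as a separate predicate lets the same deletion logic be reused with whatever cognate resource is available.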