manexagirrezabal/herascansion

Herascansion

In this project we take the first steps toward understanding the typological differences between English, Spanish and Basque poetic rhythm, with the aim of creating a language-independent scansion system.

Dataset

We will soon release scripts for reading the English, Spanish and Basque corpora.

Features

The repository includes dedicated implementations for extracting the features we need, such as:

  • isheavy: This function returns true if the syllable is heavy, according to a heuristic.
  • Lexical stress: We have implemented functions to get the lexical stress for each language.
    • English: We use the NETtalk dictionary; if a word is not in the vocabulary, we fall back to an SVM-based model.
    • Spanish: We have an implementation that encodes the grammatical rules of Spanish stress.
    • Basque: We have an implementation that encodes the grammatical rules of standard Basque stress.
  • Part-of-speech tags: We have used pretrained models for POS-tagging.
    • English: Hidden Markov Models trained on the WSJ section of the Penn Treebank corpus
    • Spanish: Ixa-pipes trained on the AnCora corpus
    • Basque: Ixa-pipes trained on the Universal Dependencies corpus
  • Syllabification: The English corpus is already divided into syllables. The Spanish and Basque corpora are not syllabified, so we use a syllabification algorithm based on the sonority hierarchy and the maximum onset principle.
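
As a rough illustration of how the syllabification, syllable weight and Spanish stress features listed above could fit together, here is a minimal Python sketch. It is not the code from this repository: the names `syllabify`, `is_heavy` and `stressed_index`, the three-level sonority scale and the weight heuristic (closed syllable or more than one vowel letter) are all assumptions made for the example. It works on Spanish spelling only, treats every vowel run as a single nucleus (so hiatus is not handled), and a pure sonority check accepts some clusters that Spanish does not allow as onsets (for example s + liquid).

```python
import re

VOWELS = set("aeiouáéíóúü")
STRESS_MARKS = set("áéíóú")
# Assumed, coarse sonority ranks: obstruents (1) < nasals (2) < liquids (3).
SONORITY = {**dict.fromkeys("mnñ", 2), **dict.fromkeys(["l", "r", "ll", "rr"], 3)}


def sonority(segment):
    # Any consonant not listed above is treated as an obstruent.
    return SONORITY.get(segment, 1)


def syllabify(word):
    """Split a word into syllables with the maximum onset principle."""
    # Keep the digraphs ch, ll and rr together as single segments.
    segments = re.findall(r"ch|ll|rr|.", word.lower())
    syllables, current, cluster = [], [], []
    seen_vowel = False
    for seg in segments:
        if seg not in VOWELS:
            cluster.append(seg)
            continue
        if cluster and not seen_vowel:
            current = cluster  # word-initial consonants open the first syllable
        elif cluster:
            # Give the next syllable the longest cluster suffix whose
            # sonority rises strictly toward the vowel.
            k = len(cluster) - 1
            while k > 0 and sonority(cluster[k - 1]) < sonority(cluster[k]):
                k -= 1
            syllables.append("".join(current + cluster[:k]))
            current = cluster[k:]
        cluster = []
        current.append(seg)
        seen_vowel = True
    # Trailing consonants become the coda of the last syllable.
    syllables.append("".join(current + cluster))
    return syllables


def is_heavy(syllable):
    """Assumed heuristic: closed syllable or more than one vowel letter."""
    vowel_count = sum(ch in VOWELS for ch in syllable)
    return syllable[-1] not in VOWELS or vowel_count > 1


def stressed_index(syllables):
    """Index of the stressed syllable under the Spanish orthographic rules."""
    for i, syl in enumerate(syllables):
        if any(ch in STRESS_MARKS for ch in syl):
            return i  # a written accent always marks the stressed syllable
    if len(syllables) == 1:
        return 0
    # Words ending in a vowel, n or s are stressed on the penultimate syllable.
    if syllables[-1][-1] in "aeiouns":
        return len(syllables) - 2
    return len(syllables) - 1


for word in ("cantar", "libro", "canción"):
    syls = syllabify(word)
    print(word, syls, [is_heavy(s) for s in syls], stressed_index(syls))
```

A Basque version would need its own digraph list (for example tx, tz, ts) and its own stress rules, which this sketch does not attempt.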

Techniques

Rule-based methods

Data-driven methods

License

Creative Commons v. 3.0
