NLP for 18th-century Portuguese medical texts

This repository accompanies the paper: Zilio, L., Lazzari, R.R., Finatto, M.J.B. (2024) NLP for historical Portuguese: Analysing 18th-century medical texts. In Proceedings of PROPOR 2024.

Repository content:

This is only an overview. Please refer to the paper above for more information about the content of each folder.

TMX: this folder contains the original and normalised versions of the texts described in the paper, in TMX format (an XML-based format)

aligned: this folder contains the results of semi-automatic alignments between original and normalised versions of each file of the corpus

keywords: this folder contains the results from the keyword analysis presented in the paper

parsed: this folder contains the automatically parsed versions of the files (parsing done with Stanza)

variants: this folder contains the variants found by comparing the semi-automatically aligned files
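As a rough illustration of how the files in the TMX folder might be read, the sketch below uses Python's standard library to pull paired segments out of a TMX document. It assumes the generic TMX layout (`tu` units containing `tuv` variants with a `seg` each); the language codes and the sample sentence are invented for the example and are not taken from the corpus, and the helper name `read_translation_units` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_translation_units(tmx_text):
    """Return one dict per <tu>, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang", "")
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline markup.
            unit[lang] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


# Invented sample: the language codes and text are assumptions.
SAMPLE = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="x-original"><seg>hum remedio</seg></tuv>
      <tuv xml:lang="x-normalised"><seg>um remédio</seg></tuv>
    </tu>
  </body>
</tmx>"""

for unit in read_translation_units(SAMPLE):
    print(unit)
```

For a file on disk, reading the bytes with `open(path, "rb").read()` and passing them to the same function lets the XML parser honour the encoding declared in the file.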

The file aligned_parsed_modern_PT_NLTK_tokenizer_stanza.tsv contains an automatic parse (produced with Stanza) combined with the alignments for the whole corpus. The second column of the parse holds the semi-automatically aligned original spelling of each token.
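A minimal sketch of how the original spellings could be extracted from the second column of such a TSV file is shown below. It assumes tab-separated rows with no header, and it skips blank lines and lines starting with `#`; these are assumptions about the layout, not details confirmed here. The sample rows are invented, and the function name `original_spellings` is hypothetical.

```python
import csv
import io


def original_spellings(handle):
    """Collect the second column (original spelling) from a TSV stream."""
    # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    spellings = []
    for row in reader:
        # Skip blank lines, comment lines and rows without a second column.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        spellings.append(row[1])
    return spellings


# Invented sample rows; only the position of the second column
# follows the description above, the other columns are placeholders.
SAMPLE = "1\thum\tum\n2\tremedio\tremédio\n\n# comment\n1\tagoa\tágua\n"

print(original_spellings(io.StringIO(SAMPLE)))
```

To run this on the real file, one would open it with `open(path, encoding="utf-8", newline="")` and pass the handle to the same function; the UTF-8 encoding is likewise an assumption.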

uebelsetzer/NLP_for_18th-century_Portuguese_medical_texts
