This repository contains formality annotations for every word, sentence and document in CodE Alltag, a German-language email corpus. Formality scores range from -1 (most informal) to +1 (most formal) and are assigned automatically by transformer models. Sentence and document scores come from a model fine-tuned on formality-assessed sentences; word scores come from a model fine-tuned on formality-assessed words.
The formality_scores_documents*.json files provide one formality score per document. The formality_scores_sentences*.json and formality_scores_words*.json files provide, for each document, a list of the formality scores of its sentences or words, respectively.
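
The JSON files could be loaded along the following lines. This is a minimal sketch, not part of the repository: it assumes that each file holds a single JSON object keyed by document ID, with a score (documents) or a list of scores (sentences, words) as the value. That layout is not stated above, so check it against the actual files first. The helper names load_scores and mean_sentence_formality are hypothetical.

```python
import glob
import json
from statistics import mean


def load_scores(pattern):
    """Merge every JSON file matching `pattern` into one dict.

    ASSUMPTION: each file is a JSON object mapping a document ID to its
    score (documents) or to a list of scores (sentences, words).
    """
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            merged.update(json.load(f))
    return merged


def mean_sentence_formality(sentence_scores):
    """Average the per-sentence scores of each document, skipping empty lists."""
    return {
        doc_id: mean(scores)
        for doc_id, scores in sentence_scores.items()
        if scores
    }
```

For example, load_scores("formality_scores_sentences*.json") would collect the sentence-level lists from all matching files, and mean_sentence_formality would reduce each list to a single number. That average is only an illustration and is not necessarily how the document-level scores in this repository were computed.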
The TSV files sentences_formality_scores.tsv and words_formality_scores.tsv list the formality score of every sentence and every word in the entire corpus, respectively.
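
The TSV files can be read with Python's standard csv module, as sketched below. The column layout is an assumption: the sketch takes the text from the first column and the score from the last one, and it skips any row whose last field is not a number (a header row, for instance). The function names read_scored_items and most_formal are hypothetical.

```python
import csv


def read_scored_items(path):
    """Read (text, score) pairs from a tab-separated file.

    ASSUMPTION: the sentence or word is in the first column and its
    formality score is in the last column.
    """
    items = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside sentences untouched.
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue
            try:
                score = float(row[-1])
            except ValueError:
                continue  # not a number, e.g. a header row
            items.append((row[0], score))
    return items


def most_formal(items, n=5):
    """Return the n items with the highest formality score."""
    return sorted(items, key=lambda item: item[1], reverse=True)[:n]
```

Calling most_formal(read_scored_items("words_formality_scores.tsv")) would then list the five words scored closest to +1, provided the file matches the assumed layout.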