[MODULE] - Sentence complexity #2

Open
jhoetter opened this issue Oct 25, 2022 · 1 comment
Labels: cognition, enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@jhoetter
Member

Please describe the module you would like to add to the content library
I know that some of the texts in my dataset are rather difficult to understand. In general, the complexity of sentences varies across my projects, and I want to detect that.

Do you already have an implementation?
If so, please share it here. For instance:

from typing import Dict, Any
import textstat

def setall(d, keys, value):
    for k in keys:
        d[k] = value

MAX_SCORE = 122
MIN_SCORE = 0

OUTCOMES = {}
setall(OUTCOMES, range(90, MAX_SCORE), "very easy")
setall(OUTCOMES, range(80, 90), "easy")
setall(OUTCOMES, range(70, 80), "fairly easy")
setall(OUTCOMES, range(60, 70), "standard")
setall(OUTCOMES, range(50, 60), "fairly difficult")
setall(OUTCOMES, range(30, 50), "difficult")
setall(OUTCOMES, range(MIN_SCORE, 30), "very difficult")

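def get_mapping_complexity(score):
    # Clamp to the table's key range so that negative scores and scores
    # at or above MAX_SCORE cannot raise a KeyError.
    clamped = min(max(int(score), MIN_SCORE), MAX_SCORE - 1)
    return OUTCOMES[clamped]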

def fn_sentence_complexity(record: Dict[str, Any]) -> str:
    text = record["text"]

    # Only switch textstat's language when the record provides one.
    language = record.get("language")
    if language is not None:
        textstat.set_lang(language)

    # Flesch Reading Ease: higher scores mean easier text.
    sentence_complexity_score = textstat.flesch_reading_ease(text)
    return get_mapping_complexity(sentence_complexity_score)
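
The per-integer lookup table above could also be written as threshold bucketing, which accepts any float without clamping. This is only a minimal sketch under the same cut-offs as OUTCOMES; `THRESHOLDS`, `LABELS`, and `label_for_score` are hypothetical names, not part of the proposed module.

```python
from bisect import bisect_right

# Lower bounds of each bucket above the lowest one, in ascending order.
# These mirror the ranges used to fill OUTCOMES (assumed cut-offs).
THRESHOLDS = [30, 50, 60, 70, 80, 90]
LABELS = [
    "very difficult",
    "difficult",
    "fairly difficult",
    "standard",
    "fairly easy",
    "easy",
    "very easy",
]

def label_for_score(score: float) -> str:
    # bisect_right counts how many thresholds are <= score,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS, score)]
```

Scores below 0 fall into "very difficult" and scores above 122 into "very easy", so no explicit MIN_SCORE/MAX_SCORE handling is needed in this variant.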

Additional context
-

@jhoetter jhoetter added the enhancement New feature or request label Oct 25, 2022
@jhoetter jhoetter self-assigned this Oct 25, 2022
@jhoetter jhoetter closed this as completed Nov 7, 2022
@jhoetter jhoetter added the good first issue Good for newcomers label Nov 24, 2022
@SvenjaKern
Contributor

I am wondering what is meant by complexity.
Does it refer to the vocabulary, the morphology, the semantics, or the syntax? Is it a mix of all of them? Or is it compared against the language proficiency levels used for language learners?
Maybe we can find out and add it to the README. If I am wondering, the clients probably will, too.
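
For context on the question above: the implementation in this issue calls `textstat.flesch_reading_ease`, and the Flesch Reading Ease score is computed purely from surface counts (average sentence length and average syllables per word). It does not look at semantics, syntax trees, or learner levels. The sketch below shows the standard English formula; `flesch_from_counts` is a hypothetical helper for illustration, and it assumes the English coefficients (as far as I know, textstat applies different coefficients for other languages after `set_lang`).

```python
def flesch_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard English Flesch Reading Ease, computed from raw counts.

    Longer sentences and more syllables per word both lower the score.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Example: 100 words in 5 sentences with 150 syllables in total.
score = flesch_from_counts(words=100, sentences=5, syllables=150)
```

With these counts the score is about 59.6, which the mapping in this issue would label "fairly difficult".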
