make a master-elicitation list of most frequent real elicitations per concept set #5

Open
LinguList opened this issue Mar 1, 2019 · 7 comments

Comments

@LinguList

This should be trivial to do, but @Schweikhard was right that it is sometimes confusing, especially for newcomers, what a Concepticon gloss means (and the definitions are not always the best either). So instead of having users always check definitions, linked concepts, etc., we could take a very simple approach: use the linked elicitation glosses and select the most frequent one as the "representative" elicitation gloss for a given concept set. This could then also be included in the concepticon mapping command, to make it easier for users to inspect the data. Furthermore, we could even do this for all languages in our sample, so one could ask for French, English, and German elicitation glosses per concept set. When people do their mapping, they could then more easily see which concept we intend.
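A minimal sketch of the "most frequent linked elicitation gloss" idea. It assumes the links are available as hypothetical `(concepticon_id, language, gloss)` triples. This is not the pyconcepticon API, and the glosses below are toy data:

```python
from collections import Counter, defaultdict


def representative_glosses(links):
    """Map (concepticon_id, language) to the most frequent elicitation gloss.

    links: iterable of hypothetical (concepticon_id, language, gloss) triples,
    one per concept list entry linked to a concept set.
    """
    counts = defaultdict(Counter)
    for concept_id, language, gloss in links:
        counts[concept_id, language][gloss.strip()] += 1
    # most_common(1) returns the top gloss; on a tie, the first one seen wins.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Toy data for concept set 2 (DUST).
links = [
    ("2", "english", "dust"),
    ("2", "english", "dust"),
    ("2", "english", "the dust"),
    ("2", "german", "Staub"),
    ("2", "french", "poussière"),
]
reps = representative_glosses(links)
print(reps[("2", "english")], reps[("2", "german")], reps[("2", "french")])
# dust Staub poussière
```

Asking for French, English, and German glosses then amounts to looking up the same concept set ID under each language key.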

@xrotwang
Contributor

xrotwang commented Mar 3, 2019

Ok, I see this being useful for the non-English glosses. But for the English one, I'd think the Concepticon gloss either already is the most common elicitation gloss, or is more helpful anyway?

@LinguList
Author

  • There may be non-uniqueness problems with "common" glosses.
  • They could change between releases (more than if we fix them).
  • Making them "more helpful" would add information we have already assembled in the full set of glosses, like indirect POS marking with "to " or "the ", many more brackets, and so on.
  • Elicitation glosses are a product of the practice of a linguistic community, whereas Concepticon's original glosses were intended to satisfy uniqueness constraints and to be "standardized" to some extent by following a strict practice for " or " constructs, etc. So they are not the same as the elicitation practice of English-speaking linguists.
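The non-uniqueness problem in the first point can be made concrete: a plain "most frequent" pick silently hides ties. This is a minimal sketch of a hypothetical helper, run on toy input, that reports every tied gloss instead of choosing one:

```python
from collections import Counter


def top_glosses(glosses):
    """Return (winners, count), where winners are all glosses tied for the top count.

    More than one winner means there is no unique "most common" gloss,
    so any single pick would be arbitrary.
    """
    counts = Counter(g.strip() for g in glosses)
    if not counts:
        return [], 0
    top = max(counts.values())
    # Sorting makes the result independent of input order.
    winners = sorted(g for g, n in counts.items() if n == top)
    return winners, top


print(top_glosses(["dust", "dust", "the dust"]))    # (['dust'], 2)
print(top_glosses(["hand", "arm", "hand", "arm"]))  # (['arm', 'hand'], 2)
```

Sorting only makes the result deterministic for a fixed set of concept lists. Adding new lists in a later release can still change the winner, which is the second point above.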

So I see many arguments against saying that the Concepticon gloss is identical to the English elicitation gloss. They often coincide, also in our practice, but many aspects are different. I also see many disadvantages if we change that, especially because people now use the glosses as ID equivalents, since they are easy to memorize.

@xrotwang
Contributor

xrotwang commented Mar 3, 2019

Ok. I guess what I'm struggling with is why "the most common elicitation gloss" should be particularly useful, as opposed to the set of all English elicitation glosses. Or is limiting it to just one elicitation gloss only for pragmatic reasons?

@LinguList
Author

Pragmatic reasons during mapping: you run the automatic concept mapping, and then you provide information so that people can quickly check the mapping. The most obvious way would be to add the definition in a column and have people read it. But people don't do that, or they get angry at our definition and stop using Concepticon altogether, or they take it literally. The best indicator is, you are right, the set of all existing mappings. But there can be many of them: https://github.com/clld/concepticon-data/tree/master/concepticondata#twenty-most-diverse-concept-sets. For convenience during this mapping process, it might simply be useful to let users flexibly add more columns with content, like "German reflex", etc. When using spreadsheet software, having all mappings would not hurt. When doing the linking in text files, it would. And representing three different languages by their most common elicitation gloss would probably be extremely helpful for identifying the intended meaning. Maybe, the more I think about it, this is even more important than the English gloss...
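The "flexibly add more columns" idea for text-file linking could be sketched as follows. The sketch assumes a hypothetical mapping-row layout with a `CONCEPTICON_ID` key and a precomputed `{(concepticon_id, language): gloss}` lookup. The column names are illustrative, not what the mapping command actually emits:

```python
import csv
import io


def add_gloss_columns(rows, representative, languages):
    """Return copies of mapping rows with one extra gloss column per language.

    rows: dicts with at least a "CONCEPTICON_ID" key (hypothetical layout).
    representative: {(concepticon_id, language): gloss} lookup.
    Missing glosses become empty cells.
    """
    extended = []
    for row in rows:
        new_row = dict(row)
        for lang in languages:
            key = (row["CONCEPTICON_ID"], lang)
            new_row[f"{lang.upper()}_GLOSS"] = representative.get(key, "")
        extended.append(new_row)
    return extended


def to_tsv(rows):
    """Serialize rows as tab-separated text, taking the header from the first row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [{"GLOSS": "dust", "CONCEPTICON_ID": "2"}]
lookup = {("2", "german"): "Staub", ("2", "french"): "poussière"}
print(to_tsv(add_gloss_columns(rows, lookup, ["german", "french"])))
```

Adding one representative gloss per language keeps each row to a few short cells. Dumping every linked gloss would not, and that is the concern for text-file linking.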

@xrotwang
Contributor

xrotwang commented Mar 3, 2019

Ok, makes sense. And indeed, it shouldn't be very difficult to implement. Might be worth making this configurable, i.e. the "mapper" can choose additional languages for elicitation glosses?

@LinguList
Author

Yes. We could even distinguish an "example", i.e. the single most frequent gloss in a language, from the full set of glosses.
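The configurable languages and the distinction between an "example" and all glosses could be combined as follows. As before, the input triples and the output shape are assumptions for illustration, and the data is toy data:

```python
from collections import Counter, defaultdict


def gloss_summary(links, languages):
    """Summarize elicitation glosses per concept set for the chosen languages.

    links: hypothetical (concepticon_id, language, gloss) triples.
    languages: the languages the mapper asked for; others are ignored.
    Returns {concepticon_id: {language: {"example": str, "all": [str, ...]}}}.
    """
    wanted = set(languages)
    counts = defaultdict(Counter)
    for concept_id, language, gloss in links:
        if language in wanted:
            counts[concept_id, language][gloss] += 1
    summary = defaultdict(dict)
    for (concept_id, language), counter in counts.items():
        # All glosses, most frequent first; the first one serves as the example.
        ranked = [gloss for gloss, _ in counter.most_common()]
        summary[concept_id][language] = {"example": ranked[0], "all": ranked}
    return dict(summary)


links = [
    ("3", "english", "brave"),
    ("3", "english", "brave"),
    ("3", "english", "courageous"),
    ("3", "german", "tapfer"),
]
print(gloss_summary(links, ["english"]))
# {'3': {'english': {'example': 'brave', 'all': ['brave', 'courageous']}}}
```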

@xrotwang
Contributor

That's easy enough to do via SQLite from concepticon-cldf:

select
    concepticon_id, concepticon_gloss, gloss, max(c) 
from (
    select 
        p.cldf_id as concepticon_id, 
        p.cldf_name as concepticon_gloss,
        c.cldf_name as gloss,
        count(c.cldf_id) as c 
    from 
        parametertable as p, 
        `concepts.csv` as c,
        formtable as f 
    where 
        c.cldf_parameterreference = p.cldf_id and 
        f.concept_id = c.cldf_id and 
        f.cldf_languagereference = 'english' 
    group by p.cldf_id, c.cldf_name
) as q
group by concepticon_id order by cast(concepticon_id as int);
concepticon_id concepticon_gloss gloss max(c)
0 situation 13
1 CONTEMPTIBLE contemptible 2
2 DUST dust 103
3 BRAVE brave 16
4 COURTYARD courtyard 6
5 GAZELLE Grant's gazelle 1
6 EARTHQUAKE earthquake 14
7 GATHER gather 9
8 CURSE curse 5
9 ANNOUNCE announce 9
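This query relies on SQLite-specific behaviour. When an aggregate query uses `max()`, bare columns (here `concepticon_gloss` and `gloss`) take their values from the row holding the maximum. The self-contained sketch below runs the same query against tiny in-memory tables that only mimic the column names used above. The real concepticon-cldf schema has more columns, and the rows are toy data:

```python
import sqlite3

# Toy tables that only mimic the column names used in the query above.
SCHEMA = """
create table parametertable (cldf_id text, cldf_name text);
create table `concepts.csv` (cldf_id text, cldf_name text, cldf_parameterreference text);
create table formtable (concept_id text, cldf_languagereference text);
"""

# Same query as above: count each English gloss per concept set, then keep
# the gloss with the highest count (SQLite fills bare columns from the max row).
QUERY = """
select concepticon_id, concepticon_gloss, gloss, max(c)
from (
    select p.cldf_id as concepticon_id, p.cldf_name as concepticon_gloss,
           c.cldf_name as gloss, count(c.cldf_id) as c
    from parametertable as p, `concepts.csv` as c, formtable as f
    where c.cldf_parameterreference = p.cldf_id
      and f.concept_id = c.cldf_id
      and f.cldf_languagereference = 'english'
    group by p.cldf_id, c.cldf_name
) as q
group by concepticon_id order by cast(concepticon_id as int)
"""


def build_demo_db():
    """Create an in-memory database filled with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into parametertable values (?, ?)",
        [("2", "DUST"), ("3", "BRAVE")],
    )
    # Three concept list entries linked to DUST, one linked to BRAVE.
    entries = [
        ("list1-dust", "dust", "2"),
        ("list2-dust", "dust", "2"),
        ("list3-dust", "the dust", "2"),
        ("list1-brave", "brave", "3"),
    ]
    conn.executemany("insert into `concepts.csv` values (?, ?, ?)", entries)
    # Each entry has exactly one English form.
    conn.executemany(
        "insert into formtable values (?, ?)",
        [(entry_id, "english") for entry_id, _, _ in entries],
    )
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute(QUERY):
        print(row)
```

On this toy data the query yields `('2', 'DUST', 'dust', 2)` and `('3', 'BRAVE', 'brave', 1)`. If two glosses tie for the maximum count, SQLite takes the bare columns from one of the tied rows, so the choice is arbitrary. This is the non-uniqueness problem mentioned earlier in the thread.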

@LinguList Might it be enough to add this to https://github.com/concepticon/concepticon-cldf/tree/main/doc ?

@xrotwang xrotwang transferred this issue from concepticon/pyconcepticon Mar 15, 2024
2 participants