make a master-elicitation list of most frequent real elicitations per concept set #5

LinguList · 2019-03-01T16:49:59Z

This should be trivial to do, but @Schweikhard was right that it is sometimes confusing, especially for newbies, what a Concepticon gloss means (and the definitions are not always the best either). So instead of having users always check definitions, linked concepts, etc., we could take a very simple approach: use the linked elicitation glosses and select the most frequent one as the "representative" elicitation for a given concept set. This could then also be included in the concepticon mapping command, to make it easier for users to inspect the data. Furthermore, we could even do this for all languages in our sample, so one could ask for French, English, and German elicitation glosses per concept set. And when people do their mapping, they could more easily see what concept we intend.
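To make the selection concrete, here is a minimal Python sketch, not actual pyconcepticon code. The `LINKED_GLOSSES` triples and the `representative_glosses` helper are hypothetical, and the alphabetical tie-break is an assumption too, since nothing above says how ties between equally frequent glosses should be resolved.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (concept set, language, elicitation gloss) triples,
# one per concept list entry linked to the concept set.
LINKED_GLOSSES = [
    ("DUST", "english", "dust"),
    ("DUST", "english", "dust"),
    ("DUST", "english", "dust (on ground)"),
    ("DUST", "german", "Staub"),
    ("GATHER", "english", "gather"),
    ("GATHER", "english", "collect"),
    ("GATHER", "english", "gather"),
    ("GATHER", "french", "cueillir"),
]


def representative_glosses(rows):
    """Map (concept set, language) to its most frequent elicitation gloss."""
    counts = defaultdict(Counter)
    for concept_set, language, gloss in rows:
        counts[(concept_set, language)][gloss] += 1
    # Assumed tie-break: highest count first, then alphabetical order,
    # so the result does not depend on input order.
    return {
        key: min(counter.items(), key=lambda item: (-item[1], item[0]))[0]
        for key, counter in counts.items()
    }


representatives = representative_glosses(LINKED_GLOSSES)
print(representatives[("GATHER", "english")])
```

Keying the result by language as well as concept set means the same lookup would serve French or German columns during mapping.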

xrotwang · 2019-03-03T00:18:59Z

Ok, I see this being useful for the non-English glosses. But for the English one, I'd think the Concepticon gloss either is the most common elicitation gloss or is more helpful?

LinguList · 2019-03-03T05:15:08Z

There may be non-uniqueness problems with "common" glosses.
they could change with releases (more than if we fix them)
making them "more helpful" would indicate information we already have assembled with all glosses, like indirect POS-marking with "to " or "the ", many more brackets, whatever
elicitation glosses are a product of the practice of a linguistic community, but concepticon's original glosses were intended to fulfill uniqueness constraints and to be "standardized" to some extent through strict practice for " or " constructs, etc., so they are not the same as the elicitation practice of English-speaking linguists

So I see many arguments against saying that the concepticon gloss is identical with the English elicitation gloss. They often coincide, also in our practice, but many aspects are different, and I see many disadvantages if we change that, specifically also because people now use those as "id" equivalents, as they can easily memorize them.

xrotwang · 2019-03-03T05:27:15Z

Ok. I guess what I'm struggling with is why "the most common elicitation gloss" should be particularly useful - as opposed to the set of all English elicitation glosses. Or is limiting it to just one elicitation gloss only for pragmatic reasons?

LinguList · 2019-03-03T05:35:10Z

Pragmatic reasons during mapping: you do the automatic concept mapping, and now you provide information for people to quickly check the mapping. The most obvious way would be to add the definition in a column and have people read it. But people don't do that, or they get angry at our definition and stop using concepticon altogether, or take it literally. The best indicators are -- you are right -- all existing mappings. There can just be many of them: https://github.com/clld/concepticon-data/tree/master/concepticondata#twenty-most-diverse-concept-sets. For convenience, during this mapping process, it might simply be useful to allow users to flexibly add more columns with content like "German reflex", etc. When using spreadsheet software, having all mappings would not hurt. When doing the linking with text files, it would. And the general idea of having three different languages represented by their most common elicitation gloss would probably be extremely helpful for identifying the intended meaning. Maybe -- the more I think about it -- this is even more important than the English gloss...

xrotwang · 2019-03-03T05:40:39Z

Ok, makes sense. And indeed, shouldn't be very difficult to implement. Might be worth making this configurable - i.e. the "mapper" can choose additional languages for elicitation glosses?

LinguList · 2019-03-03T06:09:08Z

Yes. We could even distinguish an "example" (the single most frequent gloss for a language) from the full set of glosses.
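As a rough sketch of what such configurable columns might look like: the `gloss_columns` helper and the `<language>_example` / `<language>_all` column names below are made up for illustration and are not an existing pyconcepticon API.

```python
from collections import Counter


def gloss_columns(glosses_by_language, languages=("english",)):
    """Build extra mapping columns for one concept set.

    glosses_by_language maps a language name to the list of elicitation
    glosses linked to the concept set (repetitions included).
    """
    columns = {}
    for language in languages:
        glosses = glosses_by_language.get(language, [])
        # most_common() orders by frequency; ties keep first-seen order.
        ranked = [gloss for gloss, _ in Counter(glosses).most_common()]
        columns[f"{language}_example"] = ranked[0] if ranked else ""
        columns[f"{language}_all"] = " / ".join(ranked)
    return columns


row = gloss_columns(
    {"english": ["gather", "collect", "gather"], "german": ["sammeln"]},
    languages=("english", "german", "french"),
)
print(row)
```

A language without any linked glosses simply yields empty cells, so the mapper can request any combination of languages without special-casing.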

xrotwang · 2024-03-15T07:03:35Z

That's easy enough to do via SQLite from concepticon-cldf:

select
    concepticon_id, concepticon_gloss, gloss, max(c) 
from (
    select 
        p.cldf_id as concepticon_id, 
        p.cldf_name as concepticon_gloss,
        c.cldf_name as gloss,
        count(c.cldf_id) as c 
    from 
        parametertable as p, 
        `concepts.csv` as c,
        formtable as f 
    where 
        c.cldf_parameterreference = p.cldf_id and 
        f.concept_id = c.cldf_id and 
        f.cldf_languagereference = 'english' 
    group by p.cldf_id, c.cldf_name
) as q
group by concepticon_id order by cast(concepticon_id as int);

concepticon_id	concepticon_gloss	gloss	max(c)
0		situation	13
1	CONTEMPTIBLE	contemptible	2
2	DUST	dust	103
3	BRAVE	brave	16
4	COURTYARD	courtyard	6
5	GAZELLE	Grant's gazelle	1
6	EARTHQUAKE	earthquake	14
7	GATHER	gather	9
8	CURSE	curse	5
9	ANNOUNCE	announce	9

@LinguList Might it be enough to add this to https://github.com/concepticon/concepticon-cldf/tree/main/doc ?
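For a quick check without the full dataset, the query can also be tried from Python against a toy in-memory database. This is only a sketch: the table layout is reduced to the columns the query touches, the rows are invented, and the `most_frequent_glosses` helper is hypothetical. The result relies on SQLite's behaviour that bare columns next to a single max() take their values from the row holding the maximum; with ties, which row wins is arbitrary.

```python
import sqlite3

# Toy schema (assumption): only the columns the query touches.
SCHEMA = """
create table parametertable (cldf_id text, cldf_name text);
create table `concepts.csv` (
    cldf_id text, cldf_name text, cldf_parameterreference text);
create table formtable (
    cldf_id text, concept_id text, cldf_languagereference text);
"""

# Same query as above, with the language turned into a bound parameter.
QUERY = """
select concepticon_id, concepticon_gloss, gloss, max(c)
from (
    select
        p.cldf_id as concepticon_id,
        p.cldf_name as concepticon_gloss,
        c.cldf_name as gloss,
        count(c.cldf_id) as c
    from parametertable as p, `concepts.csv` as c, formtable as f
    where
        c.cldf_parameterreference = p.cldf_id and
        f.concept_id = c.cldf_id and
        f.cldf_languagereference = ?
    group by p.cldf_id, c.cldf_name
) as q
group by concepticon_id order by cast(concepticon_id as int)
"""


def most_frequent_glosses(conn, language):
    """Return (id, concepticon gloss, top gloss, count) rows for a language."""
    return conn.execute(QUERY, (language,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "insert into parametertable values (?, ?)",
    [("2", "DUST"), ("7", "GATHER")],
)
conn.executemany(
    "insert into `concepts.csv` values (?, ?, ?)",
    [
        ("L1-1", "dust", "2"), ("L2-1", "dust", "2"),
        ("L3-1", "dust (powder)", "2"),
        ("L1-2", "gather", "7"), ("L2-2", "collect", "7"),
        ("L3-2", "gather", "7"),
    ],
)
conn.executemany(
    "insert into formtable values (?, ?, ?)",
    [
        ("f1", "L1-1", "english"), ("f2", "L2-1", "english"),
        ("f3", "L3-1", "english"), ("f4", "L1-2", "english"),
        ("f5", "L2-2", "english"), ("f6", "L3-2", "english"),
        ("f7", "L3-2", "german"),
    ],
)

for row in most_frequent_glosses(conn, "english"):
    print(*row, sep="\t")
```

Binding the language as a parameter is one way to get the per-language variants discussed earlier in the thread without editing the SQL.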

xrotwang transferred this issue from concepticon/pyconcepticon Mar 15, 2024
