Pagel-2018-200 #433

Merged
merged 9 commits into from
Feb 16, 2018

@chrzyki (Contributor) commented Feb 14, 2018

#421

If a concept was not ambiguous, I went for the most general concept available; if it was ambiguous, I left it unlinked.
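A minimal sketch of this linking convention (unambiguous items linked, ambiguous ones left without a Concepticon ID), assuming rows shaped like the ones in this PR's diff: ID, NUMBER, ENGLISH, an optional CONCEPTICON_ID and CONCEPTICON_GLOSS, and a trailing RATE. The function names `parse_row` and `unlinked` are hypothetical, not part of pyconcepticon, and splitting on whitespace only works because the ENGLISH glosses in these rows are single words (the actual concept list files are tab-separated).

```python
# Sketch: list the items of a concept list that were left unlinked.
# Assumed row shape (as in this PR's diff):
#   ID NUMBER ENGLISH [CONCEPTICON_ID CONCEPTICON_GLOSS...] RATE
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    id: str
    number: int
    english: str
    concepticon_id: Optional[str]
    concepticon_gloss: Optional[str]
    rate: float


def parse_row(line: str) -> Row:
    parts = line.split()
    link = parts[3:-1]  # empty when the concept was left unlinked
    return Row(
        id=parts[0],
        number=int(parts[1]),
        english=parts[2],
        concepticon_id=link[0] if link else None,
        concepticon_gloss=" ".join(link[1:]) if link else None,
        rate=float(parts[-1]),
    )


def unlinked(lines: List[str]) -> List[str]:
    """Return the ENGLISH glosses of rows without a Concepticon ID."""
    return [row.english for row in map(parse_row, lines) if row.concepticon_id is None]


rows = [
    "Pagel-2018-200-5 5 at 1461 AT 0.000316527",
    "Pagel-2018-200-6 6 back 0.000279398",
    "Pagel-2018-200-7 7 bad 1292 BAD 0.000430415",
    "Pagel-2018-200-8 8 bark 0.000266899",
    "Pagel-2018-200-90 90 live 1422 BE ALIVE 7.1041E-05",
]
print(unlinked(rows))  # ['back', 'bark']
```

Multi-word Concepticon glosses such as BE ALIVE are rejoined from the tokens between the ID and the rate.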

@LinguList (Contributor) left a comment

Apparently they use as many as three different lists: one for ABVD, one for IELex, and one for Bantu. They then use subsets, so the 200-item list used here is neither the Indo-European list (207 items) nor the ABVD list (210 items), and their smaller list is yet another one:

We use three published lexical datasets. The IE data comprise the words for 200 meanings in each of 103 languages [3]. The Austronesian data comprise 210 meanings and 400 languages [11], and here we use the 154 meanings with fewer than 200 cognate classes (see electronic supplementary material).

So where do those 200 meanings come from? I think that if they supplement those 154 meanings with some extra information, this is better than 200 meanings where we don't know what they are supposed to mean (we know what they are: basic Swadesh items, but it's not clear how they relate to the paper).

@chrzyki (Contributor, Author) commented Feb 14, 2018

Hm, I see. I should have checked with you first for access to the paper and for feedback on the precise terminology and workflow. Sorry. :) How should I proceed from here?

@LinguList (Contributor)

I guess they say a bit more about this in the supplementary material and the paper, right? So just check what they say there, and if they really give those 200 basic concepts IN their supplementary material, check the overlap with ABVD (Greenhill 2008 210) and IELex (Dunn 2012 207) to see whether it's one of those lists; if not, all the better...

@chrzyki (Contributor, Author) commented Feb 16, 2018

We use three published lexical datasets. The IE data comprise the words for 200 meanings in each of 103 languages [3]. The Austronesian data comprise 210 meanings and 400 languages [11], and here we use the 154 meanings with fewer than 200 cognate classes (see electronic supplementary material). The Bantu data comprise 424 languages and 102 meanings [12]. The meanings in these datasets are taken principally from the Swadesh fundamental vocabulary 200-word list [2]. The raw data for the IE and Austronesian languages are available upon request from the authors of those studies, and for the Bantu they are made available as part of the supplementary information to that paper. Alternatively, the IE data are available at IELex (ielex.mpi.nl) and the Austronesian data are made available in the Austronesian Basic Vocabulary Database (ABVD, language.psy.auckland.ns/austronesian).

Given this, I've compared the S1 list with ABVD and IELex (but it probably needs someone with a better understanding of the lists, like you, to decide what to do with that). There are probably more shared concepts than the numbers suggest, because I didn't correct for entries that most likely refer to the same concept but were spelled or expressed differently.

Hope that helps?

Oh, and I fixed one concept I had missed in the S1 list (liver). I'll squash and merge once we decide what to do with the list.
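A rough sketch of why a naive comparison undercounts, assuming plain lists of English glosses: normalizing glosses before intersecting them catches some entries that refer to the same concept but are written differently. This is not pyconcepticon's `intersection` command (which compares lists via their Concepticon links); `normalize`, `shared`, and the sample glosses are hypothetical, and the second list's spellings are made up for illustration rather than taken from ABVD or IELex.

```python
# Sketch: count shared concepts between two gloss lists after light normalization.
import re
from typing import Iterable, Set


def normalize(gloss: str) -> str:
    g = gloss.lower()
    g = re.sub(r"\(.*?\)", "", g)  # drop qualifiers such as "(of wind)"
    g = re.sub(r"^(to|the|a)\s+", "", g.strip())  # drop a leading "to"/"the"/"a"
    return " ".join(g.split())  # collapse leftover whitespace


def shared(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    return {normalize(x) for x in a} & {normalize(y) for y in b}


s1 = ["ashes", "blow", "fly", "dull", "back"]
# Made-up spelling variants for illustration, not actual ABVD/IELex glosses.
other = ["ash", "to blow (of wind)", "To Fly", "back"]
print(sorted(shared(s1, other)))  # ['back', 'blow', 'fly']
```

Plural forms such as "ashes" vs. "ash" still slip through, which is one reason gloss-based counts understate the overlap and why comparing Concepticon links is the more reliable route.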

@LinguList (Contributor)

Okay, an obvious case. You used the "intersection" command, right? That means you can just write in the description: the 200 concepts come from the SI and constitute a combination of concepts from different lists, namely ABVD (Greenhill-2008-210), IELex (Dunn-2012-207), and the Bantu list (Grollemund-2015-102).

Then, add one sentence on the extra columns and the information in there.

Make sure to check whether it IS Grollemund (didn't find time to check this, as I'm in a meeting now).

Thanks!

@chrzyki (Contributor, Author) commented Feb 16, 2018

Sure, will do. I used my own list-comparison thingie, but only because I forgot about `intersection` in pyconcepticon. :) Sorry, what do you mean by 'extra columns'?

@LinguList (Contributor)

The ones you specified in the JSON, or aren't there any?

@chrzyki (Contributor, Author) commented Feb 16, 2018

Ah, that was separate from the list sources. Yes, there is one measure that I included in the table and in the metadata. I'll specify what it contains in the comment.

@chrzyki (Contributor, Author) commented Feb 16, 2018

Yup, it's Grollemund (2015). Will edit PR momentarily.

Curiously enough (unless I'm missing something), the authors (Pagel 2018) state that the Bantu list has 102 meanings, while from everything I can tell there are only 100?

@LinguList (Contributor)

The original list they took the 100 from has 159 meanings (see the paper), so maybe they took 2 more, but it's more likely a typo (?). Maybe add this to the note, saying: they state that they sampled 102, but to our knowledge there are only 100. I often do this to draw readers' attention to the fact that such inconsistencies are common (compare my note on the Swadesh 215 list).

@chrzyki (Contributor, Author) commented Feb 16, 2018

So, this should be it. I also discovered a problematic bibliography entry that has been there for quite some time but only caused an issue now, probably due to a chain of references? If you don't immediately know what caused this (i.e. why the parser only fails now), I'll open an issue for myself and investigate.

Also: Thankfully we have easy squashing and merging for PRs. :)

@@ -214,3 +214,4 @@ Foley-1986-50 Foley, W. A. 1986 50 basic,areal English New Guinea Foley1986 Fo
Luniewska-2016-299 Łuniewska, Magdalena AND Haman, Ewa AND Armon-Lotem, Sharon AND Etenkowski, Bartłomiej AND Southwood, Frenette AND Anđelković, Darinka AND Blom, Elma AND Boerma, Tessel AND Chiat, Shula AND Engel de Abreu, Pascale AND Gagarina, Natalia AND Gavarró, Anna AND Håkansson, Gisela AND Hickey, Tina AND Jensen de López, Kristine AND Marinis, Theodoros AND Popović, Maša AND Thordardottir, Elin AND Blažienė, Agnė AND Cantú Sánchez, Myriam AND Dabašinskienė, Ineta AND Ege, Pınar AND Ehret, Inger-Anne AND Fritsche, Nelly-Ann AND Gatt, Daniela AND Janssen, Bibi AND Kambanaros, Maria AND Kapalková, Svetlana AND Kronqvist, Bjarke AND Kunnari, Sari AND Levorato, Chiara AND Nenonen, Olga AND Nic Fhlannchadha, Siobhán AND O’Toole, Ciara AND Polišenská, Kamila AND Pomiechowska, Barbara AND Ringblom, Natalia AND Rinker, Tanja AND Roch, Maja AND Savić, Maja AND Slančová, Daniela AND Tsimpli, Ianthi Maria AND Ünal-Logacev, Özlem 2016 299 acquisition Afrikaans, Catalan, Danish, Dutch, English, Finnish, German, Greek, Hebrew, Hungarian, Icelandic, Irish, Xhosa, Italian, Lithuanian, Luxembourgish, Maltese, Polish, Russian, SouthAfricanEnglish, Serbian, Slovak, Spanish, Swedish, Turkish Global https://link.springer.com/article/10.3758%2Fs13428-015-0636-6 Luniewska2016 A list containing 299 items, listing the average age of acquisition for each respective item in 25 languages. 1154-1177
Gampe-2017-48 Gampe, Anja AND Kurthen, Ira AND Daum, Moritz M. 2017 48 acquisition English Global http://journals.sagepub.com/doi/full/10.1177/0142723717736450 Gampe2017 A list that was developed to assess bilingual children's vocabulary (Swiss German native speakers). The MEAN* columns contain information about the average correctness in the lexical decision task for each item. Items are based on [Luniewska 299](:ref:Luniewska-2016-299).
Auracher-2017-16 Auracher, Jan 2017 16 acquisition English Global https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5665516/ Auracher2017 Small list designed for eliciting reaction times to animal pictures. The animal list was included as one experiment (in a total of three different experiments) to differentiate between the reaction times of nonsense words and emotional body postures.
Pagel-2018-200 Pagel, Mark AND Meade, Andrew 2018 200 basic English Global http://rstb.royalsocietypublishing.org/content/373/1740/20160517 Pagel2018 A list designed to serve as a basis for discussing the lexical replacement rates of number words. The list is based on [Blust 210](:ref:Blust-2008-210), [Dunn 207](:ref:Dunn-2012-207), and [Grollemund 100](:ref:Grollemund-2015-100). The RATE columns descrbibes the lexical replacement rate of an item per annum, i.e. its lexical stability. Note that the authors state that they took 102 meanings from [Grollemund 100](:ref:Grollemund-2015-100), whereas, to the best of our knowledge, the list only containts 100 meanings.
@LinguList (Contributor)

I suggest rewording:

The list is based on [Blust 210](:ref:Blust-2008-210), [Dunn 207](:ref:Dunn-2012-207), and [Grollemund 100](:ref:Grollemund-2015-100). The RATE columns descrbibes the lexical replacement rate of an item per annum, i.e. its lexical stability. Note that the authors state that they took 102 meanings from [Grollemund 100](:ref:Grollemund-2015-100), whereas, to the best of our knowledge, the list only containts 100 meanings.		

to

The list is based on the [210-item list of ABVD](:ref:Blust-2008-210), the [207-item list of IELex](:ref:Dunn-2012-207), and the [100-item list by Grollemund et al.](:ref:Grollemund-2015-100). The RATE column describes the lexical replacement rate of an item per year, i.e. its lexical stability. Note that the authors state that they took 102 meanings from [Grollemund et al. (2015)](:bib:Grollemund2015), whereas, to the best of our knowledge, the list only contains 100 meanings.

Also, please provide the link to the SI as the URL, not the link to the paper.
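The RATE column is described as a lexical replacement rate per year. Under the common simplifying assumption of a constant rate (an exponential model of replacement, which we assume here for illustration and do not attribute to the paper's exact method), a rate r corresponds to a probability exp(-r·t) that a word is still unreplaced after t years, and to a half-life of ln 2 / r. The sketch uses three RATE values copied from this PR's diff; `retention` and `half_life` are hypothetical helper names.

```python
# Sketch: read a per-year replacement rate under a constant-rate
# (exponential) model. The model is an assumption for illustration.
import math


def retention(rate_per_year: float, years: float) -> float:
    """Probability that a word is still unreplaced after `years`."""
    return math.exp(-rate_per_year * years)


def half_life(rate_per_year: float) -> float:
    """Years after which a word has been replaced with probability 0.5."""
    return math.log(2) / rate_per_year


# RATE values copied from this PR's diff.
rates = {"ear": 4.94428e-05, "animal": 0.00019532, "bad": 0.000430415}
for gloss, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{gloss}: half-life ~{half_life(rate):,.0f} years, "
          f"P(unreplaced after 1,000 years) ~{retention(rate, 1000):.2f}")
```

Lower rates give longer half-lives, which is the sense in which RATE reflects lexical stability: "ear" comes out as the most stable of the three, "bad" as the least.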

Pagel-2018-200-3 3 animal 619 ANIMAL 0.00019532
Pagel-2018-200-4 4 ashes 646 ASH 0.00022033
Pagel-2018-200-5 5 at 1461 AT 0.000316527
Pagel-2018-200-6 6 back 0.000279398
@LinguList (Contributor)

This is obviously the spine, i.e. BACK in Concepticon; we have this concept.

@chrzyki (Contributor, Author)

It was not a question of whether the concept is available, but rather which one is meant: back (behind), back (of an animal), back (in time), back (in direction).

@LinguList (Contributor)

Obviously the one to which IELex, ABVD, and the Bantu list all link (I suppose it's the body part).

Pagel-2018-200-5 5 at 1461 AT 0.000316527
Pagel-2018-200-6 6 back 0.000279398
Pagel-2018-200-7 7 bad 1292 BAD 0.000430415
Pagel-2018-200-8 8 bark 0.000266899
@LinguList (Contributor)

This is tree bark; we have it (as you can see if you check how the source lists were linked).

@chrzyki (Contributor, Author)

Coming from Swadesh? Will link to tree.

@LinguList (Contributor)

Not to tree, to "bark". Obviously, Swadesh would not want a sound-symbolic word in his basic vocabulary list, right?

@chrzyki (Contributor, Author)

Nah, I meant the tree sense of bark. Sorry about the ambiguity.

Pagel-2018-200-13 13 bite 1403 BITE 0.000344925
Pagel-2018-200-14 14 black 163 BLACK 0.000144405
Pagel-2018-200-15 15 blood 946 BLOOD 0.00014066
Pagel-2018-200-16 16 blow 0.000240553
@LinguList (Contributor)

Check the three original lists; this is "blow (of wind)".

Pagel-2018-200-21 21 cloud 1489 CLOUD 0.000190384
Pagel-2018-200-22 22 cold 1287 COLD 0.000192004
Pagel-2018-200-23 23 come 1446 COME 0.000157998
Pagel-2018-200-24 24 count 0.000210452
@LinguList (Contributor)

This should be in Concepticon.

@chrzyki (Contributor, Author)

count (verb) or count (of something)?

Pagel-2018-200-30 30 dog 2009 DOG 0.000172762
Pagel-2018-200-31 31 drink 1401 DRINK 0.00011108
Pagel-2018-200-32 32 dry 1398 DRY 0.000122853
Pagel-2018-200-33 33 dull 0.000386042
@LinguList (Contributor)

This means "blunt", as in a blunt knife; we have it in Concepticon.

@chrzyki (Contributor, Author)

Coming from Swadesh? Otherwise it's ambiguous: dull (stupid), dull (blunt).

@LinguList (Contributor)

They don't have "dull" as "stupid" in ABVD, right?

@chrzyki (Contributor, Author) commented Feb 16, 2018

Yeah, all the remarks make sense, but I couldn't disambiguate them without knowing the original lists. I'll redo them with that knowledge.

Pagel-2018-200-23 23 come 1446 COME 0.000157998
Pagel-2018-200-24 24 count 0.000210452
Pagel-2018-200-25 25 cut 1432 CUT 0.000247422
Pagel-2018-200-26 26 day 7.73722E-05
@LinguList (Contributor)

Day is the opposite of night (obvious from the original mapping in IELex and ABVD).

@chrzyki (Contributor, Author)

Makes sense

Pagel-2018-200-33 33 dull 0.000386042
Pagel-2018-200-34 34 dust 2 DUST 0.000310687
Pagel-2018-200-35 35 ear 1247 EAR 4.94428E-05
Pagel-2018-200-36 36 earth 0.000214117
@LinguList (Contributor)

earth is probably earth/soil, as it should occur in both ABVD and IELex, I'm sure

Pagel-2018-200-51 51 float 1574 FLOAT 0.000349614
Pagel-2018-200-52 52 flow 2003 FLOW 0.000278764
Pagel-2018-200-53 53 flower 239 FLOWER 0.000123561
Pagel-2018-200-54 54 fly 0.00011732
@LinguList (Contributor)

fly is always the verb in these lists

Pagel-2018-200-86 86 leaf 628 LEAF 0.000226195
Pagel-2018-200-87 87 left 244 LEFT 0.000291936
Pagel-2018-200-88 88 leg 1297 LEG 0.000293917
Pagel-2018-200-89 89 lie 0.000226318
@LinguList (Contributor)

Lie is probably "lie (down)", but checking ABVD and IELex should show this clearly.

Pagel-2018-200-89 89 lie 0.000226318
Pagel-2018-200-90 90 live 1422 BE ALIVE 7.1041E-05
Pagel-2018-200-91 91 liver 1224 LIVER 0.000168724
Pagel-2018-200-92 92 long 0.000127153
@LinguList (Contributor)

long is not linked?

@chrzyki (Contributor, Author)

I could check the original list; otherwise it's ambiguous? long (distance), long (duration), ...

@LinguList (Contributor)

For consistency, as in all cases, you should check the mother list this came from. There we usually have the original meaning, and long, for example, prototypically points to distance.

Pagel-2018-200-114 114 push 1452 PUSH 0.000462207
Pagel-2018-200-115 115 rain 2108 RAINING OR RAIN 0.000119644
Pagel-2018-200-116 116 red 156 RED 0.000199829
Pagel-2018-200-117 117 right 0.000247233
@LinguList (Contributor)

right is obviously "correct", as they have rightside!

Pagel-2018-200-139 139 sleep 1585 SLEEP 0.000159814
Pagel-2018-200-140 140 small 1246 SMALL 0.000220408
Pagel-2018-200-141 141 smell 2124 SMELL 0.000361635
Pagel-2018-200-142 142 smoke 0.000151871
@LinguList (Contributor)

Smoke is smoke (exhaust); they wouldn't use "to smoke" in a basic list.

@LinguList (Contributor) left a comment

Please see my comments below; I think they are easy to correct. As a general rule: if you know list A is derived from list B, and list B has unambiguous mappings, use B to understand A ;)
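The general rule above can be sketched as follows, assuming the parent lists' mappings are available as dictionaries from English gloss to Concepticon link: a derived item gets a link only if every parent list containing that gloss agrees on it; otherwise it stays unlinked. The only mapping used here is the one stated later in this thread (IELex and ABVD both link "long" to 1203 LONG); `propose_link` and the data layout are hypothetical and not a pyconcepticon API.

```python
# Sketch of "use the parent list's mappings to interpret the derived list".
# Data layout and function name are hypothetical, not a pyconcepticon API.
from typing import Dict, Optional

# parent list ID -> {English gloss in the derived list: "CONCEPTICON_ID GLOSS"}
ParentLinks = Dict[str, Dict[str, str]]


def propose_link(gloss: str, parents: ParentLinks) -> Optional[str]:
    """Return the link that all parent lists containing `gloss` agree on.

    Returns None (leave the item unlinked) when no parent list has the
    gloss or when the parent lists disagree.
    """
    links = {mapping[gloss] for mapping in parents.values() if gloss in mapping}
    return links.pop() if len(links) == 1 else None


# Only the mapping stated in this thread: IELex and ABVD both link "long" to LONG.
parents: ParentLinks = {
    "Dunn-2012-207": {"long": "1203 LONG"},
    "Blust-2008-210": {"long": "1203 LONG"},
}
for gloss in ["long", "count"]:
    print(gloss, "->", propose_link(gloss, parents))
# long -> 1203 LONG
# count -> None
```

"count" is simply absent from this toy data, so it stays unlinked; with the real parent lists loaded, the same check would either resolve it or flag a disagreement for manual review.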

@chrzyki (Contributor, Author) commented Feb 16, 2018

Yup. Thanks. I'll keep that in mind for future lists. :)

Pagel-2018-200-90 90 live 1422 BE ALIVE 7.1041E-05
Pagel-2018-200-91 91 liver 1224 LIVER 0.000168724
Pagel-2018-200-92 92 long 0.000127153
Pagel-2018-200-92 92 long 2143 LONG OR TALL 0.000127153
@LinguList (Contributor)

Sorry, but long is still 1203 LONG, not LONG OR TALL (IELex and ABVD both link to LONG).

@chrzyki (Contributor, Author)

You don't have to be sorry, I'm a little bit confused and not paying the attention I should be paying. Sorry about that.

@chrzyki (Contributor, Author) commented Feb 16, 2018

Sorry for taking longer than necessary with that list. Learned a lot, though, and will hopefully be faster with following lists.

@LinguList (Contributor)

That's the spirit. Learning by doing, even if it's the hard way ;) Feel free to merge.

2 participants