StructureDataset glottolog/glottolog: Glottolog database 5.1 as CLDF

CLDF Metadata: cldf-metadata.json

Sources: sources.bib.zip

Comprehensive reference information for the world's languages, especially the lesser known languages

property value
dc:bibliographicCitation Hammarström, Harald & Forkel, Robert & Haspelmath, Martin & Bank, Sebastian. 2024. Glottolog 5.1. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://glottolog.org)
dc:conformsTo CLDF StructureDataset
dc:identifier https://glottolog.org
dc:license https://creativecommons.org/licenses/by/4.0/
dcat:accessURL https://github.com/glottolog/glottolog-cldf
prov:wasDerivedFrom
  1. glottolog/glottolog-cldf v5.0
  2. Glottolog v5.1
prov:wasGeneratedBy
  1. glottolog/pyglottolog 3.14.1.dev0
  2. python: 3.12.3
  3. python-packages: requirements.txt
rdf:ID glottolog
rdf:type http://www.w3.org/ns/dcat#Distribution

Table values.csv

property value
dc:conformsTo CLDF ValueTable
dc:extent 134506

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Language_ID string References languages.csv::ID
Parameter_ID string References parameters.csv::ID
Value string
Code_ID string References codes.csv::ID
Comment string
Source list of string (separated by ;) References sources.bib::BibTeX-key
codeReference string
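
The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, not part of the dataset tooling: the sample rows, IDs, parameter names and source keys are invented for illustration, and only the column names come from the table above.

```python
import csv
import io

# Hypothetical sample in the shape of the values table; all rows are invented.
SAMPLE_VALUES = """ID,Language_ID,Parameter_ID,Value,Code_ID,Comment,Source,codeReference
abcd1234-level,abcd1234,level,language,level-language,,,
abcd1234-status,abcd1234,status,example,status-example,,key1;key2,
"""


def read_values(text):
    """Parse value rows, turning the ';'-separated Source column into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Source"] = [key for key in row["Source"].split(";") if key]
        rows.append(row)
    return rows


values = read_values(SAMPLE_VALUES)

# Group values per languoid: {Language_ID: {Parameter_ID: Value}}
by_language = {}
for v in values:
    by_language.setdefault(v["Language_ID"], {})[v["Parameter_ID"]] = v["Value"]
```

Grouping by `Language_ID` first makes it straightforward to attach the parameter values to the matching row of the language table.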

Table parameters.csv

This table lists parameters (or aspects) of languoids that Glottolog assigns values for, such as the languoid's position in the Glottolog classification or its descriptive status. Refer to the Description column in the table for details, and to the datatype column for information on how values for the parameter should be interpreted.

property value
dc:conformsTo CLDF ParameterTable
dc:extent 7

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Name string
Description string
ColumnSpec json
type string
Valid choices:
categorical sequential other
Describes the domain of the parameter
infoUrl string URL (relative to aboutUrl) of a web page with further information about the parameter
datatype json CSVW datatype description for values for this parameter. I.e. content of the Value column of associated rows in ValueTable should be interpreted/parsed accordingly
Source list of string (separated by ;) Source describing the parameter in detail
References sources.bib::BibTeX-key
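
As an illustration of how the datatype column might be used, the sketch below parses a CSVW datatype description and converts a raw Value string accordingly. It assumes the description is either a bare datatype name or an object with a `base` property; the `CONVERTERS` mapping and the `parse_value` helper are hypothetical and cover only a small subset of CSVW datatypes.

```python
import json
from decimal import Decimal

# Converters for a few CSVW base datatypes; an illustrative subset only.
CONVERTERS = {
    "string": str,
    "integer": int,
    "decimal": Decimal,
    "json": json.loads,
}


def parse_value(value, datatype_json):
    """Convert a raw Value string according to a parameter's datatype description."""
    spec = json.loads(datatype_json)
    # A datatype description may be a bare name ("integer") or an object
    # with a "base" property ({"base": "integer", ...}).
    base = spec if isinstance(spec, str) else spec.get("base", "string")
    # Unknown base types fall back to the unparsed string.
    convert = CONVERTERS.get(base, str)
    return convert(value)
```

Falling back to `str` for unknown base types keeps the raw value available instead of failing on datatypes the sketch does not cover.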

Table codes.csv

property value
dc:conformsTo CLDF CodeTable
dc:extent 29

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Parameter_ID string The parameter or variable the code belongs to.
References parameters.csv::ID
Name string
Description string
numerical_value integer Integer value associated with a code. Implements ordering for ordered parameter domains.
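
Since numerical_value implements ordering for ordered parameter domains, the codes of one parameter can be sorted by it. A minimal sketch, assuming invented sample rows (the IDs, names and numbers below are not taken from the dataset) and skipping codes without a numerical value:

```python
import csv
import io

# Hypothetical rows in the shape of codes.csv; IDs, names and numbers are invented.
SAMPLE_CODES = """ID,Parameter_ID,Name,Description,numerical_value
p-low,p,low,,1
p-high,p,high,,3
p-mid,p,mid,,2
q-other,q,other,,
"""


def ordered_codes(text, parameter_id):
    """Return the codes of one parameter, sorted by their numerical_value."""
    rows = [
        row
        for row in csv.DictReader(io.StringIO(text))
        if row["Parameter_ID"] == parameter_id and row["numerical_value"]
    ]
    return sorted(rows, key=lambda row: int(row["numerical_value"]))


names = [code["Name"] for code in ordered_codes(SAMPLE_CODES, "p")]
```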

Table languages.csv

This table lists all Glottolog languoids, i.e. families, languages and dialects which are nodes in the Glottolog classification, including "non-genealogical" trees (see https://glottolog.org/glottolog/glottologinformation). Thus, assumptions about the properties of a languoid listed here should be made only after including associated information from ValueTable, in particular for languoid level and category. Locations (WGS 84 coordinates) for language groups, i.e. languoids of level "family", are computed as recursive centroids as described at https://pyglottolog.readthedocs.io/en/latest/homelands.html#pyglottolog.homelands.recursive_centroids while locations for dialects are, in most cases, simply inherited from the associated languoids of level "language".

property value
dc:conformsTo CLDF LanguageTable
dc:extent 26953

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Name string
Macroarea list of string (separated by ;)
Latitude decimal
≥ -90
≤ 90
Longitude decimal
≥ -180
≤ 180
Glottocode string
Regex: [a-z0-9]{4}[1-9][0-9]{3}
ISO639P3code string
Regex: [a-z]{3}
Level string
Valid choices:
language dialect family
Glottolog languoid level.
Countries list of string (separated by ;) ISO 3166-1 alpha-2 country codes for countries a language is spoken in.
Family_ID string Glottocode of the top-level genetic unit the languoid belongs to
References languages.csv::ID
Language_ID string Glottocode of the language-level languoid the languoid belongs to (in case of dialects)
References languages.csv::ID
Closest_ISO369P3code string ISO 639-3 code of the languoid or an ancestor if the languoid is a dialect. See also #13
First_Year_Of_Documentation integer The first year that an extinct languoid was documented (in the sense that there is data that pertains to it). Positive numbers are years AD, negative numbers are years BC.
Last_Year_Of_Documentation integer The last year that an extinct language was documented (in the sense that there is data that pertains to it). Positive numbers are years AD, negative numbers are years BC.
Is_Isolate boolean Marks a language-level languoid as an isolate, i.e. as a language with no genetic relationship with other languages.
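
The constraints in the column list above can be checked with a few lines of Python. This is a sketch only: the helper names `is_glottocode` and `format_year` are hypothetical, the regular expression is copied from the Glottocode column, and the year formatting follows the AD/BC convention stated for the documentation-year columns.

```python
import re

# Pattern copied from the Glottocode column description above.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[1-9][0-9]{3}")


def is_glottocode(text):
    """Check that the whole string matches the Glottocode pattern."""
    return GLOTTOCODE.fullmatch(text) is not None


def format_year(year):
    """Render a documentation year: negative numbers are BC, positive are AD."""
    if year < 0:
        return f"{-year} BC"
    return f"{year} AD"
```

`fullmatch` is used because a column regex is meant to match the entire cell value, not a substring of it.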

Table names.csv

Alternative names for Glottolog languoids from various sources.

property value
dc:extent 120134

Columns

Name/Property Datatype Description
ID string Primary key
Language_ID string References languages.csv::ID
Name string
Provider string
lang string

Table trees.csv

property value
dc:conformsTo CLDF TreeTable
dc:extent 247

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Name string Name of tree as used in the tree file, i.e. the tree label in a Nexus file or the 1-based index of the tree in a Newick file
Description string Describe the method that was used to create the tree, etc.
Tree_Is_Rooted boolean
Valid choices:
Yes No
Whether the tree is rooted (Yes) or unrooted (No); null if no information is available
Tree_Type string
Valid choices:
summary sample
Whether the tree is a summary (or consensus) tree, i.e. can be analysed in isolation, or whether it is a sample, resulting from a method that creates multiple trees
Tree_Branch_Length_Unit string
Valid choices:
change substitutions years centuries millennia
The unit used to measure evolutionary time in phylogenetic trees.
Media_ID string References a file containing a Newick representation of the tree, labeled with identifiers as described in the LanguageTable (the Media_Type column of this table should provide enough information to choose the appropriate tool to read the Newick)
References media.csv::ID
Source list of string (separated by ;) References sources.bib::BibTeX-key
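
For the case where Name holds the 1-based index of a tree in a Newick file, the lookup could be sketched as follows. The helper `nth_newick_tree` is hypothetical, and the naive split on `;` assumes that no quoted labels or bracketed comments in the file contain a semicolon; a real Newick parser should be preferred for anything beyond simple files.

```python
def nth_newick_tree(text, index):
    """Return the tree at the given 1-based index from multi-tree Newick text."""
    # Each Newick tree ends with ';'. Splitting on it is naive (see assumption above).
    trees = [part.strip() + ";" for part in text.split(";") if part.strip()]
    if not 1 <= index <= len(trees):
        raise IndexError(f"tree index {index} out of range 1..{len(trees)}")
    return trees[index - 1]


# Invented two-tree example file content.
SAMPLE_NEWICK = "(a,b)c;\n(d,e)f;\n"
second = nth_newick_tree(SAMPLE_NEWICK, 2)
```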

Table media.csv

property value
dc:conformsTo CLDF MediaTable
dc:extent 1

Columns

Name/Property Datatype Description
ID string
Regex: [a-zA-Z0-9_\-]+
Primary key
Name string
Description string
Media_Type string
Regex: [^/]+/.+
Download_URL anyURI
Path_In_Zip string
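
A minimal sketch of reading a media file that is stored inside a zip archive, assuming Path_In_Zip names the archive member to extract. The helper `read_member` and the archive built in the demo are hypothetical; no file names from the dataset are used.

```python
import io
import zipfile


def read_member(zip_bytes, path_in_zip, encoding="utf-8"):
    """Return the decoded text of one member of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(path_in_zip).decode(encoding)


# Build a small in-memory archive for demonstration; the member name is invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("trees/example.nwk", "(a,b)c;")

content = read_member(buffer.getvalue(), "trees/example.nwk")
```

In practice the archive bytes would come from the file referenced by Download_URL, and Media_Type would determine how the extracted content is parsed.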