ontocord/kawa

kawa

Kawa (川), pronounced ka-va, means "river" in Japanese. It is a multilingual lexicon and ontology with word embeddings and entity management and linking, intended to assist in text data mining.

License

  • The source code authored by Ontocord LLC and by contributors to this project is licensed under Apache 2.0.
  • The ontology data is derived from ConceptNet, Yago, WordNet, and Wikiann, and is mostly licensed under Creative Commons (CC) licenses.

Yago

Yago is licensed under CC BY 4.0. https://yago-knowledge.org/

ConceptNet 5 Licensing Info

Below is the licensing information for ConceptNet 5, as provided by its authors. ConceptNet 5 is generally licensed under CC BY SA 4.0 (http://conceptnet.io):

This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 4.0) from http://conceptnet.io.

The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.

Credits and acknowledgements

ConceptNet has been developed by:

  • The MIT Media Lab, through various groups at different times:
      • Commonsense Computing
      • Software Agents
      • Digital Intuition
  • The Commonsense Computing Initiative, a worldwide collaboration with contributions from:
      • National Taiwan University
      • Universidade Federal de São Carlos
      • Hokkaido University
      • Tilburg University
      • Nihon Unisys Labs
      • Dentsu Inc.
      • Kyoto University
      • Yahoo Research Japan
  • Luminoso Technologies, Inc.

Significant amounts of data were imported from:

  • WordNet, a project of Princeton University
  • Open Multilingual WordNet, compiled by Francis Bond and Kyonghee Paik
  • Wikipedia and Wiktionary, collaborative projects of the Wikimedia Foundation
  • Luis von Ahn's "Games with a Purpose"
  • JMDict, compiled by Jim Breen
  • CC-CEDict, by MDBG
  • The Unicode CLDR
  • DBPedia

Here is a short, incomplete list of people who have made significant contributions to the development of ConceptNet as a data resource, roughly in order of appearance:

  • Push Singh
  • Catherine Havasi
  • Hugo Liu
  • Hyemin Chung
  • Robyn Speer
  • Ken Arnold
  • Yen-Ling Kuo
  • Joshua Chin
  • Joanna Lowry-Duda
  • Robert Beaudoin
  • Naoki Otani
  • Vanya Cohen

Licenses for included resources

Commonsense Computing

The Commonsense Computing project originated at the MIT Media Lab and expanded worldwide. Tens of thousands of contributors have taken some time to teach facts to computers. Their pseudonyms can be found in the "sources" list found in ConceptNet's raw data and in its API.

Games with a Purpose

Data collected from Verbosity, one of the CMU "Games with a Purpose", is used and released under ConceptNet's license, by permission from Luis von Ahn and Harshit Surana.

Verbosity players are anonymous, so in the "sources" list, data from Verbosity is simply credited to the pseudonym "verbosity".

Wikimedia projects

ConceptNet uses data directly from Wiktionary, the free dictionary. It also uses data from Wikipedia, the free encyclopedia, via DBPedia.

Wiktionary and Wikipedia are collaborative projects, authored by their respective online communities. They are currently released under the Creative Commons Attribution-ShareAlike license.

Wikimedia encourages giving attribution by providing links to the hosted pages that the data came from, and DBPedia asks for the same thing in turn. In addition to crediting the assertions that came from Wiktionary and DBPedia, we also provide "ExternalURL" edges pointing to the page that they came from. For example, the term /c/de/sprache has an ExternalURL link pointing to http://en.wiktionary.org/wiki/Sprache. Its list of individual contributors can be seen by following its "History" link.

The URLs of links to DBPedia are the same as the resource names that DBPedia uses, encouraging interoperability with their linked data.
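As a rough illustration of the attribution mechanism described above, the following Python sketch splits a term URI such as /c/de/sprache into its language and term, and picks out the "ExternalURL" targets for that term. The flat edge dictionaries, the `/r/ExternalURL` relation string, and both function names are assumptions made for this example; they are not Kawa's API, and ConceptNet's real edge records carry more fields.

```python
def split_term_uri(uri):
    """Split a term URI like '/c/de/sprache' into (language, term)."""
    parts = uri.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a term URI: {uri!r}")
    return parts[1], parts[2]


def external_urls(edges, term_uri):
    """Return the targets of ExternalURL edges that start at term_uri."""
    return [
        edge["end"]
        for edge in edges
        if edge["rel"] == "/r/ExternalURL" and edge["start"] == term_uri
    ]


# Simplified, hand-written edges (assumed shape, for illustration only).
edges = [
    # The example given in the text above.
    {"start": "/c/de/sprache", "rel": "/r/ExternalURL",
     "end": "http://en.wiktionary.org/wiki/Sprache"},
    # A made-up non-attribution edge, included only to show the filtering.
    {"start": "/c/de/sprache", "rel": "/r/RelatedTo", "end": "/c/de/wort"},
]

language, term = split_term_uri("/c/de/sprache")
sources = external_urls(edges, "/c/de/sprache")
```

Here `sources` holds the pages to credit when redistributing assertions about the term, which is the attribution practice the text describes.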

WordNet

WordNet is available under an unencumbered license: see http://wordnet.princeton.edu/wordnet/license/. Its text is reproduced below:

WordNet Release 3.0

This software and database is being provided to you, the LICENSEE, by Princeton University under the following license. By obtaining, using and/or copying this software and database, you agree that you have read, understood, and will comply with these terms and conditions.:

Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution.

WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved.

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same.

Open Multilingual WordNet

Open Multilingual WordNet was compiled by Francis Bond, Kyonghee Paik, and Ryan Foster, from data provided by many multilingual WordNet projects. Here is the complete list of references to the projects that created the data.

Wikiann

The code for wikiann/mmner is licensed under Apache 2.0, and the underlying data is Wikipedia data, which is licensed under CC BY-SA.

See https://github.com/afshinrahimi/mmner and https://huggingface.co/datasets/wikiann.
