This repository brings together several computational resources for lexical-semantic knowledge in Portuguese. It can be seen as a follow-up to the Onto.PT and CONTO.PT projects (http://ontopt.dei.uc.pt/).
The following resources are included:
- Large Portuguese Lexical-Semantic Knowledge Base (PT-LKB), with instances of lexical-semantic relations acquired from ten computational lexical resources: PAPEL (http://www.linguateca.pt/PAPEL/), Dicionário Aberto (dicionario-aberto.net), Wikcionário.PT (https://pt.wiktionary.org), TeP (http://www.nilc.icmc.usp.br/tep2/), OpenThesaurus.PT (http://paginas.fe.up.pt/~arocha/AED1/0607/trabalhos/thesaurus.txt), OpenWordNet-PT (https://github.com/own-pt/openWordnet-PT), PULO (http://wordnet.pt/), Port4Nooj (http://www.linguateca.pt/Repositorio/Port4Nooj/), WordNet.Br (http://www.nilc.icmc.usp.br/wordnetbr/), ConceptNet (http://conceptnet.io/).
- PT-LKB embeddings, word embeddings learned from the structure of the large Portuguese LKB with node2vec.
- TALES, an analogy-like test with lexical-semantic relations for assessing Portuguese word embeddings, with relations acquired from the large Portuguese LKB.
- Analogies (TAP), an adaptation of the LX-4WAnalogiesPT analogy test to the BATS format, also adopted by TALES.
- BATS-PT, a manual translation into Portuguese of the lexicographic portion of the Bigger Analogy Test Set (BATS), covering ten types of lexical-semantic analogies. It can be used for assessing word embeddings and language models.
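
As a rough illustration of how the analogy tests listed above might be used, the sketch below scores a set of word vectors on BATS-style pairs. It rests on several assumptions that are not stated in this repository: that each test line has the form `source<TAB>target1/target2/...`, that the vectors have already been loaded into a plain dictionary, and that analogies are solved with the common 3CosAdd method (the closest word to `b - a + c`). The function names and the toy vectors are hypothetical and exist only to keep the example self-contained.

```python
import math


def parse_bats_line(line):
    """Split an assumed 'source<TAB>target1/target2' line into (source, [targets])."""
    source, targets = line.strip().split("\t")
    return source, targets.split("/")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """3CosAdd: return the word closest to b - a + c, excluding a, b and c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], query))


def accuracy(vectors, pairs):
    """Ask 'a : b :: c : ?' for every ordered pair of distinct entries.

    b is the first listed target of a; a prediction counts as correct
    if it matches any of the targets listed for c.
    """
    correct = total = 0
    for a, a_targets in pairs:
        for c, c_targets in pairs:
            b = a_targets[0]
            if a == c or any(w not in vectors for w in (a, b, c)):
                continue  # skip self-pairs and out-of-vocabulary questions
            total += 1
            if solve_analogy(vectors, a, b, c) in c_targets:
                correct += 1
    return correct / total if total else 0.0


# Toy data, invented for illustration only (not taken from the repository).
vectors = {
    "cão": [1.0, 0.0],
    "animal": [1.0, 1.0],
    "rosa": [3.0, 0.0],
    "flor": [3.0, 1.0],
    "mesa": [0.0, 3.0],
}
pairs = [parse_bats_line(line) for line in ["cão\tanimal", "rosa\tflor/planta"]]

print(accuracy(vectors, pairs))  # prints 1.0
```

With real data, `vectors` would be filled from an embeddings file and `pairs` from one test file per relation. Questions containing out-of-vocabulary words are skipped here, which is one of several possible conventions and affects the reported accuracy.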