CDMTraitSemanticAnalysis project

This project proposes traits for entities and attributes in CDM (Common Data Model) schema documents. The schema documents are included in the project, in the CDM.SchemaDocuments folder.

The proposed traits are found by running NLP analysis on the name and description of every entity.
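Before a name can be analyzed, an identifier such as agingId has to be broken into words. The snippet below is only a minimal sketch of that idea using the standard library; the function name `split_identifier` is hypothetical, and this is not necessarily how attribute_name_analyzer.py does it.

```python
import re

def split_identifier(name):
    """Split a camelCase / PascalCase / snake_case identifier into lowercase words.

    Hypothetical helper for illustration only; not taken from the project.
    """
    # Alternatives, tried in order at each position:
    #   [A-Z]+(?![a-z])  an acronym run such as "ROI" (stops before a capitalized word)
    #   [A-Z]?[a-z]+     a lowercase or capitalized word such as "aging" or "Id"
    #   \d+              a run of digits
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]

print(split_identifier("agingId"))
```

The resulting words can then be passed to whatever stemming or lemmatization step follows.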

The Jaccard index between the sets of generated and sample traits is above 0.7.
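For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below shows the calculation on two small trait lists; the `expected` list and the function name `jaccard_index` are made up for illustration and are not taken from the project's validation data or code.

```python
def jaccard_index(generated, expected):
    """Return |A ∩ B| / |A ∪ B| for two collections of trait names."""
    a, b = set(generated), set(expected)
    if not a and not b:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical data: two shared traits, four distinct traits overall.
generated = ["means.identity", "means.demographic.age", "means.idea.company"]
expected = ["means.identity", "means.demographic.age", "means.idea.organization"]

score = jaccard_index(generated, expected)
print(score)  # 2 / 4 = 0.5
```

A score of 1.0 means the two sets are identical, and 0.0 means they share no traits.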

The project uses both NLTK and spaCy as NLP libraries to tokenize, stem, lemmatize, and run vector-based comparisons of the description sentences. To run it, install the requirements and run main.py, which gives additional instructions.
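To give a feel for what a vector-based comparison of two sentences does, the sketch below computes cosine similarity over simple word-count vectors using only the standard library. This is a deliberately simplified stand-in: it does not use NLTK or spaCy, it is not the project's implementation, and the function names `bag_of_words` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Turn a sentence into a word-count vector (lowercased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)

similarity = cosine_similarity(
    bag_of_words("age of the company"),
    bag_of_words("company age"),
)
print(round(similarity, 3))
```

Word counts only reward exact word overlap. A comparison based on word embeddings can also score related words, such as "company" and "organization", as similar.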

Example:

Attribute name: agingId

Description: Represents the Microsoft's subsidiary age ID that have positive ROI every year.

Proposed traits: ['means.demographic.age', 'means.measurement.age', 'means.identity', 'means.idea.company', 'means.idea.organization', 'means.idea.organization.unit', 'means.identity.company.name']

As the proposed set of traits shows, the analyzer tries to find relevant features in the description while ignoring those that do not help identify the correct traits.

nenad1002/CDM-trait-semantic-analysis
