BangLiu/ConcepT

ConcepT

We include the following resources from our ConcepT system in this repository:

  • UCCM dataset.txt: The UCCM dataset, used to evaluate the performance of our concept mining approach. It contains 10,000 samples.
  • concept_tagging_data.txt: The document tagging dataset, used to evaluate the document tagging accuracy of ConcepT. It contains 11,547 documents with concept tags.
  • topic_concept_instance_taxonomy_data.sample.txt: A sample of the topic-concept-instance taxonomy. It contains 1,000 topic-concept-instance samples from our constructed taxonomy.
  • seed_concept_patterns.txt: The seed string patterns we used for bootstrapping-based concept mining from queries.
  • topics.txt: The pre-defined topic list. It contains the 31 pre-defined topics we used for taxonomy construction.
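
As a quick sanity check after cloning, the sketch below reports which of the listed files are present and how many non-blank lines each one holds. It is only an illustration: the helper names `count_nonempty_lines` and `summarize` are hypothetical, and it assumes the files are UTF-8 text with one record per line, which this README does not state. If that assumption does not hold, the counts will not match the sample counts given above.

```python
from pathlib import Path

# File names exactly as listed in this README.
RESOURCE_FILES = [
    "UCCM dataset.txt",
    "concept_tagging_data.txt",
    "topic_concept_instance_taxonomy_data.sample.txt",
    "seed_concept_patterns.txt",
    "topics.txt",
]


def count_nonempty_lines(path):
    """Return the number of non-blank lines in a text file.

    Assumption: the file is UTF-8 encoded (not confirmed by the README).
    """
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def summarize(repo_dir):
    """Map each listed file name to its non-blank line count, or None if missing."""
    root = Path(repo_dir)
    summary = {}
    for name in RESOURCE_FILES:
        path = root / name
        summary[name] = count_nonempty_lines(path) if path.is_file() else None
    return summary


if __name__ == "__main__":
    # Run from the root of a local clone of the repository.
    for name, count in summarize(".").items():
        print(f"{name}: {count if count is not None else 'not found'}")
```

Comparing the printed counts against the figures above (for example, 31 for topics.txt) is a simple way to test the one-record-per-line assumption before writing a real parser.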

The above datasets are published for research purposes.
