This repository contains files and information about step 3 of Kaphta Architecture: Indexing of Extracted Information, using the R language.

ramongsilva/Indexing-of-extracted-information

Indexing of Extracted Information

This repository contains the files and information for step 3 of the Kaphta Architecture: Indexing of Extracted Information. In this step, the PubMed abstracts annotated in the previous step (Information Extraction) are indexed using the R language. Two kinds of indexation are performed: individual and cross. The individual indexations cover polyphenol, cancer and gene entities; the cross indexations cover polyphenol-cancer and polyphenol-gene entity associations. The files and results of this step are listed below.

For more information about this and the other steps of the Kaphta Architecture, see the corresponding sections of the Kaphta Web Tool, available at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.
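
The two kinds of indexation described above can be illustrated with a minimal base R sketch. It is not the code of indexing-information-extracted-gh.R: the toy data, the column names (`pmid`, `entity`, `type`) and the helper names (`build_index`, `cross_index`) are assumptions made for illustration. The cross index here is built from plain co-occurrence in the same abstract, whereas the actual script may rely on the rule associations recognized in the previous step.

```r
# Toy input: one row per recognized entity mention (hypothetical data and column names).
mentions <- data.frame(
  pmid   = c("101", "101", "102", "102", "103", "103"),
  entity = c("quercetin", "breast cancer", "quercetin", "TP53",
             "resveratrol", "breast cancer"),
  type   = c("polyphenol", "cancer", "polyphenol", "gene",
             "polyphenol", "cancer"),
  stringsAsFactors = FALSE
)

# Individual indexation: an inverted index maps each entity to the
# sorted, de-duplicated identifiers of the abstracts that mention it.
build_index <- function(df) {
  lapply(split(df$pmid, df$entity), function(ids) sort(unique(ids)))
}

# One inverted index per entity type (cancer, gene, polyphenol).
index_by_type <- lapply(split(mentions, mentions$type), build_index)

# Cross indexation (assumed here to mean co-occurrence): for every pair of
# entities, keep the abstracts that appear in both posting lists.
cross_index <- function(index_a, index_b) {
  pairs <- list()
  for (a in names(index_a)) {
    for (b in names(index_b)) {
      shared <- intersect(index_a[[a]], index_b[[b]])
      if (length(shared) > 0) {
        pairs[[paste(a, b, sep = " | ")]] <- shared
      }
    }
  }
  pairs
}

polyphenol_cancer <- cross_index(index_by_type$polyphenol, index_by_type$cancer)
polyphenol_gene   <- cross_index(index_by_type$polyphenol, index_by_type$gene)

print(index_by_type$polyphenol)
print(polyphenol_cancer)
print(polyphenol_gene)
```

Keeping each posting list sorted and free of duplicates makes the intersection step of the cross index straightforward and keeps the output stable between runs.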

Individual and Cross indexations

  • indexing-information-extracted-gh.R: R script that performs the individual and cross indexation of the information extracted from PubMed abstracts about the anticancer activity of polyphenols, using an inverted index.
  • functions.R: script with auxiliary functions. Save this file in the same folder as indexing-information-extracted-gh.R, which requires it to run.
  • db_total_project.db: SQLite database required by the R scripts of all Kaphta Architecture steps. It contains tables with the entity dictionary, the full textual corpus of PubMed abstracts, and the PubMed abstracts classified as positive in the text classification step. Save this file in the same folder as indexing-information-extracted-gh.R, which requires it to run.
  • entities-recognized: folder with the files produced by the NER task of the previous step (Information Extraction), containing the named entities (polyphenols, cancers and genes) recognized in PubMed abstracts. Save this folder, with its files, in the same folder as indexing-information-extracted-gh.R, which requires it for the indexation task.
  • Rule_associations_recognized.rar: compressed archive produced by the AR task of the previous step (Information Extraction), containing the PubMed abstract sentences in which at least one rule from the rules dictionary was recognized. Save this file in the same folder as indexing-information-extracted-gh.R, which requires it for the indexation tasks.
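
Because every item above must sit in the same folder as the main script, a quick check before running it can save a failed run. The following is a minimal sketch, assuming the R working directory is that folder; `required_inputs` and `missing_inputs` are hypothetical names that are not part of the repository.

```r
# Inputs named in the list above; all are expected next to the main script.
required_inputs <- c(
  "indexing-information-extracted-gh.R",
  "functions.R",
  "db_total_project.db",
  "entities-recognized",
  "Rule_associations_recognized.rar"
)

# Return the inputs that are missing from `dir`
# (hypothetical helper, not part of the repository).
missing_inputs <- function(dir = ".", inputs = required_inputs) {
  inputs[!file.exists(file.path(dir, inputs))]
}

missing <- missing_inputs(".")
if (length(missing) > 0) {
  message("Missing from working directory: ", paste(missing, collapse = ", "))
} else {
  message("All inputs found; the indexing script can be run from this folder.")
}
```

If the check reports nothing missing, the script can then be run from the same folder, for example with `source("indexing-information-extracted-gh.R")`.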

Results

The files in the results folder, presented below, contain the results of the individual and cross indexation of PubMed abstracts.

Individual indexation

Cross indexation
