Extracting structural, semantic, scholarly and other relevant features from scholarly articles to explain reproducibility
src/structural_features.py
: Quantitative and qualitative information from the scholarly article.

src/scholarly_features.py
: Scholarly meta information pertaining to the scholarly article.

src/linguistic_features.py
: Linguistic indicators quantifying the language used in the scholarly article.

Structural features capture quantitative and qualitative information about the structure of the scholarly article, such as the number of tables, figures, or algorithms it contains. We developed Python modules that parse the PDF of the scholarly article to extract this information. The features are listed in the table below:
TBA
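
As an illustration of the kind of structural counting described above, here is a minimal sketch. It is not the code in src/structural_features.py: it assumes the PDF has already been converted to plain text by some other tool, and that captions start a line with a label such as "Table 1" or "Fig. 2". The names `CAPTION_RE` and `count_structural_features` and the output keys are hypothetical.

```python
import re
from collections import Counter

# Assumed caption convention: a line that starts with "Table 1", "Figure 2",
# "Fig. 3", or "Algorithm 1". Real articles may use other styles.
CAPTION_RE = re.compile(
    r"^\s*(Table|Figure|Fig\.|Algorithm)\s+(\d+)\b",
    re.IGNORECASE | re.MULTILINE,
)


def count_structural_features(text: str) -> dict:
    """Count distinct numbered tables, figures, and algorithms in plain text."""
    seen = set()
    for kind, number in CAPTION_RE.findall(text):
        label = kind.lower().rstrip(".")
        if label == "fig":
            label = "figure"
        # Store (label, number) pairs so a repeated caption is counted once.
        seen.add((label, int(number)))
    counts = Counter(label for label, _ in seen)
    return {
        "num_tables": counts["table"],
        "num_figures": counts["figure"],
        "num_algorithms": counts["algorithm"],
    }


if __name__ == "__main__":
    sample = (
        "Figure 1: Overview\n"
        "Some text, see Table 1.\n"
        "Table 1: Results\n"
        "Fig. 2 Pipeline\n"
        "Algorithm 1 Training\n"
    )
    print(count_structural_features(sample))
```

Matching only at the start of a line is a heuristic that skips most in-text mentions such as "see Table 1". It can still miscount when a line wrap places such a mention at the start of a line, so the counts should be treated as approximate.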
Scholarly features capture meta information about the scholarly article, such as the number of citations it received on Google Scholar or the number of times its PDF was downloaded from the publisher's website. This information was collected manually for every sample. The features are listed in the table below:
TBA
Linguistic features are indicators that quantify the language used in the scholarly article to articulate the findings of a study. They include polysemy, hypernymy, L2 readability, etc. All of them were collected with the Coh-Metrix web tool \citep{mcnamara2014automated}, using the abstract of the scholarly article as input. The features are listed in the table below:
TBA