Introduction
This project aims to predict the disruptiveness of scientific papers using machine learning techniques. Disruptiveness is measured with the CD5 index, which quantifies the extent to which papers citing a focal paper also cite that paper's own references within a five-year window. A higher CD5 index means citing papers tend to cite the focal paper without citing its references, indicating a more disruptive paper.
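For context, the CD index over a t-year citation window (CD5 uses t = 5) is commonly defined as below; this is the standard formulation of the index, given here as background rather than taken from this project's code:

```latex
% CD_t index for a focal paper (standard formulation; t = 5 gives CD5)
% f_i = 1 if forward-citing work i cites the focal paper, else 0
% b_i = 1 if forward-citing work i cites at least one of the focal paper's references, else 0
% n_t = number of works citing the focal paper or its references within t years
CD_t = \frac{1}{n_t} \sum_{i=1}^{n_t} \left( -2 f_i b_i + f_i \right)
```

A work that cites only the focal paper contributes +1, one that cites both the paper and its references contributes -1, so the index runs from -1 (consolidating) to +1 (disruptive).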
Dataset
The dataset is derived with the Alexandria3k package and contains a 1% sample of the CrossRef publications that have abstracts.
Methodology
Data Preprocessing:
- Remove HTML tags, numbers, and special characters from abstracts and titles.
- Lemmatize words in abstracts and titles.
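A minimal sketch of this cleaning and lemmatization step, assuming the titles and abstracts live in a pandas DataFrame; the column names are placeholders, and NLTK's WordNetLemmatizer stands in for whichever lemmatizer the project actually uses:

```python
import re

import pandas as pd
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # lookup data for the lemmatizer

_lemmatizer = WordNetLemmatizer()

def clean_text(text: str) -> str:
    """Strip HTML tags, digits, and special characters, then lemmatize each token."""
    text = re.sub(r"<[^>]+>", " ", text)       # HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # digits and special characters
    tokens = text.lower().split()
    return " ".join(_lemmatizer.lemmatize(tok) for tok in tokens)

# Hypothetical column names; adapt to the actual dataset schema.
df = pd.DataFrame({"title": ["<b>A 2nd study</b>"], "abstract": ["Cells were studied..."]})
for col in ("title", "abstract"):
    df[col] = df[col].fillna("").map(clean_text)
```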
Feature Engineering:
- Create additional features based on the dataset, such as vocabulary uniqueness, number of authors and references, average reference year, and author affiliations.
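A hedged sketch of how such features might be computed with pandas; the column names (`abstract`, `authors`, `reference_years`) are illustrative assumptions rather than the project's actual schema:

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive simple per-paper features from assumed (hypothetical) columns."""
    out = df.copy()
    # Vocabulary uniqueness: fraction of distinct tokens in the cleaned abstract.
    tokens = out["abstract"].str.split()
    out["vocab_uniqueness"] = tokens.map(lambda t: len(set(t)) / max(len(t), 1))
    out["n_authors"] = out["authors"].map(len)
    # Reference count and average publication year of the cited references.
    out["n_references"] = out["reference_years"].map(len)
    out["avg_reference_year"] = out["reference_years"].map(
        lambda ys: sum(ys) / len(ys) if ys else float("nan")
    )
    return out

# Hypothetical row; real rows would come from the Alexandria3k-derived tables.
sample = pd.DataFrame({
    "abstract": ["cell growth cell model"],
    "authors": [["A. Author", "B. Author"]],
    "reference_years": [[2001, 2005, 2010]],
})
print(add_features(sample))
```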
Model Training:
- Train various machine learning models, including:
  - Decision Tree Regressor
  - XGBoost and LightGBM Regressors
  - Voting and Stacking Ensemble methods
  - Neural Network
- Use cross-validation to evaluate model performance and tune hyperparameters.
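A condensed sketch of the training and cross-validation step, using placeholder data in place of the real feature matrix and CD5 targets; the hyperparameters shown are illustrative defaults rather than the tuned values, and the neural network model is omitted for brevity:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import VotingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Placeholder feature matrix X and CD5 targets y standing in for the real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

base_models = {
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "xgboost": XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
    "lightgbm": LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
}
ensembles = {
    "voting": VotingRegressor(list(base_models.items())),
    "stacking": StackingRegressor(list(base_models.items()), final_estimator=Ridge()),
}

for name, model in {**base_models, **ensembles}.items():
    # 5-fold cross-validation scored with (negated) mean absolute error.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.4f}")
```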
Evaluation:
- Evaluate models using the mean absolute error metric.
- Compare the performance of different models to identify the best-performing one.
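A small sketch of this comparison step, again on placeholder data; a simple held-out split is used here, which may differ from the project's exact evaluation protocol:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

# Placeholder data standing in for the real features and CD5 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "xgboost": XGBRegressor(n_estimators=300, random_state=0),
}
# Fit each model and record its held-out mean absolute error.
results = {
    name: mean_absolute_error(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in models.items()
}
print(results, "best:", min(results, key=results.get))
```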
Results
The results indicate that while the models can predict disruptiveness to some extent, their predictive performance is limited. Likely contributing factors include the inherent difficulty of the task, the small size of the sampled dataset, and the limited expressiveness of the engineered features.
Future Work
Future work could explore the following:
- Incorporate additional features, such as semantic analysis of text.
- Experiment with different machine learning algorithms and techniques.
- Collect a larger and more diverse dataset.
- Investigate the impact of linguistic changes on disruptiveness.