
Skimlit: NLP Model for Sentence Classification in Paper Abstracts

Project Overview 🌐

In this NLP project, I set out to replicate the paper "Neural Networks for Joint Sentence Classification in Medical Paper Abstracts" using the PubMed 200k RCT dataset. Leveraging my NLP skills and knowledge of TensorFlow, I built a model that classifies each sentence of an abstract into its role (e.g., objective, methods, results), making it easier for researchers to skim the literature.

Dependencies 🛠️

Python, NumPy, Matplotlib, Pandas, TensorFlow, TensorFlow Hub, Scikit-learn, Google Colab (GPU runtime)

Data 📊

I utilized the "PubMed 200k RCT dataset" from GitHub. Due to time and resource constraints, I focused on the "20k RCT dataset" subset.
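In the RCT files, each abstract begins with an ID line of the form `###<pmid>`, each sentence line is `LABEL<tab>text`, and a blank line separates abstracts. A minimal parser along those lines (the function name and dict keys here are illustrative, not the project's actual helpers) also derives the line-number and total-lines features the model uses:

```python
def parse_rct_lines(lines):
    """Return one dict per sentence with its label and position features."""
    samples, abstract = [], []
    for line in list(lines) + [""]:          # trailing "" flushes the last abstract
        line = line.strip()
        if line.startswith("###"):           # "###<pmid>" starts a new abstract
            abstract = []
        elif line == "":                     # blank line ends the current abstract
            for i, (label, text) in enumerate(abstract):
                samples.append({"label": label, "text": text,
                                "line_number": i, "total_lines": len(abstract)})
            abstract = []
        else:                                # "LABEL\tsentence text"
            label, _, text = line.partition("\t")
            abstract.append((label, text))
    return samples

demo = ["###123", "OBJECTIVE\tTo test a drug.", "METHODS\tWe randomized patients.", ""]
parsed = parse_rct_lines(demo)
```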

Model Architecture 🏗️


  • Token Inputs:

    • String tokens
    • Token embeddings using a pre-trained embedding layer (e.g., TensorFlow Hub)
    • Dense layer with ReLU activation (128 units)
  • Char Inputs:

    • String characters
    • Character vectorization
    • Character embedding
    • Bidirectional LSTM layer with 32 units
  • Line Numbers Inputs:

    • One-hot vector encoding the sentence's line number (depth 15)
    • Dense layer with ReLU activation (32 units)
  • Total Lines Inputs:

    • One-hot vector encoding the abstract's total line count (depth 20)
    • Dense layer with ReLU activation (32 units)
  • Hybrid Embedding:

    • Concatenation of outputs from the token and char models
    • Dense layer with ReLU activation (256 units)
    • Dropout layer with a dropout rate of 0.5
  • Tribrid Embedding:

    • Concatenation of outputs from line number, total line, and hybrid embeddings
  • Output Layer:

    • Dense layer with softmax activation (5 units) for multi-class classification
  • Overall Model:

    • Constructed by specifying inputs and outputs for line number, total line, token, and char models.
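The branches above can be sketched with the Keras functional API. This is a minimal, self-contained sketch, not the project's exact code: vocabulary sizes, sequence lengths, and the character embedding dimension are assumptions, and the pre-trained TensorFlow Hub token encoder is stood in for by a trainable `Embedding` layer so the example runs without downloads.

```python
import tensorflow as tf
from tensorflow.keras import layers

sample_sents = ["to investigate the effect", "patients were randomized"]

# Token branch (Embedding stands in for the TF Hub pre-trained encoder)
text_vec = layers.TextVectorization(max_tokens=1000, output_sequence_length=55)
text_vec.adapt(sample_sents)
token_inputs = layers.Input(shape=(1,), dtype=tf.string, name="token_inputs")
x = text_vec(token_inputs)
x = layers.Embedding(1000, 128)(x)
x = layers.GlobalAveragePooling1D()(x)
token_out = layers.Dense(128, activation="relu")(x)

# Char branch: vectorize -> embed -> bidirectional LSTM (32 units)
char_vec = layers.TextVectorization(max_tokens=70, split="character",
                                    output_sequence_length=290)
char_vec.adapt(sample_sents)
char_inputs = layers.Input(shape=(1,), dtype=tf.string, name="char_inputs")
y = char_vec(char_inputs)
y = layers.Embedding(70, 25)(y)
char_out = layers.Bidirectional(layers.LSTM(32))(y)

# Positional branches: one-hot line number (depth 15) and total lines (depth 20)
line_in = layers.Input(shape=(15,), name="line_number")
line_out = layers.Dense(32, activation="relu")(line_in)
total_in = layers.Input(shape=(20,), name="total_lines")
total_out = layers.Dense(32, activation="relu")(total_in)

# Hybrid embedding: token + char
hybrid = layers.Concatenate()([token_out, char_out])
hybrid = layers.Dense(256, activation="relu")(hybrid)
hybrid = layers.Dropout(0.5)(hybrid)

# Tribrid embedding: line number + total lines + hybrid
tribrid = layers.Concatenate()([line_out, total_out, hybrid])
outputs = layers.Dense(5, activation="softmax")(tribrid)

model = tf.keras.Model(inputs=[line_in, total_in, token_inputs, char_inputs],
                       outputs=outputs)
```

Note that the dropout layer regularizes only the hybrid text embedding; the positional features join afterwards, so their signal reaches the softmax undiluted.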

Training 🚀

I trained a TF-IDF Multinomial Naive Bayes model as a baseline. Subsequently, I trained the deep learning model described in the architecture above.
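A TF-IDF Multinomial Naive Bayes baseline like the one described can be expressed as a scikit-learn pipeline. The sentences and labels below are toy placeholders, not dataset excerpts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-ins for the 20k RCT training split
sentences = [
    "to investigate the effect of drug x",
    "patients were randomly assigned to two groups",
    "the drug significantly reduced mortality",
    "we aimed to compare treatment outcomes",
    "groups were followed for twelve months",
    "results showed a significant improvement",
]
labels = ["OBJECTIVE", "METHODS", "RESULTS",
          "OBJECTIVE", "METHODS", "RESULTS"]

baseline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", MultinomialNB())])
baseline.fit(sentences, labels)
preds = baseline.predict(["we sought to evaluate a new therapy"])
```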

Evaluation 📈

The baseline and the deep model are compared on standard classification metrics: accuracy, precision, recall, and F1 score.
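Those metrics can be computed with scikit-learn; the label arrays below are illustrative, not actual results:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth and predicted labels
y_true = ["OBJECTIVE", "METHODS", "RESULTS", "METHODS"]
y_pred = ["OBJECTIVE", "METHODS", "METHODS", "METHODS"]

accuracy = accuracy_score(y_true, y_pred)          # 3 of 4 correct -> 0.75
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
```

Weighted averaging is a reasonable choice here because the five sentence classes in the RCT data are imbalanced.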

Results 🎉

Baseline vs. Deep Model

Usage 🚀

To run the code:

  1. Clone the repository: git clone https://github.com/mouraffa/Skimlit-NLP-Model-for-Sentence-Classification-in-Paper-Abstracts.git
  2. Open the notebook in Google Colab: Skimlit_NLP_Model_for_Sentence_Classification_in_Paper_Abstracts.ipynb
  3. Follow the instructions within the notebook to execute the code.

Feel free to reach out for questions or collaboration! 🤝