SciDigest

As research fields grow, more papers are published every year. Researchers need to keep up with a large number of journals, but reading through hundreds of unstructured abstracts is time-consuming and irritating.

SciDigest is a deep learning model that structures abstracts to help researchers save time when reading scientific journals. The model takes an unstructured abstract as input and turns it into a structured abstract.

Example 1

Input:

We classify four qubit states under SLOCC operations, that is, we classify the orbits of the group on the Hilbert space . We approach the classification by realising this representation as a symmetric space of maximal rank. We first describe general methods for classifying the orbits of such a space. We then apply these methods to obtain the orbits in our special case, resulting in a complete and irredundant classification of -orbits on . It follows that an element of is conjugate to an element of precisely 87 classes of elements. Each of these classes either consists of one element or of a parameterised family of elements, and the elements in the same class all have equal stabiliser in . We also present a complete and irredundant classification of elements and stabilisers up to the action of where Sym4 permutes the four tensor factors of .

Output:

BACKGROUND

We classify four qubit states under SLOCC operations, that is, we classify the orbits of the group on the Hilbert space. We approach the classification by realising this representation as a symmetric space of maximal rank. We first describe general methods for classifying the orbits of such a space. We then apply these methods to obtain the orbits in our special case, resulting in a complete and irredundant classification of -orbits on.

RESULTS

It follows that an element of is conjugate to an element of precisely 87 classes of elements.

METHODS

Each of these classes either consists of one element or of a parameterised family of elements, and the elements in the same class all have equal stabiliser in .

RESULTS

We also present a complete and irredundant classification of elements and stabilisers up to the action of where Sym4 permutes the four tensor factors of .

Example 2

Input:

The aim of this paper is to map the scientific landscape related to cancer research worldwide between 2012 and 2017. We use scientific publication data from Web of Science Core Collection and combine bibliometrics and social network analysis techniques to identify the most relevant journals, research areas, countries and research organizations in cancer scientific landscape. The results show: Oncotarget as the journal with most publications; a significant increase in China’s publications, reaching United States’ publications in 2017; MD Cancer Center, University of California and Harvard University as organizations with most publications; cell biology as the most frequent research area; breast, lung and colorectal cancer as the most frequent keywords; high density of co-authorship between organizations in the West, especially in the US, and low density between organizations in Asian and lower and medium income countries. Our findings can be used to guide a global knowledge platform guiding policy, planning and funding decisions as well as to establish new institutional collaborations.

Output:

BACKGROUND

The aim of this paper is to map the scientific landscape related to cancer research worldwide between 2012 and 2017. We use scientific publication data from Web of Science Core Collection and combine bibliometrics and social network analysis techniques to identify the most relevant journals, research areas, countries and research organizations in cancer scientific landscape.

RESULTS

The results show: Oncotarget as the journal with most publications; a significant increase in China’s publications, reaching United States’ publications in 2017; MD Cancer Center, University of California and Harvard University as organizations with most publications; cell biology as the most frequent research area; breast, lung and colorectal cancer as the most frequent keywords; high density of co-authorship between organizations in the West, especially in the US, and low density between organizations in Asian and lower and medium income countries.

CONCLUSIONS

Our findings can be used to guide a global knowledge platform guiding policy, planning and funding decisions as well as to establish new institutional collaborations.
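The outputs in the two examples can be read as per-sentence labels where consecutive sentences with the same label are merged under one heading (which is why RESULTS appears twice in Example 1). The sketch below illustrates only that assembly step. It is not the code from the repository: the function names `structure_abstract` and `render` are hypothetical, and the labels are hard-coded stand-ins for model predictions.

```python
from itertools import groupby


def structure_abstract(sentences, labels):
    """Merge consecutive sentences that share a label into (label, text) sections.

    `labels` stands in for the per-sentence predictions of a classifier;
    here they are supplied by hand (assumption, for illustration only).
    """
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    sections = []
    pairs = zip(labels, sentences)
    # groupby only merges *adjacent* items, so a label can appear more than once.
    for label, group in groupby(pairs, key=lambda pair: pair[0]):
        text = " ".join(sentence for _, sentence in group)
        sections.append((label, text))
    return sections


def render(sections):
    """Format sections as a heading followed by its text, separated by blank lines."""
    return "\n\n".join(f"{label}\n\n{text}" for label, text in sections)


if __name__ == "__main__":
    sentences = [
        "We classify the orbits.",
        "There are precisely 87 classes.",
        "Each class is one element or a family.",
        "We also present a classification of stabilisers.",
    ]
    labels = ["BACKGROUND", "RESULTS", "METHODS", "RESULTS"]
    print(render(structure_abstract(sentences, labels)))
```

Splitting the abstract into sentences and predicting the labels are separate steps that this sketch does not cover.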

SciDigest is trained on the PubMed 200k RCT dataset.
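For readers unfamiliar with the dataset, here is a minimal parsing sketch. It assumes the plain-text layout of the public release as I understand it: each abstract starts with a `###` line carrying an ID, followed by one `LABEL<TAB>sentence` line per sentence and a blank line between abstracts. The function name `parse_pubmed_rct` and the sample text are made up for illustration, and the preprocessing notebook in this repository may do this differently.

```python
def parse_pubmed_rct(lines):
    """Parse lines in the assumed PubMed RCT text layout into a list of abstracts.

    Each abstract is a dict with an "id" and a list of (label, sentence) pairs.
    """
    abstracts = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("###"):
            # A new abstract begins; flush the previous one if it was not closed.
            if current is not None:
                abstracts.append(current)
            current = {"id": line[3:], "sentences": []}
        elif not line:
            # A blank line closes the current abstract.
            if current is not None:
                abstracts.append(current)
                current = None
        elif current is not None:
            label, _, text = line.partition("\t")
            current["sentences"].append((label, text))
    if current is not None:
        abstracts.append(current)
    return abstracts


# Made-up sample in the assumed layout (not real dataset content).
SAMPLE = """###0001
OBJECTIVE\tTo test a made-up treatment .
METHODS\tWe ran a made-up trial .
RESULTS\tThe made-up outcome improved .

###0002
BACKGROUND\tA second made-up abstract .
CONCLUSIONS\tIt ends here .
"""

if __name__ == "__main__":
    for abstract in parse_pubmed_rct(SAMPLE.splitlines()):
        print(abstract["id"], [label for label, _ in abstract["sentences"]])
```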

Parts of the model architecture are based on:

Note: No source code is provided with the paper, so everything here is my own take on the paper's explanation.

Goal

The goal of this project:

Replicate the model architecture in Paper 2
Beat the F1-score of the model in Paper 1, which is 91.6

Model Architecture from Paper 2

My Implementation

Aside from the details that the paper does not explain, which I decided on myself, here are a few changes:

My implementation of label optimization
Using Adam optimizer instead of SGD
Using a custom-trained embedding layer instead of GloVe

Results

The best model's F1-score is 0.890 (89.0%), which is 0.026 (2.6 points) lower than the target of 0.916 (91.6%), so this model did not reach the goal. Several things could be improved with more time and computing power:

Adding some batch normalization layers
Adapting embedding layers on all the training sentences
Experimenting with the learning rate
Experimenting with the batch size
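To make the comparison concrete, the sketch below computes a support-weighted F1-score in plain Python and reproduces the gap between the reported score and the target. The weighted averaging is an assumption on my part: this README does not state which averaging the reported scores use, and the function name `weighted_f1` is hypothetical. The evaluation notebook may compute the metric differently.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support[label] / total
    return score


# Gap between the target (91.6) and the best model (89.0), in percentage points.
gap = round(91.6 - 89.0, 1)

if __name__ == "__main__":
    # Toy labels, only to exercise the function.
    y_true = ["METHODS", "METHODS", "RESULTS", "RESULTS"]
    y_pred = ["METHODS", "RESULTS", "RESULTS", "RESULTS"]
    print(f"weighted F1 on toy data: {weighted_f1(y_true, y_pred):.4f}")
    print(f"gap to target: {gap} points")
```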

Usage

Refer to Usage.ipynb.

1. Download the model (best_model.keras)
2. Download the structurizer module (structurizer_module.py)
3. Call the preprocess_and_strucurizer function from the module, passing it the model from step 1 and the abstract to structure

Justin-Jonany/SciDigest
