Skip to content

Data Science for Social Good - Deutsche Krebsgesellschaft

Notifications You must be signed in to change notification settings

meiradania/dssg-dkg

Repository files navigation

Data Science for Social Good - Deutsche Krebsgesellschaft

To run the notebooks, anaconda installation is recommended: includes jupyter notebook, python 2.7 and all libraries except gensim: https://docs.continuum.io/anaconda/install

Python Notebooks Description:

  • 1_parsing_xml.ipynb - read xml exported from Reference Manager into dataframe.

  • 2_features_extraction.ipynb - transform full title string into numerical representation, by removing stopwords, lowering case and converting into dictionary.

  • 3_features_transformation.ipynb - transform documents from the dictionary vector representation into TF-IDF representation, and save as numpy array format.

  • 4_classification.ipynb - use TF-IDF representation of full title to classify documents into useful/not useful. Algorithms: logistic regression and k-means.

Python libraries used:

About

Data Science for Social Good - Deutsche Krebsgesellschaft

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published