rlnrbio/WikipediaClassification

WikipediaClassification

This repository contains the code for a trial Wikipedia article quality classification project, 09/2021

Idea

The idea behind this project was to build a proof of concept and to analyze how well conventional text classification algorithms and neural networks can evaluate the quality of Wikipedia articles automatically. As training and evaluation data, it uses articles that have been manually curated and assigned the "good article" badge by Wikipedia editors. The project is an example implementation of a simple, standalone pipeline that covers data creation, curation, cleaning, and analysis. The report included in this repository describes the data sources, data processing, and analysis in detail and can serve as a basis for further improvements to conventional text classification models.
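
To make the notion of a "conventional text classification algorithm" concrete, here is a minimal sketch of one such approach: a multinomial Naive Bayes classifier over word counts, written in plain Python. It is an illustration only, not the method used in this repository. The class name `NaiveBayesTextClassifier`, the labels `"good"` and `"other"`, and the toy training snippets are all assumptions made up for this example. The models actually evaluated are described in the included report.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration; not part of the repository.
    """

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is the union of all words seen during training.
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def log_scores(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up snippets standing in for curated article text (assumption).
train_texts = [
    "The article cites reliable sources and is well structured with references",
    "Comprehensive coverage with neutral tone and many references and citations",
    "this is a stub lol needs more stuff",
    "short stub without sources needs expansion",
]
train_labels = ["good", "good", "other", "other"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = model.predict("well structured article with reliable references")
print(prediction)
```

A real pipeline would train on many full articles and evaluate on held-out data. The sketch only shows the shape of the fit and predict steps that any such classifier shares.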

Code
