sleighsoft/seminar-knowledge-mining

 
 


Seminar Knowledge Mining


Wikimedia image classification and suggestions for article authors.

Set up instructions

Unix

  1. Install these dependencies by using your system's package manager if you don't have them already.

    Dependency   Apt                Pacman              Homebrew
    Python 3     python3            python
    Cython       cython3            cython
    Pip          python3-pip        python-pip
    Virtualenv   virtualenv         python-virtualenv
    Fortran      gfortran           gcc-fortran
    Blas         libblas-dev        blas
    Lapack       liblapack-dev      lapack
    PNG          libpng-dev         libpng
    JPEG         libjpeg8-dev       libjpeg-turbo
    Freetype     libfreetype6-dev   freetype2
    Cairo        libcairo2-dev      cairo
    FFI          libffi-dev
  2. Create a virtual environment inside the repository root by running virtualenv . (or, if you have multiple Python versions, virtualenv -p python3 .)

  3. Activate your virtual environment using source bin/activate. The repository name should now appear in front of your shell prompt.

  4. Install dependencies inside your virtual environment

     pip install -r requirements.txt
    
  5. Install OpenCV 3.0 with bindings for Python 3 by running

     chmod +x tool/setup-opencv.sh
     tool/setup-opencv.sh
    
  6. UTF-8 is required, so you may need to add these lines to your ~/.bash_profile and apply the changes with source ~/.bash_profile.

     export LC_ALL=en_US.UTF-8
     export LANG=en_US.UTF-8
    
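To confirm that the locale settings from step 6 took effect, a small check like the following can be run inside the activated environment. This is a minimal sketch, not part of the repository: the helper name is_utf8 is hypothetical, and it relies only on the standard codecs and locale modules.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name):
    """Return True if encoding_name is an alias of UTF-8 (hypothetical helper)."""
    try:
        # codecs.lookup normalizes aliases such as "UTF-8", "utf8" or "UTF8".
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    preferred = locale.getpreferredencoding()
    status = "ok" if is_utf8(preferred) else "not UTF-8, check LC_ALL and LANG"
    print("preferred encoding:", preferred, "->", status)
    print("stdout encoding:", sys.stdout.encoding)
```

If the first line does not report ok, the exports above were probably not applied to the current shell session.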

Windows

  1. Create a virtual environment inside the repository root by running virtualenv . (or, if you have multiple Python versions, virtualenv -p C:\Python34\python.exe .)

  2. Activate your virtual environment using Scripts\activate. The repository name should now appear in front of your shell prompt.

  3. Download these dependencies. If in doubt, use the second-to-last link in each list. Run pip install <path-to-file> on each downloaded file.

  4. Install remaining dependencies inside your virtual environment using pip install -r requirements.txt.

Workflows

Data set

  1. Download DBpedia dump
  2. Extract list of image names
  3. Fetch image and meta data of random entries
  4. Manually label data
  5. Balance the number of images per class
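
Step 5 can be illustrated with simple undersampling: every class is reduced to the size of the smallest one. This is only a sketch under assumptions. The (image name, label) pair format, the function name balance_classes and the choice of undersampling are hypothetical, and the repository may balance its data differently.

```python
import random
from collections import defaultdict


def balance_classes(samples, seed=0):
    """Undersample so that every class keeps as many items as the smallest class.

    samples is assumed to be a list of (image_name, label) pairs.
    """
    by_label = defaultdict(list)
    for name, label in samples:
        by_label[label].append(name)
    if not by_label:
        return []

    # The smallest class determines how many items each class may keep.
    target = min(len(names) for names in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    balanced = []
    for label in sorted(by_label):
        for name in rng.sample(by_label[label], target):
            balanced.append((name, label))
    return balanced
```

Undersampling discards labeled data from the larger classes, so collecting more samples for the rare classes is an alternative when labeling effort allows it.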

Training

  1. Preprocess data set
  2. Extract image and text based features
  3. Train classifier
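
To make the last training step concrete, here is a tiny nearest-centroid classifier in pure Python. It is a stand-in chosen for brevity, not the classifier this project uses (the README does not name one). The function names and the representation of features as lists of numbers are assumptions.

```python
from collections import defaultdict


def train_nearest_centroid(features, labels):
    """Return a dict mapping each label to the mean of its feature vectors."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid has the smallest squared distance."""
    def squared_distance(label):
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vector))

    return min(centroids, key=squared_distance)
```

In this sketch, the output of step 2 would be passed as features, with the manually assigned classes from the data set as labels.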

Image classification

  1. Get user search term
  2. Query DBpedia for related images based on description
  3. Fetch image and meta data of first results
  4. Extract image and text based features
  5. Use trained classifier to predict class
  6. Store results in DBpedia
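
Steps 1 to 5 of this workflow can be sketched as a small pipeline that receives each stage as a callable. All names here (suggest_images, search, fetch, extract, predict, limit) are hypothetical placeholders rather than functions from this repository, and step 6 (storing results in DBpedia) is deliberately left out.

```python
def suggest_images(term, search, fetch, extract, predict, limit=5):
    """Classify the first `limit` images found for a search term.

    search(term)             -> list of image names (step 2)
    fetch(name)              -> (image, metadata) (step 3)
    extract(image, metadata) -> features (step 4)
    predict(features)        -> class label (step 5)
    """
    results = []
    for name in search(term)[:limit]:
        image, metadata = fetch(name)
        features = extract(image, metadata)
        results.append((name, predict(features)))
    return results
```

Passing the stages in as arguments keeps the sketch independent of how DBpedia is queried or which classifier was trained, and makes it easy to try out with stub functions.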

Usage

Pass -h to any of the scripts listed below to get a comprehensive list of its parameters.

fetch_[source].py files can be used to download images for a test/training set from [source] to test the classifier.

extraction.py can be used to extract textual and visual features from images.

classifier.py can be used to classify images.

performance.py trains different classifiers and measures their individual performance.

evaluation.py can be used to measure the performance of individual features.
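
For readers unfamiliar with the -h convention: Python's standard argparse module generates that help text automatically from the declared parameters. The sketch below uses a made-up script with made-up parameters (directory, --limit). It shows only the mechanism and is not taken from any of the scripts above.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical script; the parameters are illustrative."""
    parser = argparse.ArgumentParser(description="Hypothetical example script")
    parser.add_argument("directory", help="folder containing the images")
    parser.add_argument(
        "--limit",
        type=int,
        default=100,
        help="maximum number of images to process",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("directory:", args.directory, "limit:", args.limit)
```

Running such a script with -h prints the usage line and the help string of every parameter, then exits without doing any work.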
