There are three steps to extract collocations:

- Preprocessing
  - Removing stopwords
  - Stemming and lemmatization
- Extracting the bigrams and trigrams
- Generating collocations with the following association measures (see the sketch after this list):
  - Raw frequency
  - PMI (pointwise mutual information)
  - T-test
  - Chi-square
  - Likelihood ratio
  - Poisson-Stirling
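
These measures map directly onto NLTK's association measures, so a minimal sketch of this step could look like the following, assuming NLTK is the underlying library; the token list is a placeholder for the preprocessed corpus, and `TrigramCollocationFinder` works the same way for trigrams:

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Placeholder tokens; in the real pipeline these come from the preprocessing step.
tokens = "the supreme court ruled that the supreme court had jurisdiction".split()

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # ignore bigrams that occur fewer than 2 times

# Each association measure listed above is available on the measures object.
for name, score_fn in [
    ("raw frequency", measures.raw_freq),
    ("PMI", measures.pmi),
    ("t-test", measures.student_t),
    ("chi-square", measures.chi_sq),
    ("likelihood ratio", measures.likelihood_ratio),
    ("Poisson-Stirling", measures.poisson_stirling),
]:
    print(name, finder.nbest(score_fn, 3))  # top 3 bigrams per measure
```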
There are three steps to build the classifier:

- Preprocessing
  - Removing stopwords
  - Tokenization
  - Stemming
- Classifier
- Evaluation
In more detail, the preprocessing step works as follows (a minimal sketch is shown after the list):

- Dataset words are converted to lowercase
- Punctuation marks are removed from the dataset words
- The dataset words are tokenized, with filtering options such as stopword removal and regex patterns
- Stemming is applied to the dataset words
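
A minimal sketch of these steps, assuming NLTK for the Turkish stopword list and the `snowballstemmer` package for Turkish stemming; both library choices are assumptions, not necessarily what the project uses:

```python
import re
import string

import nltk
import snowballstemmer
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)        # fetch NLTK's stopword lists once
stop_words = set(stopwords.words("turkish"))  # NLTK ships a Turkish stopword list
stemmer = snowballstemmer.stemmer("turkish")  # Snowball provides a Turkish stemmer

def preprocess(text):
    text = text.lower()                                                # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = re.findall(r"\w+", text)                                  # regex tokenization
    tokens = [t for t in tokens if t not in stop_words]                # remove stopwords
    return stemmer.stemWords(tokens)                                   # stemming

print(preprocess("Mahkeme, sanığın cezalandırılmasına karar verdi."))
```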
The dataset is then prepared and written to disk (a sketch of this round-trip follows the list):

- Files from the dataset are read
- Labels are created according to the dataset
- A data.json file is created that holds the chosen labels
- Write data into a CSV file
  - If no previously created training set is available, the CSV file is created
- Read the train set from CSV
  - If a previously created CSV file is available, it is read, and the "Suç" (crime) and "İçtihat" (case law) fields are prepared as lists
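
A sketch of the JSON-to-CSV round-trip, assuming pandas; data.json and the "Suç"/"İçtihat" fields come from the description above, while the CSV file name and the exact schema (a list of records) are assumptions:

```python
import json
import os

import pandas as pd

CSV_PATH = "train_set.csv"  # hypothetical file name

if not os.path.exists(CSV_PATH):
    # No training set has been created before: build the CSV from data.json,
    # assumed here to be a list of {"Suç": ..., "İçtihat": ...} records.
    with open("data.json", encoding="utf-8") as f:
        records = json.load(f)
    pd.DataFrame(records).to_csv(CSV_PATH, index=False)

# A previously created CSV is available: read it and prepare the columns as lists.
df = pd.read_csv(CSV_PATH)
crimes = df["Suç"].tolist()
precedents = df["İçtihat"].tolist()
```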
Next, the data is split and vectorized (sketch below):

- Split the dataset
  - The common 80/20 approach is used: 80% of the data for the training set and 20% for the test set
- Vectorize
  - A TF-IDF matrix is built from the text; the vectorizer is fit on the training set and then applied to the test set
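
A sketch of the split and vectorization with scikit-learn (an assumed dependency); the texts and labels are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Placeholder data; in the pipeline these are the "İçtihat" texts and "Suç" labels.
texts = ["case text one", "case text two", "case text three",
         "case text four", "case text five"]
labels = ["theft", "fraud", "theft", "fraud", "theft"]

# 80% of the data for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Fit TF-IDF on the training split only, then apply it to the test split.
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
```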
The classifier step trains and compares the following models (a sketch of the scikit-learn models follows the list):

- Support Vector Machines (specifically, a linear SVM)
- Multinomial Naive Bayes
- Logistic Regression
- FastText
- LSTM
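
Continuing the sketch above, the three scikit-learn models can be trained and evaluated on the TF-IDF features like this; FastText has its own training API (shown after the label-preparation list below), and the LSTM is omitted here:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

models = {
    "linear SVM": LinearSVC(),
    "multinomial naive Bayes": MultinomialNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)  # TF-IDF features from the sketch above
    predictions = model.predict(X_test_tfidf)
    print(name, accuracy_score(y_test, predictions))
```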
For FastText, the labels are prepared separately (sketch below):

- The dataset is taken from the previous iteration
- Labels are created according to the dataset
- Label names are concatenated with underscores to prevent ambiguity, so multi-word names become a single token
- The `__label__` tag is added to the labels for model creation
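
A sketch of this label format with the official `fasttext` bindings (an assumed dependency); the label names, texts, and file name are placeholders:

```python
import fasttext  # official fasttext Python bindings

def to_fasttext_line(label, text):
    # Multi-word label names are joined with underscores to prevent ambiguity,
    # then prefixed with the __label__ tag that FastText's supervised mode expects.
    return "__label__" + label.replace(" ", "_") + " " + text

# Hypothetical file name and placeholder rows.
with open("fasttext_train.txt", "w", encoding="utf-8") as f:
    f.write(to_fasttext_line("some crime", "sample case text") + "\n")
    f.write(to_fasttext_line("another crime", "another case text") + "\n")

model = fasttext.train_supervised(input="fasttext_train.txt")
print(model.predict("a new case text"))
```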
Thanks goes to these wonderful people (emoji key):

- Anıl Şenay
- Bilgehan Geçici
- Kürşat Açıkgöz
- Beyza
- Ahmet Önkol
- Ahmet Elburuz Gürbüz
This project follows the all-contributors specification. Contributions of any kind welcome!