GitHub - SpellOnYou/korean-sarcasm: Construct text corpus data and corresponding model for automatic sarcasm detection on korean.

Kocasm: Korean automatic sarcasm detection

Why this name? Kocasm is a blend of "Korean" and "sarcasm".

Why is irony detection important?

Sarcasm inverts or distorts the literal meaning of a sentence, so detecting it is closely tied to sentiment classification.

Preparing the data

The data was gathered as HTML from Twitter.
Each example carries a label of 1 or 0.
- label 1: sarcasm; label 0: randomly gathered Korean tweets
Tweets returned by queries for hashtags such as 역설, 아무말, 운수좋은날, 笑, 뭐래 아닙니다, 그럴리없다, 어그로, irony, sarcastic, and sarcasm were labeled as sarcastic (label 1), so the labels are still noisy.
The dataset was then pre-processed by (1) anonymizing users, (2) removing hashtags, and (3) removing URLs.
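
The three pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the regular expressions, the `<user>` placeholder token, and the function name `preprocess` are all assumptions, and "removing hashtags" is interpreted here as dropping the whole tag (which also keeps the query hashtags from leaking the label).

```python
import re

# Assumed patterns; the repository's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet: str, user_token: str = "<user>") -> str:
    """Remove URLs, anonymize @mentions, drop hashtags, collapse whitespace."""
    text = URL_RE.sub(" ", tweet)            # (3) remove URLs first, so '#' or '@' inside a URL is not touched
    text = MENTION_RE.sub(user_token, text)  # (1) replace user handles with a placeholder
    text = HASHTAG_RE.sub(" ", text)         # (2) drop hashtags, including the query tags
    return " ".join(text.split())


if __name__ == "__main__":
    # Invented example tweet, for illustration only.
    print(preprocess("@someone 날씨 참 좋네 #운수좋은날 https://t.co/abc"))
```

In Python 3, `\w` matches Hangul and CJK characters for `str` patterns, so Korean hashtags are covered without extra flags.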

If you have any other questions about the corpus, please contact me:
- jiwon.kim.096@gmail.com

To compare with other datasets, see: [English]

ghosh: This English dataset was collected by Aniruddha Ghosh and Tony Veale. See their repository and paper, "Fracking Sarcasm using Neural Network".

Language Model (still being edited)

bag_of_words.py: basic Bayesian model
dl_models.py: model classes for a general transformer
tf_attention_models.py: TensorFlow attentive RNN model
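
To give a feel for what a basic Bayesian bag-of-words baseline looks like, here is a minimal multinomial naive Bayes sketch in plain Python. It is not the contents of bag_of_words.py: the class name `NaiveBayes`, the whitespace tokenization, the add-one smoothing, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.split():
                if word in self.vocab:             # skip tokens never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented toy data: 1 = sarcasm, 0 = not sarcasm.
    texts = ["참 잘했네 정말", "오늘 날씨 좋다", "정말 대단하네 참", "점심 먹었다 오늘"]
    labels = [1, 0, 1, 0]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("정말 참 잘했네"))
```

Korean is agglutinative, so a real baseline would probably use a morphological analyzer or subword tokens rather than whitespace splitting.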

I was strongly inspired by MirunaPislar's code and referred to it heavily, but I tried to make my code more Pythonic and closer to PyTorch style. The code is still being modified.
Kocasm is compatible with Python 2.7-3.7.

To use your own data, clone this repository and...

export DATA_DIR=/path/to/data
export PREP_DIR=/path/to/preprocess
export SAVE_DIR=/path/to/save

python tf_attention_models.py \
    --mode train \
    --model_cfg config/attention_base.json \
    --data_file $DATA_DIR/jiwon/train.csv \
    --test_file $DATA_DIR/jiwon/test.csv \
    --pretrain_file $BERT_PRETRAIN \
    --vocab $PREP_DIR/vocab.txt \
    --save_dir $SAVE_DIR \
    --max_len 128
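
The command above expects a train.csv and a test.csv under the data directory. The README does not document their layout, so the following is only a sketch of one way to produce such a pair from labeled examples. The `text` and `label` column names, the 80/20 split, and the helper name `write_split` are assumptions; check the loading code before relying on them.

```python
import csv
import random
from pathlib import Path


def write_split(rows, out_dir, test_ratio=0.2, seed=0):
    """Shuffle (text, label) rows and write train.csv and test.csv into out_dir.

    Returns the number of rows written to (train, test).
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    n_test = int(len(rows) * test_ratio)
    splits = {"test.csv": rows[:n_test], "train.csv": rows[n_test:]}

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in splits.items():
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text", "label"])  # assumed header, not confirmed by the source
            writer.writerows(subset)
    return len(rows) - n_test, n_test
```

For example, `write_split(rows, "/path/to/data/jiwon")` would write both files into the directory that the `--data_file` and `--test_file` arguments point to.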

Citation

If you find this dataset useful, please cite it as:

@misc{kim2019kocasm,
  author = {Kim, Jiwon and Cho, Won Ik},
  title = {Kocasm: Korean Automatic Sarcasm Detection},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SpellOnYou/korean-sarcasm}}
}

See also

Linguistics and computer science work related to sarcasm

Kaggle - Twitter Irony Detection
