This repository contains:
(1) Code to extract discourse markers from Wikipedia (TSA).
(2) Code to extract significant discourse markers from predictions over a sample
Evaluation code:
Installation
Using pip:
pip install git+ssh://git@github.com/IBM/tslm-discourse-markers.git#egg=tslm-discourse-markers
Alternatively, you can first clone the code, and install the requirements:
1. git clone git@github.com:IBM/tslm-discourse-markers.git
2. cd tslm-discourse-markers
3. pip install -r requirements.txt
You also need to download the fastText language identification model:
curl https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -o ~/Downloads/lid.176.bin
and the spaCy English model:
python -m spacy download en_core_web_sm
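As a quick sanity check after the download, the following minimal sketch tests whether the fastText file is present at the location the curl command writes to. The helper name `fasttext_model_path` is hypothetical, and whether the repository code reads the model from this exact path is an assumption.

```python
from pathlib import Path


def fasttext_model_path(home=None):
    """Return <home>/Downloads/lid.176.bin, the target of the curl command."""
    # Hypothetical helper: `home` defaults to the current user's home directory.
    base = Path(home) if home is not None else Path.home()
    return base / "Downloads" / "lid.176.bin"


model_path = fasttext_model_path()
if model_path.is_file():
    print(f"Found fastText model at {model_path}")
else:
    print(f"fastText model not found at {model_path}; re-run the curl command")
```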
Running
If you are using tslm-discourse-markers in a publication, please cite the following paper:
Liat Ein-Dor, Ilya Shnayderman, Artem Spector, Lena Dankin, Ranit Aharonov and Noam Slonim. 2022. Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis. AAAI-2022.
The SenDM model can be found at: https://huggingface.co/ibm/tslm-discourse-markers
import datasets
# Load the train/dev/test splits from the compressed CSV shards
directory = 'dataset/WIKI_ENGLISH'
dataset = datasets.load_dataset('csv', data_files={folder: [f'{directory}/{folder}/{folder}_*.csv.gz'] for folder in ['train', 'dev', 'test']})
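To make the `data_files` argument easier to inspect, here is a small sketch that builds the same split-to-pattern mapping as the call above and prints it. The helper name `build_data_files` is hypothetical and not part of this repository; the directory layout is taken from the snippet above.

```python
def build_data_files(directory, splits=("train", "dev", "test")):
    """Map each split name to the glob pattern matching its .csv.gz shards."""
    # Hypothetical helper: mirrors the dict comprehension in the loading call.
    return {split: [f"{directory}/{split}/{split}_*.csv.gz"] for split in splits}


data_files = build_data_files("dataset/WIKI_ENGLISH")
for split, patterns in data_files.items():
    print(split, patterns)
```

The resulting dictionary can be passed as `data_files` to `datasets.load_dataset('csv', ...)`, which should produce one split per key.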
This project welcomes external contributions. If you would like to contribute, please see the further instructions here.
Pull requests are very welcome! Make sure your patches are well tested. Ideally create a topic branch for every separate change you make. For example:
- Fork the repo
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Major changes are documented here.
If you have any questions or issues you can create a new issue here.
This code is distributed under the Apache License 2.0. If you would like to see the detailed LICENSE, click here.
The YASO dataset was collected by Liat Ein-Dor, Ilya Shnayderman, Artem Spector, Lena Dankin, Ranit Aharonov and Noam Slonim.
The code was written by Ilya Shnayderman.