BioMAISx

This repository releases the BioMAISx (Biotechnology: Media, Agriculture, Investment, (and) Sentiment Excerpts) dataset annotated for Aspect-Based Sentiment Analysis (ABSA). It includes all code required for collecting and processing the raw data used for annotation, details on how the data was annotated, and code for post-processing the annotated data.

The dataset is available as a CSV here. See here for the polarity distribution per aspect category.
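As a rough illustration of how a polarity distribution per aspect category could be tallied from the CSV, here is a minimal sketch. The column names aspect_category and polarity, and the sample rows, are assumptions made for the example; check the header of the released CSV and adjust them.

```python
import csv
import io
from collections import Counter, defaultdict


def polarity_distribution(csv_file, aspect_col="aspect_category", polarity_col="polarity"):
    """Count polarity labels within each aspect category.

    aspect_col and polarity_col are hypothetical column names, not
    confirmed against the released dataset.
    """
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row[aspect_col]][row[polarity_col]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {aspect: dict(c) for aspect, c in counts.items()}


# Made-up rows for illustration only; pass an open file handle for real data.
sample = io.StringIO(
    "aspect_category,polarity\n"
    "agriculture,positive\n"
    "agriculture,negative\n"
    "investment,positive\n"
)
print(polarity_distribution(sample))
```

With the real file, the same function can be called on open("path/to/dataset.csv", newline="", encoding="utf-8"), where the path is whatever location you saved the CSV to.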

A Zenodo link will be made available later.

Examples of preparing and using this data to train ABSA models are located in tutorials.

Collecting Data

The quotes in this corpus were drawn from raw articles sourced from Factiva. To reproduce the collection step, you need paid access to Factiva articles, along with a user key and CID. Set the key and CID as environment variables named FACTIVA_USER_KEY and FACTIVA_CID, respectively, then run python scripts/download-source.py
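Before running the download script, it can help to confirm that both variables are actually set. The sketch below shows one way to read and validate them in Python. The helper name load_factiva_credentials is hypothetical and is not taken from scripts/download-source.py; only the two variable names come from this README.

```python
import os


def load_factiva_credentials(env=None):
    """Return (user_key, cid) read from the environment.

    Hypothetical helper for illustration; raises RuntimeError naming any
    variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [
        name for name in ("FACTIVA_USER_KEY", "FACTIVA_CID") if not env.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["FACTIVA_USER_KEY"], env["FACTIVA_CID"]
```

Failing early with a message that names the missing variable is usually easier to debug than an authentication error returned partway through a download.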

Preprocessing Data

From the raw text data, we kept articles containing specific key terms, extracted quotations from those articles, and then kept only the quotations containing terms from the desired lexicon.
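The three filtering steps can be sketched as follows. This is a simplified illustration, not the logic in scripts/preprocess-source.py: it assumes quotations are delimited by double quotes and that term matching is case-insensitive on whole words or phrases. The function names and the example terms are hypothetical.

```python
import re

# Text between straight or curly double quotes (a deliberate simplification).
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def contains_term(text, terms):
    """True if any term occurs in text as a whole word or phrase, ignoring case."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in terms
    )


def extract_lexicon_quotes(articles, keyterms, lexicon):
    """Keep articles with a key term, then keep their quotes with a lexicon term."""
    quotes = []
    for article in articles:
        if not contains_term(article, keyterms):
            continue  # step 1: drop articles without any key term
        for quote in QUOTE_RE.findall(article):  # step 2: pull out quotations
            if contains_term(quote, lexicon):  # step 3: keep lexicon matches
                quotes.append(quote.strip())
    return quotes


# Made-up inputs for illustration only.
articles = [
    'Farmers debated GMO crops. "Gene editing will help yields," she said. '
    '"The weather was nice," he added.',
    'A sports story. "Gene editing is great," he said.',
]
print(extract_lexicon_quotes(articles, keyterms=["GMO"], lexicon=["gene editing"]))
```

Real news text has nested quotes, unbalanced quotation marks, and quotes that span paragraphs, so a production extractor would need more than a single regular expression.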

The quotes were then reformatted for annotation with Label Studio, and proposed entities (noun chunks) were extracted using spaCy. The code for this transformation is in scripts/preprocess-source.py
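To make the reformatting step concrete, the sketch below wraps one quote and its proposed entity spans in a Label Studio task with a pre-annotation. It is an assumption about what such a task might look like, not the output of scripts/preprocess-source.py: the control names "label" and "text" and the label "ENTITY" are hypothetical and must match whatever labeling configuration the project uses. The character spans would typically come from spaCy, for example (chunk.start_char, chunk.end_char) for each chunk in doc.noun_chunks; they are passed in directly here so the example runs without a language model installed.

```python
import json


def to_label_studio_task(quote, chunk_spans, from_name="label", to_name="text"):
    """Build one Label Studio task dict with noun-chunk spans as a pre-annotation.

    chunk_spans is an iterable of (start_char, end_char) offsets into quote.
    from_name, to_name, and the "ENTITY" label are hypothetical placeholders.
    """
    results = [
        {
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {
                "start": start,
                "end": end,
                "text": quote[start:end],
                "labels": ["ENTITY"],
            },
        }
        for start, end in chunk_spans
    ]
    return {"data": {"text": quote}, "predictions": [{"result": results}]}


# Made-up quote and offsets for illustration only.
task = to_label_studio_task("Gene editing helps farmers", [(0, 12), (19, 26)])
print(json.dumps(task, indent=2))
```

Pre-filling the spans this way lets annotators confirm or correct proposed entities instead of marking each one from scratch.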

Annotating

Relevant information and code for annotation are included in annotation/README.md

uchicago-dsi/BioMAISx
