RePro: A Benchmark Dataset for Opinion Mining in Brazilian Portuguese

RePro, which stands for "REview of PROducts," is a benchmark dataset for opinion mining in Brazilian Portuguese. It consists of 10,000 human-annotated e-commerce product reviews, each labeled with sentiment and topic information. The dataset was built from data from one of the largest Brazilian e-commerce platforms, the same platform that produced the B2W-Reviews01 dataset (https://github.com/americanas-tech/b2w-reviews01). RePro is intended as a resource for sentiment analysis and topic modeling on Brazilian Portuguese e-commerce product reviews, and as a benchmark for future research in natural language processing and related fields.

Licensing

RePro is available at https://github.com/lucasnil/repro and https://huggingface.co/datasets/lucasnil/repro under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/). Under this license, licensees may copy, distribute, display, and build derivative works and remixes from the dataset only if they give credit to B2W Digital in the manner specified in https://github.com/americanas-tech/b2w-reviews01/blob/main/b2wreviews01_stil2019.pdf. Licensees may distribute derivative works only under a license identical to (“not more restrictive” than) the license that governs the original work. Finally, all of these uses are permitted for non-commercial purposes only. We emphasize that commercial use of models, AI systems, or any other content derived from this corpus, including fine-tuned models, is strictly prohibited.
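For readers who want to fetch the Hugging Face copy programmatically, the following is a minimal sketch and not an official loading script. It assumes the third-party `datasets` library is installed and that the repository loads with its default configuration. The identifier `lucasnil/repro` is taken from the Hugging Face URL above, and `load_repro` is a hypothetical helper name. Split and column names are not documented in this README, so the sketch only prints what the library reports.

```python
# Minimal sketch: load RePro from the Hugging Face Hub.
# Assumption: `pip install datasets` has been run and the repository
# loads with its default configuration (not confirmed by this README).

REPO_ID = "lucasnil/repro"  # from https://huggingface.co/datasets/lucasnil/repro


def load_repro(repo_id: str = REPO_ID):
    """Download the dataset and return whatever object the library yields.

    Hypothetical helper, not part of the RePro repository.
    """
    # Imported here so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    corpus = load_repro()
    # Printing shows the available splits, column names, and row counts.
    print(corpus)
```

Any model trained or fine-tuned on data loaded this way remains subject to the non-commercial terms described above.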

Citation

If you use or reference this dataset, please cite the following publication:

@inproceedings{dos2024repro,
  title={RePro: a benchmark for Opinion Mining for Brazilian Portuguese},
  author={dos Santos Silva, Lucas Nildaimon and Real, Livy and Zandavalle, Ana Claudia Bianchini and Rodrigues, Carolina Francisco Gadelha and da Silva Gama, Tatiana and Souza, Fernando Guedes and Zaidan, Phillipe Derwich Silva},
  booktitle={Proceedings of the 16th International Conference on Computational Processing of Portuguese},
  pages={432--440},
  year={2024}
}
