This repository gathers supplementary material about MCCD (Methodology for Creating Conversational Datasets).
To validate the methodology, we built a tool that follows its recommendations.
Two datasets were created with MCCD and Miner-XenForo and are ready to use.
Files description

Dataset from forum.adrenaline.com.br:

File | Size | Description |
---|---|---|
clear_cache.csv | 76 MB | Summary with thread sizes and message lengths. |
conversations.zip | 71 GB | Cleaned and processed dataset with identified conversations. |
conversations_min.zip | 31 MB | Conversation samples. |
forum.adrenaline.com.br.zip | 2.0 GB | Unprocessed dataset. |

Dataset from forum.outerspace.com.br:

File | Size | Description |
---|---|---|
clear_cache.csv | 137 MB | Summary with thread sizes and message lengths. |
conversations.zip | 5.4 GB | Cleaned and processed dataset with identified conversations. |
conversations_min.zip | 29 MB | Conversation samples. |
forum.outerspace.com.br.zip | 4.5 GB | Unprocessed dataset. |
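
This README does not document the internal layout of the archives or the columns of clear_cache.csv. For a first look after downloading, here is a minimal sketch that uses only the Python standard library to list the files inside an archive and to report a CSV file's header and row count without assuming any schema. The helper names `list_archive` and `describe_csv` are hypothetical and are not part of MCCD or Miner-XenForo. The sketch also assumes the CSV is UTF-8 encoded.

```python
import csv
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive.

    Directory entries are skipped; nothing is extracted to disk.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def describe_csv(path):
    """Return the header row and the number of data rows of a CSV file.

    No column layout is assumed: the header is whatever the file contains.
    The UTF-8 encoding is an assumption about how the file was written.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # Count the remaining rows one at a time so large files are never fully loaded.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `list_archive("conversations_min.zip")` lists the sample conversation files, and `describe_csv("clear_cache.csv")` shows the summary's real column names before any further analysis. Because the CSV is read row by row, memory use stays flat even for the 137 MB summary file.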
Sanches, M.; C. de Sá, J.; M. de Souza, A.; Silva, D.; R. de Souza, R.; Reis, J. and Villas, L. (2022). MCCD: Generating Human Natural Language Conversational Datasets. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-569-2; ISSN 2184-4992, pages 247-255. DOI: 10.5220/0011077400003179
author={Matheus Sanches. and Jader {C. de Sá}. and Allan {M. de Souza}. and Diego Silva. and Rafael {R. de Souza}. and Julio Reis. and Leandro Villas.},
title={MCCD: Generating Human Natural Language Conversational Datasets},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS,},
year={2022},
pages={247-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011077400003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}```