MCCD - Methodology for Creating Conversational Datasets

This repository gathers supplementary information about MCCD (Methodology for Creating Conversational Datasets).

Tool: Miner-XenForo

To validate the methodology, we built a tool, Miner-XenForo, that follows its recommendations.

Access here.

Datasets

Two datasets were created using MCCD and Miner-XenForo and are ready to be used.

Access here.

Adrenaline

File descriptions

| File | Size | Description |
| --- | --- | --- |
| clear_cache.csv | 76 MB | Summary with thread size and message lengths. |
| conversations.zip | 71 GB | Clean and processed dataset with identified conversations. |
| conversations_min.zip | 31 MB | Conversation samples. |
| forum.adrenaline.com.br.zip | 2.0 GB | Unprocessed dataset. |

Access here.

OuterSpace

File descriptions

| File | Size | Description |
| --- | --- | --- |
| clear_cache.csv | 137 MB | Summary with thread size and message lengths. |
| conversations.zip | 5.4 GB | Clean and processed dataset with identified conversations. |
| conversations_min.zip | 29 MB | Conversation samples. |
| forum.outerspace.com.br.zip | 4.5 GB | Unprocessed dataset. |

Access here.
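The internal layout of the archives and the column names of clear_cache.csv are not documented in this README, so a reasonable first step after downloading is to inspect the files before loading them in full. The sketch below is a minimal example under that assumption: the helper names `list_archive` and `preview_csv` are illustrative and not part of Miner-XenForo, and UTF-8 encoding for the CSV is a guess.

```python
import csv
import zipfile
from itertools import islice


def list_archive(zip_path, limit=10):
    """Return up to `limit` (name, uncompressed size in bytes) pairs.

    Only the archive index is read, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size)
                for info in islice(archive.infolist(), limit)]


def preview_csv(csv_path, rows=5, encoding="utf-8"):
    """Return the first `rows` rows of a CSV file as lists of strings.

    The file is read lazily, so large summaries are not loaded whole.
    The UTF-8 default is an assumption; pass another encoding if needed.
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        return list(islice(csv.reader(handle), rows))
```

For example, `list_archive("conversations_min.zip")` on a local copy of the sample archive shows how the conversations are stored, and `preview_csv("clear_cache.csv")` reveals the header row. Both paths are hypothetical and depend on where the files were saved. Starting with conversations_min.zip keeps the download small compared with the full conversations.zip.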

Citation

Sanches, M.; C. de Sá, J.; M. de Souza, A.; Silva, D.; R. de Souza, R.; Reis, J. and Villas, L. (2022). MCCD: Generating Human Natural Language Conversational Datasets. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-569-2; ISSN 2184-4992, pages 247-255. DOI: 10.5220/0011077400003179

```
@conference{iceis22,
author={Matheus Sanches and Jader {C. de Sá} and Allan {M. de Souza} and Diego Silva and Rafael {R. de Souza} and Julio Reis and Leandro Villas},
title={MCCD: Generating Human Natural Language Conversational Datasets},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2022},
pages={247-255},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011077400003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}
```
