Switchboard Dialog Act Corpus added under datasets/swda
#1678
Really cool, thank you!
I left a few comments.
After changing the feature type to ClassLabel, you'll need to regenerate the dataset_infos.json file:
datasets-cli test ./datasets/swda --save_infos --all_configs --ignore_verifications
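The list of label names belongs to the feature definition that is written into dataset_infos.json, which is why the file has to be regenerated after the switch. In the loading script the column would be declared with `datasets.ClassLabel(names=[...])`. The snippet below is a stdlib-only sketch of what such a column does conceptually: it stores small integer ids and maps them back to tag names. `TagLabels` is a hypothetical stand-in, not the library class, and the four tags are an illustrative subset rather than the full SwDA label set.

```python
from typing import Dict, List, Sequence


class TagLabels:
    """Hypothetical stand-in for datasets.ClassLabel(names=...); not the library class."""

    def __init__(self, names: Sequence[str]) -> None:
        if len(set(names)) != len(names):
            raise ValueError("tag names must be unique")
        self.names: List[str] = list(names)
        # Position in the list is the integer id stored in the dataset.
        self._index: Dict[str, int] = {name: i for i, name in enumerate(self.names)}

    @property
    def num_classes(self) -> int:
        return len(self.names)

    def str2int(self, name: str) -> int:
        # Tags must be declared up front; unknown tags are rejected.
        if name not in self._index:
            raise ValueError(f"unknown tag: {name!r}")
        return self._index[name]

    def int2str(self, value: int) -> str:
        return self.names[value]


# Illustrative subset of SWBD-DAMSL-style tags, not the loader's real label list.
act_tags = TagLabels(["sd", "b", "sv", "qy"])
encoded = [act_tags.str2int(t) for t in ["sd", "b", "sd", "qy"]]
print(encoded)  # [0, 1, 0, 3]
print([act_tags.int2str(i) for i in encoded])  # ['sd', 'b', 'sd', 'qy']
```

Because the id is just the position in `names`, reordering or extending the list changes the encoding, so the label list should be fixed before the infos file is regenerated.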
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@lhoestq Thank you for your detailed comments! I fixed everything you suggested. Please let me know if I'm missing anything else.
Thanks!
It looks like the Transcript and Utterance objects are missing. Maybe we can mention it in the README? Or just add them? @gmihaila @bhavitvyamalik
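One way to think about adding them: a Transcript groups the utterances of one conversation, and a dataset example is most naturally one utterance that keeps a reference to its conversation. The sketch below does not reproduce the original corpus classes; `Utterance`, `Transcript`, `to_rows` and all field names are hypothetical, trimmed-down stand-ins used only to show the flattening step, and the conversation id and text are made up.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    # Hypothetical subset of fields; the real corpus objects carry many more attributes.
    caller: str
    act_tag: str
    text: str
    utterance_index: int


@dataclass
class Transcript:
    # Hypothetical container: one conversation and its utterances in order.
    conversation_no: int
    utterances: List[Utterance] = field(default_factory=list)


def to_rows(transcript: Transcript) -> List[Dict[str, object]]:
    """Flatten one transcript into one dict per utterance, keeping the conversation id."""
    rows = []
    for utt in transcript.utterances:
        row = asdict(utt)
        row["conversation_no"] = transcript.conversation_no
        rows.append(row)
    return rows


t = Transcript(
    conversation_no=1000,  # made-up id for illustration
    utterances=[
        Utterance("A", "qy", "Do you have pets?", 1),
        Utterance("B", "b", "Uh-huh.", 2),
    ],
)
rows = to_rows(t)
print(len(rows), rows[0]["conversation_no"], rows[1]["act_tag"])  # 2 1000 b
```

Keeping the conversation id on every row lets transcript-level information be recovered later by grouping, without nesting whole transcripts inside each example.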
Hi @lhoestq,
@lhoestq Any info on how to add them?
@gmihaila, instead of using the current repo you should look into this. You can use the Almost all the attributes of
@bhavitvyamalik Thank you for the clarification! I didn't use that because it doesn't have the splits. I think combining it with what I used would help. Let me know if I can help! I can make those changes if you don't have the time.
I'm a bit busy for the next two weeks, so I won't be able to complete it before the end of January. Maybe you can start on it and I'll help you?
Yes, I can start working on it and ask you for a code review. And yes, not all files are there. I'll try to find papers that use the correct and complete splits; if I can't, I'll do as you suggested. Thank you again for your help @bhavitvyamalik!
Switchboard Dialog Act Corpus
Intro:
The Switchboard Dialog Act Corpus (SwDA) extends the Switchboard-1 Telephone Speech Corpus, Release 2, with turn/utterance-level dialog-act tags. The tags summarize syntactic, semantic, and pragmatic information about the associated turn. The SwDA project was undertaken at the University of Colorado Boulder in the late 1990s.
Details:
homepage
repo
I believe this is an important dataset to have, since no dialogue-act dataset has been added yet.
I didn't find a template for pull requests, so I hope this information is enough.
If you need anything else, please contact me.