Added metadata and correct splits for swda. #1749
Nice, thank you!
I left a few comments.
Also, it looks like the dummy_data.zip file is quite big (1MB). Could you try to reduce its size, please?
To do so, take a look inside the zip file: you should see many unused csv files in the swda.zip directory. You can remove all of them except the ones listed in the test/dev/train txt files. For example, if `3994` appears in `train_split.txt`, then you should keep the `sw_1319_3994.utt.csv` file.
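In case it helps, the pruning could be sketched roughly like this. It is only a sketch under assumptions: it assumes each split file lists one conversation ID per line (such as 3994) and that the transcripts follow the `sw_<number>_<id>.utt.csv` pattern shown in the example; `read_split_ids` and `prune_zip` are hypothetical helper names, not part of the dataset script.

```python
import re
import zipfile

# Matches transcript names such as "sw_1319_3994.utt.csv"; the captured
# group is assumed to be the conversation ID listed in the split files.
UTT_NAME = re.compile(r"sw_\d+_(\d+)\.utt\.csv$")


def read_split_ids(text):
    """Return the set of non-empty lines (conversation IDs) in a split file's text."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def prune_zip(src_path, dst_path, keep_ids):
    """Copy src_path to dst_path, skipping .utt.csv files whose ID is not in keep_ids.

    Entries that do not look like transcripts (split files, metadata, folders)
    are copied unchanged.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            match = UTT_NAME.search(info.filename)
            if match and match.group(1) not in keep_ids:
                continue  # unused transcript: leave it out of the new archive
            dst.writestr(info, src.read(info.filename))
```

The set passed as `keep_ids` would be the union of the IDs read from the train, dev and test split files, so that only the transcripts actually referenced by the dummy data remain.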
I will push updates tomorrow.
@lhoestq thank you for your comments! I went ahead and fixed the code 😃. Please let me know if I missed anything.
Thanks for the changes !
Looks all good now :)
* 'pos': (str) The POS tagged version of the utterance, from PtbBasename+.pos
* 'topic_description': (str) The topic that is being discussed.
* 'trees': (str) The tree(s) containing this utterance (separated by ||| in the file). Use `[Tree.fromstring(t) for t in row_value.split("|||")]` to convert to (list of nltk.tree.Tree).
Nice !
Switchboard Dialog Act Corpus
I made some changes following @bhavitvyamalik's recommendation in #1678: