-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add article_id and process test set template for semeval 2020 task 11… #1979
Add article_id and process test set template for semeval 2020 task 11… #1979
Conversation
3a798fd
to
795554a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thank you :)
795554a
to
c5933b5
Compare
if os.path.isfile(tc_labels_template): | ||
tc_test_template = self._process_tc_labels_template(tc_labels_template) | ||
else: | ||
tc_test_template = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why do you need to test if the file exists ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a classification task on a span in a given article. So this template just provides the offsets for the spans to be classified in an article.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
And in these lines you're checking if the file with the template exists.
Is this verification necessary ? Isn't the file always present in the ptc-corpus.tgz
file downloaded for this dataset ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is just to be on the safer side since this dataset loads via a manual data dir so I'm assuming different people might have different discrepancies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reproducibility concerns we try to have deterministic dataset generations.
Do you think we can simply assume that this file always exists ?
Otherwise we can keep it this way..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the official dataset contains this file so I guess we can consider it to be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed it to make it deterministic.
f3c1daf
to
406ed9d
Compare
Thanks ! After that we should be good to merge this one :) |
406ed9d
to
07283f6
Compare
@lhoestq Made the changes! The failure now seems to be unrelated to the changes. Any idea what's going on? |
This is a bug on master that we're investigating. You can ignore it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all good now ! Thanks again :)
merging
… dataset
article_id
is needed to create the submission file for the task at https://propaganda.qcri.org/semeval2020-task11/technique classification
task provides the span indices in a template for the test set that is necessary to complete the task. This PR implements processing of that template for the dataset.