Help Reproducing the Results #13

Closed
PierreColombo opened this issue Jan 25, 2021 · 7 comments

@PierreColombo

PierreColombo commented Jan 25, 2021

Dear authors, thanks a lot for your work.
I am trying to do follow-up work and add a new metric to your library.
However, I am facing an issue: I am struggling to reproduce your results (Table 2). My evaluation is as follows:

from scipy.stats import pearsonr

scores = []
for summary, references in generated_summaries:  # one pair per generated summary
    scores.append(metric.evaluate_batch(summary, references, aggregate=True))

pearsonr(scores, target_scores)

with target_scores being, for each summary, the average of its 4 annotation scores.
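Concretely, I build target_scores along these lines (a minimal sketch; annotations here is a placeholder for the list of the 4 human ratings per summary):

import numpy as np

# target_scores: one value per summary, the mean of its 4 annotator ratings
target_scores = [float(np.mean(ratings)) for ratings in annotations]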
Am I missing something ?
Cheers

@PierreColombo
Author

Do you have a piece of code I could look at?

@Alex-Fabbri
Contributor

Hi @PierreColombo
Yes, that was how we initially calculated the scores. When we updated the paper, we followed Louis and Nenkova (Section 3.1) and reported system-level correlations instead. We'll have that version out on arXiv by next Wednesday, and I'll try to provide some reference code with that release.
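For reference, a system-level correlation in the Louis and Nenkova sense looks roughly like the sketch below: average each system's automatic scores and its human ratings over all documents, then correlate the per-system means across systems. The dictionaries metric_scores and human_scores are placeholders (system name -> list of per-summary scores), not the format of any file in this repo.

import numpy as np
from scipy.stats import pearsonr, kendalltau

def system_level_correlation(metric_scores, human_scores):
    # metric_scores / human_scores: {system: [per-summary score, ...]}
    systems = sorted(metric_scores)
    # average over documents within each system, then correlate across systems
    metric_means = [np.mean(metric_scores[s]) for s in systems]
    human_means = [np.mean(human_scores[s]) for s in systems]
    pearson, _ = pearsonr(metric_means, human_means)
    tau, _ = kendalltau(metric_means, human_means)
    return pearson, tau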

@tanay2001
Contributor

Hi @Alex-Fabbri, I am trying to reproduce the Kendall tau correlation scores reported in your paper. Could you please clarify which ROUGE-1/2/3/4 variant was used (i.e., whether it is the precision, recall, or F1 score)?
Thanks

@Alex-Fabbri
Contributor

Alex-Fabbri commented Apr 3, 2021

Hi @tanay2001, we used f1 scores. I'm also attaching a file to help with reproducing the scores.

code.zip
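To spell that out, the correlations use the F1 component of each ROUGE variant. A rough sketch of selecting those components from a per-summary ROUGE result, assuming key names in the rouge_1_f_score style (the exact keys produced by the attached code may differ):

ROUGE_F1_KEYS = ["rouge_1_f_score", "rouge_2_f_score",
                 "rouge_3_f_score", "rouge_4_f_score"]

def rouge_f1_components(result):
    # result: dict of per-summary ROUGE scores; keep only the F1 entries
    return {key: result[key] for key in ROUGE_F1_KEYS}

The per-system means of these values are then correlated with the human ratings, as in the system-level sketch above.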

@dptam

dptam commented Apr 23, 2021

Hi Alex,

Thanks for releasing the code. I downloaded it and noticed it takes an input file to compute scores on. I passed in the human annotations you linked in the repo, but the JSON lines were missing several keys, such as summ_id and metric_scores_{args.subset}. Is there a sample input file to run system_level_correlations.py on?

Thanks

@Alex-Fabbri
Contributor

Hi Derek, I just updated the code so that it should work with the model_annotations.aligned.scored.jsonl file. Please feel free to reopen this issue if you encounter any problems!
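In case it helps anyone else landing here, a quick way to check which keys each record of that file actually carries before running system_level_correlations.py (only the metric_scores_* naming mentioned above is assumed):

import json

# Peek at the first record of the annotations file to see its keys.
with open("model_annotations.aligned.scored.jsonl") as f:
    first = json.loads(next(f))

print(sorted(first.keys()))
print([k for k in first if k.startswith("metric_scores")])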

@Asir-Saadat

Hi there. I would like to get the original source article for each summary in CNN/DM. How can I obtain that? In the jsonl file I can see the generated summaries, but I couldn't find the actual source text/paragraph.
