Evaluating BART on CNN/DM : How to process dataset #1391
Comments
Thanks for the interest. You need to remove the line at https://github.com/abisee/cnn-dailymail/blob/b15ad0a2db0d407a84b8ca9b5731e1f1c4bd24b9/make_datafiles.py#L235
Note: In order to remove
Note 2: To get better results, I also had to keep the text cased. In order to do this, I removed this line:
I followed these instructions but I'm getting
I modified the
There are many details, so here is my code. I fixed the excess length of train.bpe.source, which was caused by ASCII 0x0D (carriage return) characters in the articles, by splitting and re-joining the text. I summarize several notes here:
Code: https://gist.github.com/zhaoguangxiang/45bf39c528cf7fb7853bffba7fe57c7e
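For readers who hit the same length mismatch, here is a minimal sketch of the split-and-join idea described above. It is an illustration only and not necessarily what the linked gist does; the function name `strip_carriage_returns` and the sample text are hypothetical. The assumption is that a stray carriage return inside an article gets treated as a line boundary by line-oriented tools (Python's default text mode, for example, translates a lone `\r` to `\n` on read), so one article ends up spread over several lines.

```python
def strip_carriage_returns(text: str) -> str:
    # Hypothetical helper: collapse every whitespace run, including
    # carriage returns (ASCII 0x0D), into a single space.
    # str.split() with no arguments splits on any whitespace, so the stray
    # characters disappear when the tokens are re-joined.
    return " ".join(text.split())


# Made-up article text containing a bare carriage return and a CRLF pair.
article = "First sentence .\rSecond sentence .\r\nThird sentence ."
cleaned = strip_carriage_returns(article)

# The cleaned article now fits on exactly one line.
assert "\r" not in cleaned and "\n" not in cleaned
print(cleaned)
```

Applying this to each article and each summary before writing them out should keep the .source and .target files at one example per line, so their line counts match.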
@zhaoguangxiang Thank you!
Here's a version for Python 3 if anyone is interested:
Summary: The first step in the CNN/DM fine-tuning instructions for BART is misleading (see #1391). This PR fixes the README and adds links to #1391 as well as to a repository with CNN/DM processing code adjusted for BART. Pull Request resolved: #1650 Differential Revision: D19606689 fbshipit-source-id: 4f1771f47d3650035a911ab393ab6df2193c1bf9
@zhaoguangxiang
I forgot my reproduction result. I will reply to you after trying again.
Thank you very much! It will help a lot.
If anyone still has problems about:
I forgot my reproduction experience.
From the README of BART for reproducing CNN/DM results:
After following the instructions, I don't have files like test.source and test.target. Instead, I have test.bin and a chunked version of this file (chunked/test_000.bin ~ chunked/test_011.bin).
How can I process test.bin into test.source and test.target?
@ngoyal2707 @yinhanliu