You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It seems like the BERT benchmark requires a number of ancillary files in addition to the Wikipedia data (i.e. Model checkpoints, vocab file, settings file) that are needed to reproduce the closed benchmark. However, I can't find any definitive source to the license of these files. Nor can I find the provenance of the checkpoint (i.e. what data is was trained on).
It would be very helpful if the above information was available so we could evaluate any legal risk of performing the benchmark.
Thank You
p.s. My assumption is that the model was trained from Wikipedia and the rest of the files are either CC or Apache 2.0. However, I could not find that documented anywhere and the license file in the google drive is ambiguous if it includes those files.
The text was updated successfully, but these errors were encountered:
The License.txt file in that google drive describes that we are covered by Creative Commons Attribution-Sharealike 3.0 Unported License. Does that answer your question? I believe we took the raw Wikipedia file as source, but the reason we're hosting here in a google drive is that Wikipedia rotates its archives, so we needed a stable place for people to repro, but our intention is to follow the Wikipedia license.
Hi,
It seems like the BERT benchmark requires a number of ancillary files in addition to the Wikipedia data (i.e. Model checkpoints, vocab file, settings file) that are needed to reproduce the closed benchmark. However, I can't find any definitive source to the license of these files. Nor can I find the provenance of the checkpoint (i.e. what data is was trained on).
It would be very helpful if the above information was available so we could evaluate any legal risk of performing the benchmark.
Thank You
p.s. My assumption is that the model was trained from Wikipedia and the rest of the files are either CC or Apache 2.0. However, I could not find that documented anywhere and the license file in the google drive is ambiguous if it includes those files.
The text was updated successfully, but these errors were encountered: