License/Source Materials for BERT checkpoint files/vocab/settings #477

Closed
ylockerman opened this issue May 10, 2021 · 3 comments

@ylockerman

ylockerman commented May 10, 2021

Hi,

It seems that the BERT benchmark requires a number of ancillary files in addition to the Wikipedia data (i.e., model checkpoints, vocab file, settings file) to reproduce the closed benchmark. However, I can't find any definitive source for the license of these files. Nor can I find the provenance of the checkpoint (i.e., what data it was trained on).

It would be very helpful if this information were available so we could evaluate any legal risk of running the benchmark.

Thank you

P.S. My assumption is that the model was trained on Wikipedia and that the rest of the files are under either CC or Apache 2.0. However, I could not find that documented anywhere, and the license file in the Google Drive is ambiguous as to whether it covers those files.

@johntran-nv johntran-nv added the language_model BERT NLP label Nov 8, 2022
@johntran-nv
Contributor

@sgpyc do we have updated instructions here?

@johntran-nv
Contributor

The License.txt file in that Google Drive states that we are covered by the Creative Commons Attribution-ShareAlike 3.0 Unported License. Does that answer your question? I believe we took the raw Wikipedia dump as the source. The reason we host it in a Google Drive is that Wikipedia rotates its archives, so we needed a stable place for people to reproduce the benchmark, but our intention is to follow the Wikipedia license.

@hiwotadese
Contributor

Closing because it was resolved by @johntran-nv in #477 (comment).
