
Unable to find .ckpt file #9

Open
nishithbenhur opened this issue Mar 30, 2020 · 6 comments
@nishithbenhur

This is a repository of pretrained Japanese BERT models. The pretrained models are available along with the source code for pretraining.

Hi Team,
As mentioned above, I am unable to find the .ckpt file. My intention is to host the model as a service, and I need the files below for that. Could you let me know where to find them?

├── model.ckpt.data-00000-of-00001
├── model.ckpt.index
├── model.ckpt.meta

@nishithbenhur
Author

If the models included are the ones listed below, which one can be used for FAQ answering? Assume I have a dataset of 100 questions and corresponding answers. This is for a chatbot application.

BERT-base_mecab-ipadic-bpe-32k.tar.xz (2.1GB)
BERT-base_mecab-ipadic-bpe-32k_whole-word-mask.tar.xz (2.1GB)
BERT-base_mecab-ipadic-char-4k.tar.xz (1.6GB)
BERT-base_mecab-ipadic-char-4k_whole-word-mask.tar.xz (1.6GB)

@singletongue
Collaborator

The model files are included in the tar.xz archive files.
You can try any of them, but I'm afraid that 100 QA pairs might be too few for fine-tuning a model.
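
For example, extracting one of the archives should expose the checkpoint files you listed. A minimal sketch using Python's tarfile module, assuming the archive has already been downloaded to the working directory (the archive name is taken from the list above):

```python
import tarfile

# Extract the downloaded archive; the TensorFlow checkpoint files
# (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta)
# should appear inside the extracted model directory.
archive = "BERT-base_mecab-ipadic-bpe-32k.tar.xz"
with tarfile.open(archive, mode="r:xz") as tar:
    tar.extractall(path="bert-japanese")
```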

@nishithbenhur
Author


Dear Suzuki,

Thank you for the reply.
I thought the model was readily available for use. Which model can be used directly for an FAQ dataset of 300 questions?

@singletongue
Collaborator

There is no difference between the models in terms of usability.
If you're still unsure which model to use, why not give the first one a try?

For a quick trial, I recommend using our models via Hugging Face's Transformers package.
Simple demo: https://twitter.com/huggingface/status/1205283603128758277
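
For example, something along these lines should work for a quick masked-word prediction, similar to the demo linked above. This is a rough sketch, assuming transformers and a MeCab-based tokenizer backend (e.g. mecab-python3 or fugashi) are installed; the example sentence is just an illustration:

```python
import torch
from transformers import BertJapaneseTokenizer, BertForMaskedLM

# Load the pretrained model and tokenizer from the Hugging Face model hub.
tokenizer = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
model = BertForMaskedLM.from_pretrained("cl-tohoku/bert-base-japanese")
model.eval()

# "I am doing research on [MASK] at Aobayama."
text = "青葉山で[MASK]の研究をしています。"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and print the top-5 predicted tokens.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5 = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```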

@nishithbenhur
Author


Thank you. I will try the first one and share my feedback and observations so that others can benefit too.

  1. Could you also tell me the differences between the following models? I need to know which one can be used for FAQ answering in a chatbot.
  • BERT-base_mecab-ipadic-bpe-32k.tar.xz (2.1GB)
  • BERT-base_mecab-ipadic-bpe-32k_whole-word-mask.tar.xz (2.1GB)
  • BERT-base_mecab-ipadic-char-4k.tar.xz (1.6GB)
  • BERT-base_mecab-ipadic-char-4k_whole-word-mask.tar.xz (1.6GB)
  2. How can I get the .ckpt file for bert-base-japanese so that I can host it as an HTTP API service?

@xeisberg

@nishithbenhur

  1. I am not sure. The char models presumably operate on each individual character, as opposed to the word-based (MeCab-tokenized) models, since Japanese words are often made up of several characters. I also do not know the difference between whole-word masking and standard masking; it would be interesting to know how they compare in performance.

  2. You can find the files here: https://www.nlp.ecei.tohoku.ac.jp/%7Em-suzuki/bert-japanese/
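
On hosting it as an HTTP API (the second question above), one option is to wrap the Transformers model in a small web server rather than serving the raw .ckpt files directly. A minimal sketch using Flask, which is only one of many choices; the endpoint name, port, and fill-mask task are illustrative placeholders (an FAQ chatbot would wrap its own fine-tuned model instead):

```python
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

# Load the model once at startup so requests don't pay the loading cost.
fill_mask = pipeline("fill-mask", model="cl-tohoku/bert-base-japanese")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"text": "... [MASK] ..."}.
    text = request.get_json()["text"]
    return jsonify(fill_mask(text))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```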
