Converting tf rubert files to pytorch #863

Closed
Ulitochka opened this issue May 31, 2019 · 11 comments

@Ulitochka

Hello.

I would like to convert the RuBERT TF checkpoint files to PyTorch.

I use the code from https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py

But I get an assertion error:

```
Initialize PyTorch weight ['bert', 'embeddings', 'LayerNorm', 'beta']
Initialize PyTorch weight ['bert', 'embeddings', 'LayerNorm', 'gamma']
Initialize PyTorch weight ['bert', 'embeddings', 'position_embeddings']
Skipping bert/embeddings/position_embeddings/AdamWeightDecayOptimizer
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/repos/pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py", line 67, in <module>
    args.pytorch_dump_path)
  File "/home/repos/pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py", line 38, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, tf_checkpoint_path)
  File "/home/repos/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 116, in load_tf_weights_in_bert
    assert pointer.shape == array.shape
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'Embedding' object has no attribute 'shape'
```

Is this because the set of components saved in the RuBERT checkpoint differs from the original BERT model?
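For context on the final `AttributeError` above: the converter walks TF variable names and indexes into the PyTorch model, and the shape check fails when a name resolves to a whole `nn.Embedding` module rather than its weight tensor, because the module itself has no `.shape` attribute. A minimal sketch of that behavior:

```python
import torch.nn as nn

# An nn.Module subclass raises AttributeError for unknown attributes,
# which is exactly what the traceback shows for 'shape'.
emb = nn.Embedding(10, 4)
print(hasattr(emb, "shape"))    # False: the module has no .shape
print(tuple(emb.weight.shape))  # (10, 4): the parameter inside it does
```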

@yurakuratov yurakuratov self-assigned this May 31, 2019
@yurakuratov
Contributor

Hi!
Yes, we have some extra optimizer weights saved in the checkpoint which are not supported by the PyTorch conversion tool. I would suggest changing one line (line 78) in the pytorch_pretrained_bert/modeling.py file to skip the optimizer parameters:

```python
if any(n in ["adam_v", "adam_m", "global_step"] for n in name):
```
->
```python
if any(n in ["adam_v", "adam_m", "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "global_step"] for n in name):
```
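The effect of the widened filter can be checked in isolation. The variable names below come from the traceback earlier in this issue; the helper function itself is hypothetical, just a standalone sketch of the skip condition:

```python
# Components of a TF variable name that mark saved optimizer state.
SKIP = ["adam_v", "adam_m", "AdamWeightDecayOptimizer",
        "AdamWeightDecayOptimizer_1", "global_step"]

def is_optimizer_var(tf_name):
    # TF variable names are slash-separated paths; skip any name whose
    # path contains an optimizer slot component.
    return any(n in SKIP for n in tf_name.split("/"))

print(is_optimizer_var("bert/embeddings/position_embeddings/AdamWeightDecayOptimizer"))  # True
print(is_optimizer_var("bert/embeddings/position_embeddings"))  # False
```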

We will probably update our RuBERT checkpoint to be compatible with pytorch converting tool soon.

@Ulitochka
Author

Thanks.

@K-Mike

K-Mike commented Jun 17, 2019

I got

```
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files
```

when I tried to do the same.

@yurakuratov
Contributor

@K-Mike Did you set the correct path to the RuBERT checkpoint?

@K-Mike

K-Mike commented Jun 17, 2019

I checked it again; yes, I'm sure (the tokenizer works fine with the same path).

@yurakuratov
Contributor

Can you provide a code example and the full traceback?

@K-Mike

K-Mike commented Jun 17, 2019

Yes, maybe you can tell me what I'm doing wrong:

```python
import torch
from pytorch_pretrained_bert import convert_tf_checkpoint_to_pytorch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

BERT_MODEL_PATH = 'rubert_cased_L-12_H-768_A-12_v1/'
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL_PATH, cache_dir=None, do_lower_case=False)

convert_tf_checkpoint_to_pytorch.convert_tf_checkpoint_to_pytorch(
    BERT_MODEL_PATH + 'bert_model.ckpt',
    BERT_MODEL_PATH + 'bert_config.json',
    '')
```
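The `NotFoundError` above comes from TF failing to resolve the checkpoint prefix. A quick way to see whether the prefix actually points at a complete TF v1 checkpoint is to check for the files TF writes alongside it (the helper name here is mine, not part of either library):

```python
import os

def missing_checkpoint_parts(ckpt_prefix):
    """Return the suffixes of TF v1 checkpoint files missing for this prefix."""
    expected = (".index", ".meta", ".data-00000-of-00001")
    return [s for s in expected if not os.path.exists(ckpt_prefix + s)]

# e.g. missing_checkpoint_parts('rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt')
# should return [] for a complete download.
```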

@yurakuratov
Contributor

yurakuratov commented Jun 17, 2019

Can you try using an absolute path for BERT_MODEL_PATH?

@K-Mike

K-Mike commented Jun 17, 2019

That didn't help, but if I rename bert_model.ckpt.data-00000-of-00001 to bert_model.ckpt, I get another error.

Is it okay that there are only:
bert_model.ckpt.data-00000-of-00001
bert_config.json
vocab.txt
while in Google BERT there are:
bert_config.json
bert_model.ckpt.index
bert_model.ckpt.meta
vocab.txt

@yurakuratov
Contributor

No, that is not okay.

I've checked that we have 5 files in http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz

@K-Mike

K-Mike commented Jun 17, 2019

Thanks! Very strange behavior: I downloaded the file 4 times and there were always only 3 files; when I downloaded it on Linux, there were 5 files.
Now I get the same error as at the beginning of the issue.
