
Don't we need to reconcile SpaCy and BERT tokens? #9

Open
hjpark2017 opened this issue Oct 1, 2022 · 2 comments

Comments

@hjpark2017

First of all, thank you for releasing the code from your paper. What I'm curious about is this: spaCy divides sentences into word units, but BERT divides them into WordPiece units, so the two sets of tokens will not map onto each other exactly. Which part of the uploaded program deals with this problem?

@BinLiang-NLP
Owner


Hi,
Thanks for your question.
I agree that spaCy divides sentences into word units while BERT divides them into WordPiece units; that is, the tokens of a small number of samples are incongruent in SenticGCN-BERT. For the datasets used in this work, however, most samples are consistent, so we do not handle this mismatch in our work. You could certainly align the WordPiece units of the BERT model with the word-level tokens for better results.
Please let me know if there is any problem.
Thanks!!!
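For reference, one common way to do the alignment the reply mentions is to map each WordPiece back to the word it came from via BERT's `##` continuation prefix, and then pool (e.g. average) the WordPiece embeddings per word so they line up with the spaCy dependency-graph nodes. A minimal sketch of the index mapping, not part of this repository's code (the function name and example are illustrative):

```python
# Illustrative sketch: map BERT WordPiece tokens back to word-level
# (e.g. spaCy) token indices using the "##" continuation prefix.

def align_wordpieces_to_words(wordpieces):
    """Return, for each WordPiece, the index of the word it belongs to."""
    word_ids = []
    word_idx = -1
    for piece in wordpieces:
        if not piece.startswith("##"):
            word_idx += 1  # a piece without "##" starts a new word
        word_ids.append(word_idx)
    return word_ids

# Example: "The playfulness of cats" under a WordPiece vocabulary
pieces = ["The", "play", "##ful", "##ness", "of", "cats"]
print(align_wordpieces_to_words(pieces))  # → [0, 1, 1, 1, 2, 3]
```

With this mapping, the three pieces of "playfulness" can be averaged into one vector for word index 1, so each spaCy token gets exactly one BERT representation.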

@hjpark2017
Author

I'm sorry for the late reply.
Thank you for your kind explanation!
