Parsing

VLSP 2020 Shared Task Dependency Parsing

Data description:

  • Training data: 8152 sentences
    • Training Data Package1: All sentences (5069) from DP-2019
    • Training Data Package2: 3083 sentences
  • Viettreebank Testing data: 1123 sentences
    • 906 sentences from Viettreebank
    • 217 sentences from vnexpress.vn

Data annotation

  • Word segmentation: Review and correct all word segmentation errors in all data sets
  • Part of speech tagging: Review and correct all POS errors in all data sets
  • Dependency label set: 38 main labels and 47 sub-labels

Leaderboard

CoNLL-U

| Model | LAS | UAS | Method | Reference | Code |
|---|---|---|---|---|---|
| PhoBERT+ELMo / Biaffine | 76.27 | 84.65 | | Doan VLSP'20 | |
| fastText Embed / Biaffine | 75.64 | 84.08 | | Nguyen VLSP'20 | |
| Graph Neural Networks | 73.19 | 81.71 | | Nguyen et al. VLSP'20 | |

Raw text

| Model | LAS | UAS | Method | Reference | Code |
|---|---|---|---|---|---|
| PhoBERT+ELMo / Biaffine / VnCoreNLP | 67.32 | 76.12 | | Doan VLSP'20 | |
| fastText Embed / Biaffine / VnCoreNLP | 65.3 | 74.47 | | Nguyen VLSP'20 | |
| Graph Neural Networks | 64.35 | 72.85 | | Nguyen et al. VLSP'20 | |
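The leaderboards in this document report LAS (labeled attachment score) and UAS (unlabeled attachment score). As a minimal sketch of how these two metrics are usually defined, the function below scores a predicted analysis against a gold one, assuming both cover exactly the same tokens. The function name `attachment_scores` and the toy labels are illustrative assumptions; the official shared-task scorers may differ in details such as punctuation handling or alignment of raw-text output.

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency label); head 0 is the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent for aligned gold and predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts tokens whose head is correct.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both correct.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * both_ok / n, 100.0 * head_ok / n


# Toy 4-token sentence; the labels are illustrative, not taken from any treebank.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
las, uas = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f}")  # prints: LAS=50.00 UAS=75.00
```

Because a token only counts toward LAS when its head is already correct, LAS can never exceed UAS for the same system output.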

UD Vietnamese VTB

The Vietnamese UD treebank is a conversion of the constituent treebank created in the VLSP project (https://vlsp.hpda.vn/).

Data description:

  • 3000 sentences and 43754 tokens
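Sentence and token counts like these can be recomputed from a CoNLL-U file. The sketch below is one plausible way to do it: it counts blank-line-separated sentences and skips comment lines, multiword-token ranges, and empty nodes. The helper name `conllu_stats` and the two-sentence sample are hypothetical, and whether the figures quoted above were counted this way is an assumption.

```python
from typing import Tuple


def conllu_stats(text: str) -> Tuple[int, int]:
    """Count (sentences, syntactic words) in CoNLL-U formatted text."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1").
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
    return sentences, tokens


# Hypothetical two-sentence sample, not taken from the treebank.
sample = (
    "# text = Tôi học\n"
    "1\tTôi\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\thọc\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Tốt\n"
    "1\tTốt\t_\tADJ\t_\t_\t0\troot\t_\t_\n"
)
print(conllu_stats(sample))  # prints: (2, 3)
```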

Leaderboard

| Model | LAS | UAS | Method | Reference | Code |
|---|---|---|---|---|---|
| Trankit v0.3.1 | 64.76 | 70.96 | | Nguyen et al. EACL-DEMO'21 | Official |
| Stanza v1.1.1 | 48.16 | 53.63 | | Qi et al. ACL-SD'20 | Official |

BKTreebank: A Vietnamese Dependency Treebank

BKTreebank 1.0 contains 6,909 Vietnamese sentences annotated with POS tags and dependency relations. The treebank is divided into a training set of 5,639 sentences and a test set of 1,270 sentences for training and evaluating POS taggers and dependency parsers.

Vietnamese Dependency Treebank VnDT

The Vietnamese dependency treebank VnDT contains 10,200 sentences. It follows the 10-column data format proposed by the CoNLL shared tasks on multilingual dependency parsing.
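To make the 10-column format concrete, here is a minimal sketch that splits tab-separated token lines into named fields and lists the dependency arcs. It assumes the column names of the CoNLL-X shared task; the helper names `parse_sentence` and `arcs` are hypothetical, and the two-token sample uses placeholder tags and labels rather than actual VnDT annotations.

```python
from typing import Dict, List, Tuple

# Column names of the 10-column CoNLL-X format.
CONLL_COLUMNS = [
    "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
    "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL",
]


def parse_sentence(block: str) -> List[Dict[str, str]]:
    """Parse one sentence (one token per line, tab-separated) into field dicts."""
    rows = []
    for line in block.strip().splitlines():
        fields = line.split("\t")
        if len(fields) != len(CONLL_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        rows.append(dict(zip(CONLL_COLUMNS, fields)))
    return rows


def arcs(rows: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (head form, label, dependent form) triples; HEAD 0 maps to ROOT."""
    forms = {row["ID"]: row["FORM"] for row in rows}
    return [(forms.get(row["HEAD"], "ROOT"), row["DEPREL"], row["FORM"]) for row in rows]


# Placeholder sample: tags and labels are illustrative only.
sample = (
    "1\tTôi\t_\tP\tP\t_\t2\tsub\t_\t_\n"
    "2\thọc\t_\tV\tV\t_\t0\troot\t_\t_"
)
for head, label, dependent in arcs(parse_sentence(sample)):
    print(f"{head} --{label}--> {dependent}")
```

Keeping every field as a string avoids guessing at conventions such as `_` for unspecified values; callers can convert `ID` and `HEAD` to integers when they need to index tokens.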

Leaderboard

VnDT v1.1

| Model | LAS | UAS | Method | Reference | Code |
|---|---|---|---|---|---|
| PhoBERT-base | 78.77 | 85.22 | Liu et al. '18 | Nguyen et al. '20 | Official |
| PhoBERT-large | 77.85 | 84.32 | Liu et al. '18 | Nguyen et al. '20 | Official |
| Biaffine | 74.99 | 81.19 | Dozat and Manning ICLR'17 | Nguyen '18 | |
| JointWPD | 73.90 | 80.12 | | Nguyen '18 | |
| jPTDP-v2 | 73.12 | 79.63 | Nguyen et al. CoNLL'18 | Nguyen '18 | Official |
| VnCoreNLP (unsegmented) | 71.38 | 77.35 | Nguyen et al. NAACL'18 | Nguyen '18 | Official |

VnDT v1.0

| Model | LAS | UAS | Method | Reference | Code |
|---|---|---|---|---|---|
| VnCoreNLP | 73.39 | 79.02 | | Nguyen et al. NAACL'18 | Official |
| Biaffine | 71.73 | 78.45 | Dozat and Manning ICLR'17 | Nguyen '18 | |
| JointWPD | 70.50 | 77.04 | | Nguyen '18 | |
| jPTDP-v2 | 69.81 | 76.60 | Nguyen et al. CoNLL'18 | Nguyen '18 | Official |
| VnCoreNLP (unsegmented) | 67.79 | 74.24 | Nguyen et al. NAACL'18 | Nguyen '18 | Link |

VietTreebank

Miscellaneous

📜 Papers

💫 Services: OpenFPT: Vitk (2017)

📁 Open sources