Releases: buriy/spacy-ru

spaCy 2.3 models

22 Oct 22:18

Models included in this release:

ru2_nerus_800ks_96

  • width=96 (for CPU and GPU **)
  • POS score: 87.9
  • DEP score: 87.1
  • NER score: 95.3
  • trained on Nerus
  • LICENSE: MIT
Itn  Tag Loss    Tag %   Dep Loss     UAS     LAS
 20  612196.679  91.566  2285020.336  91.676  85.352

ru2_combined_400ks_96 *

  • width=96 (for CPU and GPU **)
  • POS score: 89.2
  • DEP score: 87.9
  • NER score: 94.73
  • LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss    Tag %   Dep Loss     UAS     LAS
 20  468998.154  92.414  1774568.248  92.134  86.241

ru2_grameval_96

  • width=96 (for CPU and GPU **)
  • POS score: 89.0
  • DEP score: 87.9
  • NER score: 0.0
  • POS tagging and DEP parsing only (no NER)
  • LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss    Tag %   Dep Loss    UAS     LAS
 20  207172.379  93.661  926799.585  94.010  88.752

ru2_grameval_300

  • width=300 (for GPU **)
  • POS score: 90.0
  • DEP score: 91.3
  • NER score: 0.0
  • POS tagging and DEP parsing only (no NER)
  • LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss   Tag %   Dep Loss    UAS     LAS
 20  54762.824  95.291  394716.120  98.595  94.527

Notes:

  • All models are based on Navec vectors and pymorphy2 morphology (so the combined vector model includes ~2.5 million words).
  • POS and DEP scores are a weighted average of model quality over the GramEval subsets: score = (3*news + 3*fiction + wiki + social) / 8.
    • The "combined" dataset is GramEval 2020 plus a part of Nerus.
  • ** CPU speed scales roughly with the square of the network width, so the width-300 model should be about 10x slower on CPU than the width-96 model (the figures below show about 6x), while GPU speed is almost constant.
    Speed in words per second (WPS):
    width  CPU WPS  GPU WPS
       48     8000    12000
       96     3600    12000
      192     1300    10000
      300      600     8000
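
The scoring formula and the speed note above can be checked with a few lines of Python. This is a minimal sketch: the weights and WPS figures are copied from the notes, while the function names and the per-subset scores in `example` are hypothetical and do not come from the release.

```python
# Weights from the notes: news and fiction count three times each.
GRAMEVAL_WEIGHTS = {"news": 3, "fiction": 3, "wiki": 1, "social": 1}

# (CPU WPS, GPU WPS) per network width, copied from the notes.
WPS = {48: (8000, 12000), 96: (3600, 12000), 192: (1300, 10000), 300: (600, 8000)}


def weighted_score(subset_scores):
    """Return (3*news + 3*fiction + wiki + social) / 8."""
    total_weight = sum(GRAMEVAL_WEIGHTS.values())  # 8
    weighted_sum = sum(w * subset_scores[name] for name, w in GRAMEVAL_WEIGHTS.items())
    return weighted_sum / total_weight


def measured_cpu_slowdown(width_a, width_b):
    """CPU slowdown of width_b relative to width_a, from the listed WPS figures."""
    return WPS[width_a][0] / WPS[width_b][0]


def quadratic_estimate(width_a, width_b):
    """Slowdown predicted by the 'width squared' rule of thumb."""
    return (width_b / width_a) ** 2


# Hypothetical per-subset scores, for illustration only (not from the release).
example = {"news": 90.0, "fiction": 88.0, "wiki": 86.0, "social": 84.0}

print(weighted_score(example))         # 88.0
print(measured_cpu_slowdown(96, 300))  # 6.0
print(quadratic_estimate(96, 300))     # 9.765625
```

The quadratic rule gives the "about 10x" estimate for width 300 versus width 96, while the listed WPS figures correspond to a 6x slowdown.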

POS & DEP model for spaCy 2.3 based on SynTagRus and Navec

10 Jul 15:26
f1e3e03

POS & DEP model for spaCy 2.3: POS tagger and DEP (syntax analysis) models, trained on SynTagRus, using Navec vectors & pymorphy2 morphology.

Quality on SynTagRus-test:

POS | 95.31%
DEP UAS | 91.77%
DEP LAS | 89.12%

Accuracy.txt:

Itn  Tag Loss   Tag %   Dep Loss    UAS     LAS     NER Loss  NER P  NER R  NER F  Token %  CPU WPS  GPU WPS
---  ---------  ------  ----------  ------  ------  --------  -----  -----  -----  -------  -------  -------
 30  24154.514  95.310  196988.805  91.777  89.124     0.000  0.000  0.000  0.000  100.000     6001    11902

How to use it: unpack the archive into your project root folder, then run:

import ru2_syntagrus
ru2_syntagrus.load_ru2('path_to/ru2_syntagrus')

Alternatively, you can load it with spacy.load('path_to/ru2_syntagrus/'), but the lemmas will then be slightly worse.
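
The two loading options above can be wrapped in one helper. This is a rough sketch, not the project's documented API: `load_pipeline` and `token_rows` are hypothetical names, and it assumes that `ru2_syntagrus.load_ru2` returns the loaded pipeline object, which the release notes do not state explicitly.

```python
def load_pipeline(model_dir):
    """Load the unpacked model directory and return a spaCy pipeline.

    Prefers ru2_syntagrus.load_ru2 (better lemmas, per the release notes) and
    falls back to plain spacy.load when the helper package cannot be imported.
    ASSUMPTION: load_ru2 returns the loaded pipeline object.
    """
    try:
        import ru2_syntagrus  # package unpacked from the release archive
    except ImportError:
        import spacy
        return spacy.load(model_dir)
    return ru2_syntagrus.load_ru2(model_dir)


def token_rows(doc):
    """Collect (text, lemma, POS tag, dependency label, head text) per token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, t.head.text) for t in doc]


# Typical use (requires spaCy 2.3 and the unpacked model on disk):
# nlp = load_pipeline("path_to/ru2_syntagrus")
# for row in token_rows(nlp("Мама мыла раму.")):
#     print(row)
```

`token_rows` only reads standard spaCy token attributes (`text`, `lemma_`, `pos_`, `dep_`, `head`), so it works with either loading path.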

License: CC BY-NC-SA 4.0 (same as SynTagRus)
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/