Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SpanMarker for NER to spaCy universe #12730

Merged
merged 4 commits into from
Jun 20, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions website/meta/universe.json
Original file line number Diff line number Diff line change
Expand Up @@ -4316,6 +4316,37 @@
},
"category": ["apis", "standalone"],
"tags": ["apis", "deployment"]
},
{
"id": "span_marker",
"title": "SpanMarker",
"slogan": "Effortless state-of-the-art NER in spaCy",
"description": "The SpanMarker integration with spaCy allows you to seamlessly replace the default spaCy `\"ner\"` pipeline component with any [SpanMarker model available on the Hugging Face Hub](https://huggingface.co/models?library=span-marker). Through this, you can take advantage of the advanced Named Entity Recognition capabilities of SpanMarker within the familiar and powerful spaCy framework.\n\nBy default, the `span_marker` pipeline component uses a [SpanMarker model using RoBERTa-large trained on OntoNotes v5.0](https://huggingface.co/tomaarsen/span-marker-roberta-large-ontonotes5). This model reaches a competitive 91.54 F1, notably higher than the [85.5 and 89.8 F1](https://spacy.io/usage/facts-figures#section-benchmarks) from `en_core_web_lg` and `en_core_web_trf`, respectively. A short head-to-head between this SpanMarker model and the `trf` spaCy model has been posted [here](https://github.com/tomaarsen/SpanMarkerNER/pull/12).\n\nAdditionally, see [here](https://tomaarsen.github.io/SpanMarkerNER/notebooks/spacy_integration.html) for documentation on using SpanMarker with spaCy.",
"github": "tomaarsen/SpanMarkerNER",
"pip": "span_marker",
"code_example": [
"import spacy",
"",
"nlp = spacy.load(\"en_core_web_sm\", disable=[\"ner\"])",
"nlp.add_pipe(\"span_marker\", config={\"model\": \"tomaarsen/span-marker-roberta-large-ontonotes5\"})",
"",
"text = \"\"\"Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \\",
"Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \\",
"death in 30 BCE.\"\"\"",
"doc = nlp(text)",
"print([(entity, entity.label_) for entity in doc.ents])",
"# [(Cleopatra VII, \"PERSON\"), (Cleopatra the Great, \"PERSON\"), (the Ptolemaic Kingdom of Egypt, \"GPE\"),",
"# (69 BCE, \"DATE\"), (Egypt, \"GPE\"), (51 BCE, \"DATE\"), (30 BCE, \"DATE\")]"
],
"code_language": "python",
"url": "https://tomaarsen.github.io/SpanMarkerNER",
"author": "Tom Aarsen",
"author_links": {
"github": "tomaarsen",
"website": "https://www.linkedin.com/in/tomaarsen"
},
"category": ["pipeline", "standalone", "scientific"],
"tags": ["ner"]
}
],

Expand Down