Add SpanMarker for NER to spaCy universe #12730

Merged
svlandeg merged 4 commits into explosion:master from universe/span_marker
Jun 20, 2023

Contributor

@tomaarsen tomaarsen commented Jun 15, 2023

Hello!

Pull Request overview

  • Added SpanMarker to the spaCy universe projects list

Description

I've added my recent SpanMarker module for NER to the spaCy universe list. SpanMarker performs competitively on various NER benchmarks, and the integration with spaCy seems like a no-brainer to me. spaCy is extremely convenient, so I've tried to follow that design direction for the integration. In particular, the integration is designed as a drop-in replacement for the default spaCy NER component:

  import spacy

  nlp = spacy.load("en_core_web_sm")
+ nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})

  text = '''Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the
  Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her
  death in 30 BCE.'''
  doc = nlp(text)

After this, all normal spaCy behaviour should work as intended, e.g. visualization and processing of entities. This allows users to transition very quickly from pure spaCy to any of the SpanMarker models hosted on the Hugging Face Hub.

The above script, copy-paste ready for your convenience:

pip install span_marker

import spacy

# Load the spaCy model with the span_marker pipeline component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("span_marker", config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"})

# Feed some text through the model to get a spacy Doc
text = """Cleopatra VII, also known as Cleopatra the Great, was the last active ruler of the \
Ptolemaic Kingdom of Egypt. She was born in 69 BCE and ruled Egypt from 51 BCE until her \
death in 30 BCE."""
doc = nlp(text)

# And look at the entities
print([(entity, entity.label_) for entity in doc.ents])

# Serve the entity visualization in the browser
from spacy import displacy
displacy.serve(doc, style="ent")

The visualization results in:

[image: displaCy rendering of the entities recognized in the text above]

I hope I've formatted the description and code example correctly - I haven't tried to generate the website locally.

Types of change

Documentation.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • [-] I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

Good job on getting spaCy to where it is today, lots of respect.

- Tom Aarsen

@svlandeg svlandeg added the "universe" label (Changes to the Universe directory of third-party spaCy code.) Jun 15, 2023
@tomaarsen
Contributor Author

Following #12737, I also implemented the spacy_factories entry point in tomaarsen/SpanMarkerNER@4057860, and I removed the import from the snippet in this PR, too.
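
For readers who want to check whether the factory was registered at install time, here is a minimal sketch using only the standard library. It lists the entry points in the spacy_factories group, which is how spaCy can find third-party components without an explicit import. The helper name registered_spacy_factories is hypothetical, and the sketch assumes the package registers its factory under the name "span_marker", matching the name passed to nlp.add_pipe.

```python
from importlib.metadata import entry_points


def registered_spacy_factories():
    """Return {name: "module:attr"} for each installed spacy_factories entry point.

    Hypothetical helper for illustration; it is not part of spaCy or SpanMarker.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group
        group = eps.select(group="spacy_factories")
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group name
        group = eps.get("spacy_factories", [])
    return {ep.name: ep.value for ep in group}


factories = registered_spacy_factories()
# Assumption: the factory is registered as "span_marker". If this prints False,
# the snippet will most likely still need an explicit `import span_marker`.
print("span_marker" in factories)
```

If the name is missing after installation, that points at the packaging metadata rather than at spaCy itself.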

@victorialslocum
Contributor

Hi @tomaarsen!

Thanks so much for doing that and your contribution! I'm working on reviewing the PR and will get back to you soon.

@victorialslocum
Contributor

Hi again @tomaarsen,

Unfortunately, I cannot seem to get the example code to run without importing span_marker into the Python file. Can you check the factories implementation you did again?

Also, it might make sense to exclude or disable the default ner in en_core_web_sm, so the pipeline as a whole is less confusing. You could also just use spacy.blank("en") for the demo.

Besides those two things, everything runs as expected and the website looks good on my end! Thanks again for your contribution.

@tomaarsen
Contributor Author

> I cannot seem to get the example code to run without importing span_marker into the Python file. Can you check the factories implementation you did again?

I'm experiencing the same thing in the Colab session I just tested in. It seems to work only locally, where I probably installed it slightly differently. I'll chase this down and release an update.

I'll also consider disabling the NER pipeline outright as opposed to loading it and then removing it when adding the span_marker pipeline component, that's a smart option. I rely on the sentencizer so I can't use spacy.blank("en"), but nlp = spacy.load("en_core_web_sm", disable=["ner"]) makes a lot of sense.

I'll be in touch soon!
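
As a minimal sketch of the disable=["ner"] variant discussed above: the function below loads en_core_web_sm with its built-in NER disabled and adds the span_marker component instead. The function name build_span_marker_pipeline is hypothetical; the model name is the one from the PR description. It assumes spaCy, the en_core_web_sm model and a span_marker release that registers its factory are all installed.

```python
def build_span_marker_pipeline(
    model_name="tomaarsen/span-marker-roberta-large-ontonotes5",
):
    """Load en_core_web_sm without its own NER and add SpanMarker in its place.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining this function does not require spaCy.
    import spacy

    # disable=["ner"] still loads the component but leaves it out of the
    # pipeline run; exclude=["ner"] would skip loading it altogether.
    nlp = spacy.load("en_core_web_sm", disable=["ner"])

    # The rest of the pipeline is kept, so sentence boundaries are still
    # available to the span_marker component.
    nlp.add_pipe("span_marker", config={"model": model_name})
    return nlp
```

Calling nlp = build_span_marker_pipeline() and then doc = nlp(text) should populate doc.ents with SpanMarker predictions only, which avoids the confusion of two NER components in one pipeline.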

@tomaarsen
Contributor Author

I've found & fixed the issue (an incorrect build system in the pyproject.toml). A new version (1.2.2) has been released, which should fix the importing issue.

Beyond that, I've updated my documentation and the code example to use disable=["ner"] for clarity, as per your useful recommendation.

- Tom Aarsen

Contributor

@victorialslocum victorialslocum left a comment

Tested the example code and everything works as expected. Also ran the website locally and all looks great. LGTM!

Member

@svlandeg svlandeg left a comment

Thanks again for your contribution! We'll get this merged and published on the Universe page 🎉

@svlandeg svlandeg merged commit 93983f0 into explosion:master Jun 20, 2023
svlandeg pushed a commit that referenced this pull request Jun 20, 2023
* Add SpanMarker for NER to spaCy universe

* Escape the newlines in the text in the code example

Or at least, attempt to

* Remove now unnecessary import

* Disable NER pipeline component in code example
@tomaarsen tomaarsen deleted the universe/span_marker branch June 20, 2023 15:12