Catch Attribute exception in updated_entity + update doc
Lucaterre committed Oct 5, 2022
1 parent 05f77ba commit 4281c59
Showing 4 changed files with 30 additions and 130 deletions.
File renamed without changes.
21 changes: 7 additions & 14 deletions README.md
@@ -21,7 +21,6 @@ This extension allows using entity-fishing tool as a spaCy pipeline component to
- [Get extra information from Wikidata](#Get-extra-information-from-Wikidata)
- [Use other language](#Use-other-language)
- [Get information about entity fishing API response](#Get-information-about-entity-fishing-API-response)
- [How to process a long text?](#How-to-process-a-long-text?)
* [Configuration parameters](#Configuration-parameters)
* [Attributes](#Attributes)
* [Recommendations](#Recommendations)
@@ -496,18 +495,6 @@ doc._.metadata
}
```

### How to process a long text?

Running NER and disambiguation on a long text can be tricky.
In fact, spaCy raises an exception when a text exceeds `nlp.max_length` (1,000,000 characters by default).
The strategy here is to pass the text as batches of sentences with the [`nlp.pipe()`](https://spacy.io/api/language#pipe) method,
then pass the entities to spacyfishing with the whole context (not only the sentences, to help disambiguation) and
with continuous character offsets (start and end character positions are recalculated).
You can use the provided script [`process_long_text.py`](examples/process_long_text.py) to process large texts.
For example, a text of `2 073` sentences containing `12 901` entities to disambiguate can be processed in about a minute (without extra information)
and in under 1 minute 30 seconds (with extra information and the properties filter applied).


## Configuration parameters

```
@@ -618,4 +605,10 @@ Entity-fishing is tool created by [Patrice Lopez](https://github.com/kermitt2) (

Awesome logo designed by [Alix Chagué](https://github.com/alix-tz).

Special thanks to [@HugoSchtr](https://github.com/HugoSchtr), [@gromag](https://github.com/gromag) for documentation review.
Special thanks to

- Documentation review:
[@HugoSchtr](https://github.com/HugoSchtr), [@gromag](https://github.com/gromag)

- Code contribution:
[@davidberenstein1957](https://github.com/davidberenstein1957)
97 changes: 0 additions & 97 deletions examples/process_long_text.py

This file was deleted.
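The README section removed in this commit, together with the deleted `examples/process_long_text.py`, described processing a long text sentence by sentence with `nlp.pipe()` and then recalculating entity offsets so they are continuous over the whole text. Below is a minimal, dependency-free sketch of only that offset re-basing step; it is not the deleted script. Assumptions: a naive regex sentence splitter and a toy name matcher stand in for spaCy's sentence segmentation and NER, and `Entity`, `split_sentences`, `find_entities` and `rebase_offsets` are hypothetical names. Handing the re-based entities and the full text to spacyfishing is not shown.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical helpers for illustration; not part of spaCy or spacyfishing.


@dataclass
class Entity:
    """Entity surface text plus character offsets."""
    text: str
    start: int
    end: int


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Naive splitter: runs of non-period characters plus an optional period.
    Returns (start_offset, sentence_text) pairs so offsets can be re-based."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^.]+\.?", text)]


def find_entities(sentence: str, names: List[str]) -> List[Entity]:
    """Toy stand-in for per-sentence NER: first occurrence of each known name,
    with offsets local to the sentence."""
    found = []
    for name in names:
        idx = sentence.find(name)
        if idx != -1:
            found.append(Entity(name, idx, idx + len(name)))
    return found


def rebase_offsets(sentence_start: int, local: List[Entity]) -> List[Entity]:
    """Shift sentence-local offsets so they index into the full text."""
    return [Entity(e.text, e.start + sentence_start, e.end + sentence_start)
            for e in local]


text = "Marie Curie was born in Warsaw. She died in Passy."
names = ["Marie Curie", "Warsaw", "Passy"]

all_entities: List[Entity] = []
for sent_start, sentence in split_sentences(text):
    all_entities.extend(rebase_offsets(sent_start, find_entities(sentence, names)))

# After re-basing, every offset pair slices the full text correctly.
for e in all_entities:
    assert text[e.start:e.end] == e.text
```

Re-basing costs one addition per entity because each sentence keeps its start offset; that is what lets entities found in isolated sentence batches be sent back with the whole document as context.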

42 changes: 23 additions & 19 deletions spacyfishing/entity_fishing_linker.py
@@ -6,13 +6,14 @@
as disambiguation and entity linking component.
"""

import requests
import concurrent.futures
import json
import logging

from email import iterators
from typing import List, Tuple

import requests
from spacy import util
from spacy.language import Language
from spacy.tokens import Doc, Span
@@ -243,24 +244,27 @@ def updated_entities(self, doc: Doc, response: list) -> None:
:type response: list
"""
for entity in response:
span = doc.char_span(start_idx=entity['offsetStart'],
end_idx=entity['offsetEnd'])
try:
span._.kb_qid = str(entity['wikidataId'])
span._.url_wikidata = self.wikidata_url_base + span._.kb_qid
except KeyError:
pass
try:
span._.wikipedia_page_ref = str(entity["wikipediaExternalRef"])
# if flag_extra : search other info on entity
# => attach extra entity info to span
if self.flag_extra:
self.look_extra_informations_on_entity(span, entity)
except KeyError:
pass
try:
span._.nerd_score = entity['confidence_score']
except KeyError:
span = doc.char_span(start_idx=entity['offsetStart'],
end_idx=entity['offsetEnd'])
try:
span._.kb_qid = str(entity['wikidataId'])
span._.url_wikidata = self.wikidata_url_base + span._.kb_qid
except KeyError:
pass
try:
span._.wikipedia_page_ref = str(entity["wikipediaExternalRef"])
# if flag_extra : search other info on entity
# => attach extra entity info to span
if self.flag_extra:
self.look_extra_informations_on_entity(span, entity)
except KeyError:
pass
try:
span._.nerd_score = entity['confidence_score']
except KeyError:
pass
except AttributeError:
pass

# ~ Entity-fishing call service methods ~:
@@ -279,7 +283,7 @@ def concept_look_up_batch(self, wiki_id_batch: str) -> List[requests.Response]:
params=self.language,
verbose=self.verbose)

def disambiguate_text_batch(self, files_batch: List[dict]) -> requests.Response:
def disambiguate_text_batch(self, files_batch: List[dict]) -> List[requests.Response]:
"""
> The function `disambiguate_text_batch` takes a list of dictionaries as input, where each
dictionary contains the text to be disambiguated and the corresponding language. The function

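The headline change in `updated_entities` wraps the per-entity processing in `try` ... `except AttributeError: pass`. According to spaCy's documentation, `Doc.char_span` returns `None` when the character offsets do not line up with token boundaries (default `"strict"` alignment), and setting `span._.kb_qid` on that `None` is what raises `AttributeError`. The sketch below reproduces that failure mode without spaCy. It is an assumption-laden illustration: `char_span` here is a hypothetical, simplified stand-in for strict alignment over whitespace tokens (not spaCy's implementation), `whitespace_tokens` and `link_entities` are hypothetical names, and the QIDs are placeholder values. The response entries only mimic the `offsetStart`, `offsetEnd` and `wikidataId` keys read in the diff.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins; not spaCy's or spacyfishing's API.


def whitespace_tokens(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span(text: str, start: int, end: int) -> Optional[str]:
    """Mimic strict alignment: return the span text only if `start` begins a
    token and `end` ends a token; otherwise return None."""
    tokens = whitespace_tokens(text)
    starts = {s for s, _ in tokens}
    ends = {e for _, e in tokens}
    if start in starts and end in ends and start < end:
        return text[start:end]
    return None


def link_entities(text: str, response: List[Dict]) -> Dict[str, str]:
    """Map span text to a QID, skipping misaligned or unlinked entities."""
    linked: Dict[str, str] = {}
    for entity in response:
        span = char_span(text, entity["offsetStart"], entity["offsetEnd"])
        if span is None:
            # In the spaCy component, treating this None as a Span is what
            # raises AttributeError; here it is skipped explicitly instead.
            continue
        qid = entity.get("wikidataId")
        if qid is not None:
            linked[span] = str(qid)
    return linked


text = "Victor Hugo lived in Paris"
response = [
    {"offsetStart": 0, "offsetEnd": 11, "wikidataId": "Q1001"},   # "Victor Hugo"
    {"offsetStart": 21, "offsetEnd": 26, "wikidataId": "Q1002"},  # "Paris"
    {"offsetStart": 2, "offsetEnd": 11, "wikidataId": "Q1003"},   # starts mid-token
]
linked = link_entities(text, response)
```

An explicit `is None` check targets the misalignment directly, whereas catching `AttributeError` around the whole loop body, as the commit does, would also silence unrelated attribute errors raised further down, for example inside `look_extra_informations_on_entity`. spaCy 3's `Doc.char_span` also accepts `alignment_mode="contract"` or `"expand"` to snap offsets to token boundaries instead of returning `None`.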