Remove debug data normalization for span analysis (explosion#13203)
* Remove debug data normalization for span analysis

As a result of this normalization, `debug data` could show a user tokens
that do not exist in their data.

* Update spacy/cli/debug_data.py

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2 people authored and jordankanter committed Mar 14, 2024
1 parent 7fbc8fd commit 65179bc
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions spacy/cli/debug_data.py
@@ -1073,8 +1073,7 @@ def _get_distribution(docs, normalize: bool = True) -> Counter:
     word_counts: Counter = Counter()
     for doc in docs:
         for token in doc:
-            # Normalize the text
-            t = token.text.lower().replace("``", '"').replace("''", '"')
+            t = token.text.lower()
             word_counts[t] += 1
     if normalize:
         total = sum(word_counts.values(), 0.0)
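
To illustrate the problem described in the commit message, here is a minimal sketch that counts tokens with and without the removed quote replacement. It is not spaCy code: `count_tokens` and its `legacy_quotes` flag are hypothetical names, and plain strings stand in for `token.text`.

```python
from collections import Counter


def count_tokens(tokens, legacy_quotes=False):
    """Count lowercased token texts (hypothetical helper, not part of spaCy).

    With legacy_quotes=True, the quote replacement removed by this commit
    is applied as well.
    """
    counts = Counter()
    for text in tokens:
        t = text.lower()
        if legacy_quotes:
            # Old behaviour: rewrite PTB-style quote tokens to a plain '"'.
            t = t.replace("``", '"').replace("''", '"')
        counts[t] += 1
    return counts


# Plain strings standing in for token.text values in a corpus.
tokens = ["``", "Hello", "''", "hello"]

old = count_tokens(tokens, legacy_quotes=True)
new = count_tokens(tokens)

# Keys reported by the old counting that match no lowercased token in the data.
phantom = set(old) - {t.lower() for t in tokens}
print(phantom)
```

With the replacement enabled, the counts contain a `"` entry even though no token in the input has that text. Without it, every key corresponds to the lowercased text of a real token.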