Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
3 people committed Sep 27, 2022
1 parent b60cf09 commit 29a577c
Showing 6 changed files with 889 additions and 6 deletions.
92 changes: 86 additions & 6 deletions website/docs/api/architectures.md
@@ -11,6 +11,7 @@ menu:
- ['Text Classification', 'textcat']
- ['Span Classification', 'spancat']
- ['Entity Linking', 'entitylinker']
- ['Coreference', 'coref-architectures']
---

A **model architecture** is a function that wires up a
@@ -587,8 +588,8 @@ consists of either two or three subnetworks:
run once for each batch.
- **lower**: Construct a feature-specific vector for each `(token, feature)`
pair. This is also run once for each batch. Constructing the state
representation is then a matter of summing the component features and
applying the non-linearity.
representation is then a matter of summing the component features and applying
the non-linearity.
- **upper** (optional): A feed-forward network that predicts scores from the
state representation. If not present, the output from the lower model is used
as action scores directly.
@@ -628,8 +629,8 @@ same signature, but the `use_upper` argument was `True` by default.
> ```
Build a tagger model, using a provided token-to-vector component. The tagger
model adds a linear layer with softmax activation to predict scores given
the token vectors.
model adds a linear layer with softmax activation to predict scores given the
token vectors.
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------ |
@@ -920,5 +921,84 @@ A function that reads an existing `KnowledgeBase` from file.
A function that takes as input a [`KnowledgeBase`](/api/kb) and a
[`Span`](/api/span) object denoting a named entity, and returns a list of
plausible [`Candidate`](/api/kb/#candidate) objects. The default
`CandidateGenerator` uses the text of a mention to find its potential
aliases in the `KnowledgeBase`. Note that this function is case-dependent.
`CandidateGenerator` uses the text of a mention to find its potential aliases in
the `KnowledgeBase`. Note that this function is case-dependent.
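To illustrate what "case-dependent" means here, the following is a minimal sketch that uses a plain dictionary as a stand-in for the alias table. The `TOY_ALIASES` table, the entity IDs and the `toy_candidates` helper are hypothetical and are not part of the `KnowledgeBase` API.

```python
from typing import Dict, List

# Hypothetical alias table: mention text -> made-up entity IDs.
# This only stands in for the alias lookup; it is not the real KnowledgeBase.
TOY_ALIASES: Dict[str, List[str]] = {
    "Adams": ["E1", "E2"],
    "Douglas Adams": ["E1"],
}


def toy_candidates(mention_text: str) -> List[str]:
    """Look up a mention by its exact text, without any case normalization."""
    return TOY_ALIASES.get(mention_text, [])


print(toy_candidates("Adams"))  # exact match on the stored alias
print(toy_candidates("adams"))  # different casing, so nothing is found
```

In this sketch a lowercased mention finds no candidates, so any case normalization would have to happen before the lookup.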
## Coreference {#coref-architectures tag="experimental"}
A [`CoreferenceResolver`](/api/coref) component identifies tokens that refer to
the same entity. A [`SpanResolver`](/api/span-resolver) component infers spans
from single tokens. Together these components can be used to reproduce
traditional coreference models. You can also omit the `SpanResolver` if working
with only token-level clusters is acceptable.
### spacy-experimental.Coref.v1 {#Coref tag="experimental"}
> #### Example Config
>
> ```ini
>
> [model]
> @architectures = "spacy-experimental.Coref.v1"
> distance_embedding_size = 20
> dropout = 0.3
> hidden_size = 1024
> depth = 2
> antecedent_limit = 50
> antecedent_batch_size = 512
>
> [model.tok2vec]
> @architectures = "spacy-transformers.TransformerListener.v1"
> grad_factor = 1.0
> upstream = "transformer"
> pooling = {"@layers":"reduce_mean.v1"}
> ```
The `Coref` model architecture is a Thinc `Model`.
| Name | Description |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tok2vec` | The [`tok2vec`](#tok2vec) layer of the model. ~~Model~~ |
| `distance_embedding_size` | Size of the embedding used to represent the distance between candidates. ~~int~~ |
| `dropout` | The dropout to use internally. Unlike some Thinc models, this has separate dropout for the internal PyTorch layers. ~~float~~ |
| `hidden_size` | Size of the main internal layers. ~~int~~ |
| `depth` | Depth of the internal network. ~~int~~ |
| `antecedent_limit` | How many candidate antecedents to keep after rough scoring. This has a significant effect on memory usage. Typical values would be 50 to 200, or higher for very long documents. ~~int~~ |
| `antecedent_batch_size` | Internal batch size. ~~int~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
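The role of `antecedent_limit` can be sketched in plain Python. This is only an illustration of top-k pruning after rough scoring, assuming a square matrix of rough pairwise scores; the `keep_top_antecedents` helper is hypothetical and is not the implementation used by `spacy-experimental`.

```python
from typing import List


def keep_top_antecedents(
    rough_scores: List[List[float]], antecedent_limit: int
) -> List[List[int]]:
    """For each token i, keep the indices of the best-scoring earlier tokens.

    rough_scores[i][j] is an assumed rough score for token j being an
    antecedent of token i. Only j < i is considered.
    """
    kept: List[List[int]] = []
    for i, row in enumerate(rough_scores):
        candidates = list(range(i))  # only earlier tokens can be antecedents
        candidates.sort(key=lambda j: row[j], reverse=True)
        kept.append(sorted(candidates[:antecedent_limit]))
    return kept


scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.2, 0.8, 0.5, 0.0],
]
pruned = keep_top_antecedents(scores, antecedent_limit=2)
print(pruned)
```

In this sketch each token keeps at most `antecedent_limit` candidates, so the number of pairs passed to any more expensive scoring step is bounded by the number of tokens times the limit. This is one plausible reading of why the setting affects memory usage.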
### spacy-experimental.SpanResolver.v1 {#SpanResolver tag="experimental"}
> #### Example Config
>
> ```ini
>
> [model]
> @architectures = "spacy-experimental.SpanResolver.v1"
> hidden_size = 1024
> distance_embedding_size = 64
> conv_channels = 4
> window_size = 1
> max_distance = 128
> prefix = "coref_head_clusters"
>
> [model.tok2vec]
> @architectures = "spacy-transformers.TransformerListener.v1"
> grad_factor = 1.0
> upstream = "transformer"
> pooling = {"@layers":"reduce_mean.v1"}
> ```
The `SpanResolver` model architecture is a Thinc `Model`. Note that
`MentionClusters` is `List[List[Tuple[int, int]]]`.
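As a small illustration of that type, the sketch below builds one `MentionClusters` value by hand. It assumes each tuple is a `(start, end)` pair of token offsets; that interpretation, the example offsets and the `count_mentions` helper are assumptions for illustration only.

```python
from typing import List, Tuple

# Type alias as given in the text above.
MentionClusters = List[List[Tuple[int, int]]]

# Made-up clusters for one document. Each inner list is one cluster, and each
# tuple is assumed to be a (start, end) pair of token offsets.
clusters: MentionClusters = [
    [(0, 2), (5, 6)],
    [(8, 9), (12, 15), (20, 21)],
]


def count_mentions(doc_clusters: MentionClusters) -> int:
    """Count all mentions across all clusters of one document."""
    return sum(len(cluster) for cluster in doc_clusters)


print(len(clusters), count_mentions(clusters))
```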
| Name | Description |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| `tok2vec` | The [`tok2vec`](#tok2vec) layer of the model. ~~Model~~ |
| `hidden_size` | Size of the main internal layers. ~~int~~ |
| `distance_embedding_size` | Size of the embedding used to represent the distance between two candidates. ~~int~~ |
| `conv_channels` | The number of channels in the internal CNN. ~~int~~ |
| `window_size` | The number of neighboring tokens to consider in the internal CNN. `1` means consider one token on each side. ~~int~~ |
| `max_distance` | The longest possible length of a predicted span. ~~int~~ |
| `prefix` | The prefix that indicates spans to use for input data. ~~str~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[MentionClusters]]~~ |
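The meaning of `window_size` can be sketched as follows: with `window_size = 1`, each position sees itself plus one token on each side. The `window_indices` helper is hypothetical, and clipping at the document boundaries is an assumption of this sketch; the actual CNN may pad instead.

```python
from typing import List


def window_indices(i: int, n_tokens: int, window_size: int) -> List[int]:
    """Return the token indices visible from position i.

    The window covers up to 2 * window_size + 1 tokens and is clipped at the
    document boundaries (an assumption of this sketch).
    """
    start = max(0, i - window_size)
    end = min(n_tokens, i + window_size + 1)
    return list(range(start, end))


print(window_indices(3, n_tokens=10, window_size=1))
print(window_indices(0, n_tokens=10, window_size=1))
```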