Merge branch 'master' into docs/llm
svlandeg committed Sep 7, 2023
2 parents 58e026e + cc78847 commit ae7f64f
Showing 4 changed files with 17 additions and 15 deletions.
website/docs/api/large-language-models.mdx (3 changes: 2 additions & 1 deletion)
@@ -602,7 +602,7 @@ on an upstream NER component for entities extraction.
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `template` | Custom prompt template to send to LLM model. Defaults to [`rel.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ |
- | `label_description` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
+ | `label_definitions` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |
@@ -621,6 +621,7 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.
[components.llm.task]
@llm_tasks = "spacy.REL.v1"
labels = ["LivesIn", "Visits"]
+ [components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "rel_examples.jsonl"
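For context on the renamed argument, here is a minimal sketch (not part of this diff) of passing `label_definitions` when the REL task is assembled from Python. Only `spacy.REL.v1`, `labels` and `label_definitions` come from the table above; the base pipeline, the definition texts and the model entry are illustrative assumptions.

```python
# Hedged sketch: the source pipeline, the definition texts and the model block
# are illustrative assumptions. The REL task expects entities to be set by an
# upstream component, hence the pretrained pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed: provides doc.ents for the REL task
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.REL.v1",
            "labels": ["LivesIn", "Visits"],
            "label_definitions": {
                "LivesIn": "The subject entity resides in the object entity.",
                "Visits": "The subject entity pays a visit to the object entity.",
            },
        },
        "model": {"@llm_models": "spacy.GPT-3-5.v1"},  # model name taken from the usage diff below
    },
)
```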
website/docs/usage/large-language-models.mdx (2 changes: 1 addition & 1 deletion)
@@ -184,7 +184,7 @@ nlp.add_pipe(
"labels": ["PERSON", "ORGANISATION", "LOCATION"]
},
"model": {
"@llm_models": "spacy.gpt-3.5.v1",
"@llm_models": "spacy.GPT-3-5.v1",
},
},
)
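Only the `model` block is touched by this hunk; a hedged reconstruction of the whole call follows, with the task block (`spacy.NER.v2`) and the example text assumed purely for illustration:

```python
# Hedged sketch: the labels and the renamed "spacy.GPT-3-5.v1" entry appear in
# the hunk above; the task name and example sentence are assumptions.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v2",  # assumed task; the hunk only shows the labels
            "labels": ["PERSON", "ORGANISATION", "LOCATION"],
        },
        "model": {
            "@llm_models": "spacy.GPT-3-5.v1",
        },
    },
)
doc = nlp("Jack and Jill visited the Louvre in Paris.")
print([(ent.text, ent.label_) for ent in doc.ents])
```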
website/docs/usage/training.mdx (21 changes: 11 additions & 10 deletions)
@@ -180,7 +180,7 @@ Some of the main advantages and features of spaCy's training config are:
Under the hood, the config is parsed into a dictionary. It's divided into
sections and subsections, indicated by the square brackets and dot notation. For
- example, `[training]` is a section and `[training.batch_size]` a subsection.
+ example, `[training]` is a section and `[training.batcher]` a subsection.
Subsections can define values, just like a dictionary, or use the `@` syntax to
refer to [registered functions](#config-functions). This allows the config to
not just define static settings, but also construct objects like architectures,
@@ -254,7 +254,7 @@ For cases like this, you can set additional command-line options starting with
block.
```bash
- $ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.batch_size 128
+ $ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.max_epochs 3
```
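The same overrides can also be supplied programmatically; a hedged sketch using the `overrides` argument of the `train()` helper in `spacy.cli.train` (the output directory is an assumption here):

```python
# Hedged sketch: the overrides dict mirrors the --section.value flags from the
# command above; "./output" is an assumed output directory.
from spacy.cli.train import train

train(
    "config.cfg",
    output_path="./output",
    overrides={
        "paths.train": "./corpus/train.spacy",
        "paths.dev": "./corpus/dev.spacy",
        "training.max_epochs": 3,
    },
)
```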
Only existing sections and values in the config can be overwritten. At the end
@@ -279,7 +279,7 @@ process. Environment variables **take precedence** over CLI overrides and values
defined in the config file.
```bash
- $ SPACY_CONFIG_OVERRIDES="--system.gpu_allocator pytorch --training.batch_size 128" ./your_script.sh
+ $ SPACY_CONFIG_OVERRIDES="--system.gpu_allocator pytorch --training.max_epochs 3" ./your_script.sh
```
### Reading from standard input {id="config-stdin"}
@@ -578,16 +578,17 @@ now-updated model to the predicted docs.
The training configuration defined in the config file doesn't have to only
consist of static values. Some settings can also be **functions**. For instance,
- the `batch_size` can be a number that doesn't change, or a schedule, like a
+ the batch size can be a number that doesn't change, or a schedule, like a
sequence of compounding values, which has shown to be an effective trick (see
[Smith et al., 2017](https://arxiv.org/abs/1711.00489)).
```ini {title="With static value"}
- [training]
- batch_size = 128
+ [training.batcher]
+ @batchers = "spacy.batch_by_words.v1"
+ size = 3000
```
- To refer to a function instead, you can make `[training.batch_size]` its own
+ To refer to a function instead, you can make `[training.batcher.size]` its own
section and use the `@` syntax to specify the function and its arguments – in
this case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding)
defined in the [function registry](/api/top-level#registry). All other values
@@ -606,7 +607,7 @@ from your configs.
> optimizer.
```ini {title="With registered function"}
- [training.batch_size]
+ [training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
@@ -1027,14 +1028,14 @@ def my_custom_schedule(start: int = 1, factor: float = 1.001):
```
In your config, you can now reference the schedule in the
- `[training.batch_size]` block via `@schedules`. If a block contains a key
+ `[training.batcher.size]` block via `@schedules`. If a block contains a key
starting with an `@`, it's interpreted as a reference to a function. All other
settings in the block will be passed to the function as keyword arguments. Keep
in mind that the config shouldn't have any hidden defaults and all arguments on
the functions need to be represented in the config.
```ini {title="config.cfg (excerpt)"}
- [training.batch_size]
+ [training.batcher.size]
@schedules = "my_custom_schedule.v1"
start = 2
factor = 1.005
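The body of `my_custom_schedule` sits outside this hunk; a hedged sketch of what such a registered schedule typically looks like: only the function name and signature are taken from the context above, the registration decorator and the generator body are assumptions.

```python
# Hedged sketch: a compounding-style generator registered under the name used
# in the [training.batcher.size] block above; the body is assumed.
import spacy

@spacy.registry.schedules("my_custom_schedule.v1")
def my_custom_schedule(start: int = 1, factor: float = 1.001):
    while True:
        yield start
        start = start * factor
```

With a registration like this importable at training time (e.g. via `--code functions.py`, a hypothetical module name), the `[training.batcher.size]` block resolves to the generator, so the batch size grows by `factor` at every step.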
website/meta/universe.json (6 changes: 3 additions & 3 deletions)
@@ -2806,7 +2806,7 @@
"",
"# see github repo for examples on sentence-transformers and Huggingface",
"nlp = spacy.load('en_core_web_md')",
"nlp.add_pipe(\"text_categorizer\", ",
"nlp.add_pipe(\"classy_classification\", ",
" config={",
" \"data\": data,",
" \"model\": \"spacy\"",
@@ -3010,8 +3010,8 @@
"# Load the spaCy language model:",
"nlp = spacy.load(\"en_core_web_sm\")",
"",
"# Add the \"text_categorizer\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
"nlp.add_pipe(\"text_categorizer\", config={",
"# Add the \"spacy_setfit\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
"nlp.add_pipe(\"spacy_setfit\", config={",
" \"pretrained_model_name_or_path\": \"paraphrase-MiniLM-L3-v2\",",
" \"setfit_trainer_args\": {",
" \"train_dataset\": train_dataset",
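Unescaped from the JSON strings above, a hedged sketch of the corrected `classy_classification` component name in runnable form; the `data` examples and the `doc._.cats` read-out are assumptions, and the snippet presumes the classy-classification package is installed:

```python
# Hedged sketch based on the universe entry above; the training sentences and
# the doc._.cats attribute are assumptions.
import spacy
import classy_classification  # noqa: F401  (registers the "classy_classification" factory)

data = {
    "furniture": ["This couch is very comfortable.", "I love my new bookshelf."],
    "kitchen": ["The dishwasher broke down again.", "These knives are razor sharp."],
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
    },
)
print(nlp("I spilled coffee all over the couch.")._.cats)
```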
