Multiple textcat_multilabel models - only run on demand #10177
What I would do is have two pipelines. Call them the generalist (all data) and the specialist (deals with a subset). You would train them each as usual, using a subset of data for the specialist, based on what you'll actually pass it. For inference you can do something like this:
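The snippet that originally followed is missing here, so the following is only a minimal sketch of the two-pipeline idea, not the original code. It assumes `generalist` and `specialist` are callables that take a text and return an object with a `cats` dictionary, as a loaded spaCy pipeline does. The function name `classify_on_demand`, the `trigger_labels` argument, and the 0.5 threshold are hypothetical choices for illustration.

```python
def classify_on_demand(text, generalist, specialist, trigger_labels, threshold=0.5):
    """Run the generalist; run the specialist only if a trigger label fires.

    `generalist` and `specialist` are assumed to behave like loaded spaCy
    pipelines: calling them on a text returns an object with a `cats` dict.
    Returns (generalist_cats, specialist_cats); the second item is None when
    the specialist was not run.
    """
    doc = generalist(text)
    general_cats = dict(doc.cats)

    # Decide whether the specialist is needed at all.
    triggered = any(
        general_cats.get(label, 0.0) >= threshold for label in trigger_labels
    )
    if not triggered:
        return general_cats, None

    # Only now pay the cost of the second model.
    specialist_doc = specialist(text)
    return general_cats, dict(specialist_doc.cats)
```

With spaCy, `generalist` and `specialist` would be the two trained pipelines returned by `spacy.load(...)`, and `trigger_labels` would be whichever generalist labels should send a document on to the specialist.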
We recently added the ability to pass docs to pipelines. This is mainly intended for adding extra data, so I'm not sure it helps much here, but it is an option.
You could also train one pipeline with two textcat components. You would disable the specialist on the first call ([...]).
Another separate option would be to combine your classification into a single step. So if you have a label [...]
We've been meaning to make a hierarchical textcat component that could probably help with this, but we haven't started work on it yet.
Can you clarify why you only want to run the second textcat on demand? I don't think the overhead of a textcat should be that high in general.
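As a rough sketch of the single-pipeline variant: the first call can skip the specialist via the `disable` argument of `nlp(...)`, and the specialist component can then be applied directly to the resulting `Doc` with `nlp.get_pipe(...)`. The component name `textcat_specialist`, the helper name, and the threshold are assumptions for illustration. The sketch also assumes the specialist does not depend on another disabled component.

```python
def run_with_optional_specialist(
    nlp, text, trigger_labels, specialist_name="textcat_specialist", threshold=0.5
):
    """Run one pipeline, applying the specialist textcat only when needed.

    `nlp` is assumed to be a spaCy Language object containing two textcat
    components; `specialist_name` is a hypothetical component name.
    """
    # First pass: everything except the specialist runs.
    doc = nlp(text, disable=[specialist_name])

    triggered = any(
        doc.cats.get(label, 0.0) >= threshold for label in trigger_labels
    )
    if triggered:
        # Apply only the specialist component to the already processed Doc.
        # Its scores are added to doc.cats next to the generalist's labels.
        specialist = nlp.get_pipe(specialist_name)
        doc = specialist(doc)
    return doc
```

Because both components write to `doc.cats`, this only stays readable if the generalist and specialist use distinct label names.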
I have a pipeline where I need to tag the whole document with a trained `textcat_multilabel` component. Based on the labels, I want to run another `textcat_multilabel` model. What would a config look like for that (just in broad strokes)? Also, how can I make sure that my models don't run automatically when I create the document, i.e. on `nlp(text)`? I only want to run them on demand.