-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make room for other backend (NN) implementations #549
base: master
Are you sure you want to change the base?
Conversation
Add a layer of abstraction to support a TorchScript version and others
Perhaps we can discuss this at the meeting on Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. Thanks @kwalcock!
This is a draft of what might eventually be the branch that integrates the PyTorch code into processors, but probably not until we get as far as inference. Please see also #550 for ongoing developments. |
I'm beginning to think that the document attachment is unnecessary and overly complex. I think it is there so that the interface is properly satisfied with a simple tagPartsOfSpeech(doc) and recognizeNamedEntities(doc) etc. rather than tagPartsOfSpeedh(doc, embeddings) and recognizeNamedEntities(doc, embeddings). However, calling tagPartsOfSpeech(doc) without going through annotate(doc) will only result in an exception during the basicSanityCheck anyway, so the individual interface methods are not very useful. I'm thinking that they should be changed as below with annotate calling the version with the extra argument. This is also related to the problem that Michael Reynolds reported (#548) recently in which the relatively simple example needed to be modified to account for the embeddings. It probably isn't really necessary. The predicate attachment is in a similar situation although it is not sanity checked in the same way. override def annotate(doc:Document): Document = {
// Do this just once and share the result in a parameter instead of document attachment.
val embeddings = MetalBackend.mkEmbeddings(doc)
tagPartsOfSpeech(doc, embeddings)
recognizeNamedEntities(doc, embeddings)
chunking(doc)
parse(doc, embeddings)
lemmatize(doc)
srl(doc, embeddings)
resolveCoreference(doc)
discourse(doc)
doc
}
override def tagPartsOfSpeech(doc: Document) = {
// In case this is a one-off call, the embeddings have to be created separately.
val embeddings = MetalBackend.mkEmbeddings(doc)
tagPartsOfSpeech(doc, embeddings)
}
// Normally the embeddings are supplied as when called from annotate().
def tagPartsOfSpeech(doc:Document, embeddings: ...): Unit = {
... |
Thanks @kwalcock ! Also, the doc attachment is necessary to store some intermediate state from CoreNLP, which uses CoreNLP data structures, and I wanted to hide it from the API. |
Add a layer of abstraction to support a TorchScript version and others