Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make room for other backend (NN) implementations #549

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

kwalcock
Copy link
Member

Add a layer of abstraction to support a TorchScript version and others

Add a layer of abstraction to support a TorchScript version and others
@kwalcock
Copy link
Member Author

Perhaps we can discuss this at the meeting on Monday.

Copy link
Contributor

@MihaiSurdeanu MihaiSurdeanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. Thanks @kwalcock!

@kwalcock
Copy link
Member Author

This is a draft of what might eventually be the branch that integrates the PyTorch code into processors, but probably not until we get as far as inference. Please see also #550 for ongoing developments.

@kwalcock
Copy link
Member Author

I'm beginning to think that the document attachment is unnecessary and overly complex. I think it is there so that the interface is properly satisfied with a simple tagPartsOfSpeech(doc) and recognizeNamedEntities(doc) etc. rather than tagPartsOfSpeedh(doc, embeddings) and recognizeNamedEntities(doc, embeddings). However, calling tagPartsOfSpeech(doc) without going through annotate(doc) will only result in an exception during the basicSanityCheck anyway, so the individual interface methods are not very useful. I'm thinking that they should be changed as below with annotate calling the version with the extra argument. This is also related to the problem that Michael Reynolds reported (#548) recently in which the relatively simple example needed to be modified to account for the embeddings. It probably isn't really necessary. The predicate attachment is in a similar situation although it is not sanity checked in the same way.

  override def annotate(doc:Document): Document = {
    // Do this just once and share the result in a parameter instead of document attachment.
    val embeddings = MetalBackend.mkEmbeddings(doc)

    tagPartsOfSpeech(doc, embeddings)
    recognizeNamedEntities(doc, embeddings)
    chunking(doc)
    parse(doc, embeddings)
    lemmatize(doc)
    srl(doc, embeddings)
    resolveCoreference(doc)
    discourse(doc)
    doc
  }

  override def tagPartsOfSpeech(doc: Document) = {
    // In case this is a one-off call, the embeddings have to be created separately.
    val embeddings = MetalBackend.mkEmbeddings(doc)
    tagPartsOfSpeech(doc, embeddings)
  }

  // Normally the embeddings are supplied as when called from annotate().
  def tagPartsOfSpeech(doc:Document, embeddings: ...): Unit = {
...

@MihaiSurdeanu
Copy link
Contributor

Thanks @kwalcock !

Also, the doc attachment is necessary to store some intermediate state from CoreNLP, which uses CoreNLP data structures, and I wanted to hide it from the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants