Make room for other backend (NN) implementations #549

kwalcock · 2021-09-10T15:58:20Z

Add a layer of abstraction to support a TorchScript version and others

kwalcock · 2021-09-10T15:59:16Z

Perhaps we can discuss this at the meeting on Monday.

MihaiSurdeanu

This makes sense. Thanks @kwalcock!

kwalcock · 2021-09-13T18:17:01Z

This is a draft of what might eventually be the branch that integrates the PyTorch code into processors, but probably not until we get as far as inference. Please see also #550 for ongoing developments.

kwalcock · 2021-09-13T19:06:34Z

I'm beginning to think that the document attachment is unnecessary and overly complex. I think it is there so that the interface can be satisfied with a simple tagPartsOfSpeech(doc), recognizeNamedEntities(doc), etc. rather than tagPartsOfSpeech(doc, embeddings) and recognizeNamedEntities(doc, embeddings). However, calling tagPartsOfSpeech(doc) without going through annotate(doc) only results in an exception during the basicSanityCheck anyway, so the individual interface methods are not very useful on their own. I'm thinking that they should be changed as below, with annotate calling the version that takes the extra argument. This is also related to the problem that Michael Reynolds reported recently (#548), in which a relatively simple example needed to be modified to account for the embeddings. It probably isn't really necessary. The predicate attachment is in a similar situation, although it is not sanity checked in the same way.

  override def annotate(doc:Document): Document = {
    // Do this just once and share the result in a parameter instead of document attachment.
    val embeddings = MetalBackend.mkEmbeddings(doc)

    tagPartsOfSpeech(doc, embeddings)
    recognizeNamedEntities(doc, embeddings)
    chunking(doc)
    parse(doc, embeddings)
    lemmatize(doc)
    srl(doc, embeddings)
    resolveCoreference(doc)
    discourse(doc)
    doc
  }

  override def tagPartsOfSpeech(doc: Document): Unit = {
    // In case this is a one-off call, the embeddings have to be created separately.
    val embeddings = MetalBackend.mkEmbeddings(doc)
    tagPartsOfSpeech(doc, embeddings)
  }

  // Normally the embeddings are supplied by the caller, as when called from annotate().
  def tagPartsOfSpeech(doc:Document, embeddings: ...): Unit = {
...
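
A minimal, self-contained sketch of the overload pattern proposed above, for illustration only. `Document`, `Embeddings`, `ToyBackend`, and `ToyProcessor` are hypothetical stand-ins, not the actual processors classes, and the tagging logic is a placeholder. The assumption is that `annotate` builds the embeddings once and passes them as a parameter, while the single-argument overload builds its own for one-off calls.

```scala
// Hypothetical stand-ins for illustration; these are NOT the real processors types.
final case class Document(words: Seq[String]) {
  var tags: Option[Seq[String]] = None
  var entities: Option[Seq[String]] = None
}

final case class Embeddings(vectors: Seq[Array[Float]])

object ToyBackend {
  // Counts how many times embeddings are built, to make the sharing visible.
  var buildCount = 0

  def mkEmbeddings(doc: Document): Embeddings = {
    buildCount += 1
    // Placeholder "embedding": one float per word, derived from its length.
    Embeddings(doc.words.map(word => Array(word.length.toFloat)))
  }
}

class ToyProcessor {
  def annotate(doc: Document): Document = {
    // Build the embeddings once and share them through a parameter.
    val embeddings = ToyBackend.mkEmbeddings(doc)
    tagPartsOfSpeech(doc, embeddings)
    recognizeNamedEntities(doc, embeddings)
    doc
  }

  // One-off entry point: the embeddings have to be created separately.
  def tagPartsOfSpeech(doc: Document): Unit =
    tagPartsOfSpeech(doc, ToyBackend.mkEmbeddings(doc))

  // Entry point used by annotate(): the embeddings are supplied by the caller.
  def tagPartsOfSpeech(doc: Document, embeddings: Embeddings): Unit = {
    // Placeholder tagging: every word gets the dummy tag "X".
    doc.tags = Some(embeddings.vectors.map(_ => "X"))
  }

  def recognizeNamedEntities(doc: Document): Unit =
    recognizeNamedEntities(doc, ToyBackend.mkEmbeddings(doc))

  def recognizeNamedEntities(doc: Document, embeddings: Embeddings): Unit = {
    // Placeholder labeling: every word gets the dummy label "O".
    doc.entities = Some(embeddings.vectors.map(_ => "O"))
  }
}
```

With this shape, `annotate` builds the embeddings a single time, and a direct call to `tagPartsOfSpeech(doc)` still works outside of `annotate`, at the cost of building them again.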

MihaiSurdeanu · 2021-10-06T02:55:42Z

Thanks @kwalcock!

Also, the doc attachment is necessary to store some intermediate state from CoreNLP, which uses CoreNLP data structures, and I wanted to hide it from the API.

Make room for other backend (NN) implementations

decc9b8

Add a layer of abstraction to support a TorchScript version and others

kwalcock requested review from MihaiSurdeanu and ZhengTang1120 September 10, 2021 15:58

Avoid double laziness

e618b9e

MihaiSurdeanu approved these changes Sep 10, 2021

ZhengTang1120 approved these changes Sep 13, 2021

kwalcock added 2 commits October 5, 2021 16:40

Move AnnotatedSentence

008580d

Add Onnx backend

4f8cd3a
