Basic CluProcessor annotation fails for relatively simple operation #548

Closed
reynoldsm88 opened this issue Sep 8, 2021 · 6 comments

Comments

@reynoldsm88

Description

I would like to use CluProcessor to go through a list of words and lemmatize them. In the past I used code similar to the snippet below. Now, however, the same operation throws an exception that appears to be related to WordEmbeddings during a sanity check of the document.

I'm not sure if this is something I'm doing wrong or if this is a bug.

Versions

  • java : 1.8.0_202
  • scala : 2.12.7
  • clulab-processors-main : 8.4.2

Code

The following code results in an exception in my test case...

val processor = new CluProcessor()
val doc = processor.mkDocument( "counties" )
processor.lemmatize( doc )

Stacktrace

[info]   at org.clulab.processors.clu.CluProcessor.basicSanityCheck(CluProcessor.scala:520)
[info]   at org.clulab.processors.clu.CluProcessor.lemmatize(CluProcessor.scala:340)
[info]   at com.acme.MyTestSuite.$anonfun$new$1(MyTestSuite.scala:21)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.flatspec.AnyFlatSpecLike$$anon$5.apply(AnyFlatSpecLike.scala:1683)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
@MihaiSurdeanu
Contributor

Thanks @reynoldsm88 !

It seems to me that one dependency (the one that handles word embeddings) is not configured correctly. Because of its size, we store it on a local Artifactory install, which must be added to the list of repositories (resolvers) in the build file. Here is one build file that shows how to do this:

https://github.com/clulab/habitus/blob/main/build.sbt

I don't know if @kwalcock wants to add anything to this.
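
For readers who cannot open the linked file, here is a rough sketch of what adding an extra repository looks like in an sbt build. The resolver name and the URL are assumptions that do not appear in this thread, so copy the real values from the linked build.sbt rather than from here. `withAllowInsecureProtocol` is only needed if the repository is served over plain HTTP and requires sbt 1.3 or later.

```scala
// build.sbt (sketch, not the project's actual build file)

// ASSUMPTION: this URL is illustrative; take the real one from the linked build.sbt.
val cluArtifactoryUrl = "http://artifactory.cs.arizona.edu:8081/artifactory/sbt-release"

// Register the extra repository so sbt can resolve the large word-embedding artifact.
resolvers += ("Artifactory" at cluArtifactoryUrl).withAllowInsecureProtocol(true)
```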

@BeckySharp
Contributor

@MihaiSurdeanu do we not need proc.tagPartsOfSpeech(doc) first?

@MihaiSurdeanu
Contributor

MihaiSurdeanu commented Sep 8, 2021 via email

@kwalcock
Member

kwalcock commented Sep 8, 2021

This seems to work. YMMV. It may be that the sanity check is overreacting if only lemmatize is going to be run.

package org.clulab.processors.clu

import org.clulab.dynet.Utils

object LemmatizeApp extends App {
  Utils.initializeDyNet()

  val processor = new CluProcessor()
  val doc = processor.mkDocument("counties")

  processor.mkConstEmbeddings(doc)
  processor.lemmatize(doc)
}
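
Building on the workaround above, the following is a minimal sketch of lemmatizing a list of words, as in the original question. The object name `LemmatizeWords` and the word list are illustrative. The sketch assumes that `Sentence.lemmas` is an `Option[Array[String]]` that is populated after `lemmatize` runs, so check this against the version you have installed.

```scala
import org.clulab.dynet.Utils
import org.clulab.processors.clu.CluProcessor

// Hypothetical example object; not part of the processors library.
object LemmatizeWords extends App {
  // Same initialization as in the workaround above.
  Utils.initializeDyNet()

  val processor = new CluProcessor()
  val words = Seq("counties", "cities") // illustrative input

  val lemmas: Seq[String] = words.map { word =>
    val doc = processor.mkDocument(word)
    // Needed so that the sanity check inside lemmatize passes.
    processor.mkConstEmbeddings(doc)
    processor.lemmatize(doc)
    // ASSUMPTION: lemmas is an Option; treat a missing value as empty.
    doc.sentences.flatMap(_.lemmas.getOrElse(Array.empty[String])).mkString(" ")
  }

  words.zip(lemmas).foreach { case (word, lemma) => println(s"$word -> $lemma") }
}
```

Creating one document per word keeps the mapping from input to lemma trivial. For long lists it may be cheaper to build a single document and read the lemmas back per token.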

@reynoldsm88
Author

Thanks for the quick response @kwalcock @MihaiSurdeanu and @BeckySharp!

Keith's fix did the trick, and everything is working as usual again. Do you want me to mark the issue as resolved? Otherwise, feel free to close it yourselves.

@MihaiSurdeanu
Contributor

Thanks @reynoldsm88 !
