
No identity for record no.priv.garshol.duke.CompactRecord@61001b64 #253

Open
arjasethan1 opened this issue Apr 30, 2018 · 2 comments


@arjasethan1

Any idea what causes this error? I am trying to use active learning for record linkage of software names from two different sources.

```
[GeneticConfiguration 0.15 [ID] [VENDOR NumericComparator 0.77 0.12] [PRODUCT DifferentComparator 0.75 0.23] [VERSION QGramComparator 0.53 0.22]]
Exception in thread "main" no.priv.garshol.duke.DukeException: No identity for record no.priv.garshol.duke.CompactRecord@61001b64
    at no.priv.garshol.duke.matchers.TestFileListener.getid(TestFileListener.java:225)
    at no.priv.garshol.duke.matchers.TestFileListener.matches(TestFileListener.java:102)
    at no.priv.garshol.duke.Processor.registerMatch(Processor.java:601)
    at no.priv.garshol.duke.Processor.compareCandidatesBest(Processor.java:493)
    at no.priv.garshol.duke.Processor.match(Processor.java:428)
    at no.priv.garshol.duke.Processor.match(Processor.java:252)
    at no.priv.garshol.duke.Processor.linkBatch(Processor.java:379)
    at no.priv.garshol.duke.Processor.linkRecords(Processor.java:364)
    at no.priv.garshol.duke.Processor.linkRecords(Processor.java:342)
    at no.priv.garshol.duke.genetic.GeneticAlgorithm.evaluate(GeneticAlgorithm.java:348)
    at no.priv.garshol.duke.genetic.GeneticAlgorithm.evolve(GeneticAlgorithm.java:208)
    at no.priv.garshol.duke.genetic.GeneticAlgorithm.run(GeneticAlgorithm.java:188)
    at com.fractal.dataextraction.ACDP.DukeTest$.delayedEndpoint$com$fractal$dataextraction$ACDP$DukeTest$1(DukeTest.scala:26)
    at com.fractal.dataextraction.ACDP.DukeTest$delayedInit$body.apply(DukeTest.scala:7)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:383)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.fractal.dataextraction.ACDP.DukeTest$.main(DukeTest.scala:7)
    at com.fractal.dataextraction.ACDP.DukeTest.main(DukeTest.scala)
```

@larsga
Owner

larsga commented May 1, 2018

It means that you have no ID field for this record. That's a problem, because then Duke has no way to identify the record when reporting back to you. So you need to make sure the schema declares an ID field, and that every record has a value for this field.
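As a minimal sketch of the check described above, the snippet below splits records into those that have a non-blank ID value and those that do not, so the latter can be fixed or dropped before linking. It is an illustration only: records are assumed to be plain Scala `Map`s from property name to value (not Duke `Record` objects), `IdCheck` and `partitionById` are hypothetical names, and the sample rows are made up; only the property names `ID`, `VENDOR` and `PRODUCT` come from the configuration printed in the issue.

```scala
// Hypothetical pre-check, not part of Duke's API.
// Records are modelled as plain Maps from property name to value.
object IdCheck {
  type Rec = Map[String, String]

  /** Splits records into (with a non-blank value for idProp, without one).
    * Records in the second group have no usable identity. */
  def partitionById(records: Seq[Rec], idProp: String): (Seq[Rec], Seq[Rec]) =
    records.partition { r =>
      // treat a missing key, a null value, or a blank string as "no ID"
      r.get(idProp).exists(v => v != null && v.trim.nonEmpty)
    }

  def main(args: Array[String]): Unit = {
    // made-up sample rows for illustration
    val records = Seq(
      Map("ID" -> "1", "VENDOR" -> "Acme", "PRODUCT" -> "Editor"),
      Map("VENDOR" -> "Acme", "PRODUCT" -> "Viewer"),
      Map("ID" -> " ", "PRODUCT" -> "Player")
    )
    val (withId, withoutId) = partitionById(records, "ID")
    println(s"${withId.size} with ID, ${withoutId.size} without")
  }
}
```

Running `IdCheck.main` on the sample prints `1 with ID, 2 without`. Blank strings are treated the same as missing values here on the assumption that an empty ID is no more useful for identifying a record than no ID at all.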

@arjasethan1
Author

arjasethan1 commented May 1, 2018

Hi @larsga, thanks for the quick reply. I removed all the null values and it seems to be working now. However, in active mode Duke is not asking me any questions. Does it expose those questions at some http://localhost:<> address? I could not find this in the documentation. Here are my settings:

```scala
import no.priv.garshol.duke.genetic.GeneticAlgorithm

// `config` is assumed to be the Duke Configuration loaded earlier in DukeTest.scala
val geneticAlgorithm = new GeneticAlgorithm(config, null, false)

geneticAlgorithm.setActive(true)
// geneticAlgorithm.setThreads(5)
geneticAlgorithm.setConfigOutput("output/config_output.xml")
geneticAlgorithm.setLinkFile("output/label_data.txt")
geneticAlgorithm.setQuestions(10)
geneticAlgorithm.run()
```
