sequenceLabeler.learn producing run error #1861

Closed
PramodParida opened this issue May 4, 2019 · 12 comments
Labels: Bug (Bug in learning semantics, critical by default); In PR (There is a PR waiting to be merged for this issue); L2S (Learning to search subsystem); Lang: Python

@PramodParida
PramodParida commented May 4, 2019

Hi,

I am getting this error while running the below code:

Code:

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        # you must must must initialize the parent class
        # this will automatically store self.sch <- sch, self.vw <- vw
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        
        # set whatever options you want
        sch.set_options( sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES )

    def _run(self, sentence):   # it's called _run to remind you that you shouldn't call it directly!
        output = []
        for n in range(len(sentence)):
            pos,word = sentence[n]
            # use "with...as..." to guarantee that the example is finished properly
            with self.vw.example({'w': [word]}) as ex:
                pred = self.sch.predict(examples=ex, my_tag=n+1, oracle=pos, condition=[(n,'p'), (n-1, 'q')])
                output.append(pred)
        return output

vw = pyvw.vw("--search 4 --search_task hook --ring_size 1024") # 3 is the number of labels
sequenceLabeler = vw.init_search_task(SequenceLabeler)

for i in range(3):
    sequenceLabeler.learn(my_dataset)

Error:

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 24, in run
    def run(): self._output = self._run(my_example)

  File "<ipython-input-26-b725e17b2470>", line 15, in _run
    with self.vw.example({'w': [word]}) as ex:

AttributeError: __exit__

Traceback (most recent call last):

  File "<ipython-input-26-b725e17b2470>", line 26, in <module>
    sequenceLabeler.learn(my_dataset)

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 36, in learn
    self._call_vw(my_example, isTest=False);

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 31, in _call_vw
    self.vw.learn(self.bogus_example) # this will cause our ._run hook to get called

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 169, in learn
    pylibvw.vw.learn_multi(self,ec)

RuntimeError: std::exception

Please provide hints to solve this.

Thanks

@jackgerrits
Member

Are you able to provide the example from your dataset so I can repro the issue?

jackgerrits added the label In Discussion (More information is required to proceed) on May 8, 2019
@PramodParida
Author

I am using the ATIS dataset, which I have formatted as VW input. The pyvw library breaks as posted above. I am facing this issue on both Windows and Linux.

Thanks

@lokitoth
Member

lokitoth commented May 9, 2019

@PramodParida: Are you trying to learn audio => transcription for ATIS, or something else?

Also, did you build the VW library you are using, or did you get a pre-built one?

@PramodParida
Author

@lokitoth: No, it's tagged text data. I have vowpalwabbit running in command-line mode, but I face this issue when using the Python wrapper.

lokitoth added the label Bug (Bug in learning semantics, critical by default) on May 23, 2019
@arielf
Collaborator

arielf commented Jun 25, 2019

@PramodParida

@lokitoth's question is important:

"did you build the VW library you are using, or did you get a pre-built one?"

Generally, any precompiled library is unlikely to be compatible with Anaconda, which has its own incompatible toolchain (including compilers).

@PramodParida
Author

The command line mode works fine.

Yes, I built the library myself, but the Python interface is not working.

Please resolve the python issue.

Thanks

@lokitoth
Member

lokitoth commented Dec 4, 2019

The issue here is due to changes to how example disposal works in the Python bindings. We mistakenly kept __enter__ when removing __exit__ in #1837.

PR #2176 removes the __enter__ call and updates the example scripts to properly dispose of the generated examples (without "with", as that is no longer supported). Unfortunately, it seems that there was no good way to implement finish_example without the breaking change.
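To see why a leftover __enter__ without __exit__ produces the error reported above, here is a minimal sketch that does not use VW at all. HalfContextManager and try_with are hypothetical names made up for illustration; the class only mimics an object that defines __enter__ and nothing else.

```python
class HalfContextManager:
    """Hypothetical stand-in: defines __enter__ but has no __exit__."""

    def __enter__(self):
        return self


def try_with(obj):
    """Return the exception raised by `with obj`, or None if it works."""
    try:
        with obj:
            pass
    except (AttributeError, TypeError) as err:
        # Older interpreters (such as the Python 3.6 in the traceback)
        # raise AttributeError: __exit__; Python 3.11+ raises TypeError.
        return err
    return None


error = try_with(HalfContextManager())
print(type(error).__name__, error)
```

The with statement needs both methods, so the failure happens before the body runs, which matches the traceback pointing at the "with self.vw.example(...)" line.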

lokitoth added the labels In PR (There is a PR waiting to be merged for this issue) and L2S (Learning to search subsystem), and removed the label In Discussion (More information is required to proceed), on Dec 4, 2019
@andy-soft
Copy link

I have the same problem and cannot continue. I want to test the POS tagger and the NER labeler, and both fail at the exact same point!
I read the code but found no way to solve it.
Please help!
The C# wrapper (.NET 4.6+) doesn't work for POS labeling either; it keeps throwing an error on IntPtr.
In the source, the POS tagger has been removed, as well as the NER splitter. Why?

@lokitoth
Member

lokitoth commented Dec 4, 2019

Hi @andy-soft: Could you elaborate what you mean by "on the source the POS tagger has been removed as well as the NER splitter"?

Take a look at the PR referenced above to see how the examples change to deal with the issue mentioned here.

With that said, there are additional issues in more complex LDF-based tasks that I do not yet have a handle on.

I have been focusing on Python right now, but will look at the C# bindings next.

In particular, the issue is here:

with self.vw.example({'w': [word]}) as ex:
   # ... code here

This needs to be replaced with:

ex = self.vw.example({'w': [word]})
# ... code here #make sure to remove the indent
self.vw.finish_example([ex]) # In search, need to pass examples into finish_example as a list.
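If you would like to keep the "finished even when an exception is raised" guarantee that the old with-block gave, one option is to wrap the two calls above in your own helper. This is only a sketch and has not been tested against the real bindings: finished_example is a hypothetical helper name, and FakeVW is a stub that exists only so the snippet runs without VW installed. The only VW calls assumed are vw.example(...) and vw.finish_example([ex]), as used in the replacement code above.

```python
from contextlib import contextmanager


@contextmanager
def finished_example(vw, features):
    """Create an example and always hand it back via finish_example([ex])."""
    ex = vw.example(features)
    try:
        yield ex
    finally:
        # In search, examples are passed to finish_example as a list.
        vw.finish_example([ex])


class FakeVW:
    """Hypothetical stub standing in for pyvw.vw so the sketch runs without VW."""

    def __init__(self):
        self.finished = []

    def example(self, features):
        return dict(features)

    def finish_example(self, examples):
        self.finished.extend(examples)


fake = FakeVW()
with finished_example(fake, {'w': ['hello']}) as ex:
    seen = ex  # in _run, this is where self.sch.predict(...) would go
```

Inside _run this would read "with finished_example(self.vw, {'w': [word]}) as ex:", which keeps the original indentation while relying on the helper rather than on the example object itself for cleanup.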

@lokitoth
Member

lokitoth commented Dec 5, 2019

The samples have been updated in #2176 with the correct code to use in this case. LDF issues with the Covington DEP Parser and Word Alignment are tracked in #2175.

lokitoth closed this as completed on Dec 5, 2019
@andy-soft

Hi, thanks for the reply

As I understand it, the "oracle" for a sequence detector/classifier needs to be built externally. Since the Python interface failed (for the reasons discussed above), I tried to test the system from C# with a complicated Spanish POS tagging and NER detection task, but I never found documentation on how to link the code with external C# code. I tried to deduce the usage by reading the unit tests in the distributed code, but they never compiled on my computer (Windows 10, VS2019, with all the C++ SDKs and packages installed correctly). The only version I got to work was 8.60, retrieved through the NuGet package manager.
The Python interface only runs under the Ubuntu subsystem inside Windows; I never got it to build on Windows itself. Each time I tried to install it (batch, etc.) it threw strange and unrecognizable errors.
But under Python 3.6, pyvw still didn't work well (I have yet to test the upgrade).

Under C# I found no documentation about the interface or API, so I had to test everything by myself, and trial and error is just too hard a way to get a result. I wanted to use VW for many internal NLP tasks, as I gathered it is able to handle the high dimensionality of a highly inflected language like Spanish.

Many of the testing apps and procedures simply do not compile or run with the NuGet 8.6 package.
I know that this is all part of a work in progress, but most of the tutorials are old and cannot be reproduced in any way. As an example, the POS tagger and NER detector were mysteriously removed and the link no longer exists.

I am a skilled C# programmer (but not in C++) and could not find out how to use the C#-to-C++ interface, maybe because I had no success in understanding the internals. For many operations, like multiclass classification, I got a model trained but couldn't get the predictions out of VW, simply because I found no way to do it. (I even tried the structured parameters.) No luck!

I want to make complex predictions involving several parallel selections, like doing POS tagging + semantic + grammatical parsing all in one step, using "contextual bandit" mode. I currently do this with an HMM tagger that I modified myself, which does all of these at once! (I guess VW will do this better.)

I will also build on VW a NER detector + classifier for generic noun phrases, using Spanish and a previously processed corpus. I have been building NLP systems for 15 years and was seduced by the promise of VW outperforming CRF++ and CNN-LSTM models using embeddings. I am working on a sequence labeler to "understand" the structure of discourse in Spanish, with several spelling errors. Not an easy task, indeed!

I have succeeded in doing joint predictions, correcting spelling and doing POS tagging at once (on my system)! I guess VW will do this far better!

If you could provide me with a simple working C# sample of the oracle, I can start from there!

Thanks anyway!

@andy-soft

Hi there, the repair code worked (Hallelujah) for training, but prediction still gives strange errors!


AssertionError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/vowpalwabbit/pyvw.py in run()
22 self._output = None
23 self.bogus_example[0].set_test_only(isTest)
---> 24 def run(): self._output = self._run(my_example)
25 setup = None
26 takedown = None

in _run(self, sentence)
14 # use "with...as..." to guarantee that the example is finished properly
15 ex = self.vw.example({'w': [word]})
---> 16 pred = self.sch.predict(examples=ex, my_tag=n+1, oracle=pos, condition=[(n,'p'), (n-1, 'q')])
17 output.append(pred)
18 return output

~/.local/lib/python3.6/site-packages/vowpalwabbit/pyvw.py in predict(examples, my_tag, oracle, condition, allowed, learner_id)
310 P.set_oracles(oracle)
311 elif isinstance(oracle, int):
--> 312 assert oracle > 0, 'multiclass labels are from 1..., please do not use zero or bad things will happen!'
313 P.set_oracle(oracle)
314 else:

AssertionError: multiclass labels are from 1..., please do not use zero or bad things will happen!
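The assertion message says that integer oracle labels must start at 1. The thread does not show the prediction data, so the cause here is a guess: one possibility is that the tags passed as oracle at prediction time are 0 (or 0-based). A minimal sketch of remapping tags to 1-based integer ids before building the sentences is shown below; build_label_ids and the sample sentence are hypothetical and only illustrate the (tag, word) layout that _run unpacks with "pos,word = sentence[n]".

```python
def build_label_ids(tags):
    """Map each distinct tag to an integer id starting at 1 (never 0)."""
    label_ids = {}
    for tag in tags:
        if tag not in label_ids:
            label_ids[tag] = len(label_ids) + 1
    return label_ids


# Hypothetical tagged sentence in (tag, word) order.
raw_sentence = [("DET", "the"), ("NOUN", "flight"), ("VERB", "leaves")]

label_ids = build_label_ids(tag for tag, _ in raw_sentence)
sentence = [(label_ids[tag], word) for tag, word in raw_sentence]
print(sentence)  # [(1, 'the'), (2, 'flight'), (3, 'leaves')]
```

In practice the mapping would be built once over the whole training set and reused for prediction, so that the same tag always gets the same id and the number of ids matches the value given to --search.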
