Added function to replace gensim check_output with subprocess.check_output #1182

Closed
wants to merge 17 commits into from

kirit93
Contributor

@kirit93 kirit93 commented Mar 4, 2017

@tmylk I've added a function above the existing check_output function in an attempt to solve this issue. Please let me know if there's anything I need to change. This is with reference to issue #703

@tmylk
Contributor

tmylk commented Mar 5, 2017

@kirit93 Thanks for the PR.
How do we know that the output is printed to the log? Please add a test

Could you please replace the actual check_output and run the tests for mallet, dtm, fasttext and wordrank wrappers.

There was some reason I added extra KeyboardInterrupt handling to the code - need to look more in order to find what was the issue now. Please check that behaviour is preserved.

@kirit93
Contributor Author

kirit93 commented Mar 6, 2017

@tmylk I ran the tests for mallet, dtm, fasttext and wordrank wrappers. All pass except for fasttext, but the error I get with my check_output function is the same I get when I use the existing gensim check_output function. So I'm not sure why that is failing, any idea?
This is what is shown on the log -

======================================================================
FAIL: testSimilarity (__main__.TestFastText)
Test similarity for in-vocab and out-of-vocab words
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_fasttext_wrapper.py", line 139, in testSimilarity
    self.assertEqual(self.test_model.similarity('night', 'nights'), self.test_model.similarity('nights', 'night'))
AssertionError: 0.3508754735782928 != 0.35087547357829274

----------------------------------------------------------------------
Ran 15 tests in 9.282s

FAILED (failures=1)

I'll get started with the rest of the things you mentioned.

@piskvorky piskvorky changed the title Issue #703. Added function to replace gensim check_output with subprocess.check_output Added function to replace gensim check_output with subprocess.check_output Mar 6, 2017
@piskvorky
Owner

piskvorky commented Mar 6, 2017

Looks like a bad test -- floats should never be compared for bit equality. Instead, there's np.allclose() etc.

@kirit93
Contributor Author

kirit93 commented Mar 6, 2017

Should I fix those tests?
Currently the function uses :

self.assertEqual(self.test_model.similarity('night', 'nights'), self.test_model.similarity('nights', 'night'))

I'll change it to :

self.assertTrue(numpy.allclose(self.test_model.similarity('night', 'nights'), self.test_model.similarity('nights', 'night')))

I made the change and ran the test, and there are no failures. assertEqual is used in a couple of other places in this test; if this fix is okay, I'll make the change everywhere.

@tmylk
Contributor

tmylk commented Mar 6, 2017

@kirit93 Confirm that using allclose is the right way to go.

Does the error output get printed to the log with this new change?

@kirit93
Contributor Author

kirit93 commented Mar 7, 2017

@tmylk numpy.testing.assert_allclose() seems to be a better option for float comparisons. It prints a verbose error message to the log unlike numpy.allclose. It uses a default relative tolerance of 1e-7 which can be set to 0 if desired.
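
A minimal sketch of the comparison discussed above, using the two values from the failing fastText test log. The tolerance is just the documented default of numpy.testing.assert_allclose, not a value the maintainers picked.

```python
import numpy as np
from numpy.testing import assert_allclose

# The two similarity values reported in the failing test log above.
a = 0.3508754735782928
b = 0.35087547357829274

# Exact equality fails because the values differ in the last bits.
exactly_equal = (a == b)

# np.allclose returns a bool, so a failed check carries no detail by itself.
close_enough = bool(np.allclose(a, b))

# assert_allclose raises AssertionError with a descriptive message on mismatch
# and returns None on success (defaults: rtol=1e-7, atol=0).
assert_allclose(a, b)
```

In a unittest test case, the assert_allclose call can stand in for assertEqual directly, since it raises AssertionError on failure.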

Contributor

@tmylk tmylk left a comment

Please address comments about exception handling

gensim/utils.py Outdated
raise
except:
error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found"
print(error)
Contributor

please output to the log

Contributor Author

@kirit93 kirit93 Mar 9, 2017

Is there a particular log file I should be logging output to? If not STDOUT, what is the right way to send out the error message? In the earlier function the output and error were being returned from the function. Should I do the same? I could output it to STDERR as the error generated by the shell when it tries executing the incorrect command is also written to STDERR.

gensim/utils.py Outdated
process.terminate()
raise
except:
error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found"
Contributor

Could there be exceptions even when command is found? if yes, then please remove "command was not found"

Contributor Author

Yes, I overlooked that detail. Should I change the error message to something more generic like " xyz command could not be executed "?

gensim/utils.py Outdated
except KeyboardInterrupt:
process.terminate()
raise
Contributor

what is the purpose of catching and immediately throwing the exception? this line doesn't add anything as is

Owner

It makes sure the catch-all except: below doesn't apply to KeyboardInterrupt.

But it's not a good practice anyway, because there's also SystemExit etc. At least limit the catch-all to Exception.
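
To illustrate the point about catch-alls, here is a small sketch (the helper name classify is hypothetical) showing that KeyboardInterrupt and SystemExit derive from BaseException rather than Exception, so an `except Exception` clause lets them propagate without a dedicated handler.

```python
def classify(exc_type):
    """Report which handler would catch an exception of the given type."""
    try:
        raise exc_type()
    except Exception:
        # Ordinary errors (ValueError, OSError, CalledProcessError, ...) land here.
        return "caught by 'except Exception'"
    except BaseException:
        # KeyboardInterrupt and SystemExit are only caught by BaseException
        # or by a bare 'except:'.
        return "only caught by a bare except or BaseException"


results = {t.__name__: classify(t) for t in (ValueError, KeyboardInterrupt, SystemExit)}
```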

Contributor Author

Would it be a good idea to catch the subprocess.CalledProcessError and output an error message that is easy for the user to understand? This is the error raised when check_output fails.
For any other error including KeyboardInterrupt I can use a catch-all Exception and raise it.

gensim/utils.py Outdated
@@ -1176,4 +1180,4 @@ def sample_dict(d, n=10, use_random=True):
according to natural dict iteration.
"""
selected_keys = random.sample(list(d), min(len(d), n)) if use_random else itertools.islice(iterkeys(d), n)
return [(key, d[key]) for key in selected_keys]
return [(key, d[key]) for key in selected_keys]
Owner

Unwanted change.

gensim/utils.py Outdated
raise
except:
error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found"
print(error)
Owner

@piskvorky piskvorky Mar 9, 2017

The original exception was more useful, because it included stdout/stderr, which is useful for debugging errors.

Is there any way to do the same here?

Contributor Author

I'm sorry, I didn't fully understand what you mean here. Which original exception are you talking about?

Owner

I meant subprocess.CalledProcessError, a few lines above.

I don't have any strong opinion on what this function should do on failure, or how to capture the subprocess stdout/stderr. Except for sure we don't want to be printing anything -- logging preferred.
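
One way this could look, as a sketch rather than the code from this PR: capture the child's output, send it to the logger, and re-raise the original CalledProcessError. The function name run_logged and the failing command are made up for illustration.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_logged(args):
    """Run a command; on failure, log its captured output and re-raise."""
    try:
        # stderr=STDOUT folds the child's stderr into the captured output.
        return subprocess.check_output(args, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        # Nothing is printed; the details go to the logging system instead.
        logger.error(
            "command %r exited with status %d; output: %r",
            err.cmd, err.returncode, err.output,
        )
        raise


# A child process that writes to stderr and exits with a non-zero status.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
```

Because the exception is re-raised unchanged, callers still see the return code and captured output on the exception object.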

gensim/utils.py Outdated
In case args generates an error
>>> test_checkoutput(args=['/usr/bin/pythons', '-ve']) #Incorrect argument
/bin/sh: /usr/bin/pythons: No such file or directory
*Error in args : /usr/bin/pythons -ve
Owner

Is this something that is printed? Or logged?

In general, we don't want to pollute stdout at all, unless the user explicitly requested printing.

Contributor Author

@kirit93 kirit93 Mar 9, 2017

The No such file or directory line gets outputted to STDERR as it is the error returned by the shell when it tries to execute the command in args.
The Error in args: /usr/bin/pythons -ve message gets printed to STDOUT because I'm using a simple print command. Is there a log file you'd like me to log the output to or should I print to STDERR if we don't want STDOUT polluted?

gensim/utils.py Outdated
Instead of raising the error, output a more specific error message
"""
error = "subprocess.check_output could not execute command ' " + str(args) + " '"
print(error, file=sys.stderr)
Contributor

Please log instead of printing, see examples in the code

Contributor

Please output to log instead of printing.

Contributor

@tmylk tmylk left a comment

Please add tests for the exception handling

gensim/utils.py Outdated
error = "subprocess.check_output could not execute command ' " + str(args) + " '"
print(error, file=sys.stderr)
return error
except Exception:
Contributor

this code is redundant. It will be raised anyway


class TestOutput(unittest.TestCase):
def test_check_output(self):
Contributor

Please add a test that exception is raised

@@ -97,6 +97,10 @@ def test_check_output(self):
res = utils.check_output(args=["echo", "hello"])
self.assertEqual(res, b'hello\n')

def test_check_output_exception(self):
error = utils.check_output(args=['ldfs'])
self.assertEqual(error, "subprocess.check_output could not execute command ' ldfs '")
Contributor Author

@kirit93 kirit93 Mar 19, 2017

test_utils.py imports utils.py from the installed gensim. So this test case is still running the old version of check_output and therefore the exception test fails. However, if you change the import to use the current version of utils.check_output, all tests pass.

Contributor

you can install your new version of gensim with python setup.py install

@tmylk
Contributor

tmylk commented Mar 23, 2017

Please add unit test that exception is indeed raised.

gensim/utils.py Outdated
"""
error = "subprocess.check_output could not execute command ' " + str(args) + " '"
logger.error(error)
return error
Contributor

what is the reason to return instead of previous raise?

Contributor Author

The reason I've chosen to simply return the error message is that raise results in the following large and not user-friendly message.

Traceback (most recent call last):
  File "test.py", line 51, in test_check_out
    res = subprocess.check_output(args, shell=flag)
  File "/Users/kirit/anaconda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/Users/kirit/anaconda/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'ldfs' returned non-zero exit status 127

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 76, in <module>
    test_check_out(args=["ldfs"])
  File "test.py", line 60, in test_check_out
    raise error
TypeError: exceptions must derive from BaseException

Owner

Raising a string (instead of some exception) is definitely not a good idea.

Contributor Author

How would we benefit from raising an exception here? Once the subprocess.CalledProcessError occurs and is caught, an appropriate message is returned to the user indicating why the check_output command failed. Wouldn't that be enough for the user to proceed correctly?

Owner

@piskvorky piskvorky Apr 9, 2017

Your raise error already raises an exception. It's just that proper exceptions inherit from Exception (or appropriate subclasses), rather than being just strings.
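
A short sketch of the difference being described, assuming Python 3 as in the traceback above. The function names are hypothetical, and RuntimeError is only a stand-in for whichever exception class fits.

```python
message = "subprocess.check_output could not execute command"


def raise_string():
    # Raising a plain str is itself an error in Python 3:
    # "TypeError: exceptions must derive from BaseException".
    raise message


def raise_proper():
    # Wrapping the message in an exception class gives a single, readable error.
    raise RuntimeError(message)
```

Calling raise_string() reproduces the second traceback quoted earlier in this thread, while raise_proper() surfaces the message itself.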

Contributor Author

@kirit93 kirit93 Apr 9, 2017

I could do a raise Exception(error).
Would that be okay?

Owner

@piskvorky piskvorky Apr 9, 2017

I haven't looked into the actual logic. Just commenting on the discussion above, where you say "The reason I've chosen to simply return the error message is that raise results in the following large and not user-friendly message." That message is only due to raising a string instead of a proper exception :)

Owner

@piskvorky piskvorky Apr 9, 2017

My 2c would be to use the same error-handling logic that the function employed previously, before your changes. The least surprise to users.

@kirit93
Contributor Author

kirit93 commented Apr 19, 2017

@tmylk @piskvorky I've modified the function to handle errors just as the earlier function would. I hope this is okay. Please let me know if there are any further changes to be made so that the PR can be merged. Thanks!

gensim/utils.py Outdated
Instead of raising the error, output a more specific error message
"""
error = "subprocess.check_output could not execute command ' " + str(args) + " '"
logger.error(error)
raise
Contributor

Please raise the CalledProcessError exception as before.

Contributor Author

I may be missing something, but why exactly do we need to raise the CalledProcessError at all? As long as we tell the user why check_output failed, shouldn't that be fine? When subprocess.check_output is called with shell=True and the input is erroneous, the same error message is displayed as if the user had entered the command in the terminal. This serves the purpose of telling the user why check_output failed.

If I pass ldfs to check_output, stderr will have /bin/sh: ldfs: command not found. Similarly, if you were to type the same command on the terminal, the output would be bash: ldfs: command not found.

In the earlier code stderr would have had a huge error message indicating a FileNotFound. If required CalledProcessError can be captured and a similar error message can be logged.

In case I've missed something and there is some reason that an implementation similar to the earlier one is preferred, I will look into that.

Contributor

First of all we need the actual output to be written to log, not to stderr.
Secondly, the tests should check that the return message is indeed for ldfs: command not found. The current message they test for is not informative

Contributor Author

@kirit93 kirit93 May 8, 2017

Would it be okay for the log to have the error message as follows

/bin/sh: ldfs: command not found
Command '['ldfs']' returned non-zero exit status 127.

along with a subprocess.CalledProcessError being raised?

In the test case, I'm unable to figure out how to use assertRaises to verify that it was in fact a subprocess.CalledProcessError that was raised.
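
For reference, a minimal sketch of how assertRaises can be used as a context manager to check for subprocess.CalledProcessError. The test class name and the failing command are invented for illustration; this is not the test from this PR.

```python
import subprocess
import sys
import unittest


class TestCalledProcessError(unittest.TestCase):
    def test_nonzero_exit_raises(self):
        # A child process that exits with status 2.
        cmd = [sys.executable, "-c", "import sys; sys.exit(2)"]
        # The block passes only if exactly this exception type is raised inside it.
        with self.assertRaises(subprocess.CalledProcessError) as ctx:
            subprocess.check_output(cmd)
        # The caught exception is available for further checks.
        self.assertEqual(ctx.exception.returncode, 2)
```

The same pattern should apply to a wrapper such as utils.check_output, provided the wrapper re-raises the exception instead of returning a message.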

@tmylk
Contributor

tmylk commented May 2, 2017

It seems that tests didn't run for the last commit, test_check_output_exception is expected to fail now. Please remove that test.

Is it correct that the only difference between old code and this improvement is that the error is printed to log here:

error = "subprocess.check_output could not execute command ' " + str(args) + " '"
logger.error(error)

However this doesn't answer the original purpose of the issue in #703 :

There should ideally be a way for the user to know what the exact nature of the error is,

Please add code and tests that actually do that.

@kirit93
Contributor Author

kirit93 commented May 19, 2017

@tmylk, could you check out my latest push and let me know if that works for us?

@kirit93
Contributor Author

kirit93 commented May 29, 2017

@menshikh-iv, could you let me know if this PR is okay to merge?

gensim/utils.py Outdated
Python 2.6.2
Added extra KeyboardInterrupt handling
def check_output(args, flag=True):
r"""
Contributor

Not required r

Contributor Author

Done.

gensim/utils.py Outdated
>>> check_output(args=['/usr/bin/python', '--version'])
Python 2.6.2
Added extra KeyboardInterrupt handling
def check_output(args, flag=True):
Contributor

@menshikh-iv menshikh-iv May 29, 2017

The new interface breaks many of the gensim wrappers. First, please merge develop into your branch (this is needed for the tests to run correctly).

Second, run tests locally with all wrappers like this
FT_HOME=/home/ivan/release/test/fastText WR_HOME=/home/ivan/release/test/wordrank VOWPAL_WABBIT_PATH=/home/ivan/release/test/vowpal_wabbit/vowpalwabbit/vw DTM_PATH=/home/ivan/release/test/dtm/dtm/main MALLET_HOME=/home/ivan/release/test/Mallet python setup.py test

Another option is the Docker image from PR #1368. You can use it for a full test run.

Contributor Author

My branch is up to date with develop. I ran python setup.py test with the appropriate environment variables. Two tests failed, but they are unrelated to what I am working on.

What should I do now?

Contributor

@kirit93 hm, strange, I'll run all the tests and write the results here.

Contributor Author

@kirit93 kirit93 Jun 1, 2017

@menshikh-iv, It seems like the travis build succeeded after develop was merged into issue. Is the PR good to go?

@kirit93 kirit93 closed this May 31, 2017
@kirit93 kirit93 reopened this May 31, 2017
@menshikh-iv
Contributor

I ran all the tests and got 39 errors, but they fall into three similar types:

First type

======================================================================
ERROR: testEnsemble (gensim.test.test_wordrank_wrapper.TestWordrank)
Test ensemble of two embeddings
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ivan/release/test/gensim/gensim/test/test_wordrank_wrapper.py", line 37, in setUp
    self.test_model = wordrank.Wordrank.train(self.wr_path, self.corpus_file, self.out_name, iter=6, dump_period=5, period=5)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/wordrank.py", line 87, in train
    os.makedirs(meta_dir)
  File "/home/ivan/.virtualenvs/clean2/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: '/home/ivan/release/test/wordrank/testmodel/meta'

Second type

======================================================================
ERROR: testLargeMmap (gensim.test.test_ldamallet_wrapper.TestLdaMallet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ivan/release/test/gensim/gensim/test/test_ldamallet_wrapper.py", line 61, in setUp
    self.model = ldamallet.LdaMallet(self.mallet_path, corpus, id2word=dictionary, num_topics=2, iterations=1)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 100, in __init__
    self.train(corpus)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 158, in train
    self.convert_input(corpus, infer=False)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 155, in convert_input
    check_output(args=cmd, shell=True)
TypeError: check_output() got an unexpected keyword argument 'shell'

Third type

======================================================================
ERROR: testCalledProcessError (gensim.test.test_dtm.TestDtmModel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ivan/release/test/gensim/gensim/test/test_dtm.py", line 70, in testCalledProcessError
    rng_seed=1)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/dtmmodel.py", line 129, in __init__
    self.train(corpus, time_slices, mode, model)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/dtmmodel.py", line 202, in train
    check_output(args=cmd, stderr=PIPE)
TypeError: check_output() got an unexpected keyword argument 'stderr'

For this reason, I can't merge your PR.

@kirit93
Contributor Author

kirit93 commented Jun 4, 2017

@menshikh-iv I can fix the second and third error types by changing the definition of my check_output function to def check_output(args, flag=True, stderr=None).

That way, the existing calls to check_output in other files will not have to change.
Another option is to change the calls to check_output so that they fit the new implementation.

And the first type of error is not related to check_output right?

@menshikh-iv
Contributor

@kirit93 First type error is strange, I will investigate it later. Could you please do a few things:

  • Return old interface for check_output function
  • Re-raise exception (if subprocess call breaks down)
  • Logging (with debug level) full command string with args, that user pass to check_output
  • Check case with KeyboardInterrupt (it should be possible to stop this call with Ctrl + C)

After these fixes, I will run all the tests again and write the result in this PR.
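
The four requests above could be sketched roughly as follows. This is an illustration under assumptions, not the code in gensim's develop branch: the name run_and_capture is hypothetical, and "old interface" is taken to mean accepting the same positional and keyword arguments as subprocess.Popen, so that calls like check_output(args=cmd, shell=True) and check_output(args=cmd, stderr=PIPE) keep working.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_and_capture(*popenargs, **kwargs):
    """Run a command with Popen-style arguments and return its stdout."""
    kwargs.setdefault("stdout", subprocess.PIPE)
    # Log the full command and options at debug level.
    logger.debug("COMMAND: %s %s", popenargs, kwargs)
    process = subprocess.Popen(*popenargs, **kwargs)
    try:
        output, _ = process.communicate()
    except KeyboardInterrupt:
        # Ctrl+C: stop the child process, then let the interrupt propagate.
        process.terminate()
        raise
    retcode = process.returncode
    if retcode:
        cmd = kwargs.get("args", popenargs[0] if popenargs else None)
        # Re-raise as CalledProcessError, keeping the captured output.
        raise subprocess.CalledProcessError(retcode, cmd, output=output)
    return output
```

Forwarding **kwargs unchanged is what keeps keyword arguments such as shell and stderr from triggering the TypeError shown in the second and third error types.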

@kirit93
Contributor Author

kirit93 commented Jun 9, 2017

@menshikh-iv I made the changes you requested and pushed them. Travis built successfully. Please let me know if the PR is okay now.

@menshikh-iv
Contributor

menshikh-iv commented Jun 13, 2017

Sorry for late response @kirit93,

Now I get a new kind of error. Look at the command:

======================================================================
ERROR: testLargeMmap (gensim.test.test_ldamallet_wrapper.TestLdaMallet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ivan/release/test/gensim/gensim/test/test_ldamallet_wrapper.py", line 61, in setUp
    self.model = ldamallet.LdaMallet(self.mallet_path, corpus, id2word=dictionary, num_topics=2, iterations=1)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 100, in __init__
    self.train(corpus)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 158, in train
    self.convert_input(corpus, infer=False)
  File "/home/ivan/release/test/gensim/gensim/models/wrappers/ldamallet.py", line 155, in convert_input
    check_output(args=cmd, shell=True)
  File "/home/ivan/release/test/gensim/gensim/utils.py", line 1171, in check_output
    res = subprocess.check_output(args, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '/ h o m e / i v a n / r e l e a s e / t e s t / M a l l e t / b i n / m a l l e t   i m p o r t - f i l e   - - p r e s e r v e - c a s e   - - k e e p - s e q u e n c e   - - r e m o v e - s t o p w o r d s   - - t o k e n - r e g e x   " \ S + "   - - i n p u t   / t m p / 1 5 e 4 d b _ c o r p u s . t x t   - - o u t p u t   / t m p / 1 5 e 4 d b _ c o r p u s . m a l l e t' returned non-zero exit status 126
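
The thread does not state why the command appears with a space between every character. One plausible cause, offered here only as an assumption, is that a command passed as a single string was joined as if it were a list of arguments. The sketch below shows that effect with a hypothetical command string.

```python
cmd = "mallet import-file --input corpus.txt"  # hypothetical command string

# Joining a str iterates over its characters, so a separator is inserted
# between every pair of characters (original spaces become three spaces).
joined_from_str = " ".join(cmd)

# Joining a list of arguments produces the intended command line.
joined_from_list = " ".join(cmd.split())
```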

@kirit93
Contributor Author

kirit93 commented Jun 13, 2017

@menshikh-iv, I reverted to the old implementation of check_output and I ran the tests again. I still get an error like the one you posted.

ERROR: testCnpmiVWModel (gensim.test.test_coherencemodel.TestCoherenceModel)
Perform sanity check to see if c_npmi coherence works with LDA VW gensim wrapper
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/kirit/OpenSource/gensim/gensim/test/test_coherencemodel.py", line 70, in setUp
    self.malletmodel = LdaMallet(mallet_path=self.mallet_path, corpus=corpus, id2word=dictionary, num_topics=2, iterations=0)
  File "/Users/kirit/OpenSource/gensim/gensim/models/wrappers/ldamallet.py", line 100, in __init__
    self.train(corpus)
  File "/Users/kirit/OpenSource/gensim/gensim/models/wrappers/ldamallet.py", line 158, in train
    self.convert_input(corpus, infer=False)
  File "/Users/kirit/OpenSource/gensim/gensim/models/wrappers/ldamallet.py", line 155, in convert_input
    check_output(args=cmd, shell=True)
  File "/Users/kirit/OpenSource/gensim/gensim/utils.py", line 1206, in check_output
    raise error
subprocess.CalledProcessError: Command '/home/ivan/release/test/Mallet/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /var/folders/gq/gqtxhd8j2cl0v9chmcjpd3mr0000gn/T/9de601_corpus.txt --output /var/folders/gq/gqtxhd8j2cl0v9chmcjpd3mr0000gn/T/9de601_corpus.mallet' returned non-zero exit status 127.

Could the cause of the error be something unrelated to the check_output code in this PR?

@piskvorky
Owner

piskvorky commented Jun 14, 2017

@kirit93 unless your system user is /home/ivan, that error log looks extremely fishy.

@kirit93
Contributor Author

kirit93 commented Jun 14, 2017

@piskvorky, I ran the following command FT_HOME=/home/ivan/release/test/fastText WR_HOME=/home/ivan/release/test/wordrank VOWPAL_WABBIT_PATH=/home/ivan/release/test/vowpal_wabbit/vowpalwabbit/vw DTM_PATH=/home/ivan/release/test/dtm/dtm/main MALLET_HOME=/home/ivan/release/test/Mallet python setup.py test

@piskvorky
Owner

@kirit93 Where did you get that command from, do those paths really exist?

@kirit93
Contributor Author

kirit93 commented Jun 14, 2017

No they don't exist on my system, but @menshikh-iv asked me to run that command to test whether the PR works or not.

@menshikh-iv
Contributor

menshikh-iv commented Jun 14, 2017

@kirit93 It's a path in my filesystem.

For the full test run, you should:

  1. Install all external dependencies for the wrappers (fastText, WordRank, Vowpal Wabbit, DTM, MALLET).
  2. Create a clean virtual environment.
  3. Install all test dependencies with pip install .[test] (run in the gensim folder).
  4. Substitute the necessary paths to environment variables and run a command from your message with correct paths for your filesystem.
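
Before a full run, it can help to confirm that each path from step 4 actually exists on the local machine. The helper below is a hypothetical sketch, not part of gensim; only the environment variable names come from the test command quoted earlier in this thread.

```python
import os

# Environment variable names taken from the test command quoted in this thread.
WRAPPER_ENV_VARS = ["FT_HOME", "WR_HOME", "VOWPAL_WABBIT_PATH", "DTM_PATH", "MALLET_HOME"]


def missing_wrapper_paths(env=None):
    """Return the variables that are unset or point to a non-existent path."""
    env = os.environ if env is None else env
    return [
        name for name in WRAPPER_ENV_VARS
        if not env.get(name) or not os.path.exists(env[name])
    ]
```

Running the tests with another machine's paths, as happened above, would show up here as a non-empty list.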

@menshikh-iv
Contributor

menshikh-iv commented Jun 14, 2017

@kirit93 I just ran all the tests on the development branch and made sure that everything works with the implementation from origin/develop.

@kirit93
Contributor Author

kirit93 commented Jun 14, 2017

@menshikh-iv, any idea what could be the cause of this error?

@menshikh-iv
Contributor

In my opinion, it is now easier to recreate the PR and make small changes to the code from the develop branch.
Before that, please learn how to run all the tests for the wrappers (see the instructions in my previous post).
