Fix issue-1310 #1332
Please add logging before every check_output call.
Done.
with smart_open(input_fname, 'rb') as r:
    with smart_open(output_fname, 'wb') as w:
        utils.check_output(w, args=command, stdin=r)

logger.info("Delete frequencies from vocab file")
Please add a log statement with cmd_del_vocab_freq, like on line 103.
It may be necessary to add logging of the arguments inside the function check_output. Then you will not need to add logging around each call, and some questions from the mailing list will be easier to debug. @tmylk what do you think about it?
Yes, let's add a utils logger
would be great
@parulsethi could you please add logging parameters to |
gensim/utils.py
Outdated
@@ -1164,11 +1164,12 @@ def check_output(stdout=subprocess.PIPE, *popenargs, **kwargs):
    Added extra KeyboardInterrupt handling
    """
    try:
        cmd = kwargs.get("args")
        logger.info("COMMAND: %s", cmd)
I think if some specific information needs to be logged for a command, the log statement can be placed in the calling code, and this line will just make sure the command to be run is logged, for debugging purposes. WDYT?
Or should a parameter for a log string be provided, so that the command-specific info is also logged from check_output itself?
- You should use logger.debug here (because it's needed for debugging purposes only).
- All input parameters (popenargs and kwargs) need to be logged.
- I don't think it is necessary for now to log "command specific" data (based on the problems our users report on the mailing list); logging all arguments will be enough.
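As a rough illustration of the three points above, the sketch below logs every positional and keyword argument at debug level before running the command. It is a stand-alone approximation, not the code from gensim/utils.py: the function name check_output_logged and the logger name are hypothetical, and the body only approximates the KeyboardInterrupt handling mentioned in the docstring.

```python
import logging
import subprocess
import sys

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("check_output_sketch")


def check_output_logged(stdout=subprocess.PIPE, *popenargs, **kwargs):
    """Run a command and return whatever was captured on stdout.

    All arguments are logged at debug level before the process starts.
    """
    # Pass the objects as logging arguments: they are only formatted
    # when a debug record is actually emitted.
    logger.debug("COMMAND: %s %s", popenargs, kwargs)
    process = subprocess.Popen(*popenargs, stdout=stdout, **kwargs)
    try:
        output, _ = process.communicate()
    except KeyboardInterrupt:
        # Stop the child process before propagating the interrupt.
        process.terminate()
        raise
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args", popenargs[0] if popenargs else None)
        raise subprocess.CalledProcessError(retcode, cmd, output=output)
    return output
```

With logging configured at DEBUG level, a call such as check_output_logged(args=[sys.executable, "-c", "print('hi')"]) emits one "COMMAND: ..." record containing both argument collections, so callers do not need a separate log line before each invocation.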
Made the suggested changes, and removed the logging of the command (cmd) at logger.info level.
Looks good, thank you @parulsethi 👍
@@ -1164,6 +1164,7 @@ def check_output(stdout=subprocess.PIPE, *popenargs, **kwargs):
    Added extra KeyboardInterrupt handling
    """
    try:
        logger.debug("COMMAND: %s %s", str(popenargs), str(kwargs))
%s = str, no need to call that function explicitly.
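To make this point concrete, here is a small sketch showing that the %s placeholder already applies str() to its argument, and that passing the object itself lets the logging module skip the conversion when the level is disabled. The class Counted and the logger name are hypothetical helpers introduced only for this illustration.

```python
import logging


class Counted:
    """Hypothetical helper that counts how often str() is applied to it."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "counted"


# Hypothetical logger with debug output disabled.
logger = logging.getLogger("lazy_format_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())

obj = Counted()

# The object is passed as-is; the record is never created, so str() never runs.
logger.debug("COMMAND: %s", obj)
lazy_calls = obj.calls

# The explicit str() call runs eagerly, even though nothing is logged.
logger.debug("COMMAND: %s", str(obj))
eager_calls = obj.calls
```

Here lazy_calls stays at 0 while eager_calls becomes 1, which is why the explicit str() calls add work without changing the logged text.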
        sgd_num=100, lrate=0.001, period=10, iter=90, epsilon=0.75, dump_period=10, reg=0, alpha=100,
        beta=99, loss='hinge', memory=4.0, cleanup_files=True, sorted_vocab=1, ensemble=0):
    """
    `wr_path` is the path to the Wordrank directory.
    `corpus_file` is the filename of the text file to be used for training the Wordrank model.
    Expects file to contain space-separated tokens in a single line
    `out_path` is the path to directory which will be created to save embeddings and training data.
    `out_name` is name of the directory which will be created(in wordrank folder) to save embeddings and training data.
Space before bracket (.
input_fnames = [corpus_file.split('/')[-1], corpus_file.split('/')[-1], cooccurrence_file]
output_fnames = [temp_vocab_file, cooccurrence_file, cooccurrence_shuf_file]

logger.info("Prepare training data using glove code")
We try to make info messages more informative: what data? How much data?
If it's not meant to be seen by end users (internal opaque debug message), it's better to use debug.
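One way to read this suggestion is sketched below: the info message states which file is being processed and how large it is, while opaque internal details go to debug. Everything here is an assumption made for illustration: the function names, the logger name, and the choice of bytes and line count as the "how much" measure are not taken from the pull request.

```python
import logging
import os

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("wordrank_logging_sketch")


def describe_corpus(corpus_file):
    """Return (size_in_bytes, line_count) for a text corpus file."""
    size = os.path.getsize(corpus_file)
    with open(corpus_file, "rb") as f:
        lines = sum(1 for _ in f)
    return size, lines


def log_training_data(corpus_file):
    """Log a user-facing summary at info level and internals at debug level."""
    size, lines = describe_corpus(corpus_file)
    # User-facing: says what data is used and how much of it there is.
    logger.info(
        "preparing training data from %s (%d bytes, %d lines)",
        corpus_file, size, lines,
    )
    # Internal detail that end users rarely need.
    logger.debug("corpus statistics computed for %s", corpus_file)
    return size, lines
```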
for command, input_fname, output_fname in zip(commands, input_fnames, output_fnames):
    with smart_open(input_fname, 'rb') as r:
        with smart_open(output_fname, 'wb') as w:
            utils.check_output(w, args=command, stdin=r)

logger.info("Delete frequencies from vocab file")
Delete = imperative. Is that intended? Should the user delete something?
Probably meant to be "deleting" or "will delete" instead.
Renamed out_path. Also, shifted a log line just above its corresponding command execution.