Expose swarming job errors and stacktraces when they fail #1815
Maybe this is another issue, but shouldn't an error in a single swarming thread avoid killing the whole swarming process?
This seems to indicate that the swarm is unable to locate the data file. Try changing the data filename to a nonexistent one and you get the same error.
Any suggestions on how to rectify this?
@lovekeshvig Sorry, I've been on vacation for the past week and am just catching up. This could be related to #1805. What happens if you do this?
@lovekeshvig ping?
Actually, if I change the data file name in the
I can second this. I tried to run the sine example with a brand-new installation of NuPIC (via pip) on OS X 10.10, and all I'm seeing is the JSON parser error. I added some debugging and it boils down to
I am getting the exact same error on Ubuntu 14.02 LTS. I reinstalled NuPIC and still get it.
I can also confirm that this happens on Ubuntu 13.10. I'm installing 14.10 right now to check.
It probably has more to do with JSON... are there older alternatives? By the way: hey Pascal, regards from Pascal :P
Nope, it's most likely not a JSON issue. The error occurs while parsing the individual swarm job results, which (as I understand it) are supposed to be JSON. However, instead of a JSON string,
completionReason=None, completionMsg=None, workerCompletionReason=u'success', workerCompletionMsg=None, cancel=0, startTime=None, endTime=None, results=None, ...
Guys, I think the issue you're having might have been fixed by #1902. This has not been included in a binary release, so to test it you'll need to compile locally following the README instructions.
This has indeed solved the problem, thank you! Would you mind giving a quick explanation of what happened here?
Sure. We were using an old method of finding data files, which did not work when NuPIC was installed from a binary package because all search paths depended on environment variables. In the PR I linked above, I ripped all that out and replaced it with the standard Python packaging mechanism for data files, so that data files packaged within the binary installation can be found.
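To illustrate the difference described above, here is a minimal sketch contrasting an environment-variable lookup with a package-relative lookup. It is not the code from the PR. The function names are hypothetical, the `NUPIC` variable name is taken from later in this thread, and the package-relative version uses the modern stdlib `importlib.resources.files` (Python 3.9+). In 2015 the equivalent would have been setuptools' `pkg_resources`.

```python
import os
from importlib import resources


def find_data_file_from_env(relative_path, env_var="NUPIC"):
    """Old-style lookup (illustrative): only works if env_var points at a checkout."""
    root = os.environ.get(env_var)
    if not root:
        # In a fresh shell without the variable, the lookup cannot succeed at all
        raise RuntimeError(
            "%s is not set, so %r cannot be located" % (env_var, relative_path))
    return os.path.join(root, relative_path)


def find_data_file_in_package(package, relative_path):
    """Package-relative lookup: resolves data shipped inside an installed package."""
    resource = resources.files(package).joinpath(relative_path)
    if not resource.is_file():
        raise FileNotFoundError(
            "%r is not packaged with %s" % (relative_path, package))
    return resource
```

For example, `find_data_file_in_package("json", "__init__.py")` locates a file inside the stdlib `json` package regardless of the current directory or environment. With NuPIC it would be called with NuPIC's own package name and a path relative to it, which is why it keeps working after a binary install.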
Okay, a little setback here: I tried to run my swarm script again in a new shell and it failed. This only seems to work as long as the
I don't think users should need to set
For everyone who stumbles across this, you can get swarming to work by setting the
If, like me, you try to run one of the examples out there (e.g. sine prediction or the gym tutorial) under 0.2.1, also note that support for relative file paths in the swarming spec is broken in that version. Use absolute paths instead and you should be fine.
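One way to apply the absolute-path workaround without editing every example by hand is to rewrite the swarm description before passing it to the swarm. This is a minimal sketch, not part of NuPIC: the helper name is hypothetical, and the `streamDef` → `streams` → `source` layout with a `file://` prefix is assumed from the hotgym example's swarm description rather than shown in this thread.

```python
import copy
import os

FILE_PREFIX = "file://"


def absolutize_stream_sources(swarm_description, base_dir):
    """Return a copy of swarm_description whose file:// sources are absolute paths.

    Assumes the streamDef -> streams -> source layout used by the NuPIC examples.
    """
    description = copy.deepcopy(swarm_description)  # leave the caller's dict untouched
    for stream in description.get("streamDef", {}).get("streams", []):
        source = stream.get("source", "")
        if not source.startswith(FILE_PREFIX):
            continue  # non-file sources are left as-is
        path = source[len(FILE_PREFIX):]
        if not os.path.isabs(path):
            # Resolve relative to the given directory, not the process's cwd
            stream["source"] = FILE_PREFIX + os.path.abspath(os.path.join(base_dir, path))
    return description
```

In a script, `base_dir` would typically be the directory containing the script (for example `os.path.dirname(os.path.abspath(__file__))`), so the CSV next to it is found no matter where the swarm is launched from.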
This is due to your permission settings; set yourself as root using sudo -s.
Lovekesh Vig
Thanks guys.
@lovekeshvig Thank you, but I cannot confirm this. Running as root without the NUPIC environment variable set fails just like before.
This should fix it: #1968
The swarming output is suppressed by assigning a temp file to the outputs here: https://github.com/numenta/nupic/blob/master/nupic/swarming/permutations_runner.py#L632 I am currently working on a better solution; let me know if you have any suggestions.
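As one possible direction for the "better solution" mentioned above, a sketch of the general pattern: keep redirecting a child process's output to a temp file, but read it back and surface it when the process fails instead of discarding it. This is a generic stdlib example, not the actual `permutations_runner.py` code; `run_and_report` is a hypothetical name.

```python
import subprocess
import tempfile


def run_and_report(cmd):
    """Run cmd with stdout/stderr captured in a temp file.

    On success the captured output is returned; on failure it is included in
    the raised error instead of being silently thrown away.
    """
    with tempfile.TemporaryFile(mode="w+") as log:
        # stderr is merged into stdout so tracebacks land in the same file
        returncode = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
        log.seek(0)
        output = log.read()
    if returncode != 0:
        raise RuntimeError("Command %r failed with exit code %d. Output:\n%s"
                           % (cmd, returncode, output))
    return output
```

The design choice is that quiet runs stay quiet, while a failing job's stack trace reaches the user rather than a downstream symptom.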
Is there any way to reliably reproduce this bug?
@andrewmalta13 You might try changing a line of the input CSV later in the file to a different data type, like changing a
@rhyolight I just tried that on examples/opf/clients/hotgym/prediction/one_gym/rec-center-hourly.csv. I changed one of the entries in the "kw_energy_consumption" column to "oops" and ran swarm.py. I received the following stack trace:
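The reproduction attempt above (replacing one "kw_energy_consumption" value with "oops") can be scripted so it is repeatable. This is a minimal sketch with a hypothetical helper name; it assumes the standard NuPIC CSV layout of three header rows (field names, field types, special flags) before the data, which is not shown in this thread.

```python
import csv


def corrupt_csv_value(src_path, dst_path, column, data_row,
                      bad_value="oops", header_rows=3):
    """Copy src_path to dst_path with one cell of `column` replaced by bad_value.

    data_row is 0-based and counts only data rows; header_rows assumes the
    three-line NuPIC CSV header (field names, types, flags).
    """
    with open(src_path, newline="") as f:
        rows = list(csv.reader(f))
    col = rows[0].index(column)  # ValueError if the column does not exist
    rows[header_rows + data_row][col] = bad_value
    with open(dst_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

Writing to a separate `dst_path` keeps the original example data intact, so the swarm can be pointed at the corrupted copy and the experiment repeated with different rows.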
@andrewmalta13 So that did not replicate the problem. Try @lovekeshvig's suggestion above?
That also reports the error as I would expect:
@rhyolight Are you sure this issue hasn't already been addressed, perhaps by PR #2205?
@andrewmalta13 It was reported on the HTM Forum a month ago.
Huh, strange. I guess I will keep trying to reproduce it.
@andrewmalta13 If it gets too tedious, maybe just leave it alone until I get another report of the error; then we can both work with the user who hits it to try to replicate it.
👍 |
As initially reported in #1717, swarming jobs that fail sometimes return a JSON parse error. This stops the entire swarm and dumps an error like this:
This error is misleading because it looks like a JSON parsing error, but it really occurs because one of the swarm jobs failed and the swarming system is not properly extracting the error from that job and displaying it to the user. The jobInfo object returned from the swarm job has no results object in this case, which is what causes the error above. Instead, the program should report the original error from the swarm job back to the user rather than this unhelpful stack trace.
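A minimal sketch of the behavior requested above: check for a missing results payload before handing it to the JSON parser, and raise an error built from the job's own completion fields instead. The `JobInfo` namedtuple is a hypothetical stand-in whose field names are copied from the jobInfo dump quoted earlier in this thread (the real record has more fields); `SwarmJobError` and `parse_job_results` are likewise illustrative names, not NuPIC API.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the jobInfo record; field names copied from the dump
# earlier in this thread (the real record has more fields).
JobInfo = namedtuple("JobInfo", [
    "completionReason", "completionMsg",
    "workerCompletionReason", "workerCompletionMsg", "results"])


class SwarmJobError(Exception):
    """Raised when a swarm job finishes without any results to parse."""


def parse_job_results(job_info):
    """Parse job_info.results as JSON, or raise an error describing the job."""
    if job_info.results is None:
        # Report what the job said about itself instead of passing None to json.loads
        details = ", ".join(
            "%s=%r" % (name, getattr(job_info, name))
            for name in ("completionReason", "completionMsg",
                         "workerCompletionReason", "workerCompletionMsg"))
        raise SwarmJobError("Swarm job returned no results (%s)" % details)
    return json.loads(job_info.results)
```

In a real fix, the worker's captured traceback would be the most useful thing to include in this message; the sketch only shows where that check belongs.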