No error message, if htcondor grouped submission fails #194

Closed
cverstege opened this issue Nov 4, 2024 · 1 comment

@cverstege
Contributor

I'm currently setting up submission on lxplus and I get unhelpful error messages like:

going to submit 11 htcondor job(s)
submission of htcondor job(s) '/afs/cern.ch/user/c/cversteg/jobs/tmp4etnzxid/htcondor_job.jdl' failed:

ERROR: [pid 2476791] Worker Worker(salt=8438222517, workers=1, host=lxplus945.cern.ch, username=cversteg, pid=2476791) failed    HerwigRun(effective_workflow=htcondor, branch=-1, campaign=Dijets_NLO_lowpt, mc_setting=NPoff, start_seed=100, number_of_jobs=1000, events_per_job=10000, setupfile=None, workflow=htcondor)
Traceback (most recent call last):
  File "/eos/user/c/cversteg/mc-run/luigi/luigi/worker.py", line 210, in run
    new_deps = self._run_get_new_deps()
  File "/eos/user/c/cversteg/mc-run/luigi/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/eos/user/c/cversteg/mc-run/law/law/workflow/remote.py", line 628, in run
    return self._run_impl()
  File "/eos/user/c/cversteg/mc-run/law/law/workflow/remote.py", line 700, in _run_impl
    self.submit()
  File "/eos/user/c/cversteg/mc-run/law/law/workflow/remote.py", line 882, in submit
    job_ids, submission_data = self._submit_group(submit_jobs)
  File "/eos/user/c/cversteg/mc-run/law/law/contrib/htcondor/workflow.py", line 191, in _submit_group
    c, p = job_id.split(".")
AttributeError: 'Exception' object has no attribute 'split'
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   HerwigRun__1__Dijets_NLO_lowpt_9da4d0afa7   has status   FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time 
INFO: Worker Worker(salt=8438222517, workers=1, host=lxplus945.cern.ch, username=cversteg, pid=2476791) was stopped. Shutting down Keep-Alive thread

This is due to the htcondor submission failing. If I print out the object that is returned as the job_id, which is an Exception in this case, I get the actual htcondor error:

ERROR: Failed to commit job submission into the queue.
ERROR: EOS Submission is not currently supported by the HTCondor Service: https://batchdocs.web.cern.ch/troubleshooting/eos.html#no-eos-submission-allowed

If the job_id is not a string but an Exception object, the error/exception should be printed so that the user can debug the failing submission.
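
To illustrate the kind of check I mean, here is a minimal sketch. It is not law's actual code: `parse_group_job_id` is a hypothetical helper, and it assumes, based only on the traceback above, that a successful submission yields a string of the form "cluster.process" while a failed one yields an Exception object.

```python
def parse_group_job_id(job_id):
    """Split a "cluster.process" job id, or surface the submission error.

    Hypothetical helper, not part of law. Assumes job_id is either a
    string like "1234.0" or an Exception describing a failed submission.
    """
    if isinstance(job_id, Exception):
        # Re-raise with the original htcondor message so the user sees it.
        raise RuntimeError(
            "htcondor group submission failed: {}".format(job_id)
        ) from job_id

    # Same split as in the traceback: cluster id and process id as strings.
    cluster, process = job_id.split(".")
    return cluster, process
```

With something like this, the failure above would report the "EOS Submission is not currently supported" message instead of the AttributeError.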

@cverstege
Contributor Author

law is on the latest master (0aae7e6)

@riga closed this as completed in 9ac6dbc on Nov 6, 2024