Job not running even though queue is displayed as empty
Heya,
I'm not sure if this qualifies as a bug, but here we go anyway: I (mabrocks / id 110) have set up a new job (id 6277) in all.q. From what I can see in the cluster status, the nodes in that queue are idle (i.e., no jobs are shown, and the nodes are indeed reporting a load of 0.0* in the qstat output). However, my job is not being run. I've tried pausing and resuming the job, as well as deleting it and uploading a new one. I've also tried moving the job to all2.q, where it was not executed either. Any idea what is going on? I could successfully run things yesterday / this morning...
Marc
mabrocks- Posts : 12
Join date : 2014-07-28
Re: Job not running even though queue is displayed as empty
Hi, Marc. Hmm, sorry you are having this problem. Let me investigate and see what's going on.
Aaron
Re: Job not running even though queue is displayed as empty
Hi, Marc. OK, StarExec had stalled and was not running anybody's jobs, due to a problem that is going to take us a little while longer to get to the bottom of. We got it unstuck, and your job has now completed.
Sorry for the delay,
Aaron
Re: Job not running even though queue is displayed as empty
We tracked down the bug and will be deploying a fix tomorrow. Part of the problem was that, since we switched to running two jobs per node (thus doubling StarExec's compute power!), a defensive mechanism we had in place for cases where a job pair's results could not be reported back to the database (which seems to have happened occasionally due to transient system issues) was not enabled. That exposed another issue, this one in our scheduling algorithm. These problems were not easy to find, but they were very easy to fix.
Aaron
Re: Job not running even though queue is displayed as empty
Hi Aaron,
thanks for tracking this down and taking care of the system!
Marc
Re: Job not running even though queue is displayed as empty
The fix has been deployed now, so hopefully this won't happen again.
Aaron