StarExec

Job not running even though queue is displayed as empty



Post by mabrocks Tue Jan 13, 2015 12:02 pm

Heya,

I'm not sure if this qualifies as a bug, but here we go anyway: I (mabrocks / id 110) have set up a new job (id 6277) in all.q. From what I can see in the cluster status, the nodes in that queue are idle (i.e., no jobs are shown, and the nodes are indeed reporting a load of 0.0* in the qstat output). However, my job is not running. I've tried pausing and resuming the job, as well as deleting it and uploading a new one. I've also tried moving the job to all2.q, where it was not executed either. Any idea what is going on? I could run things successfully yesterday and this morning...

Marc

mabrocks

Posts : 12
Join date : 2014-07-28


Post by Admin Wed Jan 14, 2015 3:57 pm

Hi, Marc. Hmm, sorry you are having this problem. Let me investigate and see what's going on.
Aaron

Admin

Posts : 162
Join date : 2014-04-22

http://www.cs.uiowa.edu/~astump/


Post by Admin Wed Jan 14, 2015 6:53 pm

Hi, Marc. OK, StarExec was stalled and not running anybody's jobs, due to a problem that will take us a little while longer to get to the bottom of. We got it unstuck, and your job has now completed.
Sorry for the delay,
Aaron


Post by Admin Wed Jan 14, 2015 10:20 pm

We tracked down the bug and will be deploying a fix tomorrow. Part of the problem was that, since we switched to running two jobs per node (thus doubling StarExec's compute power!), a defensive mechanism we had in place for cases where a job pair's results could not be reported back to the database (which happened occasionally, apparently due to transient system issues) was no longer enabled. That exposed another issue, in our scheduling algorithm. These problems were not easy to find, but very easy to fix.
Aaron


Post by mabrocks Thu Jan 15, 2015 5:04 am

Hi Aaron,

Thanks for tracking this down and taking care of the system!

Marc


Post by Admin Thu Jan 15, 2015 5:03 pm

The fix has been deployed now, so hopefully this won't happen again.
Aaron
