StarExec

Jobs not running, but cluster appears idle?


Post by j.waldmann Sat Mar 21, 2015 5:26 pm

Hi.

I have jobs

* 6843 .. 6845 (on all2.q)
* 6846 .. 6848 (on all.q)

that are "incomplete" but make no progress.

"cluster status" shows "no data in table", which probably means the nodes are idle.

I checked (for 6848) that "pause/resume" did not change anything.

I need the jobs on all2.q (they are huge). The others on all.q I made just for testing.
(I would have used all.q from the start but I did not want to disturb other jobs running there.)

Oh yes, and I have a deadline. But I really should have started this earlier...

- J.

j.waldmann

Posts : 84
Join date : 2014-04-26


Post by Admin Sat Mar 21, 2015 7:21 pm

Sorry for the problems, Johannes. I will investigate in an hour or so. You can always get the definitive view of what is happening on the compute nodes from the "qstat output" on the cluster status page. I agree it does look like the cluster is mostly idle.

The change to solver pipelines has been a major one, and we have broken some things along the way. For your planning: we intend to redeploy next week to fix a number of small bugs here and there. Hopefully we can find and fix the issue you are seeing by Monday.
Aaron

Admin

Posts : 162
Join date : 2014-04-22

http://www.cs.uiowa.edu/~astump/


Post by Admin Sat Mar 21, 2015 9:03 pm

Ok, I found the problem. The directory where we place our job scripts before submitting them to grid engine has >600k files in it, and we are getting errors from the system trying to write more files there. We are supposed to be deleting those files after the job script starts, but maybe we are not. Anyway, I started a shell command that should hopefully delete most of these. It will take a little while to run, since there are so many files!
Aaron
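
For readers facing a similar situation, here is a minimal sketch of one way to inspect and prune a directory that has accumulated a very large number of stale files. The thread does not say which shell command was actually run, so this is not the StarExec cleanup itself. The function names, the age threshold, and the dry-run default are all assumptions made for illustration. It uses `os.scandir` so the directory is streamed rather than loaded into one large list.

```python
import os
import time


def count_entries(directory):
    """Count directory entries by streaming them, without building a list."""
    with os.scandir(directory) as it:
        return sum(1 for _ in it)


def delete_stale_files(directory, max_age_seconds, now=None, dry_run=True):
    """Remove regular files whose modification time is older than the threshold.

    Hypothetical helper: returns the number of files that matched.
    With dry_run=True (the default) nothing is deleted, only counted.
    Subdirectories and symlinks are left untouched.
    """
    now = time.time() if now is None else now
    matched = 0
    with os.scandir(directory) as it:
        for entry in it:
            # Skip anything that is not a plain file.
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                if not dry_run:
                    os.unlink(entry.path)
                matched += 1
    return matched
```

A cautious way to use it would be to call `count_entries` first, then run `delete_stale_files` with `dry_run=True` to see how many files would go, and only then repeat with `dry_run=False`. An age threshold is one way to avoid removing scripts for jobs that have just been submitted. Whether that is the right criterion for a given scheduler setup is an assumption to check.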


Post by j.waldmann Sat Mar 21, 2015 10:20 pm

Hi Aaron, thanks for the quick reaction. Everything is looking good now. - J.


Post by Admin Sun Mar 22, 2015 7:26 am

Great!
Aaron
