Jobs not running, but cluster appears idle?
Hi.
I have jobs
* 6843 .. 6845 (on all2.q)
* 6846 .. 6848 (on all.q)
that are "incomplete" but make no progress.
"cluster status" shows "no data in table" which probably means the nodes are idle.
I checked (for 6848) that "pause/resume" did not change anything.
I need the jobs on all2.q (they are huge). The others on all.q I made just for testing.
(I would have used all.q from the start but I did not want to disturb other jobs running there.)
Oh yes, and I have a deadline. I really should have started this earlier...
- J.
j.waldmann- Posts : 84
Join date : 2014-04-26
Re: Jobs not running, but cluster appears idle?
Sorry for the problems, Johannes. I will investigate in an hour or so. You can always get the definitive view of what is happening on the compute nodes from the "qstat output" on the cluster status page. I agree it does look like the cluster is mostly idle.
The change to solver pipelines has been a major one, and we have broken some things along the way. For your planning: we intend to redeploy next week to fix a number of small bugs. Hopefully we can find and fix the issue you are seeing by Monday.
Aaron
Re: Jobs not running, but cluster appears idle?
Ok, I found the problem. The directory where we place our job scripts before submitting them to grid engine has >600k files in it, and we are getting errors from the system trying to write more files there. We are supposed to be deleting those files after the job script starts, but maybe we are not. Anyway, I started a shell command that should hopefully delete most of these. It will take a little while to run, since there are so many files!
Aaron
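For anyone who runs into the same symptom (jobs stuck because the job-script directory is full of leftover files), here is a minimal sketch of that kind of cleanup. It is not the shell command actually used on the cluster; the function name, the age threshold, and the dry-run flag are assumptions made for illustration, and the real script directory is not named in this thread.

```python
import os
import time


def remove_stale_files(directory, max_age_seconds, dry_run=True):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Hypothetical helper for illustration only. Returns the number of files
    that were removed (or, in dry-run mode, that would be removed).
    """
    cutoff = time.time() - max_age_seconds
    removed = 0
    # os.scandir yields entries lazily, so a directory holding hundreds of
    # thousands of files is not first loaded into one large list.
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files are touched.
            if not entry.is_file(follow_symlinks=False):
                continue
            # Keep anything modified recently: it may belong to a job that
            # has been submitted but has not started yet.
            if entry.stat(follow_symlinks=False).st_mtime >= cutoff:
                continue
            if not dry_run:
                try:
                    os.unlink(entry.path)
                except FileNotFoundError:
                    continue  # already removed by another process
            removed += 1
    return removed
```

Running it first with the default `dry_run=True` reports how many files would go, which is a cheap sanity check before deleting anything. The age cutoff is a guess at a safe rule; the real fix mentioned above is to delete each script once its job has started.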
Re: Jobs not running, but cluster appears idle?
Hi Aaron, thanks for the quick reaction. Everything is looking good now. - J.
j.waldmann- Posts : 84
Join date : 2014-04-26