StarExec

Job pairs failing


Post by lianah Mon Feb 23, 2015 10:01 am

I am running a job on StarExec and am having trouble debugging my job submission. Many job pairs fail with the statuses "benchmark error (12)" (pair id 52343076) and "run script error (11)" (pair id 52345975). The output log is empty, and I can run the benchmarks on my machine without any issues. I am not sure whether the problem is on the StarExec side or whether I have set up my script incorrectly for StarExec. (For the StarExec admins: the job id is 6709, if you want to look into this.) Does anybody know what "benchmark error (12)" and "run script error (11)" mean and what their likely causes are? Any suggestions for how to debug these kinds of errors?

lianah

Posts : 11
Join date : 2014-04-23


Re: Job pairs failing

Post by Admin Mon Feb 23, 2015 11:14 am

I'm looking into it, Liana, and will let you know as soon as I figure out what the problem is.
Aaron

Admin
Admin

Posts : 162
Join date : 2014-04-22

http://www.cs.uiowa.edu/~astump/


Re: Job pairs failing

Post by lianah Mon Feb 23, 2015 11:31 am

Thank you for the quick reply. I tried a couple of other jobs, and I wonder whether the problem is related to all2.queue. I ran test jobs on long.queue and all1.queue with a small timeout and have not seen these error messages so far. The StarExec log for some of the failed jobs also suggests that something may not be mounted correctly:
"02/23/15 08:02:23 AM CST: WORKING_DIR is /export/starexec/sandbox2
chown: cannot access `/export/starexec/sandbox2': No such file or directory"
I will use the all1.queue for now.
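
For anyone triaging many failed pairs, the check described in the post above (looking for the chown failure in a pair's StarExec log) could be automated roughly as follows. This is only a sketch: the function name `find_missing_dirs` is hypothetical, and the pattern is modeled solely on the two log lines quoted above, so other log formats may need a different expression.

```python
import re

# Pattern modeled on the chown failure quoted in the post above.
# The opening quote may be a backtick or an apostrophe, depending on the chown version.
MISSING_DIR = re.compile(
    r"chown: cannot access [`'](?P<path>[^']+)': No such file or directory"
)


def find_missing_dirs(log_text):
    """Return the paths chown reported as missing, in order of first appearance."""
    seen = []
    for match in MISSING_DIR.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


# The excerpt quoted in the post, used here as sample input.
sample_log = (
    "02/23/15 08:02:23 AM CST: WORKING_DIR is /export/starexec/sandbox2\n"
    "chown: cannot access `/export/starexec/sandbox2': No such file or directory\n"
)

print(find_missing_dirs(sample_log))  # ['/export/starexec/sandbox2']
```

Running this over the logs of all failed pairs would show whether every failure points at the same missing directory.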


Re: Job pairs failing

Post by Admin Mon Feb 23, 2015 11:35 am

Right. The problem looks like a missing sandbox2 directory. I had not noticed that the errors are localized to nodes on all2.q (so thanks for pointing that out). However, some all2.q nodes do have sandbox2 directories. We'll look into it. If running on all.q is a workaround, then please do that for now.
Aaron


Re: Job pairs failing

Post by Admin Mon Feb 23, 2015 6:47 pm

Just an update: it looks like one of the local sandbox or sandbox2 directories is somehow getting deleted while a solver is running on a compute node. Our scripts do not recreate this directory after the solver runs, and a solver does have the power to delete it (not that it should), since the directory is chown'ed to the sandbox or sandbox2 user before the solver executes. I do not know whether anyone's solver is (accidentally) doing this, but that could explain the missing directories.

We should have a fix out tomorrow that simply checks for this directory and creates it if it is missing.

Aaron
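
The "check for the directory and create it if it is missing" idea from the post above can be illustrated with a minimal sketch. This is not StarExec's actual fix, which is not shown in the thread; the function name `ensure_sandbox_dir` and the default mode are assumptions, and the chown step mentioned in the post is left out because it requires elevated privileges.

```python
import os
import tempfile


def ensure_sandbox_dir(path, mode=0o755):
    """Create the directory if it is missing; return True if it had to be created."""
    if os.path.isdir(path):
        return False
    # exist_ok=True tolerates another process creating the directory in between.
    os.makedirs(path, mode=mode, exist_ok=True)
    return True


# Demonstration in a throwaway location rather than /export/starexec.
with tempfile.TemporaryDirectory() as root:
    sandbox = os.path.join(root, "sandbox2")
    print(ensure_sandbox_dir(sandbox))  # True: the directory was missing
    print(ensure_sandbox_dir(sandbox))  # False: it already exists
```

Calling such a check before each pair runs means a directory deleted by an earlier solver would not cause the next pair to fail.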


Re: Job pairs failing

Post by Admin Tue Feb 24, 2015 3:04 pm

We deployed a fix for this issue this morning, so you should no longer see pairs failing for this reason (let me know if you still do).
Aaron
