clithero (Junior Member)
Posted - May 25 2008 : 1:27:29 PM
Hi, I am trying to run some FSL jobs (first-level FEAT and flirt) using scripts that previously worked. The jobs either failed or started hanging. Killing the jobs on some nodes (node 9) worked, but all the jobs I had running on nodes 8 and 6 still show as "stalled" in qstatall. I would be grateful for any insight. Thanks! John

josh.bizzell (BIAC Staff)
Posted - May 27 2008 : 10:24:47 AM
Is this still an issue? If so, can you provide the job IDs and the node each job was submitted to?
-Josh

dvsmith (Advanced Member)
Posted - Jul 20 2008 : 1:19:01 PM
I'm having the same problem that John was having. My jobs are stalled and wasting space on the cluster. Their current status is "dr". How can I get rid of them?
They're all on node5, so maybe there's just a problem with that node?
The job IDs are below: 422292 422294 422295 422297
Thanks, David

petty (BIAC Staff)
Posted - Jul 20 2008 : 6:50:38 PM
David, you can kill your jobs with the "qdel" command,
e.g.: qdel 422292

dvsmith (Advanced Member)
Posted - Jul 20 2008 : 10:52:45 PM
No... that's what got them into the stalled state.
When I type ps -u smith, I do not see any processes on the head node that seem to belong to the hung jobs, so I don't think I can kill them from there. I also can't ssh onto node5, where they are all stuck, to kill any aberrant processes there.
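A job left in the "dr" state after a plain qdel usually means the deletion was requested but the execution host (here node5) never confirmed it. Assuming the cluster runs Sun Grid Engine, whose qdel accepts -f to drop a job from the qmaster even when the execution daemon does not respond, a minimal sketch for finding such jobs might look like this. The function name find_dr_jobs is hypothetical, and it assumes the default qstat output layout (two header lines, job ID in column 1, state in column 5); the BIAC qstatall wrapper may format its output differently.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs whose deletion is pending.
# Assumes default Grid Engine `qstat` output on stdin: two header lines,
# then one job per line with the job ID in column 1 and the state in column 5.
# A "d" in the state code marks a requested deletion ("dr" = deletion
# requested while the job is still marked running).
find_dr_jobs() {
    awk 'NR > 2 && $5 ~ /d/ { print $1 }'
}

# Possible usage from the head node (assumption: Grid Engine's qdel -f):
#   for id in $(qstat -u "$USER" | find_dr_jobs); do qdel -f "$id"; done
```

Depending on the cluster configuration, ordinary users may only use qdel -f on their own jobs if the administrators have set ENABLE_FORCED_QDEL in qmaster_params; otherwise, as happened in this thread, a cluster admin has to clear the jobs or reboot the hung node.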
dvsmith (Advanced Member)
Posted - Jul 21 2008 : 1:37:04 PM
So what was the deal with node5? My stalled jobs are gone, so I assume Josh or another cluster admin killed them?

josh.bizzell (BIAC Staff)
Posted - Jul 21 2008 : 1:39:37 PM
Node 5 was in a hung state, so it needed to be rebooted. The reboot was done this morning and killed all of your jobs.
I'm testing some things out, and once all of those tests pass, I'll put node 5 back into the users queue.
- Josh

josh.bizzell (BIAC Staff)
Posted - Jul 21 2008 : 4:49:32 PM
Node 5 has been tested and is up and running.
-Josh

clithero (Junior Member)
Posted - Sep 04 2008 : 10:20:37 AM
I think a couple of jobs randomly stalled last night on node7. Any idea what happened?
Two are mine (both LIBSVM jobs): jobs 493918 and 493920. Vinod also had one, but I am not sure which node it was on. They are still sitting in qstatall.
Thanks.

josh.bizzell (BIAC Staff)
Posted - Sep 04 2008 : 10:26:19 AM
It looks like node7 has crashed. You are correct that those jobs are hung. You and Vinod will need to resubmit them. I'll try to get node7 up and running as soon as possible.
Sorry for any inconvenience, Josh

josh.bizzell (BIAC Staff)
Posted - Sep 05 2008 : 10:07:10 AM
Node7 should be up and running again. Please let us know if you experience any problems.
-Josh