Duke-UNC Brain Imaging and Analysis Center
 Cluster down?

T O P I C    R E V I E W
clithero Posted - May 25 2008 : 1:27:29 PM
Hi,
I am trying to run some FSL jobs (first-level FEAT and flirt) using scripts that had previously worked. The jobs seem to have either failed or started hanging. Killing the jobs on some nodes (node 9) seemed to work, but all the jobs I had running on nodes 8 and 6 still show as "stalled" in qstatall.
I would be grateful for some insight.
Thanks!
John
10   L A T E S T    R E P L I E S    (Newest First)
josh.bizzell Posted - Sep 05 2008 : 10:07:10 AM
Node7 should be up and running again. Please let us know if you experience any problems.

-Josh
josh.bizzell Posted - Sep 04 2008 : 10:26:19 AM
It looks like node7 has crashed. You are correct about the jobs that are hung. You and Vinod will need to resubmit the jobs. I'll try to get node7 up and running as soon as possible.

Sorry for any inconvenience,
Josh
clithero Posted - Sep 04 2008 : 10:20:37 AM
I think a couple of jobs randomly stalled last night on node7. Any ideas for what happened?

Two from me (both LIBSVM jobs). Jobs 493918 and 493920. Vinod also had one, but I am not sure which node. They are still sitting on qstatall.

Thanks.
josh.bizzell Posted - Jul 21 2008 : 4:49:32 PM
Node 5 has been tested and is up and running.

-Josh
josh.bizzell Posted - Jul 21 2008 : 1:39:37 PM
Node 5 was in a hung state and needed to be rebooted. The reboot was done this morning, and it killed all of your jobs.

I'm running some tests, and once they all pass, I'll put node 5 back in the users queue.

- Josh
dvsmith Posted - Jul 21 2008 : 1:37:04 PM
So what was the deal with node5? My stalled-out jobs are gone, so I assume Josh or another cluster admin killed them?
dvsmith Posted - Jul 20 2008 : 10:52:45 PM
no... that's what got them in the stalled state.

when i type "ps -u smith", i do not see any processes that seem to be supporting the hung jobs on the head node, so i don't think i can kill them from there. i also can't ssh onto node5 to kill any aberrant processes there, which is where they are all stuck.
petty Posted - Jul 20 2008 : 6:50:38 PM
David, you can kill your jobs with the "qdel" command.

e.g.: qdel 422292
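
If there are several stalled jobs, the same qdel call can be repeated for each ID. Below is a minimal sketch, not a tested cluster script: the function name delete_jobs and the QDEL variable are made up for illustration, and the job IDs are the ones listed in this thread. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: call qdel once per job ID.
# QDEL is a hypothetical override variable. It defaults to a dry run
# ("echo qdel") so nothing is deleted by accident; set QDEL=qdel to
# actually send the delete requests.
QDEL="${QDEL:-echo qdel}"

# delete_jobs is an illustrative helper name, not a cluster command.
delete_jobs() {
    for id in "$@"; do
        # $QDEL is intentionally unquoted so "echo qdel" splits into two words
        $QDEL "$id"
    done
}

# Job IDs reported as stalled in this thread
delete_jobs 422292 422294 422295 422297
```

Run it once as-is to check the printed commands, then again with QDEL=qdel in the environment. As the rest of this thread shows, a delete request may not clear jobs on a node that is itself hung, so an admin may still have to step in.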
dvsmith Posted - Jul 20 2008 : 1:19:01 PM
I'm having the same problem that John was having. My jobs are stalled and wasting space on the cluster. Their current status is "dr". How can I get rid of them?

They're all on node5, so maybe there's just a problem with that node?

The job IDs are below:
422292
422294
422295
422297

Thanks,
David

josh.bizzell Posted - May 27 2008 : 10:24:47 AM
Is this still an issue? If so, can you provide job IDs and the node the job was submitted to?

-Josh
