Duke-UNC Brain Imaging and Analysis Center
 Scheduled Downtime - 11/16-17

petty
BIAC Staff

USA
453 Posts

Posted - Nov 10 2011 :  2:16:29 PM
Duke OIT is performing power maintenance in the data center on our row this coming week.

As a result, the cluster nodes will be OFF two nights:
Wed 11/16, Thu 11/17

They will be shut down by 5:30pm (revised from 10pm) and will be brought back up each day when maintenance is finished.

In other words, they won't be off for 2 days straight, but they will be down two consecutive nights.

Any jobs still running on the nodes will be killed when the nodes are turned off, and the nodes will be inaccessible during the downtime.



Edited by - petty on Nov 16 2011 10:28:16 AM

dvsmith
Advanced Member

USA
218 Posts

Posted - Nov 11 2011 :  12:01:34 PM
Hey Chris,

I assume this will also kill daemon processes that are submitting jobs, but can you confirm whether that is the case? I can either pause or let the power cut off kill my daemons and then resume after the maintenance, but it would definitely be easier to just pause them each night (if that's feasible).

Thanks!
David

petty
BIAC Staff

USA
453 Posts

Posted - Nov 11 2011 :  3:14:28 PM
If they are on the head node then they won't die.

If all goes as planned the head node will stay on.




dvsmith
Advanced Member

USA
218 Posts

Posted - Nov 18 2011 :  09:43:32 AM
Are all of the nodes back up now? I tried submitting to each node, but half of my jobs are stuck in the queue.

syam.gadde
BIAC Staff

USA
421 Posts

Posted - Nov 18 2011 :  09:48:42 AM
Due to some weirdness, the half of the cluster that was affected by the power outage is still down for the time being.

dvsmith
Advanced Member

USA
218 Posts

Posted - Nov 18 2011 :  09:52:24 AM
Ah ok... If I go ahead and restart my jobs, will the grid just avoid those nodes or will half my jobs get stuck in the queue (and/or fail)?

petty
BIAC Staff

USA
453 Posts

Posted - Nov 18 2011 :  09:54:32 AM
right ... i have to go physically push the buttons on the down nodes due to a "feature ( bug? )" with the remote power management.

if you do "qhost" ... the nodes that are up won't have any blanks.
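
For anyone who wants to check this from a script before resubmitting, here is a rough sketch in Python. It assumes the usual qhost text layout: a header row containing HOSTNAME and LOAD, a dashed separator, a "global" summary row, and a "-" in the LOAD column for any node that is not reporting. The function name and the node names in the sample are made up for illustration, and the column layout may differ on other Grid Engine versions.

```python
def split_hosts_by_state(qhost_output):
    """Return (up, down) host-name lists parsed from `qhost` text output.

    Assumption: a host whose LOAD column is "-" is not reporting (down).
    """
    up, down = [], []
    load_col = None
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator under the header.
        if not fields or set(line.strip()) == {"-"}:
            continue
        # Locate the LOAD column from the header row.
        if fields[0] == "HOSTNAME":
            load_col = fields.index("LOAD")
            continue
        # Ignore rows before the header and the "global" summary row.
        if load_col is None or fields[0] == "global":
            continue
        load = fields[load_col] if len(fields) > load_col else "-"
        (down if load == "-" else up).append(fields[0])
    return up, down


# Hypothetical sample output; node1 and node2 are placeholder host names.
SAMPLE = """\
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node1                   lx24-amd64      8  0.01   15.7G    1.2G    2.0G     0.0
node2                   lx24-amd64      8     -   15.7G       -    2.0G       -
"""

up_hosts, down_hosts = split_hosts_by_state(SAMPLE)
print("up:", up_hosts)
print("down:", down_hosts)
```

To run it against the live cluster, the same function could be fed the output of `subprocess.run(["qhost"], capture_output=True, text=True).stdout` on a machine where qhost is on the PATH.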

dvsmith
Advanced Member

USA
218 Posts

Posted - Nov 18 2011 :  09:59:26 AM
OK, thanks. Sorry for the inconvenience.