| TOPIC REVIEW |
| petty |
Posted - Nov 10 2011 : 2:16:29 PM Duke OIT is performing power maintenance in the data center on our row this coming week.
As a result, the cluster nodes will be OFF for two nights: Wed 11/16 and Thu 11/17.
They will be shut down by 10pm 5:30pm and will be brought back up each day when maintenance is finished.
In other words, they won't be off for 2 days straight, but they will be down two consecutive nights.
As a result, any running jobs will be killed when the nodes are turned off, and the nodes will be inaccessible during the downtime.
|
| 7 LATEST REPLIES (Newest First) |
| dvsmith |
Posted - Nov 18 2011 : 09:59:26 AM OK, thanks. Sorry for the inconvenience. |
| petty |
Posted - Nov 18 2011 : 09:54:32 AM Right ... I have to go physically push the buttons on the down nodes due to a "feature" (bug?) in the remote power management.
If you run "qhost", the nodes that are up won't have any blanks. |
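For anyone who wants to script the "qhost" check before resubmitting, here is a minimal sketch. It assumes the "blanks" show up as "-" placeholders in the LOAD column, which is how Grid Engine's qhost usually reports hosts it cannot reach. The function name hosts_up, the sample text, and the node names are made up for illustration and are not output from this cluster.

```python
def hosts_up(qhost_output):
    """Return hostnames whose LOAD column holds a value rather than '-'."""
    up = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator (fewer than 4 fields),
        # the header row, and the 'global' pseudo-host.
        if len(fields) < 4 or fields[0] in ("HOSTNAME", "global"):
            continue
        # Assumption: column 4 is LOAD, and '-' means no load was reported.
        if fields[3] != "-":
            up.append(fields[0])
    return up


# Illustrative sample only; real output would come from running `qhost`.
SAMPLE = """\
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
node1 lx24-amd64 8 0.52 15.7G 1.2G 2.0G 0.0
node2 lx24-amd64 8 - 15.7G - 2.0G -
"""

if __name__ == "__main__":
    print(hosts_up(SAMPLE))
```

To use it against the live cluster, the text could be captured with something like subprocess.run(["qhost"], capture_output=True, text=True).stdout and passed to hosts_up. If the column layout on your install differs, the index of the LOAD field would need adjusting.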
| dvsmith |
Posted - Nov 18 2011 : 09:52:24 AM Ah ok... If I go ahead and restart my jobs, will the grid just avoid those nodes or will half my jobs get stuck in the queue (and/or fail)? |
| syam.gadde |
Posted - Nov 18 2011 : 09:48:42 AM Due to some weirdness, the half of the cluster that was affected by the power outage is still down for the time being. |
| dvsmith |
Posted - Nov 18 2011 : 09:43:32 AM Are all of the nodes back up now? I tried submitting to each node, but half of my jobs are stuck in the queue. |
| petty |
Posted - Nov 11 2011 : 3:14:28 PM If they are on the head node then they won't die.
If all goes as planned the head node will stay on.
|
| dvsmith |
Posted - Nov 11 2011 : 12:01:34 PM Hey Chris,
I assume this will also kill daemon processes that are submitting jobs, but can you confirm whether that is the case? I can either pause or let the power cut off kill my daemons and then resume after the maintenance, but it would definitely be easier to just pause them each night (if that's feasible).
Thanks! David |