lh115 (New Member, USA, 15 posts)
Posted - Dec 23 2009 : 8:26:10 PM
Hi,
I'm having trouble accessing qinteract. I had been logged into qinteract since earlier in the day, but my windows froze, so I closed them. Now when I type 'qinteract' at the head node I get:
Your "qrsh" request could not be scheduled, try again later.
When I type qstatall, it lists my user name as still running two jobs on node4, even though I exited those jobs (so I would expect them to show errors in the waiting queue).
I don't know if anyone else is having trouble.
Thanks, Lars

petty (BIAC Staff, USA, 453 posts)
Posted - Dec 23 2009 : 9:45:47 PM
Something happened to node4 late this afternoon, and it's hung and unreachable via ssh. I'm not really sure what happened, since I can't get to any of the logs.
Syam has access to a switch that should allow him to reboot it remotely from BIAC tomorrow. If that doesn't work, someone will have to reset it manually.
All the other nodes are running correctly, so non-interactive jobs shouldn't have any issues running.

petty (BIAC Staff, USA, 453 posts)
Posted - Dec 24 2009 : 09:29:07 AM
Node4 and the head node are back up.
Some users were running FSL jobs on node4, which used up all the available memory and caused the machine to hang.
Node4 should only be used for preparing jobs to submit to the actual grid engine and for visualization, since it has only 4 processors for everyone to share.

gunes (BIAC Alum, 45 posts)
Posted - Dec 28 2009 : 12:35:52 PM
I cannot access node4. When I type qinteract, it says:
Your "qrsh" request could not be scheduled, try again later.
Any suggestions? I need to run a few analyses today, so I would appreciate any help.
Thanks.
Gunes

petty (BIAC Staff, USA, 453 posts)
Posted - Dec 28 2009 : 1:05:53 PM
Node4 is hung again, likely due to some memory-intensive processes. When they finish, it should be reachable; otherwise, someone will probably have to do a manual reboot at the data center.

gunes (BIAC Alum, 45 posts)
Posted - Dec 28 2009 : 1:13:52 PM
It has been like this since Lars posted on Wednesday. We will wait for you or Syam to come back to work. Happy Holidays! :)

petty (BIAC Staff, USA, 453 posts)
Posted - Dec 29 2009 : 12:19:13 PM
OK folks, node4 is back up after a trip over to the data center for a reset.
An eventstats process and a flirt process had used all the available memory on the node.
The logs indicated that the machine killed the jobs, but then it did not recover cleanly.
-Chris

gunes (BIAC Alum, 45 posts)
Posted - Dec 29 2009 : 12:20:33 PM
Great! Thank you.

clithero (Junior Member, 37 posts)
Posted - Jan 17 2010 : 12:45:07 PM
I've been getting this message again while trying to start "qinteract":
Your "qrsh" request could not be scheduled, try again later.
I know that sometimes waiting for a while helps, but I've tried from several different terminals for the past few hours with no success.
Thanks, John

petty (BIAC Staff, USA, 453 posts)
Posted - Jan 17 2010 : 2:52:23 PM
The machine is hung again. When did this start happening for you?

clithero (Junior Member, 37 posts)
Posted - Jan 17 2010 : 2:56:19 PM
I think my first attempt was at about 11 or so.

diaz (BIAC Alum, USA, 212 posts)
Posted - Jan 18 2010 : 11:01:23 AM
Looks like it's still hung. I'm getting the same error:
Your "qrsh" request could not be scheduled, try again later.

Michele T. Diaz, Ph.D.
Associate Director, Brain Imaging and Analysis Center

petty (BIAC Staff, USA, 453 posts)
Posted - Jan 18 2010 : 3:56:24 PM
I went to the data center to reset the machine and look through the logs, so it's currently back up.
Someone was running eventstats on the interactive node again, which caused the freeze. Please use this node only for processes that are not memory-intensive (e.g., visualizing results, configuring other jobs).
If it's a job that can be submitted to the grid engine, please set it up that way. This node's environment is not set up for lots of users to log in and run analyses at will. The other nodes are restricted to only 4 jobs at a time (one per processor) so that there is adequate memory to go around.
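A minimal sketch of what setting a job up for the grid engine might look like, assuming a standard (Sun) Grid Engine `qsub` and an FSL FEAT run driven by a hypothetical design file `design.fsf`. The job name `feat_sub01`, the file names, and the chosen directives are illustrative only, not BIAC's actual submission conventions; local templates or wrappers may differ. The script just prints the batch script text.

```python
"""Sketch: generate a Grid Engine batch script for an analysis instead of
running it on the interactive node. Job name and command are hypothetical."""


def make_sge_script(job_name: str, command: str, shell: str = "/bin/sh") -> str:
    """Return batch-script text with embedded '#$' qsub directives."""
    lines = [
        f"#!{shell}",
        f"#$ -S {shell}",     # shell Grid Engine uses to run the job
        "#$ -cwd",            # run in the directory qsub is called from
        f"#$ -N {job_name}",  # job name shown by qstat
        "#$ -j y",            # merge stderr into the stdout log
        command,
        "",                   # ensures the script ends with a newline
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical FEAT run; save the output as e.g. run_feat.sh and
    # submit it from the head node with: qsub run_feat.sh
    print(make_sge_script("feat_sub01", "feat design.fsf"), end="")
```

Submitted this way, the scheduler places the job on a node with a free processor slot, so the analysis runs within the per-node job limits rather than on the shared interactive node.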

gunes (BIAC Alum, 45 posts)
Posted - Jan 18 2010 : 4:01:40 PM
Thank you, Chris. You are awesome! :) Now I can finish my FSL analysis.

clithero (Junior Member, 37 posts)
Posted - Jan 18 2010 : 4:04:49 PM
Hi Chris,
Great, thanks very much for fixing that.
Is there any way to set up something automated that kicks users off for running such jobs? There is obviously no effective incentive structure in place to prevent this.
Thanks again for restarting the node!

petty (BIAC Staff, USA, 453 posts)
Posted - Jan 18 2010 : 4:15:05 PM
There is a way to restrict jobs based on CPU usage and time on the queue. When this was tried before, however, everything failed.
I suppose I could write a daemon that kills any process matching keywords in a list, or simply not install certain programs on there. Clearly, hoping for common courtesy isn't cutting it.
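Grid Engine's own mechanism for the first option is queue resource limits (for example `h_cpu`, `h_rt`, and `h_vmem` in the queue configuration). The keyword-daemon idea could look roughly like the minimal sketch below. It assumes a Linux node with `/proc`, a hypothetical blocklist `BLOCKED` seeded with the two programs named earlier in this thread, and enough privilege to signal other users' processes. All function names are illustrative. It defaults to a dry run and performs a single sweep; cron or a loop would be needed to run it repeatedly.

```python
"""Hypothetical watchdog sketch: report or terminate processes whose
command name is on a blocklist. Linux-only (reads /proc)."""
import os
import signal
from pathlib import Path

# Hypothetical blocklist: the two programs blamed for the node4 hangs.
BLOCKED = {"eventstats", "flirt"}


def list_processes(proc_root="/proc"):
    """Yield (pid, name) pairs read from /proc/<pid>/comm.

    The kernel truncates comm to 15 characters, so long program names
    must appear in the blocklist in their truncated form.
    """
    root = Path(proc_root)
    if not root.is_dir():  # not Linux, or /proc not mounted
        return
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:  # process exited or is unreadable
            continue
        yield int(entry.name), name


def select_targets(processes, blocked):
    """Return the PIDs whose command name exactly matches a blocklist entry."""
    return [pid for pid, name in processes if name in blocked]


def sweep(blocked=BLOCKED, dry_run=True):
    """Run one pass; with dry_run=True, only report what would be killed."""
    targets = select_targets(list_processes(), blocked)
    for pid in targets:
        if dry_run:
            print(f"would send SIGTERM to pid {pid}")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not permitted to signal it
    return targets


if __name__ == "__main__":
    sweep(dry_run=True)
```

Matching on `/proc/<pid>/comm` catches compiled programs such as `flirt` by name, but a script launched through an interpreter reports the interpreter's name instead. Matching on the full command line would catch those cases, at the cost of more false positives.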