Duke-UNC Brain Imaging and Analysis Center
 qinteract problems

lh115
New Member

USA
15 Posts

Posted - Dec 23 2009 : 8:26:10 PM
Hi,

I'm having trouble accessing qinteract. I had been logged into qinteract since earlier in the day, but my windows froze, so I closed them out. When I type 'qinteract' at the head node I get:
"Your "qrsh" request could not be scheduled, try again later."

When I type qstatall, it lists my user name as still running two jobs at node4 even though I exited out of those jobs (so I would expect errors at the waiting queue).

I don't know if anyone else is having trouble.

Thanks,
Lars

petty
BIAC Staff

USA
453 Posts

Posted - Dec 23 2009 : 9:45:47 PM
Something happened to node4 late this afternoon, and it's hung and unreachable via ssh. I'm not really sure what happened, since I can't get to any of the logs.

Syam has access to a switch that should allow him to reboot it remotely tomorrow from BIAC. If that doesn't work, someone will have to manually reset it.

All the other nodes are running correctly, so non-interactive jobs shouldn't have any issues running.

petty
BIAC Staff

USA
453 Posts

Posted - Dec 24 2009 : 09:29:07 AM
Node4 and the head node are back up.

There were users running FSL jobs on node4, which sucked up all the available memory and caused the machine to hang.

Node4 should only be used for visualization and for prepping your jobs to submit to the actual gridengine, since there are only 4 available processors for everyone to share.

gunes
BIAC Alum

45 Posts

Posted - Dec 28 2009 : 12:35:52 PM
I cannot access node4.
When I type qinteract, it says:

Your "qrsh" request could not be scheduled, try again later.

Any suggestions? I need to run a few analyses today. I would appreciate it if you could help me.

Thanks.

Gunes


petty
BIAC Staff

USA
453 Posts

Posted - Dec 28 2009 : 1:05:53 PM
Node4 is hung again, likely due to some memory-intensive processes. When they are finished, it should be reachable. Otherwise, someone will probably have to do a manual reboot from the data center.

gunes
BIAC Alum

45 Posts

Posted - Dec 28 2009 : 1:13:52 PM
It has been like this since Lars posted on Wednesday. We will wait for you or Syam to come back to work.
Happy Holidays!
:)

petty
BIAC Staff

USA
453 Posts

Posted - Dec 29 2009 : 12:19:13 PM
OK folks, node4 is back up after a trip over to the data center for a reset.

There were an eventstats process and a flirt process that used all available memory on the node.

The logs indicated that the machine killed the jobs, but then it did not recover cleanly.

-chris

gunes
BIAC Alum

45 Posts

Posted - Dec 29 2009 : 12:20:33 PM
Great! Thank you.

clithero
Junior Member

37 Posts

Posted - Jan 17 2010 : 12:45:07 PM
I've been getting this message again while trying to start "qinteract":

Your "qrsh" request could not be scheduled, try again later.

I know that sometimes waiting for a while helps, but I've tried from several different terminals for the past few hours with no success.

Thanks,
John

petty
BIAC Staff

USA
453 Posts

Posted - Jan 17 2010 : 2:52:23 PM
The machine is hung again. When did this start happening to you?

clithero
Junior Member

37 Posts

Posted - Jan 17 2010 : 2:56:19 PM
I think my first attempt was at about 11 or so.

diaz
BIAC Alum

USA
212 Posts

Posted - Jan 18 2010 : 11:01:23 AM
Looks like it's still hung - I'm getting the same error "Your 'qrsh' request could not be scheduled, try again later."

Michele T. Diaz, Ph.D.
Associate Director
Brain Imaging and Analysis Center

petty
BIAC Staff

USA
453 Posts

Posted - Jan 18 2010 : 3:56:24 PM
I went to the data center to reset the machine and look through the logs, so it's currently back up.

Someone was running eventstats on the interactive node again, which caused the freeze. Please only use this node for processes that are not memory-intensive (i.e., visualizing results, configuring other jobs, etc.).

If it's a job that can be submitted to the grid engine, then please set that up. This node's environment is not set up for a bunch of users to log in and just run analyses at will. The other nodes are restricted to take only 4 jobs at a time (one for each processor) so that there is adequate memory to go around.
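
As a rough illustration of moving a job off the interactive node, here is a minimal Python sketch that only builds a submission command instead of running the analysis directly. It assumes a stock Sun Grid Engine `qsub` with its standard `-N`, `-cwd`, `-b y`, and `-l` options; BIAC's own submission scripts may differ. The function name, the job name, the file names, and the optional `h_vmem` memory request are all hypothetical, and whether `h_vmem` is configured at all depends on the site.

```python
import shlex


def build_qsub_command(job_name, command, memory=None):
    """Return the argv list for submitting `command` as a batch job.

    Assumes stock Sun Grid Engine qsub options; site wrappers may differ.
    """
    if not command:
        raise ValueError("command must not be empty")
    argv = [
        "qsub",
        "-N", job_name,  # job name shown in the queue listing
        "-cwd",          # run from the current working directory
        "-b", "y",       # treat the command as a binary, not a job script
    ]
    if memory is not None:
        # Hypothetical resource request; the attribute name is site-specific.
        argv += ["-l", f"h_vmem={memory}"]
    return argv + list(command)


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command line.
    argv = build_qsub_command(
        "flirt_sub01",
        ["flirt", "-in", "sub01.nii.gz", "-ref", "standard.nii.gz",
         "-out", "sub01_reg.nii.gz"],
    )
    print(shlex.join(argv))
```

The printed line can be pasted at the head node, or the list can be passed to `subprocess.run` once the options have been checked against the cluster's documentation.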

gunes
BIAC Alum

45 Posts

Posted - Jan 18 2010 : 4:01:40 PM
Thank you, Chris. You are awesome! :) Now I can finish my fsl analysis.

clithero
Junior Member

37 Posts

Posted - Jan 18 2010 : 4:04:49 PM
Hi Chris,
Great, thanks very much for fixing that.

Is there any way to have something automated that will kick users off for running such jobs? There obviously is not an effective incentive structure in place to prevent this.

Thanks again for restarting the node!

petty
BIAC Staff

USA
453 Posts

Posted - Jan 18 2010 : 4:15:05 PM
There is a way to restrict based on CPU usage and time on the queue. When this was tried before, however, everything failed.

I suppose I could write a daemon that just kills any process based on keywords in a list, or just not put certain programs on there. Clearly, hoping for common courtesy isn't cutting it.
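
For what it's worth, a keyword-based watchdog along those lines could be sketched roughly as follows. This is only an illustration, not something running on node4: the keyword list (taken from the programs named in this thread), the function names, and the dry-run default are all assumptions. It matches on the command name reported by `ps`, so a renamed binary would slip through.

```python
import os
import signal
import subprocess

# Hypothetical keyword list; eventstats and flirt are the programs named above.
BLOCKED_KEYWORDS = ("eventstats", "flirt")


def parse_ps(output):
    """Parse `ps -e -o pid= -o comm=` output into (pid, command_name) pairs."""
    processes = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            processes.append((int(parts[0]), parts[1].strip()))
    return processes


def find_offenders(processes, keywords=BLOCKED_KEYWORDS):
    """Return the processes whose command name contains a blocked keyword."""
    return [(pid, name) for pid, name in processes
            if any(word in name for word in keywords)]


def sweep(dry_run=True):
    """Scan the process table once; send SIGTERM only when dry_run is False."""
    output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    offenders = find_offenders(parse_ps(output))
    for pid, name in offenders:
        if dry_run:
            print(f"would terminate {pid} ({name})")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            # The process already exited, or we lack rights to signal it.
            pass
    return offenders
```

Calling `sweep()` from a loop with a sleep, or from cron, would turn this into the daemon. Keeping `dry_run=True` at first shows what would be terminated without touching anyone's jobs.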
BIAC Forums © 2000-2010 Brain Imaging and Analysis Center