morey
BIAC Faculty
USA
25 Posts
Posted - Nov 10 2010 : 2:35:57 PM
We are running a TBSS crossing fibers program (tbss_x) and running out of memory on the cluster, which is probably understandable given the extraordinarily large RAM + swap requirements according to the FSL designers. See the email thread below. Do you have any thoughts on this?
Raj
-----Original Message-----
From: Rajendra Morey
Sent: Wednesday, November 10, 2010 2:32 PM
To: Courtney Haswell
Subject: RE: Tbss_x

Hi Courtney:
I believe the only thing we tried was running it on the grid engine at Yale. This gave the identical error as the BIAC cluster. This was a few days before Liz was leaving, and there were other projects that were more pressing, so nothing more was done.
40 GB of memory in RAM + swap is an astounding amount of memory. I am wondering if we can get more than that. If the cluster nodes have a 64-bit addressable space then this may be possible; otherwise I'm not sure where to go with this.
Raj

-----Original Message-----
From: Courtney Haswell
Sent: Wednesday, November 10, 2010 2:04 PM
To: Rajendra Morey
Subject: Tbss_x

Raj,
Liz asked the FSL group about the tbss_x error. There was a reply on June 30 that said "You might want to make sure you are not running out of memory. When ran on a group of 90 subjects the swap_subjectwise process would die with less than about 40GB memory available (RAM+swap), and the subsequent steps would fail."
The process that is failing takes a lot of memory and is writing information to tmp. Do you know if you guys looked into how much memory is allotted on a node on the cluster? I didn't want to re-research it if it was already determined not to be a problem.
Thanks, Courtney
Rajendra Morey MD
petty
BIAC Staff
USA
453 Posts
Posted - Nov 10 2010 : 3:14:58 PM
Well, your home directory has a limit of 1GB, so don't write anything there.
The temporary space is limited to 32GB per user (that's all nodes combined).
Interactive nodes have 8GB of RAM, and compute nodes have 32GB of RAM. The OS is running as a ramdisk, so you can typically subtract 2GB from available RAM.
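For anyone hitting the same wall: a quick way to check a node's total RAM + swap against the ~40GB figure quoted from the FSL list is to read /proc/meminfo directly. This is just a minimal sketch, assuming a Linux node; the 40GB threshold is taken from the email above, not from any official FSL specification.

```python
# Compare a Linux node's total RAM + swap with the ~40 GB that the
# FSL list reply suggests swap_subjectwise needs for ~90 subjects.

def meminfo_kb(field):
    """Read one field (e.g. 'MemTotal' or 'SwapTotal') from /proc/meminfo, in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def ram_plus_swap_gb():
    """Total RAM + swap in GB (/proc/meminfo reports kB; 1 GB = 1024**2 kB)."""
    return (meminfo_kb("MemTotal") + meminfo_kb("SwapTotal")) / 1024.0 ** 2

if __name__ == "__main__":
    total = ram_plus_swap_gb()
    print("RAM + swap: %.1f GB" % total)
    if total < 40:
        print("Likely below the ~40 GB the FSL group mentioned for 90 subjects")
```

Running this on an interactive node versus a compute node would make the 8GB/32GB difference described above visible immediately.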
morey
BIAC Faculty
USA
25 Posts
Posted - Nov 11 2010 : 3:58:46 PM
Chris, if I am understanding you correctly, the operating system is using disk space dynamically as if it were RAM, and the maximum allocated "RAM" per user is 32 GB; in theory this limit could be increased by some system setting in the OS. If this is so, could you increase it for us very temporarily so that we can see whether our analysis moves past the point where it currently fails? This would help a great deal in resolving the problem, given it is the last bit of analysis we need for this paper.
Rajendra Morey MD
petty
BIAC Staff
USA
453 Posts
Posted - Nov 11 2010 : 4:02:45 PM
The operating system has no actual disk space; it's all in RAM. The temporary space I mentioned is available on an NFS mount, and each user is limited to 32GB collectively when writing files to temp. This appears like any other space on the disk. The nodes have 32GB of actual RAM installed in them, which can't change.
So if you were the only user on one node, you would have approximately 30GB of actual RAM available to you and 32GB of space available on the temp disk.
You aren't hitting any limits on temp, because I would've received an error from the cluster.
Edited by - petty on Nov 11 2010 4:07:57 PM