Duke-UNC Brain Imaging and Analysis Center
Matlab Cluster Error: Caught std::exception

vvs4

Posted - Jan 31 2011 : 08:25:28 AM
Hi BIAC,

This weekend I had trouble running some scripts that create a MATLAB .m template, set up a virtual display, and then run the template using the following commands:

# Start an X virtual framebuffer on display :$RANDINT ($RANDINT is set earlier); -fbdir keeps its framebuffer files in $TMPDIR, the temporary directory that gets trashed
Xvfb :$RANDINT -fbdir $TMPDIR &


# Run MATLAB on the same display, feeding it the script template we set up on standard input
/usr/local/bin/matlab -display :$RANDINT < spm_DTI2_2_${RUN}.m
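$RANDINT is set earlier in the script and isn't shown here. As one possible way to choose it, here is a sketch that keeps drawing random display numbers until it finds one with no lock file, assuming the usual X server convention that a server running on display :N creates /tmp/.XN-lock. The function name, the 100-999 range, and the LOCK_DIR override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for choosing $RANDINT: pick a random X display number
# that no running X server (Xvfb included) appears to be using.
# Assumes servers create a lock file named .X<N>-lock in LOCK_DIR (default /tmp).
pick_free_display() {
    local lock_dir=${LOCK_DIR:-/tmp}
    local attempt n
    for attempt in {1..50}; do
        # $RANDOM is a bash built-in in 0..32767; map it into 100..999
        n=$(( RANDOM % 900 + 100 ))
        if [ ! -e "$lock_dir/.X${n}-lock" ]; then
            echo "$n"
            return 0
        fi
    done
    echo "pick_free_display: no free display number found" >&2
    return 1
}

# Example: RANDINT=$(pick_free_display) || exit 1
```

This only lowers the odds of a collision: two jobs on the same node could still pick the same number between the check and Xvfb actually starting.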


Running these commands gave me the following error after MATLAB initialized, about a gazillion times:

>> Caught std::exception Exception message is:
bad numeric conversion: positive overflow


I looked this up and found documentation (from September 2010) suggesting that this error is part of the following bug:

Summary

Extended characters typed or pasted into a nodesktop instance of MATLAB running in an xterm cause an infinite loop of prompts to be displayed.
Description

This issue was originally found on the Mac, but the problem is also reproducible on Linux. When MATLAB runs with the -nodesktop option within an xterm, it incorrectly handles extended characters that are typed directly or pasted into the xterm. As MATLAB tries to process the extended characters, it goes into an infinite loop and keeps displaying the command prompt. The only way to break out of the loop is to type Ctrl-Z and then kill the MATLAB process.

Workaround

There is currently no workaround for the problem.
Fix

This bug was fixed as of R2010b (7.11).

Exists in: MATLAB R2010a (7.10)

Things I Tried:
I thought the problem might have something to do with the display, so I changed the command to:

/usr/local/bin/matlab -nodisplay < spm_DTI2_2_${RUN}.m

but got the same error.

I noticed that the error started appearing either during or right after startup, so I changed the command to:

/usr/local/bin/matlab -display :$RANDINT -r spm_DTI2_2_${RUN}

so that MATLAB would start up first and then run my script, and hooray, it works! I have verified that the output is correct and that the display worked. I still wanted to bring this to your attention, though, because something is likely causing the first command to error out, and it could affect many users.
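The working -r approach could be wrapped in a small function that also stops Xvfb when MATLAB finishes. This is a sketch, not the script used here: it assumes the template is in MATLAB's current directory (because -r runs a command, the script is named without its .m extension), and it wraps the call in try/catch with exit so MATLAB quits with status 1 if the script throws an error and 0 otherwise. The function name, the one-second startup pause, and the MATLAB_BIN/XVFB_BIN overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the "-r" workaround: start Xvfb on a given
# display, run a MATLAB script by name, then stop Xvfb.
# MATLAB_BIN and XVFB_BIN can be overridden (e.g. for testing).
run_matlab_headless() {
    local script_name=$1    # e.g. "spm_DTI2_2_${RUN}" (no .m extension)
    local display_num=$2    # e.g. "$RANDINT"
    local matlab_bin=${MATLAB_BIN:-/usr/local/bin/matlab}
    local xvfb_bin=${XVFB_BIN:-Xvfb}
    local xvfb_pid status

    "$xvfb_bin" ":$display_num" -fbdir "${TMPDIR:-/tmp}" &
    xvfb_pid=$!
    sleep 1   # crude: give the virtual display a moment to come up

    # -r executes a MATLAB command; try/catch + exit makes MATLAB quit
    # with status 1 on an error and 0 on success.
    "$matlab_bin" -display ":$display_num" \
        -r "try, ${script_name}; catch err, disp(getReport(err)); exit(1); end; exit(0);"
    status=$?

    # Stop the virtual display whether or not MATLAB succeeded.
    kill "$xvfb_pid" 2>/dev/null
    wait "$xvfb_pid" 2>/dev/null
    return "$status"
}
```

For example: run_matlab_headless "spm_DTI2_2_${RUN}" "$RANDINT". The nonzero return on failure lets a batch script notice a failed run instead of treating every job as finished.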

Another issue came up in this weekend's run that I think might be related: a large number of runs didn't run at all. Instead, when I open their .out files (still on my head node), I see the message "ERROR". I reviewed the order of the runs: the first 20 (minus one) hit the std::exception error above and finished running, with their output files sent to the output folder; the next 11 or so started running, but the jobs died on the cluster and their .out files are still on my head node; and the remaining ones have the ERROR output. So I was wondering whether the first error caused nodes to crash, and the ERROR jobs were attempts to run on a crashed machine. If that is the case, I apologize to anyone else who was trying to run scripts this weekend!


Other Potentially Useful Information
I found matlab_crash_dump files on my head node, but they are largely empty.

I have run this script since hugin was implemented, but perhaps not since the last MATLAB upgrade on the cluster.

I will continue to look into this, but please let me know if you need any other information, or if I am missing a huge detail! I am currently cleaning up and documenting the errored runs, and will then do more detailed testing before running any of my Python scripts again.

Many thanks!

-Vanessa

syam.gadde
BIAC Staff

Posted - Jan 31 2011 : 09:48:01 AM
It is possible that all of these are related. Anyway, I was able to reproduce this error by creating a tmp.m filled with unprintable garbage (no, not the filthy kind) and piping it to MATLAB. It's almost certain that the problem is in the .m file you are trying to run. You can try this:

tr -c '[:print:]\r\n\t' 'X' < INPUT.m > TMP.m

This translates every character in INPUT.m (change this to your file name) that is not printable, a carriage return, a newline, or a tab into the character 'X', and writes the result to TMP.m. Look through TMP.m for X characters; they show where the problem might be.
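As an alternative to tr, a similar check can print line numbers directly with grep. This is a sketch rather than the command suggested above: it assumes a grep that supports -a and -n (GNU and BSD grep both do) and runs in the C locale, so any byte outside printable ASCII, tab, and carriage return is flagged. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "LINE:text" for every line of a file that
# contains a byte other than printable ASCII, tab, or carriage return.
# Follows grep's exit status: 0 if such lines were found, 1 if the file
# is clean, 2 on error (e.g. the file does not exist).
find_nonprintable() {
    local pattern
    # Build a bracket expression that holds a literal tab and CR.
    pattern=$(printf '[^[:print:]\t\r]')
    # C locale: every byte >= 0x80 counts as non-printable.
    # -a: treat the file as text even if it looks binary.
    LC_ALL=C grep -a -n -- "$pattern" "$1"
}
```

Because it returns 0 when something is found, a submission script could refuse to submit a bad template with something like: if find_nonprintable "spm_DTI2_2_${RUN}.m"; then echo "fix the template first" >&2; exit 1; fi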

vvs4

Posted - Jan 31 2011 : 11:23:10 AM
Cool! I just tried that command on both a template .m script produced at run time and a copy of the original, and opened the result to search for capital X's. Since it would be hard to tell whether a capital X in a comment should be there or not, I used diff (diff input.m TMP.m), and it found some trash: the same thing in two spots! It looks like this:

377c377
< % images starting with D…
---
> % images starting with DX
720c720
< % images starting with D…
---
> % images starting with DX


So the trash looks like this: …

I'm not even sure what that is (it's obviously not "...") or how it got into the file, but when I deleted it and re-ran the script with the original settings, it worked perfectly! So the problem was "mysterious unprintable garbage". Thanks for your help, Syam!
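The tr-then-diff check described above can be rolled into a single helper that cleans up its own temporary file. A sketch, assuming GNU tr and diff as found on a typical Linux cluster; the function name is hypothetical, and LC_ALL=C is added so bytes above 0x7F are always treated as non-printable. Since GNU tr works on single bytes, a multibyte UTF-8 character would show up as several X's, so the single X in each changed line of the diff above suggests the stray character was one byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate every byte that is not printable ASCII,
# CR, LF, or tab into 'X' (the tr command from this thread), then diff
# the result against the original to show exactly which lines changed.
diff_nonprintable() {
    local input=$1 marked status
    marked=$(mktemp) || return 2
    LC_ALL=C tr -c '[:print:]\r\n\t' 'X' < "$input" > "$marked"
    # diff exits 0 if the files match (input is clean), 1 if they differ.
    diff "$input" "$marked"
    status=$?
    rm -f "$marked"
    return "$status"
}
```

Usage: diff_nonprintable input.m prints the same kind of "377c377" report shown above, and prints nothing (exit status 0) for a clean file.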

-Vanessa
BIAC Forums © 2000-2010 Brain Imaging and Analysis Center