<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" 
xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thanks for the response, Don. <o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>The specific RDMA suggestion isn’t relevant to our case
(our hardware doesn’t support it), but you may be right that this is an
optimization-related issue. I’ll probably try playing with the
optimization settings next. I’ve got the same settings that have worked for
previous versions, but perhaps something in the new code has made one of
those settings problematic.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Regarding the suggestions I’ve been getting about WRFIO_NCD_LARGE_FILE_SUPPORT,
I don’t think that’s the problem. I’m splitting
my output into single-frame files to keep the file size small. I may try
setting it anyway, just for the heck of it.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
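<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>For anyone wanting to rule out file size quickly, here is a rough Python sketch that sorts the wrfout files in a run directory into truncated stubs, normal files, and files at or above a size ceiling. The function name and both default thresholds are my own assumptions for illustration (1 KB for a stub, and roughly 2 GiB as the ceiling for classic-format netCDF without large file support); none of this comes from WRF itself.<o:p></o:p></span></p>
<pre>
```python
from pathlib import Path


def classify_wrfout(run_dir, stub_limit=1024, classic_limit=2 * 1024**3):
    """Label each wrfout_d* file in run_dir as 'stub', 'ok', or 'too_large'.

    stub_limit and classic_limit are illustrative assumptions (in bytes),
    not values defined by WRF or netCDF tooling.
    """
    report = {}
    for path in sorted(Path(run_dir).glob("wrfout_d*")):
        size = path.stat().st_size
        if size >= classic_limit:
            # At or past the assumed classic-format ceiling.
            report[path.name] = "too_large"
        elif stub_limit >= size:
            # Far too small to hold a real history frame.
            report[path.name] = "stub"
        else:
            report[path.name] = "ok"
    return report
```
</pre>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Calling classify_wrfout(".") from the run directory should label a 32-byte file like the one in my original message as a stub, and nothing should come back as too_large if the single-frame splitting is doing its job.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>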
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Based on the sporadic nature of this (sometimes it happens,
sometimes it doesn’t, and when it hangs seems fairly random), I suspect it’s
some type of timing issue, like a race condition. If I can’t get it
working, I may just drop back to 3.1.1, at least until 3.2.1 comes out.
;-)<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
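<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>One way to test the timing theory would be to compare where each MPI task stopped writing its log. The sketch below is only an illustration, assuming the usual per-task rsl.error.NNNN files sit in the run directory; the function names are made up here. If all tasks end on the same line, that points one way; if a few are stuck somewhere different from the rest, that looks more like a deadlock between tasks.<o:p></o:p></span></p>
<pre>
```python
from collections import deque
from pathlib import Path


def last_lines(path, count=3):
    """Return the final `count` non-empty lines of a text file."""
    with open(path, errors="replace") as handle:
        # A bounded deque keeps only the newest lines, so large logs are cheap.
        kept = deque(
            (line.rstrip("\n") for line in handle if line.strip()),
            maxlen=count,
        )
    return list(kept)


def rsl_tails(run_dir, pattern="rsl.error.*", count=3):
    """Map each per-task log file name to its last few non-empty lines."""
    return {
        path.name: last_lines(path, count)
        for path in sorted(Path(run_dir).glob(pattern))
    }
```
</pre>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Running rsl_tails(".") after a hang and printing the result gives one short entry per task, which is easier to scan than paging through the full debug output.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>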
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thanks all,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Mike<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Don Morton
[mailto:Don.Morton@alaska.edu] <br>
<b>Sent:</b> Friday, April 16, 2010 9:09 AM<br>
<b>To:</b> Zulauf, Michael<br>
<b>Cc:</b> wrf-users@ucar.edu<br>
<b>Subject:</b> Re: [Wrf-users] WRF 3.2 jobs hanging up sporadically on wrfout
output<o:p></o:p></span></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal>I was having these sorts of problems with WRF 3.1.1 a few
weeks ago on our Sun Opteron cluster. It was always hanging on the
writing of wrfout, typically on an inner nest, and it wasn't consistent from
run to run. I had the luxury of being able to try these cases on other
machines, and didn't experience problems on those.<o:p></o:p></p>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>Our folks here suggested I turn off the MPI RDMA (Remote
Direct Memory Access) optimizations, which slowed performance substantially,
but resolved the issue.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>It's been my experience over the years with WRF that
these problems are frequently resolved if you turn off optimizations.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
<div>
<p class=MsoNormal>If you're using a Sun cluster, I can give you a little more
info privately.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal><o:p> </o:p></p>
<div>
<p class=MsoNormal>On Thu, Apr 15, 2010 at 1:49 PM, Zulauf, Michael &lt;<a
href="mailto:Michael.Zulauf@iberdrolausa.com">Michael.Zulauf@iberdrolausa.com</a>&gt;
wrote:<o:p></o:p></p>
<p class=MsoNormal>Hi all,<br>
<br>
I'm trying to get WRF V3.2 running by utilizing a setup that I've<br>
successfully run with V3.1.1 (and earlier). The configure/compile<br>
seemed to go fine using the same basic configuration details that have<br>
worked in the past. When I look over the Updates in V3.2, I don't see<br>
anything problematic for me.<br>
<br>
We're running with four grids, nesting from 27km to 1km, initialized and<br>
forced with GFS output. The nest initializations are delayed from the<br>
outer grid initialization by 3, 6, and 9 hours, respectively. The 1km<br>
grid has wrfout (netcdf) output every 20 minutes, the other grids every<br>
hour.<br>
<br>
What I'm seeing is that the job appears to be running fine for some<br>
time, but eventually the job hangs up during wrfout output - usually on<br>
the finest grid - but not exclusively. Changing small details (such as<br>
changing restart_interval) can make it run longer or shorter. Sometimes<br>
even with no changes it will run a different length of time.<br>
<br>
I've got debug_level set to 300, so I get tons of output. When it<br>
hangs, the wrf processes don't die, but all output stops. There are no<br>
error messages or anything else that indicate a problem (at least none<br>
that I can find). What I do get is a truncated (always 32 byte) wrfout<br>
file. For example:<br>
<br>
-rw-r--r-- 1 p20457 staff 32 Apr 15 13:02<br>
wrfout_d04_2009-12-14_09:00:00<br>
<br>
The wrfout's that get written before it hangs appear to be fine, with<br>
valid data. frames_per_outfile is set to 1, so the files never get<br>
excessively large - maybe on the order of 175MB. All of the previous<br>
versions of WRF that I've used continue to work fine on this hardware/OS<br>
combination (a cluster of dual-dual core Opterons, running CentOS) -<br>
just V3.2 has issues.<br>
<br>
Like I said, the wrf processes don't die, but all output ceases, even<br>
with the massive amount of debug info. The last lines in the rsl.error<br>
and rsl.out files are always something of this type:<br>
<br>
date 2009-12-14_09:00:00<br>
ds 1 1 1<br>
de 1 1 1<br>
ps 1 1 1<br>
pe 1 1 1<br>
ms 1 1 1<br>
me 1 1 1<br>
output_wrf.b writing 0d real<br>
<br>
The specific times and variables being written vary, depending on<br>
when the job hangs.<br>
<br>
I haven't dug deeply into what's going on, but it seems like possibly<br>
some sort of race condition or communications deadlock or something.<br>
Does anybody have ideas of where I should go from here? It seems to me<br>
like maybe something basic has changed with V3.2, and perhaps I need to<br>
adjust something in my configuration or setup.<br>
<br>
Thanks,<br>
Mike<br>
<br>
--<br>
Mike Zulauf<br>
Meteorologist<br>
Wind Asset Management<br>
Iberdrola Renewables<br>
1125 NW Couch, Suite 700<br>
Portland, OR 97209<br>
Office: 503-478-6304 Cell: 503-913-0403<br>
<br>
<br>
<br>
<br>
<br>
This message is intended for the exclusive attention of the address(es)
indicated. Any information contained herein is strictly confidential and
privileged, especially as regards person data,<br>
which must not be disclosed. If you are the intended recipient and have received
it by mistake or learn about it in any other way, please notify us by return
e-mail and delete this message from<br>
your computer system. Any unauthorized use, reproduction, alteration,
filing or sending of this message and/or any attached files to third parties
may lead to legal proceedings being taken. Any<br>
opinion expressed herein is solely that of the author(s) and does not
necessarily represent the opinion of Iberdrola. The sender does not guarantee
the integrity, speed or safety of this<br>
message, not accept responsibility for any possible damage arising from the
interception, incorporation of virus or any other manipulation carried out by
third parties.<br>
<br>
_______________________________________________<br>
Wrf-users mailing list<br>
<a href="mailto:Wrf-users@ucar.edu">Wrf-users@ucar.edu</a><br>
<a href="http://mailman.ucar.edu/mailman/listinfo/wrf-users" target="_blank">http://mailman.ucar.edu/mailman/listinfo/wrf-users</a><o:p></o:p></p>
</div>
<p class=MsoNormal><br>
<br clear=all>
<br>
-- <br>
Arctic Region Supercomputing Center<br>
<a href="http://www.arsc.edu/~morton/">http://www.arsc.edu/~morton/</a><o:p></o:p></p>
</div>
</div>
</body>
</html>