[Met_help] [rt.rap.ucar.edu #92432] History for CONVERT_EXE problem in METplus batch job
Minna Win via RT
met_help at ucar.edu
Fri Oct 18 09:28:10 MDT 2019
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Hi,
I'm running METplus on WCOSS Gyre (Phase II), and I'm running into an
execution issue that's coming from the CONVERT_EXE configuration setting.
In my system conf file, I originally had CONVERT_EXE pointing to
/usr/bin/convert. Everything works fine when running METplus interactively
on the command line. However, when running METplus in a batch script, I get
the following error:
ERROR: Executable CONVERT_EXE does not exist at /usr/bin/convert
The error is fatal, and no output is produced. I've had that error happen
before in a batch job unrelated to METplus, so I believe that error is a
Gyre issue.
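One way to narrow down an error like this is to repeat the existence check from inside the batch job itself, since compute nodes may not have the same files installed as the login node. The sketch below is illustrative only and is not METplus's own check; `first_usable_executable` is a hypothetical helper, and the two paths are simply the ones mentioned in this ticket.

```python
import os
import shutil
import socket


def first_usable_executable(candidates):
    """Return the first candidate that exists and is executable, else None.

    Hypothetical helper for illustration; not part of METplus.
    """
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # Paths taken from the ticket; either may be absent on a given node.
    candidates = [
        "/usr/bin/convert",
        "/usrx/local/ImageMagick/6.8.3-3/bin/convert",
    ]
    print("node:", socket.gethostname())
    print("usable candidate:", first_usable_executable(candidates))
    # shutil.which searches PATH, which can also differ inside a batch job.
    print("convert on PATH:", shutil.which("convert"))
```

Running the same script interactively and from the batch job, then comparing the two outputs, would show whether the two environments actually see different files.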
I was advised to try changing CONVERT_EXE to point to ImageMagick's convert
at /usrx/local/ImageMagick/6.8.3-3/bin/convert. Making that change allows
METplus to find the executable and successfully produce output. However,
with that configuration setting, my MPI/poescript tasks run serially
instead of in parallel. I've double-checked by changing only the CONVERT_EXE
setting, and I can confirm that both the error and the serial execution are
at least somewhat influenced by that setting.
Any advice or help on how to work around this issue would be greatly
appreciated.
Thanks,
Logan
--
*Logan C. Dawson, Ph.D.*
Support Scientist, I.M. Systems Group, Inc.
NOAA/NWS/NCEP/EMC
5830 University Research Court
College Park, MD 20740
(301) 683-3944
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Thu Oct 03 11:52:07 2019
Hi Logan,
It looks like you are having issues with running METplus in batch mode.
I've asked a NOAA/GSD colleague for some assistance, and he would be
interested in seeing what command you are using in your MPI/poescript. If
you have any log output you can provide, that would also be helpful. Are
you using a shell script to invoke METplus, then another command for the
poescript?
Thanks,
Minna
---------------
Minna Win
National Center for Atmospheric Research
Developmental Testbed Center
Phone: 303-497-8423
Fax: 303-497-8401
On Thu, Oct 3, 2019 at 10:24 AM Logan Dawson - NOAA Affiliate via RT <
met_help at ucar.edu> wrote:
>
> Thu Oct 03 10:23:36 2019: Request 92432 was acted upon.
> Transaction: Ticket created by logan.dawson at noaa.gov
> Queue: met_help
> Subject: CONVERT_EXE problem in METplus batch job
> Owner: Nobody
> Requestors: logan.dawson at noaa.gov
> Status: new
> Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Logan Dawson - NOAA Affiliate
Time: Thu Oct 03 12:28:12 2019
Hi Minna,
Thanks for the quick response.
I put the example scripts and output in a web directory that's publicly
accessible:
https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/
The general premise of what I'm doing is submitting a batch job with the
verif_FV3SAR.sh script, which generates the poescript. The poescript runs
three different shell scripts to run grid_stat for three different forecast
fields.
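As a rough illustration of that layout, the sketch below writes a command file with one line per forecast field. It assumes the common convention that a POE command file lists one task per line; the script naming pattern `run_gridstat_<field>.sh` is hypothetical, and only the field names (refc, refd1, retop) come from the output files named elsewhere in this ticket.

```python
import os
import tempfile


def write_poescript(path, fields, script_dir="."):
    """Write one command per line, one line per field, and return the lines.

    The script names are hypothetical placeholders, not the real scripts.
    """
    lines = [os.path.join(script_dir, f"run_gridstat_{field}.sh") for field in fields]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    fields = ["refc", "refd1", "retop"]
    with tempfile.TemporaryDirectory() as tmp:
        poescript = os.path.join(tmp, "poescript")
        for line in write_poescript(poescript, fields):
            print(line)
```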
Would a potential solution be to have the poescript actually make the 3
METplus calls rather than having those calls embedded within the shell
scripts?
If you need anything else or have trouble accessing the scripts and log
files, please let me know.
Thanks,
Logan
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Thu Oct 03 13:46:20 2019
Hi Logan,
Jim, our NOAA/GSD member of the METplus wrapper team, is our resident MPI
expert. He took a quick look at your scripts and doesn't think that
circumventing the shell scripts with direct calls to METplus will change
the outcome (although if you want to try it, who knows?). One thing Jim is
curious about is what you are using to verify that things are running
serially (or in parallel). Jim will be in meetings all afternoon and won't
get a chance to look more closely until tomorrow.
Thanks,
Minna
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Logan Dawson - NOAA Affiliate
Time: Thu Oct 03 16:34:33 2019
Hi Minna,
To confirm that it was running serially, I directed the output of each
METplus call into a file to see if any output was being produced. For the
job that's running serially
<https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/poe_serial/>, I
only got one such file (refc.out). For the parallel job
<https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/CONVERT_EXE_fails/>
that fails due to the CONVERT_EXE error, I got three output files
(refc.out, refd1.out, and retop.out). The same is true for the number of
metplus_final_gridstat_${field}.conf files that each job produced.
For full clarity, once I realized the job seemed to be running serially, I
killed it since it was clear it wasn't functioning properly. That's why
there isn't a stat_analysis conf file in the poe_serial directory.
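Counting output files shows how far a job got, but not directly whether tasks overlapped in time. A complementary check, sketched here under the assumption that each task records its own start and end time (for example by writing timestamps to its .out file), is to test whether any two task intervals overlap. The function name and the sample numbers are made up for illustration.

```python
def any_overlap(intervals):
    """Return True if any two (start, end) intervals overlap in time."""
    latest_end = None
    # Sorting by start time means each interval only needs comparing
    # against the latest end time seen so far.
    for start, end in sorted(intervals):
        if latest_end is not None and start < latest_end:
            return True
        latest_end = end if latest_end is None else max(latest_end, end)
    return False


if __name__ == "__main__":
    # Made-up timestamps, in seconds since the job started.
    serial = [(0, 10), (10, 20), (20, 30)]
    parallel = [(0, 10), (1, 11), (2, 12)]
    print("serial run overlaps:", any_overlap(serial))
    print("parallel run overlaps:", any_overlap(parallel))
```

Back-to-back intervals that only touch at an endpoint are treated as serial here, which matches tasks that start only once the previous one has finished.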
Thanks,
Logan
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Fri Oct 04 10:29:15 2019
Hi Logan,
Jim seems to think that:
--------------------------snip--------------------------------
Since Logan indicates jobs are running serially ... that has nothing to do
with METplus calls.
I think the best approach is to remove the METplus calls, and first make
sure that he is able to submit jobs from a POE script and run them in
parallel ... i.e., replace them with a sleep command, etc.
If that works ... then call a single sh script and make sure METplus is
running properly ...
I'm starting to think that it isn't ... but that has nothing to do with
the running in parallel.
--------------------------snip--------------------------------
Can you try the above and let me know what happens?
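The idea behind the sleep test can be tried locally before touching the batch system. The sketch below is only a stand-in and does not use POE: it launches several sleeping child processes with Python's subprocess module and times them. If the tasks truly run in parallel, the elapsed time should be close to the longest single sleep rather than the sum. On WCOSS the equivalent test would put the sleep commands in the poescript itself; `time_sleep_tasks` is a hypothetical name.

```python
import subprocess
import sys
import time


def time_sleep_tasks(durations, parallel=True):
    """Run one sleeping child process per duration; return elapsed seconds."""
    command = [
        sys.executable,
        "-c",
        "import sys, time; time.sleep(float(sys.argv[1]))",
    ]
    start = time.monotonic()
    if parallel:
        # Start every child first, then wait for all of them.
        children = [subprocess.Popen(command + [str(d)]) for d in durations]
        for child in children:
            child.wait()
    else:
        # Each child must finish before the next one starts.
        for d in durations:
            subprocess.run(command + [str(d)], check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    durations = [1.0, 1.0, 1.0]
    print(f"parallel: {time_sleep_tasks(durations, parallel=True):.2f} s")
    print(f"serial:   {time_sleep_tasks(durations, parallel=False):.2f} s")
```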
Thanks,
Minna
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Logan Dawson - NOAA Affiliate
Time: Mon Oct 07 12:05:44 2019
Hi Minna,
Sorry for the delay in getting back to you. I just wanted to let you know
that I got pulled into a high-priority task last Friday, so it will be
another day or two before I'm able to get back to testing around the issues
I'm having with running METplus with a poescript. I'll update you and Jim
on what I find as soon as I can.
Thanks,
Logan
On Fri, Oct 4, 2019 at 4:29 PM Minna Win via RT <met_help at ucar.edu>
wrote:
> Hi Logan,
>
> Jim seems to think that:
> --------------------------snip--------------------------------
> *Since Logan indicates jobs are running serially ... that has
nothing to do
> with METplus calls.*
>
> *I think the best approach is to remove the METplus calls, and first
make
> sure that he is able to*
> *submit jobs from a POE script and run them in parallel ... ie.
replaced
> with sleep command etc ...*
>
> *If that works ... than call a single sh script and make sure
METplus is
> running properly ...*
> *I'm starting to think that it isn't .... but that has nothing to do
with
> the running in parallel. *
>
> *----------------------snip------------------------------*
>
>
> Can you try the above and let me know what happens?
>
> Thanks,
> Minna
>
>
> *---------------Minna Win*
> National Center for Atmospheric Research
> Developmental Testbed Center
> Phone: 303-497-8423
> Fax: 303-497-8401
>
>
>
> On Thu, Oct 3, 2019 at 4:34 PM Logan Dawson - NOAA Affiliate via RT
<
> met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
> >
> > Hi Minna,
> >
> > To confirm that it was running serially, I directed the output of
each
> > METplus call into a file to see if any output was being produced.
For
> the
> > job that's running serially
> >
<https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/poe_serial/>,
> I
> > only got one such file (refc.out). For the parallel job
> > <
> >
>
https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/CONVERT_EXE_fails/
> > >
> > that fails due to the CONVERT_EXE error, I got three output files
> > (refc.out, refd1.out, and retop.out). The same is true for the
number of
> > metplus_final_gridstat_${field}.conf files that each job produced.
> >
> > For full clarity, once I realized the job seemed to be running
serially,
> I
> > killed it since it was clear it wasn't functioning properly.
That's why
> > there isn't a stat_analysis conf file in the poe_serial directory.
> >
> > Thanks,
> > Logan
> >
> > On Thu, Oct 3, 2019 at 3:46 PM Minna Win via RT
<met_help at ucar.edu>
> wrote:
> >
> > > Hi Logan,
> > >
> > > Jim, our NOAA/GSD member of the METplus wrapper team is our
resident
> MPI
> > > expert. He took a quick look at your scripts and doesn't think
that
> > > circumventing the shell scripts with direct calls to METplus
will
> change
> > > any outcome (although if you want to try it, who knows?). One
thing
> Jim
> > is
> > > curious about is what you are using to verify that things are
running
> > > serially (or in parallel). Jim will be in meetings all
afternoon and
> > won't
> > > get a chance to look more closely until tomorrow.
> > >
> > > Thanks,
> > > Minna
> > > ---------------
> > > Minna Win
> > > National Center for Atmospheric Research
> > > Developmental Testbed Center
> > > Phone: 303-497-8423
> > > Fax: 303-497-8401
> > >
> > >
> > >
> > > On Thu, Oct 3, 2019 at 12:28 PM Logan Dawson - NOAA Affiliate
via RT <
> > > met_help at ucar.edu> wrote:
> > >
> > > >
> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432
>
> > > >
> > > > Hi Minna,
> > > >
> > > > Thanks for the quick response.
> > > >
> > > > I put the example scripts and output in a web directory that's
> publicly
> > > > accessible:
> > > > https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/
> > > >
> > > > The general premise of what I'm doing is submitting a batch
job with
> > the
> > > > verif_FV3SAR.sh script, which generates the poescript. The
poescript
> > runs
> > > > three different shell scripts to run grid_stat for three
different
> > > forecast
> > > > fields.
> > > >
> > > > Would a potential solution be to have the poescript actually
make
> the 3
> > > > METplus calls rather than having those calls embedded within
the
> shell
> > > > scripts?
> > > >
> > > > If you need anything else or have trouble accessing the
scripts and
> log
> > > > files, please let me know.
> > > >
> > > > Thanks,
> > > > Logan
> > > >
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Mon Oct 07 12:19:14 2019
Thanks for the update, Logan.
Regards,
Minna
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Logan Dawson - NOAA Affiliate
Time: Wed Oct 16 15:17:22 2019
Hi Minna,
Sorry again for the delay in getting back to this. I finally had some time
to test this afternoon, and it looks like I don't have my MPI task set up
properly. That, or I have a misunderstanding about how mpirun and
poescripts work.
I've attached the example batch job script (verif_HRRR.sh) where I'm trying
to use MPI to run a poescript. The verif_HRRR_refc.sh script is an example
of where I had been trying to make the METplus calls. At Jim's suggestion,
I swapped in some date and sleep commands to see when the refc, refd1, and
retop scripts are actually running. For each script, I print the date twice
with a 5-second wait time in between the date command calls.
In the HRRR.out file, you can see that the refd1 script doesn't begin
running until 5 seconds after the refc script initially prints the date.
And the retop script begins 5 seconds after that.
Is that correct, or do I need to configure my job script differently to have
all 3 scripts begin running at the same time?
Thanks,
Logan
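The date/sleep check described in this message can be approximated without
the batch system. The following is only a rough sketch, not the attached
verif_HRRR.sh or verif_HRRR_refc.sh: the function names run_task, run_all,
and start_spread are hypothetical, plain shell background jobs stand in for
the MPI launcher, and `date +%s` (epoch seconds) is assumed to be available.
It shows how the spread between task start times separates the two cases:
close to 0 seconds when the tasks overlap, and about 10 seconds for three
5-second tasks that run one after another, as seen in HRRR.out.

```shell
#!/bin/bash
# Hypothetical stand-in for the refc/refd1/retop scripts: print the start
# time, sleep, then print the end time (epoch seconds).
run_task() {
    local name=$1 delay=$2
    echo "$name start $(date +%s)"
    sleep "$delay"
    echo "$name end $(date +%s)"
}

# Launch the three tasks as background jobs so they can overlap, then wait
# for all of them. This is ordinary shell job control, used here only as an
# assumption-laden substitute for the real mpirun/poescript launch.
run_all() {
    local delay=$1 field
    for field in refc refd1 retop; do
        run_task "$field" "$delay" &
    done
    wait
}

# Read "name start|end epoch" lines and print the difference between the
# latest and earliest start times.
start_spread() {
    awk '$2 == "start" { t = $3 + 0
                         if (n++ == 0 || t < min) min = t
                         if (n == 1 || t > max) max = t }
         END { if (n > 0) print max - min }'
}

# Short demonstration with 1-second tasks.
run_all 1 | start_spread
```

Log lines saved from a real job in the same "name start epoch" layout could
be piped into start_spread in the same way to compare runs.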
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Logan Dawson - NOAA Affiliate
Time: Thu Oct 17 14:57:27 2019
Hi Minna,
You can close this ticket if you’d like. I didn’t find a fix for the issue
on WCOSS Phase II (Tide and Gyre), but Ying Lin let me know that her mpirun
jobs were working correctly on the WCOSS Dell machines. I moved my code to
Mars today and got the expected behavior out of the MPI task (both with
simple commands and METplus calls).
There still seems to be some issue with getting Tide/Gyre to run an mpirun
job properly when the ImageMagick convert executable is used, but I don’t
think it’s worth further effort to figure out what’s going wrong since Tide
and Gyre will be decommissioned at some point next year.
Thanks,
Logan
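One way to narrow down a batch-only "does not exist" error for CONVERT_EXE
is to check, from inside the batch job itself, which candidate paths are
visible and executable on the node where the job runs. A minimal sketch,
assuming a hypothetical helper named first_executable; the two paths are
the ones mentioned in this ticket, and nothing here is part of METplus.

```shell
#!/bin/bash
# Hypothetical helper: print the first argument that is a regular file with
# execute permission, or return non-zero if none qualifies.
first_executable() {
    local path
    for path in "$@"; do
        if [ -f "$path" ] && [ -x "$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Run this inside the batch job so the check happens on the compute node,
# not on the login node. uname -n reports the node name for the log.
first_executable /usr/bin/convert \
                 /usrx/local/ImageMagick/6.8.3-3/bin/convert \
    || echo "no convert executable found on $(uname -n)"
```

Comparing the output of an interactive run with that of a batch run would
show whether the two environments see different files.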
On Wed, Oct 16, 2019 at 5:17 PM Logan Dawson - NOAA Affiliate <
logan.dawson at noaa.gov> wrote:
> Hi Minna,
>
> Sorry again for the delay in getting back to this. I finally had
some time
> to test this afternoon, and it looks like I don't have my mpi task
set up
> properly. That, or I have a misunderstanding about how mpirun and
> poescripts work....
>
> I've attached the example batch job script (verif_HRRR.sh) where I'm
> trying to use mpi to run a poescript. The verif_HRRR_refc.sh script
is an
> example of where I had been trying to make the METplus calls. At
Jim's
> suggestion, I swapped in some date and sleep commands to see when
the refc,
> refd1, and retop scripts are actually running. For each script, I
print the
> date twice with a 5-second wait time in between the date command
calls.
>
> In the HRRR.out file, you can see that the refd1 script doesn't
begin
> running until 5 seconds after the refc script initially prints the
date.
> And the retop script begins 5 seconds after that.
>
> Is that correct or do I need to configure my job script differently
to
> have all 3 scripts begin running at the same time?
>
> Thanks,
> Logan
>
> On Mon, Oct 7, 2019 at 6:19 PM Minna Win via RT <met_help at ucar.edu>
wrote:
>
>> Thanks for the update, Logan.
>>
>>
>> Regards,
>> Minna
>> ---------------
>> Minna Win
>> National Center for Atmospheric Research
>> Developmental Testbed Center
>> Phone: 303-497-8423
>> Fax: 303-497-8401
>>
>>
>>
>> On Mon, Oct 7, 2019 at 12:06 PM Logan Dawson - NOAA Affiliate via
RT <
>> met_help at ucar.edu> wrote:
>>
>> >
>> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
>> >
>> > Hi Minna,
>> >
>> > Sorry for the delay in getting back to you. I just wanted to let
you
>> know
>> > that I got pulled into a high-priority task on last Friday, so it
will
>> be
>> > another day or two before I'm able to get back to testing around
the
>> issues
>> > I'm having with running METplus with a poescript. I'll update you
and
>> Jim
>> > on what I find as soon as I can.
>> >
>> > Thanks,
>> > Logan
>> >
>> > On Fri, Oct 4, 2019 at 4:29 PM Minna Win via RT
<met_help at ucar.edu>
>> wrote:
>> >
>> > > Hi Logan,
>> > >
>> > > Jim seems to think that:
>> > > --------------------------snip--------------------------------
>> > > *Since Logan indicates jobs are running serially ... that has
nothing
>> to
>> > do
>> > > with METplus calls.*
>> > >
>> > > *I think the best approach is to remove the METplus calls, and
first
>> make
>> > > sure that he is able to*
>> > > *submit jobs from a POE script and run them in parallel ...
ie.
>> replaced
>> > > with sleep command etc ...*
>> > >
>> > > *If that works ... than call a single sh script and make sure
METplus
>> is
>> > > running properly ...*
>> > > *I'm starting to think that it isn't .... but that has nothing
to do
>> with
>> > > the running in parallel. *
>> > >
>> > > *----------------------snip------------------------------*
>> > >
>> > >
>> > > Can you try the above and let me know what happens?
>> > >
>> > > Thanks,
>> > > Minna
>> > >
>> > >
>> > > *---------------Minna Win*
>> > > National Center for Atmospheric Research
>> > > Developmental Testbed Center
>> > > Phone: 303-497-8423
>> > > Fax: 303-497-8401
>> > >
>> > >
>> > >
>> > > On Thu, Oct 3, 2019 at 4:34 PM Logan Dawson - NOAA Affiliate
via RT <
>> > > met_help at ucar.edu> wrote:
>> > >
>> > > >
>> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432
>
>> > > >
>> > > > Hi Minna,
>> > > >
>> > > > To confirm that it was running serially, I directed the
output of
>> each
>> > > > METplus call into a file to see if any output was being
produced.
>> For
>> > > the
>> > > > job that's running serially
>> > > > <
>>
https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/poe_serial/
>> > >,
>> > > I
>> > > > only got one such file (refc.out). For the parallel job
>> > > > <
>> > > >
>> > >
>> >
>>
https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/CONVERT_EXE_fails/
>> > > > >
>> > > > that fails due to the CONVERT_EXE error, I got three output
files
>> > > > (refc.out, refd1.out, and retop.out). The same is true for
the
>> number
>> > of
>> > > > metplus_final_gridstat_${field}.conf files that each job
produced.
>> > > >
>> > > > For full clarity, once I realized the job seemed to be
running
>> > serially,
>> > > I
>> > > > killed it since it was clear it wasn't functioning properly.
That's
>> why
>> > > > there isn't a stat_analysis conf file in the poe_serial
directory.
>> > > >
>> > > > Thanks,
>> > > > Logan
>> > > >
>> > > > On Thu, Oct 3, 2019 at 3:46 PM Minna Win via RT
<met_help at ucar.edu>
>> > > wrote:
>> > > >
>> > > > > Hi Logan,
>> > > > >
>> > > > > Jim, our NOAA/GSD member of the METplus wrapper team is our
>> resident
>> > > MPI
>> > > > > expert. He took a quick look at your scripts and doesn't
think
>> that
>> > > > > circumventing the shell scripts with direct calls to
METplus will
>> > > change
>> > > > > any outcome (although if you want to try it, who knows?).
One
>> thing
>> > > Jim
>> > > > is
>> > > > > curious about is what you are using to verify that things
are
>> running
>> > > > > serially (or in parallel). Jim will be in meetings all
afternoon
>> and
>> > > > won't
>> > > > > get a chance to look more closely until tomorrow.
>> > > > >
>> > > > > Thanks,
>> > > > > Minna
>> > > > > ---------------
>> > > > > Minna Win
>> > > > > National Center for Atmospheric Research
>> > > > > Developmental Testbed Center
>> > > > > Phone: 303-497-8423
>> > > > > Fax: 303-497-8401
>> > > > >
>> > > > >
>> > > > >
--
*Logan C. Dawson, Ph.D.*
Support Scientist, I.M. Systems Group, Inc.
NOAA/NWS/NCEP/EMC
5830 University Research Court
College Park, MD 20740
(301) 683-3944
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Fri Oct 18 09:26:49 2019
Thanks Logan, I'll go ahead and close the ticket.
Thanks,
Minna
---------------
Minna Win
National Center for Atmospheric Research
Developmental Testbed Center
Phone: 303-497-8423
Fax: 303-497-8401
On Thu, Oct 17, 2019 at 2:57 PM Logan Dawson - NOAA Affiliate via RT <
met_help at ucar.edu> wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
>
> Hi Minna,
>
> You can close this ticket if you’d like. I didn’t find a fix for the issue
> on WCOSS Phase II (Tide and Gyre), but Ying Lin let me know that her mpirun
> jobs were working correctly on the WCOSS Dell machines. I moved my code to
> Mars today and got the expected behavior out of the MPI task (both with
> simple commands and METplus calls).
>
> There still seems to be some issue with getting Tide/Gyre to run an mpirun
> job properly when the ImageMagick convert executable is used, but I don’t
> think it’s worth further effort to figure out what’s going wrong since Tide
> and Gyre will be decommissioned at some point next year.
>
> Thanks,
> Logan
>
> On Wed, Oct 16, 2019 at 5:17 PM Logan Dawson - NOAA Affiliate <
> logan.dawson at noaa.gov> wrote:
>
> > Hi Minna,
> >
> > Sorry again for the delay in getting back to this. I finally had some time
> > to test this afternoon, and it looks like I don't have my mpi task set up
> > properly. That, or I have a misunderstanding about how mpirun and
> > poescripts work....
> >
> > I've attached the example batch job script (verif_HRRR.sh) where I'm
> > trying to use mpi to run a poescript. The verif_HRRR_refc.sh script is an
> > example of where I had been trying to make the METplus calls. At Jim's
> > suggestion, I swapped in some date and sleep commands to see when the refc,
> > refd1, and retop scripts are actually running. For each script, I print the
> > date twice with a 5-second wait time in between the date command calls.
> >
> > In the HRRR.out file, you can see that the refd1 script doesn't begin
> > running until 5 seconds after the refc script initially prints the date.
> > And the retop script begins 5 seconds after that.
> >
> > Is that correct or do I need to configure my job script differently to
> > have all 3 scripts begin running at the same time?
> >
> > Thanks,
> > Logan
> >
> > On Mon, Oct 7, 2019 at 6:19 PM Minna Win via RT <met_help at ucar.edu> wrote:
> >
> >> Thanks for the update, Logan.
> >>
> >>
> >> Regards,
> >> Minna
> >> ---------------
> >> Minna Win
> >> National Center for Atmospheric Research
> >> Developmental Testbed Center
> >> Phone: 303-497-8423
> >> Fax: 303-497-8401
> >>
> >>
> >>
> >> On Mon, Oct 7, 2019 at 12:06 PM Logan Dawson - NOAA Affiliate via RT <
> >> met_help at ucar.edu> wrote:
> >>
> >> >
> >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
> >> >
> >> > Hi Minna,
> >> >
> >> > Sorry for the delay in getting back to you. I just wanted to let you know
> >> > that I got pulled into a high-priority task last Friday, so it will be
> >> > another day or two before I'm able to get back to testing around the issues
> >> > I'm having with running METplus with a poescript. I'll update you and Jim
> >> > on what I find as soon as I can.
> >> >
> >> > Thanks,
> >> > Logan
> >> >
> >> > On Fri, Oct 4, 2019 at 4:29 PM Minna Win via RT <met_help at ucar.edu> wrote:
> >> >
> >> > > Hi Logan,
> >> > >
> >> > > Jim seems to think that:
> >> > >
> >> > > --------------------------snip--------------------------------
> >> > > *Since Logan indicates jobs are running serially ... that has nothing to do
> >> > > with METplus calls.*
> >> > >
> >> > > *I think the best approach is to remove the METplus calls, and first make
> >> > > sure that he is able to submit jobs from a POE script and run them in
> >> > > parallel ... i.e., replaced with sleep command etc ...*
> >> > >
> >> > > *If that works ... then call a single sh script and make sure METplus is
> >> > > running properly ...*
> >> > > *I'm starting to think that it isn't .... but that has nothing to do with
> >> > > the running in parallel.*
> >> > >
> >> > > *----------------------snip------------------------------*
> >> > >
> >> > >
> >> > > Can you try the above and let me know what happens?
> >> > >
> >> > > Thanks,
> >> > > Minna
> >> > >
> >> > >
> >> > > ---------------
> >> > > Minna Win
> >> > > National Center for Atmospheric Research
> >> > > Developmental Testbed Center
> >> > > Phone: 303-497-8423
> >> > > Fax: 303-497-8401
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Oct 3, 2019 at 4:34 PM Logan Dawson - NOAA Affiliate via RT <
> >> > > met_help at ucar.edu> wrote:
> >> > >
> >> > > >
> >> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
> >> > > >
> >> > > > Hi Minna,
> >> > > >
> >> > > > To confirm that it was running serially, I directed the output of each
> >> > > > METplus call into a file to see if any output was being produced. For the
> >> > > > job that's running serially
> >> > > > <https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/poe_serial/>,
> >> > > > I only got one such file (refc.out). For the parallel job
> >> > > > <https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/CONVERT_EXE_fails/>
> >> > > > that fails due to the CONVERT_EXE error, I got three output files
> >> > > > (refc.out, refd1.out, and retop.out). The same is true for the number of
> >> > > > metplus_final_gridstat_${field}.conf files that each job produced.
> >> > > >
> >> > > > For full clarity, once I realized the job seemed to be running serially, I
> >> > > > killed it since it was clear it wasn't functioning properly. That's why
> >> > > > there isn't a stat_analysis conf file in the poe_serial directory.
> >> > > >
> >> > > > Thanks,
> >> > > > Logan
> >> > > >
> >> > > > On Thu, Oct 3, 2019 at 3:46 PM Minna Win via RT <met_help at ucar.edu> wrote:
> >> > > >
> >> > > > > Hi Logan,
> >> > > > >
> >> > > > > Jim, our NOAA/GSD member of the METplus wrapper team, is our resident MPI
> >> > > > > expert. He took a quick look at your scripts and doesn't think that
> >> > > > > circumventing the shell scripts with direct calls to METplus will change
> >> > > > > any outcome (although if you want to try it, who knows?). One thing Jim is
> >> > > > > curious about is what you are using to verify that things are running
> >> > > > > serially (or in parallel). Jim will be in meetings all afternoon and won't
> >> > > > > get a chance to look more closely until tomorrow.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Minna
> >> > > > > ---------------
> >> > > > > Minna Win
> >> > > > > National Center for Atmospheric Research
> >> > > > > Developmental Testbed Center
> >> > > > > Phone: 303-497-8423
> >> > > > > Fax: 303-497-8401
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Oct 3, 2019 at 12:28 PM Logan Dawson - NOAA Affiliate via RT <
> >> > > > > met_help at ucar.edu> wrote:
> >> > > > >
> >> > > > > >
> >> > > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=92432 >
> >> > > > > >
> >> > > > > > Hi Minna,
> >> > > > > >
> >> > > > > > Thanks for the quick response.
> >> > > > > >
> >> > > > > > I put the example scripts and output in a web directory that's publicly
> >> > > > > > accessible:
> >> > > > > > https://www.emc.ncep.noaa.gov/users/Logan.Dawson/met_help/
> >> > > > > >
> >> > > > > > The general premise of what I'm doing is submitting a batch job with the
> >> > > > > > verif_FV3SAR.sh script, which generates the poescript. The poescript runs
> >> > > > > > three different shell scripts to run grid_stat for three different forecast
> >> > > > > > fields.
> >> > > > > >
> >> > > > > > Would a potential solution be to have the poescript actually make the 3
> >> > > > > > METplus calls rather than having those calls embedded within the shell
> >> > > > > > scripts?
> >> > > > > >
> >> > > > > > If you need anything else or have trouble accessing the scripts and log
> >> > > > > > files, please let me know.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Logan
> >> > > > > >
------------------------------------------------
Subject: CONVERT_EXE problem in METplus batch job
From: Minna Win
Time: Fri Oct 18 09:28:09 2019
Closing ticket per user's request. This is an issue with ImageMagick
only on Tide/Gyre, which are soon to be decommissioned. The code
works on other HPC hosts.
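
For anyone who hits the same symptom, the timing check described in this
thread (record a timestamp, sleep, record a second timestamp in each of the
refc, refd1, and retop tasks) can be sketched roughly as below. This is only
an illustration, not Logan's scripts: it uses plain Python subprocesses in
place of mpirun and a poescript, and the names launch_all and ran_in_parallel
are hypothetical helpers. The idea is that tasks ran in parallel if every
task started before the earliest task finished.

```python
import subprocess
import sys

# Stand-ins for the three field scripts named in the thread.
FIELDS = ["refc", "refd1", "retop"]

# Each child records its start time, sleeps, then prints "start end".
CHILD_CODE = (
    "import sys, time; start = time.time(); "
    "time.sleep(float(sys.argv[1])); print(start, time.time())"
)


def launch_all(fields, sleep_seconds=5.0):
    """Start one child per field without waiting; return (field, start, end)."""
    procs = [
        (
            field,
            subprocess.Popen(
                [sys.executable, "-c", CHILD_CODE, str(sleep_seconds)],
                stdout=subprocess.PIPE,
                text=True,
            ),
        )
        for field in fields
    ]
    results = []
    for field, proc in procs:
        out, _ = proc.communicate()  # wait for this child and read its output
        start, end = (float(value) for value in out.split())
        results.append((field, start, end))
    return results


def ran_in_parallel(results):
    """True if every task started before the earliest task finished."""
    earliest_end = min(end for _, _, end in results)
    return all(start < earliest_end for _, start, _ in results)


if __name__ == "__main__":
    timings = launch_all(FIELDS)
    for field, start, end in timings:
        print(f"{field}: start={start:.2f} end={end:.2f}")
    print("parallel" if ran_in_parallel(timings) else "serial")
```

The serial behavior reported for Tide/Gyre, where each script began about
5 seconds after the previous one, would fail this check, because the second
and third tasks start only after the first has finished.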
------------------------------------------------