[Met_help] [rt.rap.ucar.edu #94009] History for netCDF issues when submitting MET jobs to bsub
Julie Prestopnik via RT
met_help at ucar.edu
Tue Mar 31 09:40:35 MDT 2020
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
Greetings
Sorry to email both the MET and WCOSS help desks, but I wasn't sure where
to send this ticket. We have been encountering errors creating and reading
netCDF files on WCOSS (mars) recently. These errors occur when we call the
MET programs for jobs submitted to the lsf queue.
I was running a series of MET's grid_stat jobs yesterday. All was going
well until some time after 19Z, when all jobs began to fail. Here is the
error that I got:
terminate called after throwing an instance of 'netCDF::exceptions::NcHdfErr'
what(): NetCDF: HDF error
file: ncCheck.cpp line:92
A sample of the netCDF files that I was using can be found in
/gpfs/dell3/ptmp/John.L.Wagner/matching_urma_tt_2019100509.134086/match_co_2019100509.134086/009
I do not have any issues reading data from these files using ncdump. They
don't appear to be corrupted. They are copies of files that I created two
months ago and have been testing with for some time now.
Here are the bsub settings I'm using to submit this job:
export NTASK=104
export PTILE=28
export OMP_NUM_THREAD=20
bsub -J ${flg}_${src}_${valid_date}_${elem} \
-W 3:00 \
-oo $logdir/${flg}_${src}_mpmd_${valid_date}_${elem}.log \
-eo $logdir/${flg}_${src}_mpmd_${valid_date}_${elem}.log \
-P MDLST-T2O \
-M 3000 \
-q "dev" \
-cwd $PWD \
-R "affinity[core(1)]" \
-n $NTASK \
-R "span[ptile=$PTILE]" \
-w "$regrid_dep" \
$procdir/met_creeper_linden.sh -s $src -t $valid_date -g $flg -f $force -e $elem
A sample mpmd file that is used for CFP can be found here:
/gpfs/dell3/ptmp/John.L.Wagner/matching_urma_tt_2019100509.134086/mpmd_file_matching_urma_2019100509_tt
Erin Thead and I have also encountered issues creating netCDF files
using MET's regrid_data_plane program on WCOSS. This issue has only been
occurring for the past week or two and again only occurs when a job is
submitted with bsub. It's not clear to us if something on WCOSS has changed
(netCDF/hdf library), if MET is having issues reading/writing netCDF files
when jobs are run in parallel, or something else.
If you need any more information from us, please let me know. Again, sorry
for emailing both help desks at once.
Thanks
John
--
John Wagner
Verification Task Lead
COR Task Manager
NOAA/National Weather Service
Meteorological Development Laboratory
Digital Forecast Services Branch
SSMC2 Room 10106
Silver Spring, MD 20910
(301) 427-9471 (office)
(908) 902-4155 (cell/text)
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John Halley Gotway
Time: Fri Jan 31 13:06:13 2020
Hi John,
This is the sort of behavior that could be caused by issues in your runtime
environment. Perhaps at runtime, the linker is finding and using an
incompatible version of the HDF5 library... which results in this error.
Your timing is perfect. We literally just met with sys admins here at NCAR
about how we've been compiling MET. They strongly encourage us to define
the "rpath" (i.e. run path) in the linker flags (LDFLAGS) when we
configure/compile MET. The effect of that is that the executables know
exactly what directories to search for the dependent libraries at runtime
rather than relying on the user's environment (i.e. LD_LIBRARY_PATH) to
find them.
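For reference, here is a minimal sketch of the rpath approach described above. It assumes a GNU-style toolchain that accepts -Wl,-rpath. The library directories are placeholders rather than the actual WCOSS install paths, and the helper name rpath_ldflags is made up for illustration.

```shell
# Build an LDFLAGS string that adds a link path and a run path for each
# library directory given as an argument.
rpath_ldflags() {
    flags=""
    for dir in "$@"; do
        flags="${flags:+$flags }-L${dir} -Wl,-rpath,${dir}"
    done
    printf '%s\n' "$flags"
}

# Placeholder locations (assumptions); substitute the real NetCDF/HDF5
# library directories used for the MET build.
NETCDF_LIB=${NETCDF_LIB:-/path/to/netcdf/lib}
HDF5_LIB=${HDF5_LIB:-/path/to/hdf5/lib}

LDFLAGS=$(rpath_ldflags "$NETCDF_LIB" "$HDF5_LIB")
export LDFLAGS
echo "LDFLAGS=$LDFLAGS"

# Next steps, shown as comments because the paths are hypothetical:
#   ./configure --prefix=/path/to/met && make install
# To confirm the run path was recorded in a built executable:
#   readelf -d /path/to/met/bin/grid_stat | grep -i -E 'rpath|runpath'
```

With the run path embedded, the executable should find the same NetCDF/HDF5 libraries whether it is started from a login shell or from a batch job, regardless of what LD_LIBRARY_PATH contains.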
One option we could try is asking Julie Prestopnik to recompile this
version of MET using these LDFLAGS settings. You could repoint your
script to this other version and test to see if the behavior goes away or
persists.
If that solves it, we'll ask Julie to update her process for installing
future releases of MET. If not, it's back to the drawing board.
Can you tell me exactly what version of MET you're running? Which WCOSS
machine and what "module" commands you use to load MET?
Thanks,
John
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John L Wagner - NOAA Federal
Time: Fri Jan 31 13:32:40 2020
Thanks John. I'm running MET v8.1 on mars (dell/phase 3).
module use /usrx/local/dev/modulefiles
module load met/8.1
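One way to test the library-mismatch theory is to compare which NetCDF/HDF5 shared libraries the MET executables resolve to in an interactive shell and inside a job submitted with bsub. The sketch below assumes ldd is available on the compute nodes. The function names and output file names are illustrative only.

```shell
# Keep only the NetCDF/HDF5 lines from `ldd` output read on stdin.
io_libs() {
    grep -i -E 'netcdf|hdf5' | sort
}

# Report which NetCDF/HDF5 shared libraries a MET executable resolves to
# in the current environment, plus the host and LD_LIBRARY_PATH.
report_io_libs() {
    tool_path=$(command -v "$1") || { echo "$1 not found on PATH" >&2; return 1; }
    echo "host: $(hostname)"
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
    ldd "$tool_path" | io_libs
}

# Suggested usage after `module load met/8.1` (file names are hypothetical):
#   report_io_libs grid_stat > libs_interactive.txt   # from a login shell
#   report_io_libs grid_stat > libs_batch.txt         # from inside the bsub job script
#   diff libs_interactive.txt libs_batch.txt
```

If the two listings differ, or a library shows as "not found" only in the batch run, that points to the runtime environment rather than the input files.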
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John Halley Gotway
Time: Fri Jan 31 14:04:55 2020
John,
Great, thanks for the info. I'm going to re-assign this ticket to Julie
Prestopnik, and she'll follow up with you next week when she's ready for
you to test.
Thanks,
John
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: Julie Prestopnik
Time: Mon Feb 03 10:57:29 2020
Hi John.
Thank you for sending the location of MET that you are using. I do not
have access to write to that location - it is more of an official location
for the MET software. However, I will still be able to recompile that
version of MET in a different location and create a modulefile for you to
use and test with. Unfortunately, I am unable to access WCOSS at this
time. I have submitted a helpdesk ticket to the WCOSS helpdesk. Once I am
able to access WCOSS again, I will log on to Mars and recompile MET. I
will follow up once it is ready for you.
Thank you!
Julie
--
Julie Prestopnik
Software Engineer
National Center for Atmospheric Research
Research Applications Laboratory
Phone: 303.497.8399
Email: jpresto at ucar.edu
My working day may not be your working day. Please do not feel
obliged to
reply to this email outside of your normal working hours.
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John L Wagner - NOAA Federal
Time: Mon Feb 03 11:37:26 2020
Thanks Julie. I'll be able to run MET wherever you park it.
There are some file system failures today on mars that may be preventing
you from logging in. Even if you could get on, there's a good chance you
wouldn't be able to do anything today anyway.
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: Julie Prestopnik
Time: Fri Feb 14 13:30:06 2020
Hi John. I still don't have my access back yet, but it is being worked
on. Hopefully, by early next week, the issue will be resolved. I just
wanted to give you a status update and let you know I haven't forgotten
about this task.
Julie
On Mon, Feb 3, 2020 at 11:37 AM John L Wagner - NOAA Federal via RT
<met_help at ucar.edu> wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
>
> Thanks Julie. I'll be able to run met wherever you park it.
> There are some file system failures today on mars that may be
> preventing you from logging in. Even if you could get on, there's a
> good chance you wouldn't be able to do anything today anyway.
>
> On Mon, Feb 3, 2020 at 12:57 PM Julie Prestopnik via RT
> <met_help at ucar.edu> wrote:
>
> > Hi John.
> >
> > Thank you for sending the location of MET that you are using. I do
> > not have access to write to that location - it is more of an official
> > location for the MET software. However, I will still be able to
> > recompile that version of MET in a different location and create a
> > modulefile for you to use and test with. Unfortunately, I am unable
> > to access WCOSS at this time. I have submitted a helpdesk ticket to
> > the WCOSS helpdesk. Once I am able to access WCOSS again, I will log
> > on to Mars and recompile MET. I will follow up once it is ready for
> > you.
> >
> > Thank you!
> >
> > Julie
> >
> > On Fri, Jan 31, 2020 at 1:33 PM John L Wagner - NOAA Federal via RT
> > <met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > >
> > > Thanks John. I'm running MET v8.1 on mars (dell/phase 3).
> > >
> > > module use /usrx/local/dev/modulefiles
> > > module load met/8.1
> > >
> > > On Fri, Jan 31, 2020 at 3:06 PM John Halley Gotway via RT
> > > <met_help at ucar.edu> wrote:
> > >
> > > > Hi John,
> > > >
> > > > This is the sort of behavior that could be caused by issues in
> > > > your runtime environment. Perhaps at runtime, the linker is
> > > > finding and using an incompatible version of the HDF5 library...
> > > > which results in this error.
> > > >
> > > > Your timing is perfect. We literally just met with sys admins
> > > > here at NCAR about how we've been compiling MET. They strongly
> > > > encourage us to define the "rpath" (i.e. run path) in the linker
> > > > flags (LDFLAGS) when we configure/compile MET. The effect of that
> > > > is that the executables know exactly what directories to search
> > > > for the dependent libraries at runtime rather than relying on the
> > > > user's environment (i.e. LD_LIBRARY_PATH) to find them.
> > > >
> > > > One option we could try is asking Julie Prestopnik to recompile
> > > > this version of MET using these LDFLAGS settings. You could
> > > > repoint your script to this other version and test to see if the
> > > > behavior goes away or persists.
> > > >
> > > > If that solves it, we'll ask Julie to update her process for
> > > > installing future releases of MET. If not, it's back to the
> > > > drawing board.
> > > >
> > > > Can you tell me exactly what version of MET you're running? Which
> > > > WCOSS machine and what "module" commands you use to load MET?
> > > >
> > > > Thanks,
> > > > John
> > > >
--
Julie Prestopnik
Software Engineer
National Center for Atmospheric Research
Research Applications Laboratory
Phone: 303.497.8399
Email: jpresto at ucar.edu
My working day may not be your working day. Please do not feel obliged
to reply to this email outside of your normal working hours.
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John L Wagner - NOAA Federal
Time: Fri Feb 14 13:39:03 2020
Thanks for the update Julie. Much appreciated.
Things have been running smoother on WCOSS the past two days. Hopefully
the fix that they put in for the file servers earlier this week was
really what we needed.
--
John Wagner
Verification Task Lead
COR Task Manager
NOAA/National Weather Service
Meteorological Development Laboratory
Digital Forecast Services Branch
SSMC2 Room 10106
Silver Spring, MD 20910
(301) 427-9471 (office)
(908) 902-4155 (cell/text)
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: Julie Prestopnik
Time: Wed Feb 26 11:06:20 2020
Hi John.
I have my WCOSS access back. I have installed met-9.0_beta3 using rpath
on Venus and on Mars. I have not recompiled older versions of MET in
this way. Will using met-9.0_beta3 work ok for you? Although I know you
said that the fix on Mars may have fixed your problem already...
If you are interested in trying out met-9.0_beta3, you can take a look
at this page:
https://dtcenter.org/community-code/model-evaluation-tools-met/metv9-0-existing-builds-metplus-3-0-installations
to find what you need to load in order to access this version. Just
click on "NOAA machines" and find Venus or Mars, respectively.
Please let us know if you have any questions.
Thanks,
Julie
--
Julie Prestopnik
Software Engineer
National Center for Atmospheric Research
Research Applications Laboratory
Phone: 303.497.8399
Email: jpresto at ucar.edu
My working day may not be your working day. Please do not feel obliged
to reply to this email outside of your normal working hours.
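
One way to confirm that an rpath-enabled build, such as the met-9.0_beta3 install described in this message, finds its libraries through an embedded run path rather than LD_LIBRARY_PATH is to inspect the executable's dynamic section. The sketch below is only an illustration and is not part of MET: the helper name extract_rpath is hypothetical, and it assumes the readelf tool from GNU binutils is available. It reads "readelf -d" output on standard input and prints each RPATH or RUNPATH directory on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of MET): print the run-path directories
# recorded in an ELF binary, one per line. It parses `readelf -d` output
# from stdin, so it can be exercised without a real executable.
extract_rpath() {
    # Keep only the (RPATH)/(RUNPATH) entries, capture the text inside
    # the square brackets, then split the colon-separated list.
    sed -n 's/.*(R[UN]*PATH).*\[\(.*\)\].*/\1/p' | tr ':' '\n'
}
```

For example, assuming grid_stat is on the PATH after loading the MET module, running readelf -d "$(command -v grid_stat)" | extract_rpath would list the embedded directories, and ldd "$(command -v grid_stat)" | grep -i -E 'hdf5|netcdf' would show which HDF5 and netCDF shared objects are resolved in the current environment. If the first command prints nothing, the binary has no embedded run path and relies on the environment or the system default search paths.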
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John L Wagner - NOAA Federal
Time: Wed Feb 26 11:11:29 2020
Thanks Julie. Yes, we still want to test with V9.0, just to confirm
that the fix on mars was what resolved the issue.
I should be able to get to this later this week. I'll let you know if I
run into any issues.
Thanks
John
On Wed, Feb 26, 2020 at 1:06 PM Julie Prestopnik via RT
<met_help at ucar.edu>
wrote:
> Hi John.
>
> I have my WCOSS access back. I have installed met-9.0_beta3 using
rpath
> on Venus and on Mars. I have not recompiled older versions of MET
in this
> way. Will using met-9.0_beta3 work ok for you? Although I know you
said
> that the fix on Mars may have fixed your problem already...
>
> If you are interested in trying out met-9.0_beta3, you can take a
look at
> this page:
>
>
> https://dtcenter.org/community-code/model-evaluation-tools-met/metv9-0-existing-builds-metplus-3-0-installations
>
> to find what you need to load in order to access this version. Just
click
> on "NOAA machines" and find Venus or Mars, respectively.
>
> Please let us know if you have any questions.
>
> Thanks,
> Julie
--
John Wagner
Verification Task Lead
COR Task Manager
NOAA/National Weather Service
Meteorological Development Laboratory
Digital Forecast Services Branch
SSMC2 Room 10106
Silver Spring, MD 20910
(301) 427-9471 (office)
(908) 902-4155 (cell/text)
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: Julie Prestopnik
Time: Wed Feb 26 11:23:46 2020
Thank you, John!
On Wed, Feb 26, 2020 at 11:11 AM John L Wagner - NOAA Federal via RT
<met_help at ucar.edu> wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
>
> Thanks Julie. Yes, we still want to test with V9.0, just to confirm that
> the fix on mars was the issue.
> I should be able to get to this later this week. I'll let you know if I
> run into any issues.
> Thanks
> John
--
Julie Prestopnik
Software Engineer
National Center for Atmospheric Research
Research Applications Laboratory
Phone: 303.497.8399
Email: jpresto at ucar.edu
My working day may not be your working day. Please do not feel obliged to
reply to this email outside of your normal working hours.
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: John L Wagner - NOAA Federal
Time: Thu Mar 05 13:14:29 2020
Hi Julie
I'm testing the v9.0 beta now on venus with grid_stat, as we have seen some
slowdowns on venus over the past 2 days. A single run of grid_stat with
v8.1 was taking around 3 minutes. With v9.0_beta, it's taking a little over
a minute.
The v9.0 beta seems to be significantly faster so far. If there is
anything else I should be looking for as I test, please let me know.
Thanks
John
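Since the v9.0 beta builds discussed in this ticket were installed using
rpath, one further thing that could be checked while testing is which netCDF
and HDF5 shared libraries the executable actually resolves at run time. The
following is only a sketch, not the MET team's procedure. It assumes a Linux
system with the standard readelf and ldd tools and that grid_stat is on the
PATH after the module is loaded. The function names (filter_nc_libs,
show_nc_libs, show_rpath) are hypothetical helpers introduced for
illustration.

```shell
# Hypothetical helper: keep only the netCDF/HDF5 lines from `ldd` output
# read on stdin. `|| true` avoids a non-zero status when nothing matches.
filter_nc_libs() {
  grep -Ei 'netcdf|hdf5' || true
}

# Hypothetical helper: list the netCDF/HDF5 shared libraries that the
# dynamic linker would load for the executable given as $1.
show_nc_libs() {
  ldd "$1" | filter_nc_libs
}

# Hypothetical helper: print any run path (RPATH or RUNPATH entry) that was
# embedded in the executable given as $1 at link time.
show_rpath() {
  readelf -d "$1" | grep -E 'RPATH|RUNPATH' || true
}

# Example use, only attempted if grid_stat is actually on the PATH
# (an assumption about the loaded module environment).
if command -v grid_stat >/dev/null 2>&1; then
  bin=$(command -v grid_stat)
  show_rpath "$bin"
  show_nc_libs "$bin"
fi
```

If a run path is reported and the libraries listed live under the directories
it names, the executable does not have to rely on LD_LIBRARY_PATH to find
them. Running the same check inside a job submitted with bsub would show
whether the batch environment resolves different libraries than an
interactive shell.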
On Wed, Feb 26, 2020 at 1:23 PM Julie Prestopnik via RT <met_help at ucar.edu> wrote:
> Thank you, John!
>
> On Wed, Feb 26, 2020 at 11:11 AM John L Wagner - NOAA Federal via RT
<
> met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> >
> > Thanks Julie. Yes, we still want to test with V9.0 , just to
confirm
> that
> > the fix on mars was the issue.
> > I should be able to get to this later this week. I'll let you
know if I
> > run into any issues.
> > Thanks
> > John
> >
> > On Wed, Feb 26, 2020 at 1:06 PM Julie Prestopnik via RT <
> met_help at ucar.edu
> > >
> > wrote:
> >
> > > Hi John.
> > >
> > > I have my WCOSS access back. I have installed met-9.0_beta3
using
> rpath
> > > on Venus and on Mars. I have not recompiled older versions of
MET in
> > this
> > > way. Will using met-9.0_beta3 work ok for you? Although I know
you
> said
> > > that the fix on Mars may have fixed your problem already...
> > >
> > > If you are interested in trying out met-9.0_beta3, you can take
a look
> at
> > > this page:
> > >
> > >
> > >
> >
> https://dtcenter.org/community-code/model-evaluation-tools-
met/metv9-0-existing-builds-metplus-3-0-installations
> > >
> > > to find what you need to load in order to access this version.
Just
> > click
> > > on "NOAA machines" and find Venus or Mars, respectively.
> > >
> > > Please let us know if you have any questions.
> > >
> > > Thanks,
> > > Julie
> > >
> > > On Fri, Feb 14, 2020 at 1:39 PM John L Wagner - NOAA Federal via
RT <
> > > met_help at ucar.edu> wrote:
> > >
> > > >
> > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009
>
> > > >
> > > > Thanks for the update Julie. Much appreciated.
> > > > Things have been running smoother on WCOSS the past two days.
> > Hopefully
> > > > the fix that they put in for the file servers earlier this
week was
> > > really
> > > > what we needed.
> > > >
> > > > On Fri, Feb 14, 2020 at 3:30 PM Julie Prestopnik via RT <
> > > met_help at ucar.edu
> > > > >
> > > > wrote:
> > > >
> > > > > Hi John. I still don't have my access back yet, but it is
being
> > > worked
> > > > > on. Hopefully, by early next week, the issue will be
resolved. I
> > just
> > > > > wanted to give you a status update and let you know I
haven't
> > forgotten
> > > > > about this task.
> > > > >
> > > > > Julie
> > > > >
> > > > > On Mon, Feb 3, 2020 at 11:37 AM John L Wagner - NOAA Federal
via
> RT <
> > > > > met_help at ucar.edu> wrote:
> > > > >
> > > > > >
> > > > > > <URL:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > > > > >
> > > > > > Thanks Julie. I'll be able to run met wherever you park
it.
> > > > > > There are some file system failures today on mars that may
be
> > > > preventing
> > > > > > you from logging in. Even if you could get on, there's a
good
> > chance
> > > > you
> > > > > > wouldn't be able to do anything today anyway.
> > > > > >
> > > > > > On Mon, Feb 3, 2020 at 12:57 PM Julie Prestopnik via RT <
> > > > > met_help at ucar.edu
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi John.
> > > > > > >
> > > > > > > Thank you for sending the location of MET that you are
using.
> I
> > do
> > > > not
> > > > > > > have access to write to that location - it is more of an
> official
> > > > > > location
> > > > > > > for the MET software. However, I will still be able to
> recompile
> > > > that
> > > > > > > version of MET in a different location and create a
modulefile
> > for
> > > > you
> > > > > to
> > > > > > > use and test with. Unfortunately, I am unable to access
WCOSS
> at
> > > > this
> > > > > > > time. I have submitted a helpdesk ticket to the WCOSS
> helpdesk.
> > > > Once
> > > > > I
> > > > > > am
> > > > > > > able to access WCOSS again, I will get log on to Mars
and
> > recompile
> > > > > > MET. I
> > > > > > > will follow up once it is ready for you.
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > > Julie
> > > > > > >
> > > > > > > On Fri, Jan 31, 2020 at 1:33 PM John L Wagner - NOAA
Federal
> via
> > > RT <
> > > > > > > met_help at ucar.edu> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > <URL:
> https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009
> > >
> > > > > > > >
> > > > > > > > Thanks John. I'm running MET v8.1 on mars (dell/phase
3).
> > > > > > > >
> > > > > > > > module use /usrx/local/dev/modulefiles
> > > > > > > > module load met/8.1
> > > > > > > >
> > > > > > > > On Fri, Jan 31, 2020 at 3:06 PM John Halley Gotway via
RT <
> > > > > > > > met_help at ucar.edu>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi John,
> > > > > > > > >
> > > > > > > > > This is the sort of behavior that could be caused by
issues
> > in
> > > > your
> > > > > > > > runtime
> > > > > > > > > environment. Perhaps at runtime, the linker is
finding and
> > > using
> > > > > an
> > > > > > > > > incompatible version of the HDF5 library... which
results
> in
> > > this
> > > > > > > error.
> > > > > > > > >
> > > > > > > > > Your timing is perfect. We literally just met with
sys
> > admins
> > > > here
> > > > > > at
> > > > > > > > NCAR
> > > > > > > > > about how we've been compiling MET. They strongly
> encourage
> > us
> > > > to
> > > > > > > define
> > > > > > > > > the "rpath" (i.e. run path) in the linker flags
(LDFLAGS)
> > when
> > > we
> > > > > > > > > configure/compile MET. The effect of that is that
the
> > > > executables
> > > > > > know
> > > > > > > > > exactly what directories to search for the dependent
> > libraries
> > > at
> > > > > > > runtime
> > > > > > > > > rather than relying on the user's environment (i.e.
> > > > > LD_LIBRARY_PATH)
> > > > > > to
> > > > > > > > > find them.
> > > > > > > > >
> > > > > > > > > One option we could try is asking Julie Prestopnik
to
> > recompile
> > > > > this
> > > > > > > > > version of MET using these LDFLAGS settings. You
could
> > repoint
> > > > > your
> > > > > > > > > script to this other version and test to see if the
> behavior
> > > goes
> > > > > > away
> > > > > > > or
> > > > > > > > > persists.
> > > > > > > > >
> > > > > > > > > If that solves it, we'll ask Julie to update her
process
> for
> > > > > > installing
> > > > > > > > > future releases of MET. If not, it's back to the
drawing
> > > board.
> > > > > > > > >
> > > > > > > > > Can you tell me exactly what version of MET you're
running?
> > > > Which
> > > > > > > WCOSS
> > > > > > > > > machine and what "module" commands you use to load
MET?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > John
> > > > > > > > >
> > > > > > > > > On Fri, Jan 31, 2020 at 6:56 AM John L Wagner - NOAA Federal via RT <
> > > > > > > > > met_help at ucar.edu> wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Fri Jan 31 06:56:10 2020: Request 94009 was acted upon.
> > > > > > > > > > Transaction: Ticket created by john.l.wagner at noaa.gov
> > > > > > > > > > Queue: met_help
> > > > > > > > > > Subject: netCDF issues when submitting MET jobs to bsub
> > > > > > > > > > Owner: Nobody
> > > > > > > > > > Requestors: john.l.wagner at noaa.gov
> > > > > > > > > > Status: new
> > > > > > > > > > Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Greetings
> > > > > > > > > > Sorry to email both the MET and WCOSS help desks, but I wasn't sure where
> > > > > > > > > > to send this ticket. We have been encountering errors creating and reading
> > > > > > > > > > netCDF files on WCOSS (mars) recently. These errors occur when we call the
> > > > > > > > > > MET programs for jobs submitted to the lsf queue.
> > > > > > > > > > I was running a series of MET's grid_stat jobs yesterday. All was going
> > > > > > > > > > well until some time after 19Z, when all jobs began to fail. Here is the
> > > > > > > > > > error that I got:
> > > > > > > > > >
> > > > > > > > > > terminate called after throwing an instance of
> > > > > > > > > > 'netCDF::exceptions::NcHdfErr'
> > > > > > > > > > what(): NetCDF: HDF error
> > > > > > > > > > file: ncCheck.cpp line:92
> > > > > > > > > >
> > > > > > > > > > A sample of the netCDF files that I was using can be found in
> > > > > > > > > >
> > > > > > > > > > /gpfs/dell3/ptmp/John.L.Wagner/matching_urma_tt_2019100509.134086/match_co_2019100509.134086/009
> > > > > > > > > >
> > > > > > > > > > I do not have any issues reading data from these files using ncdump. They
> > > > > > > > > > don't appear to be corrupted. They are copies of files that I created two
> > > > > > > > > > months ago and have been testing with for some time now.
> > > > > > > > > > Here are the bsub settings I'm using to submit this job:
> > > > > > > > > >
> > > > > > > > > > export NTASK=104
> > > > > > > > > > export PTILE=28
> > > > > > > > > > export OMP_NUM_THREAD=20
> > > > > > > > > > bsub -J ${flg}_${src}_${valid_date}_${elem} \
> > > > > > > > > > -W 3:00 \
> > > > > > > > > > -oo $logdir/${flg}_${src}_mpmd_${valid_date}_${elem}.log \
> > > > > > > > > > -eo $logdir/${flg}_${src}_mpmd_${valid_date}_${elem}.log \
> > > > > > > > > > -P MDLST-T2O \
> > > > > > > > > > -M 3000 \
> > > > > > > > > > -q "dev" \
> > > > > > > > > > -cwd $PWD \
> > > > > > > > > > -R "affinity[core(1)]" \
> > > > > > > > > > -n $NTASK \
> > > > > > > > > > -R "span[ptile=$PTILE]" \
> > > > > > > > > > -w "$regrid_dep" \
> > > > > > > > > > $procdir/met_creeper_linden.sh -s $src -t $valid_date -g $flg -f $force -e $elem
> > > > > > > > > >
> > > > > > > > > > A sample mpmd file that is used for CFP can be found here:
> > > > > > > > > >
> > > > > > > > > > /gpfs/dell3/ptmp/John.L.Wagner/matching_urma_tt_2019100509.134086/mpmd_file_matching_urma_2019100509_tt
> > > > > > > > > >
> > > > > > > > > > Erin Thead and I have also encountered issues creating netCDF files
> > > > > > > > > > using MET's regrid_data_plane program on WCOSS. This issue has only been
> > > > > > > > > > occurring for the past week or two and again only occurs when a job is
> > > > > > > > > > submitted with bsub. It's not clear to us if something on WCOSS has changed
> > > > > > > > > > (netCDF/hdf library), if MET is having issues reading/writing netCDF files
> > > > > > > > > > when jobs are run in parallel, or something else.
> > > > > > > > > > If you need any more information from us, please let me know. Again, sorry
> > > > > > > > > > for emailing both help desks at once.
> > > > > > > > > > Thanks
> > > > > > > > > > John
> > > > > > > > > > --
> > > > > > > > > > John Wagner
> > > > > > > > > > Verification Task Lead
> > > > > > > > > > COR Task Manager
> > > > > > > > > > NOAA/National Weather Service
> > > > > > > > > > Meteorological Development Laboratory
> > > > > > > > > > Digital Forecast Services Branch
> > > > > > > > > > SSMC2 Room 10106
> > > > > > > > > > Silver Spring, MD 20910
> > > > > > > > > > (301) 427-9471 (office)
> > > > > > > > > > (908) 902-4155 (cell/text)
--
John Wagner
Verification Task Lead
COR Task Manager
NOAA/National Weather Service
Meteorological Development Laboratory
Digital Forecast Services Branch
SSMC2 Room 10106
Silver Spring, MD 20910
(301) 427-9471 (office)
(908) 902-4155 (cell/text)
------------------------------------------------
Subject: netCDF issues when submitting MET jobs to bsub
From: Julie Prestopnik
Time: Thu Mar 05 13:23:15 2020
Hi John.
Thank you for letting us know. I'm glad to hear about the speed up with
grid_stat in the beta version! We have various people testing out various
functionality. Please just let us know if you encounter anything you're
not expecting.
Thanks,
Julie
On Thu, Mar 5, 2020 at 1:15 PM John L Wagner - NOAA Federal via RT <
met_help at ucar.edu> wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
>
> Hi Julie
> I'm testing the v9.0 beta now on venus with grid_stat, as we have seen some
> slowdowns on venus over the past 2 days. A single run of grid_stat with
> v8.1 was taking around 3 minutes. With v9.0_beta, it's taking a little over
> a minute.
> The v9.0 beta seems to be significantly faster so far. If there is
> anything else I should be looking for as I test, please let me know.
> Thanks
> John
>
> On Wed, Feb 26, 2020 at 1:23 PM Julie Prestopnik via RT <met_help at ucar.edu>
> wrote:
>
> > Thank you, John!
> >
> > On Wed, Feb 26, 2020 at 11:11 AM John L Wagner - NOAA Federal via RT <
> > met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > >
> > > Thanks Julie. Yes, we still want to test with V9.0, just to confirm that
> > > the fix on mars was the issue.
> > > I should be able to get to this later this week. I'll let you know if I
> > > run into any issues.
> > > Thanks
> > > John
> > >
> > > On Wed, Feb 26, 2020 at 1:06 PM Julie Prestopnik via RT <met_help at ucar.edu>
> > > wrote:
> > >
> > > > Hi John.
> > > >
> > > > I have my WCOSS access back. I have installed met-9.0_beta3 using rpath
> > > > on Venus and on Mars. I have not recompiled older versions of MET in this
> > > > way. Will using met-9.0_beta3 work ok for you? Although I know you said
> > > > that the fix on Mars may have fixed your problem already...
> > > >
> > > > If you are interested in trying out met-9.0_beta3, you can take a look at
> > > > this page:
> > > >
> > > > https://dtcenter.org/community-code/model-evaluation-tools-met/metv9-0-existing-builds-metplus-3-0-installations
> > > >
> > > > to find what you need to load in order to access this version. Just click
> > > > on "NOAA machines" and find Venus or Mars, respectively.
> > > >
> > > > Please let us know if you have any questions.
> > > >
> > > > Thanks,
> > > > Julie
> > > >
> > > > On Fri, Feb 14, 2020 at 1:39 PM John L Wagner - NOAA Federal via RT <
> > > > met_help at ucar.edu> wrote:
> > > >
> > > > >
> > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > > > >
> > > > > Thanks for the update Julie. Much appreciated.
> > > > > Things have been running smoother on WCOSS the past two days. Hopefully
> > > > > the fix that they put in for the file servers earlier this week was really
> > > > > what we needed.
> > > > >
> > > > > On Fri, Feb 14, 2020 at 3:30 PM Julie Prestopnik via RT <met_help at ucar.edu>
> > > > > wrote:
> > > > >
> > > > > > Hi John. I still don't have my access back yet, but it is being worked
> > > > > > on. Hopefully, by early next week, the issue will be resolved. I just
> > > > > > wanted to give you a status update and let you know I haven't forgotten
> > > > > > about this task.
> > > > > >
> > > > > > Julie
> > > > > >
> > > > > > On Mon, Feb 3, 2020 at 11:37 AM John L Wagner - NOAA Federal via RT <
> > > > > > met_help at ucar.edu> wrote:
> > > > > >
> > > > > > >
> > > > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > > > > > >
> > > > > > > Thanks Julie. I'll be able to run met wherever you park it.
> > > > > > > There are some file system failures today on mars that may be preventing
> > > > > > > you from logging in. Even if you could get on, there's a good chance you
> > > > > > > wouldn't be able to do anything today anyway.
> > > > > > >
> > > > > > > On Mon, Feb 3, 2020 at 12:57 PM Julie Prestopnik via RT <met_help at ucar.edu>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi John.
> > > > > > > >
> > > > > > > > Thank you for sending the location of MET that you are using. I do not
> > > > > > > > have access to write to that location - it is more of an official location
> > > > > > > > for the MET software. However, I will still be able to recompile that
> > > > > > > > version of MET in a different location and create a modulefile for you to
> > > > > > > > use and test with. Unfortunately, I am unable to access WCOSS at this
> > > > > > > > time. I have submitted a helpdesk ticket to the WCOSS helpdesk. Once I am
> > > > > > > > able to access WCOSS again, I will log on to Mars and recompile MET. I
> > > > > > > > will follow up once it is ready for you.
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > > >
> > > > > > > > Julie
> > > > > > > >
> > > > > > > > On Fri, Jan 31, 2020 at 1:33 PM John L Wagner - NOAA Federal via RT <
> > > > > > > > met_help at ucar.edu> wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=94009 >
> > > > > > > > >
> > > > > > > > > Thanks John. I'm running MET v8.1 on mars (dell/phase 3).
> > > > > > > > >
> > > > > > > > > module use /usrx/local/dev/modulefiles
> > > > > > > > > module load met/8.1
> > > > > > > > >
> > > > > > > > > On Fri, Jan 31, 2020 at 3:06 PM John Halley Gotway via RT <
> > > > > > > > > met_help at ucar.edu> wrote:
> > > > > > > > >
> > > > > > > > > > Hi John,
> > > > > > > > > >
> > > > > > > > > > This is the sort of behavior that could be caused by issues in your
> > > > > > > > > > runtime environment. Perhaps at runtime, the linker is finding and using
> > > > > > > > > > an incompatible version of the HDF5 library... which results in this error.
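
One way to test the incompatible-library theory is to look at which shared libraries the loader actually resolves inside the batch job. The sketch below filters `ldd`-style output for library names containing a pattern. `resolved_libs` is a hypothetical helper written for illustration, not part of MET or any WCOSS tooling, and it assumes the usual `name => path (address)` layout that `ldd` prints on Linux.

```shell
#!/bin/sh
# Sketch only: resolved_libs is a hypothetical helper, not a MET tool.
# It reads ldd-style text on stdin and prints, for each library whose name
# contains the given pattern, the name and the path it resolved to.
resolved_libs() {
    pattern=$1
    # ldd lines look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
    # or "libfoo.so.1 => not found".
    awk -v pat="$pattern" '
        index($1, pat) && $2 == "=>" {
            if ($3 == "not") { print $1, "NOT FOUND" } else { print $1, $3 }
        }'
}

# Possible use from inside the submitted job script, so that the job's own
# environment (not the login shell's) is what gets inspected:
#   ldd "$(command -v grid_stat)" | resolved_libs hdf5
#   ldd "$(command -v grid_stat)" | resolved_libs netcdf
```

Comparing this output between an interactive shell and a bsub job would show whether the two environments pick up different HDF5 or netCDF builds.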
> > > > > > > > > >
> > > > > > > > > > Your timing is perfect. We literally just met with sys admins here at NCAR
> > > > > > > > > > about how we've been compiling MET. They strongly encourage us to define
> > > > > > > > > > the "rpath" (i.e. run path) in the linker flags (LDFLAGS) when we
> > > > > > > > > > configure/compile MET. The effect of that is that the executables know
> > > > > > > > > > exactly what directories to search for the dependent libraries at runtime
> > > > > > > > > > rather than relying on the user's environment (i.e. LD_LIBRARY_PATH) to
> > > > > > > > > > find them.
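
As a rough illustration of the rpath idea described above, the sketch below builds an LDFLAGS value that embeds a run path for each library directory. `rpath_ldflags` and the directory names are hypothetical, and the commented `./configure` call assumes an autoconf-style build that honors LDFLAGS. It is not the actual procedure used to install MET on WCOSS.

```shell
#!/bin/sh
# Sketch only: rpath_ldflags and the example paths are hypothetical.
# For every directory given, emit -L (used by the linker at build time)
# and -Wl,-rpath (recorded in the executable for use at run time).
rpath_ldflags() {
    flags=""
    for dir in "$@"; do
        flags="${flags:+$flags }-L$dir -Wl,-rpath,$dir"
    done
    printf '%s\n' "$flags"
}

# Possible use when configuring a build (placeholder paths):
#   LDFLAGS="$(rpath_ldflags /path/to/netcdf/lib /path/to/hdf5/lib)" ./configure
# The run path recorded in a built executable can then be inspected with:
#   readelf -d /path/to/bin/grid_stat | grep -i -E 'rpath|runpath'
```

With a run path recorded, the executable looks in those directories itself, so a different LD_LIBRARY_PATH inside a batch job is less likely to change which HDF5 library gets loaded.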
> > > > > > > > > >
> > > > > > > > > > One option we could try is asking Julie Prestopnik to recompile this
> > > > > > > > > > version of MET using these LDFLAGS settings. You could repoint your
> > > > > > > > > > script to this other version and test to see if the behavior goes away or
> > > > > > > > > > persists.
> > > > > > > > > >
> > > > > > > > > > If that solves it, we'll ask Julie to update her process for installing
> > > > > > > > > > future releases of MET. If not, it's back to the drawing board.
> > > > > > > > > >
> > > > > > > > > > Can you tell me exactly what version of MET you're running? Which WCOSS
> > > > > > > > > > machine and what "module" commands you use to load MET?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > John
> > > > > > > > > >
--
Julie Prestopnik
Software Engineer
National Center for Atmospheric Research
Research Applications Laboratory
Phone: 303.497.8399
Email: jpresto at ucar.edu
My working day may not be your working day. Please do not feel obliged to
reply to this email outside of your normal working hours.
------------------------------------------------