[Met_help] [rt.rap.ucar.edu #91683] History for MTD stopped for no reasons
John Halley Gotway via RT
met_help at ucar.edu
Fri Aug 30 09:46:53 MDT 2019
----------------------------------------------------------------
Initial Request
----------------------------------------------------------------
To whom it may concern:
I have found that MTD stops processing certain files. The files are readable, and I can generate figures from them using “plot_data_plane” as follows:
plot_data_plane ./stivinput/2003/stiv_2003030100.nc ./ps_stiv_20030100.ps 'name="Precipitation"; level="(0,*,*)";'
DEBUG 1: Opening data file: ./stivinput/2003/stiv_2003030100.nc
DEBUG 1: Creating postscript file: ./ps_stiv_20030100.ps
I am processing about ten years of data. Some of it runs through MTD, but some does not.
mtd -single $stivinput/2003/stiv_2003022823.nc $stivinput/2003/stiv_2003030100.nc -config /home/hisnamey/scratch/MET/config/stiv_config -outdir $stivout -v 2
DEBUG 2: mtd_read_data() -> processing file "/home/hisnamey/scratch/MET/stivinput/2003/stiv_2003022823.nc"
DEBUG 2: mtd_read_data() -> processing file "/home/hisnamey/scratch/MET/stivinput/2003/stiv_2003030100.nc"
DEBUG 2: regridding, if needed ...
--> It stopped here and could not proceed.
I used cdo to extract the netcdf files in time, but I am not sure what differs between the years that ran and the years that did not.
I would appreciate hearing from anyone who has experienced a similar problem. To summarize: the 2002-2008 data did not work, while the 2009-2013 data worked.
Best regards,
Yunsung Hwang
----------------------------------------------------------------
Complete Ticket History
----------------------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Mon Aug 26 11:06:58 2019
Yunsung,
I see that you're having trouble running MTD on a long time series of data. Since the time series is so long, the first thing I'd check is whether or not you've run out of memory on your machine. When running on a Linux machine, I'll often start a command in one window and let it run... and then in another window, run the "top" command.
"top" shows you what processes are running, and what percent of the CPU and memory they're consuming. If you're running 6 years of daily data = 2190 time steps, you may just not have enough memory to do so! I'm not aware of us running MTD internally on longer than 30 time steps, for example.
One simple thing to test would be running it separately for each year to see if the issue you're seeing goes away.
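As a rough sketch of the memory check described above in script form, the helpers below sample a single process with "ps" rather than watching the full "top" display. The function names sample_usage and monitor_job and the 10-second interval are hypothetical, chosen only for illustration; they are not part of MET. The mtd options in the comment are the ones used in this ticket, with the file arguments left as placeholders.

```shell
#!/bin/sh
# Print one sample for a process ID: PID, resident memory (KiB), %CPU, elapsed time.
# Separate -o options are used so the empty headers are handled the same way everywhere.
sample_usage() {
    ps -o pid= -o rss= -o pcpu= -o etime= -p "$1"
}

# Log a timestamped sample every INTERVAL seconds (default 10) until the process exits.
monitor_job() {
    pid=$1
    interval=${2:-10}
    while kill -0 "$pid" 2>/dev/null; do
        echo "$(date '+%H:%M:%S') $(sample_usage "$pid")"
        sleep "$interval"
    done
}

# Example (not executed here): start MTD in the background, then watch it.
#   mtd -single <input files> -config <config file> -outdir <output dir> -v 2 > mtd.log 2>&1 &
#   monitor_job $! 10
```

If the resident memory column keeps growing until the machine runs out, memory is the likely culprit. If memory stays flat while the CPU column stays high, the run is more likely stuck in a long computation.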
Hope that helps.
Thanks,
John Halley Gotway
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Mon Aug 26 11:34:01 2019
Dear John Halley Gotway:
Thanks for your kind response.
The thing is, I did not run MTD over a long period. I used 24 hourly files (24 time steps), and I used 64 to 128 GB of memory to run the MTD command in the background.
The real question is this:
1. There are two sets of data, from 2013 and 2002.
2. Both are readable in MET using "plot_data_plane".
3. 1 March 2013 worked properly with MTD.
4. 1 March 2002 did not work with MTD and stopped at "DEBUG 2: regridding, if needed ...".
When I submitted the jobs, the failing one stayed there for hours. Based on what you're saying, you may not have seen a similar situation before.
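One way to look for what differs between a file that works and one that does not is to compare their headers. The sketch below assumes the netCDF "ncdump" utility is installed and uses its "-h" (header only) option. The function names and the HEADER_CMD variable are hypothetical, and the 2013 file name in the usage comment is assumed from the naming pattern of the 2003 files. This only surfaces differences; it does not by itself explain the hang.

```shell
#!/bin/sh
# Dump the header of one file. HEADER_CMD defaults to "ncdump -h";
# it can be set to another command such as "cdo griddes" to compare grid definitions only.
dump_header() {
    ${HEADER_CMD:-ncdump -h} "$1"
}

# Print the header lines that differ between a working file and a failing file.
# Returns 0 when the headers are identical and 1 when they differ (as diff does).
compare_headers() {
    good=$1
    bad=$2
    tmp_good=$(mktemp) || return 2
    tmp_bad=$(mktemp) || return 2
    dump_header "$good" > "$tmp_good"
    dump_header "$bad" > "$tmp_bad"
    diff "$tmp_good" "$tmp_bad"
    status=$?
    rm -f "$tmp_good" "$tmp_bad"
    return "$status"
}

# Example (not executed here; the 2013 file name is an assumption):
#   compare_headers ./stivinput/2013/stiv_2013030100.nc ./stivinput/2003/stiv_2003030100.nc
```

With "ncdump -h", the first line of each header contains the file name, so that line will always differ. Differences in dimensions, grid or projection attributes, or missing-value attributes are the ones worth a closer look.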
Best regards,
Yunsung Hwang
------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Mon Aug 26 11:43:00 2019
Yunsung,
Ah, OK. So you are running MTD with 24 time steps. That sounds much more reasonable.
If you're able to figure out exactly which MTD run is hanging, you could send me the data for that day, and I could try to replicate/fix the behavior here. Please go to this link:
https://dtcenter.org/community-code/model-evaluation-tools-met/met-help-desk
And scroll down to "How to send us data" to post data on our anonymous ftp site.
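To narrow down which run is hanging, each day can be run under a time limit and the outcome recorded. The sketch below assumes GNU coreutils "timeout", which exits with status 124 when the limit is reached. The function name run_with_limit, the 30-minute limit, the log file names, and the per-day file glob in the usage comment are all hypothetical; the mtd options and the config path are the ones quoted in this ticket.

```shell
#!/bin/sh
# Run a command with a time limit, send its output to LOGFILE, and classify the result.
# Usage: run_with_limit LIMIT LOGFILE COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    logfile=$2
    shift 2
    timeout "$limit" "$@" > "$logfile" 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG"                 # the time limit was reached
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($status)"      # the command exited on its own with an error
    fi
    return "$status"
}

# Example (not executed here): try one day from each period.
#   for day in 20030301 20130301; do
#       year=${day%????}
#       printf '%s ' "$day"
#       run_with_limit 30m "mtd_$day.log" \
#           mtd -single "$stivinput/$year"/stiv_"$day"??.nc \
#           -config /home/hisnamey/scratch/MET/config/stiv_config \
#           -outdir "$stivout" -v 2
#   done
```

Days reported as HUNG are the candidates to send in, along with their logs.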
Thanks,
John
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Mon Aug 26 11:59:50 2019
Dear John Halley Gotway:
Thanks for your kind response.
I prepared all the files, including the netcdf files, the shell script, and the MTD log. I am not sure how to do the transfer from Mac OS X or from the server. Is there a website where I can look up how to transfer the files to the server?
Best regards,
Yunsung Hwang
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Mon Aug 26 12:24:37 2019
Dear John Halley Gotway:
Sorry for taking so long to deal with ftp. I was not familiar with using ftp to transfer files.
I put the data in "/incoming/irap/met_help/hwang_data".
I included the shell script, the config file for MTD, and the log from the stalled MTD run (the process stays at that point until I kill it).
I chose three of the netcdf files to shorten the transfer time. The files were generated using CDO version 1.7.2.
Let me know if you have any problem reading the files.
Thanks in advance for your help!
Best regards,
Yunsung Hwang
------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Mon Aug 26 13:00:29 2019
Thanks for sending the sample data. I pulled it down and will work on
testing it out today.
John
------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Tue Aug 27 11:52:19 2019
I reassigned this support ticket to Randy Bullock, the developer of the MTD software. I was able to replicate the behavior you described. I've asked Randy to debug it to better understand what's taking so long and how to fix it.
You should hear back from him when he has an update ready.
Thanks,
John
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Tue Aug 27 11:59:57 2019
Dear John Halley Gotway:
Thank you very much for your help!!
I will wait for further updates from met_help.
Best regards,
Yunsung Hwang
------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Thu Aug 29 13:37:25 2019
Yunsung,
I took a closer look at this data, ran it through the debugger, and see what's going on. It appears this is really a problem with your data rather than the software. However, the problem with the data led to a somewhat comical result.
You sent me 3 sample NetCDF files: stiv_2003022823.nc, stiv_2003030100.nc, stiv_2003030101.nc
The problem is that the lat and lon variables in those files contain all missing values:
ncdump -v lat stiv_2003022823.nc
And this is evident when I try to plot the data using plot_data_plane:
/plot_data_plane ./stiv_2003030100.nc ./stiv_2003030100.ps 'name="Precipitation"; level="(0,*,*)";'
The resulting image (attached) includes no map data, which indicates that MET doesn't know where on earth this data lives.
So why does it hang? Rather surprisingly, the NetCDF library code in MET parses the grid spec from the lat/lon data. But the min/max lat/lon values are stored as a fill value of 9.96E36. And it tries to rescale that longitude value down to the expected range of -180 to 180 by adding/subtracting 360's as needed. But that math takes a very, very, very long time.
So the apparent "hang" in running MTD (or timeout on your HPC) was really just caused by missing lat/lon values in your input data files.
Ideally, MTD would not have made it that far into the processing for this to be a problem, so it would be better for us to add some sanity checks to the code which parses the grid information.
Hope that helps clarify.
Thanks,
John
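The rescaling problem described in this message can be illustrated with a minimal Python sketch. This is not MET's actual code: the function names, the iteration cap, and the 1.0e30 fill-value threshold are all assumptions made for illustration. It shows that stepping a fill value of 9.96E36 by 360 makes no progress in double precision, and how a sanity check before rescaling would turn the apparent hang into an immediate error.

```python
import math

# Assumption: any magnitude above this is treated as a fill value, not a longitude.
FILL_THRESHOLD = 1.0e30


def rescale_lon_loop(lon, max_iter=10_000):
    """Naive rescale: step by 360 until lon is within [-180, 180].

    max_iter is a hypothetical safety cap so this sketch cannot hang.
    """
    n = 0
    while lon > 180.0 or lon < -180.0:
        if n >= max_iter:
            raise RuntimeError("gave up rescaling longitude %g" % lon)
        lon += -360.0 if lon > 180.0 else 360.0
        n += 1
    return lon


def rescale_lon_checked(lon):
    """Reject fill values up front, then rescale in constant time."""
    if not math.isfinite(lon) or abs(lon) > FILL_THRESHOLD:
        raise ValueError("longitude %g looks like a missing/fill value" % lon)
    return (lon + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # At this magnitude, subtracting 360 does not change a double at all.
    print(9.96e36 - 360.0 == 9.96e36)
    print(rescale_lon_loop(190.0), rescale_lon_checked(190.0))
```

With a valid longitude both functions agree. With the fill value, the loop version only stops because of the cap, while the checked version fails right away with a message that points at the input data.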
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Thu Aug 29 14:09:56 2019
Dear John Halley Gotway:
Thanks a lot for your help.
So I think the files that worked included proper lat/lon information, while the ones that failed did not.
If possible, is there any way I can extract the lat/lon information from one file and put it into another file?
I mean, extract the "lat,lon" info from "2013.nc" and put it into "2003.nc"?
Is there something I can do with NCL, NCO, or CDO, or do I need to handle this by taking care of NF-compliant netcdf file?
Thanks a lot for your help; I hope to hear from you shortly. If I need to contact someone else, please let me know. That would be a great help, since I am not sure how to deal with this issue by myself.
Best regards,
Yunsung Hwang
------------------------------------------------
Subject: MTD stopped for no reasons
From: John Halley Gotway
Time: Thu Aug 29 15:20:33 2019
Yunsung,
Hmmm, well there's just so many different ways people edit NetCDF files, it's hard to know what direction to point you in.
Actually, there is a current development task for MET that would really help out in this situation if it were already done! We want to give users the ability to overwrite the metadata read from gridded data files without having to edit the files directly. But that feature doesn't exist yet.
If you are comfortable with python, R, or NCL, you could read the NetCDF files into those tools, correct the lat/lon values, and then write them back out.
Or there may be NetCDF Operator commands that may be useful to you:
http://nco.sourceforge.net/
Unfortunately, I don't know them well enough to tell you exactly how to do it.
John
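As one possible way to do the "read, correct, write back" step in Python, here is a minimal sketch. It assumes the netCDF4 Python package is installed, that the coordinate variables are named lat and lon (as in the ncdump command earlier in this ticket), and that the good and bad files are on exactly the same grid. The helper names copy_coords and fix_file are hypothetical, and the file names are the placeholders used in the question above. Work on a copy of the data, since the target file is modified in place.

```python
def copy_coords(src_vars, dst_vars, names=("lat", "lon")):
    """Overwrite coordinate values in dst_vars with those from src_vars.

    Both arguments are mappings from variable name to something that
    supports slicing, such as the .variables dict of a netCDF4 Dataset.
    """
    for name in names:
        good = src_vars[name][:]
        # Only the leading dimension is compared here; this is a weak check.
        if len(good) != len(dst_vars[name][:]):
            raise ValueError("%s: size mismatch between files" % name)
        dst_vars[name][:] = good


def fix_file(good_path, bad_path):
    """Copy lat/lon from good_path into bad_path, editing bad_path in place."""
    # Imported here so copy_coords can be tried without netCDF4 installed.
    from netCDF4 import Dataset

    with Dataset(good_path) as src, Dataset(bad_path, "r+") as dst:
        copy_coords(src.variables, dst.variables)


# Example call (placeholder file names from the question above):
# fix_file("2013.nc", "2003.nc")
```

Afterwards, ncdump -v lat on the corrected file should show real latitude values instead of fill values.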
> > >> > On 2019-08-26, 11:07 AM, "John Halley Gotway via
RT" <
> > >> met_help at ucar.edu>
> > >> > wrote:
> > >> >
> > >> > Yunsung,
> > >> >
> > >> > I see that you're having trouble running MTD
on a
> long
> > time
> > >> series of
> > >> > data. Since the time series is so long, the
first
> thing
> > I'd
> > >> check is
> > >> > whether or not you've run out of memory on
your
> machine.
> > When
> > >> running
> > >> > on a
> > >> > Linux machine, I'll often start a command in
one
> window
> > and let
> > >> it
> > >> > run...
> > >> > and then in another window, run the "top"
command.
> > >> >
> > >> > "top" shows you what processes are running,
and what
> > percent of
> > >> the
> > >> > CPU and
> > >> > memory they're consuming. If you're running
6
> years of
> > daily
> > >> data =
> > >> > 2190
> > >> > time steps, you may just not have enough
memory to
> do
> > so! I'm
> > >> not
> > >> > aware of
> > >> > us running MTD internally on longer than 30
time
> steps,
> > for
> > >> example.
> > >> >
> > >> > One simple thing to test would be running it
> separately
> > for
> > >> each year
> > >> > to
> > >> > see if the issue you're seeing goes away.
> > >> >
> > >> > Hope that helps.
> > >> >
> > >> > Thanks,
> > >> > John Halley Gotway
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Aug 26, 2019 at 10:45 AM
> yunsung.hwang at usask.ca
> > via RT
> > >> <
> > >> > met_help at ucar.edu> wrote:
> > >> >
> > >> > >
> > >> > > Mon Aug 26 10:45:31 2019: Request 91683 was
acted
> upon.
> > >> > > Transaction: Ticket created by
> yunsung.hwang at usask.ca
> > >> > > Queue: met_help
> > >> > > Subject: MTD stopped for no reasons
> > >> > > Owner: Nobody
> > >> > > Requestors: yunsung.hwang at usask.ca
> > >> > > Status: new
> > >> > > Ticket <URL:
> > >> >
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683
> >
> > >> > >
> > >> > >
> > >> > > To whom it may concern:
> > >> > >
> > >> > > I have encountered that MTD stopped
processing
> specific
> > >> files. The
> > >> > files
> > >> > > are readable and can be used to generate
figures
> using
> > >> > “plot_data_plane” as
> > >> > > follows:
> > >> > >
> > >> > >
> > >> > > plot_data_plane ./stivinput/2003/
> stiv_2003030100.nc ./
> > >> > ps_stiv_20030100.ps
> > >> > > 'name="Precipitation"; level="(0,*,*)";'
> > >> > >
> > >> > > DEBUG 1: Opening data file:
./stivinput/2003/
> > >> stiv_2003030100.nc
> > >> > >
> > >> > > DEBUG 1: Creating postscript file: ./
> > ps_stiv_20030100.ps
> > >> > >
> > >> > > I am running about ten years, however some
of data
> > processes
> > >> with
> > >> > MTD,
> > >> > > while some didn’t.
> > >> > >
> > >> > >
> > >> > > mtd -single
$stivinput/2003/stiv_2003022823.nc
> > >> $stivinput/2003/
> > >> > > stiv_2003030100.nc -config
> > >> > /home/hisnamey/scratch/MET/config/stiv_config
> > >> > > -outdir $stivout -v 2
> > >> > >
> > >> > > DEBUG 2: mtd_read_data() -> processing file
> > >> > > "/home/hisnamey/scratch/MET/stivinput/2003/
> > stiv_2003022823.nc
> > >> "
> > >> > >
> > >> > > DEBUG 2: mtd_read_data() -> processing file
> > >> > > "/home/hisnamey/scratch/MET/stivinput/2003/
> > stiv_2003030100.nc
> > >> "
> > >> > >
> > >> > > DEBUG 2: regridding, if needed ...
> > >> > > --> So it stopped here / could not proceed.
> > >> > >
> > >> > > I used cdo to extract netcdf files in time,
> however I
> > am not
> > >> sure
> > >> > what
> > >> > > would be differences between years they
passed /
> could
> > not
> > >> pass
> > >> > >
> > >> > > I hope I can hear from you if someone
experiences
> > similar
> > >> situation
> > >> > or
> > >> > > problem. So 2002-2008 data did not work /
while
> > 2009-2013 data
> > >> > worked.
> > >> > >
> > >> > > Best regards,
> > >> > > Yunsung Hwang
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> >
> >
> >
> >
> >
> >
>
>
>
>
>
>
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Thu Aug 29 15:43:56 2019
Dear John Halley Gotway:
Thank you very much for your kind response.
I will try to overwrite the metadata in the file and see whether MTD
runs with the updated information.
Have a good rest of the day!!
Best regards,
Yunsung Hwang
On 2019-08-29, 3:20 PM, "John Halley Gotway via RT"
<met_help at ucar.edu> wrote:
Yunsung,
Hmmm, well, there are just so many different ways people edit NetCDF
files that it's hard to know what direction to point you in.
Actually, there is a current development task for MET that would
really help out in this situation if it were already done! We want to
give users the ability to overwrite the metadata read from gridded
data files without having to edit the files directly. But that feature
doesn't exist yet.
If you are comfortable with Python, R, or NCL, you could read the
NetCDF files into those tools, correct the lat/lon values, and then
write them back out.
Or there may be NetCDF Operator (NCO) commands that are useful to you:
http://nco.sourceforge.net/
Unfortunately, I don't know them well enough to tell you exactly how
to do it.
John
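Before correcting the coordinates, it can help to confirm which files actually carry fill values in lat/lon. The following is a minimal sketch in Python, not a MET feature. It assumes the coordinate values have already been read into plain lists (for example with a NetCDF reader, not shown here), and it assumes the files use the default NetCDF fill value for floating-point data, about 9.969e36. The function name and the tolerance are illustrative.

```python
import math

# Default NetCDF fill value for float/double variables (assumed to apply here).
NC_FILL = 9.969209968386869e36


def coords_look_valid(lats, lons, fill=NC_FILL):
    """Return True only if every lat/lon is finite, not the fill value, and in range."""
    if not lats or not lons:
        return False
    for lat in lats:
        if not math.isfinite(lat) or math.isclose(lat, fill, rel_tol=1e-6):
            return False
        if not -90.0 <= lat <= 90.0:
            return False
    for lon in lons:
        if not math.isfinite(lon) or math.isclose(lon, fill, rel_tol=1e-6):
            return False
        # Accept both the -180..180 and 0..360 longitude conventions.
        if not -360.0 <= lon <= 360.0:
            return False
    return True


if __name__ == "__main__":
    print(coords_look_valid([49.5, 50.0], [-105.0, -104.5]))          # plausible grid
    print(coords_look_valid([NC_FILL, NC_FILL], [NC_FILL, NC_FILL]))  # all missing
```

Running a check like this over each year's files would separate the files that need new coordinates from the ones that are already usable.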
------------------------------------------------
Subject: MTD stopped for no reasons
From: George McCabe
Time: Thu Aug 29 15:46:25 2019
Yunsung,
I think the ncks (nc kitchen sink) NCO utility may be able to do what
you are trying to do. Here are some examples of how to use the tool:
http://nco.sourceforge.net/nco.html#xmp_ncks
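As a hedged illustration of the ncks approach: NCO documents an append mode (-A) and a variable selector (-v), which together can copy named variables from one file into another. The sketch below only builds, and optionally runs, such a command from Python. It assumes the coordinate variables are named lat and lon (as in the ncdump call earlier in this ticket), that both files share the same grid dimensions, and that ncks is on the PATH. The file names 2013.nc and 2003.nc are the placeholders used in the question, and the helper names are hypothetical. Work on a copy of the target file, since append mode modifies it in place.

```python
import subprocess


def build_ncks_copy_cmd(good_file, bad_file, variables=("lat", "lon")):
    """Build an ncks command that appends `variables` from good_file into bad_file."""
    # -A: append to an existing output file, replacing variables of the same name
    # -v: comma-separated list of variables to extract from the input file
    return ["ncks", "-A", "-v", ",".join(variables), good_file, bad_file]


def copy_coords(good_file, bad_file):
    """Run the command; raises CalledProcessError if ncks reports a failure."""
    subprocess.run(build_ncks_copy_cmd(good_file, bad_file), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_ncks_copy_cmd("2013.nc", "2003.nc")))
```

After such a copy, re-running ncdump -v lat on the target file is a quick way to confirm that the values are no longer missing.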
------------------------------------------------
Subject: Re: [rt.rap.ucar.edu #91683] MTD stopped for no reasons
From: yunsung.hwang at usask.ca
Time: Fri Aug 30 09:25:05 2019
Dear George McCabe:
Thank you very much for kindly providing this information.
I have worked with ncks before, so I will try several things.
Best regards,
Yunsung Hwang
On 2019-08-29, 3:46 PM, "George McCabe via RT" <met_help at ucar.edu>
wrote:
Yunsung,
I think the ncks (nc kitchen sink) NCO utility may be able to do what
you are trying to do. Here are some examples of how to use the tool:
http://nco.sourceforge.net/nco.html#xmp_ncks
On Thu, Aug 29, 2019 at 9:20 PM John Halley Gotway via RT
<met_help at ucar.edu>
wrote:
>
> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683 >
>
> Yunsung,
>
> Hmmm, well there's just so many different ways people edit
NetCDF files,
> it's hard to know what direction to point you in.
>
> Actually, there is a current development task for MET that would
really
> help out in this situation if it were already done! We want to
give users
> the ability to overwrite the metadata read from gridded data
files without
> having to edit the files directly. But that feature doesn't
exist yet.
>
> If you are comfortable with python, R, or NCL, you could read
the NetCDF
> files into those tools, correct the lat/lon values, and then
write them
> back out.
>
> Or there may be NetCDF Operator commands that may be useful to
you:
> http://nco.sourceforge.net/
>
> Unfortunately, I don't know them well enough to tell you exactly
how to do
> it.
>
> John
>
> On Thu, Aug 29, 2019 at 2:10 PM yunsung.hwang at usask.ca via RT <
> met_help at ucar.edu> wrote:
>
> >
> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683
>
> >
> > Dear John Halley Gotway:
> >
> > Thanks a lot for your help.
> >
> > So I think those files worked included proper information of
lat lon
> while
> > failed did not have the information.
> >
> > If possible, is there any way I can extract the information of
lat,lon
> > from one file and put into another file?
> >
> > I mean extract "lat,lon info" from "2013.nc" and put the
extracted
> > "lat,lon" info to "2003.nc"?
> >
> > Is there something I can do with NCL or NCO or CDO or do I
need to handle
> > this by taking care of NF-compliant netcdf file?
> >
> > Thanks a lot for your help and hope I can hear from you
shortly. If I
> need
> > to contact some places else, please let me know and it would
be great
> help
> > for me since I am not quite sure how to deal with this issue
by myself
> > clearly.
> >
> > Best regards,
> > Yunsung Hwang
> >
> > On 2019-08-29, 1:37 PM, "John Halley Gotway via RT" <met_help at ucar.edu>
> > wrote:
> >
> > Yunsung,
> >
> > I took a closer look at this data, ran it through the debugger, and see
> > what's going on. It appears this is really a problem with your data
> > rather than the software. However, the problem with the data led to a
> > somewhat comical result.
> >
> > You sent me 3 sample NetCDF files: stiv_2003022823.nc,
> > stiv_2003030100.nc, and stiv_2003030101.nc
> >
> > The problem is that the lat and lon variables in those files contain
> > all missing values:
> > ncdump -v lat stiv_2003022823.nc
> >
> > And this is evident when I try to plot the data using plot_data_plane:
> > /plot_data_plane ./stiv_2003030100.nc ./stiv_2003030100.ps 'name="Precipitation"; level="(0,*,*)";'
> >
> > The resulting image (attached) includes no map data, which indicates
> > that MET doesn't know where on earth this data lives.
> >
> > So why does it hang? Rather surprisingly, the NetCDF library code in
> > MET parses the grid spec from the lat/lon data. But the min/max lat/lon
> > values are stored as a fill value of 9.96E36. And it tries to rescale
> > that longitude value down to the expected range of -180 to 180 by
> > adding/subtracting 360's as needed. But that math takes a very, very,
> > very long time.
> >
> > So the apparent "hang" in running MTD (or timeout on your HPC) was
> > really just caused by missing lat/lon values in your input data files.
> >
> > Ideally, MTD would not have made it that far into the processing for
> > this to be a problem, so it would be better for us to add some sanity
> > checks to the code which parses the grid information.
> >
> > Hope that helps clarify.
> >
> > Thanks,
> > John
> >
> > On Tue, Aug 27, 2019 at 11:59 AM yunsung.hwang at usask.ca via RT <
> > met_help at ucar.edu> wrote:
> >
> > >
> > > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683 >
> > >
> > > Dear John Halley Gotway:
> > >
> > > Thank you very much for your help!!
> > >
> > > I will wait for further updates from met_help.
> > >
> > > Best regards,
> > > Yunsung Hwang
> > >
> > > On 2019-08-27, 11:57 AM, "John Halley Gotway via RT" <met_help at ucar.edu>
> > > wrote:
> > >
> > > I reassigned this support ticket to Randy Bullock, the developer of
> > > the MTD software. I was able to replicate the behavior you described.
> > > I've asked Randy to debug it to better understand what's taking so
> > > long and how to fix it.
> > >
> > > You should hear back from him when he has an update ready.
> > >
> > > Thanks,
> > > John
> > >
> > > On Mon, Aug 26, 2019 at 1:00 PM John Halley Gotway <johnhg at ucar.edu>
> > > wrote:
> > >
> > > > Thanks for sending the sample data. I pulled it down and will work
> > > > on testing it out today.
> > > >
> > > > John
> > > >
> > > > On Mon, Aug 26, 2019 at 12:25 PM yunsung.hwang at usask.ca via RT <
> > > > met_help at ucar.edu> wrote:
> > > >
> > > >>
> > > >> <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683 >
> > > >>
> > > >> Dear John Halley Gotway:
> > > >>
> > > >> Sorry for taking so much time to deal with ftp. I was not familiar
> > > >> with using ftp to transfer files.
> > > >>
> > > >> I put the data in "/incoming/irap/met_help/hwang_data".
> > > >> I included the shell script, the config file for mtd, and the log
> > > >> from the stalled mtd run (the process stays there until I kill it).
> > > >> I chose three of the NetCDF files to shorten the transfer time. The
> > > >> files were generated using CDO version 1.7.2.
> > > >>
> > > >> Let me know if you have any problem reading the files.
> > > >> Thanks for your help in advance!!
> > > >>
> > > >> Best regards,
> > > >> Yunsung Hwang
> > > >>
> > > >> On 2019-08-26, 11:43 AM, "John Halley Gotway via RT" <met_help at ucar.edu>
> > > >> wrote:
> > > >>
> > > >> Yunsung,
> > > >>
> > > >> Ah, OK. So you are running MTD with 24 time steps. That sounds much
> > > >> more reasonable.
> > > >>
> > > >> If you're able to figure out exactly which MTD run is hanging, you
> > > >> could send me the data for that day, and I could try to
> > > >> replicate/fix the behavior here. Please go to this link:
> > > >>
> > > >> https://dtcenter.org/community-code/model-evaluation-tools-met/met-help-desk
> > > >>
> > > >> And scroll down to "How to send us data" to post data on our
> > > >> anonymous ftp site.
> > > >>
> > > >> Thanks,
> > > >> John
> > > >>
> > > >> On Mon, Aug 26, 2019 at 11:34 AM yunsung.hwang at usask.ca via RT <
> > > >> met_help at ucar.edu> wrote:
> > > >>
> > > >> >
> > > >> > <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683 >
> > > >> >
> > > >> > Dear John Halley Gotway:
> > > >> >
> > > >> > Thanks for your kind response.
> > > >> >
> > > >> > The thing is, I did not run MTD over a long period. I used 24
> > > >> > hourly data files (24 time steps). I used 64 to 128 GB of memory
> > > >> > to run the MTD command in the background.
> > > >> >
> > > >> > The real question was:
> > > >> > 1. there are two sets of data, from 2013 and 2002
> > > >> > 2. those are readable in MET by using "plot_data_plane"
> > > >> > 3. 1 March 2013 worked properly using MTD
> > > >> > 4. 1 March 2002 did not work using MTD and stopped at
> > > >> > "DEBUG 2: regridding, if needed ..."
> > > >> >
> > > >> > When I submitted the job, the one that did not work stayed there
> > > >> > for hours. Based on what you're saying, you might not have seen
> > > >> > a similar situation before.
> > > >> >
> > > >> > Best regards,
> > > >> > Yunsung Hwang
> > > >> >
> > > >> > On 2019-08-26, 11:07 AM, "John Halley Gotway via RT" <met_help at ucar.edu>
> > > >> > wrote:
> > > >> >
> > > >> > Yunsung,
> > > >> >
> > > >> > I see that you're having trouble running MTD on a long time series
> > > >> > of data. Since the time series is so long, the first thing I'd
> > > >> > check is whether or not you've run out of memory on your machine.
> > > >> > When running on a Linux machine, I'll often start a command in one
> > > >> > window and let it run... and then in another window, run the "top"
> > > >> > command.
> > > >> >
> > > >> > "top" shows you what processes are running, and what percent of
> > > >> > the CPU and memory they're consuming. If you're running 6 years of
> > > >> > daily data = 2190 time steps, you may just not have enough memory
> > > >> > to do so! I'm not aware of us running MTD internally on longer
> > > >> > than 30 time steps, for example.
> > > >> >
> > > >> > One simple thing to test would be running it separately for each
> > > >> > year to see if the issue you're seeing goes away.
> > > >> >
> > > >> > Hope that helps.
> > > >> >
> > > >> > Thanks,
> > > >> > John Halley Gotway
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Mon, Aug 26, 2019 at 10:45 AM yunsung.hwang at usask.ca via RT <
> > > >> > met_help at ucar.edu> wrote:
> > > >> >
> > > >> > >
> > > >> > > Mon Aug 26 10:45:31 2019: Request 91683 was acted upon.
> > > >> > > Transaction: Ticket created by yunsung.hwang at usask.ca
> > > >> > > Queue: met_help
> > > >> > > Subject: MTD stopped for no reasons
> > > >> > > Owner: Nobody
> > > >> > > Requestors: yunsung.hwang at usask.ca
> > > >> > > Status: new
> > > >> > > Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=91683 >
> > > >> > >
> > > >> > >
> > > >> > > To whom it may concern:
> > > >> > >
> > > >> > > I have encountered that MTD stopped processing specific files. The
> > > >> > > files are readable and can be used to generate figures using
> > > >> > > “plot_data_plane” as follows:
> > > >> > >
> > > >> > > plot_data_plane ./stivinput/2003/stiv_2003030100.nc ./ps_stiv_20030100.ps 'name="Precipitation"; level="(0,*,*)";'
> > > >> > >
> > > >> > > DEBUG 1: Opening data file: ./stivinput/2003/stiv_2003030100.nc
> > > >> > >
> > > >> > > DEBUG 1: Creating postscript file: ./ps_stiv_20030100.ps
> > > >> > >
> > > >> > > I am running about ten years, however some of data processes with
> > > >> > > MTD, while some didn’t.
> > > >> > >
> > > >> > > mtd -single $stivinput/2003/stiv_2003022823.nc $stivinput/2003/stiv_2003030100.nc -config /home/hisnamey/scratch/MET/config/stiv_config -outdir $stivout -v 2
> > > >> > >
> > > >> > > DEBUG 2: mtd_read_data() -> processing file "/home/hisnamey/scratch/MET/stivinput/2003/stiv_2003022823.nc"
> > > >> > >
> > > >> > > DEBUG 2: mtd_read_data() -> processing file "/home/hisnamey/scratch/MET/stivinput/2003/stiv_2003030100.nc"
> > > >> > >
> > > >> > > DEBUG 2: regridding, if needed ...
> > > >> > > --> So it stopped here / could not proceed.
> > > >> > >
> > > >> > > I used cdo to extract netcdf files in time, however I am not sure
> > > >> > > what would be differences between years they passed / could not
> > > >> > > pass
> > > >> > >
> > > >> > > I hope I can hear from you if someone experiences similar
> > > >> > > situation or problem. So 2002-2008 data did not work / while
> > > >> > > 2009-2013 data worked.
> > > >> > >
> > > >> > > Best regards,
> > > >> > > Yunsung Hwang
>
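
The hang diagnosed in this thread comes down to floating-point arithmetic,
and a short sketch may make it concrete. The Python below is an illustration
only and is not MET's actual code: the function names (rescale_lon_by_loop,
rescale_lon, looks_like_fill), the max_iter guard, and the 1.0e30 threshold
are all assumptions made for this example. At a magnitude of about 9.96E36,
neighboring double-precision values are far more than 360 apart, so
subtracting 360 leaves the value unchanged and a naive add/subtract loop
makes no progress. A sanity check on the coordinate, followed by a
constant-time modulo, avoids the problem.

```python
import math

# Approximate NetCDF default fill value for floats, as quoted in the thread.
FILL_VALUE = 9.96e36

# Hypothetical threshold: any |coordinate| above this is treated as missing.
MISSING_THRESHOLD = 1.0e30


def looks_like_fill(value, threshold=MISSING_THRESHOLD):
    """Return True if a lat/lon value is NaN, infinite, or absurdly large."""
    return not math.isfinite(value) or abs(value) > threshold


def rescale_lon_by_loop(lon, max_iter=100_000):
    """Naive rescale to [-180, 180) by repeatedly adding/subtracting 360.

    max_iter is a guard added for this sketch so the example terminates;
    the function returns None if the loop gives up.
    """
    for _ in range(max_iter):
        if lon >= 180.0:
            lon -= 360.0
        elif lon < -180.0:
            lon += 360.0
        else:
            return lon
    return None


def rescale_lon(lon):
    """Rescale to [-180, 180) in constant time, rejecting missing values."""
    if looks_like_fill(lon):
        raise ValueError(f"longitude {lon!r} looks like a missing/fill value")
    return (lon + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Subtracting 360 from the fill value does not change it at all.
    print(FILL_VALUE - 360.0 == FILL_VALUE)
    # The naive loop never converges and hits the iteration guard.
    print(rescale_lon_by_loop(FILL_VALUE))
    # An ordinary longitude is handled by both approaches.
    print(rescale_lon_by_loop(190.0), rescale_lon(190.0))
```

Running a check like looks_like_fill on the lat/lon variables of each input
file before calling mtd would flag files such as the 2002-2008 set up front,
instead of leaving the job to run until it is killed.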
------------------------------------------------
More information about the Met_help
mailing list