[Met_help] [rt.rap.ucar.edu #98327] History for Use of block size in series_analysis and netcdf questions

John Halley Gotway via RT met_help at ucar.edu
Wed Feb 10 09:07:07 MST 2021


----------------------------------------------------------------
  Initial Request
----------------------------------------------------------------

Hi John et al.

I have been running the series_analysis tool for one of our projects and have been somewhat intrigued by the block size in the config file.

I'm not sure I've exhausted the manual entirely on the optimal use of this variable but I certainly noticed a change in behaviour by changing it.

In the end I went with block_size = 1048576 and that made the warnings disappear. Is there a default size? Is there a way to work out the optimal size, with knowledge of the CPU and memory you have available for example?

There are still other issues with respect to netcdf files, which we can hopefully shed some light on in the planned discussion on Jan 26th. A trivial one is shown in the example below: the warning about "bnds" not found. I really don't know whether that warning can be safely ignored... or not.

Also, the files used in this work have multiple lead times of the same variable in the same file together. Using the (0,*,*), (1,*,*)... etc way of referring to the time slice in the file can be precarious... it assumes that the fields are in sequential order with nothing missing, and neither of those things may be true. We know for example that data pulled back from tape can be in random order. However, from an HPC and mass storage perspective having each time slice in a separate file is not tenable, whilst sorting files prior to MET usage could be quite expensive. We need to provide some important feedback to the LFRic metadata and data management people as to what/how the output files are to be structured so that we don't end up having to do lots of very expensive I/O refactoring of model output to make it non-ambiguous when used with MET. Maybe METplus is already clever enough to not just look at the array index but check that e.g. (0,*,*) = t+6 and (1,*,*) = t+7 etc or be able to search for t+6 in all available time slices and find the appropriate x in (x,*,*)?

In this example I used a list length 1, i.e. only one forecast/ob file pair to see what it did and computed only simple categorical stats. It's a 1200 x 1200 grid.

ctc    = ["TOTAL", "FY_OY", "FY_ON", "FN_OY", "FN_ON"];
cts    = [ "FBIAS", "PODY", "SEDI"];

vld497:/net/home/h02/frmm/MET/scripts > time ./run_series_analysis.ltg

DEBUG 1: Default Config File: /data/users/cfrd/MET_requirements/MET9.0/share/met/config/SeriesAnalysisConfig_default
DEBUG 1: User Config File: /home/h02/frmm/MET/config/SeriesAnalysisConfig.ltg
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
GSL_RNG_TYPE=mt19937
GSL_RNG_SEED=18446744073133882975
DEBUG 1: Length of configuration "fcst.field" = 1
DEBUG 1: Length of configuration "obs.field"  = 1
DEBUG 1: Length of forecast file list         = 1
DEBUG 1: Length of observation file list      = 1
DEBUG 1: The "fcst.field" and "obs.field" configuration entries and the "-fcst" and "-obs" command line options all have length one.
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
DEBUG 1: Reading stat column descriptions: /data/users/cfrd/MET_requirements/MET9.0/share/met/table_files/stat_column_description.txt
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
WARNING:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!
WARNING:
DEBUG 1: Output file: /scratch/frmm/MET/output/series_analysis/India/lightning/2019060100_2019060112.nc
real        6m16.74s
user       6m8.38s
sys          0m2.46s

Regards
Marion
--
Dr Marion Mittermaier     Manager: Model diagnostics and novel verification

Met Office   FitzRoy Road   Exeter   EX1 3PB   United Kingdom
Tel:  +44 (0) 330 135 1604
E-mail: marion.mittermaier at metoffice.gov.uk  http://www.metoffice.gov.uk

http://www.metoffice.gov.uk/research/people/marion-mittermaier

Associate Editor for 2021: Monthly Weather Review<https://www.ametsoc.org/index.cfm/ams/publications/journals/monthly-weather-review/>




----------------------------------------------------------------
  Complete Ticket History
----------------------------------------------------------------

Subject: Use of block size in series_analysis and netcdf questions
From: John Halley Gotway
Time: Thu Jan 21 16:51:32 2021

Hi Marion,

We never found a good way to automatically optimize the setting of the
"block_size" configuration option for series-analysis. There are just so
many situations that might exist that trying to pick a good number seemed
like a fool's errand. We could potentially figure out the available
memory and size it based on that, along with the number of data fields
that need to be tracked. But then that process has no idea about other
processes running on the machine. If the user runs 10 concurrent
instances of series-analysis, the logic and the memory usage could get
dicey very quickly.
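
As a rough illustration of the sizing idea described above, here is a
minimal Python sketch. It is not MET code. It assumes that each pass
through the data handles at most block_size grid points, and the memory
rule (bytes per value, number of fields, number of series entries) is a
hypothetical stand-in for whatever series-analysis actually tracks.

```python
import math


def passes_required(nx, ny, block_size):
    """Passes through the data, assuming block_size grid points per pass."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return math.ceil(nx * ny / block_size)


def block_size_for_budget(mem_bytes, n_fields, n_times, bytes_per_value=8):
    """Largest block of grid points whose series fit in mem_bytes.

    Hypothetical rule: every grid point holds n_fields * n_times values
    of bytes_per_value bytes each. Other processes are not considered.
    """
    per_point = n_fields * n_times * bytes_per_value
    return max(1, mem_bytes // per_point)


if __name__ == "__main__":
    # 1200 x 1200 grid, as in the original request.
    print(passes_required(1200, 1200, 1200 * 1200))
    # Forecast and observation fields, 12 series entries, 1 GiB budget.
    print(block_size_for_budget(2**30, n_fields=2, n_times=12))
```

If the per-pass assumption holds, a 1200 x 1200 grid would need
block_size of at least 1200 * 1200 = 1440000 to be handled in a single
pass. When several instances run concurrently, the memory budget passed
in would have to be divided between them by hand.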

We did add the warning message about having to loop through the data
multiple times. If there's some other logic we should consider, please
let us know.

Regarding the setting of the level, you should be able to replace
"(0,*,*)" with "(20210120_12,*,*)" instead (or whatever your requested
date actually is). If that timestamp convention does NOT work with your
data, we should fix it so that it does!
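
To make the difference between index-based and timestamp-based selection
concrete, here is a small Python sketch. It is not how MET implements
the lookup; it only shows the idea of searching a time axis for a
requested valid time so that storage order and missing slices do not
matter. The function name and the example time axis are hypothetical.

```python
from datetime import datetime


def find_time_index(valid_times, wanted):
    """Return the index of the slice valid at `wanted` (YYYYMMDD_HH).

    The search ignores storage order and fails loudly if the requested
    time is missing or appears more than once.
    """
    target = datetime.strptime(wanted, "%Y%m%d_%H")
    matches = [i for i, t in enumerate(valid_times) if t == target]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one slice valid at {wanted}, "
            f"found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    # Hypothetical time axis stored out of order.
    axis = [
        datetime(2021, 1, 20, 18),
        datetime(2021, 1, 20, 6),
        datetime(2021, 1, 20, 12),
    ]
    print(find_time_index(axis, "20210120_12"))
```

With a lookup of this kind, a slice that is absent raises an error
instead of silently returning the wrong lead time.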

That being said, I do think that specifying each individual timestamp
could be very cumbersome. Once we clearly define the logic that's
needed, it seems like we could support a range of timestamps instead of
one-by-one. For example, something like:
   level="(20210120_00-20210121_00,*,*)";
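
Until a range syntax like that exists, the individual level strings
could be generated outside of MET. The Python sketch below is one way to
do it. The helper name, the example dates, and the fixed step in hours
are assumptions for illustration only; the "(YYYYMMDD_HH,*,*)" form
follows the timestamp convention mentioned above.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d_%H"


def level_strings(start, end, step_hours):
    """List one "(YYYYMMDD_HH,*,*)" string per step from start to end.

    Both end points are included when they fall on the step.
    """
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    t = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    step = timedelta(hours=step_hours)
    levels = []
    while t <= stop:
        levels.append(f"({t.strftime(FMT)},*,*)")
        t += step
    return levels


if __name__ == "__main__":
    # Illustrative one-day range at a 6-hour step.
    for level in level_strings("20210120_00", "20210121_00", 6):
        print(f'level="{level}";')
```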

I am not familiar with this warning message:
WARNING: get_nc_var(NcFile) --> The variable "bnds" does not exist!

Can you please send a sample file that we can use to replicate this
behavior? I'll pass it off to Howard Soh to investigate it more
closely.

Thanks,
John

On Thu, Jan 21, 2021 at 11:04 AM Minna Win via RT <met_help at ucar.edu>
wrote:

>
> Thu Jan 21 11:03:31 2021: Request 98327 was acted upon.
> Transaction: Given to johnhg (John Halley Gotway) by minnawin
>        Queue: met_help
>      Subject: Use of block size in series_analysis and netcdf questions
>        Owner: johnhg
>   Requestors: marion.mittermaier at metoffice.gov.uk
>       Status: new
>  Ticket <URL: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=98327 >
>
>
> This transaction appears to have no content
>

------------------------------------------------

