[ncl-talk] binary file read

Thu Mar 30 17:07:04 MDT 2017

I want to add that the funny _FillValue is probably the result of writing
"-999.9" somewhere in the source code as a single-precision constant, then
converting it from single to double precision in a separate operation.  This
is a common mistake.  The widened value is NOT the same as "-999.9d" or its
equivalent in NCL, IDL, or many other languages.

ncl 0> a = todouble (-999.9)
ncl 1> b = -999.9d

ncl 2> print (sprintf ("%20.13f", a))
(0)       -999.9000244140625
ncl 3> print (sprintf ("%20.13f", b))
(0)       -999.9000000000000
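The same effect can be reproduced outside NCL. The Python sketch below (Python and its standard `struct` module are only an illustration choice, not something used in this thread) rounds -999.9 to single precision by packing it as a 4-byte float, widens it back to a double, and compares it with the plain double literal.

```python
import struct

def widen_from_single(x: float) -> float:
    """Round x to IEEE single precision, then return it as a (double) Python float."""
    # Packing as a 4-byte float ('f') keeps only 24 significant bits;
    # unpacking carries that rounded value back in double precision.
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = widen_from_single(-999.9)  # analogous to todouble(-999.9) in NCL
b = -999.9                     # analogous to -999.9d in NCL

print("%20.13f" % a)  # -999.9000244140625 (right-aligned in 20 columns)
print("%20.13f" % b)  # -999.9000000000000 (right-aligned in 20 columns)
print(a == b)         # False
```

The single-precision spacing near 999.9 is 2**-14, so the nearest representable value is 999.9000244140625, which is exactly what NCL printed.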

I agree with Dave Brown's observations and his suggested method for reading
the file.  He sets the _FillValue with the full number of significant digits
needed for type double.  Otherwise NCL would not recognize all of the missing
values.
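To see why the full digits matter, here is a small Python sketch. It assumes that the missing-value test is an exact equality comparison (which is how I understand the _FillValue check to work), and the data values in it are hypothetical stand-ins, not values from the file.

```python
import struct

# Hypothetical stored values: the missing-value code was written as a
# single-precision -999.9 and later widened to double, as described above.
fill_on_disk = struct.unpack("<f", struct.pack("<f", -999.9))[0]
data = [0.235, fill_on_disk, 5.164, fill_on_disk]

def count_missing(values, fill):
    """Count entries that compare exactly equal to the fill value."""
    return sum(1 for v in values if v == fill)

short_hits = count_missing(data, -999.9)               # short literal matches nothing
full_hits = count_missing(data, -999.9000244140625)    # full digits match both entries

# 17 significant digits are always enough to round-trip an IEEE double.
fill_text = "%.17g" % fill_on_disk
print(short_hits, full_hits, fill_text)
```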

--Dave A.

On Thu, Mar 30, 2017 at 3:55 PM, David Brown <dbrown at ucar.edu> wrote:

> My guess is that this file is a 2D 720 x 1440 array of doubles in
> little-endian format. There is no IDL-specific formatting.
> The file size is 8294400 bytes, which is exactly equal to 720 x 1440 x 8. It
> contains many NaNs (not a number), and it also has what is presumably a
> _FillValue with the value -999.9000244140625.
>
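As a quick check of the size argument, this Python sketch divides the reported file size by the usual byte sizes of the types tried in this thread and reports how many 720 x 1440 grids would fit. The element sizes are standard C/IEEE sizes; everything else comes from the numbers quoted in the thread.

```python
FILE_SIZE = 8294400      # bytes, as reported in the thread
NLAT, NLON = 720, 1440   # grid size from the IDL read_binary call

# Usual sizes in bytes of the types tried in the thread.
ELEMENT_SIZES = {"double": 8, "float": 4, "uint": 4, "ushort": 2, "ubyte": 1}

def layers_for(element_size: int) -> float:
    """How many 720 x 1440 grids of this element size fit in the file."""
    return FILE_SIZE / (NLAT * NLON * element_size)

for name, size in ELEMENT_SIZES.items():
    print(f"{name:7s} -> {layers_for(size):g} grid(s) of {NLAT} x {NLON}")

# The original request of 5 x 720 x 1440 uints needs far more bytes than the file has.
requested = 5 * NLAT * NLON * ELEMENT_SIZES["uint"]
print(requested, ">", FILE_SIZE)
```

A size that divides evenly only says the layout is possible; it cannot distinguish one 720 x 1440 grid of doubles from two grids of 4-byte values, which is why the NaN and fill-value evidence matters.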
> Here's how I would read it:
>
> setfileoption("bin","ReadByteOrder","LittleEndian")
> d1 = cbinread("viirs_meandbdi_gridded_statis2015002.dat",-1,"double")
>
> Make an array of all the non-NaN values (otherwise printMinMax will
> return NaN for both min and max):
> d1ind = ind(.not. isnan_ieee(d1))
> d1x = d1(d1ind)
> printMinMax(d1x,0)
>   output: (0)     min=-999.9000244140625   max=5.164013057067244
>
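For readers without NCL, here is a rough Python analogue of the two steps above: read a flat file of little-endian doubles, then drop NaNs before taking min and max. The sample values are hypothetical (chosen to echo the numbers printed above), and an in-memory buffer stands in for the real file.

```python
import io
import math
import struct

def read_doubles_le(stream) -> list:
    """Read a flat stream of little-endian IEEE doubles into a 1D list,
    roughly like cbinread(..., -1, "double") with LittleEndian byte order."""
    raw = stream.read()
    n = len(raw) // 8
    return list(struct.unpack(f"<{n}d", raw[: n * 8]))

# Hypothetical sample data standing in for the real file.
sample = [math.nan, 0.2350846065940295, -999.9000244140625, math.nan, 5.164013057067244]
buf = io.BytesIO(struct.pack(f"<{len(sample)}d", *sample))
d1 = read_doubles_le(buf)

# Python's min/max give order-dependent results when NaNs are present, because
# every comparison with NaN is False, so drop NaNs first, as ind(.not. isnan_ieee(d1)) does.
d1x = [v for v in d1 if not math.isnan(v)]
print(min(d1x), max(d1x))
```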
> If you scroll through the d1x values, you will see that the
> min value is clearly an outlier and is therefore most likely a fill
> value.
> So set the _FillValue:
> d1x@_FillValue = -999.9000244140625
> Now
> printMinMax(d1x,0)
>    output: (0)     min=0.2350846065940295   max=5.164013057067244
>
> Hopefully these are reasonable values.
>
> Now set the _FillValue for the original data and turn the NaNs into
> _FillValue:
> d1@_FillValue = d1x@_FillValue
> d1 = where(isnan_ieee(d1), d1@_FillValue, d1)
> ncl 73> printMinMax(d1,0)
> (0)     min=0.2350846065940295   max=5.164013057067244
>
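The NaN-to-fill step and the later count of valid values can be sketched in Python as follows. This is only an analogue of the NCL `where`/`ismissing` logic, with hypothetical sample data and the fill value identified above.

```python
import math

FILL = -999.9000244140625  # the fill value identified above

def nan_to_fill(values, fill=FILL):
    """Analogue of d1 = where(isnan_ieee(d1), d1@_FillValue, d1)."""
    return [fill if math.isnan(v) else v for v in values]

def valid(values, fill=FILL):
    """Values that are not equal to the fill value (NaNs already replaced)."""
    return [v for v in values if v != fill]

# Hypothetical sample standing in for d1.
d1 = nan_to_fill([math.nan, 0.2350846065940295, FILL, math.nan, 5.164013057067244])
good = valid(d1)

# min/max of the valid values, plus a count like num(.not. ismissing(d1)).
print(min(good), max(good), len(good))
```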
> But note that, out of the whole array, there are not very many valid values:
>
> ncl 74> printVarSummary(d1)
> Variable: d1
> Type: double
> Total Size: 8294400 bytes
>             1036800 values
> Number of Dimensions: 1
> Dimensions and sizes: [1036800]
> Coordinates:
> Number Of Attributes: 1
>   _FillValue : -999.9000244140625
>
> ncl 75> print(num(.not. ismissing(d1)))
> (0)     1820
>
> Nevertheless I believe this is the correct interpretation of this dataset.
>  -dave
>
>
> On Thu, Mar 30, 2017 at 2:29 PM, Debasish Hazra
> <debasish.hazra5 at gmail.com> wrote:
> > Thanks Gus. Mary and I both tried the "endian" options, and I am presently
> > trying setfileoption("bin","readbyteorder","bigendian"), which seems to
> > produce reasonable minimum and maximum data values. However, as Mary
> > mentioned, a large number of values are constant, which is a bit strange.
> >
> > You mentioned "double", and I think the input is "double precision
> > floating point data and it is 8 bytes".
> >
> > Thanks.
> > Debasish
> >
> > On Thu, Mar 30, 2017 at 4:06 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> >>
> >> Hi Mary, Debasish
> >>
> >> Could it be a little-endian vs. big-endian issue?
> >> I don't know IDL (I should! My boss uses it! :) )
> >> but their "read_binary" default endianness is "native" (like NCL's).
> >> I.e., the endianness of the data in the file depends on the
> >> machine on which it was created (and data_type=5 is indeed double precision).
> >>
> >> Maybe using setfileoption("bin","ReadByteOrder","BigEndian"),
> >> and trying also "LittleEndian" if not lucky with "Big"
> >> (who knows where the file was written ....),
> >> then cbinread/fbindirread with datatype "double" would help?
> >> Just a guess, and you probably tried the endianness thing already ...
> >>
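Reading with the wrong byte order does not raise an error; the same 8 bytes are simply reinterpreted as a different number. This Python sketch packs one plausible data value as a little-endian double and then decodes the bytes both ways. The value is taken from the min/max printed earlier; what the big-endian decode produces depends on the bit pattern, so it is not asserted.

```python
import struct

value = 0.2350846065940295        # a data value reported earlier in the thread
raw = struct.pack("<d", value)    # the 8 bytes as a little-endian writer stores them

as_little = struct.unpack("<d", raw)[0]  # correct byte order: original value
as_big = struct.unpack(">d", raw)[0]     # wrong byte order: same bytes, different number

print(as_little)
print(as_big)   # usually an absurd magnitude, or possibly NaN/inf
```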
> >> Best,
> >> Gus Correa
> >>
> >> On 03/30/2017 02:54 PM, Mary Haley wrote:
> >> > Hi Debasish,
> >> >
> >> > Dennis guessed that maybe the "read_binary" function in IDL was meant to
> >> > read files created by "write_binary", but I didn't see a function with
> >> > that name. However, is it possible that this is some kind of special
> >> > IDL file and not a flat C binary file?
> >> >
> >> > In your IDL script, you have:
> >> >
> >> >
> >> > fdata=read_binary('viirs_meandbdi_gridded_statis2013'+day+'.dat',data_type=5,data_dims=[1440,720])
> >> >
> >> > If you read the documentation for "read_binary", it states that
> >> > "data_type=5" is double.
> >> >
> >> > In your NCL script, you are reading the data as an unsigned integer.
> >> >
> >> > I tried reading your data as a double, but I get what looks like
> >> > nonsensical values:
> >> >
> >> >  min=-1.642556686681977e+308   max=6.633924105807938e+307
> >> >
> >> > You are right that the unsigned integer values look reasonable, but only
> >> > after you multiply them by 1e-9.
> >> >
> >> > When I look at your unsigned values, I see that 517,484 of your
> >> > values are equal to the same number, 6.3615e-05, while only 1,831
> >> > values are equal to something else.
> >> > This seems a bit suspicious to me, and is likely the source of the
> >> > problem.
> >> >
> >> > I modified your script to plot red markers where the values are all
> >> > equal to 6.3615e-05, and black markers everywhere else. Does this look
> >> > correct?
> >> >
> >> > I have a feeling that there's something more to the "read_binary"
> >> > function that we need to know in order to read the file correctly. As I
> >> > think I mentioned before: perhaps each byte of data represents something
> >> > different, and you need to use something like dim_gbits to pick off
> >> > values.
> >> >
> >> > In your IDL script, is there anything you have to do additionally to the
> >> > data before you plot it?  Can you check the IDL script to see if you are
> >> > getting a lot of values equal to the same constant value that NCL is?
> >> >
> >> > --Mary
> >> >
> >> >
> >> >
> >> > On Thu, Mar 30, 2017 at 8:36 AM, Debasish Hazra
> >> > <debasish.hazra5 at gmail.com <mailto:debasish.hazra5 at gmail.com>> wrote:
> >> >
> >> >     Mary,
> >> >
> >> >     Thanks. Taking your suggestion and reading that as 2 * 720 * 1440 and
> >> >     assuming the input is a C binary file, I am getting min=1.4e-08
> >> >     max=4.29371, which is reasonable. Attached is the new script. Any
> >> >     suggestions?
> >> >
> >> >     Debasish
> >> >
> >> >     On Wed, Mar 29, 2017 at 5:28 PM, Mary Haley <haley at ucar.edu
> >> >     <mailto:haley at ucar.edu>> wrote:
> >> >
> >> >         Hi Debasish,
> >> >
> >> >         Kevin and I took a look at this. For starters, there *is* an
> >> >         error message coming out of your script:
> >> >
> >> >         warning:cbinread: The size implied by the dimension arrays is
> >> >         greater that the size of the file.
> >> >          The default _FillValue for the specified type will be filled in.
> >> >          Note dimensions and values may not be aligned properly
> >> >
> >> >         If you look at the size of the file, it doesn't match with the
> >> >         dimensions you're requesting:
> >> >
> >> >         Size of file = 8294400 bytes
> >> >
> >> >         Size implied by dimensions = 5 * 720 * 1440 * 4 (for a uint)
> >> >         = 20736000 bytes
> >> >
> >> >         If this is truly a C binary file, it looks like it only has
> >> >         2 * 720 * 1440 * 4 bytes.
> >> >
> >> >         This doesn't really change the results, however, because you
> >> >         still get two strange looking plots.
> >> >
> >> >         We tried several different things:
> >> >
> >> >         1) reading the data as ubyte, int, and ushort
> >> >         2) reversing the array to 1440 x 720 x 2
> >> >         3) reading the data as little endian
> >> >         4) plotting the data as a simple contour plot to take out the
> >> >         map component.
> >> >
> >> >         Nothing we did produced more information about the file, or
> >> >         produced better plots.
> >> >
> >> >         Is there some documentation for this file that explains how it
> >> >         was written? For example, are you sure the "uint" type is
> >> >         correct? Are you sure the dimension sizes are correct? Why are
> >> >         the values so large? Is it possible that this is "packed" data,
> >> >         and that you need to use a function like dim_gbits to pick off
> >> >         individual bits of information?
> >> >
> >> >         If you can find a C or Fortran code that was used to create this
> >> >         file, then it should be fairly straightforward to figure out how
> >> >         to read it.
> >> >
> >> >         --Mary
> >> >
> >> >
> >> >         On Wed, Mar 29, 2017 at 2:18 PM, Debasish Hazra
> >> >         <debasish.hazra5 at gmail.com <mailto:debasish.hazra5 at gmail.com>>
> >> >         wrote:
> >> >
> >> >             Hi,
> >> >
> >> >             I am trying to read a binary file with the attached code,
> >> >             but I am getting all empty fields in the figure with no
> >> >             apparent error message. I uploaded the data file
> >> >             "viirs_meandbdi_gridded_statis2015048.dat" to the ftp server.
> >> >             Any help with this is appreciated.
> >> >
> >> >             Thanks.
> >> >             Debasish
> >> >
> >> >             On Wed, Mar 22, 2017 at 10:33 AM, Debasish Hazra
> >> >             <debasish.hazra5 at gmail.com
> >> >             <mailto:debasish.hazra5 at gmail.com>> wrote:
> >> >
> >> >                 Hi,
> >> >
> >> >                 I am trying to read a binary file with the attached
> >> >                 code, but I am getting all empty fields in the figure
> >> >                 with no apparent error message. I uploaded the data file
> >> >                 "viirs_meandbdi_gridded_statis2015002.dat" to the ftp
> >> >                 server. Any help with this is appreciated.
> >> >
> >> >                 Thanks.
> >> >                 Debasish.
>