[ncl-talk] Query about probability density functions

Fri Sep 13 09:42:06 MDT 2024

Giorgio, can you briefly explain what are the basic numerical differences
between the NCL and Python/IDL versions?  Please explain mathematically,
not with code.

On Fri, Sep 13, 2024 at 9:25 AM Giorgio Graffino <g.graffino at tim.it> wrote:

> I've found a way to normalize the PDF and make it like Python and IDL.
>
> Here is the modified function for reference. I removed the call to the
> Fortran subroutine, so now the function it's a bit slower than the original
> version, and made all values float instead of double.
>
> undef("pdfx_norm")
> function pdfx_norm(x:numeric, nbin[1]:integer, opt:logical)
> local nGood, nbins, xMin, xMax, mnmxint, xSpace \
>     ,bin, pdf, nTot, XMIN, XMAX, nLoOut, nHiOut
> begin
>
>  if (nbin.le.2) then
>      nbins = 25             ; default
>  else
>      nbins = nbin
>  end if
>
>  nGood = num(.not.ismissing(x))
>  if (nGood.lt.3) then
>      print("pdfx_norm: nGood="+nGood+" : Need more non-missing points")
>      pdf = new( nbins, "double", getFillValue(x))
>      return( pdf )
>  end if
>
>  XMIN = min(x)*1.
>  XMAX = max(x)*1.
>
>  xMin = 0.
>  xMax = 0.
>
>  if (opt .and. isatt(opt,"bin_min")) then
>      xMin = opt at bin_min                    ; user set
>  else
>      xMin = XMIN                           ; calculate
>  end if
>
>  if (opt .and. isatt(opt,"bin_max")) then
>      xMax = opt at bin_max                    ; user set
>  else
>      xMax = XMAX                           ; calculate
>  end if
>
>  if (opt .and. isatt(opt,"bin_nice")) then ; nice xMin, xMax
>      outside = False
>      if (isatt(opt,"bin_nice_outside")) then
>          outside = opt at bin_nice_outside
>      end if
>      mnmxint = nice_mnmxintvl( XMIN, XMAX, nbins, outside)
>      xMin    = mnmxint(0)
>      xMax    = mnmxint(1)
>      xSpace  = mnmxint(2)
>    ;;nbins   = round( (xMax-xMin)/xSpace , 3)
>  end if
>
> ;;dbin        = (xMax-xMin)/(nbins-1)          ; 5.2.0
>  dbin        = (xMax-xMin)/nbins
>  binBound    = xMin + ispan(0,nbins,1)*dbin
>  binBound(nbins) = xMax                       ; avoid roundoff
>
>  binCenter   = (binBound(0:nbins-1) + binBound(1:nbins))*0.5
>
>  binBoundMin = binBound(0)
>  binBoundMax = binBound(nbins)
>
>  pdf         = new( nbins,typeof(x), getFillValue(x))
>  pdf         = 0.
>
> do nb=0,nbins-1
>   pdf(nb) = num( x.ge.binBound(nb) .and. x.lt.binBound(nb+1) )
>   if (nb.eq.(nbins-1)) then                 ; last bin
>       pdf(nb) = pdf(nb) + num( x.eq.binBound(nb+1) )   ; include last bound
>   end if
> end do
> ;;----
>
>  pdf!0    = "x"
>  pdf&x    = binCenter
>                                       ; max possible in data
>  nMax     = num(x.ge.XMIN .and. x.le.XMAX)
>                                       ; actual number used
>  nUse     = num(x.ge.binBoundMin .and. x.le.binBoundMax)
>
>  nLoOut   = num(x.lt.binBoundMin)     ; number outliers
>  nHiOut   = num(x.gt.binBoundMax)
>
> ;  pdf      = pdf/nMax*1.e2                      ; original
> percent-frequency PDF
>  pdf      = pdf/sum(pdf)/dbin                 ; modified normalised PDF
>
>  pdf at bin_center     = binCenter
>  pdf at bin_bounds     = binBound
>  pdf at bin_bound_min  = binBoundMin
>  pdf at bin_bound_max  = binBoundMax
>  pdf at bin_spacing    = dbin            ; binBound(2)-binBound(1)
>  pdf at nbins          = nbins
>  pdf at nMax           = nMax
>  pdf at nUse           = nUse
>  if (nLoOut.gt.0 .or. nHiOut.gt.0) then
>      pdf at nLoOut     = nLoOut
>      pdf at nHiOut     = nHiOut
>  end if
>
>  pdf at long_name      = "PDF"
>  if (isatt(x,"long_name")) then
>      pdf at long_name  = "PDF: "+x at long_name
>  end if
>
>  return( pdf )
> end
>
>
> ------ Messaggio Originale ------
> Da: ncl-talk at mailman.ucar.edu
> A: dave.allured at noaa.gov Cc: ncl-talk at ucar.edu
> Inviato: mercoledì, settembre 11º 2024, 05:35 PM
> Oggetto: Re: [ncl-talk] Query about probability density functions
>
> Thanks Dave,
>
> The Fortran code looks straightforward. The last bit is for converting the
> pdf into frequency ("if flag is set", according to the comment), but the
> same code is also present in the contributed.ncl code. I'm assuming the
> flag isn't set, otherwise the conversion would be done twice.
>
> Unfortunately I'm not getting any closer to solve this problem. Simply
> dividing the NCL pdfs by 100 make them much smaller than Python or IDL
> pdfs. I don't know if anyone ever encountered this issue before.
>
> Cheers,
>
> Giorgio
>
>
> ------ Messaggio Originale ------
> Da: dave.allured at noaa.gov
> A: g.graffino at tim.it Cc: ncl-talk at ucar.edu
> Inviato: mercoledì, settembre 11º 2024, 04:58 PM
> Oggetto: Re: [ncl-talk] Query about probability density functions
> Giorgio, the fortran routine is at *ni/src/lib/nfpfort/xy1pdf77.f* in the
> NCL v6.6.2 source code.
>
>
> On Wed, Sep 11, 2024 at 8:01 AM Giorgio Graffino via ncl-talk <
> ncl-talk at mailman.ucar.edu> wrote:
>
>
>> Hi Dennis,
>>
>> Thank you for pointing out that detail.
>>
>> I've found the code of the pdfx function in here (
>> https://github.com/NCAR/ncl/blob/develop/ni/src/examples/gsun/contributed.ncl)
>> and the frequency is computed as pdf = 100d0*pdf/nMax. I tried to normalize
>> the pdf by multiplying the frequency pdf by the number of data and divide
>> by 100, but the results is very different depending on the number of data
>> to compute the pdf. Also, the results are different from similar
>> computations using Python, especially with large number of data. Without
>> knowing what the Fortran routine does (and I can't find it), I don't know
>> how to normalize NCL to match Python.
>>
>> Any suggestions?
>>
>> Cheers,
>>
>> Giorgio
>>
>>
>>
>> ------ Messaggio Originale ------
>> Da: shea at ucar.edu
>> A: g.graffino at tim.it Cc: ncl-talk at ucar.edu
>> Inviato: lunedì, settembre 9º 2024, 08:15 PM
>> Oggetto: Re: [ncl-talk] Query about probability density functions
>>
>> From the *pdfx*
>> <https://www.ncl.ucar.edu/Document/Functions/Contributed/pdfx.shtml>
>> documentation:  "The PDF units are percent [%]."
>>
>> Divide by 100.
>>
>>
>>
>> On Mon, Sep 9, 2024 at 10:26 AM Giorgio Graffino via ncl-talk <
>> ncl-talk at mailman.ucar.edu> wrote:
>>
>>
>>> Hello NCL experts,
>>>
>>> I'm using pdfx to compute probability density functions (
>>> https://www.ncl.ucar.edu/Document/Functions/Contributed/pdfx.shtml). I
>>> noticed that the cumulative density functions sum up to 100 in NCL, instead
>>> of 1 as per definition. Why is that? How can I normalize them back to 1?
>>>
>>>
>>> Thanks for your help.
>>>
>>> Cheers,
>>>
>>> Giorgio
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.ucar.edu/pipermail/ncl-talk/attachments/20240913/6c2f2e92/attachment.htm>