[GO-ESSP] gridded data management systems

Jonathan Callahan Jonathan.S.Callahan at noaa.gov
Mon Nov 29 15:52:55 MST 2004


Jon,

Before you embark upon a grand software adventure to solve the specific
problems you have identified, I think you should carefully review the work of
other groups to make sure you don't end up just recreating an in house 
version of what other projects are already committed to.  Projects that 
come immediately to mind are NetCDF, OPeNDAP, OPeNDAP aggregation 
servers, our own FDS, GDS from COLA, CDAT from PCMDI, Benno's Ingrid, etc.

I bring this up after having a look at your data portal page 
(http://www.nerc-essc.ac.uk/godiva).  On this page you have borrowed 
heavily from the LAS interface, improving the dataset selection with an 
interactive menu on the left-hand side.  All of this is well and good;
folks are welcome to borrow bits and pieces of code as they see fit.
But in the godiva case I will note that by largely reinventing the data 
access interface you have missed out on the incremental improvements 
(e.g. a non-applet clickable map in LAS) that gradually accumulate in 
long term funded and highly focused projects like LAS or NetCDF or OPeNDAP.

The functionality you envision for your data system seems quite daunting 
to me.  I would imagine it to be a multi-year job for a team of 
programmers to create such a system from scratch and have it actually 
work better than current systems.  And in those 5 years the current 
systems will have gotten incrementally better as well.  I imagine you 
would have much better success by gluing together existing pieces rather 
than expecting a single application to do it for you.  As Steve points 
out, there are already packages that can do items 1-4, and it seems to me that
the community would benefit most by seeing how a single institution can 
create an excellent site by gluing bits and pieces together.

Have a look at the overview of CDAT to see how they have approached a 
similar problem:

    http://esg.llnl.gov/cdat/overview.html



All the best whichever path you take,


-- Jon




Steve Hankin wrote:

> Hi Jon,
>
> When you refer to "standard flat files" are you including formats like 
> netCDF and HDF5 under that title?  This is often a source of 
> terminology confusion as "flat files" sometimes refers to "anything 
> but a database".  Others regard n-dimensional, multi-variate data 
> standards like netCDF and HDF to be alternatives to "flat" IEEE files.
>
> The question that you pose is essentially to weigh the pros and cons 
> of managing your data with a commercial database that has been 
> enhanced to handle grids, or to handle your data with netCDF (which in 
> the next version will merge with HDF5 to handle compression, tiles, 
> etc.) and the free netCDF utilities.  (Presumably from-scratch 
> development with IEEE binary files is not the way to go.)  You 
> mentioned some down-sides to the commercial software route (cost, 
> "proprietariness" of software, dependence on a single supplier,...).   
> Are the advantages of the database approach sufficient to outweigh 
> these costs?   You have also not mentioned network access to the 
> data.  Is it a requirement for the data to be OPeNDAP accessible?
> Or alternatively, is access from enterprise GIS systems at the center 
> of your bullseye?
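On the OPeNDAP question: a DAP2 (OPeNDAP) server lets a client request a subset of a variable by appending a constraint expression to the dataset URL. The expression has one `[start:stride:stop]` hyperslab per dimension, and `stop` is inclusive. Below is a minimal sketch of building such a request. The server URL, dataset name and variable name `sst` are hypothetical, and so is the helper function name.

```python
def dap2_subset_url(dataset_url, var, slices):
    """Build a DAP2 data request for a hyperslab of one variable.

    dataset_url: the dataset's OPeNDAP URL (hypothetical in the example below).
    slices: one (start, stride, stop) tuple per dimension, in the variable's
    dimension order.  DAP2 treats stop as inclusive, unlike Python slicing.
    """
    parts = []
    for start, stride, stop in slices:
        if stride < 1 or start < 0 or stop < start:
            raise ValueError(f"bad hyperslab {start}:{stride}:{stop}")
        parts.append(f"[{start}:{stride}:{stop}]")
    # The ".dods" suffix requests the binary DAP2 data response.
    return f"{dataset_url}.dods?{var}{''.join(parts)}"


# Months 0-11, every other latitude row from 10 to 20, longitudes 30-40.
url = dap2_subset_url("http://example.org/opendap/sst_monthly.nc", "sst",
                      [(0, 1, 11), (10, 2, 20), (30, 1, 40)])
print(url)  # http://example.org/opendap/sst_monthly.nc.dods?sst[0:1:11][10:2:20][30:1:40]
```

Because the subsetting happens on the server, only the requested hyperslab crosses the network. This is much of the appeal of OPeNDAP access for large gridded archives.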
>
>     * Items 1-2 are trivial for either system.  Comparative
>       performance ... do you have any data?  The Barrodale product is
>       new and one-of-a-kind.  It would be interesting to see some
>       benchmarks comparing it to netCDF and HDF5.
>     * Items 3-4 can be handled with the new FDS (Ferret Data Server)
>       and probably the GDS server, as well.  Custom code may be
>       required depending upon the list of projections that is desired,
>       but these are open environments, where this can be added.   Items
>       3-4 capabilities are also available and presumably well
>       supported if your database is embedded in an enterprise GIS
>       framework.
>     * Item 5 is probably better handled in a database environment,
>       though it can also be handled (with some effort -- in various
>       ways) in a Web service environment based on OPeNDAP.
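Items 3-4 involve delivering data on a grid of the user's choice, with interpolation done on the fly. In the simplest case this reduces to interpolating from the stored grid to each target point. Below is a minimal sketch under two assumptions: both grids are regular latitude/longitude grids with ascending coordinates, and bilinear interpolation is acceptable. Rotated grids or map projections would first need a coordinate transform, which is omitted here. Bilinear interpolation is only one option (it is not conservative, for example), and the function names are hypothetical.

```python
from bisect import bisect_right


def bilinear(src_lats, src_lons, field, lat, lon):
    """Interpolate field[i][j], given on ascending src_lats x src_lons, at (lat, lon)."""
    if not (src_lats[0] <= lat <= src_lats[-1] and src_lons[0] <= lon <= src_lons[-1]):
        raise ValueError("target point lies outside the source grid")
    # Lower-left corner of the enclosing cell; clamp so the last row/column works.
    i = min(bisect_right(src_lats, lat) - 1, len(src_lats) - 2)
    j = min(bisect_right(src_lons, lon) - 1, len(src_lons) - 2)
    wy = (lat - src_lats[i]) / (src_lats[i + 1] - src_lats[i])
    wx = (lon - src_lons[j]) / (src_lons[j + 1] - src_lons[j])
    return ((1 - wy) * (1 - wx) * field[i][j]
            + (1 - wy) * wx * field[i][j + 1]
            + wy * (1 - wx) * field[i + 1][j]
            + wy * wx * field[i + 1][j + 1])


def regrid(src_lats, src_lons, field, dst_lats, dst_lons):
    """Resample field onto the user's (dst_lats x dst_lons) grid."""
    return [[bilinear(src_lats, src_lons, field, la, lo) for lo in dst_lons]
            for la in dst_lats]


# 10-degree source grid; the user asks for a 5-degree grid over the same area.
coarse = [[0.0, 1.0], [2.0, 3.0]]
fine = regrid([0.0, 10.0], [0.0, 10.0], coarse, [0.0, 5.0, 10.0], [0.0, 5.0, 10.0])
print(fine)  # [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
```

Doing this per request, rather than storing every target grid, is what "on the fly" implies. The trade-off is extra computation on each extraction.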
>
> Just bouncing around the ideas.  This community will be interested to 
> hear what further you learn.
>
>     - steve
>
> ====================================
>
> Jon Blower wrote:
>
>> Hi all,
>>
>> As some of you may know, we at the Reading e-Science Centre have been
>> investigating some new ways to store and manage data from models of the
>> oceans and atmosphere.  We have been looking at storing data in databases,
>> rather than standard flat-file systems, and have over the last few months
>> been evaluating IBM's Informix database with Barrodale Computing Services'
>> Grid DataBlade plug-in (see http://www.resc.rdg.ac.uk/projects.php for more
>> details).  Eventually this might form the back-end to our own data portal
>> page (http://www.nerc-essc.ac.uk/godiva).
>>
>> We have found good and bad points about this system and are now wondering
>> how to take things forward.  I have been considering the feasibility of
>> writing (essentially from scratch) an intelligent storage/management
>> application for gridded geospatial data.  The key features of this system
>> would include:
>>
>> 1) Data would be stored in a single format but could be extracted in a
>>    variety of formats
>> 2) Data could be sliced and subsetted in all possible ways (e.g. extraction
>>    of 1-D timeseries, 2-D areas, 3-D volumes/animations, 4-D data blocks)
>>    and extracted at different spatial and temporal resolutions
>> 3) Data could be stored on the original grid (including rotated grids) but
>>    extracted on the grid of the user's choice
>> 4) The necessary projection and interpolation would happen on the fly
>> 5) The system would allow complex queries to be made (e.g. "Give me all
>>    the times and locations at which the sea surface temperature was
>>    greater than 20 degC in the North Atlantic in June 2003")
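Items 2 and 5 in the list above can be illustrated on a small in-memory (time, lat, lon) array. This is a minimal sketch, not a design. Subsetting is done with index ranges, where a step greater than 1 gives coarser resolution by plain decimation. The query is a brute-force scan of the field. The toy data, the function names and the latitude/longitude box standing in for "the North Atlantic" are all hypothetical. A real system would use a proper region mask and an index rather than a full scan.

```python
from datetime import date


def subset(field, t, y, x):
    """Extract field[k][i][j] for k in t, i in y, j in x (Python range objects).

    A step > 1 in a range gives coarser resolution by simple decimation;
    averaging would be the gentler alternative.
    """
    return [[[field[k][i][j] for j in x] for i in y] for k in t]


def exceedances(field, times, lats, lons, threshold, lat_box, lon_box, when):
    """Return (time, lat, lon) for every point above threshold in the box.

    lat_box and lon_box are inclusive (min, max) pairs in degrees, and
    when is a predicate that selects the time steps of interest.
    """
    hits = []
    for k, t in enumerate(times):
        if not when(t):
            continue
        for i, lat in enumerate(lats):
            if not lat_box[0] <= lat <= lat_box[1]:
                continue
            for j, lon in enumerate(lons):
                if lon_box[0] <= lon <= lon_box[1] and field[k][i][j] > threshold:
                    hits.append((t, lat, lon))
    return hits


# Toy daily SST field (degC) on a 3x3 grid, identical on every day.
times = [date(2003, 5, 31), date(2003, 6, 1), date(2003, 6, 2)]
lats = [10.0, 30.0, 50.0]
lons = [-60.0, -30.0, 10.0]
field = [[[26, 24, 27], [21, 19, 22], [12, 14, 15]] for _ in times]

timeseries = subset(field, range(3), range(1, 2), range(0, 1))  # one grid point
june = exceedances(field, times, lats, lons, 20.0,
                   lat_box=(0.0, 65.0), lon_box=(-80.0, 0.0),
                   when=lambda d: (d.year, d.month) == (2003, 6))
print(len(june))  # 6
```

The query cost grows with the whole field, which is why item 5 is the one Steve suggests suits a database environment better.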
>>
>> The systems we have looked at so far get us part, but not all, of the way
>> there.  Furthermore, the system currently under evaluation (Informix/Grid
>> DataBlade) is closed-source, commercial software so we can't modify it
>> ourselves.  However, such database-based systems have some key advantages
>> over standard flat files, notably intelligent tiling and caching, giving
>> very fast retrieval of data.
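Tiling means storing the grid as fixed-size blocks, so that a request reads only the blocks it overlaps. Caching means keeping recently used blocks in memory. Below is a minimal sketch of both ideas. The tile shape, the `fetch` function and the names are hypothetical. The sketch is not meant to describe how the Grid DataBlade (or HDF5-style chunking) works internally.

```python
from collections import OrderedDict
from itertools import product

TILE_SHAPE = (8, 32, 32)  # hypothetical (time, lat, lon) tile size


def tiles_touched(start, stop, tile_shape=TILE_SHAPE):
    """Indices of the tiles that overlap the index box start <= i < stop."""
    per_axis = [range(s // n, (e - 1) // n + 1)
                for s, e, n in zip(start, stop, tile_shape)]
    return list(product(*per_axis))


class TileCache:
    """Least-recently-used cache in front of a slow tile fetch function."""

    def __init__(self, fetch, capacity=256):
        self.fetch = fetch          # e.g. reads and decompresses one tile
        self.capacity = capacity
        self.misses = 0
        self._tiles = OrderedDict()

    def get(self, index):
        if index in self._tiles:
            self._tiles.move_to_end(index)      # now most recently used
            return self._tiles[index]
        self.misses += 1
        tile = self.fetch(index)
        self._tiles[index] = tile
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)     # evict least recently used
        return tile


# A 20-step timeseries at one grid point touches only three small tiles.
print(tiles_touched((0, 5, 5), (20, 6, 6)))  # [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
```

The tile shape is the main design choice. Long, thin tiles favour timeseries extraction, while wide, flat tiles favour map slices.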
>>
>> I was wondering whether this community would welcome an effort to create an
>> open-source data management/storage system for geospatial data, perhaps as a
>> plug-in to an open-source DBMS such as PostgreSQL.  I haven't found an
>> existing project that answers our requirements, but please let me know if
>> you know of anything (some packages seem to deal with geospatial data, but
>> are not designed for _gridded_ data).  It seems that this could be of
>> benefit to the GO-ESSP community, considering that any Earth System
>> Portal must be backed by some kind of data store! ;-)
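As a rough illustration of a database-backed layout, tiles can live in an ordinary table keyed by variable and tile index, with each tile's values packed into a binary column. This sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table and function names are hypothetical. A plug-in of the kind proposed would presumably add server-side subsetting and query functions on top of storage like this, rather than returning whole tiles to the client.

```python
import sqlite3
import struct


def open_store(path=":memory:"):
    """Open (or create) a tile store; one row per (variable, tile index)."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS tile (
                      var  TEXT    NOT NULL,
                      t    INTEGER NOT NULL,
                      ty   INTEGER NOT NULL,
                      tx   INTEGER NOT NULL,
                      data BLOB    NOT NULL,
                      PRIMARY KEY (var, t, ty, tx))""")
    return db


def put_tile(db, var, index, values):
    """Store a flat sequence of values as little-endian float32 (values are rounded)."""
    blob = struct.pack(f"<{len(values)}f", *values)
    db.execute("INSERT OR REPLACE INTO tile VALUES (?, ?, ?, ?, ?)",
               (var, *index, blob))


def get_tile(db, var, index):
    """Return a tile's values as a list of floats, or None if absent."""
    row = db.execute(
        "SELECT data FROM tile WHERE var = ? AND t = ? AND ty = ? AND tx = ?",
        (var, *index)).fetchone()
    if row is None:
        return None
    blob = row[0]
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


db = open_store()
put_tile(db, "sst", (0, 1, 2), [20.5, 21.25, -1.0])
print(get_tile(db, "sst", (0, 1, 2)))  # [20.5, 21.25, -1.0]
```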
>>
>> This has been rather a long post, sorry!  Any suggestions or feedback would
>> be very much appreciated.
>>
>> Best wishes,
>> Jon
>>
>> --------------------------------------------------------------
>> Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
>> Technical Director         Tel: +44 118 378 8741 (ESSC)
>> Reading e-Science Centre   Fax: +44 118 378 6413
>> ESSC                       Email: jdb at mail.nerc-essc.ac.uk
>> University of Reading
>> 3 Earley Gate
>> Reading RG6 6AL, UK
>> --------------------------------------------------------------
>>
>> _______________________________________________
>> GO-ESSP mailing list
>> GO-ESSP at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp
>>
> -- 
>
> Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
> 7600 Sand Point Way NE, Seattle, WA 98115-0070
> ph. (206) 526-6080, FAX (206) 526-6744
>  