<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2800.1476" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>Hi
Steve,</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>Thanks
very much for this. We'll certainly check out the Ferret Data Server, it
looks very interesting. To answer your questions:</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>I was
referring to "flat files" to mean "anything but a database". Of course, I
appreciate that netCDF and HDF are rather more sophisticated than "raw" IEEE
files, and we are currently working with formats of this kind. As I see
it, the advantages that databases might have over
netCDF/HDF and their APIs are as follows (I'm happy to be corrected on this, I
don't know all the details):</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>1) Caching of frequently-used data chunks, improving performance in
many cases (of course, operating systems and hard disks also have their own
caching strategies outside of the database)</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>2) Automatic splitting of the data into tiles so that data subsets
can be retrieved efficiently (did you say that HDF does tiling
too?)</FONT></SPAN></DIV>
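<DIV><FONT face=Arial color=#0000ff size=2>To make the tiling-plus-caching idea
concrete, here is a minimal sketch in plain Python (the tile size, the key
tuples and the load callback are all hypothetical, not taken from any
particular system): a store splits each axis into fixed-size tiles, works out
which tiles a requested slice touches, and keeps recently used tiles in a
small LRU cache.</FONT></DIV>

```python
from collections import OrderedDict

TILE = 64  # hypothetical tile edge length along one axis


def tiles_for_slice(start, stop):
    """Tile indices along one axis covered by the half-open range [start, stop)."""
    return range(start // TILE, (stop - 1) // TILE + 1)


class TileCache:
    """Tiny LRU cache of decoded tiles, keyed by a tile-index tuple."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._tiles = OrderedDict()

    def get(self, key, load):
        if key in self._tiles:
            self._tiles.move_to_end(key)       # mark as most recently used
            return self._tiles[key]
        tile = load(key)                       # cache miss: read tile from disk
        self._tiles[key] = tile
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)    # evict least recently used tile
        return tile
```

<DIV><FONT face=Arial color=#0000ff size=2>With TILE = 64, a request for points
100-199 touches only tiles 1-3, so repeated nearby reads hit the cache rather
than the disk.</FONT></DIV>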
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>3) Intelligent, automatic storage of data at different
resolutions ("pyramid" scheme). I would expect databases using such a
scheme to be much faster than using the netCDF/HDF APIs at retrieving data at
different resolutions. We have found extracting, say, every third data
point from an HDF5 file to be very slow.</FONT></SPAN></DIV>
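<DIV><FONT face=Arial color=#0000ff size=2>The "pyramid" idea can be sketched
in a few lines of plain Python (subsampling stands in for the block-averaging
a real store would do): coarser overview levels are precomputed once, and a
decimated read is then served from the matching level instead of striding
through the full-resolution data.</FONT></DIV>

```python
def build_pyramid(data, levels=3):
    """Precompute coarser overviews; each level keeps every second point.
    (A real store would average blocks; subsampling keeps the sketch short.)"""
    pyramid = [list(data)]
    for _ in range(levels):
        pyramid.append(pyramid[-1][::2])
    return pyramid


def read_decimated(pyramid, stride):
    """Serve data[::stride] from a precomputed level when the stride is a
    power of two; other strides would need resampling from a finer level."""
    level = stride.bit_length() - 1
    assert stride == 2 ** level, "sketch handles power-of-two strides only"
    level = min(level, len(pyramid) - 1)
    return pyramid[level][::stride // 2 ** level]
```

<DIV><FONT face=Arial color=#0000ff size=2>A stride-4 read then touches only a
quarter of the bytes, which is exactly the saving we fail to see when striding
through a full-resolution HDF5 dataset.</FONT></DIV>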
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>4) Better performance for slicing data in all four
dimensions. In our experience, data sets based on netCDF/HDF tend to have
one file per timestep (this is how they are typically output from models).
This means that to extract a timeseries of 100 points, we need to open 100
files, taking one point out of each. The equivalent operation is much
faster with the database.</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>5) Ability to store vector data (e.g. observations at points or
along ship tracks) alongside raster (gridded) data.</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004></SPAN><FONT face=Arial><FONT
color=#0000ff><FONT size=2>T<SPAN class=687144910-25112004>he main downsides of
most database-centred approaches are that the systems are generally not
open-source and often expensive.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=687144910-25112004></SPAN></FONT></FONT></FONT> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>We
haven't yet done any formal benchmarking of our database's performance, but in
informal use it certainly seems to perform well, particularly
for the cases mentioned above. As far as network access goes, our
intention would be to provide access via a Web Services interface, and/or
OPeNDAP.</FONT></SPAN></DIV>
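<DIV><FONT face=Arial color=#0000ff size=2>For the OPeNDAP route, subsetting
is expressed in the URL itself via a constraint expression, where each
dimension gets an inclusive [start:stride:stop] hyperslab. A small sketch
(the server, dataset and variable names are invented for
illustration):</FONT></DIV>

```python
def opendap_subset_url(base, variable, *ranges):
    """Build an OPeNDAP (DAP 2) constraint expression selecting a hyperslab.
    Each range is an inclusive (start, stride, stop) triple per dimension."""
    indexes = "".join("[%d:%d:%d]" % (a, s, b) for a, s, b in ranges)
    return "%s.dods?%s%s" % (base, variable, indexes)


# Hypothetical server and variable names, for illustration only:
url = opendap_subset_url(
    "http://example.org/dods/ocean_model", "temperature",
    (0, 1, 99),    # time: first 100 steps
    (5, 1, 5),     # a single depth level
    (40, 2, 80),   # every second latitude row from 40 to 80
    (0, 1, 359),   # all longitudes
)
```

<DIV><FONT face=Arial color=#0000ff size=2>The .dods suffix requests the
binary response; a .asc suffix would return the same subset as
ASCII.</FONT></DIV>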
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>You
mention GIS - my understanding is that GIS only really handles "two
and a half" dimensions: it understands lat-long, but time is not a true
dimension, merely an attribute of a data segment. I understand that GIS
struggles with the third dimension (at least, standard OpenGIS does), being
concerned mostly with surface features. So at the moment we do not intend
to link with a GIS system. I understand that there are projects that are
attempting to resolve this issue and create GIS systems suitable for fully 4-D
data.</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff size=2>In
summary, I guess I am trying to find out whether there are any existing
open-source systems that can offer the same or similar functionality and
performance to the database system we have under test. If not, perhaps
there is a system that can be adapted and developed by the community - I'm sure
that many groups must have considered precisely these problems and got some
way to a solution. There may well be a way of creating a system that uses
many of the tricks of a database, but is not actually itself connected to any
particular DBMS.</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>Regards,</FONT></SPAN></DIV>
<DIV><SPAN class=687144910-25112004><FONT face=Arial color=#0000ff
size=2>Jon</FONT></SPAN></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Steve Hankin
[mailto:Steven.C.Hankin@noaa.gov]<BR><B>Sent:</B> 24 November 2004
17:32<BR><B>To:</B> Jon Blower<BR><B>Cc:</B> go-essp@ucar.edu; Adit Santokhee;
Keith Haines<BR><B>Subject:</B> Re: [GO-ESSP] gridded data management
systems<BR><BR></FONT></DIV>Hi Jon,
<P>When you refer to "standard flat files" are you including formats like
netCDF and HDF5 under that title? This is often a source of terminology
confusion as "flat files" sometimes refers to "anything but a database".
Others regard n-dimensional, multi-variate data standards like netCDF and HDF
as alternatives to "flat" IEEE files.
<P>The question that you pose is essentially to weigh the pros and cons of
managing your data with a commercial database that has been enhanced to handle
grids, or to handle your data with netCDF (which in the next version will
merge with HDF5 to handle compression, tiles, etc.) and the free netCDF
utilities. (Presumably from-scratch development with IEEE binary files
is not the way to go.) You mentioned some down-sides to the commercial
software route (cost, "proprietariness" of software, dependence on a single
supplier,...). Are the advantages of the database approach
sufficient to outweigh these costs? You have also not mentioned
network access to the data. Is it a requirement for the data to be
OPeNDAP accessible? Or alternatively, is access from enterprise GIS
systems at the center of your bullseye?
<UL>
<LI>Items 1-2 are trivial for either system. Comparative performance
... do you have any data? The Barrodale product is new and
one-of-a-kind. It would be interesting to see some benchmarks
comparing it to netCDF and HDF5.
<LI>Items 3-4 can be handled with the new FDS (Ferret Data Server) and
probably the GDS server, as well. Custom code may be required
depending upon the list of projections that is desired, but these are open
environments, where this can be added. Items 3-4 capabilities are
also available and presumably well supported if your database is embedded in
an enterprise GIS framework.
<LI>Item 5 is probably better handled in a database environment, though it
can also be handled (with some effort -- in various ways) in a Web service
environment based on OPeNDAP. </LI></UL>Just bouncing around the ideas.
This community will be interested to hear what further you learn.
<P> - steve
<P>====================================
<P>Jon Blower wrote:
<BLOCKQUOTE TYPE="CITE">Hi all,
<P>As some of you may know, we at the Reading e-Science Centre have been
<BR>investigating some new ways to store and manage data from models of the
<BR>oceans and atmosphere. We have been looking at storing data in
databases, <BR>rather than standard flat-file systems, and have over the
last few months <BR>been evaluating IBM's Informix database with Barrodale
Computing Services' <BR>Grid DataBlade plug-in (see <A
href="http://www.resc.rdg.ac.uk/projects.php">http://www.resc.rdg.ac.uk/projects.php</A>
for more <BR>details). Eventually this might form the back-end to our
own data portal <BR>page (<A
href="http://www.nerc-essc.ac.uk/godiva">http://www.nerc-essc.ac.uk/godiva</A>).
<P>We have found good and bad points about this system and are now wondering
<BR>how to take things forward. I have been considering the
feasibility of <BR>writing (essentially from scratch) an intelligent
storage/management <BR>application for gridded geospatial data. The
key features of this system <BR>would include:
<P>1) Data would be stored in a single format but could be extracted in a
variety <BR>of formats <BR>2) Data could be sliced and subsetted in all
possible ways (e.g. extraction <BR>of 1-D timeseries, 2-D areas, 3-D
volumes/animations, 4-D data blocks) and <BR>extracted at different spatial
and temporal resolutions <BR>3) Data could be stored on the original grid
(including rotated grids) but <BR>extracted on the grid of the user's choice
<BR>4) The necessary projection and interpolation would happen on the fly
<BR>5) The system would allow complex queries to be made (e.g. "Give me all
the <BR>times and locations at which the sea surface temperature was greater
than 20 <BR>degC in the North Atlantic in June 2003")
<P>The systems we have looked at so far get us part, but not all, of the way
<BR>there. Furthermore, the system currently under evaluation
(Informix/Grid <BR>DataBlade) is closed-source, commercial software so we
can't modify it <BR>ourselves. However, such database-based systems
have some key advantages <BR>over standard flat files, notably intelligent
tiling and caching, giving <BR>very fast retrieval of data.
<P>I was wondering whether this community would welcome an effort to create
an <BR>open-source data management/storage system for geospatial data,
perhaps as a <BR>plug-in to an open-source DBMS such as PostgreSQL. I
haven't found an <BR>existing project that answers our requirements, but
please let me know if <BR>you know of anything (some packages seem to deal
with geospatial data, but <BR>are not designed for _gridded_ data). It
seems that this could be of <BR>benefit to the GO-ESSP community,
considering that any Earth System <BR>Portal must be backed by some kind of
data store! ;-)
<P>This has been rather a long post, sorry! Any suggestions or
feedback would <BR>be very much appreciated.
<P>Best wishes, <BR>Jon
<P>-------------------------------------------------------------- <BR>Dr Jon
Blower, Technical Director, Reading e-Science Centre <BR>Tel: +44 118 378 5213
(direct line); +44 118 378 8741 (ESSC); Fax: +44 118 378 6413 <BR>Email:
jdb@mail.nerc-essc.ac.uk <BR>ESSC, University of Reading, 3 Earley Gate,
Reading RG6 6AL, UK
<BR>--------------------------------------------------------------
<P>_______________________________________________ <BR>GO-ESSP mailing list
<BR>GO-ESSP@ucar.edu <BR><A
href="http://mailman.ucar.edu/mailman/listinfo/go-essp">http://mailman.ucar.edu/mailman/listinfo/go-essp</A></P></BLOCKQUOTE>
<P>--
<P>Steve Hankin, NOAA/PMEL -- Steven.C.Hankin@noaa.gov <BR>7600 Sand Point Way
NE, Seattle, WA 98115-0070 <BR>ph. (206) 526-6080, FAX (206) 526-6744
<BR> </P></BLOCKQUOTE></BODY></HTML>