[Go-essp-tech] publishing by realm

stephen.pascoe at stfc.ac.uk
Thu Feb 25 02:03:16 MST 2010


Hi all,

I completely agree that publishing by realm fits much better with the way the Gateway's UI is designed.  We've been discussing this issue over the last few days between IS-ENES and UKMO, and I think the consensus is that it would be a pragmatic change, but one with implications.

(I'm going to use the term realm-dataset to mean the unit of publication "at the realm level")

I had thought that publishing by realm would mean that we could only manage versions of realm-datasets rather than atomic-datasets, but Bob's comment made me think.  We would end up with a rather muddled versioning system:

 1. esgpublish would track versions of realm-datasets and files
 2. The DRS records versions of atomic datasets

So what are we versioning?  What happens when one atomic-dataset from a realm is found to have errors?

 1. The entire realm-dataset would be unpublished
 2. The realm-dataset could be republished with the faulty atomic-dataset removed (realm-dataset v2)
 3. A new version of the realm-dataset could be published with the corrected variable (realm-dataset v3, atomic-dataset v2).
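A minimal sketch of the error scenario above, assuming a hypothetical data model in which each realm-dataset version records which atomic-dataset version of each variable it contains. The class, function, and variable names (tas, pr, psl) are illustrative only and are not esgpublish internals:

```python
from dataclasses import dataclass, field


@dataclass
class RealmDatasetVersion:
    """Hypothetical record of one published realm-dataset version.

    members maps each variable (one atomic-dataset) to the atomic-dataset
    version that this realm-dataset version contains.
    """
    version: int
    members: dict[str, int] = field(default_factory=dict)


def error_scenario() -> list[RealmDatasetVersion]:
    # Initial publication: every atomic-dataset is at version 1.
    v1 = RealmDatasetVersion(1, {"tas": 1, "pr": 1, "psl": 1})
    # #1 would simply unpublish v1, so nothing new is recorded for it.
    # #2: republish without the faulty "pr"; the realm version changes
    # even though no atomic-dataset changed.
    v2 = RealmDatasetVersion(2, {k: a for k, a in v1.members.items() if k != "pr"})
    # #3: republish with the corrected "pr" as atomic version 2.
    v3 = RealmDatasetVersion(3, {**v2.members, "pr": 2})
    return [v1, v2, v3]


for ds in error_scenario():
    print(f"realm-dataset v{ds.version}: {ds.members}")
# realm-dataset v1: {'tas': 1, 'pr': 1, 'psl': 1}
# realm-dataset v2: {'tas': 1, 'psl': 1}
# realm-dataset v3: {'tas': 1, 'psl': 1, 'pr': 2}
```

Realm-dataset v3 pairs atomic v2 of pr with atomic v1 of tas and psl, so neither version number on its own tells a user which data they actually have.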

At this point we potentially have confusing version information.  Alternatively, we could skip #2, but then the whole realm-dataset would stay unpublished whilst the problems are fixed.  So we need to decide whether we will continue to represent versions of atomic-datasets or change the definition of the DRS version component.

This gets more complex when we consider replication.  If this realm-dataset has been replicated, do we propagate all these versions to the replicas?

Just throwing some issues into the air.

Cheers,
Stephen.


-----Original Message-----
From: go-essp-tech-bounces at ucar.edu on behalf of Bob Drach
Sent: Wed 2/24/2010 7:20 PM
To: Luca Cinquini
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] publishing by realm
 
Hi Luca,

I'm happy with the solution as well. The main downside, compared to  
publishing at the variable level, is that if new versions of specific  
files are published, it will be necessary to republish more than  
just the modified files. However, the publisher can be instructed to  
rescan only the modified files, and in the case where multiple  
variables are updated this scheme would actually require less  
republishing. In short, I think it's a workable solution.
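The trade-off above can be sketched with a toy cost model. This is a hypothetical illustration, not the publisher's behaviour: it assumes that in both schemes only the modified files are rescanned, and that a new dataset version carries every file of the dataset it replaces. The file names and the `republish_cost` helper are made up for the example:

```python
from typing import NamedTuple


class RepublishCost(NamedTuple):
    operations: int       # dataset versions that must be published
    files_in_scope: int   # files included in the new dataset version(s)
    files_rescanned: int  # modified files whose metadata is re-extracted


def republish_cost(files_by_variable: dict[str, list[str]],
                   modified: set[str],
                   realm_level: bool) -> RepublishCost:
    """Hypothetical cost model for republishing after some files change."""
    touched = [v for v, files in files_by_variable.items()
               if modified.intersection(files)]
    if not touched:
        return RepublishCost(0, 0, 0)
    # Assumption: in both schemes only the modified files are rescanned.
    rescanned = sum(len(modified.intersection(f))
                    for f in files_by_variable.values())
    if realm_level:
        # One new realm-dataset version that carries every file in the realm.
        scope = sum(len(f) for f in files_by_variable.values())
        return RepublishCost(1, scope, rescanned)
    # One new version per affected variable-level dataset.
    scope = sum(len(files_by_variable[v]) for v in touched)
    return RepublishCost(len(touched), scope, rescanned)


files = {"tas": ["tas_1.nc", "tas_2.nc"],
         "pr": ["pr_1.nc", "pr_2.nc"],
         "psl": ["psl_1.nc"]}
changed = {"tas_2.nc", "pr_1.nc"}
print(republish_cost(files, changed, realm_level=True))
# RepublishCost(operations=1, files_in_scope=5, files_rescanned=2)
print(republish_cost(files, changed, realm_level=False))
# RepublishCost(operations=2, files_in_scope=4, files_rescanned=2)
```

With a single modified variable, both schemes need one publish operation, but the realm-level version carries more files. With several modified variables, the realm-level scheme needs fewer operations.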

Bob

On Feb 24, 2010, at 7:56 AM, Luca Cinquini wrote:

> Hi Bob,
> 	I looked at the PCMDI site after you published by realm, and it  
> seems to me that this is a MUCH better presentation of the data to  
> the user. The search results are the right granularity (1825 total  
> CMIP3 datasets, 194 for a single model like CCSM) and the number of  
> files for each dataset, once you click on it, is still very  
> manageable - around 10 for a single CCSM/atmosphere dataset  
> (although this is likely to increase for CMIP5 runs I believe).
>
> Eric and I are working on harvesting the additional experiment and  
> realm information from the THREDDS catalogs and exposing it as  
> search facets. When this is done (by next week, hopefully), it  
> would be good to re-publish all these data, because I think it  
> would provide a good example of how users can easily find CMIP3  
> data by selecting one or more of model/realm/experiment/variable.
>
> In summary, I like it much better now than before...
>
> thanks, Luca
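A minimal sketch of faceted selection over realm-level datasets, assuming each dataset is a record carrying model, realm, experiment, and the set of variables it contains. The records, identifiers, and the `facet_search`/`facet_counts` helpers are hypothetical and do not describe how the Gateway or its THREDDS harvesting actually implements search:

```python
from collections import Counter

# Hypothetical realm-level dataset records; the keys mirror the facets
# model/realm/experiment/variable, and the values are illustrative only.
DATASETS = [
    {"id": "ccsm.atmos.20c3m", "model": "CCSM", "realm": "atmos",
     "experiment": "20c3m", "variable": {"tas", "pr"}},
    {"id": "ccsm.ocean.20c3m", "model": "CCSM", "realm": "ocean",
     "experiment": "20c3m", "variable": {"tos"}},
    {"id": "ccsm.atmos.sresa1b", "model": "CCSM", "realm": "atmos",
     "experiment": "sresa1b", "variable": {"tas"}},
    {"id": "hadcm3.atmos.20c3m", "model": "HadCM3", "realm": "atmos",
     "experiment": "20c3m", "variable": {"pr"}},
]


def _values(record: dict, facet: str) -> set:
    """A facet may hold one value (model) or several (variables in a realm)."""
    value = record[facet]
    return set(value) if isinstance(value, (set, frozenset)) else {value}


def facet_search(datasets: list[dict], selections: dict[str, list[str]]) -> list[dict]:
    """AND across facets, OR among the values selected within one facet."""
    return [d for d in datasets
            if all(_values(d, facet) & set(wanted)
                   for facet, wanted in selections.items())]


def facet_counts(datasets: list[dict], facet: str) -> Counter:
    """Number of datasets offering each value of a facet."""
    counts = Counter()
    for d in datasets:
        counts.update(_values(d, facet))
    return counts


hits = facet_search(DATASETS, {"model": ["CCSM"], "realm": ["atmos"]})
print([d["id"] for d in hits])  # ['ccsm.atmos.20c3m', 'ccsm.atmos.sresa1b']
print(facet_counts(DATASETS, "realm"))  # Counter({'atmos': 3, 'ocean': 1})
```

Treating the variable facet as a set reflects publication by realm: one dataset can match several variable selections, while model, realm, and experiment remain single-valued.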

_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


