<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv=Content-Type content="text/html; charset=us-ascii">

<meta name=Generator content="Microsoft Word 12 (filtered medium)">

<style>

<!--

 /* Font Definitions */

 @font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Consolas;

        panose-1:2 11 6 9 2 2 4 3 2 4;}

 /* Style Definitions */

 p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.MsoPlainText, li.MsoPlainText, div.MsoPlainText

        {mso-style-priority:99;

        mso-style-link:"Plain Text Char";

        margin:0cm;

        margin-bottom:.0001pt;

        font-size:10.5pt;

        font-family:Consolas;}

span.PlainTextChar

        {mso-style-name:"Plain Text Char";

        mso-style-priority:99;

        mso-style-link:"Plain Text";

        font-family:Consolas;}

.MsoChpDefault

        {mso-style-type:export-only;}

@page Section1

        {size:612.0pt 792.0pt;

        margin:72.0pt 92.4pt 72.0pt 92.4pt;}

div.Section1

        {page:Section1;}

-->

</style>


</head>

<body lang=EN-GB link=blue vlink=purple>

<div class=Section1>

<p class=MsoPlainText>Hello All,<o:p></o:p></p>

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>I agree with Karl about option 2, but before discussing

option 1 I'd like to clarify the third of the &quot;starting point&quot;

statements:<o:p></o:p></p>

<p class=MsoPlainText><i>&gt; &gt; 3. We only replicate entire atomic datasets<o:p></o:p></i></p>

<p class=MsoPlainText>This should say, I think: <b>3. We only replicate entire

atomic dataset versions.<o:p></o:p></b></p>

<p class=MsoPlainText><b><o:p>&nbsp;</o:p></b></p>

<p class=MsoPlainText>I can&#8217;t see any grounds for requiring that all

versions be replicated. <o:p></o:p></p>

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>Making this change introduces another option:<o:p></o:p></p>

<p class=MsoPlainText><b>3. When an atomic dataset on a node contains data beyond

that which is to be replicated (centralized CMIP5 output), a version containing

only the portion to be replicated will be maintained.<o:p></o:p></b></p>

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>This would require a modification to the currently proposed versioning system, for example:<o:p></o:p></p>

<p class=MsoPlainText style='margin-left:36.0pt'><b>When a subset of the data
in an atomic dataset is to be replicated, a version with an id of the form &#8220;vr&lt;version
number&gt;&lt;version letter&gt;&#8221; will be created, which contains (copies
of or links to) a subset of the files in a corresponding &#8220;v&lt;version
number&gt;&lt;version letter&gt;&#8221;.<o:p></o:p></b></p>

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>This avoids the complication of having to split the

larger atomic dataset on the source node. It does increase the number of

versions and links that need to be managed within an atomic dataset, but avoids

multiplying the number of atomic datasets. It would also mean that, within the

DRS, we would have a clear indication in the version id as to whether an atomic

dataset was a complete (&#8220;v..&#8221;) or partial (&#8220;vr..&#8221;)

replication.<o:p></o:p></p>
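<p class=MsoPlainText>To illustrate how the &quot;v..&quot; / &quot;vr..&quot; distinction could be read mechanically, here is a rough Python sketch. It assumes, purely for illustration, that a version id is &quot;v&quot; or &quot;vr&quot; followed by a decimal version number and a single lower-case version letter (e.g. &quot;v3a&quot;, &quot;vr3a&quot;). The exact syntax is not settled, and the function names are hypothetical rather than part of any ESG code.<o:p></o:p></p>
<pre>
```python
import re

# Hypothetical version-id syntax: "v" (complete) or "vr" (partial replica),
# then a decimal version number, then a single lower-case version letter.
VERSION_ID = re.compile(r"(vr|v)([0-9]+)([a-z])")


def classify_version(version_id):
    """Return (kind, number, letter) for an id such as 'v3a' or 'vr3a'.

    kind is 'complete' for a 'v..' id and 'partial' for a 'vr..' id.
    Raises ValueError for ids that do not match the assumed syntax.
    """
    match = VERSION_ID.fullmatch(version_id)
    if match is None:
        raise ValueError("unrecognised version id: " + repr(version_id))
    prefix, number, letter = match.groups()
    kind = "partial" if prefix == "vr" else "complete"
    return kind, int(number), letter


def source_version(version_id):
    """Map any id to the complete 'v..' version it corresponds to."""
    kind, number, letter = classify_version(version_id)
    return "v" + str(number) + letter


for vid in ["v3a", "vr3a"]:
    print(vid, classify_version(vid), source_version(vid))
```
</pre>
<p class=MsoPlainText>Under this assumed convention, classify_version(&quot;vr3a&quot;) returns (&quot;partial&quot;, 3, &quot;a&quot;), and source_version(&quot;vr3a&quot;) points back to &quot;v3a&quot;, the complete version whose files the partial one copies or links to.<o:p></o:p></p>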

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>The main difference between expanding the use of the

version attribute as I&#8217;m suggesting and Stephen&#8217;s option 2 is that the

latter would require breaking up the data on the source node into two atomic

datasets. By making use of the fact that different versions of an atomic

dataset can share files we can avoid this fragmentation.<o:p></o:p></p>
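<p class=MsoPlainText>A minimal sketch of the file-sharing idea follows. It assumes, hypothetically, that a version can be modelled simply as a set of file names within one atomic dataset. The &quot;vr&quot; version is derived from a &quot;v&quot; version by selecting a subset of its files, so the dataset is never split and no file needs to be stored twice. The class, method and file names are illustrative only.<o:p></o:p></p>
<pre>
```python
# Hypothetical in-memory model of a single atomic dataset whose versions
# share files. A version is just a frozenset of file names; a partial
# replica version ("vr..") holds a subset of a full version's ("v..") files.


class AtomicDataset:
    def __init__(self, name):
        self.name = name
        self.versions = {}  # maps version id to a frozenset of file names

    def add_version(self, version_id, files):
        self.versions[version_id] = frozenset(files)

    def add_replica_version(self, full_id, keep):
        """Derive 'vr..' from 'v..', keeping only files for which keep(f) is true."""
        if full_id.startswith("vr") or not full_id.startswith("v"):
            raise ValueError("expected a full 'v..' version id: " + full_id)
        subset = frozenset(f for f in self.versions[full_id] if keep(f))
        replica_id = "vr" + full_id[1:]
        self.versions[replica_id] = subset
        return replica_id

    def stored_files(self):
        """Files that must physically exist: the union over all versions."""
        return frozenset().union(*self.versions.values())


# Example: decadal files for 1850-2000; only 1950-1980 is to be replicated.
files = ["tas_%d.nc" % year for year in range(1850, 2001, 10)]
dataset = AtomicDataset("tas")
dataset.add_version("v1a", files)
replica = dataset.add_replica_version("v1a", lambda f: int(f[4:8]) in range(1950, 1981))

print(replica, sorted(dataset.versions[replica]))
print(len(dataset.stored_files()), "files stored for", len(dataset.versions), "versions")
```
</pre>
<p class=MsoPlainText>Because the replica version only references files that already belong to the full version, the number of stored files stays at 16 while the dataset gains a second version. The cost is the extra version and links to manage, not an extra atomic dataset.<o:p></o:p></p>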

<p class=MsoPlainText><o:p>&nbsp;</o:p></p>

<p class=MsoPlainText>Cheers,<o:p></o:p></p>

<p class=MsoPlainText>Martin<o:p></o:p></p>

<p class=MsoPlainText>&nbsp;&nbsp;<o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>-----Original Message-----</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>From: go-essp-tech-bounces@ucar.edu [mailto:go-essp-tech-bounces@ucar.edu] On Behalf Of Karl Taylor</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>Sent: 07 December 2009 23:33</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>To: Pascoe, Stephen (STFC,RAL,SSTD)</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>Cc: go-essp-tech@ucar.edu</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <span lang=EN-US>Subject: Re: [Go-essp-tech] Proposal for adjusting our definition of an atomic dataset</span><o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; Dear Stephen and all,<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; Before commenting on the substance of your email, let me suggest that we not talk about &quot;standard&quot; and &quot;non-standard&quot; output.&nbsp; Rather, I think it will be less confusing to talk about:<o:p></o:p></p>

<p class=MsoPlainText>&gt; 1. CMIP5 &quot;requested&quot; output<o:p></o:p></p>

<p class=MsoPlainText>&gt; 2. output not requested by CMIP5.<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; As an aside, I think it is best to avoid the term &quot;core&quot; output, and instead refer to the subset of the output that will be replicated at several gateways (e.g., PCMDI, BADC, DKRZ, ...) as &quot;centralized CMIP5 output&quot;.&nbsp; Dean and I agree this will avoid confusion.<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; Now to suggestions in your email:<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; I'm not sure I understand option 1, but I'm definitely opposed to option 2.&nbsp; We are not talking about two different experiments; we are talking about different subsets of output from a single experiment.&nbsp; Option 2 would, I'm sure, confuse at least 99% of the users (well, maybe I exaggerate).<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; As for option 1,<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; 1. What would the allowable &quot;values&quot; be for the additional DRS attribute?<o:p></o:p></p>

<p class=MsoPlainText>&gt; 2. What is meant by &quot;Atomic datasets that currently span standard and non-standard output would be split into 2 atomic datasets&quot;?&nbsp; I don't think there are any current atomic datasets (except in our imagination), so there is no need to split them.<o:p></o:p></p>

<p class=MsoPlainText>&gt; 3.&nbsp; Rather than saying &quot;Other atomic datasets would exist in one category or the other,&quot; couldn't we simply say that an atomic dataset can refer either to all time-samples output from the run or to a subset of contiguous time-samples defined by the project?&nbsp;&nbsp; [I'm not sure that it's absolutely necessary that they be contiguous, but I would think this would be less confusing.&nbsp; For example, suppose the CMIP5 requested output was for the years 1950-1980, but the full expt. ran from 1850 to 2005.&nbsp; I would think that having the atomic dataset defined by the CMIP5 requested output fall inside the atomic dataset for the non-requested output would &quot;split&quot; the non-requested atomic dataset, which seems contradictory (can you split an atomic dataset?).]<o:p></o:p></p>

<p class=MsoPlainText>&gt; 4.&nbsp; Note that there are some cases in which the CMIP5 *requested* output is non-contiguous.&nbsp; For example, in the case of aerosol data, some of the 3-D fields are collected in 1-year samples as follows: 1850 to 1950 every 20 years, 1960 to 2020 every 10 years, 2040 to 2100 every 20 years.&nbsp; If we require the time-samples in an atomic dataset to be contiguous, the CMIP5 requested output for these variables would comprise 17 different atomic datasets.&nbsp; Perhaps that's unattractive and argues against requiring that the data be contiguous.<o:p></o:p></p>
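<p class=MsoPlainText>The count of 17 in item 4 can be checked with a short Python sketch that lists the sampled years and groups them into runs of consecutive years; under a contiguity requirement each run would have to be its own atomic dataset. It assumes each sample covers exactly one calendar year and that the quoted ranges include both end points. contiguous_runs is an illustrative helper, not part of any ESG software. The last lines apply the same helper to the 1950-1980 example in item 3.<o:p></o:p></p>
<pre>
```python
# Aerosol 3-D sampling from item 4 (one-year samples, ranges inclusive):
# 1850-1950 every 20 years, 1960-2020 every 10 years, 2040-2100 every 20 years.
sample_years = (
    list(range(1850, 1951, 20))
    + list(range(1960, 2021, 10))
    + list(range(2040, 2101, 20))
)


def contiguous_runs(years):
    """Group years into runs of consecutive years, returned as (first, last) pairs."""
    runs = []
    for year in sorted(years):
        if runs and year == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], year)  # extend the current run
        else:
            runs.append((year, year))  # start a new run
    return runs


runs = contiguous_runs(sample_years)
print(len(sample_years), "samples form", len(runs), "contiguous runs")
# prints: 17 samples form 17 contiguous runs

# Item 3: requested years 1950-1980 taken out of a 1850-2005 run.
requested = range(1950, 1981)
not_requested = [y for y in range(1850, 2006) if y not in requested]
print(contiguous_runs(not_requested))
# prints: [(1850, 1949), (1981, 2005)]
```
</pre>
<p class=MsoPlainText>Dropping the contiguity requirement would let all 17 one-year samples sit in a single atomic dataset, which is the trade-off item 4 points to; the second result shows the two-piece &quot;split&quot; described in item 3.<o:p></o:p></p>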

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; I'll try to join tomorrow at the beginning, at

least.<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; Best regards,<o:p></o:p></p>

<p class=MsoPlainText>&gt; Karl<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; stephen.pascoe@stfc.ac.uk wrote:<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; A bunch of the ESG developers are at NCAR this week talking in detail about versioning and representing replicas in the datanode and gateway.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; We have come to the conclusion that in order to implement replication we need to confine ourselves to replicating entire atomic datasets.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; We would like to work with the following

principles:<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; 1. CMIP5 archive is a set of atomic datasets<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; 2. The CMIP5 standard output is a subset of the

CMIP5 archive<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; 3. We only replicate entire atomic datasets.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; From previous emails it is apparent that the standard output does not correspond to a set of atomic datasets because in some cases standard output is a temporal subset of an atomic dataset.&nbsp; This implies that a replica of an atomic dataset would be a temporal subset of that atomic dataset.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; Therefore we propose adjusting the definition of an atomic dataset to allow us to only replicate entire atomic datasets.&nbsp; We suggest 2 ways of achieving this:<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;&nbsp; 1. Add an extra attribute to the DRS syntax to represent the difference between standard and non-standard output.&nbsp; Atomic datasets that currently span standard and non-standard output would be split into 2 atomic datasets.&nbsp; Other atomic datasets would exist in one category or the other.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;&nbsp; 2. Split all experiments (as defined in the DRS) that contain atomic datasets that span standard and non-standard output into 2 experiments, e.g. &quot;&lt;expt&gt;_standard&quot;, &quot;&lt;expt&gt;_optional&quot;.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; We'd like to discuss this proposal at the telco tomorrow.&nbsp; Comments welcome.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; Thanks,<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt; Stephen.<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; &gt;<o:p></o:p></p>

<p class=MsoPlainText>&gt; <o:p></o:p></p>

<p class=MsoPlainText>&gt; _______________________________________________<o:p></o:p></p>

<p class=MsoPlainText>&gt; GO-ESSP-TECH mailing list<o:p></o:p></p>

<p class=MsoPlainText>&gt; GO-ESSP-TECH@ucar.edu<o:p></o:p></p>

<p class=MsoPlainText>&gt; http://mailman.ucar.edu/mailman/listinfo/go-essp-tech<o:p></o:p></p>

</div>


<br></body>

</html>