<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <font face="Times New Roman">Hi Bryan,<br>

      <br>

      My view is that if the time-slices have actually been lost, we

      shouldn't necessarily reject the data as being useless.&nbsp; I agree,

      however, that we should encourage the modeling groups to try to

      recover or reproduce the lost time slices to make their output

      more complete.&nbsp; If that is impossible, I still think in many cases

      analysts will want access to the portions of the time-series that

      are available.&nbsp; <br>

      <br>

      Consider, for example, a 1000 year control run with a decade

      missing in the middle (perhaps all contained in a single lost

      file).&nbsp; Don't you think many researchers will make use of the two

      portions of the time-series that *are* available, and shouldn't

      the available data be assigned a DOI?<br>

      <br>

      As I recall, data not passing QC level 2 won't normally be

      replicated and wouldn't be assigned a DOI.&nbsp; Is this correct?<br>

      <br>

      best regards,<br>

      Karl<br>

      <br>

      <br>

    </font><br>

    On 5/4/11 1:08 AM, Bryan Lawrence wrote:

    <blockquote cite="mid:201105040908.20526.bryan.lawrence@stfc.ac.uk"

      type="cite">

      <pre wrap="">Hi Karl

There are two issues noted in your email: (1) missing variables, and (2) 

missing time slices in a sequence.

I agree that (1) is something to be noted, but I think (2) is something that 

should cause failure, and require a response as Ag has suggested. I 

don't think it's too much to ask a modelling group to either provide the 

missing data, or provide missing data flags - but actual missing files in 

a sequence should be an error and a failure!

I think we should be holding a candle for the users here. The reality is 

that no code is going to read the metadata to find missing data, whereas 

code can read and understand missing data flags. 
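
To illustrate the point about flags, here is a minimal Python sketch.
It is not taken from any CMIP5 tool: the function names are
hypothetical, and the fill value of 1.0e20 is an assumption standing
in for whatever missing-data flag the dataset actually declares.

```python
FILL_VALUE = 1.0e20  # assumed missing-data flag; use the one the dataset declares


def pad_with_fill(series, expected_steps, fill_value=FILL_VALUE):
    """Return one value per expected step, flagging absent steps with fill_value."""
    return [series.get(step, fill_value) for step in expected_steps]


def mean_ignoring_fill(values, fill_value=FILL_VALUE):
    """Average only the values that are not flagged as missing."""
    valid = [v for v in values if v != fill_value]
    return sum(valid) / len(valid) if valid else None


# Step 2 is absent from the input, so it is flagged rather than dropped.
padded = pad_with_fill({1: 2.0, 3: 4.0}, [1, 2, 3])
average = mean_ignoring_fill(padded)
```

Because the gap is encoded in the data, downstream code can skip it
without consulting any separate documentation.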

Bryan

</pre>

      <blockquote type="cite">

        <pre wrap="">Dear Ag,

There is another possible way of handling the "missing data" issue. 

I'm not sure that a dataset should be required to be complete

(i.e., required to include all time slices) to be considered

eligible for DOI assignment.  That is, we could relax the criteria. 

Note that I don't think we require *all* variables requested within

a single dataset to be present, so some datasets will indeed be

incomplete but be eligible for a DOI.  I think the QC procedure

should be to check with the modeling group, and if they can't supply

the missing time-slices, then we somehow note this flaw in the

dataset documentation and if other QC checks are passed, assign it a

DOI.

The criteria for getting a DOI should be that there are no known

errors in the data itself, and that there are no major problems with

the metadata.  In this case the data will be reliable, and analysts

will be welcome to use it and publish results, so I think it should

be assigned a DOI.

What do others think?

Best regards,

Karl

On 4/28/11 3:12 AM, <a class="moz-txt-link-abbreviated" href="mailto:ag.stephens@stfc.ac.uk">ag.stephens@stfc.ac.uk</a> wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">Dear all,

At BADC we have come across our first "missing data" issue in the

CMIP5 datasets we are ingesting. We have an example of some

missing months for a particular set of variables that was revealed

when running the QC code from DKRZ.

It would be very useful for the CMIP5 archive managers to make an

authoritative statement about how we should handle missing data

time steps in the archive.

I propose the following response when a Data Node receives a dataset

in which time steps are missing:

  1. QC manager (i.e. whoever runs the QC code) informs Data

  Provider that there is missing data in a dataset (specifying

  full DRS structure and date range missing).

  2a. If Data Provider says "no, cannot provide this data" then the

  affected datasets cannot get a DOI and cannot be part of the

  "crystallised archive". STOP

  2b. Data Provider re-generates files, data is re-ingested, new

  version is generated, QC is re-run, all is good. STOP

  2c. Data Provider cannot re-generate but wants to pass QC - so

  needs to create the required files full of missing data.

  3. Data Provider creates missing data files and sends, data

  re-ingested, new version is generated, QC re-run, all good. STOP

In cases 2a and 2c it would also be very useful if the dataset is

annotated to inform the user which dates have been FILLED with

missing data. This would, I believe, be in the QC logs but we

might want a more prominent record of this if possible.
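
Step 1 above asks for the missing date range to be specified. As a
rough sketch only (this is not the DKRZ QC code, and the function
names are hypothetical), gaps in a monthly sequence could be found
and grouped like this, assuming each time step is given as a
(year, month) pair:

```python
def find_missing_months(present):
    """Return the (year, month) steps absent between the first and last present step."""
    present = sorted(set(present))
    if not present:
        return []
    seen = set(present)
    (year, month), last = present[0], present[-1]
    missing = []
    while (year, month) != last:
        if (year, month) not in seen:
            missing.append((year, month))
        # Advance one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return missing


def missing_ranges(missing):
    """Collapse consecutive missing months into (start, end) ranges for the report."""
    ranges = []
    for step in missing:
        year, month = step
        prev = (year - 1, 12) if month == 1 else (year, month - 1)
        if ranges and ranges[-1][1] == prev:
            ranges[-1] = (ranges[-1][0], step)
        else:
            ranges.append((step, step))
    return ranges
```

Note that this only finds gaps between the first and last steps that
are present; a series truncated at either end would need to be
compared against the expected start and end dates separately.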

Cheers,

Ag

BADC

</pre>

        </blockquote>

      </blockquote>

      <pre wrap="">

--

Bryan Lawrence

Director of Environmental Archival and Associated Research

(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)

STFC, Rutherford Appleton Laboratory

Phone +44 1235 445012; Fax ... 5848; 

Web: home.badc.rl.ac.uk/lawrence

</pre>

    </blockquote>

  </body>

</html>