<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffcc" text="#000000">

    I agree with you Stephen.... completely.<br>

    <br>

    The dataset is OUR (ESGF) logical file unit.&nbsp; It would be great if

    we could make the world think in datasets and completely encapsulate

    the notion of files. I would love that, but until we acclimate

    people to the dataset notion as they use the system, the notion of

    "file" that we are all used to cannot be avoided.&nbsp; We should

    manipulate things in terms of the ESGF *logical* file = the dataset

    as represented by the catalog... as much as we can, because it makes

    sense in our model of how things should be grouped.&nbsp; At the

    replication level things should only be manipulated in the context

    of datasets.&nbsp; For the user... we should support files, but I think

    we should do the following:<br>

    <br>

    If a user wants a file from a dataset, they should be able to get

    the file but we should maintain the context of the dataset by

    maintaining the dataset as a physical filesystem construct.&nbsp; For

    example, if you use a Mac you will see that an "application" is

    really a top level directory for a set of files.&nbsp; When you download

    an application what you get is a set of files in a file hierarchy

    such that in concert they manifest the application you expect.&nbsp;

    Along the same lines, I would propose that we have a similar

    construct for datasets and their relationship to files.<br>

    The details of this layout are something I'd like to bring up for

    discussion, given that the basic premise of what I am saying is

    accepted.<br>

    <br>

    We can then build tools that internalize this construct

    and are thus able to manipulate datasets directly.&nbsp; I have been mulling

    over building an ESG SHELL and I think I will finally do so.... as a

    part of that shell you would be able to perform augmented shell

    commands like "ls" that would operate accordingly in the context of

    our notion of dataset.<br>

    <br>

    something like<br>

    .<br>

    `-- foo_dataset<br>

    &nbsp;&nbsp;&nbsp; |-- foo_datafile1.nc<br>

    &nbsp;&nbsp;&nbsp; |-- foo_datafile2.nc<br>

    &nbsp;&nbsp;&nbsp; |-- foo_datafile3.nc<br>

    &nbsp;&nbsp;&nbsp; |-- foo_datafile5.nc<br>

    &nbsp;&nbsp;&nbsp; `-- foo_dataset.catalog<br>

    <br>

    With this kind of structure you would always have the full catalog

    for the dataset present and represented.&nbsp; You may have all or a

    subset of files that are in the catalog present.&nbsp; In the replication

    scenario, you would have them all.&nbsp; In the end user scenario you may

    have a subset.&nbsp; With the augmented esgf-shell "ls" command you would

    additionally be able to see which files are present and which are

    not.&nbsp; Also because you have the catalog you can check the checksums

    of the files and you can then issue an esgf-shell command to

    "complete" the dataset and have it pull down the rest of the files.&nbsp;

    In the replication scheme I am exploring this is how this is

    intended to work.&nbsp; Also the location of the top level foo_dataset is

    under the data.repl directory where all replicas are kept.&nbsp; This

    bears fruit down the line by simplifying several operations.&nbsp;

    This imposition is not required for the data publisher over

    datasets that they are custodians for, because of the ability to use

    the publisher's database to perform this file location - which is

    part of another scheme I have hatched to divorce the filesystem from

    the tyranny of the DRS's overreaching (IMHO) filesystem mandate. <br>
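    <br>

    To make the "ls" and "complete" ideas above a bit more concrete, here

    is a rough Python sketch, not a design.&nbsp; It ASSUMES a simplified,

    hypothetical catalog format (one "md5hex filename" pair per line) as

    a stand-in for whatever the real catalog ends up looking like, and

    the function names (read_catalog, dataset_status, files_to_complete)

    are made up for illustration only.<br>

<pre>
```python
import hashlib
from pathlib import Path


def read_catalog(catalog_path):
    """Parse the ASSUMED catalog format: one 'md5hex filename' pair per line."""
    entries = {}
    for line in Path(catalog_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        checksum, name = line.split(None, 1)
        entries[name] = checksum.lower()
    return entries


def md5_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large data files are not read in one go."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_status(dataset_dir):
    """Return {filename: 'present' | 'missing' | 'corrupt'} per catalog entry."""
    dataset_dir = Path(dataset_dir).resolve()
    # Hypothetical convention: the catalog is named after the dataset directory.
    catalog = read_catalog(dataset_dir / (dataset_dir.name + ".catalog"))
    status = {}
    for name, expected in sorted(catalog.items()):
        path = dataset_dir / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of(path) != expected:
            status[name] = "corrupt"
        else:
            status[name] = "present"
    return status


def files_to_complete(dataset_dir):
    """List the files a 'complete' command would still have to fetch."""
    return [name for name, state in dataset_status(dataset_dir).items()
            if state != "present"]
```
</pre>

    The augmented "ls" would essentially print dataset_status(), and

    "complete" would fetch whatever files_to_complete() returns; the

    actual transfer is deliberately left out of the sketch.<br>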

    <br>

    Now I'll be the first to mention that this proposal to impose a

    filesystem structure is somewhat hypocritical, since I have railed

    against the DRS's imposition of structure on the filesystem... but I

    think in this context it is limited enough in scope and provides

    enough of a benefit to be justified.<br>

    <br>

    I'd like to have this conversation.<br>

    <br>

    Trust me... this is the way to go. (IMHO)&nbsp; :-)<br>

    <br>

    On 6/2/11 12:58 AM, <a class="moz-txt-link-abbreviated" href="mailto:stephen.pascoe@stfc.ac.uk">stephen.pascoe@stfc.ac.uk</a> wrote:

    <blockquote

cite="mid:4C353E6E4A08AE4792B350DAA392B52119E27D@EXCHMBX01.fed.cclrc.ac.uk"

      type="cite"><span style="font-size: 11pt; font-family:

        &quot;Calibri&quot;,&quot;sans-serif&quot;; color: rgb(31, 73,

        125);">My instinct is that we should accept datasets are

        collections of files and not try to completely hide this idea,

        however most of the system should focus on datasets because they

        more flexible.&nbsp; </span></blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Gavin M. Bell

Lawrence Livermore National Labs

--


 "Never mistake a clear view for a short distance."

                      -Paul Saffo


(GPG Key - <a class="moz-txt-link-freetext" href="http://rainbow.llnl.gov/dist/keys/gavin.asc">http://rainbow.llnl.gov/dist/keys/gavin.asc</a>)


 A796 CE39 9C31 68A4 52A7  1F6B 66B7 B250 21D5 6D3E

</pre>

  </body>

</html>