[Go-essp-tech] Status of Gateway 2.0 (another use case)

Estanislao Gonzalez gonzalez at dkrz.de
Wed Dec 14 08:05:37 MST 2011


Hi Jennifer,

AFAIK no, it's not possible to negate values, but that doesn't seem to 
correspond with the definition of the use-case you've presented.
AFAICT the user would just issue the same search and new files would be 
added (what about changed ones and those deleted from the server because 
of new versions?)

This all might be possible to implement, but before people start coding 
in a frenzy, I think there are a couple of things that should be said:
1) I don't think any system should provide for every single use-case. 
CMIP5 is nothing like CMIP3; it is of course about the amount of data 
(impossible to store centrally, because of both storage and throughput 
constraints) but also the federated aspect (that allows every 
institution some freedom on how to provide the data and forces them to 
provide also the know-how for this) and the quality assurance (data 
changes a lot until it gets verified).
2) The main idea behind the P2P, and the direction we all want the 
Gateway 2.0 to take, is modularity. I think this is a required 
architectural decision we need to embrace. So I don't think it's a good 
idea to extend the server side to provide functionality typically 
available at the client, like download management.

So, the way I picture this is:
1) get the list of files to be downloaded (in the wget script or by any 
other means)
2) filter that to remove what is not required
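
The two steps above can be sketched in shell. This is only an 
illustration under assumptions: the file list is taken to be one URL per 
line on standard input, and the helper name filter_unwanted and the 
pattern file are hypothetical, not part of any P2P or Gateway tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every line of the file list (stdin) that
# contains one of the fixed strings listed in the pattern file ($1),
# e.g. names of models that are already in the local collection.
# The pattern file must not contain blank lines, because an empty
# pattern matches every line.
filter_unwanted() {
    local patterns="$1"
    if [ -s "$patterns" ]; then
        # -v: invert match, -F: fixed strings, -f: read patterns from file
        grep -vFf "$patterns"
    else
        # A missing or empty pattern file means nothing is excluded.
        cat
    fi
}

# Usage (file names are placeholders):
#   filter_unwanted already_have.txt < file_list.txt > to_download.txt
```

Keeping the exclusions in a file means the list of what was already 
acquired can grow over time without touching the search URL.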

Better yet, rely on a download manager such as the DML for this, or 
extend it if it does not provide this. The key here is the ability to:
1) Know when something changed (we have many more users than CMIP3 and 
they extend beyond the climate community, so the less load the servers 
have, the better).
For this, both systems have some kind of Atom feed.
2) Do something with this information. This is user dependent. Some will 
want their copy to be kept "synchronized", always gathering the latest 
version, some will want to keep multiple versions so they can compare 
them, some might just want to be notified and others won't care at all :-)
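
As a rough sketch of point 1), a client could keep the previous listing 
and compare it with a fresh one. The listing format used here (one 
"<file> <checksum>" pair per line) and the helper name diff_listings are 
assumptions made for illustration; how the listing is obtained (Atom 
feed, search, wget script) is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two listings ("<file> <checksum>" per
# line, an assumed format) and report what is new, changed or removed.
# $1: previous listing, $2: current listing
diff_listings() {
    local old="$1" new="$2"
    awk '
        # Lines of the first file: remember the old checksum per file.
        FILENAME == ARGV[1] { old[$1] = $2; next }
        # Lines of the second file: classify against the old listing.
        !($1 in old)                 { print "NEW", $1 }
        ($1 in old) && old[$1] != $2 { print "CHANGED", $1 }
        { seen[$1] = 1 }
        END {
            for (f in old) if (!(f in seen)) print "REMOVED", f
        }
    ' "$old" "$new" | sort
}

# Usage (file names are placeholders):
#   diff_listings listing.yesterday listing.today
```

What to do with each category (re-download, keep both versions, only 
notify) is then a per-user policy, as in point 2).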

I think 2) is not really implemented, although some use cases are 
already supported in P2P (not sure about Gateway 2.0).
So you could add a cronjob with this:
bash <(wget 'http://p2pnode/wgetscript?experiment=decadal1960&realm=atmos&time_frequency=month&variable=clt' -qO - | grep -v HadCM3)
and as soon as something changes you would have your new file 
(everything but HadCM3 as you pointed out in your example)
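
A slightly more defensive variant of the cron job above might look like 
the following. It is a sketch only: the function name 
fetch_filtered_script is hypothetical, and p2pnode is the same 
placeholder host as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fetch the generated wget script, fail loudly if
# the fetch did not work, and print the script without the excluded lines.
# $1: URL of the generated wget script
# $2: fixed string to exclude (e.g. HadCM3)
fetch_filtered_script() {
    local url="$1" exclude="$2" script
    script=$(mktemp) || return 1
    # The URL is passed quoted: an unquoted "&" would be interpreted
    # by the shell instead of being sent to the server.
    if ! wget -qO - "$url" > "$script" || [ ! -s "$script" ]; then
        echo "could not fetch wget script from $url" >&2
        rm -f "$script"
        return 1
    fi
    # The caller decides whether to run the result, e.g. "| bash".
    grep -vF -- "$exclude" "$script"
    rm -f "$script"
}

# Usage (URL is a placeholder):
#   fetch_filtered_script \
#     'http://p2pnode/wgetscript?experiment=decadal1960&variable=clt' \
#     HadCM3 | bash
```

Fetching to a temporary file first avoids running a truncated or empty 
script when the node is unreachable.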

But there are a lot of caveats there and I don't think we want to 
encourage users to "misbehave", so we need some intelligence that doesn't 
try to download thousands of files if not required, uses multi-threading 
in a useful manner, handles certificate expiration, etc.
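
Two of these checks can be sketched as small shell helpers. Both names 
are hypothetical, and the sketch assumes that an expected MD5 checksum is 
known for each file, that the user certificate is a PEM file, and that 
md5sum and openssl are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only when a download is needed,
# i.e. the local file is missing or its MD5 differs from the expected one.
# $1: local file, $2: expected MD5 checksum (assumed to be known)
needs_download() {
    local file="$1" expected="$2" actual
    [ -f "$file" ] || return 0
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" != "$expected" ]
}

# Hypothetical helper: succeed while the certificate in $1 (PEM) stays
# valid for at least $2 more seconds (default: one hour).
cert_still_valid() {
    openssl x509 -checkend "${2:-3600}" -noout -in "$1" > /dev/null 2>&1
}

# Usage (paths and checksum are placeholders):
#   cert_still_valid "$HOME/.esg/credentials.pem" || echo "renew certificate"
#   needs_download clt_file.nc "$expected_md5" && echo "fetch clt_file.nc"
```

Checking before each transfer keeps repeated runs cheap for the servers, 
since files that are already present and intact are skipped.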

I think we should improve the client-side of the CMIP5 system.

My 2c,
Estani

Am 14.12.2011 15:30, schrieb Jennifer Adams:
> Dear Colleagues,
> I have thought of another use case for the Gateway 2.0 and P2P 
> developers to consider.
>
> Right now, I am diligently downloading CMIP5 data, gathering runs for 
> a subset of experiments and variables, for all available models and 
> ensemble members. I realize that not all modeling centers have 
> submitted their data yet, so in a few months' time, I will have to go 
> back and fill in the gaps in my collection. At that point, I will have 
> a list of experiments/variables/models/ensembles that I have already 
> acquired, and I will need to search the available CMIP5 data for runs 
> that I do not have.
>
> I know it is unfair to compare to the good old days of using wget to 
> grab CMIP3 data via FTP from ftp-esg.ucllnl.org, but this particular 
> use case used to be so easy ... I would just rerun my single wget 
> command and it would fill in the missing files in my collection 
> automatically.
>
> Is there a syntax in the Gateway 2.0 and P2P search URLs that allows 
> for a NOT?
> http://esg-datanode.jpl.nasa.gov/esg-search/wget?experiment=decadal1960&realm=atmos&time_frequency=month&variable=clt&model!=HadCM3
>
> Any better ideas or plans for how to do this?
>
> --Jennifer
>
>
>
> --
> Jennifer M. Adams
> IGES/COLA
> 4041 Powder Mill Road, Suite 302
> Calverton, MD 20705
> jma at cola.iges.org


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
