[Go-essp-tech] Status of Gateway 2.0 (another use case)

Cinquini, Luca (3880) Luca.Cinquini at jpl.nasa.gov
Wed Dec 14 08:20:41 MST 2011


Hi Jennifer,
Although I totally agree with Estani that we should build more intelligence into the client side, developing a "negative" syntax for the P2P API should not be difficult.

Because of the HTTP URL syntax, it would need to be something like:

&model=!HadCM3&model=!CCSM

(i.e. search all models that are not HadCM3 and not CCSM).
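
To make the proposal concrete, here is a minimal shell sketch of how a client might assemble such a query. The "!" prefix is only the syntax proposed above and is not something the P2P API accepts today; the function name negated_query and the host name in the example call are made up for illustration.

```shell
# Hypothetical helper: append one "model=!NAME" pair per excluded model.
# The "!" prefix is the proposed (not yet implemented) negation syntax.
negated_query() {
  query=$1   # base search URL, assumed to already contain "?" and one parameter
  shift
  for model in "$@"; do
    query=${query}'&model=!'${model}
  done
  printf '%s\n' "$query"
}

# Example call with a placeholder host name.
negated_query 'http://p2pnode/wgetscript?variable=clt' HadCM3 CCSM
```

Model names containing characters that are special in URLs would need percent-encoding, which this sketch leaves out.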

I'll put it on our to-do list.

thanks, Luca

On Dec 14, 2011, at 8:05 AM, Estanislao Gonzalez wrote:

Hi Jennifer,

AFAIK no, it's not possible to negate values, but that doesn't seem to correspond with the definition of the use-case you've presented.
AFAICT the user would just issue the same search and new files would be added (what about changed ones and those deleted from the server because of new versions?)

This all might be possible to implement, but before people start coding in a frenzy, I think there are a couple of things that should be said:
1) I don't think any system should provide for every single use case. CMIP5 is nothing like CMIP3; that is of course about the amount of data (impossible to store centrally, because of both storage and throughput constraints), but also about the federative aspect (which allows every institution some freedom in how it provides the data and forces it to provide the know-how for this as well) and the quality assurance (data changes a lot until it gets verified).
2) The main idea behind the P2P, and where we all want Gateway 2.0 to go, is modularity. I think this is a required architectural decision that we need to embrace. So I don't think it's a good idea to extend the server side to provide functionality typically available at the client, like download management.

So, the way I picture this is:
1) get the list of files to be downloaded (in the wget script or by any other means)
2) filter that to remove what is not required

Better yet, rely on a download manager like the DML for this, or extend it if it doesn't provide this. The key here is the ability to:
1) Know when something changed (we have many more users than CMIP3 and they extend beyond the climate community, so the less load the servers have, the better).
For this, both systems have some kind of Atom feed.
2) Do something with this information. This is user dependent. Some will want it kept "synchronized", always gathering the latest version; some will want to have multiple versions so they can compare them; some might just want to be notified; and others won't care at all :-)

I think 2) is not really implemented, although some use cases are already supported in P2P (not sure about Gateway 2.0).
So you could add a cronjob with this:
bash <(wget -qO - 'http://p2pnode/wgetscript?experiment=decadal1960&realm=atmos&time_frequency=month&variable=clt' | grep -v HadCM3)
and as soon as something changes you would have your new files (everything but HadCM3, as you pointed out in your example).

But there are a lot of caveats there, and I don't think we want to encourage users to "misbehave", so we need some intelligence that doesn't try to download thousands of files if not required, uses multi-threading in a useful manner, handles certificate expiration, etc.
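
One small piece of that intelligence, skipping what is already on disk, could look roughly like the sketch below. The function name missing_urls is hypothetical. It assumes the local file name equals the last path component of the URL, and it does not look at versions or checksums, so changed files would go unnoticed.

```shell
# Hypothetical helper: read file URLs on stdin and print only those whose
# file name does not yet exist in the given local directory.
missing_urls() {
  dir=$1
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # ignore empty lines
    name=${url##*/}                    # last path component of the URL
    [ -e "$dir/$name" ] || printf '%s\n' "$url"
  done
}
```

Only the URLs printed by missing_urls would then be handed to the downloader. For plain downloads, wget's -nc (--no-clobber) option gives a similar effect by not re-fetching files that already exist locally.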

I think we should improve the client-side of the CMIP5 system.

My 2c,
Estani

On 14.12.2011 15:30, Jennifer Adams wrote:
Dear Colleagues,
I have thought of another use case for the Gateway 2.0 and P2P developers to consider.

Right now, I am diligently downloading CMIP5 data, gathering runs for a subset of experiments and variables, for all available models and ensemble members. I realize that not all modeling centers have submitted their data yet, so in a few months' time, I will have to go back and fill in the gaps in my collection. At that point, I will have a list of experiments/variables/models/ensembles that I have already acquired, and I will need to search the available CMIP5 data for runs that I do not have.

I know it is unfair to compare to the good old days of using wget to grab CMIP3 data via FTP from ftp-esg.ucllnl.org, but this particular use case used to be so easy … I would just rerun my single wget command and it would fill in the missing files in my collection automatically.

Is there a syntax in the Gateway 2.0 and P2P search URLs that allows for a NOT?
http://esg-datanode.jpl.nasa.gov/esg-search/wget?experiment=decadal1960&realm=atmos&time_frequency=month&variable=clt&model!=HadCM3

Any better ideas or plans for how to do this?

--Jennifer



--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma at cola.iges.org

_______________________________________________
GO-ESSP-TECH mailing list
GO-ESSP-TECH at ucar.edu
http://mailman.ucar.edu/mailman/listinfo/go-essp-tech




--
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de
