[Go-essp-tech] ESG Federation Priorities - Was: NCI OpenIDs not working at PCMDI Gateway

Eric Nienhouse ejn at ucar.edu
Fri Oct 14 09:32:13 MDT 2011


Hi All,

Our NCI OpenID thread was getting rather off topic, so I've started a 
new one.

Good to hear the NCI OpenID issue has been resolved and that the NCI 
node has received a number of accolades for quality service  :-)

I'd like to continue discussing federation priorities, development 
efforts and  replication.  Thanks Stephen and Gavin for summarizing a 
number of efforts in support of data access across the federation, 
including securing OpenDAP, LAS Product Services and replication.

It is most important we all stay focused on interoperability, system 
interfaces and specifications as we move forward.  I believe this is 
especially critical now as many federation efforts are at a high activity 
level.

It's obvious the success and stability of the production ESGF system 
serving a large user base is critical as many users are preparing for 
near term scientific reporting deadlines.  Note, fed wide, we have ~25K 
users, many of whom are active CMIP5 researchers.  Published dataset 
volume and user downloads are rapidly increasing.

To this end I have the questions/comments below.

Regards to all,

-Eric
>
>> It's about people being able to download from multiple sites at the 
>> same time, and especially from a local one.
>> That's pretty much what is happening at IPSL, AFAIK you are indeed 
>> replicating data internally so scientists can get to them much faster.
>> That's the whole idea of replication.
>>
>>     
> There is a replication mechanism in the works - are you volunteering 
> to get this bit of work completed?
Gavin: A couple of questions about this replication mechanism in the 
works regarding interoperability:

Will this work have impact on the Thredds catalog representation of 
replica datasets?  Are you anticipating any changes to the replica 
publication workflow?  I ask as we're working on search scalability, 
metadata transfer and replicas.

> LAS is fully installed and integrated into the ESGF P2P Node.
> As Sebastien noted with the LAS URLs this task has been done.
>
> If you install your ESGF P2P Node with --type compute you will get 
> this configured and installed and you too can provide LAS 
> functionality. :-)  Try it out :-)

Indeed LAS Product Service integration is getting uptake, which is good 
to see.  We're publishing NCAR CMIP5 datasets with LAS endpoints into 
the Gateway 1.3.3 snapshot for pre-release testing.  LAS is a great 
service for visualization and data subset and download.

One concern here at NCAR relates to securing LAS access to CMIP5 
datasets in our production data node.  My understanding is that LAS 
services are not yet under access control in the (compute) node.

Is this correct?  If so, what are the plans for securing this service?  
Is the intention to utilize the OpenDAP security mechanism for doing so?

Thanks for any details you can provide.

On 13/10/2011 13:02, stephen.pascoe at stfc.ac.uk wrote:
> Sébastien and all,
>
> I agree getting all those services in place at one time is the target.  It is challenging that different parts of the federation have different priorities and it's hard work to keep all the different parts in sync.  Some of us need OPeNDAP straight away, some need CIM metadata, some need GridFTP and checksums (for replication), some want visualisation (LAS).  All I can do now is mention a few areas where we are making progress.
>
> OPeNDAP.  I know our OPeNDAP security is broken at present but we've just spent some contractor time figuring out the problem, and we have just pushed the resulting changes to esg-orp.git's devel branch.  This turns several hacks that make OPeNDAP work into configurable options.  
>
> We have also contributed the TDS security testing tool in esg-contrib.git.  Some initial tests show that JPL is the one place where OPeNDAP is working and correctly secured.  At NCI the OPeNDAP aggregations weren't accessible for datasets where the NetCDF was.  Unless you are using the latest esg-orp filters it is likely the OPeNDAP URLs are not correctly secured.  There is also a loophole where if NetCDF files are in a thredds_dataset_root but not explicitly restricted in a THREDDS catalog they can be downloaded.  We hope that the work in esg-orp.git will allow us to close this.
>
> A major bottleneck for us is the time it takes to make AttributeService requests to PCMDI.  We are putting in place a caching AuthorizationService that will reduce AttributeService callouts and should make downloads quicker for both MOHC and IPSL data.  We are also getting end-user configured GridFTP ready for production so that users with large data requirements can start using that.
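
The caching idea described above can be sketched as follows. This is only a minimal illustration in Python, not the esg-orp or AuthorizationService implementation: the class name `CachingAttributeLookup`, the `fetch` callable (standing in for a slow AttributeService callout) and the 300-second default lifetime are all assumptions made for the example.

```python
import time


class CachingAttributeLookup:
    """Cache the result of a slow attribute lookup for a fixed time-to-live.

    `fetch` is a hypothetical stand-in for the remote AttributeService
    request; it takes an OpenID string and returns an iterable of attributes.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._cache = {}             # openid -> (time stored, attributes)

    def attributes(self, openid):
        now = self._clock()
        hit = self._cache.get(openid)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: no remote callout
        attrs = frozenset(self._fetch(openid))
        self._cache[openid] = (now, attrs)
        return attrs
```

The trade-off in any cache of this kind is that a change made at the remote service (for example a revoked group membership) is only seen once the cached entry expires, so the lifetime has to be chosen with that delay in mind.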
>
> So lots is happening and I embrace a competitive spirit amongst datanodes and gateways to get this right.  
>
> And a quick query to Sébastien
>
>   
>> > replication is the gateways priority. My priority is to have happy 
>> > users. And I know they want OpenDap. CORDEX simulations are running now 
>> > and they need OpenDap to subset their download.
>>     
>
> Are your users happy with their access to data from the USA?  Are USA scientists happy with their access to IPSL data?  To be honest we know BADC has a particular problem with bandwidth but I'd be surprised if replication wasn't going to help these users.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
>
> -----Original Message-----
> From: Sébastien Denvil [mailto:sebastien.denvil at ipsl.jussieu.fr] 
> Sent: 13 October 2011 10:07
> To: Estanislao Gonzalez
> Cc: muhammad.atif at anu.edu.au; Eric Nienhouse; Cinquini, Luca (3880); Pascoe, Stephen (STFC,RAL,RALSP); Neill Miller; esg-gateway-dev at earthsystemgrid.org; esg-node-dev at lists.llnl.gov
> Subject: Re: [esg-node-dev] RE: [esg-gateway-dev] NCI OpenIDs not working at PCMDI Gateway
>
>   Hi all, Estani,
>
> just a small comment below:
>
> On 13/10/2011 10:12, Estanislao Gonzalez wrote:
>   
>> > Hi Muhammad,
>> >
>> > It looks great!
>> >
>> > And commenting on Sébastien's remarks: I do agree on OPeNDAP... but since the 
>> > gateways are incapable of mimicking the p2p way of securing the 
>> > aggregations, it is not something the data node admins should really 
>> > prioritize at the moment (at least not until it works). This is how I 
>> > see it:
>> >
>> > Basic:
>> > -DRS structure in both id and urls (this includes: versioning and 
>> > maintaining url/catalog version coherency, more to that later)
>> > -PKI
>> > -Both HTTP and GridFTP server access (BDM gives bonus points, but you 
>> > don't need to publish those endpoints in the catalog anyways  :-) 
>> > -checksums
>> >
>> > extra:
>> > -OPeNDAP Access (which can be broken for aggregations, since there's 
>> > no solution to that at the moment)
>> > -LAS (I have never seen an installation besides the "demo" one with 
>> > this, so it can't be a requirement really, not at the moment)
>>     
>
> is that a demo?
> http://esg-datanode.jpl.nasa.gov/thredds/esgcet/1/obs4MIPs.NASA-JPL.AIRS.mon.v1.html?dataset=obs4MIPs.NASA-JPL.AIRS.mon.husNobs.1.aggregation.1
> http://esg-datanode.jpl.nasa.gov/las/getUI.do?catid=893EB2D5C79AD40EE2436A3F118649CE_ns_obs4MIPs.NASA-JPL.AIRS.mon.husNobs.1.aggregation.1
>
> It looks pretty mature.
>
>   
>> >
>> > why OpeNDAP as an extra? Because at this time, replication is a 
>> > priority. You don't want the whole world to get to your OpenDAP 
>> > server, it would be advisable to get some replicas in place before that.
>> >
>>     
>
>
> replication is the gateways priority. My priority is to have happy 
> users. And I know they want OpenDap. CORDEX simulations are running now 
> and they need OpenDap to subset their download.
>
> I don't mind the whole world getting to my OpenDAP. We will boost the VM 
> as needed to sustain what it takes but OpenDap doesn't consume that many 
> resources and it saves network bandwidth so it's not a bad deal.
>
> cheers.
> Sébastien
>
>
>   
>> > Anyway, I cast my 5 star vote and will use NCI node as an example.  :-)  
>> > Well done Muhammad, really.
>> >
>> > Just to show how another node might look, and I won't do this again 
>> > any other time soon, but I think it's required in order to value a pristine node 
>> > more, let's take noaa-gfdl (a middle class one :-):
>> >
>> > esgdata.gfdl.noaa.gov
>> > - No entry in the wiki page, so no admin to contact.
>> > - datasets with mixed cases:
>> > cmip5.output1.NOAA-GFDL.GFDL-HIRAM-C180.sst2090.mon.atmos.Amon.r3i1p2.v1/     
>> >       -
>> > cmip5.output1.noaa-gfdl.gfdl-hiram-c180.amip.mon.atmos.Amon.r1i1p1.v1/
>> >
>> > - dataset version and directory version mismatch and half-DRS 
>> > structure (this has version 1 in the catalogs):
>> > thredds/fileServer/gfdl_dataroot/NOAA-GFDL/GFDL-HIRAM-C180/amip/fx/atmos/fx/r0i0p0/v20110601/areacella/areacella_fx_GFDL-HIRAM-C180_amip_r0i0p0.nc 
>> >
>> >
>> > - Only HTTPServer access points
>> > - self-signed certificate containing "Globus-Test"
>> > - ORP redirecting to a different machine name (probably same machine, 
>> > but still misconfigured)
>> > - White-list is wrong or incomplete
>> > - because of the above PKI is not working
>> > - They do have checksums and that is really good.
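
The mixed-case item in the list above is the kind of problem that can be checked mechanically. Below is a minimal Python sketch, assuming only that dataset ids are dot-separated facet strings as in the two examples quoted; the function name `find_case_variants` is made up for illustration and is not part of the publisher or any ESG tool.

```python
from collections import defaultdict


def find_case_variants(dataset_ids):
    """Return {lowercased facet: sorted spellings} for every facet value
    that appears with more than one upper/lower-case spelling."""
    spellings = defaultdict(set)
    for dataset_id in dataset_ids:
        for facet in dataset_id.split("."):
            spellings[facet.lower()].add(facet)
    return {key: sorted(found)
            for key, found in spellings.items() if len(found) > 1}


ids = [
    "cmip5.output1.NOAA-GFDL.GFDL-HIRAM-C180.sst2090.mon.atmos.Amon.r3i1p2.v1",
    "cmip5.output1.noaa-gfdl.gfdl-hiram-c180.amip.mon.atmos.Amon.r1i1p1.v1",
]
variants = find_case_variants(ids)
# variants has two keys here, "noaa-gfdl" and "gfdl-hiram-c180",
# each listing both the upper-case and the lower-case spelling.
```

The sketch compares facet values regardless of their position in the id, which keeps it short but is looser than a real DRS validation would be.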
>> >
>> > So that's a pretty standard data node which makes replication much 
>> > more difficult, if not impossible.
>> >
>> > My 2c anyway,
>> > Estani
>> >
>> > Am 13.10.2011 02:40, schrieb Muhammad Atif:
>>     
>>> >> On 13/10/11 02:50, Estanislao Gonzalez wrote:
>>>       
>>>> >>> By the way Muhammad, could you clean the datanode? There are a lot 
>>>> >>> of "unlinked" catalogs:
>>>> >>> http://esgnode1.nci.org.au/thredds/esgcet/3/cmip5.output1.CSIRO-QCCCE.CSIRO-Mk3-6-0.historicalGHG.day.ocean.day.r4i1p1.v20110802.html 
>>>> >>>
>>>> >>> They are just returning 404... I think there's an option for this in 
>>>> >>> the publisher (delete-orphans, or something) or was that intended 
>>>> >>> for something else Bob?
>>>> >>>
>>>> >>> But besides that, your data node looks pristine... version, 
>>>> >>> checksum, DRS-conformant directory structures... even a working GridFTP!!
>>>> >>> We should start a 5 star "quality meter" for data node 
>>>> >>> installations... you'll get a 4.5 (clean the 404s up and I'll cast my 
>>>> >>> 5 star vote ;-)... I think the rest of us start from 4 and go 
>>>> >>> downwards.... But I might be wrong, apologies to any other pristine 
>>>> >>> data node out there... if any.
>>>>         
>>> >>
>>> >> Anything to get 5 stars from you Estani. All done.  :) 
>>> >>
>>> >> I manually removed the entries from catalog.xml in thredds.
>>> >>
>>> >> Regards,
>>> >>
>>>       
>> >
>> >
>>     
>
>
> -- Sébastien Denvil IPSL, Pôle de modélisation du climat UPMC, Case 
> 101, 4 place Jussieu, 75252 Paris Cedex 5 Tour 45-55 2ème étage Bureau 
> 209 Tel: 33 1 44 27 21 10 Fax: 33 1 44 27 39 02