[Go-essp-tech] ESG Federation Priorities - Was: NCI OpenIDs not working at PCMDI Gateway

philip.kershaw at stfc.ac.uk
Mon Oct 17 07:20:34 MDT 2011


Hi Estani,

Yes, from what you say it fits the same use case.

Cheers,
Phil

On 17/10/2011 13:36, "Estanislao Gonzalez" <gonzalez at dkrz.de> wrote:

>Hi,
>
>I don't want to hijack the thread, but I'm very curious about this.
>As you might know, I have been (and will be) working on a stager, to stage
>files from tape. Well, the latest idea crossing my mind was about staging
>pretty much the whole federation. So you could set up a "stage" node that
>caches data being accessed (pretty much like a proxy). In order for that
>to work, security will also need to be "cached"; by that I mean the
>security procedure for getting the original file must be stored and used
>for securing the cached one (a simple and yet powerful procedure would
>be to retrieve the first byte of the original file: if the user can
>do that, he/she can get the whole cached file).
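
The first-byte check described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the origin server is assumed to serve the file over HTTP and to honour Range requests, and `opener` stands for whatever object carries the user's (delegated) credential. It is not an existing ESGF or ORP API.

```python
import urllib.error
import urllib.request


def build_probe_request(origin_url):
    """Build a GET request that asks the origin server for byte 0 only."""
    return urllib.request.Request(origin_url, headers={"Range": "bytes=0-0"})


def user_may_read(origin_url, opener, timeout=30):
    """Return True if `opener` can fetch the first byte of the original file.

    `opener` is assumed to behave like urllib.request.OpenerDirector and to
    present the user's credential. 206 (partial content) and 200 (server
    ignored the Range header) both count as success; an HTTP error such as
    401 or 403 counts as a denial.
    """
    request = build_probe_request(origin_url)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status in (200, 206) and len(response.read(1)) == 1
    except urllib.error.HTTPError:
        return False
```

A stage node would call `user_may_read` with the original file's URL before serving its cached copy. A real deployment would also have to treat a redirect to an ORP login page as a denial, which this sketch does not attempt.
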
>
>The ORP and some delegation are required for this to work. But I
>couldn't find the time to give this more thorough thought...
>Am I mistaken to think this is similar to the problem of securing
>the LAS server?
>
>I would like to know how plausible and interesting this scenario looks to
>you all (just to see if it's worthwhile, or whether I should just keep
>staging local data)
>
>Thanks,
>Estani
>
>On 17.10.2011 14:07, Cinquini, Luca (3880) wrote:
>> Hi Phil,
>>          we are not using static server certs at the moment, only a
>> list of IPs from the registry.
>>
>> I remember your talk at the last go-essp meeting, and it would
>> certainly be good to move in that direction, it's just a question of
>> priorities.
>> In any case, the very first step should be to secure the LAS UI, i.e.
>> to redirect to the ORP, so that the user is authenticated, am I right?
>> Then, once the user is authenticated, obtain a delegated credential to
>> access opendap services.
>>
>> thanks, Luca
>>
>> On Oct 17, 2011, at 1:09 AM,<philip.kershaw at stfc.ac.uk>  wrote:
>>
>>> Hi Luca,
>>>
>>> I think you're saying then that the LAS - OPeNDAP connections are
>>> secured with IP restrictions.  I recall an initial solution was to use
>>> static server certificates.  Did this get deployed or are there any
>>> plans to develop your current system further?
>>>
>>> For the MashMyData project here, we extended ESGF security to enable
>>> user delegation for secured workflows: portal to WPS to OPeNDAP
>>> service.  You could apply the same approach in the case above to get a
>>> LAS instance to use a delegated credential to access a secured OPeNDAP
>>> service.  We are using this approach on a couple of projects here.
>>>
>>> Cheers,
>>> Phil
>>>
>>> On 16/10/2011 14:38, "Cinquini, Luca (3880)" <Luca.Cinquini at jpl.nasa.gov> wrote:
>>>
>>>> Hi Eric:
>>>>
>>>> On Oct 14, 2011, at 9:32 AM, Eric Nienhouse wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Our NCI OpenID thread was getting rather off topic, so I've started a
>>>>> new one.
>>>>>
>>>>> Good to hear the NCI OpenID issue has been resolved and that the NCI
>>>>> node has received a number of accolades for quality service  :-)
>>>>>
>>>>> I'd like to continue discussing federation priorities, development
>>>>> efforts and  replication.  Thanks Stephen and Gavin for summarizing a
>>>>> number of efforts in support of data access across the federation,
>>>>> including securing OpenDAP, LAS Product Services and replication.
>>>>>
>>>>> It is most important we all stay focused on interoperability, system
>>>>> interfaces and specifications as we move forward.  I believe this is
>>>>> especially critical now as many federation efforts are at a high
>>>>> activity level.
>>>>>
>>>>> It's obvious the success and stability of the production ESGF system
>>>>> serving a large user base is critical as many users are preparing for
>>>>> near term scientific reporting deadlines.  Note, fed wide, we have
>>>>> ~25K users, many of whom are active CMIP5 researchers.  Published
>>>>> dataset volume and user downloads are rapidly increasing.
>>>>>
>>>>> To this end I have the questions/comments below.
>>>>>
>>>>> Regards to all,
>>>>>
>>>>> -Eric
>>>>>>> It's about people being able to download from multiple sites at
>>>>>>> the same time, and especially from a local one.
>>>>>>> That's pretty much what is happening at IPSL, AFAIK you are indeed
>>>>>>> replicating data internally so scientists can get to them much
>>>>>>> faster.
>>>>>>> That's the whole idea of replication.
>>>>>>>
>>>>>>>
>>>>>> There is a replication mechanism in the works - are you volunteering
>>>>>> to get this bit of work completed?
>>>>> Gavin: A couple of questions about this replication mechanism in the
>>>>> works regarding interoperability:
>>>>>
>>>>> Will this work have impact on the Thredds catalog representation of
>>>>> replica datasets?  Are you anticipating any changes to the replica
>>>>> publication workflow?  I ask as we're working on search scalability,
>>>>> metadata transfer and replicas.
>>>>>
>>>>>> LAS is fully installed and integrated into the ESGF P2P Node.
>>>>>> As Sebastien noted with the LAS URLs this task has been done.
>>>>>>
>>>>>> If you install your ESGF P2P Node with --type compute you will get
>>>>>> this configured and installed and you too can provide LAS
>>>>>> functionality. :-)  Try it out :-)
>>>>> Indeed LAS Product Service integration is getting uptake, which is
>>>>> good to see.  We're publishing NCAR CMIP5 datasets with LAS endpoints
>>>>> into the Gateway 1.3.3 snapshot for pre-release testing.  LAS is a
>>>>> great service for visualization and data subset and download.
>>>>>
>>>>> One concern here at NCAR relates to securing LAS access to CMIP5
>>>>> datasets in our production data node.  My understanding is that LAS
>>>>> services are not yet under access control in the (compute) node.
>>>>>
>>>>> Is this correct?  If so, what are the plans for securing this
>>>>> service?  Is the intention to utilize the OpenDAP security mechanism
>>>>> for doing so?
>>>> correct - right now, LAS is granted access to the opendap endpoints
>>>> via the IP filter. At some point,
>>>> we started working with PMEL to enable the LAS UI to be able to
>>>> redirect to the ORP, in case the user is not authenticated already,
>>>> but that work was never completed. We can talk about picking it up at
>>>> one of our upcoming conferences.
>>>>
>>>> thanks, Luca
>>>>
>>>>> Thanks for any details you can provide.
>>>>>
>>>>> On 13/10/2011 13:02, stephen.pascoe at stfc.ac.uk wrote:
>>>>>> Sébastien and all,
>>>>>>
>>>>>> I agree getting all those services in place at one time is the
>>>>>> target.  It is challenging that different parts of the federation
>>>>>> have different priorities and it's hard work to keep all the
>>>>>> different parts in sync.  Some of us need OPeNDAP straight away, some
>>>>>> need CIM metadata, some need GridFTP and checksums (for replication),
>>>>>> some want visualisation (LAS).  All I can do now is mention a few
>>>>>> areas where we are making progress.
>>>>>>
>>>>>> OPeNDAP.  I know our OPeNDAP security is broken at present but we've
>>>>>> just spent some contractor time figuring out the problem, and we have
>>>>>> just pushed our changes to esg-orp.git's devel branch.  This turns
>>>>>> several hacks that make OPeNDAP work into configurable options.
>>>>>>
>>>>>> We have also contributed the TDS security testing tool in
>>>>>> esg-contrib.git.  Some initial tests show that JPL is the one place
>>>>>> where OPeNDAP is working and correctly secured.  At NCI the OPeNDAP
>>>>>> aggregations weren't accessible for datasets where the NetCDF was.
>>>>>> Unless you are using the latest esg-orp filters it is likely the
>>>>>> OPeNDAP URLs are not correctly secured.  There is also a loophole
>>>>>> where if NetCDF files are in a thredds_dataset_root but not
>>>>>> explicitly restricted in a THREDDS catalog they can be downloaded.
>>>>>> We hope that the work in esg-orp.git will allow us to close this.
>>>>>>
>>>>>> A major bottleneck for us is the time it takes to make
>>>>>> AttributeService requests to PCMDI.  We are putting in place a
>>>>>> caching AuthorizationService that will reduce AttributeService
>>>>>> callouts and should make downloads quicker for both MOHC and IPSL
>>>>>> data.  We are also getting end-user configured GridFTP ready for
>>>>>> production so that users with large data requirements can start
>>>>>> using that.
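
The caching idea in the paragraph above can be illustrated with a small time-limited cache placed in front of a slow lookup. This is a sketch under assumptions, not the service being deployed: `CachingAuthorizer`, the `lookup` callable (standing in for a remote AttributeService callout) and the 300-second lifetime are all hypothetical.

```python
import time


class CachingAuthorizer:
    """Remember authorization decisions for a limited time.

    `lookup(openid, resource)` stands in for the slow remote callout and
    must return True or False. Both grants and denials are cached until
    `ttl_seconds` have elapsed.
    """

    def __init__(self, lookup, ttl_seconds=300.0, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (openid, resource) -> (decision, time stored)

    def is_authorized(self, openid, resource):
        key = (openid, resource)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # still fresh: no remote callout needed
        decision = self._lookup(openid, resource)
        self._cache[key] = (decision, now)
        return decision
```

The lifetime is a trade-off: a longer value saves more callouts, but a revoked permission stays usable until the cached entry expires.
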
>>>>>>
>>>>>> So lots is happening and I embrace a competitive spirit amongst
>>>>>> datanodes and gateways to get this right.
>>>>>>
>>>>>> And a quick query to Sébastien
>>>>>>
>>>>>>
>>>>>>>> replication is the gateways priority. My priority is to have happy
>>>>>>>> users. And I know they want OpenDap. CORDEX simulations are
>>>>>>>> running now
>>>>>>>> and they need OpenDap to subset their download.
>>>>>> Are your users happy with their access to data from the USA?  Are
>>>>>> USA scientists happy with their access to IPSL data?  To be honest we
>>>>>> know BADC has a particular problem with bandwidth but I'd be
>>>>>> surprised if replication wasn't going to help these users.
>>>>>>
>>>>>> Cheers,
>>>>>> Stephen.
>>>>>>
>>>>>> ---
>>>>>> Stephen Pascoe  +44 (0)1235 445980
>>>>>> Centre of Environmental Data Archival
>>>>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Sébastien Denvil [mailto:sebastien.denvil at ipsl.jussieu.fr]
>>>>>> Sent: 13 October 2011 10:07
>>>>>> To: Estanislao Gonzalez
>>>>>> Cc: muhammad.atif at anu.edu.au; Eric Nienhouse; Cinquini, Luca (3880);
>>>>>> Pascoe, Stephen (STFC,RAL,RALSP); Neill Miller;
>>>>>> esg-gateway-dev at earthsystemgrid.org; esg-node-dev at lists.llnl.gov
>>>>>> Subject: Re: [esg-node-dev] RE: [esg-gateway-dev] NCI OpenIDs not
>>>>>> working at PCMDI Gateway
>>>>>>
>>>>>> Hi all, Estani,
>>>>>>
>>>>>> just a small comment below:
>>>>>>
>>>>>> On 13/10/2011 10:12, Estanislao Gonzalez wrote:
>>>>>>
>>>>>>>> Hi Muhammad,
>>>>>>>>
>>>>>>>> It looks great!
>>>>>>>>
>>>>>>>> And commenting on Sébastien's remarks: I do agree on OPeNDAP... but
>>>>>>>> the gateways are incapable of mimicking the p2p way of securing the
>>>>>>>> aggregations, so it is not something the data node admins should
>>>>>>>> really prioritize at the moment (at least not until it works). This
>>>>>>>> is how I see it:
>>>>>>>>
>>>>>>>> Basic:
>>>>>>>> -DRS structure in both id and urls (this includes: versioning and
>>>>>>>> maintaining url/catalog version coherency, more on that later)
>>>>>>>> -PKI
>>>>>>>> -Both HTTP and GridFTP server access (BDM gives bonus points, but
>>>>>>>> you don't need to publish those endpoints in the catalog anyway  :-)
>>>>>>>> -checksums
>>>>>>>>
>>>>>>>> extra:
>>>>>>>> -OPeNDAP Access (which can be broken for aggregations, since
>>>>>>>> there's no solution to that at the moment)
>>>>>>>> -LAS (I have never seen an installation besides the "demo" one
>>>>>>>> with this, so it can't be a requirement really, not at the moment)
>>>>>> is that a demo?
>>>>>>
>>>>>> http://esg-datanode.jpl.nasa.gov/thredds/esgcet/1/obs4MIPs.NASA-JPL.AIRS.mon.v1.html?dataset=obs4MIPs.NASA-JPL.AIRS.mon.husNobs.1.aggregation.1
>>>>>>
>>>>>> http://esg-datanode.jpl.nasa.gov/las/getUI.do?catid=893EB2D5C79AD40EE2436A3F118649CE_ns_obs4MIPs.NASA-JPL.AIRS.mon.husNobs.1.aggregation.1
>>>>>>
>>>>>> It looks pretty mature.
>>>>>>
>>>>>>
>>>>>>>> why OpeNDAP as an extra? Because at this time, replication is a
>>>>>>>> priority. You don't want the whole world to get to your OpenDAP
>>>>>>>> server, it would be advisable to get some replicas in place before
>>>>>>>> that.
>>>>>>>>
>>>>>>
>>>>>> replication is the gateways' priority. My priority is to have happy
>>>>>> users. And I know they want OpenDap. CORDEX simulations are running
>>>>>> now and they need OpenDap to subset their download.
>>>>>>
>>>>>> I don't mind the whole world getting to my OpenDAP. We will boost the
>>>>>> VM as needed to sustain what it takes but OpenDap doesn't consume that
>>>>>> many resources and it saves network bandwidth so it's not a bad deal.
>>>>>>
>>>>>> cheers.
>>>>>> Sébastien
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Anyway, I cast my 5 star vote and will use NCI node as an example.
>>>>>>>> :-)
>>>>>>>> Well done Muhammad, really.
>>>>>>>>
>>>>>>>> Just to show how another node might look (I won't do this again
>>>>>>>> any time soon, but I think it's required in order to value a
>>>>>>>> pristine node more), let's take noaa-gfdl (a middle class one :-):
>>>>>>>>
>>>>>>>> esgdata.gfdl.noaa.gov
>>>>>>>> - No entry in the wiki page, so no admin to contact.
>>>>>>>> - datasets with mixed cases:
>>>>>>>>
>>>>>>>>      - cmip5.output1.NOAA-GFDL.GFDL-HIRAM-C180.sst2090.mon.atmos.Amon.r3i1p2.v1/
>>>>>>>>      - cmip5.output1.noaa-gfdl.gfdl-hiram-c180.amip.mon.atmos.Amon.r1i1p1.v1/
>>>>>>>>
>>>>>>>> - dataset version and directory version mismatch and half-DRS
>>>>>>>> structure (this has version 1 in the catalogs):
>>>>>>>>
>>>>>>>> thredds/fileServer/gfdl_dataroot/NOAA-GFDL/GFDL-HIRAM-C180/amip/fx/atmos/fx/r0i0p0/v20110601/areacella/areacella_fx_GFDL-HIRAM-C180_amip_r0i0p0.nc
>>>>>>>>
>>>>>>>>
>>>>>>>> - Only HTTPServer access points
>>>>>>>> - self-signed certificate containing "Globus-Test"
>>>>>>>> - ORP redirecting to a different machine name (probably same
>>>>>>>> machine, but still misconfigured)
>>>>>>>> - White-list is wrong or incomplete
>>>>>>>> - because of the above PKI is not working
>>>>>>>> - They do have checksums and that is really good.
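
Two of the problems in the list above, mixed-case dataset IDs and a dataset version that differs from the version directory in the file path, lend themselves to an automated check. The following is a rough sketch only: the helper names are hypothetical, and it assumes (from the IDs quoted above) that the institute is the third dot-separated field and that versions appear as "v" followed by digits.

```python
import re
from collections import defaultdict

_VERSION = re.compile(r"^v(\d+)$")


def dataset_version(dataset_id):
    """Return the numeric version at the end of a dataset ID, or None."""
    last = dataset_id.rstrip("/").split(".")[-1]
    match = _VERSION.match(last)
    return int(match.group(1)) if match else None


def path_version(url_path):
    """Return the first 'v<digits>' directory found in a path, or None."""
    for segment in url_path.split("/"):
        match = _VERSION.match(segment)
        if match:
            return int(match.group(1))
    return None


def case_variants(dataset_ids, field_index=2):
    """Map each lower-cased field value to its spellings, if more than one.

    field_index=2 is assumed to select the institute field of the ID.
    """
    seen = defaultdict(set)
    for dataset_id in dataset_ids:
        parts = dataset_id.rstrip("/").split(".")
        if len(parts) > field_index:
            seen[parts[field_index].lower()].add(parts[field_index])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}
```

Comparing `dataset_version` of a catalog entry with `path_version` of one of its file URLs flags the kind of mismatch described above, and a non-empty result from `case_variants` flags inconsistent casing across a node's dataset IDs.
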
>>>>>>>>
>>>>>>>> So that's a pretty standard data node which makes replication much
>>>>>>>> more difficult, if not impossible.
>>>>>>>>
>>>>>>>> My 2c anyway,
>>>>>>>> Estani
>>>>>>>>
>>>>>>>> On 13.10.2011 02:40, Muhammad Atif wrote:
>>>>>>>>>> On 13/10/11 02:50, Estanislao Gonzalez wrote:
>>>>>>>>>>>> By the way Muhammad, could you clean the datanode? There are a
>>>>>>>>>>>> lot of "unlinked" catalogs:
>>>>>>>>>>>>
>>>>>>>>>>>> http://esgnode1.nci.org.au/thredds/esgcet/3/cmip5.output1.CSIRO-QCCCE.CSIRO-Mk3-6-0.historicalGHG.day.ocean.day.r4i1p1.v20110802.html
>>>>>>>>>>>>
>>>>>>>>>>>> These are returning just 404... I think there's an option for
>>>>>>>>>>>> this in the publisher (delete-orphans, or something) or was that
>>>>>>>>>>>> intended for something else Bob?
>>>>>>>>>>>>
>>>>>>>>>>>> But besides that, your data node looks pristine... version,
>>>>>>>>>>>> checksum, DRS-conformant directory structures... even a working
>>>>>>>>>>>> GridFTP!!
>>>>>>>>>>>> We should start a 5 star data node "quality meter" for data
>>>>>>>>>>>> node installations... you'll get a 4.5 (clean the 404 up and I'll
>>>>>>>>>>>> cast my 5 star vote ;-)... I think the rest of us start from 4
>>>>>>>>>>>> and go downwards.... But I might be wrong, apologies to any
>>>>>>>>>>>> other pristine data node out there... if any.
>>>>>>>>>> Anything to get 5 stars from you Estani. All done.  :)
>>>>>>>>>>
>>>>>>>>>> I manually removed the entries from catalog.xml in thredds.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> -- Sébastien Denvil IPSL, Pôle de modélisation du climat UPMC, Case
>>>>>> 101, 4 place Jussieu, 75252 Paris Cedex 5 Tour 45-55 2ème étage
>>>>>>Bureau
>>>>>> 209 Tel: 33 1 44 27 21 10 Fax: 33 1 44 27 39 02
>>>>>> -- Scanned by iCritical.
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>
>
>
>-- 
>Estanislao Gonzalez
>
>Max-Planck-Institut für Meteorologie (MPI-M)
>Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>
>Phone:   +49 (40) 46 00 94-126
>E-Mail:  gonzalez at dkrz.de
>
