[Go-essp-tech] Improving on token-based authorization for filedownload

Nathan Wilhelmi wilhelmi at ucar.edu
Wed Dec 2 12:47:58 MST 2009


One clarification: the wget scripts should only include tokens if the
particular datanode service is marked as requiring tokens. If the
service is configured in the gateway as not requiring tokens, the
gateway will not generate a token for the download when the script is
generated.
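
To illustrate the rule above, here is a minimal sketch of the
script-generation step. It is not the gateway code: the DataNodeService
record, the wget_line helper, the "token" query parameter and the random
placeholder token are all assumptions made up for the example.

```python
import secrets
from dataclasses import dataclass


@dataclass
class DataNodeService:
    """Hypothetical gateway-side record describing a data node service."""
    base_url: str
    requires_token: bool


def wget_line(service: DataNodeService, path: str) -> str:
    """Build one wget command for a generated download script."""
    url = f"{service.base_url}/{path}"
    if service.requires_token:
        # Placeholder token passed as a query parameter (an assumption);
        # the real token format and transport are not described here.
        url += f"?token={secrets.token_hex(8)}"
    return f"wget '{url}'"
```

A service with requires_token=False yields a plain URL, so the resulting
script would also work against a data node that has no token filter.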

-Nate

Luca Cinquini wrote:
> Hi Phil,
> 	it is still not clear to me whether you guys are going to install an
> ESG Gateway or not... If not, there clearly will be some issues in
> integrating the BADC data access architecture with the ESG Data Node.
> Maybe it can all be resolved by proper use of Tomcat filters in front
> of the Data Node... Coming to your specific questions:
>
> On Dec 2, 2009, at 8:00 AM, <philip.kershaw at stfc.ac.uk> wrote:
>
>   
>> Hi Luca,
>>
>> Just wanted to check up on this: is there now a schedule for  
>> implementing a replacement for the token based authorization?  I'd  
>> like to raise a ticket for this on the GO-ESSP Trac so that we can  
>> keep a record.
>>     
> We don't have a timeline for this yet. Assuming we decide to do it, it
> is quite a big task, and I'm not sure it can be done before the opening
> of the archive, considering everything else that needs to happen...
>   
>> I have concerns that interoperability between us and you is going to
>> break.  Thinking through some possible scenarios:
>>
>> 1) We will have a data node here with the ESG software stack so for  
>> example a TDS protected with the token based system.  If we don't  
>> have a Gateway with the token based functionality does this render  
>> the token system on our data node useless? - If there is no token  
>> issuing functionality in our Gateway then no one can get tokens to  
>> access our TDS? ... Or can you apply for a token at some other ESG  
>> Gateway and get access to the Data Node here? i.e. are tokens  
>> transferable between Gateways/Data Nodes?
>>     
> Currently, the ESG filter in front of the data node will reject
> requests that do not contain a token. Tokens must be generated by the
> gateway where the data was published, so they are not shareable among
> Gateways. If you want to use the ESG Data Node but not the
> authorization tokens issued by the gateway, you may need to replace
> the current token filter with some other mechanism...
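
The filter behaviour described above can be sketched as follows. This is
only an illustration: the HMAC scheme, the shared secret and the
function names are stand-ins invented for the example, not the actual
ESG token mechanism, which is not spelled out in this thread.

```python
import hashlib
import hmac
from typing import Optional


def issue_token(gateway_secret: bytes, url: str) -> str:
    """Hypothetical token: an HMAC of the URL under the gateway's secret."""
    return hmac.new(gateway_secret, url.encode(), hashlib.sha256).hexdigest()


def filter_request(publishing_gateway_secret: bytes, url: str,
                   token: Optional[str]) -> int:
    """Return an HTTP-style status: 200 to pass the request, 403 to reject."""
    if token is None:
        # Requests without a token are rejected outright.
        return 403
    expected = issue_token(publishing_gateway_secret, url)
    # Only a token from the gateway where the data was published validates.
    if hmac.compare_digest(expected.encode(), token.encode()):
        return 200
    return 403
```

In this sketch a token issued by another gateway fails validation, which
mirrors the point that tokens are not shareable among Gateways.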
>   
>> 2) We'll also have our own PyDAP and COWS services as part of our  
>> Data Node.  They're protected with OpenID and the certificate based  
>> wget access I outlined at GO-ESSP.  Can you see problems with other  
>> ESG Gateways referencing these services?  If they just reference the  
>> endpoints directly from a browser then I'm guessing it's OK: the  
>> OpenID sign in process would be initiated here when the given  
>> endpoint was accessed.
>>     
> Probably browser-based access will be ok, although I would need to
> understand more about your system to be sure. Is there a place where
> your openid protection system is documented in excruciating detail?
> (well, maybe not excruciating but detailed enough?). But I am afraid
> that the wget scripts generated by the gateway will not work, since
> they contain tokens...
> thanks, Luca
>   
>> Cheers,
>> Phil
>>
>>     
>>> -----Original Message-----
>>> From: Luca Cinquini [mailto:luca at ucar.edu]
>>> Sent: 17 November 2009 23:13
>>> To: Rachana Ananthakrishnan
>>> Cc: Alex Sim; Neill Miller; Kershaw, Philip (STFC,RAL,SSTD)
>>> Subject: Re: Improving on token-based authorization for filedownload
>>>
>>>
>>>
>>> On Nov 17, 2009, at 2:21 PM, Rachana Ananthakrishnan wrote:
>>>
>>>       
>>>>>> There has been an explicit request to improve the token based
>>>>>> authorization. When you say "all other parts of the system", you
>>>>>> mean functionality not in this alpha release? What would be a
>>>>>> timeline that would work for the Gateway team, as in what release
>>>>>> would you like to target this?
>>>>>>             
>>>>> There are many other things that I think would probably be a higher
>>>>> priority, for example today we talked a lot about supporting the
>>>>> DRS syntax, and there's of course versioning...
>>>>> It would probably be a PI level decision.
>>>>>           
>>>> I'll start a separate thread on this.
>>>>
>>>>         
>>>>>>> And I think the output from the Gateway should still be a SAML
>>>>>>> assertion containing the individual URLs, not the dataset,
>>>>>>> because the GridFTP server does not know which dataset the
>>>>>>> individual files belong to.
>>>>>>>               
>>>>>> Actually I was hoping we can do some wildcard tricks here. If the
>>>>>> Gateway returned an assertion about dataset/* then GridFTP will
>>>>>> simply do a wildcard match. So if the request to the Gateway was for
>>>>>> http://foo.bar:12345/dataset1/file1 and the assertion could cover
>>>>>> http://foo.bar:12345/dataset1/*, then we could do some caching and
>>>>>> save round trips to the authorization service.
>>>>>>             
>>>>> The problem though is that there is no relation between the URL and
>>>>> a dataset identifier... Theoretically, files from different
>>>>> datasets can be contained in the same directory.
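
The wildcard caching idea discussed above could look roughly like the
sketch below. The names (matches, AssertionCache, fake_gateway) are
hypothetical, the gateway is replaced by a stand-in function, and the
returned "assertion" is reduced to a bare URL pattern rather than a real
SAML statement.

```python
from typing import Callable, Optional, Set


def matches(pattern: str, url: str) -> bool:
    """True if url is covered by pattern; only a trailing '/*' is a wildcard."""
    if pattern.endswith("/*"):
        return url.startswith(pattern[:-1])
    return url == pattern


class AssertionCache:
    """Caches authorized URL patterns to save round trips to the gateway."""

    def __init__(self, ask_gateway: Callable[[str], Optional[str]]) -> None:
        self._ask_gateway = ask_gateway
        self._patterns: Set[str] = set()
        self.round_trips = 0

    def is_authorized(self, url: str) -> bool:
        if any(matches(p, url) for p in self._patterns):
            return True  # answered from the cache, no remote call
        self.round_trips += 1
        pattern = self._ask_gateway(url)
        if pattern is None or not matches(pattern, url):
            return False
        self._patterns.add(pattern)
        return True


def fake_gateway(url: str) -> Optional[str]:
    """Hypothetical gateway that authorizes the whole parent directory."""
    return url.rsplit("/", 1)[0] + "/*"
```

Note that the cache authorizes by directory prefix, so it is only safe
if a directory never mixes files from datasets with different access
rules, which is exactly the objection raised above.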
>>>>>
>>>>>           
>>>> Okay. I did not get that from your response on the previous thread
>>>> on this. The second proposal will require a remote round trip per
>>>> file and is not going to help performance in any way. I wonder if we
>>>> can embed the dataset information in the URL and use that for
>>>> caching purposes. I understood from our discussion that it is
>>>> typical to download many files from a given dataset and trying to
>>>> optimize for that is useful - is that a correct characterization?
>>>>         
>>> Yes, correct. The GridFTP server could still make only one request to
>>> the gateway, asking for authorization to all files at once, and
>>> receive a single SAML statement. I'm not sure if this would be too
>>> much data to transfer though, especially for a very large number of
>>> files.
>>> BTW, are we sure that requesting authorization one file at a time
>>> really creates a large overhead, considering that the files to
>>> transfer are pretty big?
>>> Luca
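
One way to frame the overhead question above is to compare one
authorization round trip with the time needed to transfer the file. The
helper below is only a back-of-the-envelope sketch; the function name
and any numbers plugged into it are illustrative assumptions, not
measurements from ESG.

```python
def auth_overhead_fraction(auth_round_trip_s: float,
                           file_size_bytes: float,
                           bandwidth_bytes_per_s: float) -> float:
    """Fraction of per-file wall time spent on one authorization round trip."""
    transfer_s = file_size_bytes / bandwidth_bytes_per_s
    return auth_round_trip_s / (auth_round_trip_s + transfer_s)


# Made-up inputs: 0.1 s round trip, 100 MB/s link.
big = auth_overhead_fraction(0.1, 1e9, 1e8)    # 1 GB file, about 1%
small = auth_overhead_fraction(0.1, 1e6, 1e8)  # 1 MB file, about 91%
```

With inputs like these the per-file request is negligible for large
files and dominant for small ones, so the answer depends on the actual
file sizes and latencies.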
>>>       
>>>> Thanks,
>>>> Rachana
>>>>
>>>>         
>>>>> Luca
>>>>>           
>>>>>>> The same argument applies to Proposal 3. So there should
>>>>>>> probably be only two proposals, #1 and #2=#3
>>>>>>> (with #4 being a combination of the previous two).
>>>>>>>
>>>>>>>               
>>>>>> I'll collapse proposals 3 and 4. The attributes caching is not
>>>>>> useful given we are talking about caching only per connection and
>>>>>> not across connections.
>>>>>>
>>>>>> Rachana
>>>>>>
>>>>>>             
>>>>>>> thanks, Luca
>>>>>>>
>>>>>>>
>>>>>>> On Nov 17, 2009, at 10:24 AM, Rachana Ananthakrishnan wrote:
>>>>>>>
>>>>>>>               
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Here is a write up with proposed solutions for authorization of
>>>>>>>> end user download of files from GridFTP server.
>>>>>>>>
>>>>>>>> http://www.ci.uchicago.edu/wiki/bin/view/ESGProject/EnhancedAuthorization
>>>>>>>>
>>>>>>> I would appreciate a review and feedback from each of you on the
>>>>>>> proposal.
>>>>>>>
>>>>>>> As mentioned there, reworking the wget based download (from
>>>>>>> TDS) is also in the works, but this document deals exclusively
>>>>>>> with the GridFTP support. This has been deemed critical and we
>>>>>>> are being asked for a solution on this in short order - so I
>>>>>>> would appreciate a quick turnaround with comments.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Rachana
>>>>>>>               
