[Go-essp-tech] Access control for data with different QC Level

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Tue Jul 20 09:51:46 MDT 2010


Hi Martin

On Tuesday 20 July 2010 14:58:08 Juckes, Martin (STFC,RAL,SSTD) wrote:
> If the un-replicated (and hence less quality controlled) data is to
> be less widely available, then I think we have to re-consider what
> gets replicated. In particular, the 3-hourly, 2d fields have been
> requested by TGICA for the impacts community (and when I mentioned
> this at a recent meeting with hydrologists they were indeed very
> keen on this data). The current definition of "replicated" excludes
> around 200Tb of 3 hourly data from the decadal projections.


Hmm. I don't know when that happened. Last time I looked it was in ...  
I certainly think it needs to be. I know a lot of folk who will be 
looking to use that.  Perhaps it's worth reminding me what data is not 
being replicated (of the requested). I had thought it was the ocean 3d 
fields + (can't remember, but didn't think it was the tgica data).

> It may be that the last point (which I hadn't noticed before) will
> force us to reconsider the replication issue. TGICA may well want to
> have the data that falls under their request included in the data
> which is migrated/tagged into the IPCC DDC: and this would mean that
> it all would have to be quality controlled all the way to level 3.

By hook or crook we will need this data to make it to level 3. 

Thanks for picking up on this.
Bryan


> 
> Regards,
> Martin
> 
> > -----Original Message-----
> > From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
> > bounces at ucar.edu] On Behalf Of Bryan Lawrence
> > Sent: 20 July 2010 14:19
> > To: go-essp-tech at ucar.edu; Karl E.Taylor
> > Cc: Cinquini, Luca (3880)
> > Subject: Re: [Go-essp-tech] Access control for data with different
> > QC Level
> > 
> > Hi Martina
> > 
> > I think answering the political and technical in one shot might
> > help here, so I'm going to try. This was my understanding of all
> > the various conversations we have had.
> > 
> > Karl, Balaji, Martin: can I trouble you to read through? I'll
> > highlight the bits where you need to pay attention!
> > 
> > > 1. My understanding was that at QC L1 the CMIP5 modelling
> > > centres, at QC L2 non-commercial researchers and at QC L3 every
> > > registered user can access the data.
> > 
> > The issue of commercial v non-commercial is a decision for the
> > modelling centres. Given the Met Office is now allowing commercial
> > use, it may be that it has gone away. But that's a licensing
> > decision.
> > 
> > So functionally:
> > 
> > access_token_required (dataset) =
> > 
> >      f(qc_level(dataset), license_type(dataset))
> > 
> > Currently we expect PCMDI to allocate tokens to users.
> > 
> > I expect the following three classes of tokens:
> > 
> > unrestricted_use*
> > noncommercial_use
> > testing
> > 
> > (* unrestricted still requires citation, I'll get to what I mean by
> > citation below.)
> > 
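The mapping above can be sketched in Python. This is an illustration only: the function signature, the integer QC levels, the license strings, and the rule that QC level 1 data always needs the testing token are assumptions made for the example, not an agreed policy.

```python
from enum import Enum


class Token(Enum):
    """The three token classes named in the message."""
    UNRESTRICTED_USE = "unrestricted_use"
    NONCOMMERCIAL_USE = "noncommercial_use"
    TESTING = "testing"


def access_token_required(qc_level: int, license_type: str) -> Token:
    """Return the token a user needs for a dataset (illustrative rule only)."""
    if qc_level not in (1, 2, 3):
        raise ValueError(f"unknown QC level: {qc_level}")
    # Assumption: data that has only reached QC level 1 needs the testing token.
    if qc_level == 1:
        return Token.TESTING
    # Assumption: at QC levels 2 and 3 the license chosen by the modelling
    # centre decides between the two remaining tokens.
    if license_type == "unrestricted":
        return Token.UNRESTRICTED_USE
    if license_type == "noncommercial":
        return Token.NONCOMMERCIAL_USE
    raise ValueError(f"unknown license type: {license_type}")
```

Keeping the rule in one function means a change of policy (for example, treating QC level 2 differently from level 3) touches one place only.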
> > And we need machinery to allocate access_token_required to specific
> > datasets.
> > 
> > The questions then become:
> > 
> > a) on what grounds and how does PCMDI allocate the tokens?
> > 
> > How should be easy, being part of the user management tooling. I
> > don't know what the state of generic ESG tooling for that is, but
> > we have this sort of tooling available as part of our normal
> > infrastructure.
> > 
> > On what grounds is more interesting. I'll postpone that a moment.
> > 
> > b) when, where and how do we set up the "table" which maps
> > access_token_required onto the dataset?
> > 
> > Step 1) we need to record qc information.
> > 
> >  - the plan is to build a tool for that in September, to be
> >    complete by the end of September, and it will export CIM quality
> >    documents via an atom feed. It will be independent of the
> >    questionnaire, and folks could deploy it anywhere, even on a
> >    data node.
> > 
> >    (Martina: that can cover the DOI information too.)
> >    (I have someone in mind to do the work.)
> >  - we get qc level one for free (the data can't be published by a
> >    data node without being qc level one).
> >  - given that qc level 2 can only be dealt with at DKRZ, PCMDI and
> >    BADC since it will apply to replicated data, then we only need
> >    to deploy the tool there, and gateways will only need to harvest
> >    from three places
> >  - qc level 3 information can only be made available at DKRZ
> > 
> > Step 2) the qc information needs to be harvested.
> >  - this is a gateway issue; gateways need only harvest from the
> >    three above, plus the QC one at DKRZ.
> > 
> >  - It needs to be mapped onto each replica.
> > 
> > Step 3) this information needs to propagate into the PDP (or
> > whatever we are calling the policy decision point, I've lost track
> > of the names).
> > 
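Steps 1 to 3 can be sketched as a small table-building routine. Everything named here is hypothetical: the `QCRecord` fields, the idea that all replicas share one dataset identifier, and the rule of keeping the highest level seen are assumptions for illustration, not the design of the planned tool or of the gateways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCRecord:
    dataset_id: str  # hypothetical identifier shared by all replicas
    qc_level: int    # 2 or 3, as recorded by the QC tool
    source: str      # where the record was harvested, e.g. "DKRZ"


def build_qc_table(harvested, published_ids):
    """Map each dataset id to its highest known QC level."""
    # Step 1: anything published by a data node is QC level 1 by definition.
    table = {dataset_id: 1 for dataset_id in published_ids}
    # Step 2: harvested records can only raise the level of a dataset.
    for record in harvested:
        current = table.get(record.dataset_id, 1)
        table[record.dataset_id] = max(current, record.qc_level)
    # Step 3: this table is what would be handed to the policy decision point.
    return table
```

Because the table is keyed by dataset identifier rather than by file location, every replica of a dataset picks up the same QC level under these assumptions.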
> > Karl, Balaji, Martin:
> > 
> > At this point, we should recognise that we have the ability to
> > discriminate  between
> > QC  L1
> > QC  L2 *only for replicants*
> > QC  L3 *only for replicants*
> > 
> > We will assign DOIs *only for replicants*.  (At least in the first
> > instance).
> > 
> > This brings me back to on what grounds should we allocate access
> > tokens, and what licenses should be associated with them.
> > 
> > I thought we had agreed on something like (abbreviated, the exact
> > wording needs agreement as per Balaji's email):
> > 
> > testing: you can use this data to exercise this software, and
> > report issues with the data to the originator. You may not publish
> > science with this data without the express permission of the data
> > originator.
> > 
> > unrestricted: you can do anything you like with the data but you
> > must include citations in publications.
> > 
> > non-commercial: there are some restrictions on use, and you must
> > include citations in publications ...
> > 
> > Before continuing, this brings me to a point of disagreement with
> > Karl. Users *should* absolutely care about the distinction between
> > replicated and non-replicated data. It's a quality thing. They can
> > use the qc stuff with more confidence.
> > 
> > However, as it stands, we can't give a DOI to output which is not
> > replicated, but people will need to use it. I *do* think it's ok to
> > restrict this to modellers (despite Martin's point about what PCMDI
> > are advertising). I think most of the non-modelling community will
> > be happy with the replicated data ...
> > 
> > ... and I think WGCM will buy that argument.
> > 
> > But for the modellers using the L1 data which cannot be qc'd, then
> > we need a form of words for an old style acknowledgement or a
> > citation into the data equivalent of the "grey literature".
> > (probably ok to give a url.)
> > 
> > So, now the criteria.
> > 
> > testing should be given to modellers as required by the
> > originators; the other two in the normal way by default to anyone
> > for replicated data, and only to special people who sign up to the
> > restriction above for the non-replicated data.
> > 
> > (nb: nothing in the above precludes downloading and using
> > replicated data from other than DKRZ, BADC, PCMDI ... if you have
> > the tokens.)
> > 
> > So that's how I thought we'd agreed it all, but I concede it had
> > never been written down in one place.
> > 
> > Cheers
> > Bryan
> > 
> > > 2. Bryan please correct me: There is QC L1 as in 1. and after QC
> > > L2 and QC L3 all registered users have access to the core data.
> > > Maybe only non-commercial researchers are granted access to the
> > > non-core data.
> > > 
> > > This is more a political issue.
> > > 
> > > In either case the QC Level has to be communicated to the ESG.
> > > Luca suggests that the portal uses the AtomFeed of the
> > > questionnaire to harvest the QC Flag. And after QC L3 the DOI
> > > link as well. QC and DOI are information about data, so the right
> > > place in metafor CIM would be the dataObject on the hierarchy
> > > level "DRS experiment". Which parts of CIM do you harvest?
> > > 
> > > My biggest question at the moment is how to deliver the QC
> > > information to CIM. For the DOI target page a few additional
> > > pieces of information are needed on citation and contact.
> > > Stephen suggested typing them into the questionnaire. This would
> > > slow the publication process down and is error-prone. We need an
> > > automated CIM update there. The metafor people were against that
> > > solution as well because the questionnaire is meant for an initial
> > > metadata ingest by the modeling centers. Bryan, how do we get the
> > > information into the questionnaire, so that it can be harvested by
> > > the ESG?
> > > What would be the alternatives to the AtomFeed/questionnaire as
> > > harvesting source for the quality level and DOI information?
> > > 
> > > My second biggest question is where to put the information in the
> > > CIM. I sent my interpretation / suggestion to the metafor list,
> > > but it didn't start a discussion. Examples for a simulationRun
> > > object, on how the dataObjects are referenced and on how the
> > > dataObject hierarchies are built, would be of great help. Or
> > > metafor just defines how I should send the quality information
> > > to them.
> > > 
> > > I moved away from the technical issues, but to solve these things
> > > is the precondition for the technical solution in the ESG.
> > > 
> > > Thanks a lot,
> > > Martina
> > > 
> > > V. Balaji wrote:
> > > > I know we discussed this at the Princeton workshop. I didn't
> > > > register some of the implications then.
> > > > 
> > > > I agree that in a technical sense, yes a dataset is "available"
> > > > to registered users as soon as it is passed by the publisher.
> > > > (QCL1-D). At that point, however, it's incompletely
> > > > documented, so I'm not sure it can be declared fully
> > > > compliant.
> > > > 
> > > > My understanding is that while users are free to begin working
> > > > with the data, they can publish results from the data only when
> > > > the dataset is citable, which means it has undergone more
> > > > rigorous QC. What they downloaded before QC-L2 is certainly
> > > > use-at-your-own-risk because L2's when the "semantic QC" kicks
> > > > in. And without QC-L3 it isn't citable.
> > > > 
> > > > I think there is a pretty strong feeling that the modeling
> > > > centers' data were used too often without citation or
> > > > acknowledgment last time, which is what some of the more formal
> > > > QC levels this time, e.g. DOIs tied to data publication, are
> > > > trying to avoid. Assuming the QC document is adopted by the WGCM,
> > > > it will be a requirement for downstream users to cite datasets.
> > > > 
> > > > So, QC-L1D data are "available" in the sense that the 1s and 0s
> > > > may be downloaded, but they're not licensed yet for "do whatever
> > > > you like with them"... perhaps?
> > > > 
> > > > It's pretty important that we come up with language that is
> > > > clear about what one can and cannot do with data at various
> > > > levels of QC. I've talked with Karl and Ron and others about
> > > > making WGCM the authority for this, so whatever words we use
> > > > have to be run by them.
> > > > 
> > > > Thanks,
> > > > 
> > > > Cinquini, Luca (3880) writes:
> > > >> Hi Estani,
> > > >> 
> > > >> I concur with what Eric said, and to reiterate, my
> > > >> understanding is that as soon as the data is published with
> > > >> QCL1, it will be available to registered users. Maybe Bob, Dean
> > > >> or Karl can comment if my understanding is correct or not.
> > > >> thanks, Luca
> > > >> 
> > > >> On Jul 19, 2010, at 2:52 PM, Eric Nienhouse wrote:
> > > >>> Hi All,
> > > >>> 
> > > >>> We've had a number of discussions on the topic of QC level
> > > >>> and data access.  However, I feel we don't yet have a formal
> > > >>> definition of the requirements relating to this area.
> > > >>> 
> > > >>> I believe it is important to clarify and define the following
> > > >>> two QC related areas:
> > > >>> 
> > > >>> 1)  Who is the authoritative source of the QC level and how is
> > > >>> this information propagated through the system?
> > > >>> 
> > > >>> 2)  How does QC level apply to data access policy (eg. access
> > > >>> control)?
> > > >>> 
> > > >>> I would propose discussing this as a future GO-ESSP telco
> > > >>> agenda topic, with the intention we document the outcome.
> > > >>> 
> > > >>> Perhaps we can discuss this further via email and work
> > > >>> towards capturing the system requirements and related
> > > >>> policies in the meanwhile.
> > > >>> 
> > > >>> Please note that there are plans to expose the QC Level
> > > >>> within the Gateway UI once the data flow is identified. 
> > > >>> However, data access control is based upon the group (eg.
> > > >>> role) auth-z attribute (such as "CMIP5 Research") and does
> > > >>> not currently rely on the QC Level explicitly.
> > > >>> 
> > > >>> Thanks,
> > > >>> 
> > > >>> -Eric
> > > >>> 
> > > >>> Estanislao Gonzalez wrote:
> > > >>>> Hi Luca,
> > > >>>> 
> > > >>>> to sum things up (and correct me Martina/Bryan if I'm
> > > >>>> wrong):
> > > >>>> 
> > > >>>> 1) Published data have QC L1-Data "per se",  and will be
> > > >>>> available to a very selected group only (which doesn't seem
> > > >>>> to be the group you mention, but I might be wrong).
> > > >>>> 2) When acquiring QC L2 the data should be accessible to a
> > > >>>> broader although still confined group. This check will be
> > > >>>> performed by DKRZ and BADC and the information stored
> > > >>>> somewhere (not sure where though). Neither BADC nor DKRZ has
> > > >>>> access to all data-nodes, so the information will
> > > >>>> definitely be stored on some "neutral grounds" (CIM DB?).
> > > >>>> 3) QC L3 == DOI acquired == publication. At this stage data
> > > >>>> will be available to any registered user.
> > > >>>> 
> > > >>>> If I'm correct, then the security service must check
> > > >>>> "somehow" the QC level of the file in order to proceed with
> > > >>>> the authorization as it is currently implemented (thus
> > > >>>> comparing roles).
> > > >>>> 
> > > >>>> Any comments anyone?
> > > >>>> 
> > > >>>> Thanks,
> > > >>>> Estani
> > > >>>> 
> > > >>>> Cinquini, Luca (3880) wrote:
> > > >>>>> Hi Bryan, Martina,
> > > >>>>> I agree that these issues need to be discussed better, but
> > > >>>>> here are some considerations, which may in some cases only
> > > >>>>> reflect my understanding:
> > > >>>>> 
> > > >>>>> 1) we talked about the QC flag for Levels 2 and 3 to be set
> > > >>>>> in the metafor questionnaire, and be propagated through
> > > >>>>> the atom feed to the gateways
> > > >>>>> 
> > > >>>>> 2) I thought that in order not to delay data distribution,
> > > >>>>> as soon as the data has QC level 1 (i.e. it has been
> > > >>>>> processed by the publisher), it will be available to
> > > >>>>> registered users of the CMIP5 research and commercial
> > > >>>>> groups
> > > >>>>> 
> > > >>>>> 3) At this time there is nothing in the ESG access control
> > > >>>>> model that ties the access attributes to the QC flags.
> > > >>>>> 
> > > >>>>> Thanks, luca
> > > >>>>> 
> > > >>>>> On Jul 19, 2010, at 7:39 AM, Bryan Lawrence
> > > >>>>> <bryan.lawrence at stfc.ac.uk> wrote:
> > > >>>>>> Hi Martina
> > > >>>>>> 
> > > >>>>>> We definitely need to formalise some of this, so thanks
> > > >>>>>> for bringing it up.
> > > >>>>>> 
> > > >>>>>> What I had thought we were proposing was that L2 and L3
> > > >>>>>> data have effectively the same restrictions ...
> > > >>>>>> 
> > > >>>>>> ... but your fundamental point (I think) is how do we
> > > >>>>>> assign the QC, and how does the security software get
> > > >>>>>> that information? I.e. what is the workflow that needs to
> > > >>>>>> exist? We do need to bottom that out.
> > > >>>>>> 
> > > >>>>>> Thanks
> > > >>>>>> Bryan
> > > >>>>>> 
> > > >>>>>> On Monday 19 July 2010 13:43:59 Martina Stockhause wrote:
> > > >>>>>>> Hi all,
> > > >>>>>>> 
> > > >>>>>>> I had a little discussion with Estani about how the
> > > >>>>>>> different and changing access constraints on the data
> > > >>>>>>> depending on their QC levels are realized. It came out
> > > >>>>>>> that we don't really know.
> > > >>>>>>> 
> > > >>>>>>> We have on the one hand the user with a special role e.g.
> > > >>>>>>> "scientific, non-commercial user", who has access to data
> > > >>>>>>> on QC L3 like every registered user and QC L2 because of
> > > >>>>>>> his role. On the other hand, the data has a quality
> > > >>>>>>> attribute (QC Level or QC Flag), which defines the
> > > >>>>>>> access restriction of the data. For data access a
> > > >>>>>>> mechanism has to check user role and data attribute,
> > > >>>>>>> before access is granted or denied.
> > > >>>>>>> 
> > > >>>>>>> How does the data get this quality attribute?
> > > >>>>>>> How is the user role checked against this quality
> > > >>>>>>> attribute?
> > > >>>>>>> 
> > > >>>>>>> For QC L3 we don't need that mechanism, because every
> > > >>>>>>> registered user has access to all CMIP5 data, but for QC
> > > >>>>>>> L1 and L2 such access restrictions exist.
> > > >>>>>>> 
> > > >>>>>>> Thanks a lot,
> > > >>>>>>> Martina
> > > >>>>>>> 
> > > >>>>>>> 
> > > >>>>>>> _______________________________________________
> > > >>>>>>> GO-ESSP-TECH mailing list
> > > >>>>>>> GO-ESSP-TECH at ucar.edu
> > > >>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> > > >>>>>> 
> > > >>>>>> --
> > > >>>>>> Bryan Lawrence
> > > >>>>>> Director of Environmental Archival and Associated Research
> > > >>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > > >>>>>> STFC, Rutherford Appleton Laboratory
> > > >>>>>> Phone +44 1235 445012; Fax ... 5848;
> > > >>>>>> Web: home.badc.rl.ac.uk/lawrence
> > 

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

