[Go-essp-tech] Access control for data with different QC Level

Tue Jul 20 07:58:08 MDT 2010

If the un-replicated (and hence less quality controlled) data is to be
less widely available, then I think we have to re-consider what gets
replicated. In particular, the 3-hourly, 2d fields have been requested
by TGICA for the impacts community (and when I mentioned this at a
recent meeting with hydrologists they were indeed very keen on this
data). The current definition of "replicated" excludes around 200Tb of
3-hourly data from the decadal projections.

It may be that the last point (which I hadn't noticed before) will force
us to reconsider the replication issue. TGICA may well want to have the
data that falls under their request included in the data which is
migrated/tagged into the IPCC DDC: and this would mean that it all would
have to be quality controlled all the way to level 3.

Regards,
Martin

> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
> bounces at ucar.edu] On Behalf Of Bryan Lawrence
> Sent: 20 July 2010 14:19
> To: go-essp-tech at ucar.edu; Karl E.Taylor
> Cc: Cinquini, Luca (3880)
> Subject: Re: [Go-essp-tech] Access control for data with different QC
> Level
> 
> Hi Martina
> 
> I think answering the political and technical in one shot might help
> here, so I'm going to try. This was my understanding of all the various
> conversations we have had.
> 
> Karl, Balaji, Martin: can I trouble you to read through. I'll highlight
> the bits where you need to pay attention!
> 
> > 1. My understanding was that at QC L1 the CMIP5 modelling centres, at
> > QC L2 non-commercial researchers and at QC L3 every registered user
> > can access the data.
> 
> The issue of commercial v non-commercial is a decision for the modelling
> centres. Given the Met Office is now allowing commercial use, it may be
> that it has gone away. But that's a licensing decision.
> 
> So functionally:
> 
> access_token_required (dataset) =
>      f(qc_level(dataset), license_type(dataset))
> 
> Currently we expect PCMDI to allocate tokens to users.
> 
> I expect the following three classes of tokens:
> 
> unrestricted_use*
> noncommercial_use
> testing
> 
> (* unrestricted still requires citation, I'll get to what I mean by
> citation below.)
> 
> And we need machinery to allocate access_token_required to specific
> datasets.
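
As a rough illustration of the function written above, here is a minimal Python sketch. The three token names come from the message; the mapping rule itself (QC level 1 needs the testing token, otherwise the licence decides) and the licence string "noncommercial" are assumptions made for the example, not something agreed in this thread.

```python
from enum import Enum


class Token(Enum):
    """The three token classes named in the message."""
    UNRESTRICTED_USE = "unrestricted_use"
    NONCOMMERCIAL_USE = "noncommercial_use"
    TESTING = "testing"


def access_token_required(qc_level: int, license_type: str) -> Token:
    """Return the token a user must hold to access a dataset.

    Assumed rule (hypothetical): data that has only reached QC level 1
    needs the testing token; at levels 2 and 3 the licence decides.
    """
    if qc_level not in (1, 2, 3):
        raise ValueError(f"unknown QC level: {qc_level}")
    if qc_level == 1:
        return Token.TESTING
    if license_type == "noncommercial":  # hypothetical licence label
        return Token.NONCOMMERCIAL_USE
    return Token.UNRESTRICTED_USE
```

Whatever rule is finally agreed would replace the body of the function; the point is only that the required token is derived from two per-dataset attributes.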
> 
> The questions then become:
> 
> a) on what grounds and how does PCMDI allocate the tokens?
> 
> How should be easy, being part of the user management tooling. I don't
> know what the state of generic ESG tooling for that is, but we have this
> sort of tooling available as part of our normal infrastructure.
> 
> On what grounds is more interesting. I'll postpone that a moment.
> 
> b) when, where and how, do we set up the "table" which maps
> access_token_required onto the dataset.
> 
> Step 1) we need to record qc information.
>  - the plan is to build a tool for that in September, to be complete by
>    the end of September, and it will export CIM quality documents via an
>    atom feed. It will be independent of the questionnaire, and folks could
>    deploy it anywhere, even on a data node.
>    (Martina: that can cover the DOI information too.)
>    (I have someone in mind to do the work.)
>  - we get qc level one for free (the data can't be published by a data
>    node without being qc level one).
>  - given that qc level 2 can only be dealt with at DKRZ, PCMDI and BADC
>    since it will apply to replicated data, then we only need to deploy the
>    tool there, and gateways will only need to harvest from three places
>  - qc level 3 information can only be made available at DKRZ
> 
> Step 2) the qc information needs to be harvested.
>  - this is a gateway issue; we need only harvest from the three above,
>    plus the QC one at DKRZ.
>  - It needs to be mapped onto each replica.
> 
> Step 3) this information needs to propagate into the PDP (or whatever
> we
> are calling the policy decision point, I've lost track of the names).
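
The three steps above (record, harvest, map onto replicas before handing to the policy decision point) could be sketched as follows. This is only an illustration: the real CIM quality document format is not described in this thread, so the feed layout, the "urn:example:qc-level" category scheme and the dataset and replica names are all made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
QC_SCHEME = "urn:example:qc-level"  # hypothetical category scheme

# Made-up sample feed; a real feed would be fetched from a QC tool.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>QC documents (sample)</title>
  <entry>
    <id>cmip5.example.dataset-a</id>
    <category scheme="urn:example:qc-level" term="2"/>
  </entry>
  <entry>
    <id>cmip5.example.dataset-b</id>
    <category scheme="urn:example:qc-level" term="3"/>
  </entry>
</feed>"""


def harvest_qc(feed_xml: str) -> dict[str, int]:
    """Step 2: read dataset id -> QC level from one atom feed."""
    levels = {}
    for entry in ET.fromstring(feed_xml).findall(f"{ATOM}entry"):
        dataset_id = entry.findtext(f"{ATOM}id")
        for cat in entry.findall(f"{ATOM}category"):
            if cat.get("scheme") == QC_SCHEME:
                levels[dataset_id] = int(cat.get("term"))
    return levels


def merge_levels(*harvests: dict[str, int]) -> dict[str, int]:
    """Combine harvests from several sites, keeping the highest level seen."""
    merged: dict[str, int] = {}
    for harvest in harvests:
        for dataset_id, level in harvest.items():
            merged[dataset_id] = max(level, merged.get(dataset_id, 1))
    return merged


def levels_for_replicas(replicas: dict[str, str],
                        levels: dict[str, int]) -> dict[str, int]:
    """Map QC levels onto each replica (replica name -> dataset id).

    Anything without a harvested level defaults to 1, since published
    data has QC level one "for free".
    """
    return {replica: levels.get(dataset_id, 1)
            for replica, dataset_id in replicas.items()}
```

The resulting replica-to-level table is what would be propagated into the policy decision point in step 3.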
> 
> Karl, Balaji, Martin:
> 
> At this point, we should recognise that we have the ability to
> discriminate between
> QC L1
> QC L2 *only for replicants*
> QC L3 *only for replicants*
> 
> We will assign DOIs *only for replicants*.  (At least in the first
> instance).
> 
> This brings me back to on what grounds should we allocate access
> tokens,
> and what licenses should be associated with them.
> 
> I thought we had agreed on something like (abbreviated, the exact
> wording needs agreement as per Balaji's email):
> 
> testing: you can use this data to exercise this software, and report
> issues with the data to the originator. You may not publish science with
> this data without the express permission of the data originator.
> 
> unrestricted: you can do anything you like with the data but you must
> include citations in publications.
> 
> non-commercial: there are some restrictions on use, and you must include
> citations in publications ...
> 
> Before continuing, this brings me to a point of disagreement with Karl.
> Users *should* absolutely care about the distinction between replicated
> and non-replicated data. It's a quality thing. They can use the qc
> stuff with more confidence.
> 
> However, as it stands, we can't give a DOI to output which is not
> replicated, but people will need to use it. I *do* think it's ok to
> restrict this to modellers (despite Martin's point about what PCMDI are
> advertising). I think most of the non-modelling community will be happy
> with the replicated data ...
> 
> ... and I think WGCM will buy that argument.
> 
> But for the modellers using the L1 data which cannot be qc'd, then we
> need a form of words for an old style acknowledgement or a citation into
> the data equivalent of the "grey literature". (probably ok to give a
> url.)
> 
> So, now the criteria.
> 
> testing should be given to modellers as required by the originators;
> the other two in the normal way by default to anyone for replicated
> data, and only to special people who sign up to the restriction above
> for the non-replicated data.
> 
> (nb: nothing in the above precludes downloading and using replicated
> data from other than DKRZ, BADC, PCMDI ... if you have the tokens),
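
The allocation criteria above can be written down as a small decision function. This is a sketch of one reading of those criteria only; the parameter names are invented for the example, and how a user is recognised as a modeller or as having signed the restriction is left open.

```python
def grantable_tokens(replicated: bool,
                     is_modeller: bool = False,
                     originator_requests_testing: bool = False,
                     signed_restriction: bool = False) -> set[str]:
    """Return the tokens a user could be given for one class of data.

    replicated:                  the data in question is replicated
    is_modeller:                 the user belongs to a modelling centre
    originator_requests_testing: the data originator asked for the user
                                 to get testing access (assumed flag)
    signed_restriction:          the user signed up to the restriction on
                                 non-replicated data (assumed flag)
    """
    tokens: set[str] = set()
    # testing: given to modellers as required by the originators
    if is_modeller and originator_requests_testing:
        tokens.add("testing")
    # the other two: by default for replicated data, otherwise only
    # for users who have signed up to the restriction
    if replicated or signed_restriction:
        tokens.update({"unrestricted_use", "noncommercial_use"})
    return tokens
```

For example, an ordinary registered user asking about non-replicated data gets an empty set, while the same user gets both use tokens for replicated data.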
> 
> So that's how I thought we'd agreed it all, but I concede it had never
> been written down in one place.
> 
> Cheers
> Bryan
> 
> 
> > 2. Bryan please correct me: There is QC L1 as in 1. and after QC L2
> > and QC L3 all registered users have access to the core data. Maybe
> > only non-commercial researchers are granted access to the non-core
> > data.
> >
> > This is more a political issue.
> >
> > In either case the QC Level has to be communicated to the ESG.
> > Luca suggests that the portal uses the AtomFeed of the questionnaire
> > to harvest the QC Flag. And after QC L3 the DOI link as well. QC and
> > DOI are information about the data, so the right place in the metafor
> > CIM would be the dataObject on the hierarchy level "DRS experiment".
> > Which parts of CIM do you harvest?
> >
> > My biggest question at the moment is how to deliver the QC
> > information to CIM. For the DOI target page a few additional pieces
> > of information on citation and contact are needed.
> > Stephen suggested typing them into the questionnaire. This would
> > slow the publication process down and is error-prone. We need an
> > automated CIM update there. The metafor people were against that
> > solution as well because the questionnaire is meant for an initial
> > metadata ingest by the modeling centers. Bryan, how do we get the
> > information into the questionnaire, so that it can be harvested by
> > the ESG?
> > What would be the alternatives to the AtomFeed/questionnaire as a
> > harvesting source for the quality level and DOI information?
> >
> > My second biggest question is where to put the information in the
> > CIM. I sent my interpretation / suggestion to the metafor list, but
> > it didn't start a discussion. Examples for a simulationRun object,
> > on how the dataObjects are referenced and on how the dataObject
> > hierarchies are built, would be of great help. Or metafor just
> > defines how I should send the quality information to them.
> >
> > I moved away from the technical issues, but to solve these things is
> > the precondition for the technical solution in the ESG.
> >
> > Thanks a lot,
> > Martina
> >
> > V. Balaji wrote:
> > > I know we discussed this at the Princeton workshop. I didn't
> > > register some of the implications then.
> > >
> > > I agree that in a technical sense, yes a dataset is "available" to
> > > registered users as soon as it is passed by the publisher.
> > > (QCL1-D). At that point, however, it's incompletely documented, so
> > > I'm not sure it can be declared fully compliant.
> > >
> > > My understanding is that while users are free to begin working with
> > > the data, they can publish results from the data only when the
> > > dataset is citable, which means it has undergone more rigorous QC.
> > > What they downloaded before QC-L2 is certainly
> > > use-at-your-own-risk because L2's when the "semantic QC" kicks in.
> > > And without QC-L3 it isn't citable.
> > >
> > > I think there is a pretty strong feeling that the modeling centers'
> > > data were used too often without citation or acknowledgment last
> > > time, which is what some of the more formal QC levels this time,
> > > e.g. DOIs tied to data publication, are trying to avoid. Assuming
> > > the QC document is adopted by the WGCM, it will be a requirement
> > > for downstream users to cite datasets.
> > >
> > > So, QC-L1D data are "available" in the sense that the 1s and 0s may
> > > be downloaded, but they're not licensed yet for "do whatever you
> > > like with them"... perhaps?
> > >
> > > It's pretty important that we come up with language that is clear
> > > about what one can and cannot do with data at various levels of QC.
> > > I've talked with Karl and Ron and others about making WGCM the
> > > authority for this, so whatever words we use have to be run by them.
> > >
> > > Thanks,
> > >
> > > Cinquini, Luca (3880) writes:
> > >> Hi Estani,
> > >>
> > >> I concur with what Eric said, and to reiterate, my understanding
> > >> is that as soon as the data is published with QCL1,
> > >> it will be available to registered users. Maybe Bob, Dean or Karl
> > >> can comment if my understanding is correct or not. thanks, Luca
> > >>
> > >> On Jul 19, 2010, at 2:52 PM, Eric Nienhouse wrote:
> > >>> Hi All,
> > >>>
> > >>> We've had a number of discussions on the topic of QC level and
> > >>> data access.  However, I feel we don't yet have a formal
> > >>> definition of the requirements relating to this area.
> > >>>
> > >>> I believe it is important to clarify and define the following two
> > >>> QC related areas:
> > >>>
> > >>> 1)  Who is the authoritative source of the QC level and how this
> > >>> information is propagated through the system?
> > >>>
> > >>> 2)  How does QC level apply to data access policy (eg. access
> > >>> control)?
> > >>>
> > >>> I would propose discussing this as a future GO-ESSP telco agenda
> > >>> topic, with the intention we document the outcome.
> > >>>
> > >>> Perhaps we can discuss this further via email and work towards
> > >>> capturing the system requirements and related policies in the
> > >>> meanwhile.
> > >>>
> > >>> Please note that there are plans to expose the QC Level within
> > >>> the Gateway UI once the data flow is identified.  However, data
> > >>> access control is based upon the group (eg. role) auth-z
> > >>> attribute (such as "CMIP5 Research") and does not currently rely
> > >>> on the QC Level explicitly.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> -Eric
> > >>>
> > >>> Estanislao Gonzalez wrote:
> > >>>> Hi Luca,
> > >>>>
> > >>>> to sum things up (and correct me Martina/Bryan if I'm wrong):
> > >>>>
> > >>>> 1) Published data have QC L1-Data "per se", and will be
> > >>>> available to a very selected group only (which doesn't seem to
> > >>>> be the group you mention, but I might be wrong).
> > >>>> 2) When acquiring QC L2 the data should be accessible to a
> > >>>> broader although still confined group. This check will be
> > >>>> performed by DKRZ and BADC and the information stored somewhere
> > >>>> (not sure where though). Neither BADC nor DKRZ has access to all
> > >>>> data-nodes, so the information will definitely be stored on
> > >>>> some "neutral grounds" (CIM DB?).
> > >>>> 3) QC L3 == DOI acquired == publication. At this stage data will
> > >>>> be available to any registered user.
> > >>>>
> > >>>> If I'm correct, then the security service must check "somehow"
> > >>>> the QC level of the file in order to proceed with the
> > >>>> authorization as it is currently implemented (thus comparing
> > >>>> roles).
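
To make the check described above concrete, here is a minimal Python sketch of a role comparison that first looks up the QC level. "CMIP5 Research" is the group attribute mentioned elsewhere in this thread; the role name "CMIP5 Modelling Centre" and the exact level-to-role table are assumptions for the example.

```python
from typing import Optional

# QC level -> roles allowed to read the data; None means any registered
# user. The table is a hypothetical reading of points 1) to 3) above.
ROLES_ALLOWED_BY_QC_LEVEL: dict[int, Optional[set[str]]] = {
    1: {"CMIP5 Modelling Centre"},                    # very selected group
    2: {"CMIP5 Modelling Centre", "CMIP5 Research"},  # broader, still confined
    3: None,                                          # any registered user
}


def is_authorized(user_roles: set[str], qc_level: int) -> bool:
    """Decide access for a registered user from roles and QC level."""
    if qc_level not in ROLES_ALLOWED_BY_QC_LEVEL:
        raise ValueError(f"unknown QC level: {qc_level}")
    allowed = ROLES_ALLOWED_BY_QC_LEVEL[qc_level]
    if allowed is None:
        return True
    # Access is granted if the user holds at least one allowed role.
    return bool(user_roles & allowed)
```

The role comparison itself stays as it is today; the only new input is the QC level of the file, which has to come from wherever the harvested QC information ends up.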
> > >>>>
> > >>>> Any comments anyone?
> > >>>>
> > >>>> Thanks,
> > >>>> Estani
> > >>>>
> > >>>> Cinquini, Luca (3880) wrote:
> > >>>>> Hi Bryan, Martina,
> > >>>>> I agree that these issues need to be discussed better, but here
> > >>>>> are some considerations, which may in some cases only reflect
> > >>>>> my understanding:
> > >>>>>
> > >>>>> 1) we talked about the QC flag for Levels 2 and 3 to be set in
> > >>>>> the metafor questionnaire, and be propagated through the atom
> > >>>>> feed to the gateways
> > >>>>>
> > >>>>> 2) I thought that in order not to delay data distribution, as
> > >>>>> soon as the data has QC level 1 (i.e. it has been processed by
> > >>>>> the publisher), it will be available to registered users of the
> > >>>>> CMIP5 research and commercial groups
> > >>>>>
> > >>>>> 3) At this time there is nothing in the ESG access control
> > >>>>> model that ties the access attributes to the QC flags.
> > >>>>>
> > >>>>> Thanks, luca
> > >>>>>
> > >>>>> On Jul 19, 2010, at 7:39 AM, Bryan Lawrence <bryan.lawrence at stfc.ac.uk> wrote:
> > >>>>>> Hi Martina
> > >>>>>>
> > >>>>>> We definitely need to formalise some of this, so thanks for
> > >>>>>> bringing it up.
> > >>>>>>
> > >>>>>> What I had thought we were proposing was that L2 and L3 data
> > >>>>>> have effectively the same restrictions ...
> > >>>>>>
> > >>>>>> ... but your fundamental point (I think) is how do we assign
> > >>>>>> the QC, and how does the security software get that
> > >>>>>> information? Ie what is the workflow that needs to exist. We
> > >>>>>> do need to bottom that out.
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>> Bryan
> > >>>>>>
> > >>>>>> On Monday 19 July 2010 13:43:59 Martina Stockhause wrote:
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I had a little discussion with Estani about how the different
> > >>>>>>> and changing access constraints on the data depending on
> > >>>>>>> their QC levels are realized. It came out that we don't
> > >>>>>>> really know.
> > >>>>>>>
> > >>>>>>> We have on the one hand the user with a special role e.g.
> > >>>>>>> "scientific, non-commercial user", who has access to data on
> > >>>>>>> QC L3 like every registered user and QC L2 because of his
> > >>>>>>> role. On the other hand, the data has a quality attribute
> > >>>>>>> (QC Level or QC Flag), which defines the access restriction
> > >>>>>>> of the data. For data access a mechanism has to check user
> > >>>>>>> role and data attribute, before access is granted or denied.
> > >>>>>>>
> > >>>>>>> How does the data get this quality attribute?
> > >>>>>>> How is the user role checked against this quality attribute?
> > >>>>>>>
> > >>>>>>> For QC L3 we don't need that mechanism, because every
> > >>>>>>> registered user has access to all CMIP5 data, but for QC L1
> > >>>>>>> and L2 such access restrictions exist.
> > >>>>>>>
> > >>>>>>> Thanks a lot,
> > >>>>>>> Martina
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> GO-ESSP-TECH mailing list
> > >>>>>>> GO-ESSP-TECH at ucar.edu
> > >>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> > >>>>>>
> > >>>>>> --
> > >>>>>> Bryan Lawrence
> > >>>>>> Director of Environmental Archival and Associated Research
> > >>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > >>>>>> STFC, Rutherford Appleton Laboratory
> > >>>>>> Phone +44 1235 445012; Fax ... 5848;
> > >>>>>> Web: home.badc.rl.ac.uk/lawrence
> 
> --
> Bryan Lawrence
> Director of Environmental Archival and Associated Research
> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> STFC, Rutherford Appleton Laboratory
> Phone +44 1235 445012; Fax ... 5848;
> Web: home.badc.rl.ac.uk/lawrence