[Go-essp-tech] Access control for data with different QC Level

Tue Jul 20 10:13:26 MDT 2010

Hello,

Attached is an outline, based on the "achive_size" spreadsheet Karl
produced.

Cheers,
Martin

> -----Original Message-----
> From: Bryan Lawrence [mailto:bryan.lawrence at stfc.ac.uk]
> Sent: 20 July 2010 16:52
> To: Juckes, Martin (STFC,RAL,SSTD)
> Cc: go-essp-tech at ucar.edu; Karl E.Taylor; Cinquini, Luca (3880)
> Subject: Re: [Go-essp-tech] Access control for data with different QC
> Level
> 
> Hi Martin
> 
> On Tuesday 20 July 2010 14:58:08 Juckes, Martin (STFC,RAL,SSTD) wrote:
> > If the un-replicated (and hence less quality controlled) data is to
> > be less widely available, then I think we have to re-consider what
> > gets replicated. In particular, the 3-hourly, 2d fields have been
> > requested by TGICA for the impacts community (and when I mentioned
> > this at a recent meeting with hydrologists they were indeed very
> > keen on this data). The current definition of "replicated" excludes
> > around 200Tb of 3 hourly data from the decadal projections.
> 
> 
> Hmm. I don't know when that happened. Last time I looked it was in ...
> I certainly think it needs to be. I know a lot of folk who will be
> looking to use that.  Perhaps it's worth reminding me what data is not
> being replicated (of the requested). I had thought it was the ocean 3d
> fields + (can't remember, but didn't think it was the tgica data).
> 
> > It may be that the last point (which I hadn't noticed before) will
> > force us to reconsider the replication issue. TGICA may well want to
> > have the data that falls under their request included in the data
> > which is migrated/tagged into the IPCC DDC: and this would mean that
> > it all would have to be quality controlled all the way to level 3.
> 
> By hook or crook we will need this data to make it to level 3.
> 
> Thanks for picking up on this.
> Bryan
> 
> 
> >
> > Regards,
> > Martin
> >
> > > -----Original Message-----
> > > > From: go-essp-tech-bounces at ucar.edu
> > > > [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Bryan Lawrence
> > > Sent: 20 July 2010 14:19
> > > To: go-essp-tech at ucar.edu; Karl E.Taylor
> > > Cc: Cinquini, Luca (3880)
> > > Subject: Re: [Go-essp-tech] Access control for data with different
> > > QC Level
> > >
> > > Hi Martina
> > >
> > > > I think answering the political and technical in one shot might
> > > > help here, so I'm going to try. This was my understanding of all
> > > > the various conversations we have had.
> > >
> > > > Karl, Balaji, Martin: can I trouble you to read through. I'll
> > > > highlight the bits where you need to pay attention!
> > >
> > > > > 1. My understanding was that at QC L1 the CMIP5 modelling
> > > > > centres, at QC L2 non-commercial researchers and at QC L3 every
> > > > > registered user can access the data.
> > >
> > > > The issue of commercial v non-commercial is a decision for the
> > > > modelling centres. Given the Met Office is now allowing commercial,
> > > > it may be that it has gone away. But that's a licensing decision.
> > >
> > > So functionally:
> > >
> > > access_token_required (dataset) =
> > >
> > >      f(qc_level(dataset), license_type(dataset))
> > >
> > > Currently we expect PCMDI to allocate tokens to users.
> > >
> > > I expect the following three classes of tokens:
> > >
> > > unrestricted_use*
> > > noncommercial_use
> > > testing
> > >
> > > > (* unrestricted still requires citation, I'll get to what I mean
> > > > by citation below.)
> > >
> > > > And we need machinery to allocate access_token_required to
> > > > specific datasets.
> > >
> > > The questions then become:
> > >
> > > a) on what grounds and how does PCMDI allocate the tokens?
> > >
> > > > How should be easy, being part of the user management tooling. I
> > > > don't know what the state of generic ESG tooling for that is, but
> > > > we have this sort of tooling available as part of our normal
> > > > infrastructure.
> > >
> > > On what grounds is more interesting. I'll postpone that a moment.
> > >
> > > b) when, where and how, do we set up the "table" which maps
> > > access_token_required onto the dataset.
> > >
> > > > Step 1) we need to record qc information.
> > > >
> > > >  - the plan is to build a tool for that in September, to be
> > > >    complete by the end of September, and it will export CIM
> > > >    quality documents via an atom feed. It will be independent of
> > > >    the questionnaire, and folks could deploy it anywhere, even on
> > > >    a data node.
> > > >
> > > >    (Martina: that can cover the DOI information too.)
> > > >    (I have someone in mind to do the work.)
> > > >  - we get qc level one for free (the data can't be published by a
> > > >    data node without being qc level one).
> > > >  - given that qc level 2 can only be dealt with at DKRZ, PCMDI and
> > > >    BADC since it will apply to replicated data, then we only need
> > > >    to deploy the tool there, and gateways will only need to
> > > >    harvest from three places.
> > > >  - qc level 3 information can only be made available at DKRZ.
> > >
> > > Step 2) the qc information needs to be harvested.
> > > >  - this is a gateway issue; gateways need only harvest from the
> > > >    three above, plus the QC one at DKRZ.
> > >
> > >  - It needs to be mapped onto each replica.
> > >
> > > > Step 3) this information needs to propagate into the PDP (or
> > > > whatever we are calling the policy decision point, I've lost track
> > > > of the names).
> > >
> > > Karl, Balaji, Martin:
> > >
> > > > At this point, we should recognise that we have the ability to
> > > > discriminate between
> > > > QC L1
> > > > QC L2 *only for replicants*
> > > > QC L3 *only for replicants*
> > >
> > > We will assign DOIs *only for replicants*.  (At least in the first
> > > instance).
> > >
> > > > This brings me back to on what grounds should we allocate access
> > > > tokens, and what licenses should be associated with them.
> > >
> > > I thought we had agreed on something like (abbreviated, the exact
> > > wording needs agreement as per Balaji's email):
> > >
> > > > testing: you can use this data to exercise this software, and
> > > > report issues with the data to the originator. You may not publish
> > > > science with this data without the express permission of the data
> > > > originator.
> > >
> > > unrestricted: you can do anything you like with the data but you
> > > must include citations in publications.
> > >
> > > > non-commercial: there are some restrictions on use, and you must
> > > > include citations in publications ...
> > >
> > > > Before continuing, this brings me to a point of disagreement with
> > > > Karl.
> > > >
> > > > Users *should* absolutely care about the distinction between
> > > > replicated and non-replicated data. It's a quality thing. They can
> > > > use the qc stuff with more confidence.
> > >
> > > > However, as it stands, we can't give a DOI to output which is not
> > > > replicated, but people will need to use it. I *do* think it's ok
> > > > to restrict this to modellers (despite Martin's point about what
> > > > PCMDI are advertising). I think most of the non-modelling
> > > > community will be happy with the replicated data ...
> > >
> > > ... and I think WGCM will buy that argument.
> > >
> > > > But for the modellers using the L1 data which cannot be qc'd, then
> > > > we need a form of words for an old style acknowledgement or a
> > > > citation into the data equivalent of the "grey literature".
> > > > (probably ok to give a url.)
> > >
> > > So, now the criteria.
> > >
> > > > testing should be given to modellers as required by the
> > > > originators; the other two in the normal way by default to anyone
> > > > for replicated data, only for special people who sign up to the
> > > > restriction above for the non-replicated data.
> > >
> > > (nb: nothing in the above precludes downloading and using
> > > replicated data from other than DKRZ, BADC, PCMDI ... if you have
> > > the tokens),
> > >
> > > > So that's how I thought we'd agreed it all, but I concede it had
> > > > never been written down in one place.
> > >
> > > Cheers
> > > Bryan
> > >
> > > > 2. Bryan please correct me: There is QC L1 as in 1. and after QC
> > > > L2 and QC L3 all registered users have access to the core data.
> > > > Maybe only non-commercial researchers are granted access to the
> > > > non-core data.
> > > >
> > > > This is more a political issue.
> > > >
> > > > In either case the QC Level has to be communicated to the ESG.
> > > > Luca suggests that the portal uses the AtomFeed of the
> > > > questionnaire to harvest the QC Flag. And after QC L3 the DOI
> > > > > link as well. QC and DOI are information about the data, so the
> > > > > right place in metafor CIM would be the dataObject on the
> > > > > hierarchy level "DRS experiment". Which parts of CIM do you
> > > > > harvest?
> > > >
> > > > My biggest question at the moment is how to deliver the QC
> > > > information to CIM. For the DOI target page there are a few
> > > > additional information pieces needed on citation and contact.
> > > > > Stephen suggested typing them into the questionnaire. This
> > > > > would slow the publication process down and is error-prone. We
> > > > > need an automated CIM update there. The metafor people were
> > > > > against that solution as well because the questionnaire is meant
> > > > > for an initial metadata ingest by the modeling centers. Bryan,
> > > > > how do we get the information in the questionnaire, so that it
> > > > > can be harvested by the ESG?
> > > > > Which would be the alternatives to the AtomFeed/questionnaire as
> > > > > harvesting source for the quality level and DOI information?
> > > >
> > > > > My second biggest question is where to put the information in
> > > > > the CIM. I sent my interpretation / suggestion to the metafor
> > > > > list, but it didn't start a discussion. Examples for a
> > > > > simulationRun object, on how the dataObjects are referenced and
> > > > > on how the dataObject hierarchies are built, would be of great
> > > > > help. Or metafor just defines how I should send the quality
> > > > > information to them.
> > > >
> > > > > I moved away from the technical issues, but solving these
> > > > > things is the precondition for the technical solution in the ESG.
> > > >
> > > > Thanks a lot,
> > > > Martina
> > > >
> > > > V. Balaji wrote:
> > > > > I know we discussed this at the Princeton workshop. I didn't
> > > > > register some of the implications then.
> > > > >
> > > > > > I agree that in a technical sense, yes a dataset is
> > > > > > "available" to registered users as soon as it is passed by the
> > > > > > publisher (QCL1-D). At that point, however, it's incompletely
> > > > > > documented, so I'm not sure it can be declared fully
> > > > > > compliant.
> > > > >
> > > > > > My understanding is that while users are free to begin working
> > > > > > with the data, they can publish results from the data only when
> > > > > > the dataset is citable, which means it has undergone more
> > > > > > rigorous QC. What they downloaded before QC-L2 is certainly
> > > > > > use-at-your-own-risk because L2's when the "semantic QC" kicks
> > > > > > in. And without QC-L3 it isn't citable.
> > > > >
> > > > > > I think there is a pretty strong feeling that the modeling
> > > > > > centers' data were used too often without citation or
> > > > > > acknowledgment last time, which is what some of the more formal
> > > > > > QC levels this time, e.g. DOIs tied to data publication, are
> > > > > > trying to avoid. Assuming the QC document is adopted by the
> > > > > > WGCM, it will be a requirement for downstream users to cite
> > > > > > datasets.
> > > > >
> > > > > > So, QC-L1D data are "available" in the sense that the 1s and
> > > > > > 0s may be downloaded, but they're not licensed yet for "do
> > > > > > whatever you like with them"... perhaps?
> > > > >
> > > > > > It's pretty important that we come up with language that is
> > > > > > clear about what one can and cannot do with data at various
> > > > > > levels of QC. I've talked with Karl and Ron and others about
> > > > > > making WGCM the authority for this, so whatever words we use
> > > > > > have to be run by them.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Cinquini, Luca (3880) writes:
> > > > >> Hi Estani,
> > > > >>
> > > > >> 	I concur with what Eric said, and to iterate my
> understanding
> > >
> > > is
> > >
> > > > >> 	that as soon as the data is published with QCL1,
> > > > >>
> > > > >> it will be available to registered users. Maybe Bob, Dean or
> > > > >> Karl can comment if my understanding is correct or not.
> > > > >> thanks, Luca
> > > > >>
> > > > >> On Jul 19, 2010, at 2:52 PM, Eric Nienhouse wrote:
> > > > >>> Hi All,
> > > > >>>
> > > > >>> We've had a number of discussions on the topic of QC level
> > > > >>> and data access.  However, I feel we don't yet have a formal
> > > > >>> definition of the requirements relating to this area.
> > > > >>>
> > > > > >>> I believe it is important to clarify and define the
> > > > > >>> following two QC related areas:
> > > > >>>
> > > > >>> 1)  Who is the authoritative source of the QC level and how
> > > > >>> this information is propagated through the system?
> > > > >>>
> > > > >>> 2)  How does QC level apply to data access policy (eg.
access
> > > > >>> control)?
> > > > >>>
> > > > >>> I would propose discussing this as a future GO-ESSP telco
> > > > >>> agenda topic, with the intention we document the outcome.
> > > > >>>
> > > > >>> Perhaps we can discuss this further via email and work
> > > > >>> towards capturing the system requirements and related
> > > > >>> policies in the meanwhile.
> > > > >>>
> > > > >>> Please note that there are plans to expose the QC Level
> > > > >>> within the Gateway UI once the data flow is identified.
> > > > >>> However, data access control is based upon the group (eg.
> > > > >>> role) auth-z attribute (such as "CMIP5 Research") and does
> > > > >>> not currently rely on the QC Level explicitly.
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> -Eric
> > > > >>>
> > > > >>> Estanislao Gonzalez wrote:
> > > > >>>> Hi Luca,
> > > > >>>>
> > > > >>>> to sum things up (and correct me Martina/Bryan if I'm
> > > > >>>> wrong):
> > > > >>>>
> > > > >>>> 1) Published data have QC L1-Data "per se",  and will be
> > > > >>>> available to a very selected group only (which doesn't seem
> > > > >>>> to be the group you mention, but I might be wrong).
> > > > >>>> 2) When acquiring QC L2 the data should be accessible to a
> > > > >>>> broader although still confined group. This check will be
> > > > >>>> performed by DKRZ and BADC and the information stored
> > > > > >>>> somewhere (not sure where though). Neither BADC nor DKRZ has
> > > > > >>>> access to all data-nodes, so the information will
> > > > > >>>> definitely be stored on some "neutral grounds" (CIM DB?).
> > > > >>>> 3) QC L3 == DOI acquired == publication. At this stage data
> > > > >>>> will be available to any registered user.
> > > > >>>>
> > > > >>>> If I'm correct, then the security service must check
> > > > >>>> "somehow" the QC level of the file in order to proceed with
> > > > >>>> the authorization as it is currently implemented (thus
> > > > >>>> comparing roles).
> > > > >>>>
> > > > >>>> Any comments anyone?
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Estani
> > > > >>>>
> > > > >>>> Cinquini, Luca (3880) wrote:
> > > > >>>>> Hi Bryan, Martina,
> > > > > >>>>> I agree that these issues need to be discussed better, but
> > > > > >>>>> here are some considerations, which may in some cases only
> > > > > >>>>> reflect my understanding:
> > > > >>>>>
> > > > > >>>>> 1) we talked about the QC flag for Levels 2 and 3 to be
> > > > > >>>>> set in the metafor questionnaire, and be propagated through
> > > > > >>>>> the atom feed to the gateways
> > > > >>>>>
> > > > > >>>>> 2) I thought that in order not to delay data distribution,
> > > > > >>>>> as soon as the data has QC level 1 (i.e. it has been
> > > > > >>>>> processed by the publisher), it will be available to
> > > > > >>>>> registered users of the CMIP5 research and commercial
> > > > > >>>>> groups
> > > > >>>>>
> > > > > >>>>> 3) At this time there is nothing in the ESG access control
> > > > > >>>>> model that ties the access attributes to the QC flags.
> > > > >>>>>
> > > > >>>>> Thanks, luca
> > > > >>>>>
> > > > > >>>>> On Jul 19, 2010, at 7:39 AM, Bryan Lawrence
> > > > > >>>>> <bryan.lawrence at stfc.ac.uk> wrote:
> > > > >>>>>> Hi Martina
> > > > >>>>>>
> > > > >>>>>> We definitely need to formalise some of this, so thanks
> > > > >>>>>> for bringing it up.
> > > > >>>>>>
> > > > >>>>>> What I had thought we were proposing was that L2 and L3
> > > > >>>>>> data have effectively the same restrictions ...
> > > > >>>>>>
> > > > >>>>>> ... but your fundamental point (I think) is how do we
> > > > >>>>>> assign the QC, and how does the security software get
> > > > >>>>>> that information? Ie what is the workflow that needs to
> > > > >>>>>> exist. We do need to bottom that out.
> > > > >>>>>>
> > > > >>>>>> Thanks
> > > > >>>>>> Bryan
> > > > >>>>>>
> > > > >>>>>> On Monday 19 July 2010 13:43:59 Martina Stockhause wrote:
> > > > >>>>>>> Hi all,
> > > > >>>>>>>
> > > > > >>>>>>> I had a little discussion with Estani about how the
> > > > > >>>>>>> different and changing access constraints on the data
> > > > > >>>>>>> depending on their QC levels are realized. It came out
> > > > > >>>>>>> that we don't really know.
> > > > >>>>>>>
> > > > > >>>>>>> We have on the one hand the user with a special role
> > > > > >>>>>>> e.g. "scientific, non-commercial user", who has access
> > > > > >>>>>>> to data on QC L3 like every registered user and QC L2
> > > > > >>>>>>> because of his role. On the other hand, the data has a
> > > > > >>>>>>> quality attribute (QC Level or QC Flag), which defines
> > > > > >>>>>>> the access restriction of the data. For data access a
> > > > > >>>>>>> mechanism has to check user role and data attribute,
> > > > > >>>>>>> before access is granted or denied.
> > > > >>>>>>>
> > > > >>>>>>> How does the data get this quality attribute?
> > > > >>>>>>> How is the user role checked against this quality
> > > > >>>>>>> attribute?
> > > > >>>>>>>
> > > > > >>>>>>> For QC L3 we don't need that mechanism, because every
> > > > > >>>>>>> registered user has access to all CMIP5 data, but for QC
> > > > > >>>>>>> L1 and L2 such access restrictions exist.
> > > > >>>>>>>
> > > > >>>>>>> Thanks a lot,
> > > > >>>>>>> Martina
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> _______________________________________________
> > > > >>>>>>> GO-ESSP-TECH mailing list
> > > > >>>>>>> GO-ESSP-TECH at ucar.edu
> > > > >>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>> Bryan Lawrence
> > > > > >>>>>> Director of Environmental Archival and Associated Research
> > > > > >>>>>> (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > > > >>>>>> STFC, Rutherford Appleton Laboratory
> > > > >>>>>> Phone +44 1235 445012; Fax ... 5848;
> > > > >>>>>> Web: home.badc.rl.ac.uk/lawrence
> > >
> > > --
> > > Bryan Lawrence
> > > Director of Environmental Archival and Associated Research
> > > (NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
> > > STFC, Rutherford Appleton Laboratory
> > > Phone +44 1235 445012; Fax ... 5848;
> > > Web: home.badc.rl.ac.uk/lawrence
> > > _______________________________________________
> > > GO-ESSP-TECH mailing list
> > > GO-ESSP-TECH at ucar.edu
> > > http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> 
> --
> Scanned by iCritical.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: drs_implementation_notes_v3.odt
Type: application/octet-stream
Size: 17562 bytes
Desc: drs_implementation_notes_v3.odt
Url : http://mailman.ucar.edu/pipermail/go-essp-tech/attachments/20100720/63960d5a/attachment-0001.obj
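
Editorial sketch (not part of the original thread): Bryan's message above
describes access_token_required(dataset) as a function of the dataset's QC
level and license type, with three token classes (unrestricted_use,
noncommercial_use, testing). The Python below is one possible reading of
that idea, not an agreed policy and not ESG code. The Dataset class, the
license_type values "unrestricted" and "noncommercial", the may_access
helper, and the simplification that all QC L1 data needs the testing token
are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Token classes named in the thread.
TOKENS = ("unrestricted_use", "noncommercial_use", "testing")


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record holding the two inputs to the policy function."""
    name: str
    qc_level: int      # 1, 2 or 3
    license_type: str  # assumed values: "unrestricted" or "noncommercial"


def access_token_required(dataset: Dataset) -> str:
    """Return the token class a user needs for this dataset.

    Assumed mapping: QC L1 data needs the testing token; QC L2 and L3
    data are treated alike and follow the dataset's license type.
    """
    if dataset.qc_level not in (1, 2, 3):
        raise ValueError(f"unknown QC level: {dataset.qc_level}")
    if dataset.qc_level == 1:
        return "testing"
    if dataset.license_type == "noncommercial":
        return "noncommercial_use"
    if dataset.license_type == "unrestricted":
        return "unrestricted_use"
    raise ValueError(f"unknown license type: {dataset.license_type}")


def may_access(user_tokens: set, dataset: Dataset) -> bool:
    """Policy check: the user must hold exactly the required token."""
    return access_token_required(dataset) in user_tokens
```

In this sketch a user holding only noncommercial_use would be refused QC L1
data until it reaches L2. Whether an unrestricted_use token should also
satisfy a noncommercial_use requirement is left open here, as it is in the
thread.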