[Go-essp-tech] +2Gb CMIP5 files

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Tue May 18 10:21:39 MDT 2010


On Tuesday 18 May 2010 16:31:01 martin.juckes at stfc.ac.uk wrote:
> This may have been covered already, but we also need to consider
>  network implications. I think, as suggested by Phil's email near the
>  start of this thread, that the problems associated with transferring
>  large files (due to the faster than linear growth in failure rate
>  with file size) give enough justification for imposing a limit,

I totally agree with this!
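
As a rough illustration of why the limit helps (a toy model, not a
measurement): assume failures arrive at a constant rate per byte and
that a failed transfer restarts from the beginning. Both are
assumptions, and the failure rate below is made up. Under that model
the expected number of bytes sent grows faster than linearly with
file size:

```python
import math

def expected_bytes_sent(size_bytes: float, failures_per_byte: float) -> float:
    """Expected total bytes sent until one uninterrupted transfer completes.

    Toy model: failures are a Poisson process along the byte stream and
    every failure restarts the transfer from byte zero.
    """
    if failures_per_byte == 0:
        return float(size_bytes)
    # Standard restart-on-failure result: (exp(rate * size) - 1) / rate
    return math.expm1(failures_per_byte * size_bytes) / failures_per_byte

GB = 1e9
RATE = 1 / (20 * GB)  # hypothetical: one failure per 20 GB on average

for size_gb in (1, 2, 4, 30):
    sent = expected_bytes_sent(size_gb * GB, RATE)
    print(f"{size_gb:>2} GB file: overhead factor {sent / (size_gb * GB):.2f}")
```

Transfers that can resume part-way would behave much better than
this, so treat it as a worst case for restart-from-scratch tools.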
B


> 
> Cheers,
> Martin
> 
> > -----Original Message-----
> > From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-
> > bounces at ucar.edu] On Behalf Of Nathan Wilhelmi
> > Sent: 18 May 2010 15:55
> > To: Pascoe, Stephen (STFC,RAL,SSTD)
> > Cc: go-essp-tech at ucar.edu; doutriaux1 at llnl.gov
> > Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> >
> > Hi All,
> >
> >     Here is a nice table summarizing the various Windows file
> > system limits. http://www.ntfs.com/ntfs_vs_fat.htm
> >
> > -Nate
> >
> > stephen.pascoe at stfc.ac.uk wrote:
> > > I've done some testing of these file limits this afternoon and I
> > > don't think the filesystems will be a problem.
> > >
> > > From Wikipedia it appears the FAT32 file system has a 4Gb limit
> > > (http://en.wikipedia.org/wiki/File_Allocation_Table).  That
> > > covers Windows 95 onwards but my Windows XP box is NTFS and has
> > > no problem with +4Gb files.  Similarly my 32-bit Linux laptop
> > > (recent Ubuntu) can handle +4Gb files.
> > >
> > > Looks like anyone with a reasonably modern system will be able to
> > > handle +4Gb files.  We may have more problems with old NetCDF
> > > library versions.
> > >
> > > S.
> > >
> > > ---
> > > Stephen Pascoe  +44 (0)1235 445980
> > > British Atmospheric Data Centre
> > > Rutherford Appleton Laboratory
> > >
> > > -----Original Message-----
> > > From: go-essp-tech-bounces at ucar.edu
> > > [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
> > > ag.stephens at stfc.ac.uk
> > > Sent: 18 May 2010 09:31
> > > To: taylor13 at llnl.gov; go-essp-tech at ucar.edu
> > > Cc: doutriaux1 at llnl.gov
> > > Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> > >
> > > Dear Karl,
> > >
> > > Whether we think it's advisable or not, I'm sure that some of the
> > > wider CMIP5 user community will be looking at the outputs on
> > > Windows. I think it is sensible to set a 2GB file size limit.
> > >
> > > Regards,
> > >
> > > Ag
> > >
> > > -----Original Message-----
> > > From: go-essp-tech-bounces at ucar.edu
> > > [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
> > > Sent: 17 May 2010 18:45
> > > To: go-essp-tech at ucar.edu
> > > Cc: Doutriaux, Charles
> > > Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> > >
> > > Dear all,
> > >
> > > CMOR has code already in place for checking whether a file
> > > exceeds 2 GB, but it is currently turned off (it was turned on
> > > for CMIP3).  We thought it was now unnecessary.  If the feeling
> > > is that there will be users downloading CMIP5 files to Windows
> > > machines using older operating systems, I suppose that limiting
> > > CMIP5 files to whatever the limit is (2 GB or 4 GB -- does anyone
> > > know which it is?) might be wise.
> > >
> > > On the other hand, will anyone use a Windows machine to look at
> > > netCDF files?  If not, maybe this is a non-issue.
> > >
> > > Karl
> > >
> > > On 5/16/10 12:08 PM, stephen.pascoe at stfc.ac.uk wrote:
> > >> I think I raised undue alarm here when suggesting we might be
> > >> dealing with +2GB files.  Thanks Phil for clarifying that UKMO is
> > >> still planning to limit itself to <2GB files.
> > >
> > >> I am wondering what the policy should be here?  My first thought
> > >> is that modeling centres will mainly make the same decision as
> > >> UKMO since it is in their interest for their model output to be
> > >> widely used.  However, enforcement could be difficult.  The
> > >> logical place to enforce the limit is in the level 1 QC but CMOR
> > >> doesn't do this so it will be a problem for people running
> > >> datanodes.
> > >
> > >> I suggest we make a strong recommendation to supply data in <2GB
> > >> files and enforce it during level-2 QC before replicating.
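
A size check of this kind could be sketched as below. This is only
an illustration, not the actual QC tooling: the function name, the
".nc" suffix filter and the exact byte limit (the largest value a
signed 32-bit offset can hold) are all assumptions.

```python
import os
from typing import Iterator, Tuple

# Assumed limit: the largest offset a signed 32-bit integer can address.
MAX_BYTES = 2**31 - 1

def oversized_files(root: str, limit: int = MAX_BYTES) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every .nc file under root larger than limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".nc"):
                continue  # hypothetical convention: only netCDF files matter
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                yield path, size
```

A data node could run something like
`for path, size in oversized_files("/data/cmip5"): print(path, size)`
before replication and reject or split whatever it reports (the
directory name here is made up).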
> > >
> > >> S.
> > >>
> > >> -----Original Message-----
> > >> From: go-essp-tech-bounces at ucar.edu on behalf of Michael
> > >> Lautenschlager
> > >> Sent: Sun 5/16/2010 1:35 PM
> > >> To: V. Balaji
> > >> Cc: go-essp-tech at ucar.edu
> > >> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> > >>
> > >> Hello *,
> > >>
> > >> we strongly support Phil's decision for data files less than
> > >> 2 GB. We made that decision in Hamburg for the same reasons,
> > >> because we cannot expect all users to use 64-bit systems. Most
> > >> Windows environments are still running with 32 bits.
> > >>
> > >> Best wishes, Michael
> > >>
> > >> ---------------
> > >> Dr. Michael Lautenschlager
> > >>
> > >> German Climate Computing Centre (DKRZ) World Data Center Climate
> > >> (WDCC)
> > >> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
> > >> PHONE:   +4940-460094-118
> > >> E-Mail:  lautenschlager at dkrz.de
> > >>
> > >> URL:    http://www.dkrz.de/
> > >>           http://www.wdc-climate.de/
> > >>
> > >> V. Balaji schrieb:
> > >>> If I understood correctly the most serious 2Gb problem is with
> > >>> apache!
> > >
> > >>> Bentley, Philip writes:
> > >>>> Hi Stephen,
> > >>>>
> > >>>> Yes, that's true, we did create a small number of test netCDF
> > >>>> files in that size range. But this was because the CMOR library
> > >>>> we used at the time didn't include functionality for chunking
> > >>>> the output into smaller files. Plus we wanted to stress-test
> > >>>> our pipeline!
> > >>>>
> > >>>> Two things have happened since then:
> > >>>>
> > >>>> 1. Jamie has been working with Charles at PCMDI to implement
> > >>>> and test a solution whereby we can limit the size of the
> > >>>> output netCDF files produced by CMOR.
> > >>>>
> > >>>> 2. We have made the local decision to limit our netCDF file
> > >>>> sizes to 2 GB (or thereabouts) as, logistically, that will
> > >>>> cause us less headache moving these files around, and it
> > >>>> should maximise the number of client applications in which
> > >>>> the files can be read.
> > >>>>
> > >>>> IIRC, I think Balaji mentioned that the 64-bit offset format
> > >>>> was required for output from the gridspec toolset. I could be
> > >>>> wrong.
> > >>>>
> > >>>> Regards,
> > >>>> Phil
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: go-essp-tech-bounces at ucar.edu
> > >>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
> > >>>>> stephen.pascoe at stfc.ac.uk
> > >>>>> Sent: 14 May 2010 10:52
> > >>>>> To: go-essp-tech at ucar.edu
> > >>>>> Subject: [Go-essp-tech] +2Gb CMIP5 files
> > >>>>>
> > >>>>> The latest UKMO extraction for CMIP5 has produced some files
> > >>>>> in the 30Gb range.  We had discussed previously the assumption
> > >>>>> that all files would be <2Gb.  Do we feel it is important to
> > >>>>> enforce a <2Gb limit or should this just be a recommendation
> > >>>>> on modelling centres?
> > >>>>>
> > >>>>> To my knowledge there are two issues with +2Gb files:
> > >>>>>
> > >>>>>   1. +2GB NetCDF files will be in 64-bit offset format.
> > >>>>> Therefore NetCDF libraries prior to v3.6 will not be able to
> > >>>>> read them.
> > >>>>>   2. Older file systems may have a 2Gb file limit. This will
> > >>>>> mainly affect 32-bit systems that are a few years old. FAT32
> > >>>>> has a 4Gb limit.
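
For the first issue, the on-disk variant can be read from the first
few bytes of a file, which makes it easy to spot files an old
library would reject. The sketch below is a minimal illustration:
the function names are made up, and it only looks for an HDF5
signature at offset 0, which is a simplification.

```python
def classify_magic(magic: bytes) -> str:
    """Name the netCDF variant indicated by a file's leading bytes."""
    if len(magic) >= 4 and magic[:3] == b"CDF":
        # Fourth byte is the format version of the netCDF-3 family.
        versions = {1: "classic", 2: "64-bit offset", 5: "64-bit data (CDF-5)"}
        return versions.get(magic[3], "unknown CDF version")
    if magic[:8] == b"\x89HDF\r\n\x1a\n":
        return "HDF5 (netCDF-4)"
    return "not netCDF"

def netcdf_variant(path: str) -> str:
    """Read the first 8 bytes of path and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(8))
```

A file reported as "64-bit offset" is the kind that, per point 1,
libraries older than v3.6 cannot read.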
> > >>>>>
> > >>>>> These are end-user issues. Is there any reason why the ESG
> > >>>>> software might have problems with files over 2Gb?  If we do
> > >>>>> want to ensure files are <2Gb do we want to mandate the
> > >>>>> modelling centres deliver that or will the data centres need
> > >>>>> to split files?
> > >>>>>
> > >>>>> Stephen.
> > >>>>>
> > >>>>> ---
> > >>>>> Stephen Pascoe  +44 (0)1235 445980
> > >>>>> British Atmospheric Data Centre
> > >>>>> Rutherford Appleton Laboratory
> > >>>>> --
> > >>>>> Scanned by iCritical.
> > >>>>> _______________________________________________
> > >>>>> GO-ESSP-TECH mailing list
> > >>>>> GO-ESSP-TECH at ucar.edu
> > >>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> > >>>>
> 

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

