[Go-essp-tech] +2Gb CMIP5 files

Bryan Lawrence bryan.lawrence at stfc.ac.uk
Wed May 19 23:39:27 MDT 2010


This has been brought up before, but I still think we might find that 
files larger than X GB are no fun to move across international links ...  
but we don't really know what X is ...

Cheers
Bryan

On Wednesday 19 May 2010 18:06:04 Karl Taylor wrote:
> Hi all,
> 
> Thank you all for the input.  My impression is that there are
> sometimes good reasons for writing files larger than 4 GB.  I also
> doubt that anyone with an old file system (one that can't handle
> files larger than 4 GB) will have the computing power to do much
> with the large CMIP datasets, so they likely won't download the
> kinds of 3-d fields that might be stored in files larger than 4 GB.
> 
> Although I wouldn't say everyone agrees on this, I've discussed this
> with Charles, and we now plan to activate a warning in CMOR if a file
> (when it is closed by CMOR) exceeds 4 GB.  We won't impose any
>  absolute limit, but in the "requirements" document, we will explain
>  why we recommend generally limiting file size.
> 
> The CMOR checker code will also give the same warning.
> 
> Please let me know if there is a compelling reason for doing things
> differently.
> 
> Best regards,
> Karl
> 
> On 5/18/10 3:37 PM, Gary Strand wrote:
> > On Tue May 18, 2010, at 9:31 AM, <martin.juckes at stfc.ac.uk> wrote:
> >>
> >> This may have been covered already, but we also need to consider
> >> network implications. I think, as suggested by Phil's email near the
> >> start of this thread, that the problems associated with transferring
> >> large files (due to the faster than linear growth in failure rate with
> >> file size) give enough justification for imposing a limit,
> >
> > One twist is that if subsetting (geographical and/or temporal) is
> > available for CMIP5 data, then we (data providers) may be able to
> > get away with larger files. Granted, if a file needs to be
> > replicated across the federation, then very large files (> 10 GB)
> > may be problematic, but IMHO in terms of overall bandwidth
> > consumption, replication will be a small part of the total over time.
> >
> > CCSM has made available many ~5 GB files for access via ESG, and
> > we've had only occasional problems with downloads of the entire
> > file. The bigger problem is that most users (these are 6-hourly
> > full-column atmosphere data, quite suitable for driving RCMs) are
> > interested in only a small geographical area. That's why subsetting
> > of these large files is important. Yes, there may be some users who
> > want the global data, but our experience has been that the vast
> > majority do not. There have been several cases in which users have
> > provided me with disks to subset these data for them, and then ship
> > the subsetted data back, as their bandwidth is just too low for the
> > global files.
> >
> > In sum, I'm advocating reasonable CMIP5 file sizes, but I think <2 GB
> > is a little too small. If that's mandated, CCSM4 will end up
> > submitting many files of that size, particularly from the ocean
> > model (7.4*10^6 grid points per global full-depth field). An upper
> > limit of 10 GB will nicely fit one file per decade from monthly
> > ocean data, and will allow us to have a year's worth of full-column
> > 6-hourly atmosphere in a single file (~8.4 GB).
> >
> >> Cheers,
> >> Martin
> >>
> >>> -----Original Message-----
> >>> From: go-essp-tech-bounces at ucar.edu
> >>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Nathan Wilhelmi
> >>> Sent: 18 May 2010 15:55
> >>> To: Pascoe, Stephen (STFC,RAL,SSTD)
> >>> Cc: go-essp-tech at ucar.edu; doutriaux1 at llnl.gov
> >>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> >>>
> >>> Hi All,
> >>>
> >>>     Here is a nice table summarizing the various Windows file
> >>> system limits. http://www.ntfs.com/ntfs_vs_fat.htm
> >>>
> >>> -Nate
> >>>
> >>> stephen.pascoe at stfc.ac.uk wrote:
> >>>> I've done some testing of these file limits this afternoon and I don't
> >>>> think the filesystems will be a problem.
> >>>>
> >>>> From Wikipedia it appears the FAT32 file system has a 4Gb limit
> >>>> (http://en.wikipedia.org/wiki/File_Allocation_Table).  That
> >>>> covers Windows 95 onwards but my Windows XP box is NTFS and has
> >>>> no problem with +4Gb files.  Similarly my 32-bit linux laptop
> >>>> (recent ubuntu) can handle +4Gb files.
> >>>>
> >>>> Looks like anyone with a reasonably modern system will be able to
> >>>> handle +4Gb files.  We may have more problems with old NetCDF library
> >>>> versions.
> >>>>
> >>>> S.
> >>>>
> >>>> ---
> >>>> Stephen Pascoe  +44 (0)1235 445980
> >>>> British Atmospheric Data Centre
> >>>> Rutherford Appleton Laboratory
> >>>>
> >>>> -----Original Message-----
> >>>> From: go-essp-tech-bounces at ucar.edu
> >>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
> >>>> ag.stephens at stfc.ac.uk
> >>>> Sent: 18 May 2010 09:31
> >>>> To: taylor13 at llnl.gov; go-essp-tech at ucar.edu
> >>>> Cc: doutriaux1 at llnl.gov
> >>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> >>>>
> >>>> Dear Karl,
> >>>>
> >>>> Whether we think it's advisable or not, I'm sure that some of the
> >>>> wider CMIP5 user community will be looking at the outputs on Windows.
> >>>> I think it is sensible to set a 2GB file size limit.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Ag
> >>>>
> >>>> -----Original Message-----
> >>>> From: go-essp-tech-bounces at ucar.edu
> >>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
> >>>> Sent: 17 May 2010 18:45
> >>>> To: go-essp-tech at ucar.edu
> >>>> Cc: Doutriaux, Charles
> >>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> >>>>
> >>>> Dear all,
> >>>>
> >>>> CMOR has code already in place for checking whether a file exceeds
> >>>> 2 GB, but it is currently turned off (it was turned on for CMIP3).  We
> >>>> thought it was now unnecessary.  If the feeling is that there will be
> >>>> users downloading CMIP5 files to windows machines using older
> >>>> operating systems, I suppose that limiting CMIP5 files to
> >>>> whatever the limit is (2 GB or 4 GB -- does anyone know which it is?)
> >>>> might be wise.
> >>>>
> >>>> On the other hand, will anyone use a windows machine to look at netCDF
> >>>> files?  If not, maybe this is a non-issue.
> >>>>
> >>>> Karl
> >>>>
> >>>> On 5/16/10 12:08 PM, stephen.pascoe at stfc.ac.uk wrote:
> >>>>> I think I raised undue alarm here when suggesting we might be dealing
> >>>>> with +2GB files.  Thanks Phil for clarifying that UKMO is still
> >>>>> planning to limit itself to <2GB files.
> >>>>>
> >>>>> I am wondering what the policy should be here?  My first thought is
> >>>>> that modeling centres will mainly make the same decision as UKMO since
> >>>>> it is in their interest for their model output to be widely
> >>>>> used. However, enforcement could be difficult.  The logical
> >>>>> place to enforce the limit is in the level 1 QC but CMOR doesn't do
> >>>>> this so it will be a problem for people running datanodes.
> >>>>>
> >>>>> I suggest we make a strong recommendation to supply data in <2GB files
> >>>>> and enforce it during level-2 QC before replicating.
> >>>>>
> >>>>> S.
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: go-essp-tech-bounces at ucar.edu on behalf of Michael
> >>>>> Lautenschlager
> >>>>> Sent: Sun 5/16/2010 1:35 PM
> >>>>> To: V. Balaji
> >>>>> Cc: go-essp-tech at ucar.edu
> >>>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
> >>>>>
> >>>>> Hello *,
> >>>>>
> >>>>> we strongly support Phil's decision to keep data files below 2 GB. We
> >>>>> made the same decision in Hamburg for the same reasons, because we
> >>>>> cannot expect that all users use 64-bit systems. Most Windows
> >>>>> environments are still running with 32 bits.
> >>>>>
> >>>>> Best wishes, Michael
> >>>>>
> >>>>> ---------------
> >>>>> Dr. Michael Lautenschlager
> >>>>>
> >>>>> German Climate Computing Centre (DKRZ) World Data Center
> >>>>> Climate (WDCC)
> >>>>> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
> >>>>> PHONE:   +4940-460094-118
> >>>>> E-Mail:  lautenschlager at dkrz.de
> >>>>>
> >>>>> URL:    http://www.dkrz.de/
> >>>>>           http://www.wdc-climate.de/
> >>>>>
> >>>>> V. Balaji wrote:
> >>>>>> If I understood correctly the most serious 2Gb problem is with apache!
> >>>>>>
> >>>>>> Bentley, Philip writes:
> >>>>>>> Hi Stephen,
> >>>>>>>
> >>>>>>> Yes, that's true, we did create a small number of test netCDF files
> >>>>>>> in that size range. But this was because the CMOR library we used at
> >>>>>>> the time didn't include functionality for chunking the output into
> >>>>>>> smaller files. Plus we wanted to stress-test our pipeline!
> >>>>>>>
> >>>>>>> Two things have happened since then:
> >>>>>>>
> >>>>>>> 1. Jamie has been working with Charles at PCMDI to implement
> >>>>>>> and test a solution whereby we can limit the size of the
> >>>>>>> output netCDF files produced by CMOR.
> >>>>>>>
> >>>>>>> 2. We have made the local decision to limit our netCDF file sizes to
> >>>>>>> 2 GB (or thereabouts) as, logistically, that will cause us
> >>>>>>> less headache moving these files around, and it should
> >>>>>>> maximise the number of client applications in which the files
> >>>>>>> can be read.
> >>>>>>>
> >>>>>>> IIRC, I think Balaji mentioned that the 64-bit offset format
> >>>>>>> was required for output from the gridspec toolset. I could be
> >>>>>>> wrong.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Phil
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: go-essp-tech-bounces at ucar.edu
> >>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
> >>>>>>>> stephen.pascoe at stfc.ac.uk
> >>>>>>>> Sent: 14 May 2010 10:52
> >>>>>>>> To: go-essp-tech at ucar.edu
> >>>>>>>> Subject: [Go-essp-tech] +2Gb CMIP5 files
> >>>>>>>>
> >>>>>>>> The latest UKMO extraction for CMIP5 has produced some files in the
> >>>>>>>> 30Gb range.  We had discussed previously the assumption that
> >>>>>>>> all files would be <2Gb.  Do we feel it is important to
> >>>>>>>> enforce a <2Gb limit or should this just be a recommendation
> >>>>>>>> on modelling centres?
> >>>>>>>>
> >>>>>>>> To my knowledge there are two issues with +2Gb files:
> >>>>>>>>
> >>>>>>>>   1. +2GB NetCDF files will be in 64-bit offset format.
> >>>>>>>> Therefore NetCDF libraries prior to v3.6 will not be able to read
> >>>>>>>> them.
> >>>>>>>>   2. Older file systems may have a 2Gb file limit. This will mainly
> >>>>>>>> affect 32-bit systems that are a few years old. FAT32 has a
> >>>>>>>> 4Gb limit.
> >>>>>>>>
> >>>>>>>> These are end-user issues. Is there any reason why the ESG software
> >>>>>>>> might have problems with files over 2Gb?  If we do want to ensure
> >>>>>>>> files are <2Gb do we want to mandate the modelling centres deliver
> >>>>>>>> that or will the data centres need to split files?
> >>>>>>>>
> >>>>>>>> Stephen.
> >>>>>>>>
> >>>>>>>> ---
> >>>>>>>> Stephen Pascoe  +44 (0)1235 445980
> >>>>>>>> British Atmospheric Data Centre
> >>>>>>>> Rutherford Appleton Laboratory
> >>>>>>>> --
> >>>>>>>> Scanned by iCritical.
> >>>>>>>> _______________________________________________
> >>>>>>>> GO-ESSP-TECH mailing list
> >>>>>>>> GO-ESSP-TECH at ucar.edu
> >>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
> >
> > Gary Strand
> > strandwg at ucar.edu

-- 
Bryan Lawrence
Director of Environmental Archival and Associated Research
(NCAS/British Atmospheric Data Centre and NCEO/NERC NEODC)
STFC, Rutherford Appleton Laboratory
Phone +44 1235 445012; Fax ... 5848; 
Web: home.badc.rl.ac.uk/lawrence

