[Go-essp-tech] +2Gb CMIP5 files

stephen.pascoe at stfc.ac.uk
Wed May 19 02:53:46 MDT 2010


 
I agree with Karl's suggestion.  If there is strong opposition to a <2GB
limit in the general case, this could safely be relaxed to 4GB.

S.

---
Stephen Pascoe  +44 (0)1235 445980
British Atmospheric Data Centre
Rutherford Appleton Laboratory

-----Original Message-----
From: go-essp-tech-bounces at ucar.edu
[mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
Sent: 18 May 2010 17:46
To: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] +2Gb CMIP5 files

Dear all,

I agree that the transfer issue for large files is probably more
important than whether the data can be processed on old windows systems.

I propose imposing a 2 GB limit on file size with exceptions made for
very high resolution models, e.g., the MRI-AM20km with 60 levels and
1.8x10**6 grid cells.  For this model a single time-slice of 3-d data
would occupy about 0.4 GB, so very many files would be needed to save a
century of monthly data.  [For daily data at 8 pressure levels a year
would occupy about 22 GB.]  In cases like this, I think the size limit
should be relaxed somewhat (e.g., perhaps allowing a full year of
monthly 3-d data to be stored in a single file of approximately 3 GB in
the case considered here, or even a full year of daily 3-d data).
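Karl's size estimates can be reproduced with a few lines of arithmetic. This is a sketch under assumptions the thread does not state: uncompressed 4-byte (float32) values, no header or coordinate overhead, decimal gigabytes, and a 365-day year.

```python
# Back-of-envelope payload size for one variable from a high-resolution model.
# Assumptions (not stated in the thread): uncompressed float32 values,
# no header or coordinate overhead, decimal GB, 365-day years.

BYTES_PER_VALUE = 4       # assumed float32 storage
GRID_CELLS = 1.8e6        # horizontal grid cells, from Karl's message
MODEL_LEVELS = 60         # vertical levels, from Karl's message
GB = 1e9                  # decimal gigabyte


def slice_bytes(levels: int, cells: float = GRID_CELLS) -> float:
    """Bytes for one time slice of one variable on `levels` levels."""
    return levels * cells * BYTES_PER_VALUE


def series_bytes(levels: int, steps: int) -> float:
    """Bytes for `steps` consecutive time slices."""
    return steps * slice_bytes(levels)


if __name__ == "__main__":
    print(f"one 3-d time slice (60 levels): {slice_bytes(MODEL_LEVELS) / GB:.2f} GB")
    print(f"one year of daily data, 8 levels: {series_bytes(8, 365) / GB:.1f} GB")
```

Under these assumptions a single 60-level time slice comes to about 0.43 GB and a year of daily data on 8 pressure levels to about 21 GB, close to the figures Karl quotes.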

CMOR would give a warning if the size exceeded 2 GB and would error exit
at some higher limit (perhaps 10 GB??? or 25 GB???).  [Charles, let us
know if this would be too hard to do.]
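The proposed behaviour (warn above 2 GB, error exit above some higher limit) might look something like the sketch below. It is not CMOR's code or API: the function and exception names are hypothetical, "2 GB" is assumed to mean 2**31 bytes (one past the largest offset a signed 32-bit integer can hold), and the hard limit is a required argument because the thread leaves its value open.

```python
import warnings


class FileSizeError(Exception):
    """Raised when an output file reaches the hard size limit (hypothetical)."""


# Files must be strictly smaller than this to avoid the warning.
# Assumed reading of "2 GB": 2**31 bytes.
SOFT_LIMIT = 2**31


def check_output_size(size_bytes: int, hard_limit: int,
                      soft_limit: int = SOFT_LIMIT) -> str:
    """Apply the proposed policy to a file size.

    Returns "ok" below the soft limit, emits a warning and returns "warn"
    between the limits, and raises FileSizeError at or above the hard
    limit (the "error exit" case).
    """
    if hard_limit <= soft_limit:
        raise ValueError("hard_limit must be larger than soft_limit")
    if size_bytes >= hard_limit:
        raise FileSizeError(
            f"file is {size_bytes} bytes; hard limit is {hard_limit} bytes")
    if size_bytes >= soft_limit:
        warnings.warn(
            f"file is {size_bytes} bytes; recommended maximum is "
            f"{soft_limit - 1} bytes")
        return "warn"
    return "ok"
```

For example, with an illustrative hard limit of 10 GiB, `check_output_size(3 * 2**30, hard_limit=10 * 2**30)` warns and returns "warn". Stephen's suggested relaxation to 4GB would correspond to `soft_limit=2**32`, since FAT32 files must be smaller than 4 GiB.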

Thanks for your input on this, and let me know if what I propose seems
like the best we can do.  Specifically, what do you think the absolute
limit should be?

Best regards,
Karl


On 5/18/10 9:21 AM, Bryan Lawrence wrote:
> On Tuesday 18 May 2010 16:31:01 martin.juckes at stfc.ac.uk wrote:
>> This may have been covered already, but we also need to consider
>> network implications. I think, as suggested by Phil's email near the
>> start of this thread, that the problems associated with transferring
>> large files (due to the faster than linear growth in failure rate
>> with file size) give enough justification for imposing a limit,
> I totally agree with this!
> B
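One simple model makes the transfer argument concrete. It is an assumption for illustration, not something measured in this thread: transfer failures arrive at a constant rate per byte sent, and a failed download restarts from the beginning. Under that model the expected number of bytes sent to deliver a file of size S at failure rate λ is (e^(λS) - 1)/λ, which grows faster than linearly in S, so several small files cost less to move than one large one.

```python
import math


def expected_bytes_sent(size: float, failure_rate: float) -> float:
    """Expected bytes transmitted to deliver one file of `size` bytes.

    Assumed model: failures occur at a constant `failure_rate` per byte
    sent (a Poisson process), and a failed transfer restarts from zero.
    The expectation is then (exp(failure_rate * size) - 1) / failure_rate.
    """
    if failure_rate < 0:
        raise ValueError("failure_rate must be non-negative")
    if failure_rate == 0:
        return size  # no failures: each byte is sent exactly once
    return math.expm1(failure_rate * size) / failure_rate


def expected_bytes_chunked(size: float, failure_rate: float,
                           n_files: int) -> float:
    """Same total data split into `n_files` equal files, each retried independently."""
    return n_files * expected_bytes_sent(size / n_files, failure_rate)
```

With an invented rate of one failure per 50 GB sent, a single 30 GB file costs about 41 GB of traffic on average, while the same data as fifteen 2 GB files costs about 31 GB.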
>
>
>    
>> Cheers,
>> Martin
>>
>>      
>>> -----Original Message-----
>>> From: go-essp-tech-bounces at ucar.edu
>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Nathan Wilhelmi
>>> Sent: 18 May 2010 15:55
>>> To: Pascoe, Stephen (STFC,RAL,SSTD)
>>> Cc: go-essp-tech at ucar.edu; doutriaux1 at llnl.gov
>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
>>>
>>> Hi All,
>>>
>>> Here is a nice table summarizing the various Windows file
>>> system limits: http://www.ntfs.com/ntfs_vs_fat.htm
>>>
>>> -Nate
>>>
>>> stephen.pascoe at stfc.ac.uk wrote:
>>>> I've done some testing of these file limits this afternoon and I
>>>> don't think the filesystems will be a problem.
>>>>
>>>> From Wikipedia it appears the FAT32 file system has a 4GB limit
>>>> (http://en.wikipedia.org/wiki/File_Allocation_Table).  That covers
>>>> Windows 95 onwards, but my Windows XP box is NTFS and has no problem
>>>> with +4GB files.  Similarly, my 32-bit Linux laptop (recent Ubuntu)
>>>> can handle +4GB files.
>>>>
>>>> Looks like anyone with a reasonably modern system will be able to
>>>> handle +4GB files.  We may have more problems with old NetCDF library
>>>> versions.
>>>>
>>>> S.
>>>>
>>>> ---
>>>> Stephen Pascoe  +44 (0)1235 445980
>>>> British Atmospheric Data Centre
>>>> Rutherford Appleton Laboratory
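A quick way to repeat Stephen's file-system test without writing gigabytes of data is to extend an empty file with `truncate()`. This is a sketch: the helper name is hypothetical, file systems without sparse-file support may actually allocate the space, and a False result cannot distinguish a per-file size limit from a full disk.

```python
import os
import tempfile


def can_create_file_of_size(directory: str, size_bytes: int) -> bool:
    """Return True if a file of exactly `size_bytes` can be created in `directory`.

    The file is extended with truncate() rather than written, which is
    cheap on file systems that support sparse files. The temporary file
    is always removed afterwards.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(size_bytes)
        return os.path.getsize(path) == size_bytes
    except (OSError, OverflowError):
        # e.g. the file system's per-file size limit (such as FAT32's)
        # was exceeded, or the device ran out of space
        return False
    finally:
        os.remove(path)
```

Calling `can_create_file_of_size(".", 5 * 2**30)` probes the +4GB case on the file system holding the current directory.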
>>>>
>>>> -----Original Message-----
>>>> From: go-essp-tech-bounces at ucar.edu 
>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of 
>>>> ag.stephens at stfc.ac.uk
>>>> Sent: 18 May 2010 09:31
>>>> To: taylor13 at llnl.gov; go-essp-tech at ucar.edu
>>>> Cc: doutriaux1 at llnl.gov
>>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
>>>>
>>>> Dear Karl,
>>>>
>>>> Whether we think it's advisable or not, I'm sure that some of the
>>>> wider CMIP5 user community will be looking at the outputs on Windows.
>>>> I think it is sensible to set a 2GB file size limit.
>>>>
>>>> Regards,
>>>>
>>>> Ag
>>>>
>>>> -----Original Message-----
>>>> From: go-essp-tech-bounces at ucar.edu 
>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Karl Taylor
>>>> Sent: 17 May 2010 18:45
>>>> To: go-essp-tech at ucar.edu
>>>> Cc: Doutriaux, Charles
>>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
>>>>
>>>> Dear all,
>>>>
>>>> CMOR has code already in place for checking whether a file exceeds
>>>> 2 GB, but it is currently turned off (it was turned on for CMIP3).  We
>>>> thought it was now unnecessary.  If the feeling is that there will be
>>>> users downloading CMIP5 files to Windows machines using older operating
>>>> systems, I suppose that limiting CMIP5 files to whatever the limit is
>>>> (2 GB or 4 GB -- does anyone know which it is?) might be wise.
>>>>
>>>> On the other hand, will anyone use a Windows machine to look at
>>>> netCDF files?  If not, maybe this is a non-issue.
>>>>
>>>> Karl
>>>>
>>>> On 5/16/10 12:08 PM, stephen.pascoe at stfc.ac.uk wrote:
>>>>> I think I raised undue alarm here when suggesting we might be
>>>>> dealing with +2GB files.  Thanks, Phil, for clarifying that UKMO is
>>>>> still planning to limit itself to <2GB files.
>>>>>
>>>>> I am wondering what the policy should be here.  My first thought is
>>>>> that modelling centres will mainly make the same decision as UKMO,
>>>>> since it is in their interest for their model output to be widely
>>>>> used.  However, enforcement could be difficult.  The logical place to
>>>>> enforce the limit is in the level-1 QC, but CMOR doesn't enforce it,
>>>>> so it will be a problem for people running data nodes.
>>>>>
>>>>> I suggest we make a strong recommendation to supply data in <2GB
>>>>> files and enforce it during level-2 QC before replicating.
>>>>>
>>>>> S.
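A level-2 QC pass of the kind Stephen proposes could start from a scan like the one below. It is a sketch with a hypothetical function name; it assumes the data sit as `.nc` files under a single directory tree and reads "<2GB" as strictly smaller than 2**31 bytes.

```python
import os
from typing import Iterator, Tuple

# Assumed reading of "<2GB": files must be strictly smaller than 2**31 bytes.
LIMIT_BYTES = 2**31


def oversize_files(root: str, limit: int = LIMIT_BYTES,
                   suffix: str = ".nc") -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every `suffix` file under `root` of size >= limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size >= limit:
                yield path, size
```

A QC script could collect `list(oversize_files(dataset_dir))` for a placeholder `dataset_dir` and decline to replicate the dataset if the list is non-empty.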
>>>>>
>>>>> -----Original Message-----
>>>>> From: go-essp-tech-bounces at ucar.edu on behalf of Michael Lautenschlager
>>>>> Sent: Sun 5/16/2010 1:35 PM
>>>>> To: V. Balaji
>>>>> Cc: go-essp-tech at ucar.edu
>>>>> Subject: Re: [Go-essp-tech] +2Gb CMIP5 files
>>>>>
>>>>> Hello *,
>>>>>
>>>>> we strongly support Phil's decision for data files of less than 2 GB.
>>>>> We made the same decision in Hamburg for the same reasons: we cannot
>>>>> expect all users to have 64-bit systems. Most Windows environments
>>>>> are still running with 32 bits.
>>>>>
>>>>> Best wishes, Michael
>>>>>
>>>>> ---------------
>>>>> Dr. Michael Lautenschlager
>>>>>
>>>>> German Climate Computing Centre (DKRZ)
>>>>> World Data Center Climate (WDCC)
>>>>> ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>> PHONE:   +4940-460094-118
>>>>> E-Mail:  lautenschlager at dkrz.de
>>>>>
>>>>> URL:    http://www.dkrz.de/
>>>>>         http://www.wdc-climate.de/
>>>>>
>>>>> V. Balaji wrote:
>>>>>> If I understood correctly, the most serious 2GB problem is with
>>>>>> Apache!
>>>>>>
>>>>>> Bentley, Philip writes:
>>>>>>              
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> Yes, that's true, we did create a small number of test netCDF files
>>>>>>> in that size range. But this was because the CMOR library we used at
>>>>>>> the time didn't include functionality for chunking the output into
>>>>>>> smaller files. Plus we wanted to stress-test our pipeline!
>>>>>>>
>>>>>>> Two things have happened since then:
>>>>>>>
>>>>>>> 1. Jamie has been working with Charles at PCMDI to implement
>>>>>>> and test a solution whereby we can limit the size of the
>>>>>>> output netCDF files produced by CMOR.
>>>>>>>
>>>>>>> 2. We have made the local decision to limit our netCDF file
>>>>>>> sizes to 2 GB (or thereabouts) as, logistically, that will cause us
>>>>>>> less headache moving these files around, and it should
>>>>>>> maximise the number of client applications in which the files
>>>>>>> can be read.
>>>>>>>
>>>>>>> IIRC, I think Balaji mentioned that the 64-bit offset format
>>>>>>> was required for output from the gridspec toolset. I could be
>>>>>>> wrong.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Phil
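If files do have to be split, whether by CMOR's new option or by the data centres, the arithmetic is how many whole time steps fit under the limit. The sketch below is not how CMOR implements this; the helper names are hypothetical, the limit is assumed to be 2**31 bytes in the example, and `overhead_bytes` stands in for header and coordinate data that the thread does not quantify.

```python
from typing import List, Tuple


def steps_per_file(bytes_per_step: int, limit_bytes: int,
                   overhead_bytes: int = 0) -> int:
    """Most whole time steps that keep a file strictly smaller than limit_bytes."""
    usable = limit_bytes - overhead_bytes
    if bytes_per_step <= 0 or bytes_per_step >= usable:
        raise ValueError("a single time step does not fit under the limit")
    return (usable - 1) // bytes_per_step


def split_plan(total_steps: int, bytes_per_step: int, limit_bytes: int,
               overhead_bytes: int = 0) -> List[Tuple[int, int]]:
    """Half-open (start, stop) time-index ranges, one per output file."""
    n = steps_per_file(bytes_per_step, limit_bytes, overhead_bytes)
    return [(start, min(start + n, total_steps))
            for start in range(0, total_steps, n)]
```

For example, at roughly 0.43 GB per 3-d time slice (the MRI-AM20km figure Karl gives), `split_plan(1200, 432_000_000, 2**31)` yields 300 files of 4 monthly slices each for a century of monthly data.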
>>>>>>>
>>>>>>>                
>>>>>>>> -----Original Message-----
>>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of
>>>>>>>> stephen.pascoe at stfc.ac.uk
>>>>>>>> Sent: 14 May 2010 10:52
>>>>>>>> To: go-essp-tech at ucar.edu
>>>>>>>> Subject: [Go-essp-tech] +2Gb CMIP5 files
>>>>>>>>
>>>>>>>> The latest UKMO extraction for CMIP5 has produced some files in the
>>>>>>>> 30GB range.  We had previously discussed the assumption that
>>>>>>>> all files would be <2GB.  Do we feel it is important to
>>>>>>>> enforce a <2GB limit, or should this just be a recommendation
>>>>>>>> to modelling centres?
>>>>>>>>
>>>>>>>> To my knowledge there are two issues with +2GB files:
>>>>>>>>
>>>>>>>>    1. +2GB NetCDF files will be in 64-bit offset format.
>>>>>>>>       Therefore NetCDF libraries prior to v3.6 will not be able
>>>>>>>>       to read them.
>>>>>>>>    2. Older file systems may have a 2GB file limit. This will
>>>>>>>>       mainly affect 32-bit systems that are a few years old.
>>>>>>>>       FAT32 has a 4GB limit.
>>>>>>>>
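Issue 1 can be checked per file, because the netCDF format is recorded in the first bytes of the file: classic files start with "CDF" followed by the byte 0x01, 64-bit offset files with "CDF" and 0x02, and netCDF-4 files with the HDF5 signature. The sketch below (hypothetical function name) reads only those bytes; it reports any HDF5 file as netCDF-4 and ignores HDF5 files whose signature is shifted by a user block.

```python
# Leading "magic" bytes of each netCDF on-disk format.
CLASSIC_MAGIC = b"CDF\x01"
OFFSET64_MAGIC = b"CDF\x02"
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files

# Oldest netCDF-C release that can read each format.
MIN_LIBRARY_VERSION = {
    "classic": "any",
    "64-bit offset": "3.6.0",
    "netCDF-4": "4.0",
}


def netcdf_format(path: str) -> str:
    """Identify a file's netCDF format from its first eight bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(CLASSIC_MAGIC):
        return "classic"
    if head.startswith(OFFSET64_MAGIC):
        return "64-bit offset"
    if head.startswith(HDF5_MAGIC):
        return "netCDF-4"  # any HDF5 file lands here
    return "unknown"
```

`MIN_LIBRARY_VERSION.get(netcdf_format(path))` then gives the oldest library release able to open a given file.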
>>>>>>>> These are end-user issues; is there any reason why the ESG software
>>>>>>>> might have problems with files over 2GB?  If we do want to ensure
>>>>>>>> files are <2GB, do we want to mandate that the modelling centres
>>>>>>>> deliver that, or will the data centres need to split files?
>>>>>>>>
>>>>>>>> Stephen.
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Stephen Pascoe  +44 (0)1235 445980
>>>>>>>> British Atmospheric Data Centre
>>>>>>>> Rutherford Appleton Laboratory
>>>>>>>> --
>>>>>>>> Scanned by iCritical.
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://**mailman.ucar.edu/mailman/listinfo/go-essp-tech
