[ncl-talk] passing args within ncl scripts
Jayant
jayantkp2979 at gmail.com
Wed Jul 21 22:30:24 MDT 2021
Thank you Dave for your suggestions. I am working on it.
1. The input files are in binary format, so I wonder whether I should use
fbindirwrite to save individual files and then read them with fbindirread
to make a combined matrix, or save individual files in netCDF format and
then read those. Which way would be faster?
2. Do I need a submit script (sbatch) to run task-parallel scripts?
I tried running 'ncl srcfile.ncl' directly with MAX_CONCURRENT = 24. Please
see the attached screenshot. The ps -a command shows ncl <defunct>, and I
can't tell whether the subprocess command is actually working.
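
For context on the ps output: a <defunct> entry is a child process that has
already exited but whose exit status the parent has not yet collected, which
suggests the subprocesses were at least launched. The following is a minimal
sketch of a launch-and-wait loop, assuming subprocess() and subprocess_wait()
behave as described in the NCL function reference (NCL 6.5.0 or later).
worker.ncl, ifile, and nJobs are hypothetical names, not taken from
srcfile.ncl.

```ncl
; Sketch only: worker.ncl, ifile, and nJobs are hypothetical placeholders.
begin
  MAX_CONCURRENT = 24      ; value quoted in the question
  nJobs          = 4320    ; assumed: one job per input file
  nActive        = 0       ; children launched but not yet waited for
  iJob           = 0       ; index of the next job to launch

  do while (iJob .lt. nJobs .or. nActive .gt. 0)
    if (iJob .lt. nJobs .and. nActive .lt. MAX_CONCURRENT) then
      ; Launch one child; subprocess() returns without waiting for it.
      cmd = "ncl -Q worker.ncl ifile=" + iJob
      pid = subprocess(cmd)
      nActive = nActive + 1
      iJob    = iJob + 1
    else
      ; Block until any child finishes and collect its exit status.
      pid = subprocess_wait(0, True)
      nActive = nActive - 1
    end if
  end do
end
```

Each subprocess_wait() call presumably clears one finished child, so any
<defunct> entries should disappear as the loop drains.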
Best,
On Wed, Jul 21, 2021 at 12:20 PM Dave Allured - NOAA Affiliate <
dave.allured at noaa.gov> wrote:
> Jayant, task parallelism will be useful if you can come up with a strategy
> to do partial calculations or data subsetting within subprocesses, in such
> a way as to reduce the size of intermediate results. There are many
> possible strategies, depending on the kind of calculations. For example,
> daily stats or partial sums or area averaging could be calculated in
> subprocesses, then reported back to the parent via relatively small Netcdf
> files. Another way to think about strategy is dimension reduction along
> one or more dimensions of the original matrix.
>
> Example 3 uses individual PNG plot files to communicate back to the
> parent. These could just as easily be individual Netcdf files with
> dimension-reduced partial results.
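
A child script along these lines might look like the sketch below. It
assumes each binary file holds one (56, 450, 900) float record that
fbinrecread can read, as described further down the thread. worker.ncl,
infile, recnum, outfile, and TEMP_prof are hypothetical names, and the
unweighted horizontal mean merely stands in for whatever reduction is
actually needed.

```ncl
; worker.ncl (hypothetical): reduce one binary file to a small netCDF file.
; Invoke as:
;   ncl -Q worker.ncl 'infile="hour0001.dat"' recnum=3 'outfile="partial_0001.nc"'
begin
  if (.not.(isvar("infile") .and. isvar("recnum") .and. isvar("outfile"))) then
    print("worker.ncl: need infile, recnum and outfile on the command line")
    exit
  end if

  ; Grid size taken from the thread: 56 levels x 450 x 900 points, float.
  temp = fbinrecread(infile, recnum, (/56, 450, 900/), "float")

  ; Dimension reduction: unweighted horizontal mean at each level.
  prof   = dim_avg_n(temp, (/1, 2/))
  prof!0 = "lev"

  system("rm -f " + outfile)      ; creating a file that already exists fails
  fout = addfile(outfile, "c")
  fout->TEMP_prof = prof          ; 56 values instead of 56 x 450 x 900
end
```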
>
>
> On Wed, Jul 21, 2021 at 7:14 AM Rick Brownrigg via ncl-talk <
> ncl-talk at mailman.ucar.edu> wrote:
>
>> Wow -- a 4320x56x450x900 floating-point variable is 391GB! In any case,
>> NCL's subprocess feature can't be used for shared memory tasks -- the tasks
>> are necessarily independent of each other. I agree a 1.5-hour runtime is
>> rather painful. But I don't know of a good way to speed that up.
>>
>> Rick
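
The size estimate can be checked with a few lines of NCL. This is only a
sketch of the arithmetic, assuming 4-byte floats. The element count is
converted to double first because 4320 x 56 x 450 x 900 does not fit in a
32-bit integer.

```ncl
begin
  ; Convert to double before multiplying to avoid 32-bit integer overflow.
  nelem  = todouble(4320) * 56 * 450 * 900    ; 97,977,600,000 values
  nbytes = nelem * 4                          ; 4 bytes per float
  print("size = " + sprintf("%.1f", nbytes / 1.0e9)   + " GB  (10^9 bytes)")
  print("size = " + sprintf("%.1f", nbytes / 2.0^30)  + " GiB (2^30 bytes)")
end
```

That is 391,910,400,000 bytes, consistent with the figure quoted above.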
>>
>>
>> On Wed, Jul 21, 2021 at 1:00 AM Jayant <jayantkp2979 at gmail.com> wrote:
>>
>>> Hi Rick,
>>> Thanks again for your prompt response.
>>> I have around 24 x 30 x 6 months = 4320 files in binary format (along
>>> with a .ctl descriptor file for each).
>>>
>>> 1. I read the ctl file first to get the record number of the
>>> variable of interest (say TEMP) and then;
>>> 2. use the fbinrecread function to read the binary file.
>>>
>>> *I guess the binary read takes a lot of time!* I define a 4-d variable
>>> (say inp_temp(4320,56,450,900)) in the beginning and then in a do loop over
>>> time, the above 2 steps are performed. After the do loop, I perform some
>>> daily and monthly stats and then generate monthly (6) plots or a vertical
>>> profile (pressure vs height) at a particular point over the entire period.
>>> To give an estimate of the execution time, it takes about an hour and a
>>> half to complete.
>>>
>>> In order to reduce time for the binary read, I was thinking of adopting
>>> the task parallelism for the do loop part of the script.
>>>
>>> On Wed, Jul 21, 2021 at 1:38 AM Rick Brownrigg <brownrig at ucar.edu>
>>> wrote:
>>>
>>>> Hi Jayant,
>>>>
>>>> My apologies if I'm still not clear. You say "It takes a lot of time
>>>> to read a variable and generate a plot." Are you trying to read 24 files
>>>> and generate 24 plots? Or read 24 files and perform analysis and generate
>>>> plots from the composite?
>>>>
>>>> It sounds like the latter -- are you trying to use subprocesses to read
>>>> 24 files and end up with one array in memory composed from all 24 of them,
>>>> so that you can perform analysis and/or plots on that array? Then no --
>>>> subprocesses won't do the job and NCL in general does not have a way to
>>>> perform concurrent reads into a shared memory space. The parent NCL script
>>>> executing other programs via the subprocess() function has no communication
>>>> with those programs.
>>>>
>>>> The "addfiles" function is the NCL way of reading multiple files into
>>>> a common array; it is not concurrent to the best of my knowledge, but it
>>>> does the job.
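
A minimal sketch of that pattern follows. Note that addfiles works on
formats NCL can open with addfile (netCDF, GRIB, HDF, and so on), not on raw
binary files. The file pattern partial_*.nc and the variable name TEMP_prof
are hypothetical.

```ncl
begin
  ; Hypothetical file and variable names; adjust to the real data.
  files = systemfunc("ls partial_*.nc")
  f     = addfiles(files, "r")
  ListSetType(f, "join")          ; stack files along a new leading dimension
  combined = f[:]->TEMP_prof      ; one entry per file along dimension 0
  printVarSummary(combined)
end
```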
>>>>
>>>> Rick
>>>>
>>>>
>>>> On Tue, Jul 20, 2021 at 9:04 PM Jayant <jayantkp2979 at gmail.com> wrote:
>>>>
>>>>> Thanks Rick,
>>>>> I want to use task parallelism. I have hourly files spanning a few
>>>>> months from a high resolution simulation. It takes a lot of time to read a
>>>>> variable and generate a plot. I have come across task parallelism (example
>>>>> 3) and want to modify the example such that I can use 24 processors to read
>>>>> 24 files at a time and save the desired variable in a parent array. And
>>>>> once the reading is complete, I can perform calculations (daily/monthly
>>>>> stats) on the parent array. I hope this helps understand what I intend to
>>>>> do.
>>>>> You mentioned a file-based approach, and example 3 does appear to
>>>>> save individual plots and later combine the frames. I wonder if that is
>>>>> a good idea in my case?
>>>>> Best regards,
>>>>> Jayant
>>>>>
>>>>> On Tue, Jul 20, 2021 at 11:50 PM Rick Brownrigg <brownrig at ucar.edu>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> If I understand you correctly, you are trying to get the second
>>>>>> script to update the array in the first script? If so, that would not be
>>>>>> possible, as the two scripts execute as independent processes, operating in
>>>>>> independent memory spaces. They would need some other mechanism to
>>>>>> communicate results between each other, perhaps something like a file-based
>>>>>> approach.
>>>>>>
>>>>>> Perhaps explain in more detail what you are trying to do and why
>>>>>> there are two scripts involved, and others might be able to offer
>>>>>> suggestions.
>>>>>>
>>>>>> Rick
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 20, 2021 at 8:26 PM Jayant via ncl-talk <
>>>>>> ncl-talk at mailman.ucar.edu> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I want to call one ncl script (test_second.ncl) from within another
>>>>>>> ncl script (test_prime.ncl) using system command (in fact subprocess
>>>>>>> command). In doing so, I want to update an array (defined in
>>>>>>> test_prime.ncl) in the second call. I am getting zeros (unchanged!!). How
>>>>>>> to proceed? Is there something like global variables that can be defined?
>>>>>>> Below are the working example scripts:
>>>>>>> ;==================================================
>>>>>>> *test_prime.ncl*
>>>>>>> begin
>>>>>>> ninp=10
>>>>>>> inparr=new(ninp,float)
>>>>>>> inparr=0.0
>>>>>>>
>>>>>>> do i=0,ninp-1
>>>>>>> command="ncl -Q test_second.ncl "+str_get_sq()+"ip="+i+str_get_sq()+" "+str_get_sq()+"tmparr="+inparr(i)+str_get_sq()
>>>>>>> system(command)
>>>>>>> end do
>>>>>>> print(inparr)
>>>>>>> end
>>>>>>> ;==================================================
>>>>>>> *test_second.ncl*
>>>>>>> begin
>>>>>>> tmparr=ip ; intend to perform some calculation and update
>>>>>>> end
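
Because the two scripts run in separate memory spaces, one way to get a
value back is through a small file. The sketch below reworks the two scripts
above on that assumption. The result_N.txt file names and the ip * 2.0
calculation are placeholders, and the block contains two separate files,
split at the marker lines.

```ncl
;==================================================
; test_prime.ncl (parent)
begin
  ninp   = 10
  inparr = new(ninp, float)
  inparr = 0.0

  do i = 0, ninp-1
    resfile = "result_" + i + ".txt"     ; hypothetical file name
    ; system() blocks until the child exits, so the file is complete here.
    system("ncl -Q test_second.ncl ip=" + i)
    inparr(i) = asciiread(resfile, 1, "float")
  end do
  print(inparr)
end
;==================================================
; test_second.ncl (child)
begin
  tmparr = ip * 2.0                      ; placeholder calculation
  asciiwrite("result_" + ip + ".txt", tmparr)
end
```

The parent no longer passes tmparr on the command line, since the child
cannot modify the parent's copy anyway. Only ip goes in, and the result
comes back through the file.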
>>>>>>>
>>>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2021-07-21 at 7.47.49 PM.png
Type: image/png
Size: 276406 bytes
Desc: not available
URL: <https://mailman.ucar.edu/pipermail/ncl-talk/attachments/20210722/7f765a81/attachment-0001.png>