Re: [ferret_users] memory strategies for handling large computational requests

To: ferret_users@xxxxxxxx
Subject: Re: [ferret_users] memory strategies for handling large computational requests
From: "Ansley C. Manke" <ansley.b.manke@xxxxxxxx>
Date: Mon, 27 Nov 2017 16:36:30 -0800
In-reply-to: <400769719.1005202.1511348224834.JavaMail.zimbra@lsce.ipsl.fr>
List-archive: <https://groups.google.com/a/noaa.gov/group/ferret_users/>
List-help: <https://support.google.com/a/noaa.gov/bin/topic.py?topic=25838>, <mailto:ferret_users+help@noaa.gov>
List-id: <ferret_users.noaa.gov>
List-post: <https://groups.google.com/a/noaa.gov/group/ferret_users/post>, <mailto:ferret_users@noaa.gov>
Mailing-list: list ferret_users@xxxxxxxx; contact ferret_users+owners@xxxxxxxx
References: <400769719.1005202.1511348224834.JavaMail.zimbra@lsce.ipsl.fr>
Sender: owner-ferret_users@xxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hi Patrick,
Ferret does not need to load the whole grid into memory to do these operations. It tries to break up the calculation itself, particularly with the memory-use enhancements in v7.2, and for the operations you're using it will do so. Not all computations can be broken up, either because of the nature of the operations themselves or because we haven't implemented all combinations of operations; function calls, for instance, are not broken up.

If that is the case for your computations, then writing a file will likely be necessary. Doing the operations in pieces in the T direction and appending is a good option. You can also append in other directions, as long as you first save the entire grid, containing, say, missing data, and then use APPEND/K= /J= /I= to overwrite the file as the actual results are computed. See a few small examples of that in chapter 10 of the Ferret Users Guide, examples 4, 4a, and 5.
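To make the "in pieces in the T direction and appending" idea concrete, here is a minimal sketch in plain Python rather than Ferret. Everything in it is hypothetical: `fake_load` stands in for reading a slab of time steps from the file, a plain list stands in for the output file being appended to, and an unweighted mean stands in for the XY average and Z sum. It only shows that reducing each time chunk separately and appending gives the same series as one pass.

```python
def time_chunks(n_steps, chunk_len):
    """Yield 1-based inclusive (start, end) index pairs covering 1..n_steps."""
    start = 1
    while start <= n_steps:
        end = min(start + chunk_len - 1, n_steps)
        yield start, end
        start = end + 1


def reduce_chunk(load_slab, start, end):
    """Collapse each time step of one slab to a scalar (a plain mean here)."""
    slab = load_slab(start, end)  # list of per-time-step value lists
    return [sum(step) / len(step) for step in slab]


def reduce_in_pieces(load_slab, n_steps, chunk_len):
    """Process the T axis piece by piece, appending each partial result."""
    series = []  # stands in for the output file being appended to
    for start, end in time_chunks(n_steps, chunk_len):
        series.extend(reduce_chunk(load_slab, start, end))
    return series


def fake_load(start, end):
    """Hypothetical loader: time step t holds the values t, t+1, t+2, t+3."""
    return [[t, t + 1, t + 2, t + 3] for t in range(start, end + 1)]


# Same reduction, once in chunks of 4 time steps and once in a single pass.
pieces = reduce_in_pieces(fake_load, n_steps=10, chunk_len=4)
whole = reduce_in_pieces(fake_load, n_steps=10, chunk_len=10)
print(pieces == whole)
```

Because the reductions act only on X, Y and Z, each time step is independent of the others, which is why the T axis is the natural one to split. A smoother along T does not have that property and has to be applied after the pieces are put back together, or with extra points read on each side.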

I'm going to do an example with your operations in some detail, since few of us have explored this.

yes? use https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc
yes? show data
     currently SET data sets:
    1> https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc  (default)
 name                  title                   I       J       K       L
 TIME_CENTERED         Time axis               ...     ...     ...     1:8280
 TIME_CENTERED_BOUNDS                          1:2     ...     ...     1:8280
 TRANSPIR              Transpiration           1:144   1:143   1:13    1:8280
 AREAS                 Mesh areas              1:144   1:143   ...     ...
 CONTFRAC              Continental fraction    1:144   1:143   ...     ...

(I made a local dataset with the same size grid for testing, as reading lots of data from the thredds server is somewhat slow.)

Diagnostic mode lists information as Ferret runs, noting when it puts tasks on its operation stack, reads data, computes transformations, and does "gathering". The memory management in Ferret v7.2 breaks up computations into parts, executes those parts while saving partial results and finally "finalizes" by putting the results together. Earlier versions of Ferret also break computations up into pieces to reduce the amount of data that must be read in; in fact earlier Ferret versions do this particular thing in exactly the same way - the example below works with any old Ferret version. V7.2 would also break up the computation along the axes being compressed if that was the only way to shorten the computation.

I'll put in some comments here in orange,

First a shorter example, compute the result on L=1:360.

yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]
yes? set memory/size=200
yes? set mode diagnostic ! this is not needed, but lets us see Ferret memory management in action
yes? save/clobber/file=file1.nc/L=1:360 var[l=@sbx:120]
 getgrid EX#1 C: 5 dset: 1 I: 1 1 J: 1 1 K: 1 1 L: 1 1 M: 1 1 N: 1 1
 getgrid VAR C: 7 dset: 1 I: 1 1 J: 1 1 K: 1 1 L: 1 1 M: 1 1 N: 1 1
 allocate dynamic grid GBC3 LON LAT VEGET1 TIME_COUNT
 allocate dynamic grid GBC3 LON LAT VEGET1 TIME_COUNT

Here Ferret is setting up to get data for L=1:420, to be able to correctly return the boxcar smoother var[L=@SBX:120] on L=1:360, and sets up "gathering" to return the averaging and sum requests.

 strip limits reconciliation : EX#1
 eval EX#1 C: 5 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 360
 strip --> VAR[L=1:360@SBX:120,D=1]
 eval VAR C: 8 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 420
 strip gathering TRANSPIR on T axis: 1 420 dset: 1
     420=request 100000000=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]

It will read and compute the XY averages and Z sum for each subset in L:

 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 17 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 138
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis: 1 138 dset: 1
     138=request 99999724=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 13 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 139 276
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis: 139 276 dset: 1
     138=request 99999304=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 10 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 277 414
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis: 277 414 dset: 1
     138=request 99999304=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 7 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 415 420
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis: 415 420 dset: 1
     6=request 99999568=availableMem

And finally Ferret does the @SBX on the entire time series of XY averaged, Z summed data, returning VAR[L=1:360@SBX:120]

 doing --> VAR[L=1:360@SBX:120,D=1]
 LISTing to file file1.nc
yes?

Now do the whole set, using a larger memory setting. It does perhaps 12-15 different reads.

yes? set mem/siz=400
yes? save/clobber/file=file2.nc var[l=@sbx:120]
...
 doing --> VAR[T=01-JAN-232018:00:31-DEC-300918:00@SBX:120,D=1]
 LISTing to file file2.nc
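The index arithmetic behind the diagnostic output can be reproduced with a short Python sketch. It rests on two assumptions that are not stated in the message: that a 120-point smoother needs 120 // 2 = 60 extra points on each side of the requested range, clipped to the axis limits, and that the piece length of 138 time steps is simply taken from the log above (how Ferret derives it from the memory setting is not modeled). The function names are hypothetical.

```python
def read_range(lo, hi, window, axis_len):
    """Index range needed to smooth lo..hi with a centered window.

    Assumes the smoother reaches window // 2 points to each side,
    clipped to the axis limits 1..axis_len.
    """
    half = window // 2
    return max(1, lo - half), min(axis_len, hi + half)


def split_range(lo, hi, chunk_len):
    """Split lo..hi into consecutive inclusive pieces of at most chunk_len."""
    pieces = []
    start = lo
    while start <= hi:
        end = min(start + chunk_len - 1, hi)
        pieces.append((start, end))
        start = end + 1
    return pieces


# Request L=1:360 with a 120-point smoother on a time axis of length 8280.
needed = read_range(1, 360, window=120, axis_len=8280)
# 138 is the piece length seen in the diagnostic output, used here as given.
reads = split_range(needed[0], needed[1], chunk_len=138)

print(needed)  # (1, 420)
print(reads)   # [(1, 138), (139, 276), (277, 414), (415, 420)]
```

These are the same L ranges as in the four "reading TRANSPIR" lines of the first example.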

Verify that the operations on a subset of the data, computed in different chunks, match what is computed for the entire dataset.

yes? cancel data/all
yes? cancel var/all
yes? use file1.nc, file2.nc
yes? list/l=60:70 var[d=2] - var[d=1]
             VARIABLE : VAR[D=file2] - VAR[D=file1]
             SUBSET   : 11 points (TIME)
             CALENDAR : NOLEAP
 16-DEC-2324 12 / 60:   ....
 16-JAN-2325 12 / 61:  0.0000
 15-FEB-2325 00 / 62:  0.0000
 16-MAR-2325 12 / 63:  0.0000
 16-APR-2325 00 / 64:  0.0000
 16-MAY-2325 12 / 65:  0.0000
 16-JUN-2325 00 / 66:  0.0000
 16-JUL-2325 12 / 67:  0.0000
 16-AUG-2325 12 / 68:  0.0000
 16-SEP-2325 00 / 69:  0.0000
 16-OCT-2325 12 / 70:  0.0000

etc.
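The same kind of check can be sketched in Python on a toy series. This is only an illustration under stated assumptions: `boxcar` is a simplified centered running mean with an odd window, not Ferret's @SBX, and it returns None where the window runs off the end of the data, which is how the missing value at the start of the listing is modeled here. The names `boxcar` and `smooth_subset` are hypothetical.

```python
def boxcar(series, window):
    """Centered running mean; None where the window runs off either end."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i - half < 0 or i + half >= len(series):
            out.append(None)
        else:
            segment = series[i - half:i + half + 1]
            out.append(sum(segment) / len(segment))
    return out


def smooth_subset(series, last, window):
    """Smooth only the first `last` points, reading window // 2 extra points."""
    half = window // 2
    needed = series[:min(len(series), last + half)]
    return boxcar(needed, window)[:last]


# Toy time series of 40 points; smooth all of it, then only the first 20.
series = [float((7 * t) % 11) for t in range(40)]
whole = boxcar(series, window=5)
subset = smooth_subset(series, last=20, window=5)

# Difference between the two results, keeping None where either is missing.
diffs = [
    None if a is None or b is None else a - b
    for a, b in zip(subset, whole[:20])
]
print(diffs)
```

The differences are zero wherever both results are defined. Dropping the extra points (smoothing `series[:20]` directly) would leave the last two points of the subset missing, which is why the read range extends past the requested one.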

On 11/22/2017 2:57 AM, Patrick Brockmann wrote:

Hi ferreters,

I would like to plot time series from a quite large file (8.3G), for an XYZT variable (144x143x13x8280).

I have worked with the latest Ferret release (7.2) and tried several increases of the memory setting without success.

I always get **ERROR: request exceeds memory setting

The next step would be, as suggested in the documentation (../../documentation/users-guide/computing-environment/MEMORY-USE), to break up my request into fragments.

Is that the best solution in my case?

My resource is available from

https://vesg.ipsl.upmc.fr/thredds/catalog/IPSLFS/brocksce/tmp/catalog.html?dataset=DatasetScanIPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc

Typical code lines are:

yes? use CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc

yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]
yes? plot var[l=@sbx:120]

! the following passes because I have limited the time range (1:1200)

yes? plot var[l=1:1200@sbx:120]

Any help welcome.

Regards

Patrick

--
Data Analysis and Visualization Engineer
LSCE/IPSL, CEA-CNRS-UVSQ laboratory
LSCE - Climate and Environment Sciences Laboratory
IPSL - Institut Pierre Simon Laplace
--
