Re: [ferret_users] memory strategies for handling large computational requests



Hi Patrick,
Ferret does not need to load the whole grid into memory to do these operations. It tries to break up the calculation itself, particularly with the memory-use enhancements in v7.2, and for the operations you're using it will do so.

Not all computations can be broken up, either because of the nature of the operations themselves or because we haven't implemented all combinations of operations. For instance, function calls are not broken up. If that is the case for your computations, then writing a file will likely be necessary. Doing the operations in pieces in the T direction and appending is a good option. You can also append in other directions, as long as you first save the entire grid, containing, say, missing data, and then use APPEND/K= /J= /I= to overwrite the file as the actual results are computed. See a few small examples of that in chapter 10 of the Ferret Users Guide, examples 4, 4a, and 5.
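To make the "pieces in the T direction, then append" idea concrete, here is a minimal, language-neutral sketch in Python (it is not Ferret syntax and does not use any Ferret API). The names `split_time_axis` and `process_in_pieces`, and the chunk length, are hypothetical and chosen only for illustration. A Python list stands in for the output file.

```python
def split_time_axis(n_steps, chunk_len):
    """Return 1-based inclusive (lo, hi) index ranges covering 1..n_steps."""
    if n_steps < 1 or chunk_len < 1:
        raise ValueError("n_steps and chunk_len must be positive")
    return [(lo, min(lo + chunk_len - 1, n_steps))
            for lo in range(1, n_steps + 1, chunk_len)]


def process_in_pieces(n_steps, chunk_len, compute_piece):
    """Compute each time range separately and append its results in order."""
    results = []  # stands in for the file being appended to
    for lo, hi in split_time_axis(n_steps, chunk_len):
        piece = list(compute_piece(lo, hi))  # one value per time step
        if len(piece) != hi - lo + 1:
            raise ValueError("piece length does not match its time range")
        results.extend(piece)  # the "append" step
    return results


if __name__ == "__main__":
    # Stand-in computation: the result at time step l is simply 2*l.
    doubled = process_in_pieces(
        10, 4, lambda lo, hi: [2 * l for l in range(lo, hi + 1)])
    print(doubled)
```

With 420 time steps and a chunk length of 138, `split_time_axis` returns the ranges 1:138, 139:276, 277:414 and 415:420, which are the same L ranges that appear in the diagnostic output further down.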

I'm going to do an example with your operations in some detail, since few of us have explored this.


yes? use https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc

yes? show data
     currently SET data sets:
    1> https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc (default)
 name                  title                 I       J       K       L
 TIME_CENTERED         Time axis             ...     ...     ...     1:8280
 TIME_CENTERED_BOUNDS                        1:2     ...     ...     1:8280
 TRANSPIR              Transpiration         1:144   1:143   1:13    1:8280
 AREAS                 Mesh areas            1:144   1:143   ...     ...
 CONTFRAC              Continental fraction  1:144   1:143   ...     ...


(I made a local dataset with the same size grid for testing, as reading lots of data from the thredds server is somewhat slow.)

Diagnostic mode lists information as Ferret runs, noting when it puts tasks on its operation stack, reads data, computes transformations, and does "gathering". The memory management in Ferret v7.2 breaks a computation into parts, executes those parts while saving partial results, and finally "finalizes" by putting the results together. Earlier versions of Ferret also break computations into pieces to reduce the amount of data that must be read in at once. In fact, earlier versions handle this particular case in exactly the same way, so the example below works with any older Ferret version. V7.2 would also break up the computation along the axes being compressed if that were the only way to shorten the computation.
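The split, compute and gather pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Ferret's implementation: the function names are hypothetical, the field is a small nested list indexed as data[t][k][j][i], and the horizontal average is a plain unweighted mean for brevity (an assumption made to keep the sketch short).

```python
def reduce_step(field):
    """Collapse one time step, field[k][j][i], to a scalar:
    plain (unweighted) mean over i and j, then sum over k."""
    total = 0.0
    for level in field:
        n_points = sum(len(row) for row in level)
        total += sum(sum(row) for row in level) / n_points
    return total


def reduce_all_at_once(data):
    """Reference result: reduce every time step of data[t][k][j][i]."""
    return [reduce_step(field) for field in data]


def reduce_in_chunks(read_chunk, n_steps, chunk_len):
    """Read chunk_len time steps at a time, reduce them, and gather the
    partial results into one time series."""
    gathered = []
    for start in range(0, n_steps, chunk_len):
        stop = min(start + chunk_len, n_steps)
        chunk = read_chunk(start, stop)  # only this slab is held in memory
        gathered.extend(reduce_step(field) for field in chunk)
    return gathered


if __name__ == "__main__":
    # Tiny synthetic XYZT field: 7 time steps, 2 levels, 3 x 4 points.
    data = [[[[t + k + 0.5 * j + 0.25 * i for i in range(4)]
              for j in range(3)] for k in range(2)] for t in range(7)]
    full = reduce_all_at_once(data)
    chunked = reduce_in_chunks(lambda a, b: data[a:b], 7, 3)
    print(chunked == full)
```

Because the reduction collapses X, Y and Z but treats every time step independently, cutting the request along T changes only how much data is in memory at once, not the result.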

I'll interleave some comments with the diagnostic output below.

First, a shorter example: compute the result on L=1:360.

yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]

yes? set memory/size=200
yes? set mode diagnostic ! this is not needed, but lets us see Ferret memory management in action

yes? save/clobber/file=file1.nc/L=1:360 var[l=@sbx:120]

getgrid EX#1 C: 5 dset: 1 I: 1 1 J: 1 1 K: 1 1 L: 1 1 M: 1 1 N: 1 1
getgrid VAR C: 7 dset: 1 I: 1 1 J: 1 1 K: 1 1 L: 1 1 M: 1 1 N: 1 1
allocate dynamic grid GBC3 LON LAT VEGET1 TIME_COUNT
allocate dynamic grid GBC3 LON LAT VEGET1 TIME_COUNT

Here Ferret is setting up to get data for L=1:420, so that it can correctly return the boxcar smoother var[L=@SBX:120] on L=1:360, and it sets up "gathering" to return the averaging and sum requests:
strip limits reconciliation : EX#1
eval EX#1 C: 5 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 360
strip --> VAR[L=1:360@SBX:120,D=1]
eval VAR C: 8 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 420
strip gathering TRANSPIR on T axis: 1 420 dset: 1 420=request 100000000=availableMem
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
strip --> TRANSPIR[Y=90S:90N@AV4,D=1]

It will read and compute the XY averages and Z sum for each subset in L:
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
reading TRANSPIR M: 17 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 1 138
doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
final --> TRANSPIR[Y=90S:90N@AV4,D=1]
doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
doing gathering TRANSPIR on T axis: 1 138 dset: 1 138=request 99999724=availableMem
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
reading TRANSPIR M: 13 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 139 276
doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
final --> TRANSPIR[Y=90S:90N@AV4,D=1]
doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
doing gathering TRANSPIR on T axis: 139 276 dset: 1 138=request 99999304=availableMem
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
reading TRANSPIR M: 10 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 277 414
doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
final --> TRANSPIR[Y=90S:90N@AV4,D=1]
doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
doing gathering TRANSPIR on T axis: 277 414 dset: 1 138=request 99999304=availableMem
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
reading TRANSPIR M: 7 dset: 1 I: 1 144 J: 1 90 K: 1 13 L: 415 420
doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
final --> TRANSPIR[Y=90S:90N@AV4,D=1]
doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
doing gathering TRANSPIR on T axis: 415 420 dset: 1 6=request 99999568=availableMem

And finally Ferret does the @SBX on the entire time series of XY-averaged, Z-summed data, returning VAR[L=1:360@SBX:120]:
doing --> VAR[L=1:360@SBX:120,D=1]
LISTing to file file1.nc

yes?
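The reason the smoother needs input beyond the requested output range can be shown with a short Python sketch. It rests on two assumptions: that @SBX:120 acts as a centered window reaching 60 points to either side (inferred from the log, where output on L=1:360 requires input on L=1:420), and that a point whose window runs off the available data is returned as missing. The function names are hypothetical and this is not Ferret code.

```python
def input_range_for_boxcar(out_lo, out_hi, half_width, axis_lo, axis_hi):
    """Index range of input needed to smooth out_lo..out_hi with a centered
    window of 2*half_width + 1 points, clipped to the axis limits."""
    return (max(axis_lo, out_lo - half_width),
            min(axis_hi, out_hi + half_width))


def boxcar(series, first_index, out_lo, out_hi, half_width):
    """Centered running mean. series[0] holds index first_index.
    Returns None where the window runs past the available data."""
    last_index = first_index + len(series) - 1
    smoothed = []
    for l in range(out_lo, out_hi + 1):
        lo, hi = l - half_width, l + half_width
        if lo < first_index or hi > last_index:
            smoothed.append(None)  # window incomplete: treat as missing
            continue
        window = series[lo - first_index: hi - first_index + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Output wanted on L=1:360, half-width 60, axis runs over L=1:8280.
    needed = input_range_for_boxcar(1, 360, 60, 1, 8280)
    print(needed)
    # Smooth a stand-in series (value at index l is l) over that input range.
    series = [float(l) for l in range(needed[0], needed[1] + 1)]
    result = boxcar(series, needed[0], 1, 360, 60)
    print(result[59], result[60])  # values at L=60 and L=61
```

The smoothing is applied to the already reduced one-dimensional time series, so the extra 60 points cost very little memory compared with the XYZT reads that precede it.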

Now do the whole set, using a larger memory setting. It does perhaps 12-15 different reads.
yes? set mem/siz=400
yes? save/clobber/file=file2.nc var[l=@sbx:120]
...
doing --> VAR[T=01-JAN-232018:00:31-DEC-300918:00@SBX:120,D=1]

LISTing to file file2.nc


Verify that the operations on a subset of the data, computed in different chunks, match what is computed for the entire dataset.


yes? cancel data/all
yes? cancel var/all

yes? use file1.nc, file2.nc
yes? list/l=60:70 var[d=2] - var[d=1]
VARIABLE : VAR[D=file2] - VAR[D=file1]
SUBSET : 11 points (TIME)
CALENDAR : NOLEAP
16-DEC-2324 12 / 60: ....
16-JAN-2325 12 / 61: 0.0000
15-FEB-2325 00 / 62: 0.0000
16-MAR-2325 12 / 63: 0.0000
16-APR-2325 00 / 64: 0.0000
16-MAY-2325 12 / 65: 0.0000
16-JUN-2325 00 / 66: 0.0000
16-JUL-2325 12 / 67: 0.0000
16-AUG-2325 12 / 68: 0.0000
16-SEP-2325 00 / 69: 0.0000
16-OCT-2325 12 / 70: 0.0000

etc.


On 11/22/2017 2:57 AM, Patrick Brockmann wrote:
Hi ferreters,

I would like to plot time series from quite a large file (8.3G) holding an XYZT variable (144x143x13x8280).
I have worked with the latest Ferret release, 7.2, and tried several increases of the memory setting without success.
I always get **ERROR: request exceeds memory setting

The next step would be, as suggested in the doc (../../documentation/users-guide/computing-environment/MEMORY-USE),
to break up my request into fragments.
Is that the best solution in my case?

My resource is available from

Typical code lines are:

yes? use CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc
yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]
yes? plot var[l=@sbx:120]

! the following passes because I have limited the time range (1:1200)
yes? plot var[l=1:1200@sbx:120]


Any help welcome.
Regards
Patrick

--
Data Analysis and Visualization Engineer
LSCE/IPSL, CEA-CNRS-UVSQ laboratory
LSCE - Climate and Environment Sciences Laboratory
IPSL - Institut Pierre Simon Laplace
--


