8.10 DISCRETE SAMPLING GEOMETRIES DATASETS

PyFerret and Ferret, as of version 7.6, implement automatic handling of datasets which use the Discrete Sampling Geometries standard of the CF conventions.

In Chapter 9 of the NetCDF CF Conventions document, a set of file types for "Discrete Sampling Geometries" is defined. These datasets describe point data, trajectories, profiles, timeseries, timeSeriesProfiles, and trajectoryProfiles. Ferret version 7.6 implements automatic handling of files that use the contiguous ragged array representation for the point, trajectory, profile, and timeseries data types. Version 7.6.3 adds graphics and text data listings for the timeSeriesProfile and trajectoryProfile feature types.

The datasets are feature collections; a single feature is a single instance of a timeseries, a profile, a trajectory, or a point. The observations lie along the "observation" axis, and the information describing the features lies along the "instance" axis, which is of length nfeatures. In the documentation and example scripts we use the abbreviation "DSG files". A common source of DSG datasets is ERDDAP, where a file saved from a tabledap dataset as file type .ncCF is a DSG file.

When a dataset contains the attributes that mark it as a Discrete Sampling Geometries dataset, Ferret assigns the observations axis to the X direction for Trajectory data, the T direction for Timeseries data, and the Z direction for Profile data, and assigns the instance axis (which describes each feature) to the E direction. For Point data each point is an instance, so the observations lie in the E direction. The sample_dimension attribute identifies a count variable, often called rowsize, listing the number of observations in each feature, and names the sample dimension to which it applies.  Ferret locates the coordinate information and defines a "translation grid" which allows it to map data to the world coordinates for the data type: stations in XYZ for Timeseries, stations in XYT for Profiles, paths along XYT and perhaps Z for Trajectories, and points in XYZT.  Using the coordinate information, the ordinary region-selection commands apply, including the qualifiers /X= /Y= /Z= /T=, SET REGION, or variable[x=, y=, z=, t=].  Plots and listings are automatically labeled as always.

The variables which describe each feature are put onto the E axis.
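For example, world-coordinate selection on a profile dataset might look like this (a sketch using the dsg_profile_example dataset and the sigma_t variable from the examples below; the depth range is illustrative):

yes? use dsg_profile_example
yes? list/z=0:10 sigma_t    ! only observations in the 0-10 depth range of each profile
yes? set region/t="23-aug-2012:00:00":"23-aug-2012:06:00"
yes? plot sigma_t           ! only the profiles within the time region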

Run the tutorial script

yes? go dsg_tutorial

 

Here is a summary of the enhancements:

Command outputs customized to work with DSG datasets:

SHOW DATA

The output of SHOW DATA includes the DSG feature type. The variable listing shows the feature-specific variables on an E axis of length number-of-features, and shows the observed variables with the total length of the contiguous ragged array that stores the observations.

SHOW GRID

The output of SHOW GRID lists a nominal range on the "observations" axis, which represents the ragged array storing all of the observations, and the feature axis of length number-of-features.  It then summarizes the coordinate ranges of the observations as found in the coordinate variables of the dataset.
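For instance (a sketch using the dsg_profile_example dataset from the examples below; sigma_t is one of its variables):

yes? use dsg_profile_example
yes? show data          ! feature type, instance variables on E, ragged-array length
yes? show grid sigma_t  ! nominal observations axis, feature axis, coordinate ranges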

LIST variable

A text listing to the terminal or a file, like any text listing, includes a header that describes the dataset and the subset requested, followed by the data with their coordinates.  Here the feature-id is added to each row of the listing, and the coordinates are the full set of longitude-latitude-depth/height-time coordinates.  The LIST command may take the qualifiers /X= /Y= /Z= /T= to limit the coordinate ranges shown, as well as /E= or /M= to choose a subset of features.  Note that /I= /J= /K= and /L= cannot be used, because index ranges do not map to coordinate ranges for these data types.  (To get a quick listing of a few data points, try "list/i=1:5 XSEQUENCE(variable)".)

See also feature-masking, to ask for a subset of features, e.g. a few of the profiles or timeseries.

yes? use dsg_profile_example
yes? list/t="23-aug-2012:00:00":"23-aug-2012:06:00" sigma_t

SAVE variable

To write a new DSG dataset which is a subset of an existing one, specify the variable and any coordinate ranges.  The coordinates and the required elements that make up a Discrete Sampling Geometries file are automatically written to the file.  Further variables with the same constraints may be added to the file; however, appending more features, or appending along the time axis, is not currently allowed.  This command writes only the two profiles contained in the given time range:

yes? use dsg_profile_example
yes? save/file=new_file.nc/clobber/t="23-aug-2012:00:00":"23-aug-2012:06:00" sigma_t

SAVE is not currently implemented for TrajectoryProfile and TimeseriesProfile data types.

 

DSG-specific options:

The SET DATA/FEATURE= qualifier, or equivalently USE/FEATURE=, changes the interpretation of the dataset to use a different feature type within the current session.  In particular, Trajectory data may also be viewed as Timeseries data, with time increasing along each path.  Reset a Trajectory dataset to look like a Timeseries dataset with this setting:

yes? set data/feature=timeseries  my_dataset.nc

If the dataset is already open, the dataset number may be used.

yes? set data/feature=timeseries 1

When a dataset is already open, any of its variables that have been loaded into memory are cleared, and if a feature mask was defined, it no longer applies to the dataset. Changing to another feature type applies only to changing from the Trajectory to the Timeseries data type; other data types do not lend themselves to this logic.

Data of any feature type may be treated as if the file is not a Discrete Sampling Geometries file, using /FEATURE=none:

yes? set data/feature=none my_dataset.nc

and the dataset will appear to be data on simple 1-D grids: observations on a long 1-D axis, and feature-level metadata (station names, and so on) on a shorter axis.  As above, if the dataset is already open, any variables that have been loaded into memory are cleared, and if a feature mask was defined, it no longer applies to the dataset.

A mode controlling DSG handling, MODE DSG, is set by default.  If it is canceled, then in the current session datasets are NOT initialized as DSG datasets, but are handled as they were prior to Ferret/PyFerret v7.6, as described in the italicized text at the end of this page.

`variable,RETURN=xcoord` returns the name of the coordinate variable in the longitude direction for a DSG dataset.  Likewise, asking for ycoord, zcoord, or tcoord returns the latitude, depth/height, and time coordinate variable names.
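For example (assuming the dsg_profile_example dataset and its salinity variable sal, as in the plot examples below):

yes? use dsg_profile_example
yes? say `sal,return=xcoord`   ! prints the name of the longitude coordinate variable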

 

Plots

Native plot types

For all of these data types, the PLOT command results in a line plot, showing the set of features as a set of lines drawn in the appropriate direction.  If the dataset is a Timeseries DSG dataset, the plot is a multi-line timeseries plot.  Data from a Profile DSG dataset is plotted as a set of vertical lines, variable-value as a function of Z.  Trajectory datasets are drawn as a set of ribbon plots, with the data colored by value along the trajectories in an x-y map plot.  Constraining the region with any region specifier, PLOT/X= /Y= /Z= /T=, causes PyFerret to include only that subset of the data in the plot.
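As a sketch, using the profile dataset from the examples on this page (the depth range is illustrative):

yes? use dsg_profile_example
yes? plot sal           ! one vertical line per profile, value as a function of Z
yes? plot/z=0:50 sal    ! the same plot, restricted to the upper part of each profile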

Data in TrajectoryProfile and TimeseriesProfile datasets are drawn by default as a set of profile lines, simply showing all of the profiles in the dataset (or in the region given in the plot command).  In addition, for these data types we can plot trajectories for TrajectoryProfile data, or timeseries for TimeseriesProfile data, using PLOT/ALONG=.

Changing the plot type with PLOT/ALONG= 

To make line plots and ribbon plots in directions other than the native plot for each data type, use PLOT/ALONG=

PLOT/ALONG=xy

Use PLOT/ALONG=xy to make a longitude-latitude plot. For TrajectoryProfile data, this shows symbols at each profile location along each trajectory.  For TimeseriesProfile data, Timeseries data, and Profile data, it shows symbols at each station (in the same style as for Point data). The line and/or symbols are colored according to the variable given.  For TrajectoryProfile, TimeseriesProfile, and Profile data, a /Z= qualifier limits the z-range of the color-by variable; the plot is colored using the average of the profile data within the given z region.  For Timeseries data, a /T= qualifier limits the t-range of the color-by variable, and the plot is colored using the average value of the timeseries at each station.  As usual, constraints can be used in any direction to limit the X-Y-Z-T range of the data shown.

Examples

yes? use dsg_profile_example
 
yes? plot/along=xy sal
*** NOTE: PLOT/ALONG= with /Z=LO:HI colors the plot with AVE of profile data in that range
 
yes? plot/along=xy/z=0:5 sal  ! to color the symbols on the LON/LAT plot with the near-surface salinity data 

 

PLOT/ALONG=t

Use PLOT/ALONG=t to make a set of timeseries plots for the stations in a TimeseriesProfile dataset. The line and/or symbols are colored according to the variable given.  A /Z= qualifier limits the z-range of the color-by variable; the plot is colored using the first value in each profile within the given z region. As usual, constraints can be used in any direction to limit the X-Y-Z-T range of the data shown.

Examples

yes? use my_trajectoryProfile_data.nc
 
! Draw timeseries lines, one line at each station, colored by a data value from each profile
yes? plot/along=t/z=20 temperature
 
! Or, draw an XY map plot showing the locations of the stations.
yes? plot/along=xy/z=10 temperature

 

(Note that PLOT/ALONG= can also be applied to gridded datasets, with a somewhat different interpretation.)

 

 

Masking

SET DATA/FMASK=  (or USE/FMASK=)

Apply a feature mask to the dataset.  The feature mask is a variable of length number-of-features, with values 1 and 0, or 1 and missing.  When the mask is applied to the dataset, only the features with mask value 1 are used in listings, plots, and other operations.

yes? use dsg_trajectory_example
 
yes? let pan_mask = if STRINDEX(expocode, "PAN") GT 0 then 1
yes? list/norow/nohead pan_mask
..
..
1.000
1.000

Apply this mask to the dataset.  A plot or listing or other operation will now include only the two features included in the mask.

yes? set data/fmask=pan_mask dsg_trajectory_example

SET DATA/SMASK= defines a station mask or trajectory mask on a TimeseriesProfile or TrajectoryProfile dataset, to choose the profiles within one or more of the timeseries or trajectories.
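By analogy with the /FMASK= example above, a station mask has one value per station. A sketch (the dataset name and the station_name variable are hypothetical):

yes? use my_timeseriesProfile_data.nc
yes? let station_mask = if STRINDEX(station_name, "PAPA") GT 0 then 1
yes? set data/smask=station_mask my_timeseriesProfile_data.nc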

 

Analysis operations:

Transformations

Transformations are applied separately to each feature.  So transformations which summarize information about a variable, such as @MAX, @MIN, @SUM, or @NGD, are applied separately to each feature, returning the maximum, minimum, sum, or number of good data points per feature.

yes? use dsg_trajectory_example
yes? list fco2_recommended[x=@max]

 

Transformations such as smoothers or indefinite integrals also do their operation within features, so a smoothing transform does not smear data from one trajectory or one profile into the next in the file.  For temperature profiles, for instance, the @SBX boxcar smoother stops at the end of each profile.

yes? use dsg_profile_example
yes? list/m=1:2 temp, temp[z=@sbx]

Functions:

Function calls have not yet been optimized for Discrete Sampling Geometries, and a NOTE is issued to this effect.  For functions which are not grid-changing functions, this is unimportant.  For instance, we could use RHO_UN to compute the mass density of seawater from temperature and salinity.

Note that the MINMAX function gives us a handy means of finding the overall minimum and maximum of the data in a DSG dataset: variable[@min] or variable[@max] returns the minimum or maximum within each feature, while MINMAX(variable) returns the minimum and maximum over all of the data.
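For example, with the trajectory dataset used above:

yes? use dsg_trajectory_example
yes? list fco2_recommended[x=@max]   ! one maximum per feature
yes? list minmax(fco2_recommended)   ! overall minimum and maximum, all features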

 

Regridding:

A set of timeseries or profiles in a DSG dataset may or may not have a common set of time coordinates or depth levels, respectively.  They can be regridded to a common time or z axis using an ordinary regridding operation.  The result is a 2-D variable, station-vs-time, or station-vs-depth.

yes? use dsg_timeseries_example
yes? define axis/t="15-jan-2017:12:00":"21-apr-2017":10/units=days tuniform
 
yes? let/like=t_25  t25_station_vs_time =  t_25[gt=tuniform]
yes? shade t25_station_vs_time 

 

Comparisons with other datasets:

A regridding operation can be used to sample a gridded dataset, such as a reference dataset or model output, at the times and locations of the observations in a DSG dataset.  Here is the sequence of operations:

yes? use gridded_data_set.nc
yes? use dsg_timeseries_data.nc
 
yes? let gridded_on_dsg = gridded_var[d=1, g=dsg_var[d=2] ]
 
yes? let difference_on_dsg = dsg_var[d=2] - gridded_on_dsg
 
yes? plot difference_on_dsg

 

What about regridding DSG data to grids?  The existing scat2grid* functions may be used to interpolate, bin, or average scattered data onto a multi-dimensional grid.

yes? show functions scat2grid*

See the discussion under SCAT2GRID_BIN_XYZT about DSG data, and the FAQ "Calling SCAT2GRID functions for DSG data".

 

 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
 = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

If using older Ferret/PyFerret versions, here are some methods available for working with such files:

Functions that are useful for Discrete Sampling Geometries files include SEPARATE, which inserts a missing value between trajectories. The trajectories may then be plotted as individual lines, with the line broken between trajectories:

yes? let/units=degrees_east/title=longitude separate_lon = SEPARATE(longitude, rowsize, 1)
yes? let/units=degrees_north/title=latitude separate_lat = SEPARATE(latitude, rowsize, 0)
yes? let/units="`temp,return=units`"/title="`temp,return=title`" separate_temp = SEPARATE(temp, rowsize, 0)
 
yes? ribbon/vs/line/thick/key/palette=rnb2 separate_lon, separate_lat, separate_temp
 

 

To choose a subset of trajectories, you can use one of the EXPNDI_BY_* functions to put the data onto an observation-by-feature grid: for instance, a temperature-by-trajectory grid, or, for profile data (see the example below), a salinity-by-profile grid. Here we use EXPNDI_BY_M_COUNTS.

yes? let longest = `rowsize[m=@max]`
yes? let lon2d = EXPNDI_BY_M_COUNTS(longitude, rowsize, longest)
yes? let lat2d = EXPNDI_BY_M_COUNTS(latitude, rowsize, longest)
yes? let var2d = EXPNDI_BY_M_COUNTS(temp, rowsize, longest)
 

Say the trajectory names include the year of the observations. To pick out the ones deployed in 2010:

yes? let mask = if STRINDEX(trajectory, "2010") gt 0 then 1
yes? list mask
VARIABLE : IF STRINDEX(TRAJECTORY, "2010") GT 0 THEN 1
DATA SET : SOCAT v3 data collection
FILENAME : dsg_file.nc
FILEPATH : /home/users/files/
SUBSET : 16 points (E)
1 / 1: ...
2 / 2: ...
3 / 3: ...
4 / 4: 1.000
5 / 5: 1.000
6 / 6: 1.000
7 / 7: 1.000
8 / 8: 1.000
9 / 9: ...
10 / 10: ...
11 / 11: ...
12 / 12: ...
13 / 13: ...
14 / 14: ...
15 / 15: ...
16 / 16: ...
 

Now, choose just those trajectories. The masked variables are 2-d variables, but it is fine to send them as arguments to the PLOT/VS command.

yes? let mask2d = EXPNDI_BY_M_COUNTS(mask, rowsize, longest)
 
yes? let/units=degrees_east masked_lon = mask2d*lon2d
yes? let/units=degrees_north masked_lat = mask2d*lat2d
yes? let/units="`temp,return=units`"/title="`temp,return=title`" masked_temp = mask2d*var2d
 
yes? plot/vs/ribbon/nolab masked_lon, masked_lat, masked_temp
 

 

Example 2

Profile datasets list the longitudes and latitudes as "metadata" variables, that is, one longitude/latitude value per profile.

yes? use my_profile_data.nc
yes? show data
 currently SET data sets:
    1> ./my_profile_data.nc  (default)
 name     title                             I         J         K         L         M         N
 PLATFORM_CODE
          PLATFORM CODE                     ..        ..        ..        ..        1:74      ..
 LONGITUDE
          Longitude                         ..        ..        ..        ..        1:74      ..
 LATITUDE Latitude                          ..        ..        ..        ..        1:74      ..
 ROWSIZE  Number of Observations for this   ..        ..        ..        ..        1:74      ..
 TIME     OBSERVATION DATE                  1:43935   ..        ..        ..        ..        ..
 DEPTH    OBSERVATION DEPTH                 1:43935   ..        ..        ..        ..        ..
 ZSAL     Sea Water Salinity                1:43935   ..        ..        ..        ..        ..

To plot, say, lines showing the salinity at depth, as a function of the longitudes represented in the data, we need to replicate the longitude values, so that our new longitude variable has the longitude for profile 1 corresponding to each observation of profile 1, the longitude for profile 2 corresponding to each observation of that profile, etc. Use the EXPND_BY_LEN function here.

yes? let nx = `rowsize[m=@sum]`  ! or equivalently, we could use the size of zsal.
yes? let lon_obs = EXPND_BY_LEN(longitude,rowsize,nx)
 
yes? let/units=degrees_east/title=longitude separate_lon = SEPARATE(lon_obs, rowsize, 1)
yes? let/units=`depth,return=units`/title=depth separate_dep = SEPARATE(depth, rowsize, 0)
yes? let/units="`zsal,return=units`"/title="`zsal,return=title`" separate_sal = SEPARATE(zsal, rowsize, 0)
 
! The /vlimits qualifier causes Ferret to draw the z axis as a depth axis on the plot:
 
yes? ribbon/vs/line/thick/key/lev=v/VLIMITS=2000:0 separate_lon, separate_dep , separate_sal