Chapter-7: Create netCDF for compound data#

The CF Conventions provided recommendations on netCDF structure for two types of complex DSGs, i.e. time series of profiles and trajectory of profiles.

Time Series of Profiles#

  • Time series of profiles at a single station (H.5.2): There is only one stations in a dataset; for this station, multiple profiles were measured at multiple times.

  • Multidimensional array representations of time series profiles (H.5.1): There are multiple stations in a dataset; for each station, one or more profiles were measured at multiple times; and there are the same number of profiles for each station, and the same number of vertical levels for every profile, whereas the time values and the vertical level values can, but don’t have to be the same. For the situation where all time series share common time values, and all profiles share common vertical level values, we can use the representation in the example H.17 of Appendix H.5.1 for a more efficient use of storage space.

  • Ragged array representation of time series profiles (H.5.3): There are multiple stations, the number of profiles and vertical levels for each station varies.

Trajectory of Profiles#

  • Profiles along a single trajectory (H.6.2)

  • Multidimensional array representation of trajectory profiles (H.6.1): There’re multiple trajectories in the file, and there’re the same number of profiles for all trajectories and the same number of vertical levels for all profiles. This representation can also be used for trajectories with different number of profiles or vertical levels, at the cost of some wasted storage space due to missing values in the arrays.

  • Ragged array representation of trajectory profiles (H.6.3): The number of profiles and the vertical levels for each trajectory varies.

Note

It is not always a good practice to pack all data into one single netCDF file. If the data structure becomes too complicated, it is often a better option to separate the features e.g. by stations and put them into different netCDF files.

For example, the Global Temperature and Salinity Profile Programme (GTSPP)[7] dataset from NOAA is a ragged array representation of trajectory of profiles trajectories of profiles (H.6.3). ↓

xarray.Dataset {
dimensions:
	trajectory = 371 ;
	profile = 132501 ;
	obs = 419569 ;

variables:
	object trajectory(trajectory) ;
		trajectory:cf_role = trajectory_id ;
		trajectory:comment = Constructed from org_type_platform_cruise ;
		trajectory:ioos_category = Identifier ;
		trajectory:long_name = Trajectory ID ;
	object org(trajectory) ;
		org:comment = From the first 2 characters of stream_ident:
Code  Meaning
AD  Australian Oceanographic Data Centre
AF  Argentina Fisheries (Fisheries Research and Development National Institute (INIDEP), Mar del Plata, Argentina
AO  Atlantic Oceanographic and Meteorological Lab
AP  Asia-Pacific (International Pacific Research Center/ Asia-Pacific Data-Research Center)
BI  BIO Bedford institute of Oceanography
CF  Canadian Navy
CS  CSIRO in Australia
DA  Dalhousie University
FN  FNOC in Monterey, California
FR  Orstom, Brest
FW  Fresh Water Institute (Winnipeg)
GE  BSH, Germany
IC  ICES
II  IIP
IK  Institut fur Meereskunde, Kiel
IM  IML
IO  IOS in Pat Bay, BC
JA  Japanese Meteorologocal Agency
JF  Japan Fisheries Agency
ME  EDS
MO  Moncton
MU  Memorial University
NA  NAFC
NO  NODC (Washington)
NW  US National Weather Service
OD  Old Dominion Univ, USA
RU  Russian Federation
SA  St Andrews
SI  Scripps Institute of Oceanography
SO  Southampton Oceanographic Centre, UK
TC  TOGA Subsurface Data Centre (France)
TI  Tiberon lab US
UB  University of BC
UQ  University of Quebec at Rimouski
VL  Far Eastern Regional Hydromet. Res. Inst. of V
WH  Woods Hole

from https://www.nodc.noaa.gov/GTSPP/document/codetbls/gtsppcode.html#ref006 ;
		org:ioos_category = Identifier ;
		org:long_name = Organization ;
	object type(trajectory) ;
		type:comment = From the 3rd and 4th characters of stream_ident:
Code  Meaning
AR  Animal mounted recorder
BA  BATHY message
BF  Undulating Oceanographic Recorder (e.g. Batfish CTD)
BO  Bottle
BT  general BT data
CD  CTD down trace
CT  CTD data, up or down
CU  CTD up trace
DB  Drifting buoy
DD  Delayed mode drifting buoy data
DM  Delayed mode version from originator
DT  Digital BT
IC  Ice core
ID  Interpolated drifting buoy data
IN  Ship intake samples
MB  MBT
MC  CTD and bottle data are mixed for the station
MI  Data from a mixed set of instruments
ML  Minilog
OF  Real-time oxygen and fluorescence
PF  Profiling float
RM  Radio message
RQ  Radio message with scientific QC
SC  Sediment core
SG  Thermosalinograph data
ST  STD data
SV  Sound velocity probe
TE  TESAC message
TG  Thermograph data
TK  TRACKOB message
TO  Towed CTD
TR  Thermistor chain
XB  XBT
XC  Expendable CTD

from https://www.nodc.noaa.gov/GTSPP/document/codetbls/gtsppcode.html#ref082 ;
		type:ioos_category = Identifier ;
		type:long_name = Data Type ;
	object platform(trajectory) ;
		platform:comment = See the list of platform codes (sorted in various ways) at https://www.nodc.noaa.gov/GTSPP/document/codetbls/calllist.html ;
		platform:ioos_category = Identifier ;
		platform:long_name = GTSPP Platform Code ;
		platform:references = https://www.nodc.noaa.gov/gtspp/document/codetbls/callist.html ;
	object cruise(trajectory) ;
		cruise:comment = Radio callsign + year for real time data, or NODC reference number for delayed mode data.  See
https://www.nodc.noaa.gov/GTSPP/document/codetbls/calllist.html .
'X' indicates a missing value.
Two or more adjacent spaces in the original cruise names have been compacted to 1 space. ;
		cruise:ioos_category = Identifier ;
		cruise:long_name = Cruise_ID ;
	float64 station_id(profile) ;
		station_id:actual_range = [54396260 55104400] ;
		station_id:cf_role = profile_id ;
		station_id:comment = Identification number of the station (profile) in the GTSPP Continuously Managed Database ;
		station_id:ioos_category = Identifier ;
		station_id:long_name = Station ID Number ;
	float32 longitude(profile) ;
		longitude:_CoordinateAxisType = Lon ;
		longitude:actual_range = [-179.9    178.384] ;
		longitude:axis = X ;
		longitude:C_format = %9.4f ;
		longitude:colorBarMaximum = 180.0 ;
		longitude:colorBarMinimum = -180.0 ;
		longitude:epic_code = 502 ;
		longitude:FORTRAN_format = F9.4 ;
		longitude:ioos_category = Location ;
		longitude:long_name = Longitude ;
		longitude:standard_name = longitude ;
		longitude:units = degrees_east ;
		longitude:valid_max = 180.0 ;
		longitude:valid_min = -180.0 ;
	float32 latitude(profile) ;
		latitude:_CoordinateAxisType = Lat ;
		latitude:actual_range = [-68.69   83.583] ;
		latitude:axis = Y ;
		latitude:C_format = %8.4f ;
		latitude:colorBarMaximum = 90.0 ;
		latitude:colorBarMinimum = -90.0 ;
		latitude:epic_code = 500 ;
		latitude:FORTRAN_format = F8.4 ;
		latitude:ioos_category = Location ;
		latitude:long_name = Latitude ;
		latitude:standard_name = latitude ;
		latitude:units = degrees_north ;
		latitude:valid_max = 90.0 ;
		latitude:valid_min = -90.0 ;
	datetime64[ns] time(profile) ;
		time:_CoordinateAxisType = Time ;
		time:actual_range = [1.71322560e+09 1.71452124e+09] ;
		time:axis = T ;
		time:ioos_category = Time ;
		time:long_name = Time ;
		time:standard_name = time ;
		time:time_origin = 01-JAN-1970 00:00:00 ;
	int32 trajectoryIndex(profile) ;
		trajectoryIndex:instance_dimension = trajectory ;
		trajectoryIndex:ioos_category = Identifier ;
		trajectoryIndex:long_name = The trajectory to which this profile is associated. ;
	int32 rowSize(profile) ;
		rowSize:ioos_category = Identifier ;
		rowSize:long_name = Number of Observations for this Profile ;
		rowSize:sample_dimension = obs ;
	float32 depth(obs) ;
		depth:_CoordinateAxisType = Height ;
		depth:_CoordinateZisPositive = down ;
		depth:actual_range = [   0. 4477.] ;
		depth:axis = Z ;
		depth:C_format = %6.2f ;
		depth:colorBarMaximum = 5000.0 ;
		depth:colorBarMinimum = 0.0 ;
		depth:epic_code = 3 ;
		depth:FORTRAN_format = F6.2 ;
		depth:ioos_category = Location ;
		depth:long_name = Depth of the Observations ;
		depth:positive = down ;
		depth:standard_name = depth ;
		depth:units = m ;
	float32 temperature(obs) ;
		temperature:actual_range = [-2.  35.5] ;
		temperature:C_format = %9.4f ;
		temperature:cell_methods = time: point longitude: point latitude: point depth: point ;
		temperature:colorBarMaximum = 32.0 ;
		temperature:colorBarMinimum = 0.0 ;
		temperature:epic_code = 28 ;
		temperature:FORTRAN_format = F9.4 ;
		temperature:ioos_category = Temperature ;
		temperature:long_name = Sea Water Temperature ;
		temperature:standard_name = sea_water_temperature ;
		temperature:units = degree_C ;
	float32 salinity(obs) ;
		salinity:actual_range = [ 0.  39.2] ;
		salinity:C_format = %9.4f ;
		salinity:cell_methods = time: point longitude: point latitude: point depth: point ;
		salinity:colorBarMaximum = 37.0 ;
		salinity:colorBarMinimum = 32.0 ;
		salinity:epic_code = 41 ;
		salinity:FORTRAN_format = F9.4 ;
		salinity:ioos_category = Salinity ;
		salinity:long_name = Practical Salinity ;
		salinity:salinity_scale = PSU ;
		salinity:standard_name = sea_water_practical_salinity ;
		salinity:units = PSU ;

// global attributes:
	:acknowledgment = These data were acquired from the US NOAA National Oceanographic Data Center (NODC) on 2024-09-11 from https://www.nodc.noaa.gov/GTSPP/. ;
	:cdm_altitude_proxy = depth ;
	:cdm_data_type = TrajectoryProfile ;
	:cdm_profile_variables = station_id, longitude, latitude, time ;
	:cdm_trajectory_variables = trajectory, org, type, platform, cruise ;
	:Conventions = COARDS, WOCE, GTSPP, CF-1.10, ACDD-1.3 ;
	:creator_email = nodc.gtspp@noaa.gov ;
	:creator_name = NOAA NESDIS NODC (IN295) ;
	:creator_url = https://www.nodc.noaa.gov/GTSPP/ ;
	:crs = EPSG:4326 ;
	:defaultGraphQuery = longitude,latitude,station_id&time%3E=max(time)-7days&time%3C=max(time)&.draw=markers&.marker=10|5 ;
	:Easternmost_Easting = 178.38400268554688 ;
	:featureType = TrajectoryProfile ;
	:file_source = The GTSPP Continuously Managed Data Base ;
	:geospatial_lat_max = 83.58300018310547 ;
	:geospatial_lat_min = -68.69000244140625 ;
	:geospatial_lat_units = degrees_north ;
	:geospatial_lon_max = 178.38400268554688 ;
	:geospatial_lon_min = -179.89999389648438 ;
	:geospatial_lon_units = degrees_east ;
	:geospatial_vertical_max = 4477.0 ;
	:geospatial_vertical_min = 0.0 ;
	:geospatial_vertical_positive = down ;
	:geospatial_vertical_units = m ;
	:gtspp_ConventionVersion = GTSPP4.0 ;
	:gtspp_handbook_version = GTSPP Data User's Manual 1.0 ;
	:gtspp_program = writeGTSPPnc40.f90 ;
	:gtspp_programVersion = 1.8 ;
	:history = 2024-08-31 csun writeGTSPPnc40.f90 Version 1.8
.tgz files from ftp.nodc.noaa.gov /pub/data.nodc/gtspp/bestcopy/netcdf (https://www.nodc.noaa.gov/GTSPP/)
2024-08-31 Most recent ingest, clean, and reformat at ERD (erd.data at noaa.gov).
2024-09-11T19:21:22Z (local files)
2024-09-11T19:21:22Z http://localhost:8080/erddap/tabledap/erdGtsppBestNc.ncCF?&time>=2024-04-16&time<2024-05-01&orderBy("trajectory,station_id,time,depth") ;
	:id = erdGtsppBest ;
	:infoUrl = https://www.nodc.noaa.gov/GTSPP/ ;
	:institution = NOAA NODC ;
	:keywords = cruise, data, density, depth, Earth Science > Oceans > Ocean Temperature > Water Temperature, Earth Science > Oceans > Salinity/Density > Salinity, global, gtspp, identifier, noaa, nodc, observation, ocean, oceans, organization, profile, program, salinity, sea, sea_water_practical_salinity, sea_water_temperature, seawater, station, temperature, temperature-salinity, time, type, water ;
	:keywords_vocabulary = NODC Data Types, CF Standard Names, GCMD Science Keywords ;
	:LEXICON = NODC_GTSPP ;
	:license = These data are openly available to the public.  Please acknowledge the use of these data with:
These data were acquired from the US NOAA National Oceanographic Data Center (NODC) on 2024-09-11 from https://www.nodc.noaa.gov/GTSPP/.

The data may be used and redistributed for free but is not intended
for legal use, since it may contain inaccuracies. Neither the data
Contributor, ERD, NOAA, nor the United States Government, nor any
of their employees or contractors, makes any warranty, express or
implied, including warranties of merchantability and fitness for a
particular purpose, or assumes any legal liability for the accuracy,
completeness, or usefulness, of this information. ;
	:naming_authority = gov.noaa.nodc ;
	:Northernmost_Northing = 83.58300018310547 ;
	:project = Joint IODE/JCOMM Global Temperature-Salinity Profile Programme ;
	:references = https://www.nodc.noaa.gov/GTSPP/ ;
	:sourceUrl = (local files) ;
	:Southernmost_Northing = -68.69000244140625 ;
	:standard_name_vocabulary = CF Standard Name Table v70 ;
	:subsetVariables = trajectory, org, type, platform, cruise ;
	:summary = The Global Temperature-Salinity Profile Programme (GTSPP) develops and maintains a global ocean temperature and salinity resource with data that are both up-to-date and of the highest quality. It is a joint World Meteorological Organization (WMO) and Intergovernmental Oceanographic Commission (IOC) program.  It includes data from XBTs, CTDs, moored and drifting buoys, and PALACE floats. For information about organizations contributing data to GTSPP, see http://gosic.org/goos/GTSPP-data-flow.htm .  The U.S. National Oceanographic Data Center (NODC) maintains the GTSPP Continuously Managed Data Base and releases new 'best-copy' data once per month.

WARNING: This dataset has a *lot* of data.  If you request too much data, your request will fail.
* If you don't specify a longitude and latitude bounding box, don't request more than a month's data at a time.
* If you do specify a longitude and latitude bounding box, you can request data for a proportionally longer time period.
Requesting data for a specific station_id may be slow, but it works.

*** This ERDDAP dataset has data for the entire world for all available times (currently, up to and including the August 2024 data) but is a subset of the original NODC 'best-copy' data.  It only includes data where the quality flags indicate the data is 1=CORRECT, 2=PROBABLY GOOD, or 5=MODIFIED. It does not include some of the metadata, any of the history data, or any of the quality flag data of the original dataset. You can always get the complete, up-to-date dataset (and additional, near-real-time data) from the source: https://www.nodc.noaa.gov/GTSPP/ .  Specific differences are:
* Profiles with a position_quality_flag or a time_quality_flag other than 1|2|5 were removed.
* Rows with a depth (z) value less than -0.4 or greater than 10000 or a z_variable_quality_flag other than 1|2|5 were removed.
* Temperature values less than -4 or greater than 40 or with a temperature_quality_flag other than 1|2|5 were set to NaN.
* Salinity values less than 0 or greater than 41 or with a salinity_quality_flag other than 1|2|5 were set to NaN.
* Time values were converted from "days since 1900-01-01 00:00:00" to "seconds since 1970-01-01T00:00:00".

See the Quality Flag definitions on page 5 and "Table 2.1: Global Impossible Parameter Values" on page 61 of
https://www.nodc.noaa.gov/GTSPP/document/qcmans/GTSPP_RT_QC_Manual_20090916.pdf .
The Quality Flag definitions are also at
https://www.nodc.noaa.gov/GTSPP/document/qcmans/qcflags.htm . ;
	:testOutOfDate = now-45days ;
	:time_coverage_end = 2024-04-30T23:54:00Z ;
	:time_coverage_start = 2024-04-16T00:00:00Z ;
	:title = Global Temperature and Salinity Profile Programme (GTSPP) Data, 1985-present ;
	:Westernmost_Easting = -179.89999389648438 ;
}

Exercise: create a netCDF for time series of profiles#

Now, let’s practice creating a netCDF for time series of profiles with the Newport Lab CTD Casts dataset. Each original file contains a single profile at a time, with measurements of sea water temperature, salinity, density, and fluorescence in multiple water depths. Since each profile has different number of water depth levels, we’ll compose the netCDF in a ragged array representation of time series profiles (H.5.3). You could randomly download a few original files to do this exercise; if you encounter any difficulties, feel free to have a look at the provided sample solution. The sample solution used three datasets, 010903NH05.nc, 011207NH05.nc and 061300NH2.nc.

import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import cftime
from datetime import datetime
# List available datasets. Please change it to your file path.
tsP_files = sorted(glob(os.path.join(os.getcwd(), "dsg_tsProfile", "*.nc")))
# Read both files
ds1 = xr.open_dataset(tsP_files[0], decode_times=False)
ds2 = xr.open_dataset(tsP_files[1], decode_times=False)
ds3 = xr.open_dataset(tsP_files[2], decode_times=False)
# By printing the xarray dataset, we can inspect the structure of the source data.
ds1
<xarray.Dataset> Size: 5kB
Dimensions:            (row: 43)
Dimensions without coordinates: row
Data variables: (12/14)
    line               (row) |S2 86B ...
    longitude          (row) float64 344B ...
    latitude           (row) float64 344B ...
    station_code       (row) |S10 430B ...
    time               (row) float64 344B ...
    station            (row) |S4 172B ...
    ...                 ...
    temperature        (row) float64 344B ...
    salinity           (row) float64 344B ...
    density            (row) float64 344B ...
    fluorescence       (row) float64 344B ...
    project            (row) |S2 86B ...
    transect           (row) |S20 860B ...
Attributes:
    id:                    010903NH05
    observationDimension:  row
# By inspecting the longitudes, latitudes and time of the observations,
# we know that one source netCDF file only contains a single profile at a time for one station.
print(ds1.longitude.data)
print(ds1.latitude.data)
print(ds1.time.data)
[-124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686 -124.12999686 -124.12999686
 -124.12999686 -124.12999686 -124.12999686]
[44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887 44.65169887
 44.65169887]
[1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09 1.0420992e+09
 1.0420992e+09 1.0420992e+09 1.0420992e+09]
# Get the station id of all stations
station1 = np.unique(ds1.station.data)
station2 = np.unique(ds2.station.data)
station3 = np.unique(ds3.station.data)
station_index = np.concatenate([station1, station2, station3])
station = np.unique(station_index)
print("The name of each station is: ", station)

# Get the longitudes of all stations
lon1 = np.unique(ds1.longitude.data)
lon2 = np.unique(ds2.longitude.data)
lon3 = np.unique(ds3.longitude.data)
lon = np.unique(np.concatenate([lon1, lon2, lon3]))
print("The longitude of each station is: ", lon)

# Get the latitudes of all stations
lat1 = np.unique(ds1.latitude.data)
lat2 = np.unique(ds2.latitude.data)
lat3 = np.unique(ds3.latitude.data)
lat = np.unique(np.concatenate([lat1, lat2, lat3]))
# Because the latitudes of all stations are same,
# we need to make the length of latitude array match the number of stations.
lat = np.repeat(lat, len(station))
print("The latitude of each station is: ", lat)
The name of each station is:  [b'NH03' b'NH15']
The longitude of each station is:  [-124.41169686 -124.12999686]
The latitude of each station is:  [44.65169887 44.65169887]
# Get the IDs of all profiles
profID1 = np.unique(ds1.station_code.data)
profID2 = np.unique(ds2.station_code.data)
profID3 = np.unique(ds3.station_code.data)
profID = np.concatenate([profID1, profID2, profID3])
print("The ID of each profile is: ", profID)

# Get the number of observations for all profile
rowSize1 = ds1.sizes['row']
rowSize2 = ds2.sizes['row']
rowSize3 = ds3.sizes['row']
rowSize = [rowSize1, rowSize2, rowSize3]
print("The number of observations for each profile is: ", rowSize)
The ID of each profile is:  [b'010903NH03' b'011207NH03' b'061300NH15a']
The number of observations for each profile is:  [43, 46, 83]
# Get time of each profile
time1 = np.unique(ds1.time.data)
time2 = np.unique(ds2.time.data)
time3 = np.unique(ds3.time.data)
time = np.concatenate([time1, time2, time3])
print("The time of each profile is: ", time)

# Get the time unit
time_unit = ds1.time.units
print("The time unit is: ", time_unit)
The time of each profile is:  [1.04209920e+09 1.16863812e+09 9.60945600e+08]
The time unit is:  seconds since 1970-01-01T00:00:00Z
# Get the data variables (temperature, density, salinity, fluorenscence) from each profile,
# and merge them in order into a 1D array
temp = np.concatenate([ds1.temperature.data, ds2.temperature.data, ds3.temperature.data])
density = np.concatenate([ds1.density.data, ds2.density.data, ds3.density.data])
salinity = np.concatenate([ds1.salinity.data, ds2.salinity.data, ds3.salinity.data])
fluorescence = np.concatenate(
    [ds1.fluorescence.data, ds2.fluorescence.data, ds3.fluorescence.data]
)
print("The total number of observations for all profiles is: ", len(temp))

# Get the water depths from each profile, and merge them in order into a 1D array
depth = np.concatenate(
    [ds1.depth_or_pressure.data, ds2.depth_or_pressure.data, ds3.depth_or_pressure.data]
)
The total number of observations for all profiles is:  172
# Compose the netCDF dataset
ds = xr.Dataset(
    coords=(
        {
            "lon": (
                ["station"],
                lon,
                {"standard_name": "longitude", "units": "degrees_east"},
            ),
            "lat": (
                ["station"],
                lat,
                {"standard_name": "latitude", "units": "degrees_north"},
            ),
            "station_name": (
                ["station"],
                station,
                {"long_name": "station name", "cf_role": "timeseries_id"},
            ),
            "profile": (
                ["profile"],
                profID,
                {"long_name": "Profile ID", "cf_role": "profile_id"},
            ),
            "station_index": (
                ["profile"],
                station_index,
                {
                    "long_name": "which station this profile is for",
                    "instance_dimension": "station",
                },
            ),
            "row_size": (
                ["profile"],
                rowSize,
                {
                    "long_name": "number of observations for this profile",
                    "sample_dimension": "obs",
                },
            ),
            "time": (["profile"], time, {"standard_name": "time", "units": time_unit}),
            "depth": (
                ["obs"],
                depth,
                {"long_name": "Depth", "units": "m", "axis": "Z"},
            ),
        }
    ),
    data_vars=(
        {
            "density": (
                ["obs"],
                density,
                {"standard_name": "sea_water_density", "units": "sigma"},
            ),
            "fluorescence": (
                ["obs"],
                fluorescence,
                {"long_name": "Fluorescence", "units": "volts"},
            ),
            "salinity": (
                ["obs"],
                salinity,
                {"standard_name": "sea_water_practical_salinity", "units": "PSU"},
            ),
            "temperature": (
                ["obs"],
                temp,
                {"standard_name": "sea_water_temperature", "units": "degree_C"},
            ),
        }
    ),
    attrs={
        "featureType": "TimeSeriesProfile",
        "Conventions": "CF-1.11",
        "more attributes available at": "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/erdNewportCtd.html"
    },
)

ds
<xarray.Dataset> Size: 7kB
Dimensions:        (obs: 172, station: 2, profile: 3)
Coordinates:
    lon            (station) float64 16B -124.4 -124.1
    lat            (station) float64 16B 44.65 44.65
    station_name   (station) |S4 8B b'NH03' b'NH15'
  * profile        (profile) |S11 33B b'010903NH03' b'011207NH03' b'061300NH15a'
    station_index  (profile) |S4 12B b'NH03' b'NH03' b'NH15'
    row_size       (profile) int64 24B 43 46 83
    time           (profile) float64 24B 1.042e+09 1.169e+09 9.609e+08
    depth          (obs) float64 1kB 1.0 2.0 3.0 4.0 5.0 ... 81.0 82.0 83.0 84.0
Dimensions without coordinates: obs, station
Data variables:
    density        (obs) float64 1kB 23.0 23.01 23.01 ... 26.19 26.19 26.19
    fluorescence   (obs) float64 1kB nan nan nan nan ... 0.163 0.16 0.16 0.164
    salinity       (obs) float64 1kB 30.11 30.12 30.12 ... 33.64 33.64 33.64
    temperature    (obs) float64 1kB 10.83 10.83 10.83 ... 8.078 8.079 8.077
Attributes:
    featureType:                   TimeSeriesProfile
    Conventions:                   CF-1.11
    more attributes available at:  https://coastwatch.pfeg.noaa.gov/erddap/ta...

Note

  1. In case you’re not familiar with the syntax of xr.Dataset() function, we would highly recommend you to check out the elaborations in Chapter 3.

  2. For the convenience of elaboration, we have used only three profiles from two stations. However, in real use cases you probably need to handle large amount of data, then it is usually a better option to use loop functions. You can find a showcase with loop functions in Chapter 5.