Chapter-5: Create netCDF for profile Data#

The CF Conventions introduced the following representations for profiles:

  • Single Profile (H.3.3): The file containes only one station and a single profile.

  • Orthogonal multidimensional array representation of profiles (H.3.1): Multiple profiles have same number of vertical coordinates and the coordinate values are identical. Similar to the case for time series, this representation can also be applied on multiple profiles with different vertical coordinate values while the efficient use of storage space is traded off.

  • Incomplete multidimensional array representation of profiles (H.3.2): Multiple profiles have the same number of vertical coordinates, but coordinate values are different. This representation is more efficient in storage space usage compared to H.3.1.

  • Contiguous ragged array representation of profiles (H.3.4): Multiple profiles have different lengths at vertical axis, and the whole dataset is complete (no more new observations will be made). This representation is the most efficient representation for storing this type of data regarding storage space.

  • Indexed ragged array representation of profiles (H.3.5): Multiple profiles have different lengths at vertical axis, and the dataset is not complete (dataset will be updated when new measurements are made).

As we can see, profile and time series share common representations, only the concern of profile data is vertical axis, rather than time axis. In last tutorial, we introduced how to form diverse netCDF files for time series data, and in this chapter, we prepared some hands-on exercises of creating netCDF for profile data. You can first try out the exercises on your own; if you encounter problems, feel free to take a look at the sample solutions.

For the exercises, you will need two datasets of CLAPPP : New Caledonian lagoons: CTD Profiles[5]. The sample solutions use the dataset “prony-1.nc” and “teremba-1.nc”.

When you have the data, try to do the following exercise:

  1. Each downloaded dataset contains a single profile that is already CF-compliant. If you’re using the same datasets as the sample solution, try to subset the datasets to make them share the same depth axis. Try to merge the subsets into one netCDF in the form as given in Appendix H.3.1 in the CF Conventions. Since each profile contains many data variables, if you’re able to do it, try to write loops to facilitate the data processing.

  2. Try to subset both datasets to be the same length on depth axis but have different depth coordinates, and merge the data in the form as given in Appendix H.3.2 in the CF Conventions.

  3. The two downloaded profiles actually have different lengths on the vertical axis depth. Try to merge them into the form as given in Appendix H.3.4 in the CF Conventions.

import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
# List available datasets. Please change it to your file path.
os.chdir('../src/data')
pf_files = glob(os.path.join(os.getcwd(), "dsg_profile", "*.nc"))

Inspect the downloaded single profile#

# Load the profile dataset.
# It is a standard single profile netCDF file.
ds_prony = xr.open_dataset(pf_files[0], decode_times=False)
ds_prony
<xarray.Dataset> Size: 382B
Dimensions:       (depth: 11)
Coordinates:
    time          int32 4B ...
    latitude      float32 4B ...
    longitude     float32 4B ...
  * depth         (depth) float32 44B -0.1 -0.2 -0.3 -0.4 ... -0.9 -1.0 -1.1
Data variables:
    stationname   |S18 18B ...
    temperature   (depth) float32 44B ...
    conductivity  (depth) float32 44B ...
    salinity      (depth) float32 44B ...
    fluorescence  (depth) float32 44B ...
    irradiance    (depth) float32 44B ...
    density       (depth) float32 44B ...
    turbidity     (depth) float32 44B ...
Attributes: (12/37)
    description:              CTD profile (NetCDF files) for station ./Clappp...
    title:                    CLAPPP Project  : CTD profile (NetCDF files) st...
    keywords:                 Seabird CTD SBE19plus, temperature, conductivit...
    history:                  Created 02/06/21
    production:               MIO UMR 7294 CNRS / OSU Pytheas UMS3470 CNRS
    contact:                  martine Rodier (martine.rodier@mio.osupytheas.fr)
    ...                       ...
    contributor_institution:  MIO UMR 7294 CNRS / OSU Pytheas UMS 3470 CNRS
    institution:              MIO UMR7294 CNRS / OSU Pytheas
    publisher_name:           MIO Marine Institute of Oceanography - OSU Pytheas
    publisher_email:          martine.rodier@mio.osupytheas.fr
    publisher_url:            https://dataset.osupytheas.fr/
    publisher_institution:    MIO UMR 7294 CNRS / OSU Pytheas UMS 3470 CNRS

1. Orthogonal multidimensional array representation of profiles (H.3.1)#

Multiple profiles, same number of vertical levels and vertical coordinate values are identical.

# Load another profile.
ds_teremba = xr.open_dataset(pf_files[1], decode_times=False)
# Vertical levels of two datasets
print("Depth Coordinates at Station 'prony': ", ds_prony.depth.data)
print("Depth Coordinates at Station 'teremba': ", ds_teremba.depth.data[:20])

# Select the same depths for the station "teremba" to match those for the station "prony"
ds_teremba = ds_teremba.sel(depth=ds_prony.depth.data)
print("New Depth Coordinates at Station 'teremba': ", ds_teremba.depth.data)
Depth Coordinates at Station 'prony':  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
Depth Coordinates at Station 'teremba':  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1 -1.2 -1.3 -1.4
 -1.5 -1.6 -1.7 -1.8 -1.9 -2. ]
New Depth Coordinates at Station 'teremba':  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
# Vertical coordinate values for both profiles
depth = ds_teremba.depth.data
print("Vertical levels for both profiles are: ", depth)

# Station names for two profiles
# class <bytes>.decode() ==> string
station_name = [ds_prony.stationname.data.tolist().decode(), ds_teremba.stationname.data.tolist().decode()]
print("Station names of two profiles are: ", station_name)

# Time coordinate values for two profiles
# 0-D array (scalar).tolist() ==> the scalar value itself, not a list containing that value
time = [ds_prony.time.data.tolist(), ds_teremba.time.data.tolist()]
print("Time values for two profiles are: ", time)

# Longitudes for two profiles
lon = [ds_prony.longitude.data.tolist(), ds_teremba.longitude.data.tolist()]
print("Longitudes for two stations are: ", lon)

# Latitudes for two profiles
lat = [ds_prony.latitude.data.tolist(), ds_teremba.latitude.data.tolist()]
print("Latitudes for two profiles are: ", lat)
Vertical levels for both profiles are:  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
Station names of two profiles are:  ['./Clappp1//prony-1', './Clappp1//teremba-1']
Time values for two profiles are:  [20439933, 20442692]
Longitudes for two stations are:  [166.8699951171875, 165.72999572753906]
Latitudes for two profiles are:  [-22.399999618530273, -21.850000381469727]
# In the example, we have 2 profiles and 11 vertical coordinates,
# Make all the data variables be an array of the shape (profile, z), i.e. (2, 11)
temperature = np.row_stack((ds_prony.temperature.data.tolist(), ds_teremba.temperature.data.tolist()))
conductivity = np.row_stack((ds_prony.conductivity.data.tolist(), ds_teremba.conductivity.data.tolist()))
salinity = np.row_stack((ds_prony.salinity.data.tolist(), ds_teremba.salinity.data.tolist()))
fluorescence = np.row_stack((ds_prony.fluorescence.data.tolist(), ds_teremba.fluorescence.data.tolist()))
irradiance = np.row_stack((ds_prony.irradiance.data.tolist(), ds_teremba.irradiance.data.tolist()))
density = np.row_stack((ds_prony.density.data.tolist(), ds_teremba.density.data.tolist()))
turbidity = np.row_stack((ds_prony.turbidity.data.tolist(), ds_teremba.turbidity.data.tolist()))
# Get Attributes
attrs_depth = ds_prony.depth.attrs
attrs_station_name = ds_prony.stationname.attrs
attrs_lat = ds_prony.latitude.attrs
attrs_lon = ds_prony.longitude.attrs
attrs_time = ds_prony.time.attrs

attrs_temperature = ds_prony.temperature.attrs
attrs_conductivity = ds_prony.conductivity.attrs
attrs_salinity = ds_prony.salinity.attrs
attrs_fluorescence = ds_prony.fluorescence.attrs
attrs_irradiance = ds_prony.irradiance.attrs
attrs_density = ds_prony.density.attrs
attrs_turbidity = ds_prony.turbidity.attrs

attrs_global = ds_prony.attrs
ds_profile = xr.Dataset(
    coords={
        "time":(["station"], time, attrs_time),
        "lon":(["station"], lon, attrs_lon),
        "lat":(["station"], lat, attrs_lat),
        "station_name":(["station"], station_name, attrs_station_name),
        "depth":(["depth"], depth, attrs_depth)
    },
    data_vars={
        "temperature":(["station","depth"], temperature, attrs_temperature),
        "conductivity":(["station","depth"], conductivity, attrs_conductivity),
        "salinity":(["station","depth"], salinity, attrs_salinity),
        "fluorescence":(["station","depth"], fluorescence, attrs_fluorescence),
        "irradiance":(["station","depth"], irradiance, attrs_irradiance),
        "density":(["station","depth"], density, attrs_density),
        "turbidity":(["station","depth"], turbidity, attrs_turbidity)
    },
    attrs={
        "featureType":"profile"
    }
)

# Add attribute "cf_role"
ds_profile.station_name.attrs['cf_role'] = "profile_id"

ds_profile
<xarray.Dataset> Size: 1kB
Dimensions:       (station: 2, depth: 11)
Coordinates:
    time          (station) int64 16B 20439933 20442692
    lon           (station) float64 16B 166.9 165.7
    lat           (station) float64 16B -22.4 -21.85
    station_name  (station) <U20 160B './Clappp1//prony-1' './Clappp1//teremb...
  * depth         (depth) float32 44B -0.1 -0.2 -0.3 -0.4 ... -0.9 -1.0 -1.1
Dimensions without coordinates: station
Data variables:
    temperature   (station, depth) float64 176B 24.47 24.32 ... 24.67 24.67
    conductivity  (station, depth) float64 176B 32.59 52.66 ... 53.09 53.08
    salinity      (station, depth) float64 176B 20.58 35.22 ... 35.27 35.27
    fluorescence  (station, depth) float64 176B 1.813 1.776 ... 0.3914 0.3622
    irradiance    (station, depth) float64 176B -999.0 -999.0 ... -999.0 -999.0
    density       (station, depth) float64 176B 1.013e+03 ... 1.024e+03
    turbidity     (station, depth) float64 176B 0.0 0.0 0.0 ... 0.148 0.0 0.0
Attributes:
    featureType:  profile

Alternative Approach: Loop#

When there’re lots of data variables in a dataset, it can get tedious to access to the arrays one by one; Looping is an alternative way of doing.

# Alternative way to access to all arrays and reshape them.
coord_names = list(ds_prony.coords)[:-1] # drop "depth" from the list
data_variable_names = list(ds_prony.keys())[1:] # drop "stationname" from the list

print(coord_names)
print(data_variable_names)
['time', 'latitude', 'longitude']
['temperature', 'conductivity', 'salinity', 'fluorescence', 'irradiance', 'density', 'turbidity']
# Create a dictionary containing the coordinates for the new dataset
dict_coords = {}
for i in coord_names:
    dict_coords[i] = [ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()]

dict_coords
{'time': [20439933, 20442692],
 'latitude': [-22.399999618530273, -21.850000381469727],
 'longitude': [166.8699951171875, 165.72999572753906]}
# Create another dictionary containing the data variables for the new dataset
dict_datavar = {}
for i in data_variable_names:
    dict_datavar[i] = np.row_stack((ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()))

dict_datavar
{'temperature': array([[24.46899986, 24.31800079, 24.35400009, 24.34300041, 24.27099991,
         24.25600052, 24.24399948, 24.23600006, 24.22999954, 24.23999977,
         24.3239994 ],
        [25.02899933, 24.91600037, 24.84300041, 24.7840004 , 24.76199913,
         24.75099945, 24.70299911, 24.69300079, 24.67700005, 24.66900063,
         24.66500092]]),
 'conductivity': array([[32.59090042, 52.65810013, 52.72539902, 52.71509933, 52.65539932,
         52.64300156, 52.64400101, 52.64849854, 52.71120071, 52.72040176,
         52.76760101],
        [21.35740089, 34.02090073, 25.94919968, 40.42350006, 47.11520004,
         46.70410156, 53.00569916, 53.01900101, 53.10960007, 53.08599854,
         53.08349991]]),
 'salinity': array([[20.58300018, 35.22499847, 35.24599838, 35.2480011 , 35.25899887,
         35.26200104, 35.27199936, 35.28099823, 35.33399963, 35.33300018,
         35.30199814],
        [12.79100037, 21.37000084, 15.89099979, 25.94199944, 30.78700066,
         30.49399948, 35.18199921, 35.20000076, 35.28099823, 35.26900101,
         35.27000046]]),
 'fluorescence': array([[1.81309998, 1.77579999, 1.75590003, 1.79449999, 1.75590003,
         1.80729997, 1.84119999, 1.79449999, 0.74739999, 0.74860001,
         0.72750002],
        [1.89139998, 1.90190005, 1.87969995, 1.903     , 1.93340003,
         1.90069997, 1.96019995, 1.88320005, 1.903     , 0.39140001,
         0.36219999]]),
 'irradiance': array([[-999., -999., -999., -999., -999., -999., -999., -999., -999.,
         -999., -999.],
        [-999., -999., -999., -999., -999., -999., -999., -999., -999.,
         -999., -999.]]),
 'density': array([[1012.63647461, 1023.71759033, 1023.72357178, 1023.72839355,
         1023.75878906, 1023.76611328, 1023.77752686, 1023.78717041,
         1023.82928467, 1023.82617188, 1023.77819824],
        [1006.6373291 , 1013.10162354, 1009.01202393, 1016.57720947,
         1020.23382568, 1020.01660156, 1023.57232666, 1023.58868408,
         1023.65490723, 1023.64898682, 1023.65142822]]),
 'turbidity': array([[0.        , 0.        , 0.        , 0.        , 0.        ,
         0.        , 0.        , 0.        , 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , 0.        , 0.        ,
         0.        , 0.34400001, 0.186     , 0.148     , 0.        ,
         0.        ]])}
# Get Attributes
attrs_coords = {}
for i in coord_names:
    attrs_coords[i] = ds_prony[i].attrs

attrs_dataVar = {}
for i in data_variable_names:
    attrs_dataVar[i] = ds_prony[i].attrs

attrs_depth = ds_prony.depth.attrs
attrs_station_name = ds_prony.stationname.attrs

attrs_global = ds_prony.attrs
# Make a dictionary containing the data variables
dict_data_vars = {}
for i in data_variable_names:
    dict_data_vars[i] = (["station","depth"], dict_datavar[i], attrs_dataVar[i])

# Compose the xarray dataset
ds_profile = xr.Dataset(
    coords={
        "depth": (["depth"], np.float32(depth), attrs_depth),
        "station": (["station"], station_name, attrs_station_name),
        "time": (["station"], np.int32(dict_coords['time']), attrs_coords['time']),
        "lat": (["station"], np.float32(dict_coords['latitude']), attrs_coords['latitude']),
        "lon": (["station"], np.float32(dict_coords['longitude']), attrs_coords['longitude'])
    },
    data_vars = dict_data_vars,
)
# Add Global Attributes
ds_profile.attrs = attrs_global

# Add Feature Type 
ds_profile.attrs["featureType"] = "profile"
ds_profile
<xarray.Dataset> Size: 1kB
Dimensions:       (station: 2, depth: 11)
Coordinates:
  * depth         (depth) float32 44B -0.1 -0.2 -0.3 -0.4 ... -0.9 -1.0 -1.1
  * station       (station) <U20 160B './Clappp1//prony-1' './Clappp1//teremb...
    time          (station) int32 8B 20439933 20442692
    lat           (station) float32 8B -22.4 -21.85
    lon           (station) float32 8B 166.9 165.7
Data variables:
    temperature   (station, depth) float64 176B 24.47 24.32 ... 24.67 24.67
    conductivity  (station, depth) float64 176B 32.59 52.66 ... 53.09 53.08
    salinity      (station, depth) float64 176B 20.58 35.22 ... 35.27 35.27
    fluorescence  (station, depth) float64 176B 1.813 1.776 ... 0.3914 0.3622
    irradiance    (station, depth) float64 176B -999.0 -999.0 ... -999.0 -999.0
    density       (station, depth) float64 176B 1.013e+03 ... 1.024e+03
    turbidity     (station, depth) float64 176B 0.0 0.0 0.0 ... 0.148 0.0 0.0
Attributes: (12/37)
    description:              CTD profile (NetCDF files) for station ./Clappp...
    title:                    CLAPPP Project  : CTD profile (NetCDF files) st...
    keywords:                 Seabird CTD SBE19plus, temperature, conductivit...
    history:                  Created 02/06/21
    production:               MIO UMR 7294 CNRS / OSU Pytheas UMS3470 CNRS
    contact:                  martine Rodier (martine.rodier@mio.osupytheas.fr)
    ...                       ...
    contributor_institution:  MIO UMR 7294 CNRS / OSU Pytheas UMS 3470 CNRS
    institution:              MIO UMR7294 CNRS / OSU Pytheas
    publisher_name:           MIO Marine Institute of Oceanography - OSU Pytheas
    publisher_email:          martine.rodier@mio.osupytheas.fr
    publisher_url:            https://dataset.osupytheas.fr/
    publisher_institution:    MIO UMR 7294 CNRS / OSU Pytheas UMS 3470 CNRS

2. Incomplete multidimensional array representation of profiles (H.3.2)#

Recommended for the situation where there are multiple profiles and same number of vertical coordinates in each, but coordinate values are different.

# Load another profile.
ds_prony = xr.open_dataset(pf_files[0], decode_times=False)
ds_teremba = xr.open_dataset(pf_files[1], decode_times=False)

# Vertical levels of two datasets
print("Depth Coordinates at Station 'prony': ", ds_prony.depth.data)
print("Depth Coordinates at Station 'teremba': ", ds_teremba.depth.data[0:21])

# Select the same depths for the station "teremba" to match those for the station "prony"
ds_teremba = ds_teremba.isel(depth=slice(10,21))
print("New Depth Coordinates at Station 'teremba': ", ds_teremba.depth.data)
Depth Coordinates at Station 'prony':  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1]
Depth Coordinates at Station 'teremba':  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1 -1.2 -1.3 -1.4
 -1.5 -1.6 -1.7 -1.8 -1.9 -2.  -2.1]
New Depth Coordinates at Station 'teremba':  [-1.1 -1.2 -1.3 -1.4 -1.5 -1.6 -1.7 -1.8 -1.9 -2.  -2.1]
# Compared to the previous example in H.3.1, 
# replace the coordinate variable depth(depth) with depth(station,depth).

# Station names for two profiles
# class <bytes>.decode() ==> string
station_name = [ds_prony.stationname.data.tolist().decode(), ds_teremba.stationname.data.tolist().decode()]
print("Station names of two profiles are: ", station_name)
# Get attributes of station names
attrs_station_name = ds_prony.stationname.attrs

# Depths at two stations
depths = np.row_stack((ds_prony.depth.data.tolist(), ds_teremba.depth.data.tolist()))
print("Depths at two stations are: ", depths)
# Get attributes of depth
attrs_depth = ds_prony.depth.attrs
Station names of two profiles are:  ['./Clappp1//prony-1', './Clappp1//teremba-1']
Depths at two stations are:  [[-0.1        -0.2        -0.30000001 -0.40000001 -0.5        -0.60000002
  -0.69999999 -0.80000001 -0.89999998 -1.         -1.10000002]
 [-1.10000002 -1.20000005 -1.29999995 -1.39999998 -1.5        -1.60000002
  -1.70000005 -1.79999995 -1.89999998 -2.         -2.0999999 ]]
# Get list of auxiliary coordinate variable names
ls_auxCoords = list(ds_prony.coords)[:-1] # Drop Depth
print("List of auxiliary coordinate variables: ", ls_auxCoords)

# Get time, lon, lat and depth by looping
auxCoords = {}
for i in ls_auxCoords:
    auxCoords[i] = [ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()]

    # Alternatively: If there are lots of datasets, we can consider of using nested loops.
    #auxCoords[i] = [j[i].data.tolist() for j in [ds_prony, ds_teremba]]
print(auxCoords)

# Get Attributes of auxiliary coordinate variables
auxCoords_attrs = {}
for i in ls_auxCoords:
    auxCoords_attrs[i] = ds_prony[i].attrs
List of auxiliary coordinate variables:  ['time', 'latitude', 'longitude']
{'time': [20439933, 20442692], 'latitude': [-22.399999618530273, -21.850000381469727], 'longitude': [166.8699951171875, 165.72999572753906]}
# Get list of data variables
ls_dataVar = list(ds_prony.keys())[1:] # Drop station names from the list
print("List of data variables: ", ls_dataVar)

# Get data variables by looping
dataVar = {}
for i in ls_dataVar:
    dataVar[i] = np.row_stack((ds_prony[i].data.tolist(), ds_teremba[i].data.tolist()))

# Get Attributes of data variables
dataVar_attrs = {}
for i in ls_dataVar:
    dataVar_attrs[i] = ds_prony[i].attrs
List of data variables:  ['temperature', 'conductivity', 'salinity', 'fluorescence', 'irradiance', 'density', 'turbidity']
# Create new Dataset
dict_auxCoords = {}
for i in ls_auxCoords:
    dict_auxCoords[i] = (["station"], auxCoords[i], auxCoords_attrs[i])
# Add station name and depth
dict_auxCoords["station_name"] = (["station"], station_name, attrs_station_name)
dict_auxCoords["depths"] = (["station","depth"], depths, attrs_depth)

dict_dataVar = {}
for i in ls_dataVar:
    dict_dataVar[i] = (["station","depth"], dataVar[i], dataVar_attrs[i])

ds_profile = xr.Dataset(coords=dict_auxCoords, 
                        data_vars=dict_dataVar,
                        attrs={"featureType":"profile"})

# Add attribute "cf_role"
ds_profile.station_name.attrs['cf_role'] = "profile_id"

ds_profile
<xarray.Dataset> Size: 2kB
Dimensions:       (station: 2, depth: 11)
Coordinates:
    time          (station) int64 16B 20439933 20442692
    latitude      (station) float64 16B -22.4 -21.85
    longitude     (station) float64 16B 166.9 165.7
    station_name  (station) <U20 160B './Clappp1//prony-1' './Clappp1//teremb...
    depths        (station, depth) float64 176B -0.1 -0.2 -0.3 ... -2.0 -2.1
Dimensions without coordinates: station, depth
Data variables:
    temperature   (station, depth) float64 176B 24.47 24.32 ... 24.65 24.65
    conductivity  (station, depth) float64 176B 32.59 52.66 ... 53.07 53.06
    salinity      (station, depth) float64 176B 20.58 35.22 ... 35.27 35.27
    fluorescence  (station, depth) float64 176B 1.813 1.776 ... 0.2968 0.3272
    irradiance    (station, depth) float64 176B -999.0 -999.0 ... -999.0 -999.0
    density       (station, depth) float64 176B 1.013e+03 ... 1.024e+03
    turbidity     (station, depth) float64 176B 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
    featureType:  profile

3. Contiguous ragged array of representation of profiles (H.3.4)#

Recommeded for the situation where there are multiple profiles, each profile has different number of vertical coordinates, and the dataset is complete.

# Load another profile.
ds_prony = xr.open_dataset(pf_files[0], decode_times=False)
ds_teremba = xr.open_dataset(pf_files[1], decode_times=False)

print("Number of observatinos at station 'prony': ", len(ds_prony.depth))
print("Number of observations at station 'teremba': ", len(ds_teremba.depth))
Number of observatinos at station 'prony':  11
Number of observations at station 'teremba':  604
# Number of observations at each station
rowSize = [len(ds_prony.depth), len(ds_teremba.depth)]
print("Number of observations at each station is: ", rowSize)

# Station names for two profiles
# class <bytes>.decode() ==> string
station_name = [ds_prony.stationname.data.tolist().decode(), ds_teremba.stationname.data.tolist().decode()]
print("Station names of two profiles are: ", station_name)

# Time coordinate values for two profiles
# 0-D array (scalar).tolist() ==> the scalar value itself, not a list containing that value
time = [ds_prony.time.data.tolist(), ds_teremba.time.data.tolist()]
print("Time values for two profiles are: ", time)

# Longitudes for two profiles
lon = [ds_prony.longitude.data.tolist(), ds_teremba.longitude.data.tolist()]
print("Longitudes for two stations are: ", lon)

# Latitudes for two profiles
lat = [ds_prony.latitude.data.tolist(), ds_teremba.latitude.data.tolist()]
print("Latitudes for two profiles are: ", lat)

# Vertical coordinate values for both profiles
depth = np.concatenate([ds_prony.depth.data, ds_teremba.depth.data])
print("Depths of the (first twenty) individual observations are: ", depth[:20])
Number of observations at each station is:  [11, 604]
Station names of two profiles are:  ['./Clappp1//prony-1', './Clappp1//teremba-1']
Time values for two profiles are:  [20439933, 20442692]
Longitudes for two stations are:  [166.8699951171875, 165.72999572753906]
Latitudes for two profiles are:  [-22.399999618530273, -21.850000381469727]
Depths of the (first twenty) individual observations are:  [-0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1.  -1.1 -0.1 -0.2 -0.3
 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9]
# Merge each data variable of two stations to one-dimensional array
temperature = np.concatenate([ds_prony.temperature.data, ds_teremba.temperature.data])
conductivity = np.concatenate([ds_prony.conductivity.data, ds_teremba.conductivity.data])
salinity = np.concatenate([ds_prony.salinity.data, ds_teremba.salinity.data])
fluorescence = np.concatenate([ds_prony.fluorescence.data, ds_teremba.fluorescence.data])
irradiance = np.concatenate([ds_prony.irradiance.data, ds_teremba.irradiance.data])
density = np.concatenate([ds_prony.density.data, ds_teremba.density.data])
turbidity = np.concatenate([ds_prony.turbidity.data, ds_teremba.turbidity.data])
# Get Attributes
attrs_depth = ds_prony.depth.attrs
attrs_station_name = ds_prony.stationname.attrs
attrs_lat = ds_prony.latitude.attrs
attrs_lon = ds_prony.longitude.attrs
attrs_time = ds_prony.time.attrs

attrs_temperature = ds_prony.temperature.attrs
attrs_conductivity = ds_prony.conductivity.attrs
attrs_salinity = ds_prony.salinity.attrs
attrs_fluorescence = ds_prony.fluorescence.attrs
attrs_irradiance = ds_prony.irradiance.attrs
attrs_density = ds_prony.density.attrs
attrs_turbidity = ds_prony.turbidity.attrs

attrs_global = ds_prony.attrs
# Create new Dataset
ds_profile = xr.Dataset(
    coords={
        "station_name": (["station"], station_name, attrs_station_name),
        "time": (["station"], time, attrs_time),
        "lon": (["station"], lon, attrs_lon),
        "lat": (["station"], lat, attrs_lat),
        "rowSize": (["station"], rowSize, {"long_name": "number of observations for this profile",
                                           "sample_dimension": "obs"}),
        "depth": (["obs"], depth, attrs_depth)
    },
    data_vars={
        "temperature":(["obs"], temperature, attrs_temperature),
        "conductivity":(["obs"], conductivity, attrs_conductivity),
        "salinity":(["obs"], salinity, attrs_salinity),
        "fluorescence":(["obs"], fluorescence, attrs_fluorescence),
        "irradiance":(["obs"], irradiance, attrs_irradiance),
        "density":(["obs"], density, attrs_density),
        "turbidity":(["obs"], turbidity, attrs_turbidity)
    },
    attrs={
        "featureType": "profile"
    }
)

# Add attribute "cf_role"
ds_profile.station_name.attrs['cf_role'] = "profile_id"

ds_profile
<xarray.Dataset> Size: 20kB
Dimensions:       (obs: 615, station: 2)
Coordinates:
    station_name  (station) <U20 160B './Clappp1//prony-1' './Clappp1//teremb...
    time          (station) int64 16B 20439933 20442692
    lon           (station) float64 16B 166.9 165.7
    lat           (station) float64 16B -22.4 -21.85
    rowSize       (station) int64 16B 11 604
    depth         (obs) float32 2kB -0.1 -0.2 -0.3 -0.4 ... -72.2 -72.3 -72.4
Dimensions without coordinates: obs, station
Data variables:
    temperature   (obs) float32 2kB 24.47 24.32 24.35 ... 22.99 22.99 22.99
    conductivity  (obs) float32 2kB 32.59 52.66 52.73 ... 51.46 51.46 51.46
    salinity      (obs) float32 2kB 20.58 35.22 35.25 ... 35.35 35.35 35.35
    fluorescence  (obs) float32 2kB 1.813 1.776 1.756 ... 0.8875 0.8629 0.8641
    irradiance    (obs) float32 2kB -999.0 -999.0 -999.0 ... -999.0 -999.0
    density       (obs) float32 2kB 1.013e+03 1.024e+03 ... 1.025e+03 1.025e+03
    turbidity     (obs) float32 2kB 0.0 0.0 0.0 0.0 ... 0.095 0.084 0.1 0.091
Attributes:
    featureType:  profile