Chapter-6: Create netCDF for Trajectory Data

The CF Conventions recommend the following representations for trajectories (section numbers refer to Appendix H):

  • Single trajectory (H.4.2): A netCDF file contains a single trajectory.

  • Multidimensional array representation of trajectories (H.4.1): A netCDF file contains multiple trajectories, and each trajectory contains the same number of observations/elements. This representation can also be applied to trajectories with different numbers of observations, at the cost of wasting some storage space on missing values.

  • Contiguous ragged array representation of trajectories (H.4.3): A netCDF file contains multiple trajectories with different numbers of elements, and the order of writing can be controlled (e.g. the dataset is complete). In such a case, this representation uses storage space more efficiently than H.4.1.

  • Indexed ragged array representation of trajectories (H.4.4): A netCDF file contains multiple trajectories with different numbers of elements, and the elements cannot be written in order.
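
To make the difference between these layouts concrete, here is a minimal sketch using two made-up trajectories of three and two observations. The variable names (`padded`, `row_size`, `trajectory_index`) and all values are illustrative assumptions, not taken from the CF document or from the dataset used below.

```python
import numpy as np

# Two made-up trajectories of unequal length (3 and 2 observations).
traj_a = np.array([25.3, 25.3, 25.2])
traj_b = np.array([26.8, 27.3])

# H.4.1 (multidimensional array): pad every trajectory to the longest one.
n_max = max(len(traj_a), len(traj_b))
padded = np.full((2, n_max), np.nan)
padded[0, :len(traj_a)] = traj_a
padded[1, :len(traj_b)] = traj_b

# H.4.3 (contiguous ragged array): write one trajectory after the other
# and store the length of each trajectory in a separate count variable.
contiguous = np.concatenate([traj_a, traj_b])
row_size = np.array([len(traj_a), len(traj_b)])

# H.4.4 (indexed ragged array): observations may be interleaved, so each
# observation carries the index of the trajectory it belongs to.
indexed_values = np.array([25.3, 26.8, 25.3, 27.3, 25.2])
trajectory_index = np.array([0, 1, 0, 1, 0])

print(padded.size, contiguous.size, indexed_values.size)  # 6 5 5
```

The padded array needs six slots for five real values, while both ragged layouts store exactly five values plus a small amount of bookkeeping.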

In this tutorial, we walk through the process of creating netCDF files from text (CSV) files of trajectory data. As an example, we use “weather201507.csv” and “weather201510.csv” from the dataset Liquid Robotics Wave Glider, Honey Badger (G3), 2015, Weather[6].

Having the data ready, we’ll do the following:

  1. The downloaded CSV file “weather201507.csv” contains a single trajectory. We’ll read the CSV file as a pandas dataframe, and create a netCDF file from it based on the template given in Appendix H.4.2.

  2. Based on the data from the CSV files “weather201507.csv” and “weather201510.csv”, we’ll pack both trajectories into one netCDF file in the form given in Appendix H.4.3.

import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import cftime
from datetime import datetime
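# List available datasets. Please change it to your file path.
os.chdir('../src/data')
# glob returns files in arbitrary order; sort them so that trj_files[0] is
# weather201507.csv and trj_files[1] is weather201510.csv
trj_files = sorted(glob(os.path.join(os.getcwd(), "dsg_trajectory", "*.csv")))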

1. Create netCDF from a single trajectory

# Inspect the dataset
df = pd.read_csv(trj_files[0])
df
vehicleName weather* feed_version datetime latitude (decimal degrees) longitude (decimal degrees) temperature (C) pressure (mBar) avg_wind_speed (kt) std_dev_wind_speed (kt) avg_wind_direction (degrees T) std_dev_wind_direction (degrees T)
0 Honey Badger (G3) weather 1.0 2015-07-01T00:10:00Z 28.005397 -154.140788 25.3 1028.1 9.4 2.3 170.7 0.0
1 Honey Badger (G3) weather 1.0 2015-07-01T00:20:00Z 28.003527 -154.140363 25.3 1029.0 9.0 1.9 167.8 0.0
2 Honey Badger (G3) weather 1.0 2015-07-01T00:40:00Z 27.999757 -154.139467 25.2 1028.0 8.9 1.9 174.7 0.0
3 Honey Badger (G3) weather 1.0 2015-07-01T00:50:00Z 27.997773 -154.139175 25.2 1028.0 8.8 2.0 171.3 0.0
4 Honey Badger (G3) weather 1.0 2015-07-01T01:10:00Z 27.995253 -154.139087 25.3 1028.0 9.9 1.5 166.4 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
3805 Honey Badger (G3) weather 1.0 2015-07-31T23:10:00Z 25.876978 -145.019295 25.3 1018.3 16.8 1.9 35.1 0.0
3806 Honey Badger (G3) weather 1.0 2015-07-31T23:20:00Z 25.874477 -145.021475 25.3 1018.0 16.8 2.0 38.7 0.0
3807 Honey Badger (G3) weather 1.0 2015-07-31T23:30:00Z 25.871978 -145.023530 25.4 1018.3 16.9 1.7 35.0 0.0
3808 Honey Badger (G3) weather 1.0 2015-07-31T23:40:00Z 25.869360 -145.025563 25.4 1018.0 16.1 2.0 38.6 0.0
3809 Honey Badger (G3) weather 1.0 2015-07-31T23:50:00Z 25.866775 -145.027755 25.5 1018.3 16.0 2.2 34.7 0.0

3810 rows × 12 columns

# Parse the datetime strings into datetime objects
time_dt = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df['datetime']]
# Print the first five date times
time_dt[:5]
[datetime.datetime(2015, 7, 1, 0, 10),
 datetime.datetime(2015, 7, 1, 0, 20),
 datetime.datetime(2015, 7, 1, 0, 40),
 datetime.datetime(2015, 7, 1, 0, 50),
 datetime.datetime(2015, 7, 1, 1, 10)]
# Set a reference time (time units)
time_units = 'seconds since 1970-01-01 00:00:00'
# Convert datetime to numerical values relative to the reference time
time_num = cftime.date2num(time_dt, time_units)
time_num
array([1435709400, 1435710000, 1435711200, ..., 1438385400, 1438386000,
       1438386600])
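
As a cross-check on the conversion above, the first value can be reproduced with the standard library alone. This is a minimal sketch, not a description of what cftime does internally; it assumes the timestamps are in UTC (the trailing `Z`) and that the reference time is the Unix epoch, as in `time_units`.

```python
from datetime import datetime, timedelta, timezone

# First timestamp of weather201507.csv, as shown in the dataframe above
stamp = "2015-07-01T00:10:00Z"

# Parse the string and mark it explicitly as UTC
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Reference time used in 'seconds since 1970-01-01 00:00:00'
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Elapsed seconds between the reference time and the observation
seconds = int((parsed - epoch).total_seconds())
print(seconds)  # 1435709400

# Going back from the number to the timestamp recovers the original time
restored = epoch + timedelta(seconds=seconds)
```

The printed number matches the first element of `time_num`.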
# List the column names of the dataframe
df_colnames = df.columns
df_colnames
Index(['vehicleName', 'weather*', 'feed_version', 'datetime',
       'latitude (decimal degrees)', 'longitude (decimal degrees)',
       'temperature (C)', 'pressure (mBar)', 'avg_wind_speed (kt)',
       'std_dev_wind_speed (kt)', 'avg_wind_direction (degrees T)',
       'std_dev_wind_direction (degrees T)'],
      dtype='object')
# Convert the data of each column into a NumPy array
lat = np.array(df[df_colnames[4]])
lon = np.array(df[df_colnames[5]])
temp = np.array(df[df_colnames[6]])
pressure = np.array(df[df_colnames[7]])
avg_wind_speed = np.array(df[df_colnames[8]])
avg_wind_direction = np.array(df[df_colnames[10]])
# Get the vehicle name as trajectory id
vehicleName = df['vehicleName'].unique().tolist()[0]
vehicleName
'Honey Badger (G3)'
# Create xarray dataset of single trajectory
ds = xr.Dataset(
    coords={
        "time":(["time"], np.float64(time_num), {"standard_name":"time",
                                                 "units":time_units}),
        "lat":(["time"], np.float64(lat), {"standard_name":"latitude",
                                           "units":"degrees_north"}),
        "lon":(["time"], np.float64(lon), {"standard_name":"longitude",
                                           "units":"degrees_east"}),
        "trajectory":([], vehicleName, {"long_name": "Vehicle Name",
                                        "cf_role": "trajectory_id"})
    },
    data_vars={
        "temperature": (["time"], np.float32(temp), {"long_name": "Temperature",
                                                     "units": "degree_C"}),
        "pressure": (["time"], np.float32(pressure), {"long_name": "Pressure",
                                                       "units": "mBar"}),
        "avg_wind_speed": (["time"], np.float32(avg_wind_speed), {"long_name":"wind_speed",
                                                                  "units": "knots"}),
        "avg_wind_direction": (["time"], np.float32(avg_wind_direction), {"standard_name":"wind_from_direction",
                                                                          "units":"degree"})
    },
    attrs={
        "featureType": "trajectory",
        "Conventions": "CF-1.11"
    }
)

ds
<xarray.Dataset> Size: 152kB
Dimensions:             (time: 3810)
Coordinates:
  * time                (time) float64 30kB 1.436e+09 1.436e+09 ... 1.438e+09
    lat                 (time) float64 30kB 28.01 28.0 28.0 ... 25.87 25.87
    lon                 (time) float64 30kB -154.1 -154.1 ... -145.0 -145.0
    trajectory          <U17 68B 'Honey Badger (G3)'
Data variables:
    temperature         (time) float32 15kB 25.3 25.3 25.2 ... 25.4 25.4 25.5
    pressure            (time) float32 15kB 1.028e+03 1.029e+03 ... 1.018e+03
    avg_wind_speed      (time) float32 15kB 9.4 9.0 8.9 8.8 ... 16.9 16.1 16.0
    avg_wind_direction  (time) float32 15kB 170.7 167.8 174.7 ... 35.0 38.6 34.7
Attributes:
    featureType:  trajectory
    Conventions:  CF-1.11
ds.info()
xarray.Dataset {
dimensions:
	time = 3810 ;

variables:
	float32 temperature(time) ;
		temperature:long_name = Temperature ;
		temperature:units = degree_C ;
	float32 pressure(time) ;
		pressure:long_name = Pressure ;
		pressure:units = mBar ;
	float32 avg_wind_speed(time) ;
		avg_wind_speed:long_name = wind_speed ;
		avg_wind_speed:units = knots ;
	float32 avg_wind_direction(time) ;
		avg_wind_direction:standard_name = wind_from_direction ;
		avg_wind_direction:units = degree ;
	float64 time(time) ;
		time:standard_name = time ;
		time:units = seconds since 1970-01-01 00:00:00 ;
	float64 lat(time) ;
		lat:standard_name = latitude ;
		lat:units = degrees_north ;
	float64 lon(time) ;
		lon:standard_name = longitude ;
		lon:units = degrees_east ;
	<U17 trajectory() ;
		trajectory:long_name = Vehicle Name ;
		trajectory:cf_role = trajectory_id ;

// global attributes:
	:featureType = trajectory ;
	:Conventions = CF-1.11 ;
}
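
The dataset above exists only in memory. A minimal sketch of writing such a dataset to disk with xarray's `Dataset.to_netcdf` and reading it back is given here. It uses a small made-up dataset (`toy`) and a temporary file name, both of which are illustrative assumptions, and it assumes that a netCDF backend such as netCDF4, h5netcdf or scipy is installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Toy stand-in for the single-trajectory dataset (all values are made up)
toy = xr.Dataset(
    coords={
        "time": (["time"], np.array([0.0, 600.0, 1200.0]),
                 {"standard_name": "time",
                  "units": "seconds since 1970-01-01 00:00:00"}),
        "trajectory": ([], "toy_glider", {"cf_role": "trajectory_id"}),
    },
    data_vars={
        "temperature": (["time"], np.array([25.3, 25.3, 25.2], dtype=np.float32),
                        {"long_name": "Temperature", "units": "degree_C"}),
    },
    attrs={"featureType": "trajectory", "Conventions": "CF-1.11"},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical output file name; replace it with your own path
    out_path = os.path.join(tmp_dir, "toy_trajectory.nc")
    toy.to_netcdf(out_path)

    # decode_times=False keeps the numeric time values as they were written
    with xr.open_dataset(out_path, decode_times=False) as reopened:
        feature_type = reopened.attrs["featureType"]
        n_obs = reopened.sizes["time"]
        first_time = float(reopened["time"].values[0])

print(feature_type, n_obs)
```

For the real data, calling `ds.to_netcdf(...)` with a path of your choice should work the same way.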

2. Create netCDF for multiple trajectories with different numbers of observations

# Read both CSV files as pandas dataframe
df1 = pd.read_csv(trj_files[0])
df2 = pd.read_csv(trj_files[1])

df2.head()
vehicleName weather* feed_version datetime latitude (decimal degrees) longitude (decimal degrees) temperature (C) pressure (mBar) avg_wind_speed (kt) std_dev_wind_speed (kt) avg_wind_direction (degrees T) std_dev_wind_direction (degrees T)
0 Honey Badger (G3) weather 1.0 2015-10-01T00:00:00Z 25.474888 -153.806048 25.3 1016.4 27.2 3.0 18.4 0.0
1 Honey Badger (G3) weather 1.0 2015-10-01T00:10:00Z 25.474348 -153.806277 25.5 1015.9 26.2 3.0 20.6 0.0
2 Honey Badger (G3) weather 1.0 2015-10-01T00:20:00Z 25.473818 -153.806640 25.5 1015.9 25.9 2.4 22.0 0.0
3 Honey Badger (G3) weather 1.0 2015-10-01T00:30:00Z 25.472875 -153.806823 25.7 1015.9 26.1 2.6 15.2 0.0
4 Honey Badger (G3) weather 1.0 2015-10-01T00:40:00Z 25.471928 -153.806935 25.9 1015.8 25.9 3.1 20.9 0.0
# Get the number of observations of each trajectory
rowSize1 = len(df1.index)
rowSize2 = len(df2.index)

print("The number of observations in each trajectory is ", rowSize1, " and ", rowSize2)
The number of observations in each trajectory is  3810  and  2972
# Set the trajectory ID as vehicle name + time of the cruise
trajName1 = "Honey_Badger_(G3)_201507"
trajName2 = "Honey_Badger_(G3)_201510"
print("The name of each trajectory is ", trajName1, " and ", trajName2)
The name of each trajectory is  Honey_Badger_(G3)_201507  and  Honey_Badger_(G3)_201510
# Prepare coordinate variable of time.
# Transform the datetime from DFs to datetime objects
time_dt1 = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df1['datetime']]
time_dt2 = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df2['datetime']]

# Set a reference time (i.e. the unit of time)
time_unit = 'seconds since 1970-01-01 00:00:00'

# Convert datetime to numerical values relative to the reference time
time_num1 = cftime.date2num(time_dt1, time_unit)
time_num2 = cftime.date2num(time_dt2, time_unit)

# Join the time steps of both trajectories into a 1D array
time_num = np.concatenate([time_num1,time_num2])
time_num
array([1435709400, 1435710000, 1435711200, ..., 1446331200, 1446331800,
       1446334800])
# Prepare the other coordinate variables: longitude & latitude
# We reuse the list of column names (df_colnames) from the previous section
lat1 = np.array(df1[df_colnames[4]])
lat2 = np.array(df2[df_colnames[4]])
lat = np.concatenate([lat1, lat2])

lon1 = np.array(df1[df_colnames[5]])
lon2 = np.array(df2[df_colnames[5]])
lon = np.concatenate([lon1, lon2])
# Prepare data variable. For simplicity, we'll include just one data variable, e.g. temperature
temp1 = np.array(df1[df_colnames[6]])
temp2 = np.array(df2[df_colnames[6]])
temp = np.concatenate([temp1, temp2])
# Create an xarray dataset from the data
ds = xr.Dataset(
    coords = {
        "trajectory": (["trajectory"], [trajName1, trajName2], {"cf_role":"trajectory_id"}),
        "rowSize": (["trajectory"], [rowSize1, rowSize2], {"long_name":"number of obs for this trajectory",
                                                           "sample_dimension":"obs"}),
        "time": (["obs"], time_num, {"standard_name":"time",
                                     "units": time_unit}),
        "lon": (["obs"], lon, {"standard_name":"longitude",
                               "units":"degrees_east"}),
        "lat": (["obs"], lat, {"standard_name":"latitude",
                               "units":"degrees_north"})
    },
    data_vars = {
        "temperature": (["obs"], temp, {"long_name":"Temperature",
                                        "units":"degree_C",
                                        "coordinates":"time lon lat"})
    },
    attrs = {"featureType": "trajectory",
             "Conventions": "CF-1.11"}
)

ds
<xarray.Dataset> Size: 217kB
Dimensions:      (obs: 6782, trajectory: 2)
Coordinates:
  * trajectory   (trajectory) <U24 192B 'Honey_Badger_(G3)_201507' 'Honey_Bad...
    rowSize      (trajectory) int64 16B 3810 2972
    time         (obs) int64 54kB 1435709400 1435710000 ... 1446334800
    lon          (obs) float64 54kB -154.1 -154.1 -154.1 ... -155.4 -155.4 0.0
    lat          (obs) float64 54kB 28.01 28.0 28.0 28.0 ... 20.2 20.2 20.2 0.0
Dimensions without coordinates: obs
Data variables:
    temperature  (obs) float64 54kB 25.3 25.3 25.2 25.2 ... 26.8 26.8 26.8 27.3
Attributes:
    featureType:  trajectory
    Conventions:  CF-1.11
ds.info()
xarray.Dataset {
dimensions:
	obs = 6782 ;
	trajectory = 2 ;

variables:
	float64 temperature(obs) ;
		temperature:long_name = Temperature ;
		temperature:units = degree_C ;
		temperature:coordinates = time lon lat ;
	<U24 trajectory(trajectory) ;
		trajectory:cf_role = trajectory_id ;
	int64 rowSize(trajectory) ;
		rowSize:long_name = number of obs for this trajectory ;
		rowSize:sample_dimension = obs ;
	int64 time(obs) ;
		time:standard_name = time ;
		time:units = seconds since 1970-01-01 00:00:00 ;
	float64 lon(obs) ;
		lon:standard_name = longitude ;
		lon:units = degrees_east ;
	float64 lat(obs) ;
		lat:standard_name = latitude ;
		lat:units = degrees_north ;

// global attributes:
	:featureType = trajectory ;
	:Conventions = CF-1.11 ;
}
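
When reading a contiguous ragged array, the count variable (`rowSize` above) is what allows the flat `obs` dimension to be cut back into individual trajectories. The following is a minimal sketch with made-up values; the helper names `split_points`, `per_trajectory` and `trajectory_index` are illustrative assumptions.

```python
import numpy as np

# Made-up stand-ins for the count variable and one data variable
row_size = np.array([3, 2])
temperature = np.array([25.3, 25.3, 25.2, 26.8, 27.3])

# The row sizes must add up to the length of the obs dimension
assert row_size.sum() == temperature.size

# Positions at which one trajectory ends and the next one starts
split_points = np.cumsum(row_size)[:-1]

# One array per trajectory, in the order in which they were written
per_trajectory = np.split(temperature, split_points)

# Index of the owning trajectory for every observation
trajectory_index = np.repeat(np.arange(len(row_size)), row_size)

for i, values in enumerate(per_trajectory):
    print(i, values)
```

With the dataset built above, the same idea would apply to `ds["rowSize"].values` and `ds["temperature"].values`.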