Chapter-6: Create netCDF for trajectory Data

The CF Conventions recommend the following representations for trajectories:

Single trajectory (H.4.2): A netCDF file contains a single trajectory.
Multidimensional array representation of trajectories (H.4.1): A netCDF file contains multiple trajectories, each with the same number of observations (elements). This representation can also be applied to trajectories with different numbers of observations, at the cost of storage space wasted on missing values.
Contiguous ragged array representation of trajectories (H.4.3): A netCDF file contains multiple trajectories with different numbers of elements, and the writer controls the order in which they are written (e.g. because the dataset is complete). In this case, the representation uses storage space more efficiently than H.4.1.
Indexed ragged array representation of trajectories (H.4.4): A netCDF file contains multiple trajectories with different numbers of elements, and the elements cannot be written in order.
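
To make the difference between these layouts concrete, here is a minimal NumPy sketch. The values are made up, and the names `row_size` and `traj_index` are illustrative only; they are not taken from the tutorial's data files.

```python
import numpy as np

# Two hypothetical trajectories with 3 and 2 observations (made-up values).
temp_a = np.array([25.3, 25.3, 25.2])
temp_b = np.array([25.5, 25.7])

# Multidimensional array (H.4.1): one row per trajectory, padded with NaN.
row_size = np.array([len(temp_a), len(temp_b)])
padded = np.full((2, row_size.max()), np.nan)
padded[0, :len(temp_a)] = temp_a
padded[1, :len(temp_b)] = temp_b

# Contiguous ragged array (H.4.3): observations of each trajectory are stored
# back to back, and a per-trajectory count says how many belong to each one.
obs_contiguous = np.concatenate([temp_a, temp_b])
offsets = np.concatenate([[0], np.cumsum(row_size)])
second = obs_contiguous[offsets[1]:offsets[2]]

# Indexed ragged array (H.4.4): observations may be interleaved, and a
# per-observation index says which trajectory each one belongs to.
obs_indexed = np.array([25.3, 25.5, 25.3, 25.7, 25.2])
traj_index = np.array([0, 1, 0, 1, 0])
second_indexed = obs_indexed[traj_index == 1]
```

Both ragged layouts store five values here, whereas the padded array stores six, one of which is a missing value.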

In this tutorial, we walk through the process of creating a netCDF file from text (CSV) files of trajectory data. As an example, we use “weather201507.csv” and “weather201510.csv” from the dataset Liquid Robotics Wave Glider, Honey Badger (G3), 2015, Weather[6].

With the data ready, we’ll do the following:

The downloaded CSV file “weather201507.csv” contains a single trajectory. We’ll read it into a pandas dataframe and create a netCDF file from it, following the template given in Appendix H.4.2.
Using the data from both CSV files, “weather201507.csv” and “weather201510.csv”, we’ll pack the two trajectories into one netCDF file in the form given in Appendix H.4.3.

import os
from glob import glob
import numpy as np
import pandas as pd
import xarray as xr
import cftime
from datetime import datetime

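# List available datasets. Please change it to your file path.
os.chdir('../src/data')
# glob() returns paths in arbitrary order, so sort them to make the order reproducible
# (weather201507.csv first, weather201510.csv second, assuming only these two files are present).
trj_files = sorted(glob(os.path.join(os.getcwd(), "dsg_trajectory", "*.csv")))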

1. Create netCDF from a single trajectory

# Inspect the dataset
df = pd.read_csv(trj_files[0])
df

	vehicleName	weather*	feed_version	datetime	latitude (decimal degrees)	longitude (decimal degrees)	temperature (C)	pressure (mBar)	avg_wind_speed (kt)	std_dev_wind_speed (kt)	avg_wind_direction (degrees T)	std_dev_wind_direction (degrees T)
0	Honey Badger (G3)	weather	1.0	2015-07-01T00:10:00Z	28.005397	-154.140788	25.3	1028.1	9.4	2.3	170.7	0.0
1	Honey Badger (G3)	weather	1.0	2015-07-01T00:20:00Z	28.003527	-154.140363	25.3	1029.0	9.0	1.9	167.8	0.0
2	Honey Badger (G3)	weather	1.0	2015-07-01T00:40:00Z	27.999757	-154.139467	25.2	1028.0	8.9	1.9	174.7	0.0
3	Honey Badger (G3)	weather	1.0	2015-07-01T00:50:00Z	27.997773	-154.139175	25.2	1028.0	8.8	2.0	171.3	0.0
4	Honey Badger (G3)	weather	1.0	2015-07-01T01:10:00Z	27.995253	-154.139087	25.3	1028.0	9.9	1.5	166.4	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...
3805	Honey Badger (G3)	weather	1.0	2015-07-31T23:10:00Z	25.876978	-145.019295	25.3	1018.3	16.8	1.9	35.1	0.0
3806	Honey Badger (G3)	weather	1.0	2015-07-31T23:20:00Z	25.874477	-145.021475	25.3	1018.0	16.8	2.0	38.7	0.0
3807	Honey Badger (G3)	weather	1.0	2015-07-31T23:30:00Z	25.871978	-145.023530	25.4	1018.3	16.9	1.7	35.0	0.0
3808	Honey Badger (G3)	weather	1.0	2015-07-31T23:40:00Z	25.869360	-145.025563	25.4	1018.0	16.1	2.0	38.6	0.0
3809	Honey Badger (G3)	weather	1.0	2015-07-31T23:50:00Z	25.866775	-145.027755	25.5	1018.3	16.0	2.2	34.7	0.0

3810 rows × 12 columns

# Convert the datetime strings (UTC, marked by the trailing "Z") to datetime objects
time_dt = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df['datetime']]
# Print the first five date times
time_dt[:5]

[datetime.datetime(2015, 7, 1, 0, 10),
 datetime.datetime(2015, 7, 1, 0, 20),
 datetime.datetime(2015, 7, 1, 0, 40),
 datetime.datetime(2015, 7, 1, 0, 50),
 datetime.datetime(2015, 7, 1, 1, 10)]

# Set a reference time (time units)
time_units = 'seconds since 1970-01-01 00:00:00'
# Convert datetime to numerical values relative to the reference time
time_num = cftime.date2num(time_dt, time_units)
time_num

array([1435709400, 1435710000, 1435711200, ..., 1438385400, 1438386000,
       1438386600])
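
As a quick sanity check on the conversion, the first value can be reproduced with the standard library alone. This is only a sketch for verification, not part of the tutorial's workflow; it assumes the timestamps are UTC, as the trailing "Z" indicates.

```python
from datetime import datetime, timezone

stamp = "2015-07-01T00:10:00Z"

# Attach the UTC timezone explicitly so that timestamp() does not depend on
# the local timezone of the machine running the code.
dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
seconds = int(dt.timestamp())
print(seconds)  # 1435709400, the first element of time_num

# Convert back to confirm that the round trip preserves the timestamp.
back = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(back.strftime("%Y-%m-%dT%H:%M:%SZ"))
```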

# List the column names of the dataframe
df_colnames = df.columns
df_colnames

Index(['vehicleName', 'weather*', 'feed_version', 'datetime',
       'latitude (decimal degrees)', 'longitude (decimal degrees)',
       'temperature (C)', 'pressure (mBar)', 'avg_wind_speed (kt)',
       'std_dev_wind_speed (kt)', 'avg_wind_direction (degrees T)',
       'std_dev_wind_direction (degrees T)'],
      dtype='object')

# Extract the columns of interest as NumPy arrays
lat = np.array(df[df_colnames[4]])
lon = np.array(df[df_colnames[5]])
temp = np.array(df[df_colnames[6]])
pressure = np.array(df[df_colnames[7]])
avg_wind_speed = np.array(df[df_colnames[8]])
avg_wind_direction = np.array(df[df_colnames[10]])
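
Positional indexing such as `df_colnames[4]` depends on the column order of the CSV file. As an alternative, the columns could be selected by name. The sketch below uses a tiny stand-in dataframe with made-up rows, and the mapping `rename_map` is a hypothetical helper, not something defined in the tutorial.

```python
import pandas as pd

# A small stand-in for the CSV, using the tutorial's column names.
df_demo = pd.DataFrame({
    "latitude (decimal degrees)": [28.005397, 28.003527],
    "longitude (decimal degrees)": [-154.140788, -154.140363],
    "temperature (C)": [25.3, 25.3],
})

# Hypothetical mapping from CSV column names to short variable names.
rename_map = {
    "latitude (decimal degrees)": "lat",
    "longitude (decimal degrees)": "lon",
    "temperature (C)": "temperature",
}

# Select each column by name and convert it to a NumPy array.
arrays = {
    short_name: df_demo[csv_name].to_numpy()
    for csv_name, short_name in rename_map.items()
}
```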

# Get the vehicle name as trajectory id
vehicleName = df['vehicleName'].unique().tolist()[0]
vehicleName

'Honey Badger (G3)'
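
The cell that builds the dataset `ds` is not shown in this section. A minimal sketch that is consistent with the `ds.info()` output below might look as follows. It uses three made-up observations and only one data variable; whether the original notebook stored `time`, `lat`, `lon` and `trajectory` as xarray coordinates is an assumption.

```python
import numpy as np
import xarray as xr

# Made-up stand-ins for the arrays prepared above (three observations).
time_num = np.array([1435709400, 1435710000, 1435711200], dtype="float64")
lat = np.array([28.005397, 28.003527, 27.999757])
lon = np.array([-154.140788, -154.140363, -154.139467])
temp = np.array([25.3, 25.3, 25.2], dtype="float32")
vehicleName = "Honey Badger (G3)"

ds = xr.Dataset(
    data_vars={
        # Each entry is (dimensions, data, attributes).
        "temperature": (("time",), temp,
                        {"long_name": "Temperature", "units": "degree_C"}),
    },
    coords={
        "time": (("time",), time_num,
                 {"standard_name": "time",
                  "units": "seconds since 1970-01-01 00:00:00"}),
        "lat": (("time",), lat,
                {"standard_name": "latitude", "units": "degrees_north"}),
        "lon": (("time",), lon,
                {"standard_name": "longitude", "units": "degrees_east"}),
        # Scalar variable holding the trajectory id (H.4.2).
        "trajectory": ((), vehicleName,
                       {"long_name": "Vehicle Name", "cf_role": "trajectory_id"}),
    },
    attrs={"featureType": "trajectory", "Conventions": "CF-1.11"},
)
```

Calling `ds.to_netcdf("trajectory_single.nc")` (a hypothetical file name) would then write the file, provided a netCDF backend such as netCDF4 is installed.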

ds.info()

xarray.Dataset {
dimensions:
	time = 3810 ;

variables:
	float32 temperature(time) ;
		temperature:long_name = Temperature ;
		temperature:units = degree_C ;
	float32 pressure(time) ;
		pressure:long_name = Pressure ;
		pressure:units = mBar ;
	float32 avg_wind_speed(time) ;
		avg_wind_speed:long_name = wind_speed ;
		avg_wind_speed:units = knots ;
	float32 avg_wind_direction(time) ;
		avg_wind_direction:standard_name = wind_from_direction ;
		avg_wind_direction:units = degree ;
	float64 time(time) ;
		time:standard_name = time ;
		time:units = seconds since 1970-01-01 00:00:00 ;
	float64 lat(time) ;
		lat:standard_name = latitude ;
		lat:units = degrees_north ;
	float64 lon(time) ;
		lon:standard_name = longitude ;
		lon:units = degrees_east ;
	<U17 trajectory() ;
		trajectory:long_name = Vehicle Name ;
		trajectory:cf_role = trajectory_id ;

// global attributes:
	:featureType = trajectory ;
	:Conventions = CF-1.11 ;
}

2. Create netCDF for multiple trajectories with different numbers of observations

# Read both CSV files as pandas dataframe
df1 = pd.read_csv(trj_files[0])
df2 = pd.read_csv(trj_files[1])

df2.head()

	vehicleName	weather*	feed_version	datetime	latitude (decimal degrees)	longitude (decimal degrees)	temperature (C)	pressure (mBar)	avg_wind_speed (kt)	std_dev_wind_speed (kt)	avg_wind_direction (degrees T)
0	Honey Badger (G3)	weather	1.0	2015-10-01T00:00:00Z	25.474888	-153.806048	25.3	1016.4	27.2	3.0	18.4
1	Honey Badger (G3)	weather	1.0	2015-10-01T00:10:00Z	25.474348	-153.806277	25.5	1015.9	26.2	3.0	20.6
2	Honey Badger (G3)	weather	1.0	2015-10-01T00:20:00Z	25.473818	-153.806640	25.5	1015.9	25.9	2.4	22.0
3	Honey Badger (G3)	weather	1.0	2015-10-01T00:30:00Z	25.472875	-153.806823	25.7	1015.9	26.1	2.6	15.2
4	Honey Badger (G3)	weather	1.0	2015-10-01T00:40:00Z	25.471928	-153.806935	25.9	1015.8	25.9	3.1	20.9

# Get the number of observations of each trajectory
rowSize1 = len(df1.index)
rowSize2 = len(df2.index)

print("The number of observations in each trajectory is ", rowSize1, " and ", rowSize2)

The number of observations in each trajectory is  3810  and  2972

# Set the trajectory ID as vehicle name + time of the cruise
trajName1 = "Honey_Badger_(G3)_201507"
trajName2 = "Honey_Badger_(G3)_201510"
print("The name of each trajectory is ", trajName1, " and ", trajName2)

The name of each trajectory is  Honey_Badger_(G3)_201507  and  Honey_Badger_(G3)_201510

# Prepare coordinate variable of time.
# Transform the datetime from DFs to datetime objects
time_dt1 = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df1['datetime']]
time_dt2 = [datetime.strptime(i, '%Y-%m-%dT%H:%M:%SZ') for i in df2['datetime']]

# Set a reference time (i.e. the unit of time)
time_unit = 'seconds since 1970-01-01 00:00:00'

# Convert datetime to numerical values relative to the reference time
time_num1 = cftime.date2num(time_dt1, time_unit)
time_num2 = cftime.date2num(time_dt2, time_unit)

# Join the time steps of both trajectories into a 1D array
time_num = np.concatenate([time_num1,time_num2])
time_num

array([1435709400, 1435710000, 1435711200, ..., 1446331200, 1446331800,
       1446334800])

# Prepare the other coordinate variables: longitude & latitude
# We reuse the list of column names (df_colnames) from the previous section
lat1 = np.array(df1[df_colnames[4]])
lat2 = np.array(df2[df_colnames[4]])
lat = np.concatenate([lat1, lat2])

lon1 = np.array(df1[df_colnames[5]])
lon2 = np.array(df2[df_colnames[5]])
lon = np.concatenate([lon1, lon2])

# Prepare data variable. For simplicity, we'll include just one data variable, e.g. temperature
temp1 = np.array(df1[df_colnames[6]])
temp2 = np.array(df2[df_colnames[6]])
temp = np.concatenate([temp1, temp2])
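
As in the first section, the cell that builds `ds` is not shown. The sketch below gives one way to assemble a contiguous ragged array dataset (H.4.3) that is consistent with the `ds.info()` output below. It uses made-up arrays with 3 + 2 observations, and storing `time`, `lon` and `lat` as plain data variables referenced through a `coordinates` attribute is an assumption.

```python
import numpy as np
import xarray as xr

# Made-up stand-ins for the concatenated arrays (3 + 2 observations).
time_num = np.array([1435709400, 1435710000, 1435711200,
                     1443657600, 1443658200])
lat = np.array([28.005397, 28.003527, 27.999757, 25.474888, 25.474348])
lon = np.array([-154.140788, -154.140363, -154.139467,
                -153.806048, -153.806277])
temp = np.array([25.3, 25.3, 25.2, 25.3, 25.5])
row_size = np.array([3, 2])
traj_names = np.array(["Honey_Badger_(G3)_201507", "Honey_Badger_(G3)_201510"])

ds = xr.Dataset(
    data_vars={
        "temperature": (("obs",), temp,
                        {"long_name": "Temperature", "units": "degree_C",
                         "coordinates": "time lon lat"}),
        # The count variable links each trajectory to its block of observations.
        "rowSize": (("trajectory",), row_size,
                    {"long_name": "number of obs for this trajectory",
                     "sample_dimension": "obs"}),
        "time": (("obs",), time_num,
                 {"standard_name": "time",
                  "units": "seconds since 1970-01-01 00:00:00"}),
        "lon": (("obs",), lon,
                {"standard_name": "longitude", "units": "degrees_east"}),
        "lat": (("obs",), lat,
                {"standard_name": "latitude", "units": "degrees_north"}),
    },
    coords={
        "trajectory": (("trajectory",), traj_names,
                       {"cf_role": "trajectory_id"}),
    },
    attrs={"featureType": "trajectory", "Conventions": "CF-1.11"},
)

# The counts must add up to the length of the obs dimension.
total_obs = int(ds["rowSize"].sum())
```

With the real data, the counts would be `rowSize1` and `rowSize2` (3810 and 2972), which sum to the 6782 observations of the `obs` dimension.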

ds.info()

xarray.Dataset {
dimensions:
	obs = 6782 ;
	trajectory = 2 ;

variables:
	float64 temperature(obs) ;
		temperature:long_name = Temperature ;
		temperature:units = degree_C ;
		temperature:coordinates = time lon lat ;
	<U24 trajectory(trajectory) ;
		trajectory:cf_role = trajectory_id ;
	int64 rowSize(trajectory) ;
		rowSize:long_name = number of obs for this trajectory ;
		rowSize:sample_dimension = obs ;
	int64 time(obs) ;
		time:standard_name = time ;
		time:units = seconds since 1970-01-01 00:00:00 ;
	float64 lon(obs) ;
		lon:standard_name = longitude ;
		lon:units = degrees_east ;
	float64 lat(obs) ;
		lat:standard_name = latitude ;
		lat:units = degrees_north ;

// global attributes:
	:featureType = trajectory ;
	:Conventions = CF-1.11 ;
}
