Ocean Science Minor Cruise data¶
Jody Klymak jklymak@uvic.ca
This directory contains data from the various Ocean Science Minor cruises since they started in 2007. CTD data from every Eos 314 cruise are represented here. From 2007-2019, most Eos/Bio 311 and Eos 312 CTD data sets are here as well.
The bulk of the data management has been devoted to CTD data - there is sporadic data from other sources.
20XXcruises directories¶
These contain the data collected for that year. Inside each of these directories there are typically one or more cruises. E.g. 2007cruises/200701, 2007cruises/200705, and 2007cruises/200707 each represent a cruise, with the cruise name being yyyymm, y being the year and m being the month of the year.
In 2022 this naming convention changed a bit: 2022cruises/20220924, with the last two digits being the day of the month.
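The two naming conventions can be told apart by length. The sketch below is not part of the data set's tooling; `parse_cruise_name` is a hypothetical helper, and it assumes that the trailing letter seen on some names in the CtdCruiseGrids listing further down (e.g. 201709b) is just a suffix that distinguishes cruises in the same month.

```python
import re


def parse_cruise_name(name):
    """Split a cruise name into (year, month, day, suffix).

    Accepts 'yyyymm' (e.g. '200705'), 'yyyymm' plus one letter
    (e.g. '201709b') and the 2022-onwards 'yyyymmdd' (e.g. '20220924').
    day is None for the six-digit names; suffix is '' if absent.
    """
    match = re.fullmatch(r"(\d{4})(\d{2})(\d{2})?([a-z]?)", name)
    if match is None:
        raise ValueError(f"unrecognised cruise name: {name!r}")
    year, month, day, suffix = match.groups()
    return int(year), int(month), (int(day) if day else None), suffix


print(parse_cruise_name("200705"))
print(parse_cruise_name("20220924"))
```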
Inside each cruise directory is a ctd directory that typically looks like:
!ls 2007cruises/200705/ctd/
20070523_s1.csv 20070523_s2.csv 20070524_S3.csv 20070524_s8.csv 20070523_s1.hex 20070523_s2.hex 20070524_S3.hex 20070524_s8.hex 20070523_s1.mat 20070523_s2.mat 20070524_S3.mat 20070524_s8.mat 20070523_s12.csv 20070523_s2_1m.csv 20070524_S3_1m.csv 20070524_s8_1m.csv 20070523_s12.hex 20070523_s4.csv 20070524_s5.csv CtdGrid.mat 20070523_s12.mat 20070523_s4.hex 20070524_s5.hex CtdGrid.nc 20070523_s12_1m.csv 20070523_s4.mat 20070524_s5.mat 20070523_s1_1m.csv 20070523_s4_1m.csv 20070524_s5_1m.csv
So 20070523_s12.mat was collected on 2007-05-23 at station id S12.
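If you need to do this for many files, the date and station can be pulled out of the name. This is only a sketch: `parse_ctd_filename` is a hypothetical helper, it assumes every file follows the yyyymmdd_station pattern shown in the listing, and it upper-cases the station because the listing mixes s1 and S3.

```python
from datetime import date
from pathlib import Path


def parse_ctd_filename(filename):
    """Return (collection date, station id) for a name like '20070523_s12.mat'."""
    parts = Path(filename).stem.split("_")
    datestr, station = parts[0], parts[1]
    # Any further '_'-separated parts (such as the '1m' in the listing) are ignored.
    collected = date(int(datestr[:4]), int(datestr[4:6]), int(datestr[6:8]))
    return collected, station.upper()


print(parse_ctd_filename("2007cruises/200705/ctd/20070523_s12.mat"))
```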
This data is "raw", in that it has not been binned vertically. 1-m vertical bins are very useful, and are provided in CtdGrid.mat and CtdGrid.nc. The .mat file is a Matlab file, and the .nc file is a netCDF file.
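To illustrate what 1-m binning means, here is a minimal sketch using numpy. It is not the processing used to make CtdGrid; it assumes that each bin is simply the mean of the raw samples that fall inside it, and `bin_profile` and the sample values are made up for the example. The bin centres come out at 0.5, 1.5, ..., matching the depths coordinate of the gridded files.

```python
import numpy as np


def bin_profile(depth, values, bin_size=1.0):
    """Average raw samples into vertical bins; empty bins are NaN."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    nbins = int(np.floor(np.nanmax(depth) / bin_size)) + 1
    edges = np.arange(nbins + 1) * bin_size
    centres = edges[:-1] + bin_size / 2
    # Bin number of every sample: 0 for [0, 1), 1 for [1, 2), ...
    index = np.digitize(depth, edges) - 1
    binned = np.full(nbins, np.nan)
    for i in range(nbins):
        in_bin = (index == i) & np.isfinite(values)
        if in_bin.any():
            binned[i] = values[in_bin].mean()
    return centres, binned


centres, binned = bin_profile([0.2, 0.7, 1.1, 3.4], [10.0, 12.0, 9.0, 8.0])
print(centres)
print(binned)
```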
Note that the cruise directories are not reprocessed, so the data in the files may vary from cruise to cruise as the processing changed. If you are going to use gridded data, it is recommended you do so as described next.
CtdCruiseGrids directory¶
This simply contains all the 1-m cruise grids, named by the cruise name. These grid files should all have the same variables, and be in the same units (if they are not, let Jody know: jklymak@uvic.ca).
!ls CtdCruiseGrids/
200701.mat 201001.nc 201207.mat 201610.nc 202009.mat 200701.nc 201005.mat 201207.nc 201709.mat 202009.nc 200705.mat 201005.nc 201301a.mat 201709.nc 202110.mat 200705.nc 201006.mat 201301a.nc 201709b.mat 202110.nc 200707.mat 201006.nc 201301b.mat 201709b.nc 20220924.mat 200707.nc 201007.mat 201301b.nc 201710.mat 20220924.nc 200801.mat 201007.nc 201406.mat 201710.nc 20220928.mat 200801.nc 201101.mat 201406.nc 201809a.mat 20220928.nc 200805.mat 201101.nc 201407.mat 201809a.nc 20230923.mat 200805.nc 201105.mat 201407.nc 201809b.mat 20230923.nc 200807.mat 201105.nc 201407b.mat 201809b.nc 20230927.mat 200807.nc 201106.mat 201407b.nc 201901.mat 20230927.nc 200807b.mat 201106.nc 201509a.mat 201901.nc 20240918.mat 200807b.nc 201107.mat 201509a.nc 201909a.mat 20240918.nc 200901.mat 201107.nc 201510.mat 201909a.nc 20240925.mat 200901.nc 201201.mat 201510.nc 201909b.mat 20240925.nc 200905.mat 201201.nc 201609a.mat 201909b.nc AllCruises.mat 200905.nc 201205.mat 201609a.nc 202001.mat AllCruises.nc 200907.mat 201205.nc 201609b.mat 202001.nc 200907.nc 201206.mat 201609b.nc 202003.mat 201001.mat 201206.nc 201610.mat 202003.nc
import xarray as xr
import numpy as np
with xr.open_dataset('CtdCruiseGrids/201709.nc') as ds:
    display(ds)
<xarray.Dataset>
Dimensions:  (depths: 324, time: 16)
Coordinates:
  * depths   (depths) float64 0.5 1.5 2.5 3.5 4.5 ... 320.5 321.5 322.5 323.5
  * time     (time) datetime64[ns] 2017-09-27T19:44:00 ... 2017-09-27T16:47:59
Data variables: (12/13)
    temp     (depths, time) float64 ...
    cond     (depths, time) float64 ...
    sal      (depths, time) float64 ...
    pden     (depths, time) float64 ...
    O2       (depths, time) float64 ...
    O2sat    (depths, time) float64 ...
    ...       ...
    Par      (depths, time) float64 ...
    lat      (time) float64 ...
    lon      (time) float64 ...
    id       (time) object ...
    alongx   (time) float64 ...
    serial   (time) object ...
The above shows what is inside a typical cruise grid; this uses Python, but the Matlab structures will be very similar. The grid has 324 depths in 1-m vertical bins, and this cruise had 16 stations, and hence 16 times. Variables like temp, sal, O2, etc. are then gridded onto this grid, and NaN is placed where there is no data.
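Because the grid is padded with NaN below each cast, it helps to use NaN-aware operations. The sketch below uses a small synthetic dataset (made-up values, same dimension names as the cruise grids) rather than a real file, to show how one might count the valid bins, find the deepest valid bin, and take a depth mean for each cast.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a cruise grid: 4 depth bins, 2 casts.
temp = np.array([[10.0, 11.0],
                 [9.0, 10.5],
                 [8.5, np.nan],
                 [np.nan, np.nan]])
time = np.array(["2017-09-27T16:00", "2017-09-27T17:00"], dtype="datetime64[ns]")
ds = xr.Dataset(
    {"temp": (("depths", "time"), temp)},
    coords={"depths": [0.5, 1.5, 2.5, 3.5], "time": time},
)

# Number of bins with data in each cast.
n_valid = ds.temp.notnull().sum("depths")
# Deepest bin that has data in each cast.
max_depth = ds.depths.where(ds.temp.notnull()).max("depths")
# xarray reductions skip NaN by default for float data.
mean_temp = ds.temp.mean("depths")

print(n_valid.values, max_depth.values, mean_temp.values)
```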
In 2022 the CTD was changed from a Seabird CTD to a pair of RBR CTDs, so the files are slightly different:
with xr.open_dataset('CtdCruiseGrids/20220924.nc') as ds:
    display(ds)
<xarray.Dataset>
Dimensions:  (depths: 324, time: 12)
Coordinates:
  * depths   (depths) float64 0.5 1.5 2.5 3.5 4.5 ... 320.5 321.5 322.5 323.5
    cast     (time) int64 ...
  * time     (time) datetime64[ns] 2022-09-24T23:46:00.875000064 ... 2022-09-...
Data variables: (12/14)
    pres     (depths, time) float64 ...
    temp     (depths, time) float64 ...
    cond     (depths, time) float64 ...
    Flu      (depths, time) float64 ...
    O2sat    (depths, time) float64 ...
    sal      (depths, time) float64 ...
    ...       ...
    id       (time) object ...
    lon      (time) float64 ...
    lat      (time) float64 ...
    alongx   (time) float64 ...
    acrossx  (time) float64 ...
    O2       (depths, time) float64 ...
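If you are writing code that should work on both the older and the newer files, it can be worth checking which variables two grids share. A minimal sketch, using tiny made-up datasets in place of the real files; `compare_variables` and `tiny_grid` are hypothetical helpers, and the variable lists are only a subset of what the real grids contain.

```python
import numpy as np
import xarray as xr


def compare_variables(ds_a, ds_b):
    """Return (shared, only in ds_a, only in ds_b) data variable names."""
    a, b = set(ds_a.data_vars), set(ds_b.data_vars)
    return sorted(a & b), sorted(a - b), sorted(b - a)


def tiny_grid(names):
    """Build a two-bin dataset with the given variable names (illustration only)."""
    return xr.Dataset(
        {name: (("depths",), np.zeros(2)) for name in names},
        coords={"depths": [0.5, 1.5]},
    )


seabird_like = tiny_grid(["temp", "sal", "pden", "Par"])
rbr_like = tiny_grid(["temp", "sal", "pres", "Flu"])
shared, only_a, only_b = compare_variables(seabird_like, rbr_like)
print(shared, only_a, only_b)
```

With the real files you would pass the two datasets returned by xr.open_dataset instead of the synthetic ones.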
CtdStationGrids¶
Similarly, it is often useful to look at time series of stations. These are stored in CtdStationGrids/:
!ls CtdStationGrids/
A1.mat H1.mat J35.mat J6.mat S12.mat S25.mat S45.mat S8.mat A1.nc H1.nc J35.nc J6.nc S12.nc S25.nc S45.nc S8.nc A15.mat H2.mat J4.mat PB2.mat S1225.mat S3.mat S475.mat S9.mat A15.nc H2.nc J4.nc PB2.nc S1225.nc S3.nc S475.nc S9.nc A2.mat H3.mat J45.mat PB4.mat S125.mat S35.mat S5.mat test.mat A2.nc H3.nc J45.nc PB4.nc S125.nc S35.nc S5.nc test.nc A3.mat J1.mat J48.mat S1.mat S15.mat S4.mat S55.mat A3.nc J1.nc J48.nc S1.nc S15.nc S4.nc S55.nc A4.mat J2.mat J5.mat S10.mat S16.mat S425.mat S6.mat A4.nc J2.nc J5.nc S10.nc S16.nc S425.nc S6.nc A5.mat J3.mat J55.mat S11.mat S2.mat S425W.mat S7.mat A5.nc J3.nc J55.nc S11.nc S2.nc S425W.nc S7.nc
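As an example of the kind of time series this makes easy, the sketch below pulls out temperature at a fixed depth. The contents of the station files are not shown here, so this assumes they have the same depths and time dimensions as the cruise grids, and it uses a small synthetic dataset with made-up values instead of a real file.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a station grid: 5 depth bins, 3 visits.
times = np.array(["2007-01-25", "2007-05-23", "2007-07-10"], dtype="datetime64[ns]")
depths = np.arange(0.5, 5.0, 1.0)  # 0.5, 1.5, ..., 4.5
temp = np.arange(15.0).reshape(len(depths), len(times))
station = xr.Dataset(
    {"temp": (("depths", "time"), temp)},
    coords={"depths": depths, "time": times},
)

# Sort by time first, then take the bin nearest to 2.3 m (the 2.5 m bin).
series = station.sortby("time").temp.sel(depths=2.3, method="nearest")
print(series.values)
```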
All the data¶
It is also useful to just have all the data in a single file. This can be found in CtdCruiseGrids/AllCruises.nc and CtdCruiseGrids/AllCruises.mat.
with xr.open_dataset('CtdCruiseGrids/AllCruises.nc') as ds:
    display(ds)
<xarray.Dataset>
Dimensions:      (depths: 335, time: 486)
Coordinates:
  * depths       (depths) float64 0.5 1.5 2.5 3.5 ... 331.5 332.5 333.5 334.5
  * time         (time) datetime64[ns] 2007-01-22T11:53:49 ... 2024-09-25T22:...
    cast         (time) int64 ...
Data variables: (12/18)
    temp         (depths, time) float64 ...
    cond         (depths, time) float64 ...
    sal          (depths, time) float64 ...
    pden         (depths, time) float64 ...
    O2           (depths, time) float64 ...
    O2sat        (depths, time) float64 ...
    ...           ...
    cruise       (time) object ...
    alongx       (time) float64 ...
    cond0        (depths, time) float64 ...
    pres         (depths, time) float64 ...
    acrossx      (time) float64 ...
    water_depth  (time) float64 ...
To help narrow down to the cruise that you may be interested in, this data set has a "cruise" variable that can be searched on. In Python, for instance, you could get all the data from the 20220924 cruise using where:
with xr.open_dataset('CtdCruiseGrids/AllCruises.nc') as ds:
    ds = ds.where(ds.cruise == '20220924', drop=True)
    display(ds)
<xarray.Dataset>
Dimensions:      (depths: 335, time: 12)
Coordinates:
  * depths       (depths) float64 0.5 1.5 2.5 3.5 ... 331.5 332.5 333.5 334.5
  * time         (time) datetime64[ns] 2022-09-24T23:46:00.875000064 ... 2022...
    cast         (time) int64 11 10 9 8 7 6 5 4 3 2 1 0
Data variables: (12/18)
    temp         (depths, time) float64 15.17 15.0 14.64 14.35 ... nan nan nan
    cond         (depths, time) float64 35.36 35.26 34.9 34.66 ... nan nan nan
    sal          (depths, time) float64 28.09 28.12 28.07 28.05 ... nan nan nan
    pden         (depths, time) float64 1.021e+03 1.021e+03 ... nan nan
    O2           (depths, time) float64 325.1 285.2 302.5 285.7 ... nan nan nan
    O2sat        (depths, time) float64 123.1 107.7 113.3 106.3 ... nan nan nan
    ...           ...
    cruise       (time) object '20220924' '20220924' ... '20220924' '20220924'
    alongx       (time) float64 -0.438 1.842 3.474 5.485 ... 21.73 26.31 29.02
    cond0        (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    pres         (depths, time) float64 0.9396 0.8128 0.8714 ... nan nan nan
    acrossx      (time) float64 0.1403 0.294 0.5684 ... 0.05532 0.09388 0.08632
    water_depth  (time) float64 nan nan nan nan nan nan nan nan nan nan nan nan
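An alternative to where is to build a boolean mask and index with isel. This selects casts rather than masking values, so no NaN filling is involved. The sketch below shows the idea on a small synthetic dataset with made-up values; with the real file you would build the mask from ds.cruise in the same way.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for AllCruises: 2 depth bins, 4 casts from 3 cruises.
time = np.array(
    ["2007-05-23", "2022-09-24T20:00", "2022-09-24T23:00", "2024-09-18"],
    dtype="datetime64[ns]",
)
cruise = np.array(["200705", "20220924", "20220924", "20240918"], dtype=object)
ds = xr.Dataset(
    {
        "temp": (("depths", "time"), np.arange(8.0).reshape(2, 4)),
        "cruise": (("time",), cruise),
    },
    coords={"depths": [0.5, 1.5], "time": time},
)

# Boolean mask along time, then positional selection with that mask.
mask = (ds.cruise == "20220924").values
subset = ds.isel(time=mask)
print(subset.temp.values)
```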
Similarly, if you wanted a single station of data, you can search on id:
with xr.open_dataset('CtdCruiseGrids/AllCruises.nc') as ds:
    ds = ds.where(ds.id == 'S4', drop=True)
    display(ds)
<xarray.Dataset>
Dimensions:      (depths: 335, time: 42)
Coordinates:
  * depths       (depths) float64 0.5 1.5 2.5 3.5 ... 331.5 332.5 333.5 334.5
  * time         (time) datetime64[ns] 2007-01-25T10:24:49 ... 2024-09-25T22:...
    cast         (time) int64 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 11 11 11 10 12 13
Data variables: (12/18)
    temp         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    cond         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    sal          (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    pden         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    O2           (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    O2sat        (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    ...           ...
    cruise       (time) object 200701 200705 200707 ... 20240918 20240925
    alongx       (time) float64 nan nan nan nan ... -0.5341 -0.5341 -0.45 -0.474
    cond0        (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    pres         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    acrossx      (time) float64 nan nan nan nan ... 0.1401 0.1401 0.1548 0.1408
    water_depth  (time) float64 nan nan nan nan nan ... nan nan nan 200.0 200.0
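However, be aware that some of the station ids are written inconsistently. For instance "A1" and "H1" are almost the same station, and the two ids are sometimes used interchangeably, or even written as "H1/A1". In that case it may be a good idea to use the alongx variable instead. The example here keeps every cast whose alongx is within 1 of the first cast labelled A1: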
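with xr.open_dataset('CtdCruiseGrids/AllCruises.nc') as ds:
    # get the H1 along-x position, using the first cast labelled 'A1'
    # (A1 and H1 are almost the same station):
    alongxH1 = ds.where(ds.id == 'A1', drop=True).alongx[0]
    # keep only casts within +/- 1 of that position:
    ds = ds.where((ds.alongx > alongxH1 - 1), drop=True)
    ds = ds.where((ds.alongx < alongxH1 + 1), drop=True)
    display(ds)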
<xarray.Dataset>
Dimensions:      (depths: 335, time: 22)
Coordinates:
  * depths       (depths) float64 0.5 1.5 2.5 3.5 ... 331.5 332.5 333.5 334.5
  * time         (time) datetime64[ns] 2014-07-08T15:58:04 ... 2024-09-25T16:...
    cast         (time) int64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Data variables: (12/18)
    temp         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    cond         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    sal          (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    pden         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    O2           (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    O2sat        (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    ...           ...
    cruise       (time) object 201407 201509a 201509a ... 20230927 20240925
    alongx       (time) float64 29.13 29.95 28.13 29.15 ... 29.16 29.16 29.44
    cond0        (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    pres         (depths, time) float64 nan nan nan nan nan ... nan nan nan nan
    acrossx      (time) float64 nan nan nan nan ... 0.3589 0.9371 0.9371 0.1794
    water_depth  (time) float64 nan nan nan nan nan ... nan nan nan nan 312.0
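Another option, if you already know which spellings refer to the same site, is to match on a list of ids with isin. The sketch below uses a small synthetic dataset with made-up values; the list of spellings is only illustrative, and whether those casts should really be pooled is a judgment call that is worth checking against alongx.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for AllCruises: 5 casts with inconsistent station ids.
ids = np.array(["A1", "S4", "H1", "H1/A1", "S12"], dtype=object)
ds = xr.Dataset(
    {
        "id": (("time",), ids),
        "alongx": (("time",), np.array([29.1, -0.5, 29.2, 29.4, 12.0])),
    },
    coords={"time": np.arange(5)},
)

# Spellings assumed (for this example) to mean the same site.
same_site = ds.id.isin(["A1", "H1", "H1/A1"])
subset = ds.isel(time=same_site.values)
print(subset.id.values, subset.alongx.values)
```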