Introduction
The Slocum Coastal Electric Glider is an Autonomous Underwater Vehicle (AUV) that propels itself through the water column by adjusting its volume-to-weight ratio (i.e., its buoyancy). The glider is powered by alkaline battery packs and is nominally equipped with a SeaBird 41CP CTD sensor for recording Conductivity, Temperature and Depth (pressure) at 0.5 Hz. The glider may also be outfitted with a suite of additional instrumentation for measuring in situ water properties.
Terminology
The following terms are defined to make the discussion below clearer:
Slocum glider deployments are organized into missions, which themselves contain individual segments. The raw data files are named using the following two conventions:
On the glider, data files are named according to an 8.3 naming convention in which the first 4 digits correspond to the sequential mission number and the last 4 digits correspond to the sequential segment number within the current mission (e.g. 00830001 is segment 1 of mission 83).
The same data files, once transferred to shore, are typically renamed by the dockserver application according to the following convention:
gliderName_yyyy_ddd_mmm_sss.*bd
where:
gliderName is the name of the glider,
yyyy is the current year,
ddd is the (0-based) day of the year,
mmm is the (0-based) mission number for the current day of the year, and
sss is the (0-based) segment number of the current mission.
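Because the same segment carries two names (the 8.3 name on the glider and the dockserver name on shore), it can help to parse both into their components. The Python sketch below does this with regular expressions. It is illustrative only (the lab's own tools are written in Matlab), and it assumes that glider names contain no underscores and that anything after the segment number, when present, is the filetype extension (as in ru22_2008_260_6_1_sf_dbd). The function and type names are hypothetical.

```python
import re
from typing import NamedTuple


class DockserverName(NamedTuple):
    glider: str
    year: int
    day_of_year: int  # 0-based day of the year
    mission: int      # 0-based mission number for that day
    segment: int      # 0-based segment number within the mission
    ext: str          # e.g. "sbd" or "sf_dbd"; "" if absent


class OnboardName(NamedTuple):
    mission: int  # first 4 digits: sequential mission number
    segment: int  # last 4 digits: sequential segment number
    ext: str      # 3-character extension; "" if absent


# Assumption: glider names contain no underscores (true for "ru22").
_DOCKSERVER_RE = re.compile(
    r"^(?P<glider>[^_]+)_(?P<year>\d{4})_(?P<day>\d{1,3})_"
    r"(?P<mission>\d+)_(?P<segment>\d+)(?:[_.](?P<ext>\w*))?$"
)
_ONBOARD_RE = re.compile(r"^(?P<mission>\d{4})(?P<segment>\d{4})(?:\.(?P<ext>\w{3}))?$")


def parse_dockserver_name(name: str) -> DockserverName:
    """Split a dockserver-style name into its components."""
    m = _DOCKSERVER_RE.match(name)
    if m is None:
        raise ValueError(f"not a dockserver-style name: {name!r}")
    return DockserverName(m["glider"], int(m["year"]), int(m["day"]),
                          int(m["mission"]), int(m["segment"]), m["ext"] or "")


def parse_onboard_name(name: str) -> OnboardName:
    """Split an 8.3 onboard name such as 00830001.sbd."""
    m = _ONBOARD_RE.match(name)
    if m is None:
        raise ValueError(f"not an 8.3 onboard name: {name!r}")
    return OnboardName(int(m["mission"]), int(m["segment"]), m["ext"] or "")
```

For example, the lvl0 source listing pairs ru22_2008_260_6_1_ with 00830001; these functions read the first as segment 1 of mission 6 on (0-based) day 260 of 2008, and the second as segment 1 of mission 83.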
Native Data Filetypes
The glider stores all measured parameters (sensors) in three native binary filetypes:
dbd: Contains all of the sensors the glider logs during a deployment. These files are typically very large and are not transmitted to shore over the Iridium satellite phone link.
mbd: Contains a subset of the dbd data, typically used to store all of the engineering sensor data needed should an investigation of the glider's behavior be warranted. These files are (typically) much smaller than dbd files and can be transmitted over the Iridium satellite phone link.
The contents of this data file (sensors logged and logging frequency) are configured using the mbdlist.dat file, stored in the CONFIG directory onboard the glider.
sbd: Contains a (typically) smaller subset of the dbd data. This is the primary filetype used for logging and transmitting real-time, in situ data back to shore. These files (typically) contain position, depth-averaged current and science data streams, as well as decimated flight performance data (e.g., pitch and roll).
The contents of this data file (sensors logged and logging frequency) are configured using the sbdlist.dat file, stored in the CONFIG directory onboard the glider.
All three filetypes store data in binary format, which must be converted to ASCII using Linux or Windows shore-side executables. Technical documentation and further information on the binary format, as well as utilities for converting it to ASCII data, can be found here.
Shore-Side Data Processing
Once transmitted to shore, the raw Slocum binary data files are converted to ASCII and stored on our local fileserver, where they pass through several processing levels before being made publicly available. The goals of organizing the data processing into a series of processing levels are:
Combine data files with (possibly) different sets of sensor data into one uniform, easy-to-access data structure.
Convert the values measured by the science payload sensors, many of which are reported in engineering ("uncalibrated") units, and use them to calculate other ocean properties of interest.
We recognize the need for data quality assurance and control (QA/QC). To this end, we strive to implement a series of checks so that users can be confident, to the best of our ability, in the quality of the data the gliders provide.
Filetypes and APIs
A full description of the Slocum glider native raw data filetypes and their structure is located here and is not the subject of this section. Instead, we aim to describe the various stages of data processing used at the Coastal Ocean Observation Laboratory. The majority of data processing is performed using the Matlab programming language, so many of the intermediate processing level data structures are Matlab native (.mat) files. However, as of this writing (March 2009), we have also chosen to store the data in the NetCDF file format to allow machine- and platform-independent data access through a number of APIs including, but not limited to:
Processing Levels
The following is an in-depth explanation of the various levels of processed Slocum glider data created and used at RU-COOL. Source code and its documentation are available here.
The first step in processing is to combine the raw .dbd, .sbd, and/or .mbd data files to create a consistent data structure. This structure provides access to the dataset using a notation that is similar to that of the raw data files.
As discussed elsewhere, a key feature of the Slocum glider is configurability. The user may specify which sensors are logged and how often they are logged to the raw data files. Because data density requirements, as well as the state of the glider's payload, can vary significantly within a deployment, data files from the same deployment often contain different collections of sensor data. Hence the need for a structured baseline dataset that allows easy access to all segment data.
Level 0 data structures meet the following requirements:
The following is an example of a lvl0 data structure:
lvl0 =
    c_fin: 1
    c_heading: 2
    c_pitch: 3
    c_wpt_lat: 4
    c_wpt_lon: 5
    m_depth: 6
    m_fin: 7
    m_final_water_vx: 8
    m_final_water_vy: 9
    m_gps_lat: 10
    m_gps_lon: 11
    m_heading: 12
    m_pitch: 13
    m_present_time: 14
    m_pressure: 15
    m_roll: 16
    m_water_cond: 17
    m_water_depth: 18
    m_water_pressure: 19
    m_water_temp: 20
    sci_ctd41cp_timestamp: 21
    sci_m_present_time: 22
    sci_water_cond: 23
    sci_water_pressure: 24
    sci_water_temp: 25
    x_software_ver: 26
    data: [1256166x26 double]
    source: {582x5 cell}
    meta: [1x1 struct]
The source field is a cell array containing, for each segment, the segment name (without the filetype extension), the first and last row indices of the segment's records in the data matrix, the 8.3 filename as stored on the glider, the filetype extension (sbd, mbd, dbd, sf_dbd), and the creation time of the file.
lvl0.source =
    'ru22_2008_260_5_0_'    [   1]    [ 572]    '00820000'    'sf_dbd'    '17-Sep-2008 16:46:40'
    'ru22_2008_260_6_0_'    [ 573]    [2441]    '00830000'    'sf_dbd'    '17-Sep-2008 16:46:40'
    'ru22_2008_260_6_1_'    [2442]    [2453]    '00830001'    'sf_dbd'    '17-Sep-2008 16:46:40'
    'ru22_2008_260_6_2_'    [2454]    [4095]    '00830002'    'sf_dbd'    '17-Sep-2008 16:46:40'
    'ru22_2008_260_6_3_'    [4096]    [4107]    '00830003'    'sf_dbd'    '17-Sep-2008 19:33:20'
    'ru22_2008_260_6_4_'    [4108]    [5689]    '00830004'    'sf_dbd'    '17-Sep-2008 19:33:20'
    'ru22_2008_260_6_5_'    [5690]    [5700]    '00830005'    'sf_dbd'    '17-Sep-2008 19:33:20'
    'ru22_2008_260_6_6_'    [5701]    [7324]    '00830006'    'sf_dbd'    '17-Sep-2008 19:33:20'
    'ru22_2008_260_6_7_'    [7325]    [7335]    '00830007'    'sf_dbd'    '17-Sep-2008 19:33:20'
    'ru22_2008_260_6_8_'    [7336]    [8957]    '00830008'    'sf_dbd'    '17-Sep-2008 19:33:20'
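The combination step that produces data and source can be sketched as follows. This is a hypothetical Python illustration, not the RU-COOL Matlab code: each segment contributes its own sensor list, the combined column set is the sorted union of all sensor names (an assumption; the lvl0 display happens to be alphabetical), sensors a segment did not log are filled with NaN, and the first and last row of every segment are recorded as 1-based, inclusive indices, as in the source cell array.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (segment name, sensor names in that segment's column order, rows of values)
Segment = Tuple[str, Sequence[str], Sequence[Sequence[float]]]


def combine_segments(segments: Sequence[Segment]):
    """Stack segments with differing sensor sets into one NaN-padded matrix.

    Returns (indices, data, source): 1-based column indices keyed by sensor
    name, the combined rows, and (name, first_row, last_row) per segment.
    """
    # Union of all sensor names, sorted alphabetically (an assumption).
    sensors = sorted({s for _, names, _ in segments for s in names})
    col = {s: i for i, s in enumerate(sensors)}  # 0-based, internal use only
    data: List[List[float]] = []
    source: List[Tuple[str, int, int]] = []
    for name, names, rows in segments:
        first = len(data) + 1  # 1-based, inclusive, as in lvl0.source
        for row in rows:
            full = [math.nan] * len(sensors)  # sensors not logged stay NaN
            for sensor, value in zip(names, row):
                full[col[sensor]] = value
            data.append(full)
        source.append((name, first, len(data)))
    indices: Dict[str, int] = {s: i + 1 for i, s in enumerate(sensors)}
    return indices, data, source
```

Keeping the indices 1-based mirrors the Matlab structure, so a value such as lvl0.m_present_time can be used unchanged once 1 is subtracted at access time.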
The meta field contains a structure that holds the deployment metadata:
lvl0.meta =
    glider: 'ru22'
    deployDate: 733668.625
    recoverDate: 733703.885416667
    projectDate: 'marcoos'
    sensorPackages: [1x1 struct]
    comments: {5x1 cell}
    deploymentStatus: 'archive'
    recovered: 1
    fileType: 'lvl0'
    fileName: '/home/coolgroup/auvs/archive/080917-081022-ru22/ru22_20080917T1500_marcoos_arc_lvl0.mat'
    sensorUnits: [1x1 struct]
If the RECOVER_DATE field (contained in the glider's deployment configuration file) is empty, recoverDate is left empty and recovered is set to 0, signifying that the deployment is ongoing. Otherwise, recoverDate contains a Matlab datenum and recovered is set to 1, signifying that the deployment is over and the glider has been either recovered successfully or lost.
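deployDate and recoverDate are Matlab datenums, i.e. days counted from the year 0. A minimal Python sketch for converting them, together with the recovered flag rule described above, follows. The 366-day offset accounts for Python's ordinal calendar starting at 0001-01-01 rather than at year 0; the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Matlab datenum 1 is 0000-01-01 and Python ordinal 1 is 0001-01-01.
# Year 0 is a leap year in Matlab's proleptic calendar, hence 366 days.
_MATLAB_EPOCH_OFFSET = 366


def datenum_to_datetime(dn: float) -> datetime:
    """Convert a (positive) Matlab datenum to a naive datetime."""
    days = int(dn)    # whole days
    frac = dn - days  # fraction of a day
    return datetime.fromordinal(days - _MATLAB_EPOCH_OFFSET) + timedelta(days=frac)


def recovery_fields(recover_datenum: Optional[float]) -> Tuple[Optional[float], int]:
    """Return (recoverDate, recovered) following the rule described above."""
    if recover_datenum is None:
        return None, 0         # RECOVER_DATE empty: deployment is ongoing
    return recover_datenum, 1  # deployment over (recovered or lost)
```

Applied to the example metadata, deployDate 733668.625 converts to 2008-09-17 15:00 and recoverDate 733703.885416667 to 2008-10-22 21:15, consistent with the 080917-081022 archive directory and the 20080917T1500 file name.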
The sensorPackages field contains a structure describing each sensor package's type and serial number:
lvl0.meta.sensorPackages =
    TYPE: 'ctd'
    SN: '0121'
where TYPE is the sensor type and SN is the sensor serial number. This information is parsed from the glider's configuration file for all sensor packages described by the TYPE and SN configuration parameters.
sensorUnits is a structure containing the raw units of each included sensor, as defined in masterdata:
lvl0.meta.sensorUnits =
    c_fin: 'rad'
    c_heading: 'rad'
    c_pitch: 'rad'
    c_wpt_lat: 'lat'
    c_wpt_lon: 'lon'
    m_depth: 'm'
    m_fin: 'rad'
    m_final_water_vx: 'm/s'
    m_final_water_vy: 'm/s'
    m_gps_lat: 'lat'
    m_gps_lon: 'lon'
    m_heading: 'rad'
    m_pitch: 'rad'
    m_present_time: 'timestamp'
    m_pressure: 'bar'
    m_roll: 'rad'
    m_water_cond: 'S/m'
    m_water_depth: 'm'
    m_water_pressure: 'bar'
    m_water_temp: 'degC'
    sci_ctd41cp_timestamp: 'sec'
    sci_m_present_time: 'timestamp'
    sci_water_cond: 'S/m'
    sci_water_pressure: 'bar'
    sci_water_temp: 'degC'
    x_software_ver: 'nodim'
The units of each sensor are defined in masterdata. A structured array containing the sensor name as the field name and the units as the field value is used to assign the units. The default sensor units file is located here: ./auvs/auvMeta/sensorUnits.mat.
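Assigning units amounts to looking up each included sensor name in the master table. The Python sketch below uses a dictionary in place of sensorUnits.mat; the placeholder returned for sensors missing from the table is an assumption, since this page does not say how such sensors are handled.

```python
from typing import Dict, Iterable


def assign_units(sensors: Iterable[str], master_units: Dict[str, str],
                 missing: str = "unknown") -> Dict[str, str]:
    """Return a sensorUnits-style mapping for the sensors in one dataset.

    master_units stands in for the default sensorUnits.mat table; the
    'unknown' placeholder for sensors absent from it is an assumption.
    """
    return {name: master_units.get(name, missing) for name in sensors}
```

Only sensors present in the dataset end up in the result, matching the way lvl0.meta.sensorUnits lists just the included sensors.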
Background: Two Persistor CF1 chips are used to control the flight and scientific functions of the glider. The flight controller is responsible for glider flight and communications, and the science controller is responsible for the operation and logging of the science payload. Each chip contains its own clock, and it is critical that the two clocks are synchronized: this allows accurate downstream QA/QC of scientific datasets using glider attitude (e.g., pitch and roll) information.
While the clocks are synchronized every time the glider acquires a GPS position, it is still very likely that at least a portion of the data records will have misaligned flight controller and science controller timestamps. This is a result of the buffering between the flight controller and the science controller. More information and possible workarounds can be found here.
The procedure for merging flight controller and science controller timestamps is summarized as follows:
A level 1 data structure is built from a lvl0 data structure. The level 1 standard provides the user with a dataset in which all sensors are properly timestamped, regardless of whether the sensor was measured on the flight controller or the science controller.
Level 1 data structures meet the following requirements:
The following fields and corresponding data arrays have been added to a level 1 data structure:
A level 1 data structure is displayed below:
lvl1 =
    c_fin: 1
    c_heading: 2
    c_pitch: 3
    c_wpt_lat: 4
    c_wpt_lon: 5
    m_battery: 6
    m_battery_inst: 7
    m_fin: 8
    m_final_water_vx: 9
    m_final_water_vy: 10
    m_gps_lat: 11
    m_gps_lon: 12
    m_heading: 13
    m_pitch: 14
    m_present_time: 15
    m_pressure: 16
    m_roll: 17
    m_science_clothesline_lag: 18
    m_science_sent_some_data: 19
    m_vacuum: 20
    m_water_cond: 21
    m_water_depth: 22
    m_water_pressure: 23
    m_water_temp: 24
    sci_ctd41cp_timestamp: 25
    sci_m_present_time: 26
    sci_water_cond: 27
    sci_water_pressure: 28
    sci_water_temp: 29
    x_software_ver: 30
    alignMask: 31
    timestamp: 32
    pressure: 33
    depth: 33
    proIndex: 34
    proDir: 35
    latitude: 36
    longitude: 37
    trackDistance: 38
    data: [1256166x38 double]
    source: {582x8 cell}
    meta: [1x1 struct]
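The proIndex and proDir fields label individual profiles, but their exact definitions are not spelled out in this section. The hypothetical Python sketch below shows one way such labels could be derived from a depth series, assuming profiles are numbered from 1, direction is +1 while descending and -1 while ascending, and depth reversals smaller than a threshold (min_change, in meters, also an assumption) are ignored so that small wiggles do not split a profile.

```python
from typing import List, Sequence, Tuple


def index_profiles(depth: Sequence[float],
                   min_change: float = 1.0) -> Tuple[List[int], List[int]]:
    """Label each record with a profile number and direction.

    Assumed conventions: profiles are numbered from 1, direction is +1
    while descending (depth increasing) and -1 while ascending, and 0 if
    no movement larger than min_change (m) is ever seen. Depth reversals
    smaller than min_change are ignored. NaNs should be removed first.
    """
    n = len(depth)
    turns: List[int] = []  # indices of the extrema where direction flips
    direction = 0          # current leg: 0 unknown, +1 down, -1 up
    first_dir = 0
    ref = 0                # index of the running extreme of the current leg
    for i in range(1, n):
        change = depth[i] - depth[ref]
        if direction == 0:
            if abs(change) >= min_change:
                direction = first_dir = 1 if change > 0 else -1
                ref = i
        elif change * direction > 0:
            ref = i                # still moving the same way
        elif -change * direction >= min_change:
            turns.append(ref)      # reversal beyond the threshold
            direction = -direction
            ref = i
    pro_index: List[int] = []
    pro_dir: List[int] = []
    k = 0  # number of turning points strictly before record i
    for i in range(n):
        while k < len(turns) and turns[k] < i:
            k += 1
        pro_index.append(k + 1)
        pro_dir.append(first_dir * (-1) ** k)
    return pro_index, pro_dir
```

The hysteresis threshold is the main design choice: without it, sensor noise near an inflection would create many one-record "profiles".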
Two new elements, specifying which raw glider timestamp and pressure sensors were used to fill in the timestamp and pressure arrays for each segment, have been added to the source cell array:
lvl1.source =
  Columns 1 through 7
    'ru22_2008_260_5_0_sf_dbd'    [   1]    [ 572]    '00820000'    '17-Sep-2008 16:46:40'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_0_sf_dbd'    [ 573]    [2441]    '00830000'    '17-Sep-2008 16:46:40'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_1_sf_dbd'    [2442]    [2453]    '00830001'    '17-Sep-2008 16:46:40'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_2_sf_dbd'    [2454]    [4095]    '00830002'    '17-Sep-2008 16:46:40'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_3_sf_dbd'    [4096]    [4107]    '00830003'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_4_sf_dbd'    [4108]    [5689]    '00830004'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_5_sf_dbd'    [5690]    [5700]    '00830005'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_6_sf_dbd'    [5701]    [7324]    '00830006'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_7_sf_dbd'    [7325]    [7335]    '00830007'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
    'ru22_2008_260_6_8_sf_dbd'    [7336]    [8957]    '00830008'    '17-Sep-2008 19:33:20'    'sci_ctd41cp_timestamp'    'sci_water_pressure'
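The level 1 source cell array records, per segment, which raw sensor supplied the timestamp and pressure arrays. A hypothetical Python sketch of such a per-segment choice follows: take the first candidate sensor that has any finite value in the segment. The candidate orderings are assumptions (the example above only shows sci_ctd41cp_timestamp and sci_water_pressure being chosen), and the factor of 10 converts the raw bar values to the decibars listed for pressure in the level 1 sensorUnits.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical preference orders: science-controller CTD sensors first,
# then flight-controller equivalents. Not taken from the RU-COOL code.
TIMESTAMP_CANDIDATES = ["sci_ctd41cp_timestamp", "sci_m_present_time", "m_present_time"]
PRESSURE_CANDIDATES = ["sci_water_pressure", "m_water_pressure", "m_pressure"]


def pick_sensor(rows: Sequence[Sequence[float]], indices: Dict[str, int],
                candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate with at least one finite value in rows."""
    for name in candidates:
        col = indices.get(name)
        if col is None:
            continue
        if any(not math.isnan(row[col - 1]) for row in rows):  # 1-based index
            return name
    return None


def fill_segment(rows: Sequence[Sequence[float]], indices: Dict[str, int]
                 ) -> Tuple[Optional[str], Optional[str], List[float], List[float]]:
    """Build one segment's timestamp and pressure (decibars) arrays."""
    ts_name = pick_sensor(rows, indices, TIMESTAMP_CANDIDATES)
    pr_name = pick_sensor(rows, indices, PRESSURE_CANDIDATES)
    timestamps = [row[indices[ts_name] - 1] if ts_name else math.nan for row in rows]
    # Raw pressure sensors report bar; 1 bar = 10 dbar.
    pressure = [row[indices[pr_name] - 1] * 10.0 if pr_name else math.nan for row in rows]
    return ts_name, pr_name, timestamps, pressure
```

The two returned names are exactly what would go into the extra source columns, so a user can later tell which clock and pressure sensor each segment relied on.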
The fileType and fileName fields of the metadata structure have been updated:
lvl1.meta =
    glider: 'ru22'
    deployDate: 733668.625
    recoverDate: 733703.885416667
    projectDate: 'marcoos'
    sensorPackages: [1x1 struct]
    comments: {5x1 cell}
    deploymentStatus: 'archive'
    recovered: 1
    fileType: 'lvl1'
    fileName: '/home/coolgroup/auvs/archive/080917-081022-ru22/ru22_20080917T1500_marcoos_arc_lvl1.mat'
    sensorUnits: [1x1 struct]
Units for the newly added fields have been added to the sensorUnits field of the metadata structure:
lvl1.meta.sensorUnits =
    c_fin: 'rad'
    c_heading: 'rad'
    c_pitch: 'rad'
    c_wpt_lat: 'lat'
    c_wpt_lon: 'lon'
    m_battery: 'volts'
    m_battery_inst: 'volts'
    m_fin: 'rad'
    m_final_water_vx: 'm/s'
    m_final_water_vy: 'm/s'
    m_gps_lat: 'lat'
    m_gps_lon: 'lon'
    m_heading: 'rad'
    m_pitch: 'rad'
    m_present_time: 'timestamp'
    m_pressure: 'bar'
    m_roll: 'rad'
    m_science_clothesline_lag: 's'
    m_science_sent_some_data: 'nodim'
    m_vacuum: 'inHg'
    m_water_cond: 'S/m'
    m_water_depth: 'm'
    m_water_pressure: 'bar'
    m_water_temp: 'degC'
    sci_ctd41cp_timestamp: 'timestamp'
    sci_m_present_time: 'timestamp'
    sci_water_cond: 'S/m'
    sci_water_pressure: 'bar'
    sci_water_temp: 'degC'
    x_software_ver: 'nodim'
    alignMask: 'nodim'
    timestamp: 'seconds'
    pressure: 'decibars'
    depth: 'meters'
    proIndex: 'nodim'
    proDir: 'nodim'
    latitude: 'degrees_minutes'
    longitude: 'degrees_minutes'
    trackDistance: 'meters'
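The level 1 units imply two conversions that users frequently need. This page does not say how depth is computed, so the Python sketch below is an assumption rather than the RU-COOL method: it converts degrees_minutes values (read here as the DDMM.MMMM format used by the glider's GPS fields) to decimal degrees, and converts pressure in decibars to depth in meters with the UNESCO 1983 (Fofonoff and Millard) formula, one common choice for this conversion.

```python
import math


def ddmm_to_decimal(value: float) -> float:
    """Convert a DDMM.MMMM (degrees_minutes) coordinate to decimal degrees."""
    sign = -1.0 if value < 0 else 1.0
    value = abs(value)
    degrees = math.floor(value / 100.0)
    minutes = value - degrees * 100.0
    return sign * (degrees + minutes / 60.0)


def depth_from_pressure(p_dbar: float, lat_deg: float) -> float:
    """Depth (m) from pressure (dbar) using the UNESCO 1983 formula."""
    x = math.sin(math.radians(abs(lat_deg))) ** 2
    # Gravity varies with latitude and (slightly) with pressure.
    g = 9.780318 * (1.0 + (5.2788e-3 + 2.36e-5 * x) * x) + 1.092e-6 * p_dbar
    numerator = (((-1.82e-15 * p_dbar + 2.279e-10) * p_dbar - 2.2512e-5)
                 * p_dbar + 9.72659) * p_dbar
    return numerator / g
```

For coastal glider depths the result differs from the pressure in decibars by roughly 1%, which is why a latitude-dependent conversion is usually preferred over treating 1 dbar as exactly 1 m.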
The goal of creating level 0 and level 1 structures is to provide a consistent data structure for accessing glider deployment data. Retrieving records from both data structures is similar to accessing records from the raw (.dbd, .sbd, or .mbd) data files. The following is a brief description of native data file access, as well as methods and routines for accessing records from the dbd data structures created above.
Once the segment binary files have been converted to .dat and .m files, segment data can be loaded into the workspace by typing the name of the segment file on the command line:
>> ru22_2009_263_4_0_sbd
>> whos
  Name                            Size     Bytes  Class   Attributes

  c_alt_time                      1x1          8  double  global
  c_heading                       1x1          8  double  global
  data                            455x28  101920  double  global
  f_fin_offset                    1x1          8  double  global
  m_battpos                       1x1          8  double  global
  m_fin                           1x1          8  double  global
  m_final_water_vx                1x1          8  double  global
  m_final_water_vy                1x1          8  double  global
  m_gps_lat                       1x1          8  double  global
  m_gps_lon                       1x1          8  double  global
  m_heading                       1x1          8  double  global
  m_pitch                         1x1          8  double  global
  m_present_time                  1x1          8  double  global
  m_pressure                      1x1          8  double  global
  m_roll                          1x1          8  double  global
  m_tot_num_inflections           1x1          8  double  global
  m_water_depth                   1x1          8  double  global
  run_name                        1x31        62  char    global
  sci_ctd41cp_timestamp           1x1          8  double  global
  sci_water_cond                  1x1          8  double  global
  sci_water_pressure              1x1          8  double  global
  sci_water_temp                  1x1          8  double  global
  start                           1x1          8  double
  u_hd_fin_ap_gain                1x1          8  double  global
  u_hd_fin_ap_hardover_holdoff    1x1          8  double  global
  u_hd_fin_ap_igain               1x1          8  double  global
  u_hd_fin_ap_inflection_holdoff  1x1          8  double  global
  u_min_water_depth               1x1          8  double  global
  u_pitch_ap_deadband             1x1          8  double  global
  u_pitch_ap_gain                 1x1          8  double  global
  u_pitch_max_delta_battpos       1x1          8  double  global
The m-file loads the data contained in the corresponding segment .dat file into a variable called 'data'. It also declares a global variable named after each sensor, which holds that sensor's column index into data. Accessing the array of glider timestamps is therefore done as follows:
>> data(:,m_present_time)

ans =

   1253555069.26578
   1253555131.8306
   1253555185.3468
   1253555227.82458
   1253555234.59711
   1253555247.05344
   1253555254.84457
   1253555263.6499
   1253555271.70105
   1253555280.54303
   1253555287.96317
   1253555295.64926
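The same access pattern is easy to reproduce outside Matlab. This Python sketch assumes the .dat file is a whitespace-delimited numeric table (NaN entries parse with float()) and that the sensor names are supplied in column order, which in practice would come from the companion .m file. It keeps Matlab's 1-based column indices and converts them only at access time. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def load_segment(dat_text: str, sensor_names: Sequence[str]
                 ) -> Tuple[List[List[float]], Dict[str, int]]:
    """Parse segment .dat text into rows plus Matlab-style 1-based indices.

    Assumes whitespace-delimited numeric columns in the same order as
    sensor_names (which the companion .m file would provide).
    """
    rows = [[float(tok) for tok in line.split()]  # float("NaN") -> nan
            for line in dat_text.splitlines() if line.strip()]
    indices = {name: i + 1 for i, name in enumerate(sensor_names)}
    return rows, indices


def column(rows: Sequence[Sequence[float]], indices: Dict[str, int],
           name: str) -> List[float]:
    """Python equivalent of data(:, name): shift the 1-based index to 0-based."""
    j = indices[name] - 1
    return [row[j] for row in rows]
```

column(rows, indices, "m_present_time") then plays the role of data(:,m_present_time) in the session above.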
The Slocum dbd data structures are organized in much the same way as the raw .m and .dat files described above. The data field contains an MxN array of all segment data records, and every other field containing a numeric scalar is a column index into data. Accessing the array of glider timestamps is done as follows:
>> load ru22_20080917T1500_marcoos_arc_lvl0
>> whos
  Name      Size          Bytes  Class     Attributes

  lvl0      1x1       301957711  struct

>> lvl0

lvl0 =
    c_fin: 1
    c_heading: 2
    c_pitch: 3
    c_wpt_lat: 4
    c_wpt_lon: 5
    m_battery: 6
    m_battery_inst: 7
    m_fin: 8
    m_final_water_vx: 9
    m_final_water_vy: 10
    m_gps_lat: 11
    m_gps_lon: 12
    m_heading: 13
    m_pitch: 14
    m_present_time: 15
    m_pressure: 16
    m_roll: 17
    m_science_clothesline_lag: 18
    m_science_sent_some_data: 19
    m_vacuum: 20
    m_water_cond: 21
    m_water_depth: 22
    m_water_pressure: 23
    m_water_temp: 24
    sci_ctd41cp_timestamp: 25
    sci_m_present_time: 26
    sci_water_cond: 27
    sci_water_pressure: 28
    sci_water_temp: 29
    x_software_ver: 30
    data: [1256166x30 double]
    source: {582x6 cell}
    meta: [1x1 struct]

>> lvl0.data([1:12],lvl0.m_present_time)

ans =

   1221666925.40518
   1221667032.09711
   1221667065.35141
   1221667069.73843
   1221667074.89362
   1221667080.02219
   1221667085.69919
   1221667095.17737
   1221667100.33185
   1221667105.92688
   1221667112.04996
   1221667122.78406
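Because source stores the first and last row of each segment, one segment's records can be pulled out of data directly. The Python sketch below mirrors lvl0 as a hypothetical dict (for example, after reading the .mat file with scipy.io.loadmat, which does not handle v7.3 files) and treats m_present_time as Unix time, i.e. seconds since 1970-01-01 UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Sequence


def segment_rows(lvl0: Dict[str, Any], segment_name: str) -> List[Sequence[float]]:
    """Return one segment's rows using the first/last rows stored in source."""
    for entry in lvl0["source"]:
        name, first, last = entry[0], entry[1], entry[2]
        if name == segment_name:
            return lvl0["data"][first - 1:last]  # 1-based inclusive -> slice
    raise KeyError(segment_name)


def to_utc(timestamps: Sequence[float]) -> List[datetime]:
    """Convert Unix times (seconds since 1970-01-01 UTC) to aware datetimes."""
    return [datetime.fromtimestamp(t, tz=timezone.utc) for t in timestamps]
```

The first timestamp in the lvl0 example above, 1221666925.40518, converts to 2008-09-17 15:55:25 UTC, shortly after the 15:00 deployDate.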
Once the raw dbd files are processed to the level 1 standard, derived parameters are calculated and added to the level 1 data structure for all instruments listed in the lvl1.meta.sensorPackages structure. Derived parameters are named using a CF Standard Name where one has been defined. If a CF standard name exists for a parameter already in the structure, that name is added to the data structure as an alias to the original data column. The units of the derived parameters are added to the lvl1.meta.sensorUnits structure. The new data structure is saved to a filename generated by replacing the "lvl" string with "payloads" in lvl1.meta.fileName.
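Because an alias is just another field holding the same column index, adding one copies no data. The Python sketch below illustrates that idea on a hypothetical dict stand-in for lvl1: it adds an alias, appends a derived column with its units, and builds the output filename. The CF names one might pass in (such as sea_water_temperature or sea_water_practical_salinity) and the restriction of the "lvl" replacement to the file name rather than the directory are assumptions.

```python
import os
from typing import Any, Dict, Sequence


def add_alias(struct: Dict[str, Any], alias: str, existing: str) -> None:
    """Expose an existing column under a second (e.g. CF standard) name."""
    struct[alias] = struct[existing]  # same column index; no data copied


def add_derived(struct: Dict[str, Any], name: str,
                values: Sequence[float], units: str) -> None:
    """Append a derived column, record its 1-based index and its units."""
    data = struct["data"]
    if not data or len(values) != len(data):
        raise ValueError("need one value per existing (non-empty) row")
    for row, value in zip(data, values):
        row.append(value)
    struct[name] = len(data[0])  # new last column, 1-based
    struct["meta"]["sensorUnits"][name] = units


def payloads_filename(lvl1_filename: str) -> str:
    """Replace 'lvl' with 'payloads' in the file name part of the path."""
    head, tail = os.path.split(lvl1_filename)
    return os.path.join(head, tail.replace("lvl", "payloads"))
```

Applied literally to the example fileName above, payloads_filename turns ..._marcoos_arc_lvl1.mat into ..._marcoos_arc_payloads1.mat in the same directory.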
The following derived CTD parameters are added to the level 1 data structure:
The following derived vehicle parameters are also added as they may be useful for downstream CTD data processing: