U.S. JGOFS   Merged Data Product Methodology

  Merged data products are created by joining two or more data objects comprised of analyses from the same sampling device.   For example, many different investigators analyze water samples drawn from the same Niskin bottles.  The various types of data are then submitted to the Data Management Office (DMO) for inclusion in the larger database.  

  The DMO has developed a "merge" utility based on the join function from the jgofs Data Base Management System.   Data records from two data objects are combined into a temporary 'constructed object'.   Additional data objects are then merged with the constructed object until all data records have been added.   Data records must have a relationship that is evident in the data itself. All data collected by the same sampling device (Niskin bottle casts) share that relationship.   The parameter (variable) list of the newly "merged" object is the union of the variable lists of the two input objects.   Record (row) matching is attempted wherein records of the two objects match when the values for all parameters in the match set have the same value.   The resulting number of records is the number of records in the first object that have a match with some record in the second object.   In cases where more than one record have identical matched sets of values (replicates), only the first record appears in the merged object.   This is rare since replicates were discouraged.

For example, to begin making the Arabian Sea merged product we identified a base object, the bottle data, into which we would attempt to merge the HPLC pigments data.  

Object Name Common Parameters Additional Parameters
bottle event sta cast cast_type bot sta_std press temp sal_bot
HPLC pigments event sta cast cast_type bot chl_a1 chl_a2
merged product event sta sta_std cast cast_type bot press temp sal_bot chl_a1 chl_a2

The bottle data and HPLC pigments data have some variables (parameters) in common: event (unique sampling event number), sta (station number), cast, cast_type, bot (bottle number). This represents the match set (intersection) of variables. The values in the records (rows) are combined only if all values for every member of the match set are identical. This means if a row of data from the bottle object and one from the HPLC object have the same event, station, cast, bottle number and cast type (in this case, cast type CTD), then the remaining values in each record will appear in the same row in the resultant merged product.   If the values in the match set do not agree, then no data are included from the second data object for this row.

Arabian Sea bottle data   Arabian Sea HPLC pigments data
event
sta
sta_std
cast
bot
press
temp
sal_bot
O2_ml_L
NO3
PO4
 
event
sta
cast
bot
chl_a2
chl_a1
neox
1081534
1
N1
1
24
3.5
24.733
36.609
4.301
1.89
0.68
 
1081534
1
1
24
4
432
11
1081534
1
N1
1
23
3.1
24.733
36.609
4.294
1.93
0.68
 
1081534
1
1
23
3
433
15
1081534
1
N1
1
22
10.9
24.738
36.609
4.295
1.98
0.67
  HPLC samples not drawn from all bottles
1081534
1
N1
1
21
10.6
24.737
36.61
4.295
1.97
0.67
               
1081534
1
N1
1
20
21
24.738
36.609
4.293
1.97
0.67
 
1081534
1
1
20
3
418
13
1081534
1
N1
1
19
20.7
24.738
36.612
4.294
2.02
0.68
 
1081534
1
1
19
3
408
13
1081534
1
N1
1
18
30.9
24.744
36.61
4.295
2.01
0.68
               
1081534
1
N1
1
17
31
24.744
36.61
4.296
2.01
0.68
               
1081534
1
N1
1
16
41.1
24.738
36.61
nd
2.1
0.7
 
1081534
1
1
16
3
418
11
1081534
1
N1
1
15
41
24.741
36.61
4.29
2.14
0.7
 
1081534
1
1
15
3
390
13
1081534
1
N1
1
14
50.9
24.75
36.611
4.289
2.23
0.69
               
1081534
1
N1
1
13
51.1
24.747
36.61
4.295
2.23
0.69
               
1081534
1
N1
1
12
61
24.736
36.609
4.227
2.5
0.7
 
1081534
1
1
12
3
387
13
1081534
1
N1
1
11
61.2
24.737
36.608
4.214
2.55
0.71
 
1081534
1
1
11
3
370
11
1081534
1
N1
1
10
71.3
24.72
36.606
4.107
3.04
0.75
 
1081534
1
1
10
2
304
2
1081534
1
N1
1
9
71.4
24.699
36.607
4.102
2.99
0.75
               
1081534
1
N1
1
8
105.8
22.029
36.269
0.694
20.78
1.99
 
1081534
1
1
8
1
26
0
1081534
1
N1
1
7
105.5
22.026
36.269
0.695
20.91
2
               
1081534
1
N1
1
6
133.5
19.887
36.09
0.072
23
2.35
 
1081534
1
1
6
0
5
0
1081534
1
N1
1
5
133
19.902
36.09
0.069
23.04
2.35
               
1081534
1
N1
1
4
162.8
19.1
36.076
0.04
22.68
2.41
 
1081534
1
1
4
0
3
0
1081534
1
N1
1
3
162.7
19.07
36.077
0.04
22.61
2.4
               
1081534
1
N1
1
2
202.8
18.162
36.158
0.067
21.86
2.41
               
1081534
1
N1
1
1
251.7
17.356
36.295
0.1
21.47
2.41
               

Once the two objects in the example above have been merged (with 'nd' added in place of any missing data values), the resultant constructed object would be:

Arabian Sea merged bottle data product
event
sta
sta_std
cast
bot
press
temp
sal_bot
O2_ml_L
NO3
PO4
chl_a2
chl_a1
neox
1081534
1
N1
1
24
3.5
24.733
36.609
4.301
1.89
0.68
4
432
11
1081534
1
N1
1
23
3.1
24.733
36.609
4.294
1.93
0.68
3
433
15
1081534
1
N1
1
22
10.9
24.738
36.609
4.295
1.98
0.67
nd
nd
nd
1081534
1
N1
1
21
10.6
24.737
36.61
4.295
1.97
0.67
nd
nd
nd
1081534
1
N1
1
20
21
24.738
36.609
4.293
1.97
0.67
3
418
13
1081534
1
N1
1
19
20.7
24.738
36.612
4.294
2.02
0.68
3
408
13
1081534
1
N1
1
18
30.9
24.744
36.61
4.295
2.01
0.68
nd
nd
nd
1081534
1
N1
1
17
31
24.744
36.61
4.296
2.01
0.68
nd
nd
nd
1081534
1
N1
1
16
41.1
24.738
36.61
nd
2.1
0.7
3
418
11
1081534
1
N1
1
15
41
24.741
36.61
4.29
2.14
0.7
3
390
13
1081534
1
N1
1
14
50.9
24.75
36.611
4.289
2.23
0.69
nd
nd
nd
1081534
1
N1
1
13
51.1
24.747
36.61
4.295
2.23
0.69
nd
nd
nd
1081534
1
N1
1
12
61
24.736
36.609
4.227
2.5
0.7
3
387
13
1081534
1
N1
1
11
61.2
24.737
36.608
4.214
2.55
0.71
3
370
11
1081534
1
N1
1
10
71.3
24.72
36.606
4.107
3.04
0.75
2
304
2
1081534
1
N1
1
9
71.4
24.699
36.607
4.102
2.99
0.75
nd
nd
nd
1081534
1
N1
1
8
105.8
22.029
36.269
0.694
20.78
1.99
1
26
0
1081534
1
N1
1
7
105.5
22.026
36.269
0.695
20.91
2
nd
nd
nd
1081534
1
N1
1
6
133.5
19.887
36.09
0.072
23
2.35
0
5
0
1081534
1
N1
1
5
133
19.902
36.09
0.069
23.04
2.35
nd
nd
nd
1081534
1
N1
1
4
162.8
19.1
36.076
0.04
22.68
2.41
0
3
0
1081534
1
N1
1
3
162.7
19.07
36.077
0.04
22.61
2.4
nd
nd
nd
1081534
1
N1
1
2
202.8
18.162
36.158
0.067
21.86
2.41
nd
nd
nd
1081534
1
N1
1
1
251.7
17.356
36.295
0.1
21.47
2.41
nd
nd
nd

The resultant merged product becomes the 'base object' and additional data objects are added according to the list of data associated with that sampling device.   At the time this document was written, the Arabian Sea Niskin bottle merged product was comprised of 73 different data objects from 6 cruises.

 

R/V Thomas Thompson Cruise Number

Object Name

043

045

049

050

053

054

bottle

bacteria

   

total CO2

 

total organic C

HPLC pigments

particulate organic C and N

 

total organic N

particulate matter conc.

 

   

picoplankton

nanoplankton

       

microplankton

 

 

iodine speciation

 

       

chl fluor

 

 

 

trace Fe Al

 

 

sulfide peroxide

 

       

manganese

 

       

calcite production

 

 

 

low level NO3

       

 

log

 
back to top