|
U.S. JGOFS Merged Data Product Methodology
  Merged
data products are created by joining two or more data objects comprised
of analyses from the same sampling device.   For example, many different
investigators analyze water samples drawn from the same Niskin bottles.
The various types of data are then submitted to the Data Management
Office (DMO) for inclusion in the larger database.
  The DMO
has developed a "merge" utility based on the join function
from the jgofs Data
Base Management System.   Data records from two data objects
are combined into a temporary 'constructed object'.   Additional
data objects are then merged with the constructed object until all data
records have been added.   Data records must have a relationship
that is evident in the data itself. All data collected by the same sampling
device (Niskin bottle casts) share that relationship.   The parameter
(variable) list of the newly "merged" object is the union
of the variable lists of the two input objects.   Record (row) matching
is attempted wherein records of the two objects match when the values
for all parameters in the match set have the same value.   The resulting
number of records is the number of records in the first object that
have a match with some record in the second object.   In cases where more
than one record have identical matched sets of values (replicates),
only the first record appears in the merged object.   This is rare since
replicates were discouraged.
For example, to begin making the
Arabian Sea merged product we identified a base object, the bottle data,
into which we would attempt to merge the HPLC pigments data.  
Object
Name |
Common
Parameters |
Additional
Parameters |
bottle |
event
sta cast cast_type bot |
sta_std
press temp sal_bot |
HPLC
pigments |
event
sta cast cast_type bot |
chl_a1
chl_a2 |
merged
product |
event
sta sta_std cast cast_type bot press temp sal_bot chl_a1 chl_a2 |
The
bottle data and HPLC pigments data have some variables (parameters)
in common: event
(unique sampling event number), sta (station number), cast, cast_type,
bot (bottle number). This represents the match set (intersection) of
variables. The values in the records (rows) are combined only if
all values for every member of the match set are identical. This means
if a row of data from the bottle object and one from the HPLC object
have the same event, station,
cast, bottle number and cast type (in this case, cast type CTD),
then the remaining values in each record will appear in the same
row in the resultant merged product.   If the values in the match set
do not agree, then no data are included from the second data object for this row.
Arabian
Sea bottle data |
|
Arabian
Sea HPLC pigments data |
event
|
sta
|
sta_std
|
cast
|
bot
|
press
|
temp
|
sal_bot
|
O2_ml_L
|
NO3
|
PO4
|
|
event
|
sta
|
cast
|
bot
|
chl_a2
|
chl_a1
|
neox
|
1081534
|
1
|
N1
|
1
|
24
|
3.5
|
24.733
|
36.609
|
4.301
|
1.89
|
0.68
|
|
1081534
|
1
|
1
|
24
|
4
|
432
|
11
|
1081534
|
1
|
N1
|
1
|
23
|
3.1
|
24.733
|
36.609
|
4.294
|
1.93
|
0.68
|
|
1081534
|
1
|
1
|
23
|
3
|
433
|
15
|
1081534
|
1
|
N1
|
1
|
22
|
10.9
|
24.738
|
36.609
|
4.295
|
1.98
|
0.67
|
|
HPLC
samples not drawn from all bottles |
1081534
|
1
|
N1
|
1
|
21
|
10.6
|
24.737
|
36.61
|
4.295
|
1.97
|
0.67
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
20
|
21
|
24.738
|
36.609
|
4.293
|
1.97
|
0.67
|
|
1081534
|
1
|
1
|
20
|
3
|
418
|
13
|
1081534
|
1
|
N1
|
1
|
19
|
20.7
|
24.738
|
36.612
|
4.294
|
2.02
|
0.68
|
|
1081534
|
1
|
1
|
19
|
3
|
408
|
13
|
1081534
|
1
|
N1
|
1
|
18
|
30.9
|
24.744
|
36.61
|
4.295
|
2.01
|
0.68
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
17
|
31
|
24.744
|
36.61
|
4.296
|
2.01
|
0.68
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
16
|
41.1
|
24.738
|
36.61
|
nd
|
2.1
|
0.7
|
|
1081534
|
1
|
1
|
16
|
3
|
418
|
11
|
1081534
|
1
|
N1
|
1
|
15
|
41
|
24.741
|
36.61
|
4.29
|
2.14
|
0.7
|
|
1081534
|
1
|
1
|
15
|
3
|
390
|
13
|
1081534
|
1
|
N1
|
1
|
14
|
50.9
|
24.75
|
36.611
|
4.289
|
2.23
|
0.69
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
13
|
51.1
|
24.747
|
36.61
|
4.295
|
2.23
|
0.69
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
12
|
61
|
24.736
|
36.609
|
4.227
|
2.5
|
0.7
|
|
1081534
|
1
|
1
|
12
|
3
|
387
|
13
|
1081534
|
1
|
N1
|
1
|
11
|
61.2
|
24.737
|
36.608
|
4.214
|
2.55
|
0.71
|
|
1081534
|
1
|
1
|
11
|
3
|
370
|
11
|
1081534
|
1
|
N1
|
1
|
10
|
71.3
|
24.72
|
36.606
|
4.107
|
3.04
|
0.75
|
|
1081534
|
1
|
1
|
10
|
2
|
304
|
2
|
1081534
|
1
|
N1
|
1
|
9
|
71.4
|
24.699
|
36.607
|
4.102
|
2.99
|
0.75
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
8
|
105.8
|
22.029
|
36.269
|
0.694
|
20.78
|
1.99
|
|
1081534
|
1
|
1
|
8
|
1
|
26
|
0
|
1081534
|
1
|
N1
|
1
|
7
|
105.5
|
22.026
|
36.269
|
0.695
|
20.91
|
2
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
6
|
133.5
|
19.887
|
36.09
|
0.072
|
23
|
2.35
|
|
1081534
|
1
|
1
|
6
|
0
|
5
|
0
|
1081534
|
1
|
N1
|
1
|
5
|
133
|
19.902
|
36.09
|
0.069
|
23.04
|
2.35
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
4
|
162.8
|
19.1
|
36.076
|
0.04
|
22.68
|
2.41
|
|
1081534
|
1
|
1
|
4
|
0
|
3
|
0
|
1081534
|
1
|
N1
|
1
|
3
|
162.7
|
19.07
|
36.077
|
0.04
|
22.61
|
2.4
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
2
|
202.8
|
18.162
|
36.158
|
0.067
|
21.86
|
2.41
|
|
|
|
|
|
|
|
|
1081534
|
1
|
N1
|
1
|
1
|
251.7
|
17.356
|
36.295
|
0.1
|
21.47
|
2.41
|
|
|
|
|
|
|
|
|
Once the two objects in the example above have been
merged (with 'nd' added in place of any missing data values), the resultant
constructed object would be:
Arabian
Sea merged bottle data product |
event
|
sta
|
sta_std
|
cast
|
bot
|
press
|
temp
|
sal_bot
|
O2_ml_L
|
NO3
|
PO4
|
chl_a2
|
chl_a1
|
neox
|
1081534
|
1
|
N1
|
1
|
24
|
3.5
|
24.733
|
36.609
|
4.301
|
1.89
|
0.68
|
4
|
432
|
11
|
1081534
|
1
|
N1
|
1
|
23
|
3.1
|
24.733
|
36.609
|
4.294
|
1.93
|
0.68
|
3
|
433
|
15
|
1081534
|
1
|
N1
|
1
|
22
|
10.9
|
24.738
|
36.609
|
4.295
|
1.98
|
0.67
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
21
|
10.6
|
24.737
|
36.61
|
4.295
|
1.97
|
0.67
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
20
|
21
|
24.738
|
36.609
|
4.293
|
1.97
|
0.67
|
3
|
418
|
13
|
1081534
|
1
|
N1
|
1
|
19
|
20.7
|
24.738
|
36.612
|
4.294
|
2.02
|
0.68
|
3
|
408
|
13
|
1081534
|
1
|
N1
|
1
|
18
|
30.9
|
24.744
|
36.61
|
4.295
|
2.01
|
0.68
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
17
|
31
|
24.744
|
36.61
|
4.296
|
2.01
|
0.68
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
16
|
41.1
|
24.738
|
36.61
|
nd
|
2.1
|
0.7
|
3
|
418
|
11
|
1081534
|
1
|
N1
|
1
|
15
|
41
|
24.741
|
36.61
|
4.29
|
2.14
|
0.7
|
3
|
390
|
13
|
1081534
|
1
|
N1
|
1
|
14
|
50.9
|
24.75
|
36.611
|
4.289
|
2.23
|
0.69
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
13
|
51.1
|
24.747
|
36.61
|
4.295
|
2.23
|
0.69
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
12
|
61
|
24.736
|
36.609
|
4.227
|
2.5
|
0.7
|
3
|
387
|
13
|
1081534
|
1
|
N1
|
1
|
11
|
61.2
|
24.737
|
36.608
|
4.214
|
2.55
|
0.71
|
3
|
370
|
11
|
1081534
|
1
|
N1
|
1
|
10
|
71.3
|
24.72
|
36.606
|
4.107
|
3.04
|
0.75
|
2
|
304
|
2
|
1081534
|
1
|
N1
|
1
|
9
|
71.4
|
24.699
|
36.607
|
4.102
|
2.99
|
0.75
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
8
|
105.8
|
22.029
|
36.269
|
0.694
|
20.78
|
1.99
|
1
|
26
|
0
|
1081534
|
1
|
N1
|
1
|
7
|
105.5
|
22.026
|
36.269
|
0.695
|
20.91
|
2
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
6
|
133.5
|
19.887
|
36.09
|
0.072
|
23
|
2.35
|
0
|
5
|
0
|
1081534
|
1
|
N1
|
1
|
5
|
133
|
19.902
|
36.09
|
0.069
|
23.04
|
2.35
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
4
|
162.8
|
19.1
|
36.076
|
0.04
|
22.68
|
2.41
|
0
|
3
|
0
|
1081534
|
1
|
N1
|
1
|
3
|
162.7
|
19.07
|
36.077
|
0.04
|
22.61
|
2.4
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
2
|
202.8
|
18.162
|
36.158
|
0.067
|
21.86
|
2.41
|
nd
|
nd
|
nd
|
1081534
|
1
|
N1
|
1
|
1
|
251.7
|
17.356
|
36.295
|
0.1
|
21.47
|
2.41
|
nd
|
nd
|
nd
|
The resultant merged product becomes the 'base object' and additional data objects
are added according to the list of data associated with that sampling device.  
At the time this document was written, the Arabian Sea Niskin bottle merged product was
comprised of 73 different data objects from 6 cruises.
|
R/V
Thomas Thompson Cruise Number
|
Object Name
|
043
|
045
|
049
|
050
|
053
|
054
|
bottle
|
|
|
|
|
|
|
bacteria
|
|
|
|
|
|
|
total CO2
|
|
|
|
|
|
|
total organic C
|
|
|
|
|
|
|
HPLC pigments
|
|
|
|
|
|
|
particulate organic C and N
|
|
|
|
|
|
|
total organic N
|
|
|
|
|
|
|
particulate matter conc.
|
|
|
|
|
|
|
picoplankton
|
|
|
|
|
|
|
nanoplankton
|
|
|
|
|
|
|
microplankton
|
|
|
|
|
|
|
iodine speciation
|
|
|
|
|
|
|
chl fluor
|
|
|
|
|
|
|
trace Fe Al
|
|
|
|
|
|
|
sulfide peroxide
|
|
|
|
|
|
|
manganese
|
|
|
|
|
|
|
calcite production
|
|
|
|
|
|
|
low level NO3
|
|
|
|
|
|
|
log
|
|
|
|
|
|
|
|