F. Step 4: Mean ratio of spot
Having derived the Ndod (= Nscans - 1) interpolated ratio values and errors in Step 3, the next step is to combine these values into a single isotopic ratio (per 'ratio of interest', per analysis), to match the values tabulated for the measured nuclide-ratios on the StandardData and SampleData sheets of the SQUID-workbook.
SQUID offers users a choice of calculation method: EITHER the error-weighted linear regression of interpolated ratios vs time, with the regression-error calculated at analysis midtime OR the error-weighted mean of interpolated ratios and its error, independent of time. (In fact, SQUID allows sub-choices, depending on whether each 'ratio of interest' involves nuclides of a single element or not, but we will ignore the finer distinctions for the moment.)
This implies the existence of a Boolean 'UserLinFits' (= TRUE when the user preference is for mean ratios to be calculated via error-weighted linear regression vs time, FALSE when the user preference is for mean ratios to be calculated via error-weighted mean independent of time).
As stipulated in the early part of Step 3, single-scan data (i.e. Ndod = Rct = N = 0) will never be subjected to Dodson interpolation, and thus will never occur as input to WtdLinCorr. Similarly, at the end of Step 3, specific provision is made for two-scan data (i.e. Ndod = Rct = N = 1), because although Dodson interpolation has taken place, it has generated only a single value and error, so there is nothing to 'average'. Therefore, Step 4 is applicable only to nuclide-ratios collected over three or more scans (i.e. Ndod = Rct = N > 1; admittedly, this encompasses the vast majority of 'proper' SHRIMP data).
But in addition to this, Ludwig specified that the linear-regression-to-analysis-midtime calculation is not permitted unless Ndod = Rct = N > 3, regardless of UserLinFits (i.e. if UserLinFits = TRUE but N = 2 or 3, the calculation is performed as an error-weighted average instead, without user consultation). These scenarios are best handled by logic preceding the call of the calculation routines (as per the code below), but it should be noted that the WtdLinCorr subroutine (invoked below) contains its own check, as a backup that is redundant in this particular (Step 4) application of the WtdLinCorr subroutine.
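The dispatch logic just described can be sketched as follows. This is a minimal illustration: `choose_method` and its return labels are invented names, not SQUID 2.50 identifiers; only the branching conditions come from the text above.

```python
def choose_method(user_lin_fits: bool, n_dod: int) -> str:
    """Select the Step-4 calculation for one 'ratio of interest'.

    n_dod = Nscans - 1, the number of Dodson-interpolated points
    (Ndod = Rct = N in the notation above).
    """
    if n_dod < 1:
        # single-scan data are never Dodson-interpolated and never reach Step 4
        raise ValueError("single-scan data never reach Step 4")
    if n_dod == 1:
        return "single-value"       # two-scan data: one ratio, nothing to average
    if user_lin_fits and n_dod > 3:
        return "linear-regression"  # error-weighted regression vs time (WtdLinCorr)
    return "weighted-mean"          # time-invariant error-weighted mean (WtdAvCorr)
```

Note that with UserLinFits = TRUE but n_dod = 2 or 3, the function silently falls through to the weighted mean, mirroring SQUID's behaviour of not consulting the user.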
These calculations are outlined in detail here, which includes references to documentation of the subroutine WtdLinCorr as this is a generalised function used by SQUID 2.50 in a variety of contexts. In addition to WtdLinCorr, Step 4 also requires the subroutines DeletePoint, WeightedLinearCorr, and WtdAvCorr, all of which are now documented.
In Calamari, these calculations culminate in the Calamari CSV file ‘SQUID_04…’, which is an array matching the contents of the ‘[NUM]/[DEN]’ and associated ‘%err’ columns within the StandardData and SampleData worksheets of a SQUID-workbook, but also incorporating some other metadata. The calculations incorporated into this array require a user choice regarding whether ‘spot mean’ isotopic ratios are to be calculated from their constituent Within-Spot Ratios as time-invariant ‘spot averages’, or by linear regression of the Within-Spot Ratios vs time, interpolated to the time-midpoint of the analysis. The output CSV has one row per analysis, and 3 columns per predefined ‘ratio of interest’ (comprising the index of the identified outlier [if any], the ‘ratio of interest’ mean value, and the 1sigma percentage error in the ratio-mean value). Its structure is described in detail here.
Value-by-value comparisons of the Step 4 outputs from SQUID 2.50 and Calamari for a specified demonstration XML file are documented at Synthesis of Steps 1–4… below.
Replication (in XLS) of Steps 3 and 4 from Step 1: isotopic ratios (as ‘spot average’ and ‘linear regression’) for a single spot
This Excel workbook (100142_OG1.7_SpotAvg_LinReg.xls) contains ‘longhand’ working of VBA code for a single analysis (in this case OG1.7.1.1, from session 100142), dealing specifically with the case SBM-normalisation = YES. It replicates the Dodson interpolation process for each ‘ratio of interest’, and replicates the ‘spot-ratio’ calculation process for both LinReg and SpotAvg (including the run-through of the ‘DeletePoint’ process).
From left to right:
- Beneath heading (0) is just a label to use for the analysis; manual entry.
- Beneath each heading (1)–(4) is simply Copy-Paste (as Values) from the relevant source SQUID-books for the analysis of interest. Headings (1) and (2) come from any SQUID-book generated from session 100142, but note that (3) and (4) were obtained from different SQUID-books (the first with user-preferences set to ‘spot average’ throughout, the second run with user-preferences set to ‘linear regression’ throughout). In all four cases, this is just an easy way to get hold of the SQUID-generated data for comparison. There are no calculations on this sheet.
- Rows 1–10 are copied directly from rows 5–14 of InputsChecks (‘tot cts mass’ values have already been calculated as 10*[biweight of 10 integrations], and then deadtime-corrected, which is why none of the values are exact integers).
- Rows 12–20 comprise 5 columns per mass-station: times copied from above, CPS values calculated for ions and SBM, and NetPkSigma (for TotCPS) from CalcVariance. Note at cell L21 that mean background is a simple arithmetic average of the scans, when count-rates are low.
- Rows 23–32 calculate ‘net’ ion-counts (background-corrected) and ‘net’ SBM counts (SBMzero-corrected), with ‘peak Fractional errors’ (PkFerrs) calculated, and converted to (absolute) PkSig values. There is no normalisation to SBM (i.e. SBMNetCPS is calculated purely for the use of rows 34–43 below). These are the values used as input to the Dodson interpolation process when SBM = NO (i.e. they are not used in this particular workbook). 204 and background are not included in these calculations because very low count-rate peaks (and ratios that use them) are not subjected to Dodson interpolation.
- Rows 34–43 calculate ‘net’ ion counts normalised to the ‘net’ SBM counts. PkFerr and PkSig are calculated as above, but PkSigSBM is augmented with a small additional uncertainty related to counting stats on the SBM. These are the values used as input to the Dodson interpolation process when SBM = YES (i.e. the scenario applicable in this particular workbook).
- Row 45 and below implement the Dodson interpolation calculations for each of the 9 ‘ratios of interest’ in turn. Note that each set of ratio-calculations occupies 6 columns (i.e. A–F for 207/206, G–L for 208/206, and so on, across to AW–BB for 206/254). Within each of these 6-column segments:
- Row 47 defines the structure of the ratio. SQUID needs this defined as a ‘low mass’ (i.e. chronologically measured first during analysis) and a ‘high mass’ (measured second), and then an indication of which mass is the denominator. This is because the periodicity of the Interpolation is defined entirely by the ‘first-measured’ (i.e. low-mass) peak, irrespective of whether that peak is the numerator or the denominator.
- Rows 50–55 are (currently) static Copy-Pastes of the critical rows and columns needed for the calculations. Because SBM = YES, the critical rows are 38–43 everywhere, and within each 5-column mass-block, the critical ‘sub-columns’ are 1, 2 and 5. So to fill A50–C55 for the low-mass peak 206, the data in A50:B55 = P38:Q43, and the data in C50:C55 = T38:T43. Likewise in D50:F55 for high mass peak 207: the data come from rows 38–43 (SBM = YES), and columns 1, 2 and 5 (= columns U, V and Y for mass 207).
- Rows 58–72 perform the Dodson interpolation, producing N–1 (i.e. 5) data points from the N (i.e. 6) scan values. The time data define the scan-to-scan ‘period’ of the interpolation and the within-scan time-offset between the two peaks involved. These calculations culminate in the Dodson-interpolated InterpRatVal in the fifth column of the 6-column block (e.g. E68:E72). These are the ratio values that appear in the Within-Spot Ratios sheet of a SQUID-book.
- Rows 74–80 calculate the 1sigma absolute error associated with InterpRatVal, culminating in InterpRatSig. These are the ‘err’ values associated with the ratio-values in the Within-Spot Ratios sheet of a SQUID-book.
- Rows 83–89 are simply a forensic comparison of the calculations in the sheet (captured in the first 3 columns) against the values returned by SQUID during normal running (second 3 columns; extracted from the InputsChecks sheet). These matches are supposed to be perfect to 15 significant figures.
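The core idea of the Dodson interpolation in rows 58–72 can be sketched in a deliberately simplified form: in each adjacent scan-pair, linearly interpolate the first-measured (low-mass) peak to the time of the intervening high-mass measurement, then form the ratio. This sketch is illustrative only (invented function name, hi/lo orientation assumed): it ignores the error propagation of rows 74–80 and the denominator-selection logic of row 47, which the workbook replicates cell-by-cell.

```python
def dodson_ratios(t_lo, c_lo, t_hi, c_hi):
    """Simplified Dodson-style interpolation.

    t_lo, c_lo : times and counts of the low-mass (first-measured) peak, N scans
    t_hi, c_hi : times and counts of the high-mass peak, N-1 usable measurements
    Returns N-1 interpolated hi/lo ratios.
    """
    out = []
    for i in range(len(t_lo) - 1):
        # fraction of the low-mass period elapsed at the high-mass measurement
        f = (t_hi[i] - t_lo[i]) / (t_lo[i + 1] - t_lo[i])
        lo_interp = (1.0 - f) * c_lo[i] + f * c_lo[i + 1]
        out.append(c_hi[i] / lo_interp)
    return out
```

The periodicity is defined entirely by the low-mass peak's timestamps, matching the row-47 convention that the ‘first-measured’ peak drives the interpolation regardless of whether it is numerator or denominator.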
This too is arranged as 6-column blocks, with the ‘ratios of interest’ in the same order as the preceding sheet (i.e. A–F for 207/206, G–L for 208/206, and so on, across to AW–BB for 206/254). Within each of these 6-column segments:
- Row 2 enumerates the Dodson-interpolated ratios (i.e. Nscans–1), and calculates MSWDratToler, which is the ‘shrinkage’ factor needed in the one-reject MSWD before any one of the Dodson points will even be considered as an outlier (a necessary, but not always sufficient, condition).
- Rows 5–9 are the sigma-rho matrix for the Dodson points i.e. the diagonal elements are the InterpRatSig values, and the symmetric, immediate off-diagonal elements (non-zero) represent the correlated nature of the adjacent Dodson points (i.e. the error on Dodson point 1, calculated from interpolating original scans 1 and 2, is correlated with the error on Dodson point 2, calculated from interpolating original scans 2 and 3, because the two errors being considered share original scan 2 — this was the thrust of Ludwig (2009) in Chemical Geology, where he explained why SQUID 1 ratio-errors were underestimated, whereas SQUID 2.50 gets them right).
- Rows 12–16 are the ratio values (y1, which is superfluous, and y2) and the ratio errors (SigmaY, which is just the collection of the diagonal elements of SigRho). Note that the median of SigmaY is determined.
- Rows 18–23 are a modified version of the original SigRho matrix (SigRho1), in which all diagonal elements with values smaller than median(SigmaY) are replaced with median(SigmaY). Diagonal elements with values larger than median(SigmaY), and off-diagonal elements, are unchanged. Ludwig has never described the reasoning behind this manipulation. SigRho2 (rows 25–30) is a copy of SigRho1.
- Row 32 starts the process of calculating the time-invariant spot-average and its error (note that there are no time-values anywhere on this sheet). The notation ‘i =’ relates to the Dodson point being excluded/rejected from the calculation: i = 0 here means that we are dealing with the ‘no-rejection’ case, to begin with. The first step is to construct the variance-covariance matrix OmegaInv (rows 34–38; diagonal elements are the squares of the sigma-values in SigRho2, symmetric off-diagonal elements are the covariances).
- Rows 41–45 are the inverse of OmegaInv (i.e. Omega), calculated using Excel’s MINVERSE array function. This matrix is the basis for both the numerator and the denominator.
- Rows 48–53 are the symmetric Numerator matrix, generated by multiplying the elements of Omega by the various two-component sums of the Dodson ratio-values.
- Rows 55–58 show how the various matrix-elements are summed and manipulated to give the initial estimate of the ratio-mean (MeanVal) and its error (SigmaMeanVal). Note that there has not yet been any evaluation of possible excess scatter.
- Rows 60–67 evaluate excess scatter, and I’ve fitted the various vectors to save space. First is UnWtdResids (A61:A65), which is simply the column-vector of residuals generated by subtracting MeanVal from each of the original Dodson ratio-values held in y2 (rows 12–16). Next, TransposeUnwtdResids (B61:F61) is the UnWtdResids column vector simply transposed into a row-vector for matrix multiplication purposes. TempMat (B64:F64) is the product of matrix multiplication (using Excel’s MMULT array function) on TransposeUnwtdResids and Omega, and finally, the single cell SumWtdResids (B67) is the result of a further matrix multiplication, by UnWtdResids. These MMULT operations are identifiable in equations presented by Ludwig (2009).
- Rows 68–71 re-present the no-rejection MeanVal (as InterW(0); in this notation, the number in the parentheses identifies the Dodson point excluded/rejected from the calculation, with (0) denoting the no-rejection case), and no-rejection SigmaMeanVal (as InterSigmaW(0)). But now SumWtdResids is used to calculate MswdW(0) for N–1 degrees of freedom, and the corresponding ProbW(0) value. If ProbW(0) < 0.1 (value hard-coded), the SpotAvg_DeletePoint routine (described separately) is tried. Note that if ‘Try DeletePoint?’ = FALSE in row 72, the DeletePoint summary presented in rows 74–82 needs to be ignored, no matter what it says.
Go to SpotAvg_DeletePoint description, if it is interesting/relevant, and return to this point when finished there.
- Rows 74–82 contain a summary of the completed SpotAvg_DeletePoint calculations (described separately). The columns are used to denote the point deleted (i.e. B74:B82, denoted column ‘1’, contains the result of the calculation performed by deleting Dodson point 1 and performing the 4-point calculation using Dodson points 2, 3, 4 and 5. Similarly, C74:C82, denoted column ‘2’, contains the result of the calculation performed by deleting Dodson point 2 and performing the 4-point calculation using Dodson points 1, 3, 4 and 5… and so on), and rows 75–78 contain the values of InterW (mean), InterSigmaW (sigma), MSWD and Prob associated with each of those calculations.
- Row 80 calculates the ratio of MSWDs (for each one-rejection case, relative to the reference no-rejection case). This value is intended for comparison with the original MSWDratToler value in row 2.
- Row 81 simply evaluates which of the MSWDrats is the smallest (i.e. TRUE = the biggest outlier), but this does not automatically mean it is rejectable.
- Row 82 implements Ludwig’s triple-condition for rejection of a Dodson point: (1) Only the Dodson point for which MinMSWD = TRUE can be considered and (2) its MSWDrat value must be smaller than MSWDratToler (remembering that the latter is 0.2 in this case; see row 2) and (3) the ProbW resulting from the rejection must exceed 0.1 (i.e. a rejection must remove all excess scatter). This triple-condition is a very stern test, and rarely satisfied. It is possible Ludwig originally intended this test to be condition (1) and [condition (2) OR condition (3)], which seems a bit more reasonable, but we’ll never know. As can be seen from the sheet, in the SpotAvg case for this analysis, no Dodson point for any ratio qualifies for rejection. There is a single example in the LinReg sheets; more on that later.
- Row 84 synthesises all of the previous logic (including the constraint from row 72), and identifies (as MinIndex) the Dodson point to be excluded (0 = no rejection). Rows 85–88 simply select the appropriate value of the mean (Intercept), sigma (SigmaIntercept), MSWD and Probfit from all the potential candidates, based on the MinIndex value synthesised in row 84.
- Rows 89–91 look at Probfit, and multiply the calculated SigmaIntercept by sqrt(MSWD) if Probfit < 0.05 (value hard-coded).
- Rows 93–98 isolate the final mean, and transform its Sigma from absolute, to fractional, to percentage.
- Rows 100–103 are simply a forensic comparison of the calculations in the sheet (captured in the first 2 columns) against the values returned by SQUID during normal running (second 2 columns; extracted from the InputsChecks sheet). These matches are supposed to be perfect to 15 significant figures. This is the end of the SpotAvg_CorrelErrors routine; proceed to LinReg_CorrelErrors if interested.
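The outlier-selection logic of rows 80–84 (and its LinReg counterpart) can be condensed into a few lines. This is an illustrative sketch with invented names; the inputs are the per-rejection MSWD-ratios and probabilities from the DeletePoint summary:

```python
def select_rejection(mswd_rats, probs, mswd_rat_toler, try_delete):
    """Ludwig's triple condition for rejecting one Dodson point.

    mswd_rats      : MSWD(one-rejection)/MSWD(no-rejection), per candidate
    probs          : ProbW resulting from each candidate rejection
    mswd_rat_toler : 'shrinkage' tolerance from row 2
    try_delete     : the 'Try DeletePoint?' flag (ProbW(0) < 0.1)
    Returns the 1-based index of the rejected point, or 0 for no rejection.
    """
    if not try_delete:
        return 0
    i = min(range(len(mswd_rats)), key=lambda k: mswd_rats[k])  # condition 1: MinMSWD
    if mswd_rats[i] < mswd_rat_toler and probs[i] > 0.1:        # conditions 2 and 3
        return i + 1
    return 0
```

As the text notes, requiring conditions (2) AND (3) together makes rejection rare; swapping the `and` for an `or` would implement the more lenient variant Ludwig may have intended.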
This too is arranged as 6-column blocks, with the ‘ratios of interest’ in the same order as the preceding sheet (i.e. A–F for 207/206, G–L for 208/206, and so on, across to AW–BB for 206/254). Within each of these 6-column segments:
- Row 2 enumerates the Dodson-interpolated ratios (i.e. Nscans–2, because we are explicitly considering all the possible calculations in which 1 point has been removed from the corresponding no-rejection calculation in SpotAvg_CorrelErrors). Note that MSWDratToler is not applicable here, because we are explicitly evaluating the exclusion scenarios.
- Rows 5–9 are copies of the no-rejection ratio-values (y1: SpotAvg_CorrelErrors, rows 12–16) and the ‘adjusted’ sigma-rho matrix (SigRho1: SpotAvg_CorrelErrors, rows 19–23), as the starting point for sequential-rejection calculations for the Dodson points.
- Row 11 starts the process of calculating the time-invariant spot-average and its error for each excluded-point scenario. The notation ‘i =’ relates to the Dodson point being excluded/rejected from the calculation. Note that y2 (rows 13–16) is y1 with the first element removed, and SigRho2 (rows 13–16) is the 5 x 5 SigRho1 matrix with the first row and first column removed to make a 4 x 4 matrix.
- Rows 18–48 are exactly analogous to SpotAvg_CorrelErrors rows 32–67 in all respects: see description above.
- Rows 49–52 are also analogous to SpotAvg_CorrelErrors rows 68–71; note that the values in SpotAvg_DeletePoint!B49:B52 are harvested by SpotAvg_CorrelErrors for the DeletePoint summary, where they appear at SpotAvg_CorrelErrors!B75:B78.
- Row 54 starts the process of calculating the spot-average and its error when the second point is being excluded. Note that y2 (rows 56–59) is y1 with the second element removed, and SigRho2 (rows 56–59) is the 5 x 5 SigRho1 matrix with its second row and second column removed to make a 4 x 4 matrix. An interesting aspect of this ‘matrix reduction’ when RejPoint = 2, 3 or 4 (but not 1 or 5) is that the excision of an ‘inside’ Dodson point results in the juxtaposition of two Dodson points which no longer share an original scan from the double-interpolation process. In this case, RejPoint = 2, so the surviving points are 1, 3, 4 and 5. Dodson point 1 was derived by interpolating original scans 1 and 2, and Dodson point 3 was derived by interpolating original scans 3 and 4. The fact that there are no shared scans means that the errors associated with Dodson points 1 and 3 are uncorrelated, so their correlation coefficients (and covariances) will be zero. Note that off-diagonal elements C56 = B57 are zero, reflecting the lack of correlation between Dodson points 1 and 3, whereas non-zero off-diagonal elements are retained at cells E57 = D58, and F58 = E59, reflecting the preserved correlation between Dodson points 3 and 4, and 4 and 5 respectively.
- Rows 61–91 are exactly analogous to SpotAvg_CorrelErrors rows 32–67 in all respects: see description above.
- Rows 92–95 are also analogous to SpotAvg_CorrelErrors rows 68–71; note that the values in SpotAvg_DeletePoint!B92:B95 are harvested by SpotAvg_CorrelErrors for the DeletePoint summary, where they appear at SpotAvg_CorrelErrors!C75:C78.
- Row 97 starts the process of calculating the spot-average and its error when the third point is being excluded. The calculations are analogous to the second-point calculation in most respects: only the position of the off-diagonal zeroes, reflecting the juxtaposition of uncorrelated Dodson points 2 and 4, is different.
- Rows 104–134 are exactly analogous to SpotAvg_CorrelErrors rows 32–67 in all respects: see description above.
- Rows 135–138 are also analogous to SpotAvg_CorrelErrors rows 68–71; note that the values in SpotAvg_DeletePoint!B135:B138 are harvested by SpotAvg_CorrelErrors for the DeletePoint summary, where they appear at SpotAvg_CorrelErrors!D75:D78.
- Row 140 starts the process of calculating the spot-average and its error when the fourth point is being excluded. The calculations are analogous to the second-point calculation in most respects: only the position of the off-diagonal zeroes, reflecting the juxtaposition of uncorrelated Dodson points 3 and 5, is different.
- Rows 147–177 are exactly analogous to SpotAvg_CorrelErrors rows 32–67 in all respects: see description above.
- Rows 178–181 are also analogous to SpotAvg_CorrelErrors rows 68–71; note that the values in SpotAvg_DeletePoint!B178:B181 are harvested by SpotAvg_CorrelErrors for the DeletePoint summary, where they appear at SpotAvg_CorrelErrors!E75:E78.
- Row 183 starts the process of calculating the spot-average and its error when the fifth/final point is being excluded. The calculations are analogous to the first-point calculation, because removing the last row and last column yields a 4 x 4 matrix in which all elements remain correlated, as in the first-exclusion case (row 11).
- Rows 190–220 are exactly analogous to SpotAvg_CorrelErrors rows 32–67 in all respects: see description above.
- Rows 221–224 are also analogous to SpotAvg_CorrelErrors rows 68–71; note that the values in SpotAvg_DeletePoint!B221:B224 are harvested by SpotAvg_CorrelErrors for the DeletePoint summary, where they appear at SpotAvg_CorrelErrors!F75:F78. This is the end of the SpotAvg_DeletePoint routine; return to the remaining description of SpotAvg_CorrelErrors above.
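The ‘matrix reduction’ performed at the start of each exclusion scenario can be sketched as follows. Names are illustrative; the explicit zeroing mirrors the decorrelation described for ‘inside’ points (in the banded SigRho representation that element is already zero, so the assignment is defensive):

```python
def delete_point(y, sigrho, i):
    """Remove 1-based Dodson point i from the data and its sigma-rho matrix.

    Drops element i of y and row/column i of sigrho, then zeroes the
    off-diagonal term linking the two points made newly adjacent, because
    after excising an 'inside' point they no longer share an original scan.
    """
    k = i - 1
    y2 = y[:k] + y[k + 1:]
    m = [[v for c, v in enumerate(row) if c != k]
         for r, row in enumerate(sigrho) if r != k]
    if 0 < k < len(y) - 1:      # an 'inside' point (not first or last) was excised
        m[k - 1][k] = 0.0       # former points i-1 and i+1: uncorrelated
        m[k][k - 1] = 0.0
    return y2, m
```

For i = 1 or i = 5 (the Nscans = 6 case) the `if` branch is skipped, matching the observation that first- and last-point exclusions leave all surviving correlations intact.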
This too is arranged as 6-column blocks, with the ‘ratios of interest’ in the same order as the preceding sheet (i.e. A–F for 207/206, G–L for 208/206, and so on, across to AW–BB for 206/254). Within each of these 6-column segments:
- Row 2 enumerates the Dodson-interpolated ratios (i.e. N–1), and calculates MSWDratToler, which is the ‘shrinkage’ factor needed in the one-reject MSWD before any one of the Dodson points will even be considered as an outlier (a necessary, but not always sufficient, condition). Note that the MSWDratToler value is always smaller (for the same number of scans) than in the SpotAvg case, because the two-dimensional LinReg calculation has N–2 degrees of freedom, whereas the one-dimensional SpotAvg calculation has N–1 degrees of freedom.
- Rows 5–9 are the sigma-rho matrix for the Dodson points i.e. the diagonal elements are the InterpRatSig values, and the symmetric, immediate off-diagonal elements (non-zero) represent the correlated nature of the adjacent Dodson points (i.e. the error on Dodson point 1, calculated from interpolating original scans 1 and 2, is correlated with the error on Dodson point 2, calculated from interpolating original scans 2 and 3, because the two errors being considered share original scan 2 — this was the thrust of Ludwig (2009) in Chemical Geology, where he explained why SQUID 1 ratio-errors were underestimated, whereas SQUID 2.50 gets them right).
- Rows 12–16 are the ratio values (y1, which is superfluous, and y2), the corresponding timestamp values (x2), and the ratio errors (SigmaY, which is just the collection of the diagonal elements of SigRho). Note that the median of SigmaY is determined.
- Rows 18–23 are a modified version of the original SigRho matrix (SigRho1), in which all diagonal elements with values smaller than median(SigmaY) are replaced with the value median(SigmaY). Diagonal elements with values larger than median(SigmaY), and off-diagonal elements, are unchanged. Ludwig has never described the reasoning behind this manipulation. SigRho2 (rows 25–30) is a copy of SigRho1.
- Row 32 starts the process of calculating the two-dimensional linear regression and its error. The notation ‘i =’ relates to the Dodson point being excluded/rejected from the calculation: i = 0 here means that we are dealing with the ‘no-rejection’ case, to begin with. The first step is to construct the variance-covariance matrix Omega (rows 34–38; diagonal elements are the squares of the sigma-values in SigRho2, symmetric off-diagonal elements are the covariances).
- Rows 41–45 are the inverse of Omega (i.e. InvOmega), calculated using Excel’s MINVERSE array function. This matrix is the basis for both the numerator and the denominator.
- Rows 47–52 are the symmetric “x-value” matrix (Px), generated by multiplying the elements of InvOmega by the various two-component sums of the timestamps.
- Rows 54–59 are the symmetric “y-value” matrix (Py), generated by multiplying the elements of InvOmega by the various two-component sums of the Dodson ratio-values.
- Rows 61–66 are the symmetric “xy-correlation” matrix (Pxy), generated by multiplying the elements of InvOmega by the various two-component sums of the products of timestamps and Dodson ratio-values.
- Rows 68–73 are the symmetric “slope” matrix (Mx), generated by multiplying the elements of InvOmega by the various two-component products of the timestamps.
- Rows 75–81 show how the various matrix-elements are summed and manipulated to give the initial estimate of the ratio-slope (SlopeW(0)) and the ratio-intercept (InterW(0)). Note that there has not yet been any evaluation of possible excess scatter, and the zero in parentheses refers to the no-outlier case, as a baseline.
- Rows 83–89 contain firstly (A83–C85) the Fisher information matrix (constructed from the various matrix-sums in rows 75–79), and secondly (D83–F85) the inverse of the Fisher matrix, calculated using Excel’s MINVERSE array function. The inverted Fisher matrix is the basis for calculating the sigmas of the ratio-slope (row 87) and the ratio-intercept (row 88), as well as the slope-intercept covariance (row 89).
- Rows 91–98 evaluate excess scatter, and I’ve fitted the various vectors to save space. First is Resids (A92:A96) which is simply the column-vector of residuals for each timestamp (E12:E16)–ratio (C12:C16) pair, relative to the line defined by SlopeW(0) and InterW(0). Next, ResidsT (B92:F92) is the Resids column vector simply transposed into a row-vector for matrix multiplication purposes. The single cell Mm (C96) is the product of a two-step matrix multiplication (using Excel’s MMULT array function), firstly on ResidsT and InvOmega, followed by this product and Resids. These MMULT operations are identifiable in equations presented by Ludwig (2009). The single cell SumSqWtdResids (row 98) is equivalent to Mm.
- Rows 99–101 use SumSqWtdResids to calculate MswdW(0) for N–2 degrees of freedom, and the corresponding ProbW(0) value. If ProbW(0) < 0.1 (value hard-coded), the LinReg_DeletePoint routine (described separately) is tried. Note that if ‘Try DeletePoint?’ = FALSE in row 101, the DeletePoint summary presented in rows 103–114 needs to be ignored, no matter what it says.
Go to LinReg_DeletePoint description, if it is interesting/relevant, and return to this point when finished there.
- Rows 103–114 contain a summary of the completed LinReg_DeletePoint calculations (described separately). The columns are used to denote the point deleted (i.e. B103:B114, denoted column ‘1’, contains the result of the calculation performed by deleting Dodson point 1 and performing the 4-point calculation using Dodson points 2, 3, 4 and 5. Similarly, C103:C114, denoted column ‘2’, contains the result of the calculation performed by deleting Dodson point 2 and performing the 4-point calculation using Dodson points 1, 3, 4 and 5… and so on), and rows 104–110 contain the values of SlopeW (slope), InterW (intercept), SigmaSlopeW (error of slope), SigmaInterW (error of intercept), CovSlopeInterW (slope-intercept covariance), MSWD and Prob associated with each of those calculations.
- Row 112 calculates the ratio of MSWDs (for each one-rejection case, relative to the reference no-rejection case). This value is intended for comparison with the original MSWDratToler value in row 2.
- Row 113 simply evaluates which of the MSWDrats is the smallest (i.e. TRUE = the biggest outlier), but this does not automatically mean it is rejectable.
- Row 114 implements Ludwig’s triple-condition for rejection of a Dodson point: (1) Only the Dodson point for which MinMSWD = TRUE can be considered and (2) its MSWDrat value must be smaller than MSWDratToler (remembering that the latter is 0.15 in this case; see row 2) and (3) the ProbW resulting from the rejection must exceed 0.1 (i.e. a rejection must remove all excess scatter). This triple-condition is a very stern test, and rarely satisfied. It is possible Ludwig originally intended this test to be condition (1) and [condition (2) OR condition (3)], which seems a bit more reasonable, but we’ll never know. Fortunately, in the LinReg case for this analysis, there is one ratio (206/254; columns AW–BB) where a Dodson point qualifies for rejection. More on that later.
- Row 116 synthesises all of the previous logic (including the constraint from row 101), and identifies (as MinIndex) the Dodson point to be excluded (0 = no rejection).
- Rows 117–120 simply select the appropriate value of the intercept, intercept-error, MSWD and Probfit from all the potential candidates, based on the MinIndex value synthesised in row 116.
- In rows 121–123, the red cells implement the faulty-looking logic written into the SQUID VBA code, i.e. ‘appropriate’ values of slope, slope-error, and slope-intercept covariance are selected from all the potential candidates only when MinIndex > 0 at row 116. But in the extremely common case where MinIndex = 0, all three of these parameters are zeroed. In-code comments indicate that the code has been deliberately written this way, but it is hard to see the mathematical justification for it. It is possible that part of the code was being tested, and part of the ‘test’ was accidentally left in the production code. We’ll never know.
- Rows 124–127 look at Probfit, and multiply the calculated SigmaIntercept and SigmaSlope by sqrt(MSWD) if Probfit < 0.05 (value hard-coded).
- Row 129 calculates burn mid-time as the average of the first and last timestamps of the analysis.
- Row 130 attempts to calculate RatioMean as Intercept + (Slope*MidTime), which reduces simply to Intercept in the common case where MinIndex = 0 (and Slope is set to zero as a result).
- Row 131 attempts to calculate RatioMeanSig as a three-term expression involving SigmaSlope and CovSlopeInter, and this expression reduces simply to SigmaIntercept in the common case where MinIndex = 0 (and SigmaSlope and CovSlopeInter are both set to zero as a result).
- Rows 133–135 isolate the final mean, and transform its Sigma from absolute, to fractional, to percentage.
- Rows 137–140 are simply a forensic comparison of the calculations in the sheet (captured in the first 2 columns) against the values returned by SQUID during normal running (second 2 columns; extracted from the InputsChecks sheet). These matches are supposed to be perfect to 15 significant figures. In practice, they are not, in this LinReg case: sometimes there are variations of 1 or even 2 at the final decimal place, relative to the ‘genuine’ SQUID output. The origin of this variation is not clear, but no coding error is evident.
Note in the final comparisons that for all of the ratios except the final one (206/254, columns AW–BB), the SQUID value can only be replicated by zeroing the Slope, SigmaSlope and CovSlopeInter values in rows 121–123. In columns AW–BB, where MinIndex > 0, the opposite is true: the SQUID value can only be replicated if the Slope, SigmaSlope and CovSlopeInter values are all properly calculated and utilised to rigorously interpolate the linear regression to burn midtime. This proves that the faulty-looking logic implemented in the red cells in rows 121–123 reflects the truth of how SQUID 2.50 actually operates.
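The rows 129–131 evaluation can be written out directly: the regression is interpolated to burn midtime, with the error propagated from the intercept variance, the slope variance, and their covariance. A short sketch (illustrative name; this is ordinary propagation of a linear combination):

```python
def ratio_at_midtime(inter, slope, sig_inter, sig_slope, cov_si, t_mid):
    """Interpolate a fitted line to burn midtime and propagate its error.

    mean = a + b*t_mid
    var  = var(a) + t_mid^2 * var(b) + 2 * t_mid * cov(a, b)
    """
    mean = inter + slope * t_mid
    var = sig_inter ** 2 + t_mid ** 2 * sig_slope ** 2 + 2 * t_mid * cov_si
    return mean, var ** 0.5
```

When MinIndex = 0 and SQUID zeroes slope, slope-error and covariance (rows 121–123), both expressions collapse to the intercept and its sigma, which is exactly the behaviour the forensic comparison confirms.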
This is the end of the 'Step 4' replication process. Value-by-value comparisons of the Step 4 outputs from SQUID 2.50 and Calamari for a specified demonstration XML file are documented at G. Synthesis of Steps 1–4: Cell-by-cell comparisons of SQUID 2.50 and Calamari output.
This too is arranged as 6-column blocks, with the ‘ratios of interest’ in the same order as the preceding sheet (i.e. A–F for 207/206, G–L for 208/206, and so on, across to AW–BB for 206/254). Within each of these 6-column segments:
- Row 2 gives the number of Dodson-interpolated ratios in each calculation (i.e. Nscans – 2, because we are explicitly considering all of the possible calculations in which one point has been removed from the corresponding no-rejection calculation in LinReg_CorrelErrors). Note that MSWDratToler is not applicable here, because we are explicitly evaluating the exclusion scenarios.
- Rows 3–15 are copies of the no-rejection ratio-values (y1 and x1: LinReg_CorrelErrors, rows 12–16) and the ‘adjusted’ sigma-rho matrix (SigRho1: LinReg_CorrelErrors, rows 19–23), as the starting point for sequential-rejection calculations for the Dodson points.
- Row 17 starts the process of calculating the linear regression (and its errors and covariances) for each excluded-point scenario. The notation ‘i =’ relates to the Dodson point being excluded/rejected from the calculation. Note that x2 and y2 (rows 19–22) are x1 and y1 with the first element removed, and SigRho2 (rows 19–22) is the 5 x 5 SigRho1 matrix with the first row and first column removed to make a 4 x 4 matrix.
- Rows 24–85 are exactly analogous to LinReg_CorrelErrors rows 32–100 in all respects: see description above. Note that the values in LinReg_DeletePoint!B66:B67, B73:B75 and B84:B85 are harvested by LinReg_CorrelErrors for the DeletePoint summary, where they appear at LinReg_CorrelErrors!B104:B110.
- Row 87 starts the process of calculating the linear regression (and its errors and covariances) when the second point is being excluded. Note that x2 and y2 (rows 89–92) are x1 and y1 with the second element removed, and SigRho2 (rows 89–92) is the 5 x 5 SigRho1 matrix with its second row and second column removed to make a 4 x 4 matrix. An interesting aspect of this ‘matrix reduction’ when RejPoint = 2, 3 or 4 (but not 1 or 5) is that the excision of an ‘inside’ Dodson point juxtaposes two Dodson points which no longer share an original scan from the double-interpolation process. In this case, RejPoint = 2, so the surviving points are 1, 3, 4 and 5. Dodson point 1 was derived by interpolating original scans 1 and 2, and Dodson point 3 was derived by interpolating original scans 3 and 4. Because there are no shared scans, the errors associated with Dodson points 1 and 3 are uncorrelated, so their correlation coefficients (and covariances) are zero. Note that off-diagonal elements D89 = C90 are zero, reflecting the lack of correlation between Dodson points 1 and 3, whereas non-zero off-diagonal elements are retained at cells E90 = D91, and F91 = E92, reflecting the preserved correlation between Dodson points 3 and 4, and 4 and 5 respectively.
- Rows 94–155 are exactly analogous to LinReg_CorrelErrors rows 32–100 in all respects: see description above. Note that the values in LinReg_DeletePoint!B136:B137, B143:B145 and B154:B155 are harvested by LinReg_CorrelErrors for the DeletePoint summary, where they appear at LinReg_CorrelErrors!C104:C110.
- Row 157 starts the process of calculating the linear regression (and its errors and covariances) when the third point is being excluded. The calculations are analogous to the second-point calculation in most respects: only the position of the off-diagonal zeroes, reflecting the juxtaposition of uncorrelated Dodson points 2 and 4, is different.
- Rows 164–225 are exactly analogous to LinReg_CorrelErrors rows 32–100 in all respects: see description above. Note that the values in LinReg_DeletePoint!B206:B207, B213:B215 and B224:B225 are harvested by LinReg_CorrelErrors for the DeletePoint summary, where they appear at LinReg_CorrelErrors!D104:D110.
- Row 227 starts the process of calculating the linear regression (and its errors and covariances) when the fourth point is being excluded. The calculations are analogous to the second-point calculation in most respects: only the position of the off-diagonal zeroes, reflecting the juxtaposition of uncorrelated Dodson points 3 and 5, is different.
- Rows 234–295 are exactly analogous to LinReg_CorrelErrors rows 32–100 in all respects: see description above. Note that the values in LinReg_DeletePoint!B276:B277, B283:B285 and B294:B295 are harvested by LinReg_CorrelErrors for the DeletePoint summary, where they appear at LinReg_CorrelErrors!E104:E110.
- Row 297 starts the process of calculating the linear regression (and its errors and covariances) when the fifth/final point is being excluded. The calculations are analogous to the first-point calculation, because removing the last row and last column yields a 4 x 4 matrix in which all of the surviving correlations are preserved (no new off-diagonal zeroes arise), as in the first-exclusion case (row 17).
- Rows 304–365 are exactly analogous to LinReg_CorrelErrors rows 32–100 in all respects: see description above. Note that the values in LinReg_DeletePoint!B346:B347, B353:B355 and B364:B365 are harvested by LinReg_CorrelErrors for the DeletePoint summary, where they appear at LinReg_CorrelErrors!F104:F110. This is the end of the LinReg_DeletePoint routine; return to the remaining description of LinReg_CorrelErrors above.
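The ‘matrix reduction’ described for rows 17, 87, 157, 227 and 297 can be sketched in a few lines. Because each Dodson point shares an original scan only with its immediate neighbours, SigRho1 is non-zero only on the diagonal and the first off-diagonal; excising an interior row/column therefore juxtaposes two points whose mutual element was already zero, which is exactly the zero correlation noted above for the RejPoint = 2 case. The sketch below uses invented sigma and rho values purely for illustration:

```python
import numpy as np

# Illustrative 5 x 5 sigma-rho matrix for Ndod = 5 Dodson points:
# sigmas on the diagonal, correlation coefficients off-diagonal.
# Only adjacent points share an original scan, so only the first
# off-diagonal is non-zero (all values here are hypothetical).
sig_rho1 = np.diag([0.004, 0.005, 0.004, 0.006, 0.005])
for k in range(4):
    sig_rho1[k, k + 1] = sig_rho1[k + 1, k] = 0.25

def delete_point(sig_rho, rej):
    """Excise Dodson point rej (0-based) by removing its row and column,
    reducing an n x n sigma-rho matrix to (n-1) x (n-1). For an interior
    rej, the two newly adjacent points inherit the zero that already sat
    at sig_rho[rej - 1, rej + 1], reflecting their lack of shared scans."""
    return np.delete(np.delete(sig_rho, rej, axis=0), rej, axis=1)

# RejPoint = 2 in the text's 1-based numbering (rej = 1 here): surviving
# points are 1, 3, 4 and 5. Points 1 and 3 are uncorrelated, while the
# 3-4 and 4-5 correlations are retained.
sig_rho2 = delete_point(sig_rho1, 1)
assert sig_rho2[0, 1] == 0.0    # points 1 and 3: no shared scan
assert sig_rho2[1, 2] == 0.25   # points 3 and 4: correlation retained
assert sig_rho2[2, 3] == 0.25   # points 4 and 5: correlation retained
```

Excluding the first or last point (rej = 0 or rej = 4) removes only an end row/column, so no previously separated points become adjacent and no zeroes appear in the retained off-diagonal, matching the row-17 and row-297 cases.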
Next: G. Synthesis of Steps 1–4: Cell-by-cell comparisons of SQUID 2.50 and Calamari output