Miscellaneous Algorithms

Here we discuss algorithms that don't seem to belong in any specific part of the overview.

Using Bresenham's Algorithm to Interpolate

When doing linear interpolation, it is possible to use a variant of Bresenham's algorithm (usually employed to drive bit-mapped displays). Doing so is fast, and photo makes use of this trick. Apart from speed and rounding errors, it is equivalent to a naive linear interpolator.
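The idea can be sketched as follows (this is only an illustration, not photo's actual code; the function name and signature are invented). As in line drawing, the fractional part of the slope is carried in an integer error term, so each output sample costs only integer additions and no division:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: fill out[0..n] with integer samples interpolated linearly from
 * y0 to y1, using Bresenham-style error accumulation. Each step does only
 * integer adds; err carries the remainder of i*|dy|/n. */
static void bresenham_interp(int y0, int y1, int n, int *out)
{
    int dy = y1 - y0;
    int step = (dy >= 0) ? 1 : -1;
    int ady = abs(dy);
    int y = y0;
    int err = n/2;                /* offset so values round to nearest */

    assert(n > 0);
    for (int i = 0; i <= n; i++) {
        out[i] = y;
        err += ady;
        while (err >= n) {        /* may step more than once if |dy| > n */
            err -= n;
            y += step;
        }
    }
}
```

Note that out[i] equals y0 + step*((i*|dy| + n/2)/n), i.e. the rounded value a naive linear interpolator would produce.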

Interpolation

Given a set of pixel values, I[i], with one value (say I[0]) missing, how should we go about replacing it? Due to laziness on the part of RHL, we restrict ourselves to 1-dimensional interpolation; the adopted algorithm would trivially extend to working in 2-d, but the book-keeping is a little intimidating.

The simplest solution would be to interpolate linearly,

    I'[0] = (I[-1] + I[1])/2,
but this doesn't perform well for marginally sampled data.

We know something about the auto-correlation present in our data, as it has been convolved with the PSF; we should look for interpolation schemes that minimise errors when interpolating over missing data in a signal consisting solely of the PSF.

Interpolation: Theoretical Considerations

As usual, there are a variety of ways of considering this problem, all of which are closely related. We give three approaches below.

Software Bias

As we are using unsigned 16-bit integers, bias and sky subtraction could lead to negative pixels, which would be a problem; photo gets around this by adding an arbitrary constant to the data just after removing the bias (it's currently 1000 and is a #define in the C code); when sky is subtracted the soft_bias is added back in (and removed from the sky estimate!). As there was a hardware bias of this order in the raw frames, doing so doesn't reduce the dynamic range of the data.
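The bookkeeping can be sketched as follows (the constant's value is the 1000 quoted above; the function names are invented for illustration and are not photo's API):

```c
#include <assert.h>

#define SOFT_BIAS 1000  /* photo's value; a #define in the C code */

/* Bias subtraction: add SOFT_BIAS so marginally negative results
 * stay representable in an unsigned 16-bit pixel. */
static unsigned short subtract_bias(int raw, int bias)
{
    return (unsigned short)(raw - bias + SOFT_BIAS);
}

/* Sky subtraction: the sky estimate was measured on soft-biased data, so
 * SOFT_BIAS is removed from it before subtracting; the pixel thus keeps
 * its soft bias, and values slightly below sky remain positive. */
static unsigned short subtract_sky(unsigned short pix, int sky_estimate)
{
    return (unsigned short)(pix - (sky_estimate - SOFT_BIAS));
}
```

For example, a raw pixel of 1005 with a bias of 1000 becomes 1005 (true signal 5 plus the soft bias); a pixel 2 counts below the sky level ends up at 998 rather than wrapping around to 65534.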

Removing the Bias due to Choosing the Minimum of Two Pixel Values

It is easy enough to calculate that the expectation value of the minimum of two Gaussian variates (with zero mean and unit variance; N(0,1)) is -1/sqrt(pi) ~ -0.5641895835. If we ever replace a pixel value by the minimum of two estimates of its true value, we will systematically underestimate its value; we should (and do) add a correction of approximately 0.56*sigma.
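In code, the correction is a one-liner (a sketch with an invented function name; the constant is 1/sqrt(pi)):

```c
#include <assert.h>
#include <math.h>

/* The expectation of the minimum of two N(0,1) variates is -1/sqrt(pi)
 * ~ -0.5642, so taking a minimum biases the estimate low; add the
 * correction, scaled by the per-pixel noise sigma, back in. */
static double min_with_correction(double a, double b, double sigma)
{
    const double bias = 0.5641895835;   /* 1/sqrt(pi) */
    double m = (a < b) ? a : b;
    return m + bias * sigma;
}
```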

Determining a Frame's Statistics using regStatsFromQuartiles

The routine regStatsFromQuartiles constructs a histogram of pixel intensities for an image, and derives any of a wide range of parameters. See the source code for details.
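The histogram approach can be sketched like this (an illustration, not the actual regStatsFromQuartiles code): bin the 16-bit pixel values, then walk the cumulative counts to read off any desired percentile, e.g. the quartiles.

```c
#include <assert.h>
#include <string.h>

/* Return the pixel value at the given percentile (frac in [0,1)) by
 * accumulating a histogram of 16-bit intensities and walking the
 * cumulative counts until the target count is exceeded. */
static unsigned short histogram_percentile(const unsigned short *pix,
                                           long npix, double frac)
{
    static long hist[65536];
    long cum = 0;
    long target = (long)(frac * npix);

    memset(hist, 0, sizeof(hist));
    for (long i = 0; i < npix; i++) hist[pix[i]]++;
    for (long v = 0; v < 65536; v++) {
        cum += hist[v];
        if (cum > target) return (unsigned short)v;
    }
    return 65535;
}
```

Because the histogram is built in a single pass, the quartiles (and hence the statistics derived from them) cost O(npix) rather than the O(npix log npix) of a sort.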

Fitting Taut Splines to Data

A taut spline is a regular (cubic) spline with the extra constraint that there be no superfluous turning points; it is rather as if you took the flexible strip of wood from which the term spline derives and pulled on each end to remove unwanted wiggles.

This is achieved by inserting extra knots at cleverly chosen places, and seems to work well in practice.

Using asinh to Control the Dynamic Range of Data that can be Negative

It is common practice to take the logarithm of numbers that span many orders of magnitude, e.g. to display an object's radial profile. This has two main disadvantages: the logarithm is undefined for values that are zero or negative, and it greatly amplifies the noise in values near zero. Both of these problems can be circumvented by using asinh(x/Delta) instead of lg(x), where Delta is some constant with magnitude comparable to the expected error in x. This has the nice property of being (within a scale) a simple logarithm for x >> Delta, and approximately linear for x <~ Delta.

For more details, see the paper by Lupton, Gunn, and Szalay.

How to Estimate the Mode

The result that mode = 3*median - 2*mean is readily proved from a Gram-Charlier series taken up to the third Hermite polynomial; to the same order it is easily shown that
      mode = median - (Q25 + Q75 - 2*median)/eta^2
where eta ~ 0.6745 is the 75th percentile of an N(0,1) Gaussian (so 1/eta^2 ~ 2.2). We can then estimate the mode as
      5.4*median - 2.2*(Q25 + Q75)
The variance of this estimate is approximately
      (4.78)^2 sigma^2/N;
much worse than the mean (with variance sigma^2/N), and with a standard deviation 3.8 times that of the median (whose variance is (pi/2)sigma^2/N).
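The quartile-based estimate can be sketched as (illustrative function name; the constant eta ~ 0.6745 is the 75th percentile of N(0,1), so 1/eta^2 ~ 2.2):

```c
#include <assert.h>
#include <math.h>

/* mode ~ median - (Q25 + Q75 - 2*median)/eta^2; expanding the brackets
 * gives the equivalent form ~ 5.4*median - 2.2*(Q25 + Q75). */
static double mode_estimate(double q25, double median, double q75)
{
    const double eta = 0.6744897502;   /* Phi(eta) = 0.75 */
    return median - (q25 + q75 - 2.0*median)/(eta*eta);
}
```

For a symmetric distribution Q25 + Q75 = 2*median and the estimate reduces to the median, as it should; any skewness in the quartiles pulls the mode away from the median in the opposite direction to the longer tail.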