< Contents

# Numerical algorithms used in PopTools

There is some concern that the statistical algorithms used in Excel can give erroneous results and that the procedures for random number generation do not reach an acceptable standard (Knüsel 1998, McCullough & Wilson 1999). I have tried to use robust algorithms in PopTools to replace the built-in functions of Excel, but there are still some areas that need attention (there is only one of me doing this - sigh). If you need to know about specific algorithm used in PopTools, e-mail me, and I will send the code. If you would like PopTools to become completely open source, I will need to be inundated with requests :-)

Here are some general references for the algorithms used in PopTools:

(1) All PopTools functions and procedures that depend on random variables use the Mersenne Twister algorithm for generation of pseudo-random numbers. This PRNG has excellent statistical properties, and has passed George Marsaglia's battery of tests called DIEHARD. All routines for generating random variables, and for shuffling and resampling call this routine rather than Excel's built-in PRNG. The PopTools function dRandReal(0,1) can be used as a replacement for Excel's RAND() function - it is slower (partly because of calls to the DLL).

(2) The REGRESSION and REGRESS routines are replacements for Excel's LINEST function, which apparently has problems with near singular data matrices. The PopTools routines have been tested with the data sets provided by the National Institutes of Standards in Technology (NIST) (see http://www.nist.gov/itl/div898/strd/). You can download a zipped workbook showing the results of the tests - and methods for using the functions - here. Main features of the function "Regression" are better performance than LINEST with highly collinear problems, and correct R2 with forced intercept. The function can be entered using the menu system.

(3) The new routines for parameter estimation using ML (PopTools version 2.0 and above) use open-source code in the TPMath library. Some routines for function minimisation in PopTools depend on Solver. According to McCullough & Wilson (1999), Solver can report solutions that are not local minima, especially with the default options. Solver is an add-in made by Frontline Systems (not Microsoft).

(4) Some routines in PopTools use Excels' built-in statistical tables (eg, CHIINV, FINV and TTEST). If you see a P-value in the output, it probably depends on these routines, although the regression routines use the TPMath functions.

(5) None of the matrix routines depend on Excel's built-in features; all use routines available in the open-source TPMath library.

(6) The likelihood routines do not use any of Excel's calculation features. They should produce more reliable results at extreme values than Excel's native formulas.

(7) I have tested the routines dPoisson and dNormDist: dPoisson gives accurate results; dNormDist gives slightly more accurate results than Excel's NORMDIST function, but depends on a polynomial approximation that loses accuracy in the tails of the distribution.

(8) The ANOVA function in PopTools uses the simple algorithm discussed in Sokal & Rohlf (1995). It is useful -- in the sense that it returns a dynamic result which can used, say, to simulate an experiment -- but it should not be used for a formal statistical test, since it has the same inherent problems as Excel's algorithm, which are described by McCullough & Wilson (1999).

(9) The functions dVariance and dVarianceP introduced in PopTools 1.3.7 are replacements for Excel's formulas (VAR and VARP). They use the usual formula:

rather than the hand computational formula

I tested then with Numacc3 and Numacc4 (downloaded from http://www.nist.gov/itl/div898/strd/). As stated in McCullough & Wilson (1999), they return several accurate digits with these data sets.