next up previous contents
Next: Choice of Iterative Algorithm Up: User Guide for Previous: Dielectric Functions

 

Choice of FFT Algorithm

One major change in going from DDSCAT.4b to 4c and 5a was modification of the code to permit use of the GPFA FFT algorithm developed by Dr. Clive Temperton. DDSCAT continues to offer both the Brenner code as well as the ``old'' Temperton code as alternative FFT implementations. The ``old'' Temperton code requires about 11% more memory than either the Brenner or GPFA codes (to use the ``old'' Temperton algorithm, DDSCAT.f must be compiled with MXMEM=1 rather than MXMEM=0).

To help persuade the user that the GPFA code is an important step forward, we provide a program TSTFFT to compare the performance of different 3-D FFT implementations. To compile, link, and run this program on a Unix system,gif position yourself in the DDA/src directory and type

make tstfft tstfft

Output results will be written into a file res.dat. Here is a copy of the res.dat file created when run on a Sun Ultrasparc 170:

 CPU time (sec) for different 3-D FFT methods
 Machine= Sun Ultrasparc-170
 parameter LVR = 64
 LVR="length of vector registers" in gpfa2f,gpfa3f,gpfa5f
==========================================================
                     Brenner      Temperton      Temperton
   NX  NY  NZ        (FOURX)        (Old)          (GPFA)
    2   2   2       0.000068       0.000065       0.000262
    3   3   3       0.000200       0.000077       0.000096
    4   4   4       0.000064       0.000086       0.000111
    5   5   5       0.000651       0.000292       0.000137
    6   6   6       0.001171       0.000170       0.000223
    8   8   8       0.000350       0.000250       0.000323
    9   9   9       0.004654       0.000300       0.000505
   10  10  10       0.005218       0.000457       0.000575
   12  12  12       0.008664       0.000814       0.000751
   15  15  15       0.023387       0.001583       0.001798
   16  16  16       0.003137       0.001359       0.002386
   18  18  18       0.042196       0.003908       0.003479
   20  20  20       0.043754       0.003881       0.004010
   24  24  24       0.075672       0.010722       0.007198
   25  25  25       0.103492       0.008142       0.011232
   27  27  27       0.160173       0.011126       0.012293
   30  30  30       0.185177       0.017111       0.014631
   32  32  32       0.042211       0.034032       0.023245
   36  36  36       0.322581       0.071013       0.027502
   40  40  40       0.379551       0.133391       0.044182
   45  45  45       0.797107       0.199537       0.069740
   48  48  48       0.668408       0.255849       0.094520
   50  50  50       0.983722       0.287208       0.115833
   54  54  54       1.530319       0.427839       0.142388
   60  60  60       1.804154       0.607189       0.181221
   64  64  64       1.005013       0.806311       0.342762
   72  72  72       3.812469       1.219364       0.362328
   75  75  75       5.364067       1.234528       0.437496
   80  80  80       4.509048       1.576118       0.563839
   81  81  81       8.017456       1.798195       0.560326
   90  90  90      10.201881       2.455399       0.779869

It is clear that the GPFA code is generally MUCH faster than the Brenner FFT (by a factor of 13 for the 90tex2html_wrap_inline321190tex2html_wrap_inline321190 case) and significantly faster than the ``old'' Temperton FFT code (by a factor of 3 for the 90tex2html_wrap_inline321190tex2html_wrap_inline321190 case), although for some cases the differences are not large (e.g., 32tex2html_wrap_inline321132tex2html_wrap_inline321132). Since the GPFA code is memory-efficient as well, it appears to be the method of choice on scalar machines. It appears to also be best on vector machines.

The GPFA code contains a parameter LVR which is set in data statements in the routines gpfa2f, gpfa3f, and gpfa5f. LVR is supposed to be optimized to correspond to the ``length of a vector register'' on vector machines. As delivered, this parameter is set to 64, which is supposed to be appropriate for Crays other than the C90 (for the C90, 128 is supposed to be preferable).gif The value of LVR is not critical for scalar machines, as long as it is fairly large. We found little difference between LVR=64 and 128 on a Sparc 10/51 and on an Ultrasparc 170. You may wish to experiment with different LVR values on your computer architecture. To change LVR, you need to edit gpfa.f and change the three data statements where LVR is set.

The choice of FFT implementation is obtained by specifying one of:


next up previous contents
Next: Choice of Iterative Algorithm Up: User Guide for Previous: Dielectric Functions

Bruce Draine
Thu Aug 10 09:34:16 EDT 2000