PCOMP



The PCOMP function computes the principal components of an m-column, n-row array, where m is the number of variables and n is the number of observations or samples. The principal components of a multivariate data set may be used to restate the data in terms of derived variables or may be used to reduce the dimensionality of the data by reducing the number of variables (columns).

This routine is written in the IDL language. Its source code can be found in the file pcomp.pro in the lib subdirectory of the IDL distribution.

Syntax

Result = PCOMP( A [, COEFFICIENTS=variable] [, /COVARIANCE] [, /DOUBLE] [, EIGENVALUES=variable] [, NVARIABLES=value] [, /STANDARDIZE] [, VARIANCES=variable] )

Return Value

The result is an nvariables-column (nvariables ≤ m), n-row array of derived variables.

Arguments

A

An m-column, n-row, single- or double-precision floating-point array.

Keywords

COEFFICIENTS

Use this keyword to specify a named variable that will contain the principal components used to compute the derived variables. The principal components are the coefficients of the derived variables and are returned in an m-column, m-row array. Each row of this array holds the coefficients of one derived variable. The coefficients are scaled so that the sum of the squares of each row equals the eigenvalue from which that row is computed.
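The scaling described above can be checked directly. The following is a minimal sketch, assuming a random test array (the name data and its contents are illustrative only, not part of this reference). It compares the sum of squares of each row of the coefficient array with the corresponding eigenvalue.

```idl
; Illustrative data only: 4 variables (columns), 20 observations (rows).
data = RANDOMU(seed, 4, 20)

result = PCOMP(data, COEFFICIENTS=coefficients, EIGENVALUES=eigenvalues)

; Sum the squared coefficients over dimension 1 (the columns),
; which leaves one value per row of the coefficient array.
sumsq = TOTAL(coefficients^2, 1)

; The two printed vectors should agree to within floating-point rounding.
PRINT, sumsq
PRINT, REFORM(eigenvalues)
```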

COVARIANCE

Set this keyword to compute the principal components using the covariances of the original data. The default is to use the correlations of the original data to compute the principal components.
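As a rough illustration of the difference, the sketch below runs PCOMP both ways on the same array and prints the eigenvalue totals. The random data array and the variable names are placeholders. For correlations, the eigenvalues would be expected to sum to the number of variables (the trace of a correlation matrix); for covariances, to the sum of the column variances. Neither total is stated in this reference, so treat both as expectations to verify.

```idl
; Illustrative data only: 4 variables (columns), 20 observations (rows).
data = RANDOMU(seed, 4, 20)

; Default: principal components from the correlations.
r_corr = PCOMP(data, EIGENVALUES=eig_corr)

; /COVARIANCE: principal components from the covariances.
r_cov = PCOMP(data, EIGENVALUES=eig_cov, /COVARIANCE)

; Expected (assumption): close to 4, the number of variables.
PRINT, 'Correlation eigenvalue total: ', TOTAL(eig_corr)

; Expected (assumption): close to the sum of the column variances.
PRINT, 'Covariance eigenvalue total:  ', TOTAL(eig_cov)
```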

DOUBLE

Set this keyword to use double-precision for computations and to return a double-precision result. Set DOUBLE=0 to use single-precision for computations and to return a single-precision result. The default is /DOUBLE if A is double precision, otherwise the default is DOUBLE=0.

EIGENVALUES

Use this keyword to specify a named variable that will contain a one-column, m-row array of eigenvalues that correspond to the principal components. The eigenvalues are listed in descending order.

NVARIABLES

Use this keyword to specify the number of derived variables. A value of zero, a negative value, or a value greater than the number of columns in the input array results in a complete set (m columns, n rows) of derived variables.
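A short sketch of dimensionality reduction with NVARIABLES follows. The random data array is a placeholder, and the printed dimensions are what the Return Value description implies rather than captured output.

```idl
; Illustrative data only: 4 variables (columns), 20 observations (rows).
data = RANDOMU(seed, 4, 20)

; Keep only the first two derived variables.
reduced = PCOMP(data, NVARIABLES=2)

; Dimensions of the result; expected to be 2 columns by 20 rows.
PRINT, SIZE(reduced, /DIMENSIONS)
```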

STANDARDIZE

Set this keyword to convert the variables (the columns) of the input array to standardized variables (variables with a mean of zero and variance of one).

VARIANCES

Use this keyword to specify a named variable that will contain a one-column, m-row array of variances. Each element is the fraction of the total variance accounted for by the corresponding derived variable; multiply by 100 to express it as a percentage.
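One common use of VARIANCES is to decide how many derived variables to keep. The sketch below treats the elements as fractions of the total variance, as the worked example does when it multiplies them by 100 to print percentages. The random data array and the 0.95 threshold are hypothetical values chosen for illustration.

```idl
; Illustrative data only: 4 variables (columns), 20 observations (rows).
data = RANDOMU(seed, 4, 20)
result = PCOMP(data, VARIANCES=variances)

; Running total of the variance fractions, in descending-eigenvalue order.
cumulative = TOTAL(REFORM(variances), /CUMULATIVE)

; Smallest number of derived variables whose combined share reaches the cutoff.
threshold = 0.95   ; hypothetical cutoff
nkeep = (WHERE(cumulative GE threshold))[0] + 1

PRINT, 'Cumulative fractions: ', cumulative
PRINT, 'Derived variables needed: ', nkeep
```

The value of nkeep could then be passed back to PCOMP through the NVARIABLES keyword.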

Examples

PRO ex_pcomp 
 
   ;Define an array with 4 variables and 20 observations. 
   array = [[19.5, 43.1, 29.1, 11.9], $ 
            [24.7, 49.8, 28.2, 22.8], $ 
            [30.7, 51.9, 37.0, 18.7], $ 
            [29.8, 54.3, 31.1, 20.1], $ 
            [19.1, 42.2, 30.9, 12.9], $ 
            [25.6, 53.9, 23.7, 21.7], $ 
            [31.4, 58.5, 27.6, 27.1], $ 
            [27.9, 52.1, 30.6, 25.4], $ 
            [22.1, 49.9, 23.2, 21.3], $ 
            [25.5, 53.5, 24.8, 19.3], $ 
            [31.1, 56.6, 30.0, 25.4], $ 
            [30.4, 56.7, 28.3, 27.2], $ 
            [18.7, 46.5, 23.0, 11.7], $ 
            [19.7, 44.2, 28.6, 17.8], $ 
            [14.6, 42.7, 21.3, 12.8], $ 
            [29.5, 54.4, 30.1, 23.9], $ 
            [27.7, 55.3, 25.7, 22.6], $ 
            [30.2, 58.6, 24.6, 25.4], $ 
            [22.7, 48.2, 27.1, 14.8], $ 
            [25.2, 51.0, 27.5, 21.1]] 
 
   ;Remove the mean from each variable. 
   m = 4    ; number of variables 
   n = 20   ; number of observations 
   means = TOTAL(array, 2)/n 
   array = array - REBIN(means, m, n) 
 
   ;Compute derived variables based upon the principal components. 
   result = PCOMP(array, COEFFICIENTS = coefficients, $ 
      EIGENVALUES=eigenvalues, VARIANCES=variances, /COVARIANCE) 
   PRINT, 'Result: ' 
   PRINT, result, FORMAT = '(4(F8.2))' 
   PRINT 
   PRINT, 'Coefficients: ' 
   FOR mode=0,3 DO PRINT, $ 
      mode+1, coefficients[*,mode], $ 
      FORMAT='("Mode#",I1,4(F10.4))' 
   eigenvectors = coefficients/REBIN(eigenvalues, m, m) 
   PRINT 
   PRINT, 'Eigenvectors: ' 
   FOR mode=0,3 DO PRINT, $ 
      mode+1, eigenvectors[*,mode],$ 
      FORMAT='("Mode#",I1,4(F10.4))' 
   array_reconstruct = result ## eigenvectors 
   PRINT 
   PRINT, 'Reconstruction error: ', $ 
      TOTAL((array_reconstruct - array)^2) 
   PRINT 
   PRINT, 'Energy conservation: ', TOTAL(array^2), $
      TOTAL(eigenvalues)*(n-1)
   PRINT 
   PRINT, '     Mode   Eigenvalue  PercentVariance' 
   FOR mode=0,3 DO PRINT, $ 
      mode+1, eigenvalues[mode], variances[mode]*100 
 
END 

When the above program is compiled and executed, the following output is produced:

Result:  
 -107.38   13.40   -1.41   -0.03 
    3.20    0.70    5.95   -0.02 
   32.50   38.66   -3.87    0.01 
   40.89   13.79   -4.98   -0.01 
 -107.24   19.36    1.77    0.02 
   18.43  -17.15   -1.47   -0.00 
   99.89   -6.23    0.13    0.02 
   45.38    8.11    6.53   -0.01 
  -21.31  -18.31    3.75   -0.01 
    5.54  -11.17   -4.52    0.02 
   83.14    4.97    0.09    0.01 
   87.11   -3.16    2.81    0.00 
 -101.32  -11.78   -6.12    0.01 
  -73.07    6.24    6.61    0.02 
 -137.02  -19.10    1.33    0.01 
   57.11    6.96    0.84   -0.01 
   42.13  -10.07   -2.14    0.01 
   83.30  -16.69   -2.72   -0.01 
  -54.13    2.56   -4.21   -0.03 
    2.84   -1.06    1.62   -0.01 
 
Coefficients:  
Mode#1    4.8799    5.0568    1.0282    4.7936 
Mode#2    1.0147   -0.9545    3.4885   -0.7743 
Mode#3   -0.6183   -0.9554    0.2690    1.5796 
Mode#4   -0.0900    0.0752    0.0472    0.0022 
 
Eigenvectors:  
Mode#1    0.0665    0.0689    0.0140    0.0653 
Mode#2    0.0690   -0.0649    0.2372   -0.0526 
Mode#3   -0.1601   -0.2473    0.0697    0.4089 
Mode#4   -5.6290    4.7013    2.9540    0.1372 
 
Reconstruction error:  1.44876e-010 
 
Energy conservation:       1748.17      1748.17 
 
     Mode   Eigenvalue  PercentVariance 
       1      73.4205      79.7970 
       2      14.7099      15.9875 
       3      3.86271      4.19818 
       4    0.0159915    0.0173803 

Together, the first two derived variables account for about 96% (79.80% + 15.99%) of the total variance of the original data.
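The reconstruction and energy checks in the example can be understood with a little matrix algebra. The notation below is introduced here for illustration and is not part of the PCOMP interface: A is the mean-removed n-by-m data matrix, U holds unit-length eigenvectors of the covariance matrix in its columns, Λ is the diagonal matrix of eigenvalues, C is the coefficient array, R is the result, and E is the array the example calls eigenvectors. The relation R = A Cᵀ is inferred from the printed output, not stated in this reference.

```latex
% Coefficients: each row is a unit eigenvector scaled by sqrt(eigenvalue)
C = \Lambda^{1/2} U^{\mathsf T}

% Derived variables (as inferred from the example output)
R = A\,C^{\mathsf T} = A\,U\,\Lambda^{1/2}

% The example divides each row of C by its eigenvalue
E = \Lambda^{-1} C = \Lambda^{-1/2} U^{\mathsf T}

% Reconstruction: the scalings cancel and U is orthogonal
R\,E = A\,U\,\Lambda^{1/2}\,\Lambda^{-1/2}\,U^{\mathsf T} = A\,U\,U^{\mathsf T} = A

% Energy conservation for mean-removed data
\sum_{i,j} A_{ij}^{2} = (n-1)\sum_{k=1}^{m} \lambda_k
```

Note that the rows of E are not unit-length eigenvectors; they carry a factor of 1/sqrt(eigenvalue), which is why the Mode#4 row in the printed output has such large entries.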

Version History

Introduced: 5.0

See Also

CORRELATE, EIGENQL

