VI. Data Processing: ARRAY Statement

Purpose of Arrays

Arrays are often used in conjunction with DO loops when performing actions for a series of variables. The following example illustrates the same action being performed on two separate diagnostic field variables. The study diagnosis of 820.0 can occur in either of these fields, and the statements are identical except for the name of the diagnostic field. The intent of the following statements is to flag all occurrences of the study diagnosis by creating a new variable - "HIPFRAC" - where '1' indicates the presence of the desired diagnosis.

If '82000'<=DX01<='82009' then HIPFRAC='1'; 
If '82000'<=DX02<='82009' then HIPFRAC='1';

Sixteen diagnostic fields (DX01-DX16), however, would require 16 lines of code.

Array processing can make the program more efficient by streamlining the code required to accomplish the task (depending on the situation, IF-THEN/ELSE statements can run faster; however, writing them out by hand is more error-prone). A specified series of variables is associated with a collective name of your choice; for example, the diagnostic fields DX01 through DX16 could be associated with the name "DIAG", whose elements can then be used like ordinary variables in DATA step manipulations.


Arrays are set up using an ARRAY statement. It can appear anywhere in the DATA step as long as it occurs before any reference to the array. The variables that make up the array are called elements. Individual elements are identified by subscripts (numbers that identify an element's position in the array).

ARRAY array-name {number of variables} variable-1 variable-2 ... variable-n;

Array-name is a name you choose to represent the group of variables (must be 32 characters or fewer beginning with a letter or underscore).

Number of variables tells SAS how many variables are being grouped; it is written as a number enclosed in braces { }, brackets [ ], or parentheses ( ), and it sets the range of valid subscripts.

Variable-1, variable-2,...variable-n lists the names of the variables (the variable list does not have to begin at 1 - e.g., DX5-DX16).
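As a minimal sketch of that last point, the step below groups only DX05 through DX16. The array name LATE and the length of 5 are assumptions chosen for illustration. Subscripts count positions in the variable list, so the first element is DX05, not DX01.

```sas
data _null_;
   /* LATE is a hypothetical array name; its 12 elements are DX05-DX16 */
   array late{12} $ 5 dx05-dx16;

   /* Assign through the ordinary variable name ... */
   dx05 = '82000';

   /* ... and read the same value back through position 1 of the array */
   put 'First element of LATE: ' late{1};
run;
```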


ARRAY diag{16} $ dx01-dx16;

This statement tells SAS to:

  • create a group, or array, named DIAG for the duration of the DATA step.
  • have DIAG represent 16 variables: diagnostic fields DX01 through DX16.

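Note that DX01-DX16 are character variables, so the variable list is preceded by a "$". The "$" is required when the elements have not already been defined as character variables earlier in the DATA step.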

You can refer to the entire array or just one of its elements when performing logical comparisons or arithmetic calculations. All variables listed in the ARRAY statement are assigned extra names with the form array-name{position}, where position is the position of the variable in the list (1,2,3,...,16 in the example). The additional name is called an array reference and the position is often called the subscript.

In the above ARRAY statement, DX01 is assigned the array reference DIAG{1}; DX02 the array reference DIAG{2}; etc. From that point in the data step, you can refer to the variable by either its original name or by its array reference; for example, the names DX01 and DIAG{1} are equivalent.
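Putting the pieces together, the sixteen IF statements described at the start of this section can be collapsed into one DO loop over the array. The following is a sketch only: the data set names CLAIMS and FLAGGED, the ID variable, the index variable I, and the test records are all assumptions made up for illustration.

```sas
/* Hypothetical test data: three records, with only the first three */
/* of the sixteen diagnostic fields filled in                        */
data claims;
   length dx01-dx16 $ 5;
   input id dx01 $ dx02 $ dx03 $;
   datalines;
1 82000 4019 25000
2 4280 82009 .
3 4019 25000 496
;
run;

data flagged;
   set claims;

   /* Group the sixteen diagnostic fields under the name DIAG */
   array diag{16} $ dx01-dx16;

   /* Apply the same test to each field in turn */
   do i = 1 to 16;
      if '82000' <= diag{i} <= '82009' then hipfrac = '1';
   end;

   drop i;   /* the loop index is not needed in the output data set */
run;
```

With this test data, records 1 and 2 should be flagged (through DX01 and DX02 respectively), while HIPFRAC stays blank for record 3.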

Caution: An array is simply a convenient way of temporarily identifying a group of variables; it exists only for the duration of the DATA step. Arrays are not variables.
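To illustrate the caution, the sketch below uses two DATA steps; the data set names STEP1 and STEP2 and the variable FIRST_DX are hypothetical. The variables DX01-DX03 are stored in the output data set, but the array name DIAG is not, so the second step has to declare the array again before using it.

```sas
data step1;
   array diag{3} $ 5 dx01-dx03;
   diag{1} = '82000';        /* same as writing dx01 = '82000' */
run;

data step2;
   set step1;

   /* STEP1 holds DX01-DX03 but no DIAG; the array is redefined here */
   array diag{3} $ dx01-dx03;

   first_dx = diag{1};
run;
```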
