VI. Data Processing: Do Loops


The iterative DO statement is used to repeatedly execute a set of statements occurring between the DO statement and the END statement (the DO loop). It is often used in conjunction with ARRAY statements so that the repeated actions occur within the loop for each of a specified series of variables.


A DO loop begins with an iterative DO statement, continues with one or more other SAS statements, and ends with an END statement. The loop iterates (is processed repeatedly) according to the directions in the DO statement. Its basic form is:

DO <index-variable=1> TO <upper bound of array>;
[ or DO <index-variable=1> TO <upper bound of array> UNTIL
(<specified condition>); ]
[ or DO <index-variable=1> TO <upper bound of array> WHILE
(<specified condition>); ]
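
The difference between the WHILE and UNTIL forms is when the condition is checked: WHILE is evaluated at the top of each iteration, UNTIL at the bottom. The following is a minimal sketch of that difference, not part of the original example; the variable name "total" is chosen only for illustration.

```sas
data _null_;
   total = 0;
   /* WHILE: the condition is tested before the statements run */
   do i = 1 to 10 while (total < 10);
      total = total + i;
   end;
   put 'WHILE loop: ' total=;

   total = 0;
   /* UNTIL: the condition is tested after the statements run */
   do i = 1 to 10 until (total >= 10);
      total = total + i;
   end;
   put 'UNTIL loop: ' total=;
run;
```

Both loops stop well before the upper bound of 10 and write total=10 to the log, because 1+2+3+4 reaches 10 on the fourth iteration. Note that the condition must be enclosed in parentheses.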

Index-variable is a name you choose (e.g., "I"). Its value changes with each iteration of the loop, that is, each time the loop is processed. By default, the value of the index variable (I) is increased by 1 before each new iteration of the loop, so it takes the values 1 through n in turn (where n is the number of variables in the array). A DO loop can also step by 2, or by any other increment, with a BY clause in the DO statement (e.g., do i=1 to 10 by 2;).
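
As a small illustration of the BY clause, the sketch below writes the index variable to the log on each pass. It uses a DATA _NULL_ step so that no data set is created; nothing here comes from the original example beyond the DO statement itself.

```sas
data _null_;
   /* The index starts at 1 and rises by 2: 1, 3, 5, 7, 9 */
   do i = 1 to 10 by 2;
      put i=;
   end;
   /* After the loop, i holds the first value past the upper bound */
   put 'After loop: ' i=;
run;
```

The loop runs five times. The final PUT statement shows i=11, the value that caused the loop to stop.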

Upper bound of array: when the loop is used in conjunction with an array, it executes as many times as there are variables in the array. If there are 16 variables, the loop executes its statements on each of the 16 variables (i.e., DO I=1 to 16). Processing stops when the value of the index variable becomes greater than the upper bound.
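
Rather than typing the upper bound by hand, the DIM function can supply the number of elements in the array. The sketch below counts the non-blank diagnosis fields on each record. It assumes a shortened array of four fields; the data set name "dxcount" and the variable "ndx" are hypothetical, and the sample codes are taken from the listing later in this section.

```sas
data dxcount;
   infile datalines missover;          /* short records leave later fields blank */
   length dx01-dx04 $ 5;
   input dx01 $ dx02 $ dx03 $ dx04 $;
   array diag{*} $ dx01-dx04;          /* {*} lets SAS count the elements */
   ndx = 0;
   do i = 1 to dim(diag);              /* dim(diag) is 4 here */
      if diag{i} ne ' ' then ndx = ndx + 1;
   end;
   drop i;
   datalines;
650 V270
71783
8208 E888 4019 3310
;
run;
```

If the array is later widened, say to dx01-dx16, the DO statement needs no change because DIM picks up the new size.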


The SAS statements in an iterative DO loop often contain references to an array. In the following example, the array name is "diag" and the number of variables given in its subscript is 16. With each iteration of the loop, the value of the subscript is replaced with the current value of the index variable ("i"). Successive iterations of the loop process the statements on consecutive variables in the array. The intent of this example is to search all 16 diagnostic fields and flag any occurrence of a hip diagnosis of ICD-9-CM 820.

ARRAY diag{16} $ dx01 - dx16;                       (1)
hipdiag=0;                                          (2)
DO i=1 to 16;                                       (3)
   IF '820' <= diag{i} <= '82099' THEN hipdiag=1;   (4)
END;                                                (5)

(1) Create an array called "diag" to represent the group of diagnostic fields dx01-dx16 for the duration of the data step.

(2) Create a new variable named "hipdiag" and set it equal to zero. It will remain at 0 if none of the 16 diagnostic fields for that record have a hip diagnosis present.

(3) Perform the actions in the loop sixteen times for each record in the data set. When the value of "i" is 1, SAS reads the array reference as DIAG{1} and processes the statements on DIAG{1}, that is, DX01. In each iteration of the loop, the subscript associated with DIAG is replaced with the current value of the index variable (i).

(4) When a diagnosis within the specified range is encountered, the variable "hipdiag" is assigned a value of 1. The comparison is a character comparison, so the codes are compared as text from left to right.

(5) All iterative DO loops must end with an END statement.


OBS  DX01   DX02   DX03   DX04   DX05   DX06   DX07   DX08...DX16  HIPDIAG
  1  650    V270                                                   0
  2  71783                                                         0
  3  4549                                                          0
  4  V664   82021  E8809  4538   3569   36250  7213   2859         1
  5  V301   7746                                                   0
  6  8208   E888   4019   3310                                     1
  7  4111   4140   4280   4011   42731  586    5990   V668         0
  8  486                                                           0
  9  V72                                                           0
 10  650    V270                                                   0

Observation 4 has hipdiag set to 1 because DX02 has a value of '82021', which falls within the ICD-9-CM range 820-820.99. Observation 6 similarly has hipdiag set to 1 because DX01 has a value of '8208'.
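
For readers who want to run the example, the sketch below wraps the same loop in a complete DATA step. It makes several assumptions that are not part of the original: the array is shortened to eight fields, the data set name "hip" is hypothetical, the records are typed in with DATALINES (they are observations 1, 4, 6 and 8 from the listing above), and an UNTIL condition is added so that the loop stops as soon as a hip diagnosis is found.

```sas
data hip;
   infile datalines missover;          /* short records leave later fields blank */
   length dx01-dx08 $ 5;
   input dx01 $ dx02 $ dx03 $ dx04 $ dx05 $ dx06 $ dx07 $ dx08 $;
   array diag{8} $ dx01-dx08;
   hipdiag = 0;
   /* Stop scanning once a hip diagnosis has been flagged */
   do i = 1 to 8 until (hipdiag = 1);
      if '820' <= diag{i} <= '82099' then hipdiag = 1;
   end;
   drop i;
   datalines;
650 V270
V664 82021 E8809 4538 3569 36250 7213 2859
8208 E888 4019 3310
486
;
run;

proc print data=hip;
run;
```

The printed data set shows hipdiag as 0, 1, 1 and 0 for the four records. Blank diagnosis fields never match, because a blank sorts before '820' in a character comparison.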
