VI. Data Processing: RETAIN Statement


Many calculations and manipulations can be done within a single observation, but sometimes it is necessary to calculate across observations. The RETAIN statement keeps the value of a specified variable (one assigned by an INPUT statement or an assignment statement) from the current iteration of the DATA step to the next. Without it, SAS automatically sets such variables to missing at the start of each iteration of the DATA step. Retaining values across observations makes it possible to, for example, compute a running total, count the number of occurrences of a variable's value, or set indicators within a BY-group. RETAIN statements are often used with FIRST. and LAST. processing.
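As a minimal sketch of the running-total case, the program below compares a retained variable with one that is not retained. The data set VISITS and the variables DAY, N_VISITS, TOTAL and NO_RETAIN are hypothetical names made up for this illustration.

```sas
/* Hypothetical data: visit counts for three days */
data visits;
   input day n_visits;
   datalines;
1 5
2 3
3 7
;
run;

data running;
   set visits;
   retain total 0;                    /* total keeps its value across iterations */
   total = total + n_visits;          /* running total: 5, 8, 15 */
   no_retain = no_retain + n_visits;  /* not retained: reset to missing each
                                         iteration, so the sum is always missing */
run;

proc print data=running;
run;
```

Because NO_RETAIN is reset to missing at the start of every iteration, adding N_VISITS to it always yields missing (SAS notes the missing values in the log), while TOTAL accumulates as intended.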


The RETAIN statement can also be used to specify initial values for variables or elements of an array. When a single initial value is given without parentheses, all listed variables or elements are initialized to that value.

RETAIN <varlist> [initial-value(s)];

Varlist: specifies the names of the variables, lists or arrays whose values you wish to retain.

Initial-value(s): an initial value can be numeric or character (e.g., 'y'). A single value given without parentheses is assigned to all of the variables listed before it. If no initial value is specified, the variables start as missing.

The following RETAIN statements each specify four variables.

RETAIN var1-var4 1;
sets the initial values of var1, var2, var3 and var4 to 1.

RETAIN var1-var4 (1);
sets only var1 to 1; var2-var4 are set to missing.

RETAIN var1-var4 (1 2 3 4); OR
RETAIN var1-var4 (1,2,3,4);
sets var1 to 1, var2 to 2, var3 to 3 and var4 to 4.
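One way to check these forms is to write the initial values to the log. The sketch below uses a DATA _NULL_ step so that no data set is created; the variable names A1-A4, B1-B4, C1-C4 and FLAG are hypothetical and chosen only for this illustration.

```sas
data _null_;
   retain a1-a4 1;          /* unparenthesized value: applies to all four variables */
   retain b1-b4 (1);        /* parenthesized single value: applies to b1 only       */
   retain c1-c4 (1 2 3 4);  /* one value per variable, in order                     */
   retain flag 'y';         /* a character initial value                            */

   /* Write each variable and its value to the SAS log */
   put a1= a2= a3= a4=;
   put b1= b2= b3= b4=;
   put c1= c2= c3= c4=;
   put flag=;
run;
```

The log lines can then be compared against the descriptions above; B2 through B4 should appear as missing (.).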

For example, the statement RETAIN pop 1; within a DATA step initializes the variable POP to 1. Unless POP is reassigned later in the step, every observation will have POP equal to 1.

/*Use the RETAIN statement to count the number of observations in
each BY group. An index weight is identified and each subsequent
weight is compared to the index. An example is given for how the
output might look*/
PROC SORT data=htwt_long;
   by name age;
run;

data w_compare (keep=name age index_weight weight over count last_name);
   set htwt_long;
   by name;
   retain count index_weight;
   if first.name then do;
      /*Set and retain the first weight*/
      index_weight = weight;
      count = 0;
   end;
   /*Counter for number of records for each name*/
   count = count + 1;
   /*Assumed: last_name flags the last record for each name*/
   last_name = last.name;
   /*Indicator variable for increased weight*/
   over = (index_weight < weight);
run;

PROC PRINT data=w_compare;
   var name age index_weight weight over count;
   title 'Retain Statement';
run;
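A common variation is to write only one summary record per BY-group by combining RETAIN with LAST. processing. The following is a self-contained sketch under stated assumptions: the data set TOY_LONG and its values are hypothetical stand-ins for a long-format height/weight file, and MAX_WEIGHT is a variable introduced only for this illustration.

```sas
/* Hypothetical long-format data: one record per person per age */
data toy_long;
   input name $ age weight;
   datalines;
Ann 10 70
Ann 11 75
Ann 12 74
Bob 10 80
Bob 11 85
;
run;

proc sort data=toy_long;
   by name age;
run;

data per_name (keep=name count index_weight max_weight);
   set toy_long;
   by name;
   retain count index_weight max_weight;
   if first.name then do;
      count = 0;               /* restart the counter for each name  */
      index_weight = weight;   /* first weight within the BY-group   */
      max_weight = weight;
   end;
   count = count + 1;                      /* records seen so far for this name  */
   max_weight = max(max_weight, weight);   /* largest weight seen so far         */
   if last.name then output;               /* write one record per BY-group      */
run;

proc print data=per_name;
run;
```

With these toy records, PER_NAME holds one row for Ann (count 3, index weight 70, maximum weight 75) and one for Bob (count 2, index weight 80, maximum weight 85).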

The following two questions assume that a permanent SAS data set has been created from the sample Clinical data set, including the format file. Examples are given for how program, log and output might look.

1. Create a BY-group by primary DX.

2. Count the number of observations in each BY-group using the RETAIN statement.

The following questions assume that a permanent SAS data set has been created from the simulated Manitoba Health data set available at MCHP. Examples are given for how program, log and output might look.

3. Use the ARRAY statement to find all the records with a diabetes diagnosis (code 250). Hint: the SUBSTR function is useful here.
