VI. Data Processing: By-Group Processing (FIRST./LAST.)

Purpose

By-group processing refers to the use of a BY statement in a DATA step, which permits identification of the first- and last-occurring record for each of the specified BY variables. Two dichotomous (1/0) variables are automatically created for each variable specified in the BY statement when using SET, MERGE, or UPDATE: FIRST.varname and LAST.varname, where varname is the name of the BY variable(s). By creating these variables, a number of various calculations are possible, such as obtaining a count of records for each unique identifier.

Syntax

BY varname1 varname2...;

For the first record in a BY group, the value of the FIRST.varname1 is set to 1, with all other records in the BY group set to 0. For the last record in a BY group, the value of the LAST.varname1 is set to 1, with all other records set to 0. If the data are sorted by more than one BY variable, the FIRST.varname for each variable is set to 1 at the first occurrence of a new value for the variable. FIRST. and LAST. variables are temporary variables that are only available for the current data step. You can create permanent variables equal to the temporary FIRST. and LAST. variables (firstvar=FIRST.var;). These permanent variables will be available in subsequent PROC and DATA steps.

Example

PROC SORT data=hosp;
     BY phin;
RUN;

DATA dup;
     SET hosp;
     BY phin;                           (1)
     firstfl=FIRST.phin;                (2)
     lastfl=LAST.phin;                  (3)
RUN;

(1) Set the data by PHIN (already previously sorted by this variable) in order to create FIRST.PHIN and LAST.PHIN.

(2) Create a new variable called FIRSTFL and assign it a value of 1 for every FIRST.PHIN=1 encountered.

(3) Create a new variable called LASTFL and assign it a value of 1 for every LAST.PHIN=1 encountered.

Output:

OBS   PHIN   FIRSTFL   LASTFL
1 562737 1 1
2 563850 1 1
3 563961 1 1
4 565858 1 1
5 566739 1 1
6 568729 1 0
7 568729 0 0
8 568729 0 1

9 569961 1 1
10 660861 1 1

In the above example, the person with PHIN 568729 has 3 record (observations 6-8). For the first record (#6), FIRSTFL is set to 1, indicating that it is the first record for that person and all other records for that PHIN show FIRSTFL values set to 0. For the third and last record, LASTFL is set to 1, indicating that it is the last record for that person and all other records show LASTFL values set to 0.

Caution: When conducting BY-group processing, DO NOT do any data exclusions; data manipulation is ok. Data exclusions can be done in a subsequent data step. Data exclusions conducted during a data step with FIRST. and LAST. processing can cause unexpected results by eliminating the FIRST. and LAST. records for each BY-group. The only time data exclusions can be done with BY-group processing is with a subsetting WHERE statement, which is applied to the data set coming in, before any BY-group processing is carried out.


 << Previous

 Index

 Next >>