V. Adding Variables and Observations to Data Sets: The MERGE Statement

To match observations from one data set to another, you can use the MERGE statement in the DATA step. If you know that the two or more data sets are in exactly the same order then you do not need a common variable between the data sets (mismatched merge). However, data sets are usually merged together using a merge key (match merge). We will only look at match merges here. A merge key is a variable that is common to both data sets (i.e. a variable that has the same name length in both data sets). Before you can merge the data sets, they both must be sorted by the merge key using PROC SORT.

There are two types of merges: a) One-to-one merges and; b) One-to-many merges.

A. One-to-One Merge

The one-to-one merge combines observations from two or more data sets into a single observation in a new SAS data set.

If you merge two or more data sets and they both have variables with the same names other than the merge key, the variables in the second data set will overwrite the variables with the same name in the first data set.

*This program assumes that the data set htwt has already been created* 
         /*Create a temporary data set*/
    data htwt_reg;
    set course.htwt;

         /*Sort the data sets by the merge key*/
    PROC SORT data=htwt;
    by name;

    PROC SORT data=htwt_reg;
    by firstname;

    data mer;
        merge htwt (in=m1)
         /*Merge keys from both data sets must have the 
           same name, rename the merge key in the htwt_reg 
           data set*/              
              htwt_reg (in=m2 rename=(firstname=name));
        by name;
         /*Create variables that indicate which data set 
           contributed the observations*/

    PROC PRINT data=mer;
    title 'Merged Data Set';
B. One-to-Many Merges

One-to-many merges refer to the case where one data set has one observation for each value of the merge key and the other data set has more than one observation for each value of the merge key.

*This program assumes that the data set htwt has already been created*        
         /*Create a data set with one observation per value of sex*/ 
PROC MEANS data=course.htwt;
class sex;
var age;
/*Create a temporary data set called mage*/
output out=mage mean=mean_age;
    PROC SORT data=course.htwt out=htwt; 
by sex;
    data mer; 
merge htwt (in=m1)
/*Create variables that indicate what data set
contributed their observations. In the data set
mage keep only the variables sex and mean_age*/
mage (in=m2 keep=sex mean_age);
by sex;

CAUTION: It is important to remember to use a BY statement in the merge. By default, SAS will not report an error and you may end up with a mismatched merge instead of the matched merge that was intended.


These questions assume that a permanent SAS data set has been created from the sample Clinical data set, including the format file. Examples are given for how program, log and output might look.

  1. Create two data sets containing: a) Only Males and; b)Females that are not pregnant. Concatenate the data sets and interleave the data sets by date of birth.
  2. Create two new data sets keeping only the following variables: a) id, gender, date of birth, primary DX, and secondary DX and; b) gender and heart rate. Merge the data sets together to add the 'heart rate' variable. Limit the data set to those who have a heart rate over 70.
  3. Create a data set with one observation per value of gender using PROC MEANS. Use the variable 'heart rate'. Find the proportion of observations with heart rate greater than the mean heart rate.

 << Previous


 Next >>