V. Adding Variables and Observations to Data Sets: The SET Statement

A. Concatenating Data Sets

When used with a single data set, the SET statement lets you read and modify the data. When used with two or more data sets, it also concatenates them, stacking one data set on top of another. SAS reads all observations from the first data set, then all observations from the second, and so on until every observation has been read. Concatenation is useful when you want to combine data sets that share most or all of their variables but contain different observations.

The number of observations in the new data set is the sum of the observations in the original data sets. The observations appear in the order in which the original data sets are listed in the SET statement. If a variable exists in one data set but not in another, the observations that come from the data set lacking that variable will have missing values for it.
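
As a minimal sketch of the missing-value behavior described above, the following program concatenates two small data sets that share only one variable. The data set names a, b, and ab, the variable names, and the values are hypothetical and are not part of the course data.

    /*Hypothetical data set containing id and height only*/
    data a;
    input id height;
    datalines;
    1 60
    2 65
    ;
    run;

    /*Hypothetical data set containing id and weight only*/
    data b;
    input id weight;
    datalines;
    3 120
    ;
    run;

    /*Concatenate: all observations from a, then all from b*/
    data ab;
    set a b;
    run;

    PROC PRINT data=ab;
    title 'Concatenation with Non-Matching Variables';
    run;

The data set ab should contain three observations (two from a plus one from b). The two observations from a have a missing value for weight, and the observation from b has a missing value for height.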

*This program assumes that the course library has been assigned and that the data sets course.male_htwt and course.female_htwt have already been created*

         /*Create temporary data sets*/
    data male_htwt;
    set course.male_htwt;
    run;

    data female_htwt;
    set course.female_htwt;
    run;

         /*Add observations by creating a new data set*/
    data concat;
         /*concatenate the data using a SET statement*/
         /*create variables that indicate whether each data
           set contributed data to the current observation,
           using in=*/
    set male_htwt (in=m1)
        female_htwt (in=m2);
         /*make the indicators permanent variables*/
    inmale=m1;
    infemale=m2;
    run;

    PROC PRINT data=concat;
    title 'Data=Male and Data=Female Concatenated';
    run;
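
The two indicator variables can also be collapsed into a single variable that records where each observation came from. The following is only a sketch: the data set name concat_src and the variable name source are hypothetical, and it assumes that neither input data set already contains a variable called source.

    /*Hypothetical variation: record the origin of each
      observation in one character variable*/
    data concat_src;
    set male_htwt (in=m1)
        female_htwt (in=m2);
         /*source is assumed not to exist in either input data set*/
    length source $ 6;
    if m1 then source='Male';
    else if m2 then source='Female';
    run;

Because the variables named in in= are temporary and are not written to the output data set, their values must be copied or recoded into ordinary variables such as source if they are needed later.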

B. Interleaving Data Sets

If your data sets are sorted by some variable, simply concatenating them as shown previously may leave the combined data set unsorted. To combine observations from two or more data sets in a particular order, add a BY statement after the SET statement; this is generally more efficient than concatenating the data sets and then sorting the result. This process is called interleaving data sets.

Before you can interleave data sets, you must sort each one by the interleaving variable using PROC SORT. As with concatenation, the number of observations in the new data set equals the sum of the observations in the original data sets. If a data set lacks a variable that the other data sets contain, its observations will have missing values for that variable.

    /*This program assumes that the data sets htwt, male_htwt, and female_htwt have already been created*/
    /*Sort the male and female data sets BY age*/

    PROC SORT data=male_htwt;
    by age;
    run;

    PROC SORT data=female_htwt;
    by age;
    run;

    /*Create a new data set interleaving the male and female data sets by age*/
    data interleave; 
    set male_htwt female_htwt; 
    by age; 
    run; 

    PROC PRINT data=interleave; 
    title 'Interleaving Male and Female Data Sets by Age'; 
    run;
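
To see the ordering that interleaving produces, here is a small self-contained sketch. The data sets m, f, and mf, along with the names and ages in them, are hypothetical and are not part of the course data. The values are entered already in age order, so this sketch omits the PROC SORT step.

    /*Hypothetical data, already in age order*/
    data m;
    input name $ age;
    datalines;
    Al 12
    Bo 15
    ;
    run;

    data f;
    input name $ age;
    datalines;
    Cy 13
    Di 15
    ;
    run;

    /*Interleave the two data sets by age*/
    data mf;
    set m f;
    by age;
    run;

    PROC PRINT data=mf;
    title 'Interleaving Two Small Data Sets by Age';
    run;

The observations in mf should appear in the order Al (12), Cy (13), Bo (15), Di (15). When two data sets have the same BY value, the observations from the data set listed first in the SET statement come first.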
