III. Explore the Data: Statistics for Numeric Data

Certain SAS procedures compute statistics only for numeric variables. Two such procedures, PROC MEANS and PROC UNIVARIATE, are illustrated here using the height/weight SAS data set. (PROC SUMMARY computes the same statistics as PROC MEANS, but it does not print them unless you add the PRINT option; it is usually paired with an OUTPUT statement.)

1. PROC MEANS

PROC MEANS: Example 1

*****************************************************
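*This program creates output (Example 1) using the  *
*default settings of PROC MEANS: every numeric      *
*variable is analyzed, and N, Mean, Std Dev,        *
*Minimum, and Maximum are reported.                 *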
*****************************************************;
 
proc means data=htwt;/* Begin the PROC step */
                            /* Add 2 titles */
  title1 'PROC MEANS:  Example 1';
  title2 'No keywords specified';
run;             /* End the PROC step */
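By default PROC MEANS reports N, Mean, Std Dev, Minimum, and Maximum for each numeric variable. As a cross-check of what those numbers mean, the sketch below recomputes them in Python for one variable. It is an illustration only, assuming a small hypothetical list of ages (not the htwt data) with `None` standing in for a SAS missing value; the standard deviation uses the n - 1 divisor, which is the PROC MEANS default (VARDEF=DF).

```python
import math

def default_means(values):
    """Recompute the five statistics PROC MEANS prints by default for one
    numeric variable. Missing values (None) are excluded, as in SAS."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    # Sample standard deviation: divide by n - 1 (VARDEF=DF default)
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1))
    return {"N": n, "Mean": mean, "Std Dev": std,
            "Minimum": min(present), "Maximum": max(present)}

# Hypothetical ages; these are NOT the values in the htwt data set
ages = [12, 14, None, 15, 13, 16]
stats = default_means(ages)
print(stats)
```

Note that N counts only the nonmissing values, so the `None` entry does not contribute to any statistic.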

PROC MEANS: Example 2

*****************************************************
*This program specifies a series of keywords and    *
*optional statements to create output (Example 2)   *
*using PROC MEANS. The CLASS statement avoids having*
*to sort the data first. Because PROC MEANS holds   *
*every CLASS combination in memory, CLASS is best   *
*suited to smaller data sets or to a few CLASS      *
*variables.                                         *
*****************************************************;
 
 /*Some of the keywords available with PROC MEANS:
                N - number of nonmissing values
                MEAN - mean value
                MIN - minimum value
                MAX - maximum value
                SUM - sum of the values
                NMISS - number of missing values
   Option (not a statistic keyword):
                MAXDEC=n - maximum number of decimal
                           places displayed */
proc means data=htwt n 
   mean min max sum nmiss maxdec=1;
   /*Apply analysis only to "age" variable*/
  var age;    
   /*Separate the analysis by values of sex*/
  class sex;   
   /* Add 3 titles */
  title1 'PROC MEANS:  Example 2';
  title2 'Use of VAR, CLASS, and TITLE statements';
  title3 'CLASSED by gender';
run;            
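The CLASS grouping in Example 2 can be sketched in Python as a dictionary keyed by class level, which shows why no prior sort is needed: each record is dropped into its group regardless of order. This is a hypothetical illustration, not SAS code. The records and the function name `class_means` are invented, `None` stands for a missing value, and rounding happens only when printing because MAXDEC= controls display, not the computed values.

```python
from collections import defaultdict

def class_means(rows, var, class_var):
    """Group rows by class_var (no sorting needed) and compute the
    statistics requested in Example 2: N, MEAN, MIN, MAX, SUM, NMISS.
    Assumes every group has at least one nonmissing value."""
    groups = defaultdict(list)
    for row in rows:                      # rows may arrive in any order
        groups[row[class_var]].append(row[var])
    table = {}
    for level in sorted(groups):          # report levels in ascending order
        vals = groups[level]
        present = [v for v in vals if v is not None]
        table[level] = {
            "N": len(present),
            "Mean": sum(present) / len(present),
            "Min": min(present),
            "Max": max(present),
            "Sum": sum(present),
            "NMiss": len(vals) - len(present),
        }
    return table

# Hypothetical, unsorted records (not the real htwt data)
rows = [
    {"sex": "M", "age": 14}, {"sex": "F", "age": 13},
    {"sex": "M", "age": None}, {"sex": "F", "age": 15},
    {"sex": "M", "age": 16},
]
for level, s in class_means(rows, "age", "sex").items():
    # Like MAXDEC=1: round only for display
    print(level, {k: round(v, 1) for k, v in s.items()})
```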

PROC MEANS: Example 3

*****************************************************
*This program generates output (Example 3) with the *
*same statistics as Example 2, but the BY statement *
*prints a separate table for each value of sex. It  *
*also creates another SAS data set. Additional      *
*resources are used because the data must be sorted *
*first.                                             *
*****************************************************;
 
 /* Sort the data first because a BY statement
    is being used in the next PROC step */
       /*Sort by sex */
proc sort data=htwt;  
  by sex;       
run;

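proc means data=htwt n 
    mean min max sum nmiss maxdec=1;
     /*Apply analysis only to "age" variable*/
  var age;
     /*Separate the output by sex*/
  by sex;        

 /*Create a temporary SAS data set (in the WORK
      library) containing statistics generated by
      PROC MEANS. With no statistic keywords on the
      OUTPUT statement, it holds N, MIN, MAX, MEAN,
      and STD, labeled by the _STAT_ variable. */
  output out=agedata; 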

  /* Add 3 titles */
  title1 'PROC MEANS:  Example 3';
  title2 'Use of VAR, BY and OUTPUT statements';
  title3 'SORTED by gender';
run; 
 
   /*Display values of the new data set*/
proc print data=agedata; 

   /* Add a 4th title*/
  title4 'A print of the OUTPUT data set'; 
run;
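The reason BY processing needs sorted input can be seen with Python's `itertools.groupby`, which, like a BY statement, treats only adjacent records with the same key as one group. The sketch below is a hypothetical illustration (invented records and function name `by_group_output`): it sorts first, mirroring the PROC SORT step, then writes one row per statistic, loosely modeled on the data set that OUTPUT OUT= creates when no statistic keywords are given. The real SAS data set also carries `_TYPE_` and `_FREQ_` variables, which are omitted here.

```python
from itertools import groupby
import statistics

def by_group_output(rows, var, by_var):
    """Sketch of BY-group processing for one analysis variable.

    groupby() merges only *adjacent* rows that share a key, so the rows
    are sorted first, playing the role of the PROC SORT step."""
    rows = sorted(rows, key=lambda r: r[by_var])
    out = []
    for key, grp in groupby(rows, key=lambda r: r[by_var]):
        present = [r[var] for r in grp if r[var] is not None]
        # One row per statistic, loosely modeled on the _STAT_ layout
        # (assumes at least two nonmissing values per group for STD)
        for stat, value in [
            ("N", len(present)),
            ("MIN", min(present)),
            ("MAX", max(present)),
            ("MEAN", statistics.mean(present)),
            ("STD", statistics.stdev(present)),
        ]:
            out.append({by_var: key, "_STAT_": stat, var: value})
    return out

# Hypothetical, unsorted records (not the real htwt data)
rows = [
    {"sex": "M", "age": 14}, {"sex": "F", "age": 13},
    {"sex": "M", "age": None}, {"sex": "F", "age": 15},
    {"sex": "M", "age": 16},
]
for row in by_group_output(rows, "age", "sex"):
    print(row)

# Without sorting, groupby() splits one level into two groups:
keys = [k for k, _ in groupby(["M", "F", "M"])]
print(keys)  # ['M', 'F', 'M']
```

The last two lines show the failure mode that sorting prevents: unsorted input would yield two separate "M" groups, just as unsorted data breaks a BY statement.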

 /*Remove Titles 2-4 from the next set of output.
   Clearing TITLE2 also clears every higher-numbered
   title, so this one statement removes titles 2-4. */
title2;
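SAS titles follow a stacking rule: a TITLEn statement replaces title n and cancels every title numbered above n, and a TITLEn statement with no text clears title n as well, so `title2;` by itself clears titles 2 through 10. The hypothetical Python model below (`apply_title` is an invented name, not a SAS feature) replays the titles from Example 3 and its print step to show that clearing TITLE2 leaves only TITLE1 in effect.

```python
def apply_title(titles, n, text=None):
    """Model of the TITLEn rule: keep titles numbered below n, then set
    title n to text. With text=None (a bare `titleN;`), title n is left
    cleared along with every higher-numbered title."""
    kept = {k: v for k, v in titles.items() if k < n}
    if text is not None:
        kept[n] = text
    return kept

titles = {}
titles = apply_title(titles, 1, "PROC MEANS:  Example 3")
titles = apply_title(titles, 2, "Use of VAR, BY and OUTPUT statements")
titles = apply_title(titles, 3, "SORTED by gender")
titles = apply_title(titles, 4, "A print of the OUTPUT data set")
titles = apply_title(titles, 2)   # the bare `title2;` statement
print(titles)  # {1: 'PROC MEANS:  Example 3'}
```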

2. PROC UNIVARIATE

PROC UNIVARIATE provides additional statistics, such as the mode and a table of extreme observations, that PROC MEANS does not display by default.

PROC UNIVARIATE: Example

******************************************************
*This program uses PROC UNIVARIATE to create         *
*detailed output of numeric statistics               *
*(Univariate example) on the "age" variable.         *
******************************************************;
 
proc univariate data=htwt;
                           
  var age;   /*Apply analysis only to "age" variable*/
                    /* Add a title */
  title1 'PROC UNIVARIATE example';
run;           
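Among the location measures PROC UNIVARIATE prints are the mean, median, and mode. The Python sketch below recomputes them for a hypothetical list of ages (not the htwt data), with `None` as a missing value. When several values tie for most frequent, it returns the smallest, which is intended to follow PROC UNIVARIATE's documented behavior of displaying the lowest mode; `location_measures` is an invented name.

```python
import statistics

def location_measures(values):
    """Mean, median, and mode of the nonmissing values. If several values
    tie for most frequent, the smallest one is returned as the mode."""
    present = [v for v in values if v is not None]
    modes = statistics.multimode(present)   # all most-frequent values (3.8+)
    return {
        "Mean": statistics.mean(present),
        "Median": statistics.median(present),
        "Mode": min(modes),
    }

ages = [13, 14, 14, 15, 15, 16, None]   # hypothetical values
print(location_measures(ages))  # {'Mean': 14.5, 'Median': 14.5, 'Mode': 14}
```

Here 14 and 15 each occur twice, so there are two modes and the smaller one is reported.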

EXPLORE THE DATA - PRACTICE QUESTIONS (numeric)

These questions assume that a permanent SAS data set has been created from the sample clinical data. The format file does not need to be included for this section. Examples are given of how the program, log, and output might look.

  1. Generate numeric statistics using the default setting for PROC MEANS.

  2. Obtain the mean values for heart rate and systolic and diastolic blood pressure, limiting the output to 2 decimal places and reporting the number of missing values for each variable.

  3. Repeat question 2, this time obtaining the mean values of the 3 variables for each gender and pregnancy status (pregnant or not). Save these values to a separate data set and print a listing of them. (3 procedures)

  4. Obtain mean, median, and mode values for systolic and diastolic blood pressure.

