III. Explore the Data: Creating Tables

The SAS procedure PROC FREQ is commonly used to produce summary data in tabular form. Five examples are shown here using this procedure on the height/weight data set. It can be used on either character or numeric data, although a procedure specifically for numeric data (like PROC MEANS or PROC UNIVARIATE) may be more appropriate for numeric variables having many different values.

The following is a summary of options and optional statements that can be used with PROC FREQ. Optional statements can appear in any order within the PROC step, while options are entered at the end of the TABLES statement, after a slash (/) and before the semicolon (;). Note that this list covers only a portion of the options available in SAS:

  • TABLES - optional statement for specifying the variables to be included in the analysis.
  • WEIGHT - optional statement naming a variable whose values are used as weights (counts), so that each observation is counted as that many observations in the tables.
  • CHISQ - option to obtain chi-square tests of association (independence) between the variables in a 2-way or larger table.
  • ALL - option to obtain all of the tests and measures requested by the CHISQ, MEASURES, and CMH options.
  • MISSING - option to include missing values in the calculations within the table.
  • MISSPRINT - option to display the missing values in the tables without including them in the calculations.
  • LIST - option to list values of variables side by side rather than in tabular form.
  • OUT= - option to create a data set containing the output generated by the TABLES statement.
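The WEIGHT statement and the MISSPRINT option in the list above are not used in the five examples that follow. A minimal sketch of both, assuming a small made-up data set of pre-summarized counts (the data set names `counts` and `wtout` and the variables `sex`, `agegrp`, and `n` are hypothetical and do not come from the height/weight data):

```sas
/* Hypothetical pre-summarized data (made up for illustration):  */
/* one row per SEX and AGEGRP cell, with N holding the number of */
/* people in that cell. A "." is read as a missing SEX value.    */
data counts;
  input sex $ agegrp $ n;
  datalines;
F young 12
F old 8
M young 10
M old 6
. young 3
;
run;

/* WEIGHT N: each row is counted as N observations.              */
/* MISSPRINT: the missing-SEX cell is displayed in the table but */
/* left out of the percentages and the chi-square test.          */
proc freq data=counts;
  weight n;
  tables sex * agegrp / chisq missprint out=wtout;
  title1 'WEIGHT and MISSPRINT sketch';
run;

/* List the OUT= data set (table variables plus COUNT and PERCENT) */
proc print data=wtout;
run;
```

With these made-up counts, the four nonmissing cells sum to 36, so the percentages and the chi-square test are based on 36 observations, while the 3 observations with a missing SEX value are shown in the table without being counted.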

PROC FREQ: Example 1

*This program creates output (Example 1)         *;
*using the default setting of PROC FREQ, which   *;
*produces 1-way tables of ALL the variables in   *;
*the data.                                       *;
           /* Begin the PROC step */
proc freq data=htwt;
           /* Add 2 titles */
  title1 'PROC FREQ:  Example 1';
  title2 'No keywords specified';
           /* End the PROC step */
run;

PROC FREQ: Example 2

*This program creates 1-way tables for two variables*;
*(Example 2).                                       *;
proc freq data=htwt;

      /* Produce tables for 2 variables */
  tables sex age;

  title1 'PROC FREQ:  Example 2';
  title2 '1-way tables for variables specified by TABLES keyword';
run;

PROC FREQ: Example 3

*This program creates a 2-way table (a "cross-tab"),*;
*from a subset of the data (Example 3) by adding an *;
*asterisk between the two variables. The            *;
*values for the first variable specified appear on  *;
*the left side of the table while the values for the*;
*second variable appear across the top of the table.*;
*A statistic is requested and a new data set is also*;
*created.                                           *;
proc freq data=htwt;   

             /* Produce cross-tab with chi-square
                 statistic and create a new data set
                 containing the output generated by
                 the TABLES statement*/
  tables sex * age /chisq out=freqtbl; 

             /*Keep only ages 0 to 29 */
  where 0<=age<=29; 

  title1 'PROC FREQ:  Example 3';
  title2 '2-way table using the CHISQ, WHERE, and OUT= keywords';
  title3 'Subsetting ages 0 to 29';
run;

      /* Produce a listing of the new data set */
proc print data=freqtbl;
  title4 'A PRINT of the OUTPUT data set';
run;

PROC FREQ: Example 4

*This program creates a 2-way table listing the     *;
*values of the variables side by side (Example 4).  *;
*This is a useful way of checking the values        *;
*of existing variables against those of new         *;
*variables to ensure they have been accurately      *;
*created.                                           *;
proc freq data=htwt;   

        /* Use the LIST keyword to list the values
           side by side, and the MISSING keyword to
           indicate which variable(s) may have missing
           values */
  tables sex * age /list missing;

  title1 'PROC FREQ:  Example 4';
  title2 '2-way table using LIST and MISSING options';
       /* Redefining TITLE1 and TITLE2 also removes the
          TITLE3 and TITLE4 left over from Example 3 */
run;
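Example 4 describes checking the values of a newly created variable against the existing variable it was built from. A minimal sketch of that workflow, assuming the htwt data set used above with a numeric AGE variable; the new variable AGEGRP, its cutoff at 20, and the data set name htwt2 are hypothetical:

```sas
/* Hypothetical recode of AGE into two groups; the cutoff of 20 is */
/* illustrative only.                                              */
data htwt2;
  set htwt;
  length agegrp $ 8;
  /* Test for missing first: SAS treats a missing numeric value as */
  /* smaller than any number, so "age < 20" alone would put        */
  /* missing ages in the younger group.                            */
  if missing(age) then agegrp = ' ';
  else if age < 20 then agegrp = 'Under20';
  else agegrp = '20plus';
run;

/* One line per AGE/AGEGRP combination, missing values included, */
/* so any miscoded age shows up next to the wrong group.         */
proc freq data=htwt2;
  tables age * agegrp / list missing;
  title1 'Checking the new AGEGRP variable against AGE';
run;
```

Because LIST prints each combination on its own line, a recoding error appears as an AGE value paired with the unexpected AGEGRP value, which is easier to scan than a full cross-tab.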

PROC FREQ: Example 5

*This program creates a 3-way table using three     *;
*variables on a subset of the data (Example 5).     *;
*The first variable represents the control variable,*;
*for which separate output (cross-tabs of the other *;
*two variables) is created for each of its values.  *;
proc freq data=htwt;   

    /* Controlling for "name", produce cross-tabs
          of "height" by "weight"*/
  tables name * height * weight;  

        /*Keep only ages 0 to 27 */
  where 0<=age<28;  

  title1 'PROC FREQ:  Example 5';
  title2 '3-way table: height by weight, controlling for name';
run;

Note: SAS can create tables that cross any number of variables (i.e., an n-way table), but interpretation becomes complicated with too many variables.


These questions assume that a permanent SAS data set has been created from the sample Clinical data set and that the format file has been included. The default setting for PROC FREQ would generate a lengthy list of tables for all numeric and character variables; instead, the variables for analysis should always be specified using a TABLES statement (similar to the VAR statement used in the numeric procedures MEANS and UNIVARIATE). Examples are given for how the program, log, and output might look.

  1. Create one-way tables for each of the following variables: gender, pregnant, primary DX and secondary DX. Add value labels for each of them; the format names are found in the format file for the clinical data set. These one-way tables display the distribution of values for each of the specified variables.

  2. Create separate two-way tables (or cross-tabs), i.e., one variable against the other, for each of the following questions; label the values of each variable using the available formats:

    • What proportion of pregnant women were taking vitamins, compared with non-pregnant women? In this case, only women should be kept for analysis.
    • How does primary diagnosis differ by gender? (Suggestion: put gender as the last variable in the TABLES statement because it has only 2 values. Recall that values for the last variable are displayed across the width of the table.)
    • Create a side-by-side listing to check the values of gender against the values of pregnant.

  3. Controlling for gender, how does the distribution of primary diagnosis differ for those taking vitamins versus those not taking vitamins? This can be answered using a 3-way table.
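One possible way to set up the programs for these questions is sketched below. Every name in it is a hypothetical placeholder: the library path, libref `clin`, data set `clinical`, variables `gender`, `pregnant`, `vitamins`, `dx1`, and `dx2`, the format names `genderf.`, `yesnof.`, and `dxf.` (character variables would need `$` formats), and the assumption that gender is coded 2 for female. Substitute the names and codes from your copy of the Clinical data set and its format file.

```sas
/* ALL names, paths, formats, and codes below are hypothetical. */
libname clin 'path-to-your-data';
%include 'path-to-your-format-file.sas';

/* Question 1: one-way tables with value labels */
proc freq data=clin.clinical;
  tables gender pregnant dx1 dx2;
  format gender genderf. pregnant yesnof. dx1 dx2 dxf.;
  title1 'Question 1: one-way tables';
run;

/* Question 2a: women only (assumes gender = 2 means female); */
/* row percents give vitamin use within each pregnancy group  */
proc freq data=clin.clinical;
  where gender = 2;
  tables pregnant * vitamins;
  format pregnant vitamins yesnof.;
  title1 'Question 2a: vitamin use by pregnancy status, women only';
run;

/* Question 2b: gender last, so its 2 values run across the top */
proc freq data=clin.clinical;
  tables dx1 * gender;
  format dx1 dxf. gender genderf.;
  title1 'Question 2b: primary diagnosis by gender';
run;

/* Question 2c: side-by-side check of gender against pregnant */
proc freq data=clin.clinical;
  tables gender * pregnant / list missing;
  format gender genderf. pregnant yesnof.;
  title1 'Question 2c: gender vs. pregnant';
run;

/* Question 3: gender first, so a separate dx1-by-vitamins */
/* cross-tab is produced for each gender                   */
proc freq data=clin.clinical;
  tables gender * dx1 * vitamins;
  format gender genderf. dx1 dxf. vitamins yesnof.;
  title1 'Question 3: primary diagnosis by vitamin use, controlling for gender';
run;
```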
