Prior to completing the exercises, several steps are needed to prepare the data:
Open the lbls93 file in the Program Editor window and comment out the FORMAT statement (i.e., add * to the beginning of the statement). Save the revised file and clear the Program Editor window. The original values of variables, rather than the formatted values, can then be referenced in the programs (it is simpler, for example, to refer to regionre='1' than to regionre='central Manitoba').
Open the program that creates the temporary SAS data set "test" from the simulated Manitoba Health data set into the Program Editor window. No changes need to be made to this program.
Submit the program and check the log for messages; the log should indicate that a temporary SAS data set called "test" (in the WORK library) was created for use during this SAS session.
It is assumed that the questions are completed during the course of one SAS session. If not, the data set must be re-created for the next SAS session, as well as the formats (the record layout shows which formats correspond with each of the variables in the data set).
Programs can be developed and tested in a number of different ways. If the programming for all the questions below is saved in one file, there is no need to submit the entire file to test a portion of the code; instead, highlight the portion to be tested before pressing the submit key. The resulting log and output can then be checked to ensure the code is accurate before it is kept as part of the larger program.
For each of the questions, add: 1) a title descriptive of the data set being used, and 2) either a second title or a footnote indicating the question number. The same title can be used for each question, so there is no need to repeat the TITLE1 statement for the other questions (SAS will automatically keep the same title for the duration of the SAS session unless instructed otherwise).
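As a minimal sketch of this convention (the title text is illustrative and not taken from the exercise files):

```sas
/* TITLE1 is set once and stays in effect for the rest of the session. */
title1 'Simulated Manitoba Health data (WORK.TEST)';

/* Question 1: a second title line carries the question number. */
title2 'Question 1';
proc print data=test (obs=5);
run;

/* Question 2: only TITLE2 is replaced; TITLE1 is left untouched. */
title2 'Question 2';

/* A footnote could be used instead of TITLE2: */
/* footnote 'Question 2'; */
```

Submitting a TITLE statement with no text (title1;) clears that title line and all higher-numbered title lines.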
1. Produce the following listings of data:
- For the first 20 observations, specify the following variables to be shown on the output (original values): gender, age, los, op01, diag01, and diag02.
- Sort the data by gender and regionre and produce a listing of the first 40 observations. Display only ncase, gender, regionre, and icd17brk in the output. This time display the formatted, or labeled, values rather than the original values for all variables except ncase.
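One possible shape for these two listings is sketched below. The output data set name "sorted" and the format names $genderl., $regionl., and $icd17l. are placeholders chosen for illustration, and the sketch assumes all three variables are character; the actual format names and variable types should be taken from the record layout.

```sas
/* First listing: original (unformatted) values, first 20 observations. */
proc print data=test (obs=20);
   var gender age los op01 diag01 diag02;
run;

/* Sort into a new temporary data set so that "test" keeps its order. */
/* "sorted" is a hypothetical name.                                   */
proc sort data=test out=sorted;
   by gender regionre;
run;

/* Second listing: formatted values for all variables except ncase.   */
/* The format names below are placeholders, not the real ones.        */
proc print data=sorted (obs=40);
   var ncase gender regionre icd17brk;
   format gender $genderl. regionre $regionl. icd17brk $icd17l.;
run;
```

Writing the sorted records to a separate data set is a design choice, not a requirement; sorting "test" in place would also work.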
2. For a later exercise, utilization for Winnipeg vs non-Winnipeg residents will be compared. Create two formats, one that will be used to group regionre into new values and one that will be used to label the new values:
- Name the grouping format $wpgf; this format should be able to group the Winnipeg value into '1' and non-Winnipeg values into '0'.
- Name the labeling format $wpgl; this format should be able to label each of the two new values.
Although this question could be done using only one format (i.e., specifying the label 'Winnipeg' in the first format instead of '1'), the two-step process is typically used, for example, to simplify specification of values of the new variable within a SAS program - e.g., to be able to use '1' within a line of code rather than 'Winnipeg' to reference Winnipeg records.
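The two-step approach might look something like the sketch below. The regionre code for Winnipeg is not given in this section, so 'W' is only a placeholder; the label text and the variable name wpgres in the DATA step are likewise assumptions.

```sas
proc format;
   /* Grouping format: collapses regionre into '1' or '0'.            */
   /* 'W' is a PLACEHOLDER for the actual Winnipeg code.              */
   /* OTHER also catches missing values, which may need their own     */
   /* category in real use.                                           */
   value $wpgf
      'W'   = '1'
      other = '0';

   /* Labeling format: attaches readable labels to the new values.    */
   value $wpgl
      '1' = 'Winnipeg'
      '0' = 'Non-Winnipeg';
run;

/* Quick check of both formats on a single made-up value. */
data _null_;
   length regionre wpgres $1;
   regionre = 'W';                      /* placeholder value          */
   wpgres   = put(regionre, $wpgf.);    /* grouped value              */
   put wpgres= wpgres= $wpgl.;          /* raw value, then labeled    */
run;
```

With the grouping and labeling kept separate, later code can test wpgres='1' while output still displays the label.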
3. Obtain information on the number of observations and the mean, minimum, and maximum values, setting maximum decimal places to 2 for the following:
- The variables for age, length of stay, and days to death.
- Note the skewed results for deathsep. The value of 9999 actually refers to those still alive. Run a program for this variable only, including a WHERE statement to keep only the values which are less than 9999.
- The variables for age and length of stay, this time showing the results by region of residence. Use the region format to attach labels to region of residence.
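The three runs described above could be sketched as follows. The sketch assumes that deathsep is the numeric days-to-death variable (as the note about 9999 suggests) and uses $regionl. as a placeholder name for the region format.

```sas
/* Run 1: N, mean, minimum, and maximum, at most 2 decimal places. */
proc means data=test n mean min max maxdec=2;
   var age los deathsep;
run;

/* Run 2: deathsep only, excluding 9999 (persons still alive). */
proc means data=test n mean min max maxdec=2;
   where deathsep < 9999;
   var deathsep;
run;

/* Run 3: age and length of stay by region of residence.          */
/* CLASS does not require sorted data; $regionl. is a placeholder. */
proc means data=test n mean min max maxdec=2;
   class regionre;
   var age los;
   format regionre $regionl.;
run;
```

A BY statement could replace CLASS in the third run, but the data would first need to be sorted by regionre.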
4. How does the distribution of hospital discharges for selected categories of ICD-9-CM diagnoses (icd17brk) differ by gender (gender)? Display the information using original values and again using formatted values.
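A cross-tabulation is one way to approach this question. In the sketch below, the format names $icd17l. and $genderl. are placeholders, and both variables are assumed to be character.

```sas
/* Original values: diagnosis category by gender. */
proc freq data=test;
   tables icd17brk*gender;
run;

/* Same table with formatted values (placeholder format names). */
proc freq data=test;
   tables icd17brk*gender;
   format icd17brk $icd17l. gender $genderl.;
run;
```

With gender as the column variable, the column percentages show the distribution of diagnosis categories within each gender.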
5. Examine the relationship among variables for the following:
- Is the presence of high-risk diagnoses on admission (charyes) associated with neighbourhood income level (incdr)? Display the formatted values for both variables.
- How does the relationship between these two variables differ by gender (use the formatted value for this variable as well)?
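Both parts of this question can be sketched with PROC FREQ. The CHISQ option is one common way to test for association and is an assumption here, not something the exercise prescribes; the FORMAT statement is omitted because the format names must come from the record layout.

```sas
proc freq data=test;
   /* Two-way table: high-risk diagnoses by income level. */
   tables charyes*incdr / chisq;

   /* Three-way request: the first variable (gender) defines strata, */
   /* so one charyes-by-incdr table is produced per gender.          */
   tables gender*charyes*incdr / chisq;

   /* Add a FORMAT statement here to display labeled values. */
run;
```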
6. Develop a program that will create the following new variables (always within a data step):
7. Check the new variables:
- For losgroup and wpgres, use a side-by-side listing (PROC FREQ) to compare original variables against the new variables, ensuring that labeled values are used for the 3 character variables. Both comparisons can be run within the same PROC FREQ.
- For loswks (the only new numeric variable) and los, run a PROC MEANS.
- For the remaining two character variables, run a PROC PRINT for the first 30 observations, showing both original and new variables (i.e., output for a total of 4 variables).
- Do a PROC CONTENTS on the data set to ensure the new variables were properly labeled.
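Some of these checks are sketched below under several assumptions: the new variables are taken to live in a data set called "test2" (a hypothetical name), losgroup is taken to be derived from los, and wpgres from regionre. The PROC PRINT check is left out because the remaining two character variables are not named in this section.

```sas
/* Side-by-side comparison of original and new variables.            */
/* LIST prints each combination on one line; MISSING keeps missing   */
/* values in the table. A FORMAT statement for the other character   */
/* variables would be added using names from the record layout.      */
proc freq data=test2;
   tables los*losgroup regionre*wpgres / list missing;
   format wpgres $wpgl.;
run;

/* Numeric check: compare the ranges of los and loswks. */
proc means data=test2 n mean min max;
   var los loswks;
run;

/* Confirm variable names, types, and labels. */
proc contents data=test2;
run;
```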
The program, log, and output are all available for the above questions. For additional practice, another set of more research-focused questions has been developed.