Friday, 2 October 2015

The presentation of data is just as important as the analysis itself. In this blog post we will work through 5 examples from Ron Cody’s “Learning SAS by Example: A Programmer’s Guide” that show how to summarize data and create meaningful reports from it. The main topics are:

  1. Summarizing Data
  2. Counting the Frequencies
  3. Creating Tabular Reports

Each problem follows the same format as in my previous blog post:

  1. Problem
  2. Code
  3. Result
  4. Learning


Summarizing Data
In this section we will see how PROC MEANS summarizes data, using statements such as CLASS and VAR.

1) Using the SAS data set College (available from the link at the end of this post), we will report the mean GPA for the following categories of ClassRank: 0–50 = bottom half, 51–74 = 3rd quartile, and 75–100 = top quarter. We will do this by creating an appropriate format, without using a DATA step.

Code

In the first PROC step (PROC FORMAT), we define the three ClassRank ranges and give each one a label; these labels become the three row names in the result. In the second step (PROC MEANS), we first request the statistics we need, in this case N and MEAN. We then group the results with a CLASS statement rather than a BY statement; unlike BY, CLASS does not require the data to be sorted first. Because the format is applied to ClassRank, PROC MEANS groups on the formatted values, so the three ranges become the class levels. Finally, the VAR statement tells SAS which variable to analyze (GPA).
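The code itself is not reproduced in the text above, so here is a minimal sketch of the two steps just described. It assumes the College data set has been made available through a library with the libref LEARN (as in Cody's book) and that ClassRank and GPA are numeric; the format name RANK and the MAXDEC= choice are my own.

```sas
/* Sketch: assumes a LIBNAME statement has pointed LEARN at the folder */
/* holding the downloaded College data set.                            */
proc format;
   /* Three ClassRank ranges; these labels become the row names */
   value rank   0-50  = 'Bottom Half'
               51-74  = 'Third Quartile'
               75-100 = 'Top Quarter';
run;

proc means data=learn.college n mean maxdec=2;
   class ClassRank;          /* grouped by the formatted values below */
   var GPA;                  /* the analysis variable                 */
   format ClassRank rank.;
run;
```

No DATA step is needed: the format alone collapses the 0 to 100 ranks into three groups at analysis time.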

Result

Learning
In this exercise we learned how to request descriptive statistics with PROC MEANS and group them by a formatted classification variable, without needing a DATA step.



2) Using the SAS data set College we will create four summary data sets containing the number of non-missing and missing values and the mean, minimum, and maximum for ClassRank and GPA, broken down by Gender and SchoolSize. The first data set (Grand) will contain the statistics for all subjects, the second data set (ByGender) will contain the statistics broken down by Gender, the third data set (BySize) will contain the statistics broken down by SchoolSize, and the fourth data set (Cell) will contain the statistics broken down by Gender and SchoolSize. We will do this by using PROC MEANS (with a CLASS statement) and one DATA step. 

Code
As in the previous example, the PROC step uses PROC MEANS with a CLASS statement (Gender and SchoolSize), and a VAR statement selects the analysis variables (ClassRank and GPA). The requested statistics (N, NMISS, MEAN, MIN, and MAX) are written to a single summary data set. The DATA step then creates all four data sets (Grand, ByGender, BySize, and Cell) at once, a technique shown in Chapter 8 of the book, and DROP removes the Gender and SchoolSize variables where they are not needed. Finally, an IF-ELSE IF chain decides which data set each summary observation is written to.
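A minimal sketch of this approach follows, assuming the College data set is available as LEARN.COLLEGE. The intermediate data set name SUMMARY is my own choice. The key idea is the automatic _TYPE_ variable that PROC MEANS adds to its output data set. With `class Gender SchoolSize`, the rightmost class variable is the lowest bit, so 0 = overall, 1 = SchoolSize only, 2 = Gender only, and 3 = both.

```sas
/* Sketch: assumes the College data set is available as LEARN.COLLEGE */
proc means data=learn.college noprint;
   class Gender SchoolSize;
   var ClassRank GPA;
   /* AUTONAME builds names such as GPA_N, GPA_NMiss, GPA_Mean */
   output out=summary n= nmiss= mean= min= max= / autoname;
run;

/* One DATA step writes four data sets; the data set options drop */
/* the class variables that do not apply to each breakdown.       */
data Grand(drop=Gender SchoolSize)
     ByGender(drop=SchoolSize)
     BySize(drop=Gender)
     Cell;
   set summary;
   drop _type_ _freq_;       /* _TYPE_ is still usable in the IFs below */
   if _type_ = 0 then output Grand;         /* all subjects         */
   else if _type_ = 1 then output BySize;   /* SchoolSize only      */
   else if _type_ = 2 then output ByGender; /* Gender only          */
   else if _type_ = 3 then output Cell;     /* Gender by SchoolSize */
run;
```

The DROP statement only affects what is written out, which is why _TYPE_ can still be tested during execution.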

Result


Learning
In this exercise we learned, first, how to create multiple data sets in a single DATA step and, second, how to use IF-ELSE statements to send each observation to the correct data set.



Counting the Frequencies
In this section we will see how PROC FREQ produces frequency tables that summarize the data.


3) Using the data set Blood, we will produce frequencies for the variable Chol (cholesterol) using a format to group the frequencies into three groups: low to 200 (normal), 201 and higher (high), and missing. We will run PROC FREQ twice, once with the MISSING option and once without it, to compare the percentages in the two listings.


Code

In the first step, PROC FORMAT, a VALUE statement groups Chol into three categories: Normal, High, and Missing (for missing values). In the second step, PROC FREQ with the MISSING option treats missing values as a valid category, so they are counted and included in the percentages. In the third step, PROC FREQ without the MISSING option reports the number of missing values separately and computes the percentages from the non-missing values only.

Result

Learning
In this exercise we learned how to group a numeric variable with a format, and how the MISSING option controls whether missing values are counted in the frequency percentages.


Creating Tabular Reports
In this section we will learn how to use PROC TABULATE to create tables with multiple variables in both the rows and the columns. Such tables are widely used to present descriptive statistics in final reports.

   
4) We will produce the following table.

Result

Code

In the first step, PROC FORMAT assigns descriptive labels to the two values of Gender. Next, PROC TABULATE builds the table: a CLASS statement lists the three classification variables, and the TABLES statement crosses Gender with Scholarship (Gender*Scholarship) to produce the desired layout.
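The target table is not reproduced in the text, so the sketch below is only one plausible layout built from the steps described. It assumes the College data set is available as LEARN.COLLEGE, that Gender is coded 'F'/'M', and that the third class variable is SchoolSize; the format name $GENDER and the row dimension are my own assumptions.

```sas
/* Sketch: assumes Gender is coded 'F'/'M' (hypothetical format name) */
proc format;
   value $gender 'F' = 'Female'
                 'M' = 'Male';
run;

proc tabulate data=learn.college;
   class Gender Scholarship SchoolSize;
   /* Rows: SchoolSize plus a total row.                   */
   /* Columns: Gender crossed with Scholarship, plus total. */
   /* With no statistic requested, each cell shows N.       */
   tables SchoolSize all,
          Gender*Scholarship all;
   format Gender $gender.;
run;
```

The comma in the TABLES statement separates the row dimension from the column dimension, and the asterisk nests Scholarship within each Gender column.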

Learning
We learned how to create tabular reports with PROC TABULATE, which lets us summarize the data in a single table instead of several.


5) We will produce the following table.

Result

Code 
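Neither the table nor the code for this problem is described in the text, so the sketch below is purely illustrative of descriptive statistics in a PROC TABULATE layout. It assumes the College data set is available as LEARN.COLLEGE; grouping by Gender and summarizing GPA and ClassRank are my own choices, not necessarily what the original exercise asked for.

```sas
/* Sketch: hypothetical layout; variables chosen for illustration */
proc tabulate data=learn.college format=6.2;
   class Gender;
   var GPA ClassRank;           /* analysis variables need a VAR statement */
   /* Rows: each Gender plus a total; columns: statistics per variable */
   tables Gender all,
          (GPA ClassRank)*(n*f=4. mean min max);
   keylabel all = 'Total';
run;
```

The FORMAT= option sets a default cell format, and `n*f=4.` overrides it so that counts print without decimals.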

Learning
We learned how to present descriptive statistics in a tabular format, a layout commonly used when reporting research studies.

The data sets and the code used in this blog post are available here: https://drive.google.com/file/d/0B2sVurZ_f97PSWwyQW9aZEFhdk0/view?usp=sharing