# SAS Regression Analysis

SAS provides the capability to build regression models, analyze correlations in the data, and use the fitted model to predict out-of-sample values. Here is the process:

Open SAS Studio > go to Tasks > Linear Regression; the following window will open:

Input Data settings and Model settings: one can add individual variables to the independent-variable set. If one suspects that a combination of variables is influencing the dependent variable, this combination can be added as a "Cross" effect.
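A "Cross" effect is an interaction: the product of two predictors entered as its own term. As a rough sketch outside of SAS, the same model can be fit by adding an explicit product column to the design matrix. The data below are hypothetical, not the example data from this post:

```python
import numpy as np

# Hypothetical data: two predictors and a response built with a known
# interaction effect (coefficient 3.0 on x1*x2).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 3.0 * x1 * x2 + rng.normal(scale=0.1, size=50)

# The "Cross" term is simply an extra design-matrix column holding x1*x2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [2.0, 1.5, -0.5, 3.0]
```

If the interaction coefficient turns out significant, the combined effect of the two predictors matters beyond their individual effects.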
Regression model selection is also performed here; the available options are Full Factorial, N-Factorial, and Polynomial (with a selectable order). The Options and Output settings control which statistics and plots are produced. The following is an example of the results:

| Statistic | Value |
| --- | --- |
| Root MSE | 84.0414 |
| Dependent Mean | 472.225 |
| Coeff Var | 17.7969 |
| R-Square | 0.3995 |
| Adj R-Sq | 0.3194 |

| Observations | Count |
| --- | --- |
| Number of Observations Read | 150 |
| Number of Observations Used | 35 |
| Number of Observations with Missing Values | 115 |

The ANOVA table tests whether the predictors' coefficients collectively differ from 0; in other words, whether there is a linear relationship between the response variable and the predictors, and thus whether this (or any) model is useful at all:

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| --- | --- | --- | --- | --- | --- |
| Model | 4 | 140951 | 35238 | 4.99 | 0.0033 |
| Error | 30 | 211889 | 7062.96453 | | |
| Corrected Total | 34 | 352840 | | | |

The small p-value (0.0033) indicates that if all the coefficients were truly 0, there would be only a tiny chance of obtaining coefficient estimates like ours. Thus there must be a linear relationship between the variables, and the regression model is useful.
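As a sanity check, the printed fit statistics can be reproduced from the sums of squares in the ANOVA table; every input number below is taken from the output above:

```python
# Values from the ANOVA table and fit statistics above.
ss_model, df_model = 140951.0, 4
ss_error, df_error = 211889.0, 30
ss_total = ss_model + ss_error            # Corrected Total = 352840
dep_mean = 472.225                        # Dependent Mean

ms_model = ss_model / df_model            # Mean Square (Model)
ms_error = ss_error / df_error            # Mean Square (Error)
f_value = ms_model / ms_error             # F Value, ~4.99
r_square = ss_model / ss_total            # R-Square, ~0.3995
adj_r_sq = 1 - (1 - r_square) * (df_model + df_error) / df_error  # ~0.3194
root_mse = ms_error ** 0.5                # Root MSE, ~84.04
coeff_var = 100 * root_mse / dep_mean     # Coeff Var, ~17.80

print(f_value, r_square, adj_r_sq, root_mse, coeff_var)
```

Each derived value matches the corresponding entry in the SAS output tables.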

Further analysis of the model's goodness of fit:

R-Square and Adjusted R-Square summarize, in a single number, the collective correlation between the response and all the predictor variables. A value near 1 means high correlation and thus a good regression; a value near 0 means a poor one. Adjusted R-Square additionally penalizes the model for the number of predictors it uses.

RMSE measures the goodness of the regression fit through the size of the residuals: it is essentially the standard deviation of the residuals, in the units of the response variable.
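Concretely, Root MSE is sqrt(SSE / (n - p)), where SSE is the sum of squared residuals and p counts the fitted parameters, intercept included. A minimal illustration on a small hypothetical data set (not the example data from this post):

```python
import numpy as np

# Hypothetical tiny data set with a near-linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = a + b*x by least squares.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Root MSE: standard deviation of the residuals, with n - p denominator.
resid = y - X @ coef
n, p = len(y), X.shape[1]
root_mse = np.sqrt((resid ** 2).sum() / (n - p))
print(root_mse)
```

The smaller this value relative to the scale of the response, the tighter the fit.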

Here the test checks whether each predictor variable separately has a coefficient significantly different from 0. Clearly VIX, JOBLESS, and P_E do not seem to be good predictors: their p-values are large, so there is a high possibility that their true coefficients are 0:

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | Intercept | 1 | 739.59605 | 201.20407 | 3.68 | 0.0009 |
| P_E | P/E | 1 | -1.36988 | 2.19982 | -0.62 | 0.5382 |
| GDP | GDP | 1 | 17.20573 | 7.73193 | 2.23 | 0.0337 |
| VIX | VIX | 1 | 0.67569 | 2.44551 | 0.28 | 0.7842 |
| JOBLESS | JOBLESS | 1 | -0.67914 | 0.42317 | -1.60 | 0.1190 |

Diagnostic plots produced: Observed vs Predicted Y, R-Student by Predicted Y, Residual by Regression.
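Each t value in the table is simply the parameter estimate divided by its standard error; a quick check using the numbers above:

```python
# Parameter estimates and standard errors from the table above.
estimates = {
    "Intercept": (739.59605, 201.20407),
    "P_E":       (-1.36988, 2.19982),
    "GDP":       (17.20573, 7.73193),
    "VIX":       (0.67569, 2.44551),
    "JOBLESS":   (-0.67914, 0.42317),
}

# t value = Parameter Estimate / Standard Error.
t_values = {name: est / se for name, (est, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The computed values match the t Value column, and the small |t| for P_E, VIX, and JOBLESS is exactly what produces their large p-values.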