Notes on Clustering, Fixed Effects, and Fama-MacBeth regressions in SAS

Noah Stoffman, Kelley School of Business, Indiana University
Code updated June, 2011; Links updated August, 2016

This page shows how to run regressions with fixed effects or clustered standard errors, as well as Fama-MacBeth regressions, in SAS. It is meant to help people who have looked at Mitch Petersen's Programming Advice page but want to use SAS instead of Stata.

Mitch has posted results using a test data set, so you can compare the output below against his to see how well they agree. You can generate the test data set in SAS format using this code.

SAS now reports heteroscedasticity-consistent standard errors and t-statistics with the hcc option:

proc reg data=ds; 
 model y=x / hcc;
 run;
quit;

You can use the option acov instead of hcc if you want to see the heteroscedasticity-consistent covariance matrix of the parameter estimates. Thanks to Guan Yang at NYU for making me aware of this. Until version 9.2, you had to use ODS to capture these statistics, which always seemed silly to me. SAS finally caught up, though.
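If you want the small-sample scaling that Stata's robust option applies, proc reg also accepts an hccmethod= option alongside hcc. The sketch below assumes the same placeholder data set ds with variables y and x used above. As I understand the option, hccmethod=1 requests the HC1 variant, which scales the White estimator by n/(n-p), while the default (hccmethod=0) applies no scaling. Check the proc reg documentation for your SAS version before relying on it.

```sas
/* Sketch: heteroscedasticity-consistent standard errors with HC1 scaling. */
/* ds, y, and x are placeholders, as in the example above.                 */
proc reg data=ds;
 model y = x / hcc hccmethod=1;  /* 0 = White (default), 1 = HC1, 2 = HC2, 3 = HC3 */
run;
quit;
```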

A regression with fixed effects using the absorption technique can be done as follows. (Note that, unlike with Stata, we need to suppress the intercept to avoid a dummy variable trap.)

proc glm;
 absorb identifier;
 model depvar = indvars / solution noint; run;
quit;

Absorption is computationally fast, but the individual fixed effects estimates will not be displayed. If you want to see the fixed effects estimates, use:

proc glm;
 class identifier;
 model depvar = indvars identifier / solution; run;
quit;

This will automatically generate a dummy variable for each level of the variable "identifier".
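One practical detail for the absorption approach: proc glm expects the input data to be sorted by the absorbed variable. A minimal sketch of the full sequence, assuming a hypothetical data set named ds and the same placeholder names (identifier, depvar, indvars) used above:

```sas
/* Sketch: sort by the absorbed variable before calling proc glm.   */
/* ds is a hypothetical data set name; the other names are the      */
/* placeholders used in the examples above.                         */
proc sort data=ds;
 by identifier;
run;

proc glm data=ds;
 absorb identifier;                      /* fixed effects are swept out, not displayed */
 model depvar = indvars / solution noint;
run;
quit;
```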

Clustered standard errors may be estimated as follows:

proc genmod;
 class identifier;
 model depvar = indvars;
 repeated subject=identifier / type=ind; run;
quit;

This method is quite general, and allows alternative regression specifications using different link functions. The online SAS documentation for the genmod procedure provides detail.
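As an illustration of an alternative link function, the sketch below fits a logit model with standard errors clustered by identifier. It assumes a hypothetical data set ds and a hypothetical 0/1 response named binary_y; the descending option is there so that the probability of binary_y=1, rather than 0, is modeled. Treat it as a starting point rather than a tested recipe.

```sas
/* Sketch: logit with standard errors clustered by identifier.     */
/* ds and binary_y are hypothetical names used for illustration.   */
proc genmod data=ds descending;
 class identifier;
 model binary_y = indvars / dist=binomial link=logit;
 repeated subject=identifier / type=ind;  /* independence working correlation */
run;
```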

Alternatively, you may use surveyreg to do clustering:

proc surveyreg data=ds;
 cluster cluster_variable;
 model depvar = indvars; run;
quit;

Note that genmod does not report finite-sample adjusted statistics, so to make the results from these two methods consistent, you need to multiply the genmod variance estimates by (N-1)/(N-k)*M/(M-1), or equivalently multiply the standard errors by the square root of that factor, where N=number of observations, M=number of clusters, and k=number of regressors. More detail is provided here.
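To make the adjustment concrete, here it is written out, reading the factor as a scaling of the variance estimates (which is how Stata applies its cluster adjustment). The numbers in the worked line are assumptions chosen only for illustration: N=5000 observations, M=500 clusters, and k=2.

```latex
c = \frac{N-1}{N-k}\cdot\frac{M}{M-1},
\qquad
\widehat{V}_{\text{adj}} = c\,\widehat{V}_{\text{genmod}},
\qquad
\mathrm{se}_{\text{adj}} = \sqrt{c}\;\mathrm{se}_{\text{genmod}}

% Illustration with assumed values N = 5000, M = 500, k = 2:
c = \frac{4999}{4998}\cdot\frac{500}{499} \approx 1.0022,
\qquad
\sqrt{c} \approx 1.0011
```

With many clusters the correction is tiny, but it grows quickly as M gets small.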

Clustering in two dimensions can be done using the method described by Thompson (2011) and others. SAS code to do this is here and here.

Running a Fama-MacBeth regression in SAS is quite easy, and doesn't require any special macros. The following code will run cross-sectional regressions by year for all firms and then report the time-series means of the coefficient estimates.

ods listing close;
ods output parameterestimates=pe;
proc reg data=dset;
 by year;
 model depvar = indvars; run;
quit;
ods listing;

proc means data=pe mean std t probt;
 var estimate; class variable;
run;
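For reference, the statistics that proc means reports here are the usual Fama-MacBeth quantities. Writing \hat\beta_t for the coefficient estimate from the year-t cross-section and T for the number of years (notation introduced here, not in the code), with the default divisor of T-1 for the standard deviation:

```latex
\bar\beta = \frac{1}{T}\sum_{t=1}^{T}\hat\beta_t,
\qquad
s^2 = \frac{1}{T-1}\sum_{t=1}^{T}\bigl(\hat\beta_t-\bar\beta\bigr)^2,
\qquad
t\text{-stat} = \frac{\bar\beta}{s/\sqrt{T}}
```

The std column is s, not the standard error; the t and probt columns are based on s/\sqrt{T}.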

Since this approach produces a time series of coefficient estimates, it is common practice to use the Newey-West adjustment for the standard errors. Unlike in Stata, this is somewhat complicated in SAS, but it can be done as follows:

proc sort data=pe; by variable; run;

%let lags=3;
ods output parameterestimates=nw;
ods listing close;
proc model data=pe;
 by variable;
 instruments / intonly;
 estimate=a;
 fit estimate / gmm kernel=(bart,%eval(&lags+1),0) vardef=n; run;
quit;
ods listing;

proc print data=nw; id variable;
 var estimate--df; format estimate stderr 7.4;
run;

Note that the lag length is set by defining a macro variable, lags. The approach here is to use GMM to regress the time-series estimates on a constant, which is equivalent to taking a mean. This works because GMM with a Bartlett kernel gives the same variance as the Newey-West adjustment. (See Cochrane's Asset Pricing book for details.)
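For reference, this is the variance I understand the code above to be computing, with L standing for the value of the lags macro variable and \hat\beta_t, \bar\beta, and T denoting the yearly estimate, its time-series mean, and the number of years (notation introduced here). The bandwidth of L+1 passed to kernel= is what produces the familiar Bartlett weights 1 - j/(L+1), and vardef=n corresponds to dividing by T rather than by a degrees-of-freedom-adjusted count.

```latex
\widehat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}
  \bigl(\hat\beta_t-\bar\beta\bigr)\bigl(\hat\beta_{t-j}-\bar\beta\bigr),
\qquad
\widehat{\mathrm{Var}}\bigl(\bar\beta\bigr)
  = \frac{1}{T}\left[\widehat{\gamma}_0
  + 2\sum_{j=1}^{L}\left(1-\frac{j}{L+1}\right)\widehat{\gamma}_j\right]
```

Setting L=0 drops the autocovariance terms, so the result differs from the plain Fama-MacBeth variance only through the divisor (T instead of T-1).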
