 Article
 Open Access
Estimation of standard errors and treatment effects in empirical economics—methods and applications
 Olaf Hübler^{1}
https://doi.org/10.1007/s12651-013-0135-0
© Institut für Arbeitsmarkt- und Berufsforschung 2013
 Published: 10 July 2013
Abstract
This paper discusses methodological problems of standard errors and treatment effects. First, heteroskedasticity-robust and cluster-robust estimates are considered, as well as problems with Bernoulli distributed regressors, outliers and partially identified parameters. Second, procedures to determine treatment effects are analyzed. The focus is on four principles: difference-in-differences estimators, matching procedures, treatment effects in quantile regression analysis and regression discontinuity approaches. These methods are applied to Cobb-Douglas functions using IAB establishment panel data.
Different heteroskedasticity-consistent procedures lead to similar standard errors. Cluster-robust estimates show evident deviations. Dummies with a mean near 0.5 have a smaller variance of the coefficient estimates than others. Not all outliers have a strong influence on significance. New methods to handle the problem of partially identified parameters lead to more efficient estimates.
The four treatment procedures discussed are applied to the question whether company-level pacts affect output. In contrast to unconditional difference-in-differences and to estimates without matching, the company-level pact effect is positive but insignificant if conditional difference-in-differences, nearest-neighbor or Mahalanobis metric matching is applied. The latter result has to be refined under quantile treatment effects analysis. The higher the quantile, the higher is the positive company-level pact effect, and there is a tendency from insignificant to significant effects. A sharp regression discontinuity analysis shows a structural break at a probability of 0.5 that a company-level pact exists. No specific effect of the Great Recession can be detected. Fuzzy regression discontinuity estimates reveal that the company-level pact effect is significantly lower in East than in West Germany.
Estimation of standard errors and causal effects in empirical economic research: methods and applications
Abstract (translated from the German Zusammenfassung)
This contribution discusses ways of estimating standard errors and causal effects. First, heteroskedasticity-robust and cluster-robust estimates of standard errors are considered, and peculiarities and problems with dummy variables as regressors, outliers and only partially identified parameters are discussed. Then procedures for determining treatment effects are addressed. Four principles are presented: difference-in-differences estimators, matching procedures, causal effects in quantile regression analysis and approaches for determining discontinuities in regression estimates. In the second part of the paper, the methods are applied to Cobb-Douglas production functions using IAB establishment panel data.
Different heteroskedasticity-consistent procedures lead to quite similar standard errors. Cluster-robust estimates, by contrast, show clear deviations. Dummies as regressors with a mean near 0.5 have smaller variances of the coefficient estimators than others. Not all outliers have a notable influence on significance. Newer methods for handling only partially identified parameters lead to more efficient estimates.
The four procedures discussed for determining the effects of measures are applied to the question whether company-level pacts have a significant influence on production output. In contrast to unconditional difference-in-differences estimators and estimators without matching, the effects of company-level pacts are positive but insignificant under conditional difference-in-differences estimators and matching procedures. This statement has to be made more precise on the basis of the quantile treatment analysis. The higher the quantile, the larger the effect of company-level pacts, with a tendency from insignificant to significant effects. The deterministic (sharp) regression analysis with discontinuities shows a structural break at a probability of 0.5 that a company-level pact exists. No specific effects can be identified during the 2009 recession. Estimates within stochastic (fuzzy) discontinuity approaches reveal that the effects of company-level pacts are significantly lower in East Germany than in West Germany.
Keywords
 Standard errors
 Outliers
 Partially identified parameters
 DiD estimators
 Matching
 Quantile regressions
 Regression discontinuity
JEL Classification
 C21
 C26
 D22
 J53
1 Introduction
Contents, questions and methods have changed in empirical economics in the last 20 years. Many methods were developed in the past, but their application in empirical economics follows with a lag. Some methods are well-known but have received little attention. New approaches focus on characteristics of the data, on modified estimators, on correct specifications, on unobserved heterogeneity, on endogeneity and on causal effects. Real data sets are not compatible with the assumptions of classical models. Therefore, modified methods were suggested for estimation and inference.
 (1) Significance is an important indicator in empirical economics but the results are sometimes misleading.
 (2) Violated assumptions, clustering of the data, outliers and only partially identified parameters are often the reason for wrong standard errors under classical methods.
 (3) The estimation of average effects is useful but subgroup analysis and quantile regressions are important supplements.
 (4) Causal effects are of great interest but their determination is based on disparate approaches with varying results.
2 Econometric methods
2.1 Significance and standard errors in regression models
 (1) There does not exist any effect but due to technical inefficiencies a significant effect is reported.
 (2) The effect is small but due to the precision of the estimates a significant effect is determined.
 (3) There exists a strong effect but due to the variability of the estimates the statistical effect cannot be detected.

Compute robust standard errors.

Analyze whether variation within clusters is only small in comparison with variation between the clusters.

Check whether dummies as regressors with high or low probability are responsible for insignificance.

Test whether outliers induce large standard errors.

Consider the problem of partially identified parameters.

Detect whether collinearity is present.

Investigate alternative specifications.

Use subsamples and compare the results.

Execute sensitivity analyses (Leamer 1985).

Employ the sniff test (Hamermesh 2000) in order to detect whether econometric results are in accord with economic plausibility.
2.1.1 Heteroskedasticity-robust standard errors
The intention is to obtain more efficient estimates. It can be shown for hc _{2} that under homoskedasticity the mean of \(\hat{u}_{i}^{2}\) is \(\sigma^{2}(1-c_{ii})\), so that \(E(\hat{u}_{i}^{2}/(1-c_{ii}))\) is \(\sigma^{2}\). Therefore, we should expect that under homoskedasticity the hc _{2} option leads to better estimates in small samples than the simple hc _{1} option. The second correction is presented by MacKinnon and White (1985). It is an approximation of a more complicated estimator that is based on a jackknife estimator (see Sect. 2.1.2). Applications demonstrate that the standard error increases from OLS via hc _{1} and hc _{2} to the hc _{3} option. Simulations, however, do not show a clear preference. As one cannot be sure which case is the correct one, a conservative choice is preferable (Angrist and Pischke 2009, p. 302): the estimator with the largest standard error should be chosen. This means that the null hypothesis (H _{0}: no influence on the regressand) is retained longer than with the other options.
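The weighting schemes can be illustrated with a minimal sketch. It assumes the usual textbook definitions, which are not spelled out in the text above: hc1 rescales the squared residuals by n/(n−k), hc2 divides them by (1−c_ii) and hc3 by (1−c_ii)^2, where c_ii is the i-th diagonal element of the projection matrix. The uncorrected White variant `hc0`, the function name and the simulated data are hypothetical additions for illustration only.

```python
import numpy as np

def hc_standard_errors(X, y):
    """Return OLS coefficients and conventional, hc0, hc1, hc2, hc3 standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # c_ii: diagonal of the projection matrix X (X'X)^{-1} X'
    c = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    def sandwich(weights):
        # (X'X)^{-1} X' diag(weights) X (X'X)^{-1}
        meat = (X * weights[:, None]).T @ X
        return np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

    u2 = resid ** 2
    return beta, {
        "ols": np.sqrt(np.diag(xtx_inv) * u2.sum() / (n - k)),
        "hc0": sandwich(u2),
        "hc1": sandwich(u2 * n / (n - k)),
        "hc2": sandwich(u2 / (1 - c)),
        "hc3": sandwich(u2 / (1 - c) ** 2),
    }

# Simulated data (assumption): the error spread grows with x
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0, x)
beta, se = hc_standard_errors(X, y)
for name, values in se.items():
    print(name, np.round(values, 4))
```

Because 0 < 1−c_ii ≤ 1, the hc3 weights are never smaller than the hc2 weights, which in turn are never smaller than the unweighted squared residuals, so hc3 is the conservative choice in the sense described above.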
2.1.2 Resampling procedures
2.1.3 The Moulton problem
A simple, extreme example demonstrates the cluster problem.
Example
i  V1  V2  V3  V4 

1  24  123  −234  −8 
2  875  87  54  3 
3  −12  1234  −876  345 
4  231  −87  −65  9808 
5  43  34  9  −765 
Regressor  \(\hat{\beta}\)  \(\hat{\sigma}_{\hat{\beta}}\) (1M)  \(\hat{\sigma}_{\hat{\beta}}\) (2M)  \(\hat{\sigma}_{\hat{\beta}}\) (4M)  \(\hat{\sigma}_{\hat{\beta}}\) (8M)
V2  1.7239  1.7532  0.7158  0.4383  0.2922 
V3  2.7941  2.3874  0.9747  0.5969  0.3979 
V4  0.0270  0.0618  0.0252  0.0154  0.0103 
const  323.2734  270.5781  110.463  67.64452  45.0963 
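The pattern in this table can be reproduced with a short sketch. It rests on two assumptions that are not stated explicitly above: V1 is the regressand, and the columns 1M, 2M, 4M and 8M refer to the same five observations stacked one, two, four and eight times. The function name is hypothetical.

```python
import numpy as np

# Data from the example; V1 is treated as the regressand (assumption)
data = np.array([
    [24, 123, -234, -8],
    [875, 87, 54, 3],
    [-12, 1234, -876, 345],
    [231, -87, -65, 9808],
    [43, 34, 9, -765],
], dtype=float)

def ols_se(data, copies):
    """OLS of V1 on V2, V3, V4 and a constant after stacking the data `copies` times."""
    stacked = np.tile(data, (copies, 1))
    y = stacked[:, 0]
    X = np.column_stack([stacked[:, 1:], np.ones(len(stacked))])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

beta_1, se_1 = ols_se(data, 1)
for m in (2, 4, 8):
    beta_m, se_m = ols_se(data, m)
    # Stacking leaves the coefficients unchanged but shrinks the standard errors
    print(m, np.round(se_1 / se_m, 3))
```

With n=5 observations and k=4 parameters, stacking the data m times divides every conventional standard error by sqrt((5m−4)/(5−4)), i.e. by about 2.45, 4 and 6. The reported standard errors follow the same ratios, for example 1.7532/0.4383 ≈ 4 for V2, although no new information has been added. This is the cluster problem in its most extreme form.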
2.1.4 Large standard errors of dichotomous regressors with small or large mean
If \(s_{D}^{2}\) is expressed in terms of \(\bar{D}=(n|D=1)/n\), we find \(s_{D}^{2}=\bar{D}(1-\bar{D})\) (with the sample variance normalized by n), which is at most 0.25. \(V(\hat{b})\) is minimal at given n and σ ^{2} when the sample variance of D reaches its maximum, i.e. if \(\bar{D}=0.5\). This result holds only for inhomogeneous models.
Example
Y  \(\hat{\beta}_{1}\)  std.err. 

\(\overline{RV}=0.1\)  −0.3727  0.6819 
\(\overline{RV}=0.2 \)  −0.5970  0.5100 
\(\overline{RV}=0.3\)  −0.4768  0.4455 
\(\overline{RV}=0.4\)  0.3068  0.4170 
\(\overline{RV}=\boldsymbol{0.5}\)  0.1338  0.4094 
\(\overline{RV}=0.6\)  0.0947  0.4187 
\(\overline{RV}=0.7\)  −0.0581  0.4479 
\(\overline{RV}=0.8\)  −0.1860  0.5140 
\(\overline{RV}=0.9\)  −0.1010  0.6827 
This example confirms the theoretical result. The standard error is smallest if \(\overline{RV}=0.5\) and increases systematically as the mean of RV moves away from 0.5 in either direction. An extension to multiple regression models seems possible (see the applications in the Appendix, Tables 11, 12, 13, 14). The more \(\bar{D}\) deviates from 0.5, whether the mean of D is larger or smaller, the stronger is the tendency to insignificant effects. A caveat is necessary. The conclusion that the t-value of a dichotomous regressor D _{1} is always smaller than that of D _{2} when V(D _{1})>V(D _{2}) does not necessarily follow. The basic effect of D _{1} on y may be larger than that of D _{2} on y. The theoretical result refers to a specific variable and not to the comparison between regressors. In practice, significance is determined by \(t=\hat{b}/\sqrt{\hat{V}(\hat {b})}\). However, we do not find a systematic influence of \(\hat{b}\) on t if \(\bar{D}\) varies. Nevertheless, the random differences in the influence of D on y can dominate the \(\bar{D}\) effect via \(s_{D}^{2}\). The comparison of Table 13 with Table 14 shows that the influence of a works council (WOCO) is stronger than that of a company-level pact (CLP). The coefficients of the former regressor are larger and the standard errors are lower than those of the latter regressor, so that the t-values are larger. In both cases the standard errors increase if the mean of the regressor is reduced. The comparison of line 1 in Table 13 with line 9 in Table 14, where the mean of CLP and WOCO is nearly the same, makes clear that the stronger basic effect of WOCO on lnY dominates the mean reduction effect of WOCO. The t-value in line 9 of Table 14 is smaller than that in line 1 of Table 14 but still larger than that in line 1 of Table 13. Not all deviations of the mean of a dummy regressor D from 0.5 induce the described standard error effects. A random variation of \(\bar{D}\) is necessary. An example where this is not the case is matching (see Sect. 2.2 and the application in Sect. 3). There, \(\bar{D}\) increases due to the systematic elimination of those observations with D=0 that are dissimilar to those with D=1 in other characteristics.
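The dependence of the coefficient variance on the mean of the dummy can be checked numerically. The following sketch assumes a simple regression with an intercept and one dummy, a known error variance and a hypothetical sample size of 100. None of these values are taken from the paper.

```python
import numpy as np

n = 100          # hypothetical sample size
sigma2 = 1.0     # hypothetical error variance, treated as known

def slope_variance(share):
    """Variance of the OLS slope on a dummy with mean `share` (model with intercept)."""
    ones = int(round(share * n))
    d = np.concatenate([np.ones(ones), np.zeros(n - ones)])
    X = np.column_stack([np.ones(n), d])
    return sigma2 * np.linalg.inv(X.T @ X)[1, 1]

shares = np.arange(1, 10) / 10
variances = np.array([slope_variance(p) for p in shares])
for p, v in zip(shares, variances):
    print(p, round(float(np.sqrt(v)), 4))
```

The computed variances equal σ²/(n·D̄(1−D̄)), so the standard error at a mean of 0.1 or 0.9 exceeds the one at 0.5 by the factor sqrt(0.25/0.09) ≈ 1.667. The example table above shows nearly the same ratio (0.6819/0.4094 ≈ 1.666).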
2.1.5 Outliers and influential observations
2.1.6 Partially identified parameters
Assume that some observations are unknown or not exactly measured. The consequence is that a parameter cannot be determined exactly but only within a range. The outlier situation leads to such a partial identification problem. There exist many other similar constellations.
Example

all persons who have not answered are employed

all persons who have not answered are unemployed.
The main methodological focus of partially identified parameters is the search for the best statistical inference. Chernozhukov et al. (2007), Imbens and Manski (2004), Romano and Shaikh (2010), Stoye (2009) and Woutersen (2009) have discussed solutions.
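The two extreme assumptions of the nonresponse example above translate into worst-case bounds. The sketch below assumes that the parameter of interest is the share of unemployed persons. The counts and the function name are hypothetical and serve only to show the arithmetic.

```python
def unemployment_bounds(unemployed, employed, nonrespondents):
    """Worst-case bounds on the unemployment share when some persons did not answer."""
    total = unemployed + employed + nonrespondents
    lower = unemployed / total                     # all nonrespondents are employed
    upper = (unemployed + nonrespondents) / total  # all nonrespondents are unemployed
    return lower, upper

# Hypothetical survey counts, not taken from the paper
lower, upper = unemployment_bounds(unemployed=80, employed=820, nonrespondents=100)
print(lower, upper)
```

Without further assumptions the share is only known to lie between the two bounds. The inference methods cited above deal with confidence regions for such an interval rather than for a single point.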
2.2 Treatment evaluation
In the last 20 years a wide range of methods has been developed to determine the “correct” causal effect. Which approach should be preferred depends on the data, the behavior of the economic agents and the assumptions of the model. The major difficulty is that we have to compare an observed situation with an unobserved situation. Depending on the available information the latter is estimated. We have to ask what would have occurred if D=0 instead of D=1 had taken place (treatment on the treated). This counterfactual is unknown and has to be estimated. Inversely, if D=0 is observable we can search for the potential result under D=1 (treatment on the untreated). A further problem is the fixing of the control group. What is the meaning of “otherwise” in the definition of D? Or in other words: What is the causal effect of an unobserved situation? Should we determine the average causal effect or only that of a subgroup?
Regression discontinuity (RD) design makes it possible to determine treatment effects in a special situation. This approach uses information on institutional and legal regulations that cause changes in the effects of economic measures. Thresholds are estimated that indicate a discontinuity of the effects. Two forms are distinguished: sharp and fuzzy RD. Either the change of status takes effect exactly at a fixed point, or it is assumed that the probability of a treatment change or the mean of a treatment change is discontinuous.
The localized average follows if a small interval around \(\bar{S}\) is modeled, i.e. \(\bar{S}-\Delta S <S_{i}<\bar{S} + \Delta S\). The treatment effect corresponds to the difference of the two previously determined intercepts, restricted to \(\bar{S}<S_{i}<\bar{S}+\Delta S\) on the one hand and to \(\bar{S}-\Delta S <S_{i}<\bar{S}\) on the other hand.
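A minimal sketch of this localized comparison for the sharp case is given below. It assumes linear fits on both sides of the threshold with the assignment variable centered at the threshold. The function name, the bandwidth and the noiseless data are hypothetical.

```python
import numpy as np

def sharp_rd_effect(s, y, threshold, bandwidth):
    """Difference of the intercepts of two local linear fits around the threshold."""
    centered = s - threshold
    right = (centered > 0) & (centered < bandwidth)
    left = (centered < 0) & (centered > -bandwidth)

    def intercept(mask):
        # Regress y on a constant and the centered assignment variable
        X = np.column_stack([np.ones(mask.sum()), centered[mask]])
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return coef[0]

    return intercept(right) - intercept(left)

# Hypothetical noiseless data with a jump of 2 at S = 10
s = np.linspace(0, 20, 401)
y = 1.0 + 0.3 * s + 2.0 * (s > 10)
effect = sharp_rd_effect(s, y, threshold=10.0, bandwidth=3.0)
print(round(float(effect), 6))
```

Centering the assignment variable at the threshold makes each intercept the fitted value at the threshold itself, so the difference of the two intercepts is the estimated jump.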
3 Applications: Some New Estimates of Cobb-Douglas Production Functions
Table 1 Estimates of Cobb-Douglas production functions under alternative determination of standard errors using hc1, hc3, bootstrap, jackknife and cluster-robust estimates
hc1  hc3  bootstrap  jackknife  cluster (idnum)  

lnL  0.9472  0.9472  0.9472  0.9582  0.9472 
(184.02)  (183.99)  (227.40)  (184.49)  (126.29)  
lnK  0.2225  0.2225  0.2225  0.2178  0.2225 
(60.80)  (60.79)  (60.40)  (59.58)  (43.04)  
const  9.0810  9.0810  9.0810  9.0908  9.0810 
(307.86)  (307.81)  (271.82)  (308.83)  (215.20) 
Table 1 focuses on alternative estimates of standard errors (see Sects. 2.1.1–2.1.3) of Cobb-Douglas production functions (CDF) in logarithmic representation with the input factors lnL and lnK. Conventional standard errors can be found for comparison in Table 3, column 1. The small standard deviations and therefore the large t-values are remarkable. Although the cluster-robust standard errors in Table 1, column 5 are larger, they are still by far too low. This is due to unobserved heterogeneity. Fixed effects estimates can partially solve this problem, as can be seen in the Appendix, Table 15.
The estimated coefficients in columns 1–3 and 5 of Table 1 are identical. Estimates with hc2 and hc4 (not presented in the tables) deviate only slightly from those with hc1. This could mean that it is not necessary to distinguish between hc1 to hc4. However, one could guess that stronger differences are observed if the sample is small. Empirical investigations in which only 10, 1 and 0.1 percent of the original sample are used do not support this presumption. The jackknife estimates of standard errors and t-values are also not far from the heteroskedasticity-consistent estimates with hc1 and hc3. The closeness to the hc3 estimates is plausible because the latter is only a slightly simplified version of what one gets by employing the jackknife technique. Furthermore, Table 1 demonstrates that the bootstrap and cluster-robust estimates of the t-values deviate most strongly for the input factor labor (lnL), measured by the number of employees in the firm. Capital (lnK), approximated by the sum of investments of the last four years, has evidently larger cluster-robust standard errors than those from the other methods.
Table 2 OLS estimates of an extended CDF with Bernoulli distributed regressors
Variable  Coef.  Std.err.  t  Mean

lnL  0.8808  0.0061  144.33
lnK  0.2049  0.0041  49.55
CLP  0.0307  0.0236  1.30  0.0871
WOCO  0.3915  0.0184  21.19  0.3035
CB  0.1385  0.0133  10.36  0.3819
P1  0.2462  0.0231  10.65  0.0834
P2  0.1032  0.0132  7.78  0.3695
const  9.2905  0.0367  253.03
Table 3 OLS estimates of CDFs with and without outliers, t-values in parentheses; dependent variable: logarithm of sales (lnY)
With outliers  Without outliers  Without strong leverages  With HadiMOD  

lnL  0.9472  0.9415  1.0409  0.9412 
(222.12)  (240.28)  (169.10)  (240.10)  
lnK  0.2225  0.2242  0.1724  0.2243 
(70.11)  (77.04)  (36.33)  (77.08)  
MOD  1.8810  
(2.33)  
const  9.0811  9.0498  9.3445  9.0490 
(333.20)  (362.66)  (238.53)  (362.62)  
n  34,308  33,851  27,262  34,308 
R ^{2}  0.866  0.866  0.805  0.843 
Column 4 extends the consideration to outliers following Hadi (1992). The squared difference between individual regressor values and the mean for all regressors (here lnL and lnK) is determined for each observation, weighted by the estimated covariance matrix (see Sect. 2.1.5). The decision whether establishment i is an outlier is now based on the Mahalanobis distance. MOD, the vector of multiple outlier dummies (MOD_{ i }=1 if i is an outlier; =0 otherwise), is incorporated as an additional regressor. The estimates show that outliers have a significant effect on the output variable lnY. The coefficients and the t-values in columns 2 and 4 are very similar. This is a hint that the outliers defined via \(\hat{u}^{*}\) are mainly determined by large deviations of the regressor values. From \(\hat {u}^{*}\) alone it is unclear whether the values of the dependent variable or those of the independent variables are responsible for the fact that an observation is an outlier.
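The distance-based outlier dummy can be sketched as follows. This is a simplified one-step version that uses the ordinary sample mean and covariance matrix. It is not Hadi's (1992) procedure, which works with robust location and scale estimates. The simulated regressors, the cutoff and the function name are assumptions made for illustration.

```python
import numpy as np

def outlier_dummy(Z, cutoff):
    """Flag rows whose squared Mahalanobis distance from the mean exceeds `cutoff`."""
    diff = Z - Z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    # Squared distance: diff_i' S^{-1} diff_i for every observation i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return (d2 > cutoff).astype(int), d2

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))   # hypothetical stand-ins for lnL and lnK
Z[0] = [8.0, -8.0]              # one planted extreme observation
# 13.8 is roughly the 0.999 quantile of a chi-square distribution with 2 df
mod, d2 = outlier_dummy(Z, cutoff=13.8)
print(int(mod.sum()), int(np.argmax(d2)))
```

The resulting 0/1 vector plays the role of MOD and could be added as a further column to the regressor matrix of the production function.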
Table 4 Confidence intervals (CI) of output elasticities of labor and capital based on a Cobb-Douglas production function, estimated with and without outliers, Stoye’s confidence interval at partially identified parameters; dependent variable: logarithm of sales (lnY)
CI with outliers  CI without outliers  Stoye CI  

\(\hat{\beta}_{\ln L;u}\)  0.9555  0.9492  0.9511 
\(\hat{\beta}_{\ln L;l}\)  0.9388  0.9339  0.9376 
\(\hat{\beta}_{\ln K;u}\)  0.2287  0.2299  0.2282 
\(\hat{\beta}_{\ln K;l}\)  0.2162  0.2185  0.2184 
\(\Delta\hat{\beta}_{\ln L}\)  0.0167  0.0153  0.0135 
\(\Delta\hat{\beta}_{\ln K}\)  0.0125  0.0114  0.0098 
Table 5 Unconditional and conditional DiD estimates with company-level pact (CLP) effects; dependent variable: logarithm of sales (lnY)
Unconditional  Conditional  

lnL  0.9423  
(166.03)  
lnK  0.2211  
(53.37)  
CLP  3.1152  0.0951 
(35.91)  (2.36)  
D2009  0.0597  0.0216 
(2.25)  (1.54)  
CLP∗D2009  −0.3029  0.0400 
(−2.90)  (0.84)  
n  31,985  20,490 
R ^{2}  0.101  0.841 
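The role of the interaction term CLP∗D2009 in the unconditional specification can be made explicit with simulated data. The sketch assumes a regression on a constant, the two dummies and their interaction. All numbers are hypothetical and unrelated to the estimates in the table above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
clp = rng.integers(0, 2, n)      # hypothetical treatment dummy (company-level pact)
d2009 = rng.integers(0, 2, n)    # hypothetical period dummy
y = 1.0 + 0.5 * clp + 0.2 * d2009 + 0.3 * clp * d2009 + rng.normal(0, 1, n)

# Unconditional DiD regression: constant, CLP, D2009 and their interaction
X = np.column_stack([np.ones(n), clp, d2009, clp * d2009])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(c, t):
    """Mean outcome for treatment status c in period t."""
    return y[(clp == c) & (d2009 == t)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(round(float(coef[3]), 6), round(float(did), 6))
```

In this saturated model the interaction coefficient coincides with the difference of the two group-specific differences in means. The conditional variant adds further regressors such as lnL and lnK, so this identity no longer holds exactly.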
Table 6 Estimates of CDFs with CLP effects using matching procedures; dependent variable: logarithm of sales (lnY)
No matching  MM  NNM  

lnL  0.9420  0.9362  0.9533 
(166.03)  (47.75)  (63.32)  
lnK  0.2212  0.1938  0.2007 
(53.42)  (15.12)  (19.70)  
CLP  0.1231  0.1928  0.0496 
(5.22)  (1.31)  (1.46)  
n  20,490  1,806  3,346 
R ^{2}  0.840  0.838  0.849 
Table 7 Quantile estimates of CLP effects; dependent variable: logarithm of sales (lnY)
Quantile  F  KB  FM  AAI  MM+KB  MM+AAI 

q=0.1  2.9957  0.2236  5.3012  1.2092  −0.1064  0.9776 
(38.94)  (6.76)  (20.42)  (3.10)  (−0.87)  (1.06)  
q=0.3  3.3242  0.1836  5.8227  1.1615  0.0715  0.7140 
(54.67)  (7.15)  (23.67)  (3.11)  (0.46)  (0.62)  
q=0.5  3.1325  0.1526  6.3549  1.2000  0.1793  0.6736 
(54.19)  (6.31)  (24.58)  (2.57)  (1.09)  (1.37)  
q=0.7  2.9312  0.1036  6.8703  1.2479  0.2270  0.8072 
(56.91)  (4.07)  (26.14)  (2.09)  (1.54)  (2.18)  
q=0.9  2.3203  −0.0176  7.8119  1.6549  0.4523  1.4242 
(34.18)  (−0.37)  (20.12)  (1.36)  (3.36)  (2.92)  
n  31,985  20,490  20,909  13,496  1,806  1,206 
From the view of expected CLP coefficients, the conventional quantile estimator, the Koenker-Bassett approach, with lnL and lnK as regressors seems best. However, the ranking of the size of the coefficients within column 2 seems unexpected. The smaller the quantile, the larger is the estimated coefficient. This could mean that CLPs are advantageous for small firms. However, it is possible that small firms with advantages in productivity due to CLPs have relatively high costs to adopt a CLP. In this case the higher propensity of large firms to introduce a CLP is consistent with higher productivity of small firms.
The coefficients of the Abadie-Angrist-Imbens approach, a combination of Frölich-Melly’s and Koenker-Bassett’s model, are also large but not as large as in columns 1 and 3.
Possibly, all estimates in columns 1–4 of Table 7 are biased and inconsistent. This is the case when CLP and non-CLP firms fundamentally differ due to unobserved variables. To avoid this problem the QTE and the matching approaches are combined. Based on the matching of Table 6, the QTE can be estimated analogously to columns 1–4 in Table 7. In columns 5 and 6 only two combinations are presented, namely MM+KB and MM+AAI. We find that the ranking and the size of the coefficients are plausible in column 5. The coefficients in column 6 are smaller than in column 4, but the identified causal effects still seem too high. The most important result is the following: the CLP effects are significant for higher quantiles, i.e. for q=0.9 in column 5 and for q=0.7 and q=0.9 in column 6. However, the median estimators (q=0.5) of CLP effects in columns 5 and 6, which can be compared with the estimates of column 2 in Table 6, are insignificant. Quantile estimators highlight information that cannot be revealed by other treatment methods, i.e. in Tables 5 and 6. The estimates of the other six combinations (MM+F, MM+FM, NNM+F, NNM+KB, NNM+FM, NNM+AAI), not presented in the tables, are less plausible. The ranking of the size of the coefficients is inconsistent in the light of theoretical and practical experience.
The last treatment method discussed in Sect. 2.2 is the regression discontinuity (RD) design. This approach exploits information on the rules determining treatment. The probability of receiving a treatment is a discontinuous function of one or more variables, where treatment is triggered by an administrative definition or an organizational rule.
Table 8 Testing for structural break of CLP effects between Pr(CLP)≤0.5 and Pr(CLP)>0.5
Coef.  Std.err.  t  P>t  

D_Pr(CLP)  −3.9608  0.5765  −6.87  0.000 
cPr(CLP)  4.3413  0.8390  5.17  0.000 
D_Pr(CLP)⋅cPr(CLP)  11.3838  0.8437  13.49  0.000 
const  18.4375  0.5764  31.99  0.000 
Table 9 Testing for differences in structural break of CLP effects between Pr(CLP)≤0.5 and Pr(CLP)>0.5 in 2006/07 and 2008/10
Coef.  Std.err.  t  P>t  

T  0.0130  1.3118  0.01  0.992 
D_Pr(CLP)  −4.1045  1.1191  −3.67  0.000 
cPr(CLP)  3.9314  1.6795  2.34  0.019 
D_Pr(CLP)⋅cPr(CLP)  11.6383  1.6884  6.89  0.000 
D_Pr(CLP)⋅T  0.0392  1.3119  0.03  0.976 
cPr(CLP)⋅T  0.2801  1.9520  0.14  0.886 
D_CLP⋅cPr(CLP)⋅T  −0.0662  1.9623  −0.03  0.973 
const  18.5422  1.1190  16.57  0.000 
The t-test, carried out analogously to the first example, yields weak significance (\(\hat{\gamma}_{1}=-13.8667\); t=−1.61; prob-value=0.107). The same procedure for middle-size firms (see Fig. 3) leads to the following results.
Apparently, there exists a break. However, the first part of the definition of middle-size firms from the Institut für Mittelstandsforschung is not compatible with the second part. The break of sales at 500 employees is not at 50 million Euro per year but at around 150 million Euro. Furthermore, the visual result might be due to a nonlinear relationship, as the fractional polynomial estimation over the entire range suggests. The t-test does not reject the null (\(\hat{\gamma }_{1}=-8977\); t=−0.54; prob-value=0.588). The conclusion from Figs. 2 and 3 is that a graphical representation without the polynomial shape as a comparison and without a test for a structural break can lead to a misinterpretation.
Table 10 Fuzzy regression discontinuity between East and West German federal states (GFS): Wald test for structural break of company-level pact (CLP) effects on sales; jump at GFS>0; dependent variable: logarithm of sales (lnY)
Variable  Coef.  Std.err.  z 

lnY jump  −0.8749  0.1234  −7.09 
CLP jump  −0.0571  0.0138  −4.13 
Wald estimator  15.3165  3.5703  4.29 
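The three rows of this table are linked by simple arithmetic: in a fuzzy design the Wald estimator is the jump in the outcome divided by the jump in the treatment share at the threshold. The short check below uses the rounded values reported above; the variable names are hypothetical.

```python
# Jump estimates as reported in the table above (rounded to four decimals)
jump_lny = -0.8749   # discontinuity in lnY at the threshold GFS>0
jump_clp = -0.0571   # discontinuity in the CLP share at the same threshold

# Wald estimator: outcome jump divided by treatment jump
wald = jump_lny / jump_clp
print(round(wald, 2))
```

The ratio of the rounded jumps is about 15.32 and differs from the reported 15.3165 only through rounding of the inputs.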
4 Summary
Many factors such as heteroskedasticity, clustering, the basic probability of qualitative regressors, outliers and only partially identified parameters may cause estimated standard errors based on classical methods to be biased. Applications show that the estimates under the suggested modifications do not always deviate much from those of the classical methods.
The development of new procedures is ongoing. In particular, the field of treatment methods has been extended. It is not always obvious which method is preferable to determine the causal effect. As the results evidently differ, it is necessary to develop a framework that helps to decide which method is most appropriate in typical situations. We observe a tendency away from the estimation of average effects. The focus is shifting to distributional topics. Quantile analysis helps to investigate differences between subgroups of the population. This is important because economic measures do not have the same influence on heterogeneous establishments and individuals. A combination of quantile regression with matching procedures can improve the determination of causal effects. Further combinations of treatment methods seem helpful. Difference-in-differences estimates should be linked with matching procedures and regression discontinuity designs. Regression discontinuity split by quantiles can also lead to new insights.
Executive summary
Empirical economics has been governed by econometric methods for many years. During the last 20 years, contents and major questions in this field have changed strongly. Therefore, methods were modified and completely new methods were developed. In comparison to conventional approaches, attention is paid to peculiarities of the data, to the specification of the estimating approach, to unobserved heterogeneity, to endogeneity and to causal effects. Real data are often not compatible with the assumptions of classical methods. If the latter are used, this can lead to a misinterpretation of the results. We have to ask whether the results are correct. Is it really possible to interpret the estimated effects as causal, or are these only statistical artifacts that are irrelevant or even counterproductive for policy measures? In order to avoid this, the practitioner has to be familiar with the wide range of existing methods for empirical investigations. The user has to know the assumptions of the methods and whether the application allows adequate conclusions given the available information. It is necessary to check the robustness of the results by alternative methods and specifications.
This paper presents a selective review of econometric methods and demonstrates by applications that the methods work. In the first part, methodological problems concerning standard errors and treatment effects are discussed. First, heteroskedasticity-robust and cluster-robust estimates are presented. Second, peculiarities of Bernoulli distributed regressors, outliers and only partially identified parameters are revealed. Approaches to the improvement of standard error estimates under heteroskedasticity differ in the weighting of residuals. Other procedures use the estimated disturbances in order to create a larger number of artificial samples and thereby obtain better estimates. Still others use nonlinear information. Cluster-robust estimates try to solve the Moulton problem. Standard errors that are too low because similar observations are grouped within clusters are adjusted. This objective is only partially achieved. We should be cautious if we compare the effects of dummy variables on an endogenous variable, because the more the mean of a dummy deviates from 0.5, the higher are the standard errors. Outliers, i.e. unusual observations that are due to systematic measurement errors or extraordinary events, may have enormous influence on the estimates. The suggested approaches to detect outliers vary with respect to the measurement concept and do not necessarily demonstrate whether outliers should be accounted for in the empirical analysis. New methods for partially identified parameters may be helpful in this context. They can increase the degree of precision when it is uncertain whether outliers should be eliminated.
The focus is on four principles to estimate causal effects: difference-in-differences (DiD) estimators, matching procedures, quantile treatment effects (QTE) analysis and regression discontinuity design. The DiD models distinguish between conditional and unconditional approaches. The range of the popular matching procedures is wide and the methods evidently differ. They aim to find statistical twins, i.e. to homogenize the characteristics of observations from the treatment and the control group. Until now, the application of QTE analysis has been relatively rare in practice. Four types of models are important in this context. The user has to decide whether the treatment variable is exogenous or endogenous and whether additional control variables are incorporated or not. Regression discontinuity (RD) designs distinguish between sharp and fuzzy RD methods, depending on whether an observation is assigned to the treatment or to the control group directly by an observable continuous variable or indirectly via the probability and the mean of treatment, respectively, conditional on this variable.
In the second part of the paper the different methods are applied to estimates of Cobb-Douglas production functions using IAB establishment panel data. Some heteroskedasticity-consistent estimates show similar results while cluster-robust estimates differ strongly. Dummy variables as regressors with a mean near 0.5 reveal, as expected, smaller variances of the coefficient estimators than others. Not all outliers have a strong effect on significance. Methods for partially identified parameters yield more efficient estimates than traditional procedures.
The four treatment effects methods discussed are applied to the question whether company-level pacts have a significant effect on production output. Unconditional DiD estimators and estimates without matching display significantly positive effects. In contrast, we cannot find the same result if conditional DiD or matching estimates based on the Mahalanobis metric are applied. This outcome has to be formulated more precisely under quantile regression. The higher the quantile, the stronger is the tendency to positive and significant effects. Sharp regression discontinuity estimates display a jump at the probability 0.5 that an establishment has a company-level pact. No specific influence can be detected during the Great Recession. Fuzzy regression discontinuity estimates reveal that the output effect of company-level pacts is significantly lower in East than in West Germany. A combined application of the four principles determining treatment effects leads to some interesting new insights. We determine joint DiD and matching estimates as well as estimates of the former together with regression discontinuity designs. Finally, matching is combined with quantile regression.
Kurzfassung
Empirische Wirtschaftsforschung wird schon seit vielen Jahren ganz wesentlich von ökonometrischen Methoden getragen. In den letzten 20 Jahren haben sich Inhalte und Fragestellungen in der empirischen Wirtschaftsforschung stark verändert. Dies hat dazu geführt, dass viele Methoden modifiziert oder völlig neue entwickelt wurden. Gegenüber traditionellen Ansätzen wird verstärkt auf die Besonderheiten der Daten, auf die Spezifikation des zu schätzenden Ansatzes, auf unbeobachtete Heterogenität, auf Endogenität und auf Kausaleffekte geachtet. Reale Daten sind ganz überwiegend nicht vereinbar mit den Annahmen klassischer Methoden. Werden letztere trotzdem eingesetzt, so sind damit häufig Fehlinterpretationen der Ergebnisse verbunden. Zu fragen ist, wie sicher die getroffenen Aussagen sind. Können die Schätzergebnisse tatsächlich kausal interpretiert werden oder haben sich lediglich rein statistische Zusammenhänge ergeben, die für Handlungsanweisungen irrelevant oder gar kontraproduktiv sind? Um dies zu verhindern, muss der Praktiker für seine empirischen Untersuchungen mit dem Spektrum vorhandener Methoden vertraut sein. Er muss wissen, welche Annahmen den jeweiligen Methoden zugrunde liegen und ob deren Anwendung bei gegebener Information geeignete Aussagen zulassen. Er sollte durch den Einsatz vergleichbarer Methoden die Robustheit der Ergebnisse überprüfen.
The aim of this paper is to give an overview of selected econometric methods and to illustrate how they work by means of applications. Methodological problems of standard errors and treatment effects are addressed. The first topic is heteroskedasticity-robust and cluster-robust estimation. This is followed by a discussion of problems with Bernoulli-distributed regressors, outliers and partially identified parameters. The approaches proposed for improving standard errors under heteroskedasticity differ in how they weight the residuals. Other procedures use the estimated disturbances to generate a larger number of artificial samples, on which a better estimate of the standard errors is based, or exploit existing nonlinearities. Cluster-robust estimates aim to solve the Moulton problem: they correct standard errors that are too small when similar observations are grouped in clusters. The proposed approaches achieve this only incompletely. A phenomenon not discussed so far is that dummy variables used as regressors lead to higher standard errors the further their mean is from 0.5. This calls for caution when comparing the precision of the effects of different binary (0/1) regressors. Outliers, i.e. unusual observations that are mainly due to systematic measurement errors or unusual events, can have a considerable impact on the estimates. The approaches proposed for detecting outliers differ in their measurement concept and do not necessarily indicate whether the outliers should be taken into account in the empirical analysis. Recent approaches for only partially identified parameters can be helpful here, because they increase precision when it is uncertain whether outliers should be removed.
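The dependence of the standard error on the dummy mean can be illustrated with a minimal sketch. It assumes the simplest case, which is an assumption made here for illustration: a regression of the outcome on a constant and a single 0/1 regressor, with homoskedastic errors of standard deviation sigma. In that case the sum of squared deviations of the dummy equals n p (1 - p), so the conventional standard error of the slope is sigma / sqrt(n p (1 - p)), which is smallest at p = 0.5. The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def dummy_slope_se(p, n, sigma=1.0):
    """Conventional standard error of the OLS slope on a 0/1 regressor.

    Assumes a simple regression (constant plus one dummy with sample
    mean p) and homoskedastic errors with standard deviation sigma.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sigma / math.sqrt(n * p * (1.0 - p))


def slope_se_from_data(d, sigma=1.0):
    """Same quantity computed from an observed dummy vector d.

    Uses the general simple-regression formula
    sigma / sqrt(sum of squared deviations of the regressor).
    """
    n = len(d)
    mean_d = sum(d) / n
    ssd = sum((x - mean_d) ** 2 for x in d)
    return sigma / math.sqrt(ssd)


if __name__ == "__main__":
    # Hypothetical sample size; only the relative sizes matter.
    for p in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"dummy mean {p:.2f}: slope s.e. = {dummy_slope_se(p, n=1000):.4f}")
```

The standard error is symmetric around p = 0.5 and grows without bound as the dummy mean approaches 0 or 1. With additional regressors the same expression is inflated further by the correlation between the dummy and the other regressors, so the sketch covers only the simplest case.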
Among the procedures for determining treatment effects, four principles are in focus: difference-in-differences estimators, matching procedures, treatment effects in quantile regression analysis and regression discontinuity approaches. For difference-in-differences estimators, one has to distinguish whether or not additional control variables are taken into account. The range of matching procedures, which have recently become very popular and which aim to make the treatment group and the control group homogeneous in order to identify statistical twins, has become quite broad and exhibits methodologically important differences. The use of quantile regressions to capture heterogeneous causal effects is still comparatively rare. The methodological distinction here is whether the treatment variable is regarded as exogenous or endogenous and whether or not further control variables are included. For regression discontinuity approaches, one has to distinguish whether assignment to the treatment or control group is based solely on an observed continuous variable or whether unobserved variables also play a role.
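As a sketch of the unconditional variant (no control variables), the difference-in-differences estimate is the change in the mean outcome of the treated group minus the change in the mean outcome of the control group. The function name, the record layout and the toy numbers below are hypothetical and serve only to illustrate the arithmetic; they are not the estimates or the data used in the paper.

```python
from statistics import mean


def did_estimate(records):
    """Unconditional difference-in-differences from group/period means.

    records: iterable of (treated, post, outcome) tuples, where treated
    and post are 0/1 indicators. All four group/period cells must be
    non-empty, otherwise statistics.mean raises an error.
    """
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, outcome in records:
        cells[(int(treated), int(post))].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(1, 1)] - m[(1, 0)]
    change_control = m[(0, 1)] - m[(0, 0)]
    return change_treated - change_control


# Hypothetical toy data: (treated, post, outcome)
toy_records = [
    (0, 0, 10.0), (0, 0, 12.0),  # control group, before
    (0, 1, 12.0), (0, 1, 14.0),  # control group, after
    (1, 0, 11.0), (1, 0, 13.0),  # treated group, before
    (1, 1, 16.0), (1, 1, 18.0),  # treated group, after
]

if __name__ == "__main__":
    print("unconditional DiD estimate:", did_estimate(toy_records))
```

The conditional variant would instead estimate the same contrast in a regression that adds control variables, which is where the two kinds of estimates can diverge.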
In the second part of the paper, the discussion of the various procedures, which initially focuses purely on methodology, is supplemented by applications to Cobb-Douglas production functions using IAB Establishment Panel data. Different heteroskedasticity-consistent estimation procedures lead to similar results for the standard errors. Cluster-robust estimates show more pronounced deviations. Dummy variables used as regressors with a mean close to 0.5 lead to smaller variances of the coefficient estimators than dummies with lower or higher means. Not all outliers have a strong influence on significance. Recent methods for handling the problem of only partially identified parameters lead to more efficient estimates than traditional procedures.
The four treatment effect procedures discussed are applied to the question of whether company-level pacts have a significant effect on production output. In contrast to unconditional difference-in-differences estimators and estimators without matching, conditional difference-in-differences estimators and matching estimators based on the Mahalanobis metric yield positive but insignificant effects. The latter result has to be qualified in the quantile treatment effect analysis: the higher the quantile considered, the stronger the tendency toward positive, significant effects. A sharp regression discontinuity analysis shows a structural break at a probability of 0.5 that an establishment has agreed on a company-level pact. No specific effects can be identified during the Great Recession of 2008/09. Fuzzy regression discontinuity estimates reveal that the output effect of company-level pacts is significantly lower in East Germany than in West Germany. A combined application of the four basic principles for determining causal effects leads to interesting new insights. Among other things, difference-in-differences estimators are linked with matching procedures. The former are also discussed in combination with regression discontinuity, and the latter in combination with quantile regressions.
Declarations
Acknowledgements
I wish to thank an anonymous reviewer for his constructive suggestions and the participants of the Nutzerkonferenz in Nürnberg for helpful comments.
References
 Abadie, A., Angrist, J., Imbens, G.: Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91–117 (2002) View ArticleGoogle Scholar
 Ai, C., Norton, E.C.: Interaction terms in logit and probit models. Econ. Lett. 80, 123–129 (2003) View ArticleGoogle Scholar
 Angrist, J., Pischke, J.S.: Mostly Harmless Econometrics—an Empiricist’s Companion. Princeton University Press, Princeton (2009) Google Scholar
 Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics—Identifying Influential Data and Sources of Collinearity. Wiley, New York (1980) View ArticleGoogle Scholar
 Cameron, A.C., Miller, D.L.: Robust inference with clustered data. In: Ullah, A.C., Giles, D.E.A. (eds.) Handbook of Empirical Economics and Finance, pp. 1–28 (2010) View ArticleGoogle Scholar
 Chernozhukov, V., Hong, H., Tamer, E.: Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243–1284 (2007) View ArticleGoogle Scholar
 CribariNeto, F., da Silva, W.D.: A new heteroskedasticityconsistent covariance matrix estimator for the linear regression model. AStA Adv. Stat. Anal. 95, 129–146 (2011) View ArticleGoogle Scholar
 CribariNeto, F., Souza, T.C., Vasconcellos, K.L.P.: Inference under heteroskedasticity and leveraged data. Commun. Stat., Theory Methods 36, 1877–1888 (2007) View ArticleGoogle Scholar
 Firpo, S.: Efficient semiparametric estimation of quantile treatment effects. Econometrica 75, 259–276 (2007) View ArticleGoogle Scholar
 Frölich, M., Melly, B.: Unconditional Quantile Treatment Under Endogeneity. (2012) mimeo Google Scholar
 Goldstein, H.: Multilevel Statistical Models, Kendall’s Library of Statistics, 3rd edn. Arnold, London (2003) Google Scholar
 Guo, S., Fraser, M.W.: Propensity Score Analysis. Sage Publications, Thousand Oaks (2010) Google Scholar
 Hadi, A.S.: Identifying multiple outliers in multivariate data. J. R. Stat. Soc. B 54, 761–771 (1992) Google Scholar
 Hamermesh, D.S.: The craft of labormetrics. Ind. Labor Relat. Rev. 53, 363–380 (2000) View ArticleGoogle Scholar
 Imbens, G.W., Manski, C.F.: Confidence intervals for partially identified parameters. Econometrica 72, 1845–1857 (2004) View ArticleGoogle Scholar
 Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978) View ArticleGoogle Scholar
 Krämer, W.: The cult of statistical significance—what economists should and should not do to make their data talk. J. Appl. Soc. Sci. Stud. 131, 455–468 (2011) Google Scholar
 Leamer, E.E.: Sensitivity analyses would help. Am. Econ. Rev. 75, 308–313 (1985) Google Scholar
 MacKinnon, J.G., White, H.: Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties. J. Econom. 29, 305–325 (1985) View ArticleGoogle Scholar
 Moulton, B.R.: Random group effects and the precision of regression estimates. J. Econom. 32, 385–397 (1986) View ArticleGoogle Scholar
 Moulton, B.R.: Diagnostic tests for group effects in regression analysis. J. Bus. Econ. Stat. 6, 275–282 (1987) Google Scholar
 Moulton, B.R.: An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Rev. Econ. Stat. 72, 334–338 (1990) View ArticleGoogle Scholar
 Puhani, P.: The treatment, the cross difference, and the interaction term in nonlinear ‘differenceindifferences’ models. Econ. Lett. 115, 85–87 (2012) View ArticleGoogle Scholar
 Raudenbush, A.S., Bryk, S.W.: Hierarchical Linear Models, 2nd edn. Sage Publications, Thousand Oaks (2002) Google Scholar
 Romano, J.P., Shaikh, A.M.: Inference for the identified set in partially identified econometric models. Econometrica 78, 169–211 (2010) View ArticleGoogle Scholar
 Rosenbaum, P.R., Rubin, D.P.: Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am. Stat. 39, 33–38 (1985) Google Scholar
 Stoye, J.: More on confidence intervals for partially identified parameters. Econometrica 77, 1299–1315 (2009) View ArticleGoogle Scholar
 Wald, H.: The fitting of straight line if both variables are subject to error. Ann. Math. Stat. 11, 284–300 (1940) View ArticleGoogle Scholar
 White, H.: A heteroskedasticityconsistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980) View ArticleGoogle Scholar
 Wooldridge, J.M.: Clustersample methods in applied econometrics. Am. Econ. Rev. 93(PaP), 133–138 (2003) View ArticleGoogle Scholar
 Woutersen, T.: A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. (2009) mimeo Google Scholar
 Ziliak, S., McCloskey, D.: The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives. University of Michigan Press, Michigan (2008) Google Scholar