20 years of IAB Establishment Panel – Payoffs and Perspectives / 20 Jahre IAB-Betriebspanel – Erträge und Perspektiven
Estimation of standard errors and treatment effects in empirical economics—methods and applications
Schätzung von Standardfehlern und Kausaleffekten in der empirischen Wirtschaftsforschung – Methoden und Anwendungen
Journal for Labour Market Research volume 47, pages 43–62 (2014)
Abstract
This paper discusses methodological problems of standard errors and treatment effects. First, heteroskedasticity-robust and cluster-robust estimates are considered as well as problems with Bernoulli distributed regressors, outliers and partially identified parameters. Second, procedures to determine treatment effects are analyzed. Four principles are in the focus: difference-in-differences estimators, matching procedures, treatment effects in quantile regression analysis and regression discontinuity approaches. These methods are applied to Cobb-Douglas functions using IAB establishment panel data.
Different heteroskedasticity-consistent procedures lead to similar standard errors. Cluster-robust estimates deviate markedly. Dummies with a mean near 0.5 have a smaller variance of the coefficient estimates than others. Not all outliers have a strong influence on significance. New methods to handle the problem of partially identified parameters lead to more efficient estimates.
The four discussed treatment procedures are applied to the question whether company-level pacts affect output. In contrast to unconditional difference-in-differences and to estimates without matching, the company-level pact effect is positive but insignificant if conditional difference-in-differences, nearest-neighbor or Mahalanobis metric matching is applied. The latter result has to be refined under quantile treatment effects analysis. The higher the quantile, the larger is the positive company-level pact effect, and there is a tendency from insignificant to significant effects. A sharp regression discontinuity analysis shows a structural break at a probability of 0.5 that a company-level pact exists. No specific effect of the Great Recession can be detected. Fuzzy regression discontinuity estimates reveal that the company-level pact effect is significantly lower in East than in West Germany.
Summary
This contribution discusses options for estimating standard errors and causal effects. First, heteroskedasticity-robust and cluster-robust estimates of standard errors are considered, and peculiarities and problems with dummy variables as regressors, outliers and only partially identified parameters are discussed. Then procedures to determine treatment effects are addressed. Four principles are presented: difference-in-differences estimators, matching procedures, causal effects in quantile regression analysis and approaches to determine discontinuities in regression estimates. In the second part of the paper the methods are applied to Cobb-Douglas production functions using IAB establishment panel data.
Different heteroskedasticity-consistent procedures lead to quite similar standard errors. Cluster-robust estimates, in contrast, show marked deviations. Dummies as regressors with a mean near 0.5 exhibit smaller variances of the coefficient estimators than others. Not all outliers have a notable influence on significance. Newer methods for handling only partially identified parameters lead to more efficient estimates.
The four discussed procedures for determining the effects of measures are applied to the question whether company-level pacts have a significant influence on output. In contrast to unconditional difference-in-differences estimators and estimators without matching, the effects of company-level pacts are positive but insignificant under conditional difference-in-differences estimators and matching procedures. This statement has to be refined on the basis of the quantile treatment analysis. The higher the quantile, the larger is the effect of company-level pacts, with a tendency from insignificant to significant effects. The sharp regression discontinuity analysis shows a structural break at a probability of 0.5 that a company-level pact exists. No specific effects can be detected during the 2009 recession. Estimates within fuzzy discontinuity approaches reveal that the effects of company-level pacts are significantly lower in East Germany than in West Germany.
1 Introduction
Contents, questions and methods in empirical economics have changed in the last 20 years. Many methods were developed in the past, but their application in empirical economics follows with a lag. Some methods are well-known but have received little attention. New approaches focus on characteristics of the data, on modified estimators, on correct specifications, on unobserved heterogeneity, on endogeneity and on causal effects. Real data sets are often not compatible with the assumptions of classical models. Therefore, modified methods were suggested for estimation and inference.
The road map of the following considerations consists of four hypotheses, where the first two and the last two belong together:

(1)
Significance is an important indicator in empirical economics but the results are sometimes misleading.

(2)
Violated assumptions, clustering of the data, outliers and only partially identified parameters are often the reason for wrong standard errors when classical methods are used.

(3)
The estimation of average effects is useful but subgroup analysis and quantile regressions are important supplements.

(4)
Causal effects are of great interest but the determination is based on disparate approaches with varying results.
In the following some econometric methods are developed, presented and applied to Cobb-Douglas production functions.
2 Econometric methods
2.1 Significance and standard errors in regression models
The workhorse in empirical economics is the classical linear model
\[y = X\beta + u.\]
The coefficient vector β is estimated by ordinary least squares (OLS)
\[\hat{\beta} = (X'X)^{-1}X'y\]
and the covariance matrix by
\[\hat{V}(\hat{\beta}) = \hat{\sigma}^{2}(X'X)^{-1},\]
where X is the design matrix and \(\hat{\sigma}^{2}\) the estimated variance of the disturbances. The influence of a regressor, e.g. \(x_{k}\), on the regressand y is called significant at the 5 percent level if \(|t|=|\hat{\beta}_{k}|/\sqrt{\hat{V}(\hat{\beta}_{k})}>t_{0.975}\). In empirical papers this result is often documented by an asterisk and implicitly interpreted as a good one, while insignificance is a negative signal. Ziliak and McCloskey (2008) and Krämer (2011) have criticized this procedure although the analysis is extended by robustness tests in many investigations. Three types of mistakes can lead to a misleading interpretation:

(1)
There does not exist any effect but due to technical inefficiencies a significant effect is reported.

(2)
The effect is small but due to the precision of the estimates a significant effect is determined.

(3)
There exists a strong effect but due to the variability of the estimates the statistical effect cannot be detected.
The consequence cannot be to neglect the instrument of significance. But what can we do? The following proposals may help to clarify why some standard errors are high and others low, why some influences are significant and others not, whether alternative procedures can reduce the danger of one of the three mistakes:

Compute robust standard errors.

Analyze whether variation within clusters is only small in comparison with variation between the clusters.

Check whether dummies as regressors with high or low probability are responsible for insignificance.

Test whether outliers induce large standard errors.

Consider the problem of partially identified parameters.

Detect whether collinearity is present.

Investigate alternative specifications.

Use subsamples and compare the results.

Execute sensitivity analyses (Leamer 1985).

Employ the sniff test (Hamermesh 2000) in order to detect whether econometric results are in accord with economic plausibility.
2.1.1 Heteroskedasticity-robust standard errors
OLS estimates are inefficient or biased and inconsistent if assumptions of the classical linear model are violated. We need alternatives which are robust to the violation of specific assumptions. Empirical papers often note that robust standard errors are displayed. This is imprecise. In most cases this means only heteroskedasticity-robust. This should be mentioned, and also that the estimation is based on White's approach. If we know the type of heteroskedasticity, a transformation of the regression model should be preferred, namely
\[\frac{y_{i}}{\sigma_{i}} = \frac{x_{i}'\beta}{\sigma_{i}} + \frac{u_{i}}{\sigma_{i}},\]
where i=1,…,n. Typically, the individual variances of the error term are unknown. In the case of unknown and unspecific heteroskedasticity White (1980) recommends the following estimation of the covariance matrix
\[\hat{V}_{\mathrm{white}}(\hat{\beta}) = (X'X)^{-1}\Big(\sum_{i=1}^{n}\hat{u}_{i}^{2}x_{i}x_{i}'\Big)(X'X)^{-1}.\]
Such estimates are asymptotically heteroskedasticity-robust. In many empirical investigations this robust estimator is routinely applied without testing whether heteroskedasticity exists. We should stress that those estimated standard errors are more biased than conventional estimators if residuals are homoskedastic. As long as there is not too much heteroskedasticity, robust standard errors are also biased downward. In the literature we find some suggestions to modify this estimator, namely to weight the squared residuals \(\hat{u}_{i}^{2}\):
\[hc_{j} = \frac{\hat{u}_{i}^{2}}{(1-c_{ii})^{\delta_{j}}},\]
where j=2,3,4, \(c_{ii}\) is the main diagonal element of \(X(X'X)^{-1}X'\) and \(\delta_{j}=1;\,2;\,\min[\gamma_{1},(nc_{ii})/K]+\min[\gamma_{2},(nc_{ii})/K]\); \(\gamma_{1}\) and \(\gamma_{2}\) are real positive constants.
The intention is to obtain more efficient estimates. It can be shown for \(hc_{2}\) that under homoskedasticity the mean of \(\hat{u}_{i}^{2}\) is \(\sigma^{2}(1-c_{ii})\), so that \(E(\hat{u}_{i}^{2}/(1-c_{ii}))\) is \(\sigma^{2}\). Therefore, we should expect that the \(hc_{2}\) option leads under homoskedasticity to better estimates in small samples than the simple \(hc_{1}\) option. The second correction is presented by MacKinnon and White (1985). This is an approximation of a more complicated estimator which is based on a jackknife estimator (see Sect. 2.1.2). Applications demonstrate that the standard error increases from OLS via \(hc_{1}\) and \(hc_{2}\) to the \(hc_{3}\) option. Simulations, however, do not show a clear preference. As one cannot be sure which case is the correct one, a conservative choice is preferable (Angrist and Pischke 2009, p. 302). The estimator that has the largest standard error should be chosen. This means the null hypothesis (\(H_{0}\): no influence on the regressand) is maintained longer than with other options.
Cribari-Neto and da Silva (2011) suggest \(\gamma_{1}=1\) and \(\gamma_{2}=1.5\) in \(hc_{4}\). The intention is to weaken the effect of influential observations compared with \(hc_{2}\) and \(hc_{3}\) or, in other words, to enlarge the standard errors. In an earlier version (Cribari-Neto et al. 2007) a slight modification is presented: \(hc_{4}^{*}=1/(1-c_{ii})^{\delta_{4*}}\), where \(\delta_{4*}=\min(4,nc_{ii}/K)\). It is argued that the presence of high leverage observations is more decisive for the finite-sample behavior of the consistent estimators of \(V(\hat{\beta})\) than the intensity of heteroskedasticity; \(hc_{4}\) and \(hc_{4*}\) aim at discounting for leverage points (see Sect. 2.1.5) more heavily than \(hc_{2}\) and \(hc_{3}\). The same authors formulate a further estimator
where \(\delta_{5}=\min(\frac{nc_{ii}}{K},\max(4,\frac{nkc_{ii, \max}}{K}))\), k is a predefined constant, where k=0.7 is suggested. In this case squared residuals are affected by the maximal leverage.
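The weighting schemes of this subsection can be compared numerically. The following is a minimal sketch, assuming simulated heteroskedastic data and NumPy only; the function name `hc_standard_errors` and the labels `hc0` (White's unscaled estimator) and `hc1` (White's estimator scaled by n/(n−K)) follow common software naming and are assumptions, not notation taken from the text.

```python
import numpy as np

def hc_standard_errors(X, y):
    """Return OLS coefficients and standard errors under several weightings."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # c_ii: main diagonal of the hat matrix X (X'X)^{-1} X'
    c = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    sigma2 = resid @ resid / (n - K)

    def sandwich(weights):
        # (X'X)^{-1} [sum_i w_i u_i^2 x_i x_i'] (X'X)^{-1}
        meat = (X * (weights * resid**2)[:, None]).T @ X
        return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

    se = {
        "ols": np.sqrt(sigma2 * np.diag(XtX_inv)),
        "hc0": sandwich(np.ones(n)),                 # White (1980), unscaled
        "hc1": sandwich(np.full(n, n / (n - K))),    # degrees-of-freedom scaling
        "hc2": sandwich(1.0 / (1.0 - c)),            # delta = 1
        "hc3": sandwich(1.0 / (1.0 - c) ** 2),       # delta = 2
    }
    return beta, se

# Simulated data (assumption): error standard deviation grows with x
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 5.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n) * x
beta, se = hc_standard_errors(X, y)
for name, values in se.items():
    print(name, np.round(values, 4))
```

Because the weights satisfy 1 ≤ 1/(1−c_ii) ≤ 1/(1−c_ii)², the sketch always yields standard errors ordered hc0 ≤ hc2 ≤ hc3, which is in line with the ordering reported for applications.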
2.1.2 Resampling procedures
Other possibilities to determine the standard error are the jackknife and the bootstrap estimator. These are resampling procedures. In the jackknife case subsamples with n−1 observations are constructed by sequentially eliminating one observation. The jackknife compares the coefficients estimated from the total sample, \(\hat{\beta}\), with those obtained after eliminating observation i, \(\hat{\beta}_{i}\). The jackknife estimator of the covariance matrix is
There exist many ways to bootstrap regression estimates. The basic idea is to assume that the sample with n elements is the population and to draw B times m elements (sampling with replacement), where both m≤n and m>n are feasible. If \(\hat{\beta}_{\mathrm{boot}}'=(\hat {\beta}(1)_{m}', \ldots,\hat{\beta}(B)_{m}')\) are the bootstrap estimators of the coefficients the asymptotic covariance matrix is
where \(\hat{\beta}\) is the estimator with the original sample size n. Alternatively, \(\hat{\beta}\) can be substituted by \(\bar{\beta}=1/B\sum \hat{\beta}(b)_{m}\). Bootstrap estimates of the standard error are especially helpful when it is difficult to compute standard errors by conventional methods, e.g. 2SLS estimators under heteroskedasticity or cluster-robust standard errors when many small clusters or only short panels exist. The jackknife can be viewed as a linear approximation of the bootstrap estimator. A further popular way to estimate the standard errors is the delta method. This approach is especially used for nonlinear functions of parameter estimates \(\hat{\gamma}=g(\hat{\beta})\). An asymptotic approximation of the covariance matrix of a vector of such functions is determined. It can be shown that
\[n^{1/2}(\hat{\gamma}-\gamma_{0}) \overset{a}{\sim} N(0,\, G_{0}V^{\infty}G_{0}'),\]
where \(\gamma_{0}\) is the vector of the true values of γ, \(G_{0}\) is an l×K matrix with typical element \(\partial g_{i}(\beta)/\partial\beta_{j}\), evaluated at \(\beta_{0}\), and \(V^{\infty}\) is the asymptotic covariance matrix of \(n^{1/2}(\hat{\beta}-\beta_{0})\).
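The two resampling schemes can be sketched in a few lines. This is an illustration under stated assumptions, not the exact estimators of the text: the scaling factors (n−1)/n for the jackknife and 1/B for the bootstrap are common choices assumed here, the bootstrap resamples (y, x) pairs, and the names `jackknife_cov` and `bootstrap_cov` are hypothetical.

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def jackknife_cov(X, y):
    """Leave-one-out estimates compared with the full-sample estimate."""
    n = X.shape[0]
    beta = ols(X, y)
    keep = ~np.eye(n, dtype=bool)          # row i drops observation i
    loo = np.array([ols(X[keep[i]], y[keep[i]]) for i in range(n)])
    dev = loo - beta
    # (n - 1) / n is an assumed scaling factor
    return (n - 1) / n * dev.T @ dev, loo

def bootstrap_cov(X, y, B=500, m=None, seed=0):
    """Pairs bootstrap: draw m observations with replacement, B times."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = n if m is None else m
    beta = ols(X, y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=m)
        draws[b] = ols(X[idx], y[idx])
    dev = draws - beta                     # deviations from the original estimate
    return dev.T @ dev / B                 # 1 / B is an assumed scaling factor

# Simulated data (assumption)
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
V_jack, loo = jackknife_cov(X, y)
V_boot = bootstrap_cov(X, y)
print(np.sqrt(np.diag(V_jack)), np.sqrt(np.diag(V_boot)))
```

The jackknife refits the model n times here for transparency; for OLS the leave-one-out coefficients also have a closed form based on the residuals and the hat matrix diagonal, which avoids the refits.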
2.1.3 The Moulton problem
The conventionally estimated variance of a coefficient is too low if the regressor varies strongly between groups but only little within groups (Moulton 1986, 1987, 1990). This is especially the case if industry, regional and macroeconomic variables are introduced in a microeconomic model or panel data are considered. In a more general context this is called the problem of cluster sampling. Individuals or establishments are sampled in groups or clusters. One consequence may be a weighted estimation that adjusts for differences in sampling rates. However, weighting is not always necessary, and estimates may still understate the true standard errors. Some empirical investigations note that cluster-robust standard errors are displayed but do not mention the cluster variable. If panel data are used then this is usually the identification variable of the individuals or firms. In many specifications more than one cluster variable, e.g. a regional and an industry variable, is incorporated. Then it is misleading if the cluster variable is not mentioned. Furthermore, a sequential determination of a cluster-robust correction is then not appropriate if there is a dependency between the cluster variables. If we can assume that there is a hierarchy of the cluster variables then a multilevel approach can be applied (Raudenbush and Bryk 2002; Goldstein 2003). Cameron and Miller (2010) suggest a two-way clustering procedure. The covariance matrix can be determined by
\[\hat{V}_{\mathrm{two\text{-}way}}(\hat{\beta}) = \hat{V}_{1}(\hat{\beta}) + \hat{V}_{2}(\hat{\beta}) - \hat{V}_{1\cap 2}(\hat{\beta}),\]
when the three components are computed by
\[\hat{V}(\hat{\beta}) = (X'X)^{-1}\Big(\sum_{g=1}^{G}X_{g}'\hat{u}_{g}\hat{u}_{g}'X_{g}\Big)(X'X)^{-1}.\]
Different ways of clustering can be used. Cluster-robust inference asymptotics are based on G→∞. In many applications there are only a few clusters. In this case \(\hat{u}_{g}\) has to be modified. One way is the following transformation
\[\tilde{u}_{g} = \sqrt{\frac{G}{G-1}}\,\hat{u}_{g}.\]
Further methods and suggestions in the literature are presented by Cameron and Miller (2010) and Wooldridge (2003).
A simple and extreme example shall demonstrate the cluster problem.
Example
Assume a data set with 5 observations (n=5) and 4 variables (V1–V4).
i  V1  V2  V3  V4 

1  24  123  −234  −8 
2  875  87  54  3 
3  −12  1234  −876  345 
4  231  −87  −65  9808 
5  43  34  9  −765 
The linear model
\[V1 = \beta_{0} + \beta_{1}V2 + \beta_{2}V3 + \beta_{3}V4 + u\]
is estimated by OLS using the original data set (1M). Then the data set is doubled (2M), quadrupled (4M) and multiplied eightfold (8M). The following OLS estimates result.
Variable  \(\hat{\beta}\)  \(\hat{\sigma}_{\hat{\beta}}\) (1M)  \(\hat{\sigma}_{\hat{\beta}}\) (2M)  \(\hat{\sigma}_{\hat{\beta}}\) (4M)  \(\hat{\sigma}_{\hat{\beta}}\) (8M)

V2  1.7239  1.7532  0.7158  0.4383  0.2922
V3  2.7941  2.3874  0.9747  0.5969  0.3979
V4  0.0270  0.0618  0.0252  0.0154  0.0103
const  323.2734  270.5781  110.4630  67.6445  45.0963
The coefficients of 1M to 8M are the same; however, the standard errors decrease if the same data set is multiplied. Namely, the variance is only 1/6, 1/16 and 1/36 of the original variance. The general relationship can be shown as follows. For the original data set (\(X_{1}\)) the covariance matrix is
\[\hat{V}_{1}(\hat{\beta}) = \hat{\sigma}_{1}^{2}(X_{1}'X_{1})^{-1} = \frac{\hat{u}_{1}'\hat{u}_{1}}{n-K}(X_{1}'X_{1})^{-1}.\]
Using \(X_{1}=\cdots=X_{F}\) the F times enlarged data set with the design matrix \(X'=:(X_{1}'\cdots X_{F}')\) leads to
\[X'X = F\cdot X_{1}'X_{1}, \qquad \hat{\sigma}_{F}^{2} = \frac{F\cdot\hat{u}_{1}'\hat{u}_{1}}{Fn-K}\]
and
\[\hat{V}_{F}(\hat{\beta}) = \hat{\sigma}_{F}^{2}(X'X)^{-1} = \frac{n-K}{Fn-K}\,\hat{V}_{1}(\hat{\beta}).\]
K is the number of regressors including the constant term, n is the number of observations in the original data set (number of clusters), F is the number of observations within a cluster. In the numerical example with F=8, K=4, n=5 the Moulton factor MF that indicates the deflation factor of the variance is
\[MF = \frac{n-K}{Fn-K} = \frac{5-4}{8\cdot 5-4} = \frac{1}{36}.\]
This is exactly the same as was demonstrated in the numerical example. Analogously the values 1/6 and 1/16 can be determined. As multiplying the data set does not add any further information to the simple original data set, not only the coefficients but also the standard errors should be the same. Therefore, it is necessary to correct the covariance matrix. Statistical packages, e.g. Stata, supply cluster-robust estimates
\[\hat{V}(\hat{\beta})_{C} = (X'X)^{-1}\Big(\sum_{c=1}^{C}X_{c}'\hat{u}_{c}\hat{u}_{c}'X_{c}\Big)(X'X)^{-1},\]
where C is the number of clusters. In our specific case this is the number of observations n. This approach implicitly assumes that F is small and n→∞. If this assumption does not hold a degrees-of-freedom correction
\[\mathit{df}_{C} = \frac{C}{C-1}\cdot\frac{Fn-1}{Fn-K}\]
is helpful. \(\mathit{df}_{C}\cdot\hat{V}(\hat{\beta})_{C}\) is the default option in Stata and corrects for the number of clusters in practice being finite. Nevertheless, this correction eliminates only partially the underestimated standard errors. In other words, the corrected t-statistic of the regressor \(x_{k}\) is larger than that of \(\hat{\beta }_{k}/\sqrt{\hat{V}_{1k}}\).
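The replication experiment can be retraced numerically. The sketch below assumes that V1 is the dependent variable and V2 to V4 plus a constant are the regressors; the helper names `ols_cov` and `cluster_cov` are hypothetical. It stacks the five observations F times, computes the conventional variances, and also the cluster-robust covariance (without the degrees-of-freedom correction) with the original observation as the cluster.

```python
import numpy as np

data = np.array([
    [24, 123, -234, -8],
    [875, 87, 54, 3],
    [-12, 1234, -876, 345],
    [231, -87, -65, 9808],
    [43, 34, 9, -765],
], dtype=float)
y1 = data[:, 0]                                           # V1 (assumed regressand)
X1 = np.column_stack([data[:, 1:], np.ones(len(data))])   # V2, V3, V4, const

def ols_cov(X, y):
    """OLS coefficients, conventional covariance matrix and residuals."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    return beta, resid @ resid / (n - K) * XtX_inv, resid

def cluster_cov(X, resid, cluster_ids):
    """Cluster-robust sandwich estimator without finite-sample correction."""
    XtX_inv = np.linalg.inv(X.T @ X)
    K = X.shape[1]
    meat = np.zeros((K, K))
    for g in np.unique(cluster_ids):
        score = X[cluster_ids == g].T @ resid[cluster_ids == g]
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv

results = {}
for F in (1, 2, 4, 8):
    X, y = np.tile(X1, (F, 1)), np.tile(y1, F)
    ids = np.tile(np.arange(len(y1)), F)   # each original observation is a cluster
    beta, V, resid = ols_cov(X, y)
    results[F] = (beta, np.diag(V), np.diag(cluster_cov(X, resid, ids)))
    print(F, np.round(beta, 4), np.round(np.sqrt(np.diag(V)), 4))
```

In this sketch the conventional variances shrink by the factor (n−K)/(Fn−K), whereas the cluster-robust variances are identical for every F, because the replicated rows contribute no additional information within a cluster.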
2.1.4 Large standard errors of dichotomous regressors with small or large mean
Another problem with estimated standard errors can be induced by Bernoulli distributed regressors. Assume a simple two-variable classical regression model
\[y = a + bD + u.\]
D is a dummy variable and the variance of \(\hat{b}\) is
\[V(\hat{b}) = \frac{\sigma^{2}}{n\,s_{D}^{2}},\]
where
\[s_{D}^{2} = \frac{1}{n}\sum_{i=1}^{n}(D_{i}-\bar{D})^{2} = \bar{D}(1-\bar{D}).\]
If \(s_{D}^{2}\) is determined by \(\bar{D}=(n|D=1)/n\) we find that \(s_{D}^{2}\) is at most 0.25. \(V(\hat{b})\) is minimal at given n and \(\sigma^{2}\) when the sample variance of D reaches its maximum, i.e. if \(\bar{D}=0.5\). This result holds only for inhomogeneous models (models with a constant term).
Example
An income variable (Y=Y _{0}/10^{7}) with 53,664 observations is regressed on a Bernoulli distributed random variable RV. The coefficient β _{1} of the linear model Y=β _{0}+β _{1} RV+u is estimated by OLS, where alternative values of the mean of RV (\(\overline{RV}\)) are assumed (0.1,0.2,…,0.9)
Y  \(\hat{\beta}_{1}\)  std.err. 

\(\overline{RV}=0.1\)  −0.3727  0.6819 
\(\overline{RV}=0.2 \)  −0.5970  0.5100 
\(\overline{RV}=0.3\)  −0.4768  0.4455 
\(\overline{RV}=0.4\)  0.3068  0.4170 
\(\overline{RV}=\boldsymbol{0.5}\)  0.1338  0.4094 
\(\overline{RV}=0.6\)  0.0947  0.4187 
\(\overline{RV}=0.7\)  −0.0581  0.4479 
\(\overline{RV}=0.8\)  −0.1860  0.5140 
\(\overline{RV}=0.9\)  −0.1010  0.6827 
This example confirms the theoretical result. The standard error is smallest if \(\overline{RV}=0.5\) and increases systematically if the mean of RV decreases or increases. An extension to multiple regression models seems possible (see applications in the Appendix, Tables 11, 12, 13, 14). The more \(\bar{D}\) deviates from 0.5, i.e. the larger or smaller the mean of D, the stronger is the tendency to insignificant effects. A caveat is necessary. The conclusion that the t-value of a dichotomous regressor \(D_{1}\) is always smaller than that of \(D_{2}\) when \(V(D_{1})>V(D_{2})\) does not necessarily hold. The basic effect of \(D_{1}\) on y may be larger than that of \(D_{2}\) on y. The theoretical result refers to specific variables and not to the comparison between regressors. In practice, significance is determined by \(t=\hat{b}/\sqrt{\hat{V}(\hat {b})}\). However, we do not find a systematic influence of \(\hat{b}\) on t if \(\bar{D}\) varies. Nevertheless, the random differences in the influence of D on y can dominate the \(\bar{D}\) effect via \(s_{D}^{2}\). The comparison of Table 13 with Table 14 shows that the influence of a works council (WOCO) is stronger than that of a company-level pact (CLP). The coefficients of the former regressor are larger and the standard errors are lower than those of the latter regressor so that the t-values are larger. In both cases the standard errors increase if the mean of the regressor is reduced. The comparison of line 1 in Table 13 with line 9 in Table 14, where the mean of CLP and WOCO is nearly the same, makes clear that the stronger basic effect of WOCO on lnY dominates the mean reduction effect of WOCO. The t-value in line 9 of Table 14 is smaller than that in line 1 of Table 14 but still larger than that in line 1 of Table 13. Not all deviations of the mean of a dummy D as regressor from 0.5 induce the described standard error effects. A random variation of \(\bar{D}\) is necessary. An example where this is not the case is matching (see Sect. 2.2 and the application in Sect. 3). \(\bar{D}\) increases due to the systematic elimination of those observations with D=0 that are dissimilar to those of D=1 in other characteristics.
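The dependence of the standard error on the mean of the dummy can be checked against the table of the example. The sketch assumes the textbook variance of the slope in a two-variable model with a constant, σ²/(n·D̄(1−D̄)); the disturbance standard deviation is not reported in the text, so it is backed out from the reported standard error at a mean of 0.5. The function name is hypothetical.

```python
import numpy as np

def se_of_dummy_coefficient(d_bar, n, sigma):
    """Standard error of the slope: sigma / sqrt(n * d_bar * (1 - d_bar))."""
    return sigma / np.sqrt(n * d_bar * (1.0 - d_bar))

n = 53_664
means = np.linspace(0.1, 0.9, 9)
# Assumption: sigma is inferred from the reported standard error 0.4094 at mean 0.5
sigma = 0.4094 * np.sqrt(n * 0.25)
for d_bar in means:
    print(round(float(d_bar), 1), round(float(se_of_dummy_coefficient(d_bar, n, sigma)), 4))
```

The implied standard errors are symmetric around 0.5 and stay close to the estimated values in the table; the small differences are to be expected, since each row of the table rests on a separately generated random regressor.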
2.1.5 Outliers and influential observations
Outliers may have strong effects on the estimates of the coefficients, of the dependent variable and on standard errors and therefore on significance. In the literature we find some suggestions to measure outliers that are due to large or small values of the dependent variable or of the independent variables. Belsley et al. (1980) use the main diagonal elements \(c_{ii}\) of the hat matrix \(C=X(X'X)^{-1}X'\) to determine the effects of a single observation on the coefficient estimator \(\hat{\beta}\), on the estimated endogenous variable \(\hat {y}_{i}\) and on the variance \(\hat{V}(\hat{y})\). The higher \(c_{ii}\), the higher is the difference between the estimated dependent variable with and without the ith observation. A rule of thumb is based on the relation
\[c_{ii} > \frac{2K}{n}.\]
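An observation i is called an influential observation with a strong leverage if this inequality is fulfilled. The effects of the ith observation on \(\hat{\beta}\), \(\hat{y}\) and \(\hat{V}(\hat{\beta})\) and the rules of thumb can be expressed by
\[\left|\frac{\hat{\beta}_{k}-\hat{\beta}_{k}(i)}{s(i)\sqrt{[(X'X)^{-1}]_{kk}}}\right| > \frac{2}{\sqrt{n}}, \qquad \left|\frac{\hat{y}_{i}-\hat{y}_{i}(i)}{s(i)\sqrt{c_{ii}}}\right| > 2\sqrt{\frac{K}{n}}, \qquad \left|\frac{\det\big(s^{2}(i)[X'(i)X(i)]^{-1}\big)}{\det\big(s^{2}[X'X]^{-1}\big)}-1\right| > \frac{3K}{n}.\]
If the inequalities are fulfilled, this indicates a strong influence of observation i, where (i) means that observation i is not considered in the estimates and s denotes the estimated standard deviation of the disturbances. The determination of an outlier is based on externally studentized residuals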
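\[\hat{u}^{*}_{i} = \frac{\hat{u}_{i}}{s(i)\sqrt{1-c_{ii}}},\]
where s(i) is the estimated standard deviation of the disturbances when observation i is excluded. Observations which fulfill the inequality \(|\hat{u}^{*}_{i}|>t_{1-\alpha /2;\,n-K-1}\) are called outliers. Alternatively, a mean shift outlier model can be formulated
\[y = X\beta + A_{j}\delta + \epsilon,\]
where
\[A_{j} = \begin{cases} 1 & \text{if } i=j \\ 0 & \text{otherwise.} \end{cases}\]
Observation j has a statistical effect on y if δ is significantly different from zero. The estimated t-value is the same as \(\hat{u}^{*}_{j}\). This procedure does not separate whether the outlier j is due to unusual y-values or unusual x-values.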
Hadi (1992) proposes an outlier detection with respect to all regressors. The decision whether the design matrix X contains outliers is based on an elliptical distance
\[d_{i}(c,W) = \sqrt{(x_{i}-c)'W(x_{i}-c)},\]
where intuitively the classical choices of c and W are the arithmetic mean (\(\bar{x}\)) and the inverse of the sample covariance matrix (\(S^{-1}\)) of the estimation function of β, respectively, so that the Mahalanobis distance follows. If
observation i is identified as an outlier. As \(\bar{x}\) and S react sensitively to outliers it is necessary to estimate an outlier-free mean and sample covariance matrix. For this purpose, only outlier-free observations are considered to determine \(\bar{x}\) and S. Another way to avoid the sensitivity problem is to use more robust estimators of the location and covariance matrix, e.g. the median, which in contrast to the mean is robust to outliers. Finally, an outlier vector MOD (multiple outlier dummy) instead of A is incorporated in the model in order to test whether the identified outlier observations have a significant effect on the dependent variable. A second problem is whether we should eliminate all outliers, only some of them or none. The situation is obvious if an outlier is induced by measurement errors. Then we should eliminate this observation if we have no information to correct the error. Typically, however, we cannot be sure that an anomalous value is due to measurement errors. Thus, the correct estimate lies between the two extremes: all outliers are considered or all outliers are eliminated. A solution is presented in the next subsection.
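Leverage values and externally studentized residuals can be obtained from one OLS fit without re-estimating the model n times. The following is a minimal sketch on simulated data with one planted outlier; the function names are hypothetical, and the leverage cutoff 2K/n is the commonly cited rule of thumb, used here as an assumption. The second function estimates the mean shift outlier model directly, so that the equality of its t-value and the studentized residual can be checked.

```python
import numpy as np

def influence_measures(X, y):
    """Hat matrix diagonal c_ii and externally studentized residuals."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    c = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - K)
    # error variance without observation i, computed without refitting
    s2_loo = ((n - K) * s2 - resid**2 / (1.0 - c)) / (n - K - 1)
    t_ext = resid / np.sqrt(s2_loo * (1.0 - c))
    return c, t_ext

def mean_shift_t(X, y, j):
    """t-value of the dummy for observation j in the mean shift outlier model."""
    n = X.shape[0]
    A = np.zeros(n)
    A[j] = 1.0
    Z = np.column_stack([X, A])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    coef = ZtZ_inv @ Z.T @ y
    resid = y - Z @ coef
    s2 = resid @ resid / (n - Z.shape[1])
    return coef[-1] / np.sqrt(s2 * ZtZ_inv[-1, -1])

# Simulated data (assumption) with an outlier planted at index 0
rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
y[0] += 15.0
c, t_ext = influence_measures(X, y)
high_leverage = np.flatnonzero(c > 2 * X.shape[1] / n)   # assumed cutoff 2K/n
print(high_leverage, round(float(t_ext[0]), 3), round(float(mean_shift_t(X, y, 0)), 3))
```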
2.1.6 Partially identified parameters
Assume that some observations are unknown or not exactly measured. The consequence is that a parameter cannot be determined exactly but only within a range. The outlier situation leads to such a partial identification problem. There exist many other similar constellations.
Example
The share of unemployed persons among the respondents is 8 % but 5 % of all persons have not answered the question on the employment status. Therefore, the unemployment rate can only be calculated within certain limits, namely between the two extremes:

all persons who have not answered are employed

all persons who have not answered are unemployed.
In the first case the unemployment rate is 7.6 % and in the second case 12.6 %.
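The two bounds follow from simple arithmetic. The sketch below assumes that the 8 % refers to the persons who answered, which is the reading that reproduces the two figures of the example; the function name `rate_bounds` is hypothetical.

```python
def rate_bounds(rate_among_respondents, nonresponse_share):
    """Lower and upper bound of a population share under nonresponse."""
    answered = 1.0 - nonresponse_share
    # lower bound: nobody among the nonrespondents is unemployed
    lower = rate_among_respondents * answered
    # upper bound: everybody among the nonrespondents is unemployed
    upper = lower + nonresponse_share
    return lower, upper

lower, upper = rate_bounds(0.08, 0.05)
print(f"{lower:.1%} to {upper:.1%}")
```

The width of the identified interval equals the nonresponse share, so no increase in the number of respondents alone can shrink it.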
The main methodological focus of partially identified parameters is the search for the best statistical inference. Chernozhukov et al. (2007), Imbens and Manski (2004), Romano and Shaikh (2010), Stoye (2009) and Woutersen (2009) have discussed solutions.
If Θ _{0}=[θ _{ l },θ _{ u }] describes the lower and the upper bound based on the two extreme situations Stoye (2009) develops the following confidence interval
where \(\hat{\sigma}_{l}\) is the standard error of the estimation function \(\hat{\theta}_{l}\). c _{ α } is chosen by
where Δ=θ _{ u }−θ _{ l }. As Δ is unknown, the interval has to be estimated (\(\hat{\Delta}\)).
2.2 Treatment evaluation
The objective of treatment evaluation is the determination of causal effects of economic measures. The simplest form to measure the effect is to estimate α in the linear model
\[y = X\beta + \alpha D + u,\]
where D is the intervention variable and measured by a dummy: 1 if an individual or an establishment is assigned to treatment; 0 otherwise. Typically, this is not the causal effect. An important reason for this failure is unobserved variables that influence both y and D, so that D and u correlate.
In the last 20 years a wide range of methods was developed to determine the “correct” causal effect. Which approach should be preferred depends on the data, the behavior of the economic agents and the assumptions of the model. The major difficulty is that we have to compare an observed situation with an unobserved situation. Depending on the available information the latter is estimated. We have to ask what would have occurred if D=0 instead of D=1 had taken place (treatment on the treated). This counterfactual is unknown and has to be estimated. Inversely, if D=0 is observable we can search for the potential result under D=1 (treatment on the untreated). A further problem is the fixing of the control group. What is the meaning of “otherwise” in the definition of D? Or in other words: What is the causal effect of an unobserved situation? Should we determine the average causal effect or only that of a subgroup?
Neither a before-after comparison \((\bar{y}_{1}|D=1)-(\bar{y}_{0}|D=1)\) nor a comparison of \((\bar{y}_{t}|D=1)\) and \((\bar{y}_{t}|D=0)\) in cross-section is usually appropriate. Difference-in-differences estimators (DiD), a combination of these two methods, are very popular in applications
\[\bar{\Delta}_{1}-\bar{\Delta}_{0} = [(\bar{y}_{1}|D=1)-(\bar{y}_{0}|D=1)] - [(\bar{y}_{1}|D=0)-(\bar{y}_{0}|D=0)].\]
The effect can be determined in the following unconditional model
\[y = b_{0} + b_{1}D + b_{2}T + b_{3}D\cdot T + u,\]
where T=1 means a period that follows the period of the measure (D=1). T=0 is a period before the measure takes place. In this approach \(\hat{b}_{3}=\bar{\Delta}_{1}-\bar{\Delta}_{0}\) is the causal effect. The equation can be extended by further regressors X. This is called a conditional DiD estimator. Nearly all DiD investigations neglect a potential bias in standard error estimates induced by serial correlation. A further problem results under endogenous intervention variables. Then an instrumental variables estimator should be employed to avoid the endogeneity bias. This procedure will be considered in the quantile regression analysis. If the dependent variable is a dummy a nonlinear estimator has to be applied. Suggestions are presented by Ai and Norton (2003) and Puhani (2012).
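That the interaction coefficient of the unconditional model coincides with the difference of the two before-after differences can be seen on a tiny artificial data set. The numbers below are invented for illustration only.

```python
import numpy as np

# Artificial data (assumption): two observations per group-period cell
y = np.array([10, 12, 13, 15, 20, 22, 28, 30], dtype=float)
D = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)   # treatment group
T = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)   # period after the measure

def cell_mean(d, t):
    return y[(D == d) & (T == t)].mean()

# difference of the before-after differences of both groups
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS on constant, D, T and the interaction D*T
X = np.column_stack([np.ones_like(y), D, T, D * T])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(did_means, b[3])
```

The equality is exact because the model with constant, D, T and D·T is saturated in the four cells; it no longer holds mechanically once further regressors X are added in the conditional version.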
Matching procedures were developed with the objective to find a control group that is very similar to the treatment group. Parametric and nonparametric procedures can be employed to determine the control group. Kernel, inverse probability, radius matching, local linear regression, spline smoothing or trimming estimators are possible. Mahalanobis metric matching with or without propensity scores and nearest neighbor matching with or without caliper are typical procedures (see e.g. Guo and Fraser 2010). The Mahalanobis distance is defined by
\[d(i,j) = (u-v)'S^{-1}(u-v),\]
where u (v) is a vector that incorporates the values of matching variables of participants (nonparticipants) and S is the empirical covariance matrix from the full set of non-treated units.
An observed or artificial statistical twin can be determined for each participant. For all nonparticipants the probability of participating in the measure is calculated based on probit estimates (propensity score). The statistical twin j of a participant i is the nonparticipant whose propensity score (\(ps_{j}\)) is nearest to that of the participant. The absolute distance between i and j may not exceed a given value ϵ
\[|ps_{i}-ps_{j}| < \epsilon,\]
where ϵ is a predetermined tolerance (caliper). A quarter of a standard deviation of the sample estimated propensity scores is suggested as the caliper size (Rosenbaum and Rubin 1985). If the control group is identified the causal effect can be estimated using the reduced sample (treatment observations and matched observations). In applications α from the model y=Xβ+αD+u or \(b_{3}\) from the DiD approach is determined as causal effect. Both estimators implicitly assume that the causal effect is the same for all subgroups of individuals or firms and that no unobserved variables exist that are correlated with observed variables. In this respect matching procedures suffer from the same problem as OLS estimators.
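Nearest-neighbor matching with a caliper can be sketched as follows. The propensity scores are made-up numbers (in an application they would come from a probit estimate), matching is done with replacement, and the function names are hypothetical; all of these are assumptions for illustration.

```python
import numpy as np

def nearest_neighbor_match(ps_treated, ps_control, caliper):
    """Index of the nearest control per treated unit, -1 if outside the caliper."""
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    dist = np.abs(ps_treated[:, None] - ps_control[None, :])
    nearest = dist.argmin(axis=1)
    within = dist[np.arange(len(ps_treated)), nearest] < caliper
    return np.where(within, nearest, -1)

def mahalanobis(u, v, S):
    """Distance (u - v)' S^{-1} (u - v) between two vectors of matching variables."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(diff @ np.linalg.solve(S, diff))

ps_treated = [0.62, 0.35, 0.90]          # hypothetical propensity scores
ps_control = [0.30, 0.60, 0.10, 0.55]
# caliper: a quarter of the standard deviation of all propensity scores
caliper = 0.25 * np.std(ps_treated + ps_control, ddof=1)
matches = nearest_neighbor_match(ps_treated, ps_control, caliper)
print(matches)
```

Treated units without a control inside the caliper (index -1) are dropped from the reduced sample, which is how matching raises the mean of D in a systematic rather than random way.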
If the interest is to detect whether and to what extent the effects of intervention variables differ between the percentiles of the distribution of the objective variable y, a quantile regression analysis is an appropriate instrument. The objective is to determine quantile treatment effects (QTE). The distribution effect of a measure can be estimated by the difference Δ of the dependent variable with (\(y^{1}\)) and without (\(y^{0}\)) treatment (D=1; D=0) separately for specific quantiles \(Q^{\tau}\) where 0<τ<1
\[\Delta^{\tau} = Q_{y^{1}}^{\tau}-Q_{y^{0}}^{\tau}.\]
The empirical distribution function of an observed situation and that of the counterfactual is identified. From the view of modeling four major cases are developed in the literature that differ in the assumptions. The measure is assumed exogenous or endogenous and the effect on y is unconditional or conditional analogously to DiD.
In case (1) the quantile treatment effect \(Q_{y^{1}}^{\tau}-Q_{y^{0}}^{\tau }\) is estimated by
where j=0;1, \(q_{j}=\alpha_{0}+\alpha_{1}(D|D=j)\), \(\rho_{\tau}(a)=a(\tau-1(a\leq 0))\) is a check function; a is a real number. The weights are
The estimation is characterized by two stages. First, the propensity score is determined by a large number of regressors X via a nonparametric method—\(\hat{p}(X)\). Second, in \(Q_{y^{j}}^{\tau}\) the probability p(X) is substituted by \(\hat{p}(X)\).
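How the check function produces a quantile can be illustrated in the simplest setting. The sketch assumes an exogenous treatment without covariates, so that all weights equal one; this is a simplification and not the weighted estimator of case (1). The function names and the data are hypothetical.

```python
import numpy as np

def check_loss(a, tau):
    """Check function rho_tau(a) = a * (tau - 1(a <= 0))."""
    a = np.asarray(a, dtype=float)
    return a * (tau - (a <= 0))

def quantile_by_check_loss(y, tau):
    """Sample value that minimizes the summed check loss."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).sum() for q in y]
    return y[int(np.argmin(losses))]

def unweighted_qte(y, D, tau):
    """Difference of the tau-quantiles of treated and untreated outcomes."""
    y, D = np.asarray(y, dtype=float), np.asarray(D)
    return quantile_by_check_loss(y[D == 1], tau) - quantile_by_check_loss(y[D == 0], tau)

y = np.array([1, 2, 3, 4, 5, 3, 4, 5, 6, 7], dtype=float)   # artificial outcomes
D = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(quantile_by_check_loss(np.arange(1, 10), 0.5), unweighted_qte(y, D, 0.5))
```

With propensity-score weights the same minimization would be applied to weighted losses; the sketch only shows the mechanics of the check function.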
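Case (2) follows Koenker and Bassett (1978). The sum
\[\sum_{i=1}^{n}\rho_{\tau}(y_{i}-\alpha D_{i}-x_{i}'\beta)\]
has to be minimized with respect to α and β, where τ is given. In other words,
\[Q_{y^{j}}^{\tau}=\arg\min_{\alpha,\beta}E\left[\rho_{\tau}(y-q_{j})\right],\]
where j=0;1, \(q_{j}=\alpha(D|D=j)+x'\beta\).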
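The role of the check function can be seen in the simplest setting without regressors, where minimizing the summed check loss over a constant returns a sample quantile. The following sketch uses hypothetical function names and invented data. It searches only over the observed values, which is sufficient because a minimizer of the check loss can always be found among them.

```python
def check_loss(a, tau):
    """Check function rho_tau(a) = a * (tau - 1(a <= 0))."""
    return a * (tau - (1.0 if a <= 0 else 0.0))

def quantile_by_check_loss(y, tau):
    """Minimise the summed check loss over the observed values of y."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [2.0, 9.0, 4.0, 7.0, 1.0, 5.0, 8.0]
median = quantile_by_check_loss(y, 0.5)
lower_quartile = quantile_by_check_loss(y, 0.25)
print(median, lower_quartile)
```

Quantile regression replaces the constant q by a linear index such as αD+x′β and minimizes the same loss over α and β.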
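The method of case (3) was developed by Frölich and Melly (2012). Because the intervention variable D is endogenous, an instrumental variables estimator is used with a single instrument Z, which is a dummy. The quantiles follow from
\[Q_{y^{j}|c}^{\tau}=\arg\min_{\alpha_{0},\alpha_{1}}E\left[\rho_{\tau}(y-q_{j})\,(W|D=j)\right],\]
where j=0;1, \(q_{j}=\alpha_{0}+\alpha_{1}(D|D=j)\), and c denotes compliers. The weights are
\[W=\frac{Z-p(Z=1|X)}{p(Z=1|X)\,(1-p(Z=1|X))}\,(2D-1).\]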
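Abadie et al. (2002) investigate case (4) and suggest a weighted linear quantile regression. The estimator is
\[(\hat{\alpha},\hat{\beta})=\arg\min_{\alpha,\beta}\sum_{i=1}^{n}W_{i}\,\rho_{\tau}(y_{i}-\alpha D_{i}-x_{i}'\beta),\]
where the weights are
\[W=1-\frac{D(1-Z)}{1-p(Z=1|X)}-\frac{(1-D)Z}{p(Z=1|X)}.\]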
A regression discontinuity (RD) design allows treatment effects to be determined in a special situation. The approach uses information on institutional and legal regulations that cause the effects of economic measures to change. Thresholds are estimated that indicate a discontinuity of the effects. Two forms are distinguished: sharp and fuzzy RD. Either the change of status takes effect exactly at a fixed point (sharp RD), or the probability of treatment or the mean of treatment is assumed to be discontinuous at that point (fuzzy RD).
In the case of sharp RD, individuals or establishments (i=1,…,n) are assigned to the treatment or the control group on the basis of an observed variable S. The latter is a continuous or an ordered categorical variable with many distinct values. If \(S_{i}\) is not smaller than a fixed bound \(\bar{S}\), then i belongs to the treatment group (D=1):
\[D_{i}=\begin{cases}1 & \text{if } S_{i}\ge\bar{S}\\ 0 & \text{if } S_{i}<\bar{S}.\end{cases}\]
The following graph, based on artificial data with n=40, demonstrates the design. Assume we know that an institutional rule changes the conditions if \(S>\bar{S}=2.5\), and we want to determine the causal effect induced by the adoption of the new rule. This effect can be measured by the difference of the two estimated regressions at \(\bar{S}\).
In a simple regression model \(y=\beta_{0}+\beta_{1}D+u\) the OLS estimator of \(\beta_{1}\) would be inconsistent if D and u are correlated. If, however, the conditional mean \(E(u|S,D)=E(u|S)=f(S)\) is additionally incorporated in the outcome equation (\(y=\beta_{0}+\beta_{1}D+f(S)+\epsilon\), where \(\epsilon=y-E(y|S,D)\)), the OLS estimator of \(\beta_{1}\) is consistent. With \(f(S)=\beta_{2}S\), the estimator of \(\beta_{1}\) corresponds to the difference of the two estimated intercepts of the parallel regressions
\[\hat{y}_{0}=\hat{\beta}_{0}+\hat{\beta}_{2}S\quad\text{and}\quad\hat{y}_{1}=\hat{\beta}_{0}+\hat{\beta}_{1}+\hat{\beta}_{2}S.\]
The sharp RD approach identifies the causal effect by distinguishing between the discontinuous, and hence nonlinear, function of S that describes treatment assignment and the smooth linear function f(S). If, however, f(S) is a general nonlinear function, the specification has to be modified.
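The sharp design with a linear f(S) can be illustrated on artificial data with n=40 and \(\bar{S}=2.5\). The data-generating values (intercept 1, jump 2, slope 0.5 and a small deterministic disturbance) are assumptions made for this sketch and are not the data behind the graph mentioned above. The helper `ols` is a plain normal-equations solver written for the example.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

CUTOFF = 2.5
S = [0.125 * i for i in range(1, 41)]            # n = 40 artificial values of S
D = [1.0 if s >= CUTOFF else 0.0 for s in S]     # sharp assignment rule
noise = [0.1 * (-1) ** i for i in range(40)]     # small deterministic disturbance
y = [1.0 + 2.0 * d + 0.5 * s + e for d, s, e in zip(D, S, noise)]

# y = b0 + b1*D + b2*S: b1 is the distance between the two parallel lines
b0, b1, b2 = ols([[1.0, d, s] for d, s in zip(D, S)], y)
print(round(b1, 3), round(b2, 3))
```

Because the assumed f(S) is linear, the coefficient of D recovers the assumed jump of 2 up to the small disturbance.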
Assume the true function f(S) is a polynomial of pth order,
\[f(S)=\beta_{21}S+\beta_{22}S^{2}+\dots+\beta_{2p}S^{p},\]
but two linear models are estimated. Then the difference between the two intercepts, interpreted as the causal effect, is biased. What looks like a jump is in reality a neglected nonlinear effect.
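The bias from a neglected nonlinearity can be made visible with a small numerical sketch. The outcome below is a smooth cubic in S with no jump at all, yet the linear specification with a treatment dummy reports a sizeable "effect". The cubic form and the grid are assumptions chosen for illustration, and `ols` is a helper written for the example.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

CUTOFF = 2.5
S = [0.0625 + 0.125 * i for i in range(40)]      # grid symmetric around the cutoff
D = [1.0 if s >= CUTOFF else 0.0 for s in S]
y = [(s - CUTOFF) ** 3 for s in S]               # smooth cubic, true jump is zero

# misspecified linear model y = b0 + b1*D + b2*S
b0, b1, b2 = ols([[1.0, d, s] for d, s in zip(D, S)], y)
print(round(b1, 2))                              # spurious "jump"
```

The estimated coefficient of D is clearly different from zero although the outcome is continuous at the cutoff, which is the misinterpretation described in the text.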
Another strategy is to determine the treatment effect exactly at the fixed discontinuity point \(\bar{S}\) assuming a local linear regression. Two linear regressions are considered
where \(y_{j}=E(y|D=j)\) and j=0;1. In combination with
follows
The linear regression
can be estimated, where \(\tilde{u}=u_{0}+D(u_{1}-u_{0})\). This looks like the DiD estimator, but now \(\gamma_{1}=E(y_{1}|S=\bar{S})-E(y_{0}|S=\bar{S})\) and not \(\gamma_{3}\) is of interest. The estimated coefficient \(\hat{\gamma}_{1}\) is a global but not a localized average treatment effect.
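The step from two side-specific linear regressions to a single interaction regression can be written out as follows. This is a sketch under assumed notation: the symbols \(\mu_{j}\) and \(\delta_{j}\) for the side-specific intercepts and slopes are introduced here for illustration only, while \(\gamma_{1}\), \(\gamma_{3}\) and \(\tilde{u}\) are used as in the text.

```latex
\begin{align*}
y_{j} &= \mu_{j}+\delta_{j}(S-\bar{S})+u_{j}, \qquad j=0;1,\\
y &= (1-D)\,y_{0}+D\,y_{1}\\
  &= \mu_{0}+(\mu_{1}-\mu_{0})D+\delta_{0}(S-\bar{S})
     +(\delta_{1}-\delta_{0})D(S-\bar{S})+u_{0}+D(u_{1}-u_{0})\\
  &= \gamma_{0}+\gamma_{1}D+\gamma_{2}(S-\bar{S})+\gamma_{3}D(S-\bar{S})+\tilde{u}.
\end{align*}
```

Under this parameterization \(\gamma_{1}=\mu_{1}-\mu_{0}\) is the vertical distance between the two regression lines at \(S=\bar{S}\), and \(\gamma_{3}=\delta_{1}-\delta_{0}\) is the change in slope.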
The localized average follows if only a small interval around \(\bar{S}\) is modeled, i.e. \(\bar{S}-\Delta S<S_{i}<\bar{S}+\Delta S\). The treatment effect then corresponds to the difference of the two intercepts determined as before, with the estimation restricted to \(\bar{S}<S_{i}<\bar{S}+\Delta S\) on the one hand and to \(\bar{S}-\Delta S<S_{i}<\bar{S}\) on the other.
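The difference between the global and the localized estimate can be sketched by fitting one line on each side of the cutoff, once with all observations and once within a window ΔS. The data are artificial: a true jump of 2 plus a smooth cubic trend, both assumed for illustration. The function names `line_fit` and `rd_jump` are hypothetical.

```python
def line_fit(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def rd_jump(S, y, cutoff, width=None):
    """Difference of the two intercepts at the cutoff (S is centred at the cutoff).

    width=None uses all observations (global estimate); otherwise only
    observations with |S - cutoff| < width enter (localized estimate).
    """
    left = [(s - cutoff, v) for s, v in zip(S, y)
            if s < cutoff and (width is None or cutoff - s < width)]
    right = [(s - cutoff, v) for s, v in zip(S, y)
             if s >= cutoff and (width is None or s - cutoff < width)]
    a_left, _ = line_fit(*zip(*left))
    a_right, _ = line_fit(*zip(*right))
    return a_right - a_left

CUTOFF = 2.5
S = [0.0625 + 0.125 * i for i in range(40)]
# assumed true jump of 2 at the cutoff plus a smooth cubic trend
y = [2.0 * (s >= CUTOFF) + (s - CUTOFF) ** 3 for s in S]

global_jump = rd_jump(S, y, CUTOFF)
local_jump = rd_jump(S, y, CUTOFF, width=0.5)
print(round(global_jump, 2), round(local_jump, 2))
```

In this example the global estimate even has the wrong sign, whereas the estimate from the narrow window is close to the assumed jump. The price of the narrow window is a much smaller number of observations.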
A combination of the latter linear RD model with the DiD approach leads to an extended interaction model. Again, two linear regressions are considered
where the first index of \(\gamma_{jt}\) with j=0;1 refers to the treatment and the second index with t=0;1 refers to the period. In contrast to the pure RD model, where \(y_{j}\) with j=0;1 is considered, the index of y is now a time index, i.e. \(y_{T}\) with T=0;1. Using
follows
Now, it is possible to determine whether the treatment effect varies between T=1 and T=0. The difference follows by a DiD approach
under the assumption that the disturbance term does not change between the periods. The hypothesis of a time-invariant break cannot be rejected if DT and \(D(S-\bar{S})T\) have no statistically significant influence on y.
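The combination of RD and DiD can be sketched by estimating the jump at \(\bar{S}\) separately for T=0 and T=1 and taking the difference, which corresponds to the coefficient of DT in the fully interacted regression. All numbers below are assumptions for illustration (jump 2.0 in the first period, 2.5 in the second). The disturbance is identical in both periods, mirroring the assumption stated in the text, so it cancels in the difference.

```python
def line_fit(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def rd_jump(S, y, cutoff):
    """Difference of the intercepts of two lines fitted left and right of the cutoff."""
    left = [(s - cutoff, v) for s, v in zip(S, y) if s < cutoff]
    right = [(s - cutoff, v) for s, v in zip(S, y) if s >= cutoff]
    return line_fit(*zip(*right))[0] - line_fit(*zip(*left))[0]

CUTOFF = 2.5
S = [0.0625 + 0.125 * i for i in range(40)]
noise = [0.05 * (-1) ** i for i in range(40)]    # same disturbance in both periods

def outcome(period, jump):
    """Artificial outcome with a period effect, a jump at the cutoff and a linear trend."""
    return [1.0 + 0.3 * period + jump * (s >= CUTOFF) + 0.5 * (s - CUTOFF) + e
            for s, e in zip(S, noise)]

y0 = outcome(0, 2.0)      # period T=0
y1 = outcome(1, 2.5)      # period T=1
change_in_jump = rd_jump(S, y1, CUTOFF) - rd_jump(S, y0, CUTOFF)
print(round(change_in_jump, 6))
```

A change in the jump that is statistically indistinguishable from zero would support a time-invariant break; in real data this requires the standard errors of the interaction terms, which the sketch does not compute.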
The fuzzy RD assumes that the propensity score function of treatment, P(D=1|S), is discontinuous with a jump at \(\bar{S}\):
\[P(D_{i}=1|S_{i})=\begin{cases}g_{1}(S_{i}) & \text{if } S_{i}\ge\bar{S}\\ g_{0}(S_{i}) & \text{if } S_{i}<\bar{S},\end{cases}\]
where it is assumed that \(g_{1}(\bar{S})>g_{0}(\bar{S})\). Therefore, treatment is more likely for \(S_{i}\ge\bar{S}\). In principle, the functions \(g_{1}(S_{i})\) and \(g_{0}(S_{i})\) are arbitrary, e.g. a polynomial of pth order can be assumed, but their values have to lie within the interval [0;1] and they have to differ at \(\bar{S}\).
The conditional mean of D given S is
\[E(D_{i}|S_{i})=g_{0}(S_{i})+[g_{1}(S_{i})-g_{0}(S_{i})]T_{i},\]
where \(T_{i}=1(S_{i}\ge\bar{S})\) is a dummy indicating the point where the mean is discontinuous. If a polynomial of pth order is assumed, the interaction variables \(S_{i}T_{i},S_{i}^{2}T_{i},\dots,S_{i}^{p}T_{i}\) and the dummy \(T_{i}\) are instruments for \(D_{i}\). The simplest case is to use only \(T_{i}\) as an instrument, which is appropriate if \(g_{1}(S_{i})\) and \(g_{0}(S_{i})\) are distinct constants.
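We can determine the treatment effect around \(\bar{S}\):
\[\lim_{\Delta\to 0}\frac{E(y|\bar{S}<S_{i}<\bar{S}+\Delta)-E(y|\bar{S}-\Delta<S_{i}<\bar{S})}{E(D|\bar{S}<S_{i}<\bar{S}+\Delta)-E(D|\bar{S}-\Delta<S_{i}<\bar{S})}.\]
The empirical analogue is the Wald (1940) estimator, which was first developed for the case of measurement errors:
\[\frac{\bar{y}_{+}-\bar{y}_{-}}{\bar{D}_{+}-\bar{D}_{-}},\]
where \(\bar{y}_{+}\) and \(\bar{D}_{+}\) are the sample means of y and D just above \(\bar{S}\), and \(\bar{y}_{-}\) and \(\bar{D}_{-}\) are those just below.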
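A minimal sketch of a Wald-type estimate in a fuzzy design divides the difference in mean outcomes just above and just below \(\bar{S}\) by the corresponding difference in treatment shares. The function name, the window width and the toy data are assumptions for illustration: 20 percent of the units are treated below the threshold, 80 percent above it, and the assumed effect of treatment is 3.

```python
def wald_fuzzy_rd(S, D, y, cutoff, width):
    """Ratio of the outcome jump to the jump in the treatment share around the cutoff."""
    above = [i for i, s in enumerate(S) if cutoff <= s < cutoff + width]
    below = [i for i, s in enumerate(S) if cutoff - width < s < cutoff]

    def mean(v, idx):
        return sum(v[i] for i in idx) / len(idx)

    outcome_jump = mean(y, above) - mean(y, below)
    treatment_jump = mean(D, above) - mean(D, below)
    return outcome_jump / treatment_jump

CUTOFF = 2.5
S = [i / 20 for i in range(40, 60)]                  # 2.00, 2.05, ..., 2.95
D = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0,                   # below the cutoff: 2 of 10 treated
     1, 1, 1, 1, 0, 1, 1, 1, 0, 1]                   # above the cutoff: 8 of 10 treated
y = [1.0 + 3.0 * d for d in D]                       # assumed treatment effect of 3

effect = wald_fuzzy_rd(S, D, y, CUTOFF, width=0.6)
print(round(effect, 6))
```

The ratio equals an instrumental variables estimate that uses the threshold dummy \(T_{i}\) as the only instrument for \(D_{i}\).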
QTE and RD analyses both allow causal effects to vary, though with different intentions. A further possibility is separate estimation for subgroups, e.g. for industries or regions.
3 Applications: Some New Estimates of Cobb-Douglas Production Functions
This section presents some estimates of production functions based on IAB establishment panel data. The empirical analysis is restricted to the period 2006–2010. The year 2006 is chosen as the starting point because information on company-level pacts (CLPs) was collected in the IAB establishment panel for the first time in that year, and many of the following applications deal with CLPs. The methods of Sect. 2 are applied. The intention of Sect. 3 is to illustrate that the discussed methods work with implemented Stata programs. It is not discussed whether the applied methods are the best for the given data set and the substantive problems. For didactic reasons the paper deals with only one issue at a time and compares different suggestions for solving the problem. The results can be found in Tables 1–10.
Table 1 focuses on alternative estimates of standard errors (see Sects. 2.1.1–2.1.3) of Cobb-Douglas production functions (CDF) in logarithmic form with the input factors lnL and lnK. For comparison, the conventional standard errors can be found in Table 3, column 1. The small standard errors and therefore the large t-values are remarkable. Although the cluster-robust standard errors in Table 1, column 5 are larger, they are still far too low. This is due to unobserved heterogeneity. Fixed effects estimates can partially solve this problem, as can be seen in the Appendix, Table 15.
The estimated coefficients in columns 1–3 and 5 of Table 1 are identical. Estimates with hc2 and hc4, not presented in the tables, deviate only slightly from those with hc1. This could mean that it is not necessary to distinguish between hc1 to hc4. One could guess, however, that stronger differences are observed if the sample is small. Empirical investigations in which only 10, 1 and 0.1 percent of the original sample are used do not support this presumption. The jackknife estimates of standard errors and t-values are also not far away from the heteroskedasticity-consistent estimates with hc1 and hc3. The closeness to the estimates with hc3 is plausible because the latter is only a slightly simplified version of what one gets by employing the jackknife technique. Furthermore, Table 1 demonstrates that the bootstrap and cluster-robust estimates of the t-values differ most strongly for the input factor labor (lnL), measured by the number of employees in the firm. For capital (lnK), approximated by the sum of investments of the last four years, the cluster-robust standard errors are evidently larger than those from the other methods.
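How the heteroskedasticity-consistent variants differ in the weighting of the residuals can be sketched for a simple regression with one regressor. The labels hc0 to hc3 follow the common convention (squared residuals unadjusted, scaled by n/(n−k), divided by 1−h, and divided by (1−h)²); it is an assumption that this matches the definitions of Sect. 2.1.1, and hc4 is not covered. The data are artificial, with a disturbance that grows with x.

```python
import math

def hc_slope_se(x, y):
    """Heteroskedasticity-consistent standard errors of the slope in y = b0 + b1*x + u."""
    n, k = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [b - b0 - b1 * a for a, b in zip(x, y)]        # OLS residuals
    h = [1.0 / n + (a - mx) ** 2 / sxx for a in x]     # leverages h_ii

    def slope_var(w):
        # sandwich formula for the slope: sum_i ((x_i - mean) / sxx)^2 * w_i
        return sum(((a - mx) / sxx) ** 2 * wi for a, wi in zip(x, w))

    hc0 = slope_var([ei ** 2 for ei in e])
    variances = {
        "hc0": hc0,
        "hc1": hc0 * n / (n - k),
        "hc2": slope_var([ei ** 2 / (1 - hi) for ei, hi in zip(e, h)]),
        "hc3": slope_var([ei ** 2 / (1 - hi) ** 2 for ei, hi in zip(e, h)]),
    }
    return {name: math.sqrt(v) for name, v in variances.items()}

# artificial data: the size of the disturbance grows with x
x = [float(i) for i in range(1, 13)]
y = [2.0 + 0.5 * xi + 0.1 * xi * (-1) ** i for i, xi in enumerate(x)]
se = hc_slope_se(x, y)
print({name: round(v, 4) for name, v in se.items()})
```

Since all leverages lie between 0 and 1, the hc3 standard error is never smaller than hc2, which in turn is never smaller than hc0. With large samples the leverages become small and the variants move close together, in line with the similar results reported for Table 1.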
An extended version of the Cobb-Douglas function in Table 1 is presented in Table 2. The latter estimates show smaller coefficients and smaller t-values of the input factors labor and capital. The major intention of Table 2 is to demonstrate that in this example, too, there is a clear relationship between \(\bar{D}\), the mean of a dummy as independent variable, and the estimated standard errors, as maintained in Sect. 2.1.4. The nearer \(\bar{D}\) is to 0.5, the smaller is the standard error. In contrast to those in Table 11, the results in Table 2 cannot be generalized, because the standard error of a dummy is not determined by the mean alone. Each regressor has a specific influence on the dependent variable independent of the regressor's variance.
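For the special case of a simple regression with a single dummy regressor and homoskedastic errors, the relationship between \(\bar{D}\) and the standard error can be written in closed form. This is a textbook special case added for illustration, not the derivation of Sect. 2.1.4; \(\sigma^{2}\) denotes the error variance and n the sample size.

```latex
\begin{align*}
\operatorname{Var}(\hat{\alpha}) &= \frac{\sigma^{2}}{\sum_{i=1}^{n}(D_{i}-\bar{D})^{2}}
  = \frac{\sigma^{2}}{n\,\bar{D}(1-\bar{D})},\\
\bar{D}=0.5 &:\quad \operatorname{Var}(\hat{\alpha}) = \frac{4\sigma^{2}}{n},\\
\bar{D}=0.1 &:\quad \operatorname{Var}(\hat{\alpha}) = \frac{\sigma^{2}}{0.09\,n}
  \approx \frac{11.1\,\sigma^{2}}{n}.
\end{align*}
```

In this special case the standard error at \(\bar{D}=0.1\) is about 1.67 times the standard error at \(\bar{D}=0.5\). With further regressors the correlation between the dummy and the other variables also matters, which is why the mean alone does not determine the standard error.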
Outliers (see Sect. 2.1.5) may have strong effects on coefficient and standard error estimates. However, estimates do not react sensitively to all outliers. This can be demonstrated if the results with and without outliers are compared. Table 3 presents an example for simple Cobb-Douglas functions in columns 1 and 2. An observation in column 2 is defined as an outlier if \(|\hat{u}^{*}|>3\). The coefficients in columns 1 and 2 are very similar, while the differences of the standard errors are more evident. The differences are enlarged under a wider definition of an outlier, e.g. if 3 is replaced by 2. The picture also becomes clearer if observations with high leverage are eliminated (see column 3). Coefficients and standard errors in columns 1 and 3 reveal a clear disparity for both input factors. This result is not unexpected, but the consequence is ambiguous. Is column 1 or column 3 preferable? If all observations with strong leverage are due to measurement errors, the decision speaks in favor of the estimates in column 3. As no information is available on this question, both estimates may be useful.
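An outlier rule of this kind can be sketched with externally studentized residuals in a simple regression, where each residual is scaled by an error variance estimated without the observation in question. It is an assumption that this corresponds to the definition of \(\hat{u}^{*}\) in Sect. 2.1.5. The function name and the data, which contain one contaminated observation, are hypothetical.

```python
import math

def studentized_residuals(x, y):
    """Externally studentized residuals and leverages for y = b0 + b1*x + u."""
    n, k = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [b - b0 - b1 * a for a, b in zip(x, y)]        # OLS residuals
    h = [1.0 / n + (a - mx) ** 2 / sxx for a in x]     # leverages
    sse = sum(v * v for v in e)
    t = []
    for ei, hi in zip(e, h):
        # error variance estimated without observation i
        s2_i = (sse - ei * ei / (1 - hi)) / (n - k - 1)
        t.append(ei / math.sqrt(s2_i * (1 - hi)))
    return t, h

x = [float(i) for i in range(1, 13)]
y = [1.0 + 2.0 * xi + 0.1 * (-1) ** i for i, xi in enumerate(x)]
y[5] += 5.0                                            # one contaminated observation

t, h = studentized_residuals(x, y)
outliers = [i for i, ti in enumerate(t) if abs(ti) > 3]
print(outliers)
```

The leverages h returned by the function depend only on the regressor values, so a screen for high-leverage observations, as in column 3 of Table 3, can flag different observations than the residual-based rule.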
Column 4 extends the consideration to outliers following Hadi (1992). For each observation the squared difference between the individual regressor values and the means of all regressors, here lnL and lnK, is determined and weighted by the estimated covariance matrix (see Sect. 2.1.5). The decision whether establishment i is an outlier is now based on the Mahalanobis distance. MOD, the vector of multiple outlier dummies (MOD_i=1 if i is an outlier; =0 otherwise), is incorporated as an additional regressor. The estimates show that outliers have a significant effect on the output variable lnY. The coefficients and the t-values in columns 2 and 4 are very similar. This is a hint that the outliers defined via \(\hat{u}^{*}\) are mainly determined by large deviations of the regressor values. From \(\hat{u}^{*}\) alone it is unclear whether the values of the dependent variable or those of the independent variables are responsible for the fact that an observation is an outlier.
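The distance measure behind this outlier definition can be sketched for two regressors. The code computes the classical squared Mahalanobis distance from the sample mean and flags observations above a chi-square cutoff. This is a simplified illustration and not Hadi's procedure, which uses a more robust, stepwise construction; the data points and the cutoff (the 97.5 percent quantile of a chi-square distribution with two degrees of freedom, about 7.378) are assumptions.

```python
def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each 2-d point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for px, py in points:
        dx, dy = px - mx, py - my
        # (dx, dy) times the inverse covariance matrix times (dx, dy)'
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

# hypothetical (lnL, lnK) pairs; the last one breaks the common pattern
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
          (6, 5.9), (7, 7.2), (8, 7.8), (9, 9.1), (10, 9.9), (5, 12.0)]
d2 = mahalanobis_sq(points)
CUTOFF = 7.378                      # chi-square(2) quantile at 0.975
mod = [1 if v > CUTOFF else 0 for v in d2]   # multiple outlier dummy
print(mod)
```

The resulting dummy plays the role of MOD in the text and could be added to the regression as an additional regressor.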
As it is not obvious whether the outliers are due to measurement errors that should be eliminated or whether they are unusual but systematically induced observations that should be accounted for, the parameters can only be partially identified. Therefore, Table 4 presents confidence intervals not only for the two extreme cases (column 1: all outliers are induced by specific events; column 2: all outliers are due to random measurement errors). In addition, column 3 displays the confidence interval (CI) based on Stoye's method. The results show that the lower and upper coefficient estimates of lnL by Stoye lie within the estimated coefficients in columns 1 and 2. The upper coefficient is nearer to that of column 2 and the lower is nearer to that of column 1. We do not find the same pattern for the input factor lnK. In this case Stoye's \(\hat{\beta}_{\ln K;u}\) deviates more from the coefficient in column 2 than from that in column 1, and for \(\hat{\beta}_{\ln K;l}\) we find the opposite result. Stoye's intervals (\(\Delta\hat{\beta}_{\ln L}=\hat{\beta}_{\ln L;u}-\hat{\beta}_{\ln L;l}\); \(\Delta\hat{\beta}_{\ln K}=\hat{\beta}_{\ln K;u}-\hat{\beta}_{\ln K;l}\)) are shorter than those with or without outliers. In other words, the estimates are more precise.
The next tables present estimates from alternative methods for determining causal effects. First, the difference-in-differences (DiD) approach is estimated. Results can be found in Table 5. The coefficient of the interaction variable CLP∗D2009 in column 1 is significantly different from zero. This means that the difference in sales between firms with a company-level pact (CLP) adopted in 2009 and those without such a pact differs between 2009 and the years before (2006–2008). If an unconditional DiD specification is used, the adoption of a CLP in the year of the Great Recession is associated with lower sales than in the years before. In column 2, where an extended CDF is estimated, the sign changes and the effect of the interaction variable is insignificant. This approach is preferred because in the former the influence of the input factors is partially attributed to the causal effect. Now, no influence of the adoption of a CLP on sales in 2009 can be detected. One could argue that the estimates in column 1 are more likely to be significant than those in column 2 because the sample in the former is larger. This argument is not compelling. If we draw a random sample of 63.83 percent, so that the sample size in column 1 is n=20,489, the interaction effect is −0.2939 and the significance is preserved (t=−2.26). If CLPs change labor and capital productivity, we should not incorporate lnL and lnK in a conditional DiD. In other words, in this case we should not control for these variables before treatment.
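The unconditional DiD estimate can be sketched as a difference of four cell means, which is numerically identical to the coefficient of the interaction term in a regression on the group dummy, the period dummy and their product. The variable names and the numbers are invented for illustration and are unrelated to Table 5.

```python
def mean(values):
    return sum(values) / len(values)

def did(y, treated, post):
    """Unconditional difference-in-differences from the four cell means."""
    def cell(g, t):
        return mean([yi for yi, gi, ti in zip(y, treated, post) if gi == g and ti == t])
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# invented outcomes: treated = 1 for the treatment group, post = 1 for the later period
y       = [10.0, 12.0, 11.0, 13.0, 14.0, 16.0, 13.0, 15.0]
treated = [0,    0,    0,    0,    1,    1,    1,    1]
post    = [0,    0,    1,    1,    0,    0,    1,    1]

effect = did(y, treated, post)
print(effect)
```

A conditional DiD adds control variables such as lnL and lnK to the regression, so that the interaction coefficient is no longer a simple difference of means.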
Alternative methods to determine causal effects are matching procedures. These are suggested when there is no control over the assignment of treatment conditions, i.e. when the dichotomous treatment variable D and the disturbance term u in the basic equation \(y=X\beta+\alpha D+u\) are correlated and the ignorable treatment assignment assumption is violated. In the example of the CDF it is questionable whether this assumption is fulfilled for CLPs. As an alternative, Mahalanobis metric matching (MM) without propensity score and nearest-neighbor matching (NNM) with caliper are applied; the results are presented in Table 6, columns 2 and 3, respectively. The latter method is used without replacement. That is, once a treated case is matched to a nontreated case, both cases are removed from the pool. The former method allows one control case to be used as a match for several treated cases. Therefore, the total number of observations under nearest-neighbor matching is larger than that in column 2. We find that the CLP effect on sales is insignificant in both cases, but the CLP coefficient of the MM estimates exceeds that of NNM by far. The estimates of the partial elasticities of production are very similar in the three estimates in Table 6. The insignificance of the CLP effect confirms the result of column 2 of Table 5. If the DiD estimator of column 2 in Table 5 is applied after matching, the causal effect is, not unexpectedly, also insignificant. The p-value is 0.182 if the MM procedure is used and 0.999 under the NNM procedure.
The previous estimates have demonstrated that company-level pacts (CLPs) have no statistically significant influence on output. We cannot be sure that this result also holds for subgroups of firms. One way to test this is to conduct quantile estimates. As presented in Sect. 2.2, four methods can be applied to determine quantile treatment effects (QTE). The CLP effects on sales can be found in Table 7, where the results for five quantiles (q=0.1, 0.3, 0.5, 0.7, 0.9) are presented. In contrast to the previous estimations, most CLP effects are significant in columns 1–4 of Table 7. Firpo considers the simplest case without control variables under the assumption that the adoption of a company-level pact is exogenous. The estimated coefficients in column 1 (F) seem oversized. The same holds for the Frölich-Melly approach, where CLP is instrumented by a short-time work dummy (column 3, FM). Other available instruments such as opening clauses, collective bargaining, works councils or research and development within the firm do not evidently change the results. One reason for the overestimated coefficients may be neglected determinants of output that are correlated with CLP. The estimates in columns 2 (KB) and 4 (AAI) support this hypothesis.
Judged by the expected size of the CLP coefficients, the conventional quantile estimator, the Koenker-Bassett approach with lnL and lnK as regressors, seems best. However, the ranking of the coefficients within column 2 is unexpected: the smaller the quantile, the larger is the estimated coefficient. This could mean that CLPs are advantageous for small firms. However, it is possible that small firms with productivity advantages due to CLPs face relatively high costs of adopting a CLP. In this case the higher propensity of large firms to introduce a CLP is consistent with higher productivity of small firms.
The coefficients of the Abadie-Angrist-Imbens approach, a combination of Frölich-Melly's and Koenker-Bassett's models, are also large but not as large as in columns 1 and 3.
Possibly, all estimates in columns 1–4 of Table 7 are biased and inconsistent. This is the case when CLP and non-CLP firms differ fundamentally due to unobserved variables. To avoid this problem, the QTE and the matching approaches are combined. Based on the matching of Table 6, the QTE can be estimated analogously to columns 1–4 in Table 7. In columns 5 and 6 only two combinations are presented, namely MM+KB and MM+AAI. We find that the ranking and the size of the coefficients are plausible in column 5. The coefficients in column 6 are smaller than in column 4, but the identified causal effects still seem too high. The most important result is the following: the CLP effects are significant for higher quantiles, i.e. for q=0.9 in column 5 and for q=0.7 and q=0.9 in column 6. However, the median estimators (q=0.5) of the CLP effects in columns 5 and 6, which can be compared with the estimates of column 2 in Table 6, are insignificant. Quantile estimators thus highlight information that cannot be revealed by the other treatment methods, i.e. those in Tables 5 and 6. The estimates of the other six combinations (MM+F, MM+FM, NNM+F, NNM+KB, NNM+FM, NNM+AAI), not presented in the tables, are less plausible. The ranking of the size of the coefficients is inconsistent in the light of theoretical and practical experience.
The final treatment method discussed in Sect. 2.2 is the regression discontinuity (RD) design. This approach exploits information on the rules determining treatment. The probability of receiving treatment is a discontinuous function of one or more variables, where treatment is triggered by an administrative definition or an organizational rule.
In a first example using a sharp RD design, it is analyzed whether a structural break in the logarithm of output (lnY) is evident at an estimated probability of 0.5 that a company-level pact (CLP) exists. For this purpose a probit model is estimated with profit situation, working-time account, total wages per year and works council as determinants of CLP. All coefficients are significantly different from zero (not shown in the tables). The estimated probability Pr(CLP) is then plotted against lnY, based on a fractional polynomial model over the entire range (0<Pr(CLP)<1) and on two linear models split into Pr(CLP)≤0.5 and Pr(CLP)>0.5. The graphs are presented in Fig. 1.
A structural break seems evident. Two questions have to be checked: first, is the break due to a nonlinear shape, and second, is the break significant? The answer to the first question is yes, because the shape over the range 0<Pr(CLP)<1 is obviously nonlinear when a fractional polynomial is assumed. The answer to the second question is given by a t-test (cf. Sect. 2.2) based on
where
The null hypothesis that there is no break has to be rejected (\(\hat{\gamma}_{1}=-3.96\); t=−6.87; p-value=0.000), as can be seen in Table 8.
The estimates in Table 8 cannot tell us whether the output jump at Pr(CLP)=0.5 is a general phenomenon or whether the Great Recession in 2008/09 is responsible. To test this, the combined method of RD and DiD derived in Sect. 2.2 is employed; the results are presented in Table 9. The estimates show that the output jump does not change significantly between 2006/2007 and 2008/2010. The influence of D_Pr(CLP)⋅T and that of D_Pr(CLP)⋅cPr(CLP)⋅T on lnY is insignificant. Therefore, we conclude that the break is of a general nature.
Two further examples are presented in Figs. 2 and 3. The Institut für Mittelstandsforschung defines small firms as those with fewer than 10 employees and sales of up to 1 million euros per year. The analogous definition of middle-size firms is fewer than 500 employees and sales of up to 50 million euros per year. A sharp regression discontinuity design is applied to test whether the first and the second part of each definition are consistent. In other words, based on a Cobb-Douglas production function with only one input factor, the number of employees, it is tested whether a structural break exists for small firms between 9 and 10 employees at a sales border of 1 million euros. For small firms we find in Fig. 2 that there seems to be a sales break around 1 million euros per year.
The t-test, analogous to that in the first example, yields at best weak significance (\(\hat{\gamma}_{1}=-13.8667\); t=−1.61; p-value=0.107). The same procedure for middle-size firms (see Fig. 3) leads to the following results.
Apparently, there exists a break. However, the first part of the definition of middle-size firms from the Institut für Mittelstandsforschung is not compatible with the second part. The break of sales at 500 employees is not at 50 million euros per year but at around 150 million euros. Furthermore, the visual result might be due to a nonlinear relationship, as the fractional polynomial estimation over the entire range suggests. The t-test does not reject the null hypothesis (\(\hat{\gamma}_{1}=-8977\); t=−0.54; p-value=0.588). The conclusion from Figs. 2 and 3 is that a graphical representation without the polynomial shape as a comparison curve and without a test for a structural break can lead to a misinterpretation.
The final example uses a fuzzy regression discontinuity design. It is analyzed whether the CLP effects on the logarithm of sales (lnY=ln(sales/10000)) differ between the East and West German federal states. The graphical representation can be found in Figs. 4a and 4b. The former shows the disparities in the level of sales per year and the latter those in Pr(CLP), here measured by the relative frequency of firms with a CLP among all firms in a German federal state.
Although clear differences are detected for both characteristics (lnY, Pr(CLP)), we cannot be sure that these disparities are significant, nor whether the CLP effects are smaller or larger in West Germany. This is checked by a Wald test in Table 10. We find that the CLP effects on lnY (−0.8749/−0.0571=15.3165) are significantly higher in the West German federal states (z=4.29). If the dummy “East Germany” is interpreted as an instrument for the dummy “CLP”, we should note that the former is not a proper instrument because the output lnY differs between East and West Germany independently of a CLP.
4 Summary
Many factors, such as heteroskedasticity, clustering, the basic probability of qualitative regressors, outliers and only partially identified parameters, may cause estimated standard errors based on classical methods to be biased. The applications show that the estimates under the suggested modifications do not always deviate much from those of the classical methods.
The development of new procedures is ongoing. In particular, the field of treatment methods has been extended. It is not always obvious which method is preferable for determining the causal effect. As the results evidently differ, it is necessary to develop a framework that helps to decide which method is most appropriate in typical situations. We observe a tendency away from the estimation of average effects. The focus has shifted to distributional topics. Quantile analysis helps to investigate differences between subgroups of the population. This is important because economic measures do not have the same influence on heterogeneous establishments and individuals. A combination of quantile regression with matching procedures can improve the determination of causal effects. Further combinations of treatment methods seem helpful. Difference-in-differences estimates should be linked with matching procedures and regression discontinuity designs. Regression discontinuity designs split by quantiles can also lead to new insights.
Executive summary
Empirical economics has been shaped by econometric methods for many years. During the last 20 years the contents and major questions in this field have changed strongly. Therefore methods were modified and completely new methods were developed. In comparison to conventional approaches, more attention is paid to peculiarities of the data, to the specification of the estimating approach, to unobserved heterogeneity, to endogeneity and to causal effects. Real data are often not compatible with the assumptions of classical methods. If the latter are used nevertheless, this can lead to a misinterpretation of the results. We have to ask whether the results are correct. Is it really possible to interpret the estimated effects as causal, or are these only statistical artifacts that are irrelevant or even counterproductive for policy measures? In order to avoid this, the practitioner has to be familiar with the wide range of existing methods for empirical investigations. The user has to know the assumptions of the methods and whether the application allows adequate conclusions given the available information. It is necessary to check the robustness of the results by alternative methods and specifications.
This paper presents a selective review of econometric methods and demonstrates by applications that the methods work. In the first part, methodological problems of standard errors and treatment effects are discussed. First, heteroskedasticity-consistent and cluster-robust estimates are presented. Second, peculiarities of Bernoulli distributed regressors, outliers and only partially identified parameters are revealed. Approaches to the improvement of standard error estimates under heteroskedasticity differ in the weighting of residuals. Other procedures use the estimated disturbances to create a larger number of artificial samples in order to obtain better estimates, and yet others use nonlinear information. Cluster-robust estimates try to solve the Moulton problem. Standard errors that are too low because observations within clusters are similar are adjusted. This objective is only partially achieved. We should be cautious if we compare the effects of dummy variables on an endogenous variable, because the more the mean of a dummy deviates from 0.5, the higher are the standard errors. Outliers, i.e. unusual observations that are due to systematic measurement errors or extraordinary events, may have an enormous influence on the estimates. The suggested approaches to detect outliers vary with the measurement concept and do not necessarily demonstrate whether outliers should be accounted for in the empirical analysis. New methods for partially identified parameters may be helpful in this context. They can increase the degree of precision when it is uncertain whether outliers should be eliminated.
Four principles for estimating causal effects are in the focus: difference-in-differences (DiD) estimators, matching procedures, quantile treatment effects (QTE) analysis and the regression discontinuity design. The DiD models distinguish between conditional and unconditional approaches. The range of the popular matching procedures is wide and the methods evidently differ. They aim to find statistical twins, i.e. to homogenize the characteristics of observations from the treatment and the control group. Until now, the application of QTE analysis has been relatively rare in practice. Four types of models are important in this context. The user has to decide whether the treatment variable is exogenous or endogenous and whether additional control variables are incorporated or not. Regression discontinuity (RD) designs distinguish between sharp and fuzzy RD methods. The distinction is whether an observation is assigned to the treatment or to the control group directly by an observable continuous variable or indirectly via the probability and the mean of treatment, respectively, conditional on this variable.
In the second part of the paper the different methods are applied to estimates of Cobb-Douglas production functions using IAB establishment panel data. Some heteroskedasticity-consistent estimates show similar results, while cluster-robust estimates differ strongly. Dummy variables as regressors with a mean near 0.5 reveal, as expected, smaller variances of the coefficient estimators than others. Not all outliers have a strong effect on significance. Methods for partially identified parameters yield more efficient estimates than traditional procedures.
The four discussed treatment effects methods are applied to the question whether company-level pacts have a significant effect on production output. Unconditional DiD estimators and estimates without matching display significant effects. In contrast, the effect is positive but insignificant if conditional DiD or matching estimates based on the Mahalanobis metric are applied. This outcome has to be stated more precisely under quantile regression. The higher the quantile, the stronger is the tendency towards positive and significant effects. Sharp regression discontinuity estimates display a jump at the probability 0.5 that an establishment has a company-level pact. No specific influence can be detected during the Great Recession. Fuzzy regression discontinuity estimates reveal that the output effect of company-level pacts is significantly lower in East than in West Germany. A combined application of the four principles for determining treatment effects leads to some interesting new insights. We determine joint DiD and matching estimates as well as estimates of the former together with regression discontinuity designs. Finally, matching is combined with quantile regression.
Kurzfassung
Empirische Wirtschaftsforschung wird schon seit vielen Jahren ganz wesentlich von ökonometrischen Methoden getragen. In den letzten 20 Jahren haben sich Inhalte und Fragestellungen in der empirischen Wirtschaftsforschung stark verändert. Dies hat dazu geführt, dass viele Methoden modifiziert oder völlig neue entwickelt wurden. Gegenüber traditionellen Ansätzen wird verstärkt auf die Besonderheiten der Daten, auf die Spezifikation des zu schätzenden Ansatzes, auf unbeobachtete Heterogenität, auf Endogenität und auf Kausaleffekte geachtet. Reale Daten sind ganz überwiegend nicht vereinbar mit den Annahmen klassischer Methoden. Werden letztere trotzdem eingesetzt, so sind damit häufig Fehlinterpretationen der Ergebnisse verbunden. Zu fragen ist, wie sicher die getroffenen Aussagen sind. Können die Schätzergebnisse tatsächlich kausal interpretiert werden oder haben sich lediglich rein statistische Zusammenhänge ergeben, die für Handlungsanweisungen irrelevant oder gar kontraproduktiv sind? Um dies zu verhindern, muss der Praktiker für seine empirischen Untersuchungen mit dem Spektrum vorhandener Methoden vertraut sein. Er muss wissen, welche Annahmen den jeweiligen Methoden zugrunde liegen und ob deren Anwendung bei gegebener Information geeignete Aussagen zulassen. Er sollte durch den Einsatz vergleichbarer Methoden die Robustheit der Ergebnisse überprüfen.
Einen Überblick über selektiv ausgewählte ökonometrische Methoden zu liefern und anhand von Anwendungen deren Arbeitsweise aufzuzeigen, ist Anliegen dieses Beitrags. Behandelt werden methodische Probleme zu Standardfehlern und TreatmentEffekten. Zunächst geht es um heteroskedastie und clusterrobuste Schätzungen. Es folgt die Erörterung von Problemen bei bernoulliverteilten Regressoren, Ausreißern und partiell identifizierten Parametern. Vorgeschlagene Ansätze zur Verbesserung der Standardfehler bei Vorliegen von Heteroskedastie unterscheiden sich in der Gewichtung der Residuen. Andere Verfahren nutzen die geschätzten Störgrößen aus, um künstlich eine größere Anzahl von Stichproben zu erzeugen, um auf deren Basis eine bessere Schätzung der Standardfehler zu erhalten oder machen sich vorhandene Nichtlinearitäten zunutze. Clusterrobuste Schätzungen zielen darauf ab, das MoultonProblem zu lösen. Zu geringe Standardfehler bei Vorliegen von in Clustern zusammengefassten ähnlichen Beobachtungen werden korrigiert. Dies gelingt in den vorgeschlagenen Ansätzen nur unvollständig. Ein bisher nicht erörtertes Phänomen, dass DummyVariablen als Regressoren zu höheren Standardfehlern führen, je mehr ihr Mittelwert von 0.5 entfernt ist, mahnt zur Vorsicht beim Vergleich hinsichtlich der Präzision des Einflusses verschiedener [0;1]Regressoren. Ausreißer, d. h. ungewöhnliche Beobachtungen, die vor allem auf systematische Messfehler oder ungewöhnliche Ereignisse zurückzuführen sind, können erhebliche Auswirkungen auf die Schätzergebnisse haben. Die vorgeschlagenen Ansätze zur Aufdeckung von Ausreißern variieren hinsichtlich des Messkonzeptes und liefern nicht zwangsläufig Hinweise darauf, ob diese bei der empirischen Analyse zu berücksichtigen sind. Neuere Ansätze für nur partiell identifizierte Parameter können hier hilfreich sein. Erhöhen sie doch den Präzisionsgrad bei Unsicherheit, ob Ausreißer zu entfernen sind oder nicht.
Among the procedures for determining treatment effects, four principles are in focus: difference-in-differences estimators, matching procedures, treatment effects in quantile regression analysis, and regression discontinuity approaches. For difference-in-differences estimators, a distinction is made according to whether additional control variables are included. The range of matching procedures, which have recently become very popular and which aim to homogenize the treatment group and the control group in order to identify statistical twins, has become quite extensive and also exhibits methodologically important differences. The use of quantile regressions to capture heterogeneous causal effects is still comparatively rare. Methodologically, one has to distinguish whether the treatment variable is regarded as exogenous or endogenous and whether further control variables are taken into account. For regression discontinuity approaches, the distinction is whether assignment to the treatment or control group is based solely on an observed continuous variable or whether unobserved variables are also involved.
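The distinction between difference-in-differences estimators with and without control variables can be sketched as follows. This is a minimal illustration on simulated data, not the paper's estimation: the function names `did_means` and `did_regression`, the variable names and the data-generating process are all assumptions made for the example.

```python
import numpy as np

def did_means(y, treated, post):
    """Unconditional DiD computed from the four group means."""
    def m(g, t):
        return y[(treated == g) & (post == t)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def did_regression(y, treated, post, controls=None):
    """Coefficient on treated*post; passing controls gives a conditional DiD."""
    cols = [np.ones_like(y), treated, post, treated * post]
    if controls is not None:
        cols.append(controls)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]

# Simulated data (hypothetical): the true interaction effect is 0.3
rng = np.random.default_rng(1)
n = 400
treated = rng.integers(0, 2, size=n).astype(float)
post = rng.integers(0, 2, size=n).astype(float)
z = rng.normal(size=n)  # an observed control variable
y = (2.0 + 1.0 * treated + 0.5 * post + 0.3 * treated * post
     + 0.2 * z + rng.normal(scale=0.1, size=n))

est_means = did_means(y, treated, post)
est_reg = did_regression(y, treated, post)
est_cond = did_regression(y, treated, post, controls=z)
print(est_means, est_reg, est_cond)
```

Without controls, the interaction coefficient of the saturated regression coincides with the difference of the four group means. Adding controls changes the estimate only insofar as the controls are related to both outcome and group membership, which is the case of interest in applied work.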
The discussion of the various procedures, initially purely methodological, is supplemented in the second part of the paper by applications to Cobb-Douglas production functions using IAB Establishment Panel data. Different heteroskedasticity-consistent estimation procedures lead to similar standard errors. Cluster-robust estimates show more pronounced deviations. Dummy variables as regressors with a mean near 0.5 lead to smaller variances of the coefficient estimates than dummies with lower or higher means. Not all outliers have a strong influence on significance. Newer methods for handling only partially identified parameters lead to more efficient estimates than traditional procedures.
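The result for dummy regressors can be made concrete in the simplest case. Assuming a bivariate model y = a + b·d + u with homoskedastic errors of variance sigma2 (an assumption made here for illustration; the paper's applications use richer specifications), the variance of the OLS slope on a dummy with sample mean p is sigma2 / (n·p·(1−p)), which is smallest at p = 0.5. The sketch below checks this closed form against the usual matrix expression; the function name `dummy_slope_variance` is hypothetical.

```python
import numpy as np

def dummy_slope_variance(n, share, sigma2=1.0):
    """Variance of the OLS slope on a dummy with mean `share` in y = a + b*d + u."""
    n1 = int(round(n * share))
    d = np.zeros(n)
    d[:n1] = 1.0
    X = np.column_stack([np.ones(n), d])
    # Homoskedastic OLS covariance matrix: sigma^2 (X'X)^{-1}
    cov = sigma2 * np.linalg.inv(X.T @ X)
    p = d.mean()
    closed_form = sigma2 / (n * p * (1.0 - p))
    return cov[1, 1], closed_form

results = {}
for share in (0.1, 0.3, 0.5, 0.7, 0.9):
    results[share] = dummy_slope_variance(1000, share)
    v, cf = results[share]
    print(f"mean={share:.1f}  var(b)={v:.5f}  closed form={cf:.5f}")
```

The variance is symmetric around a mean of 0.5 and rises sharply as the dummy becomes rare or nearly universal, which is why precision comparisons across binary regressors with very different means need care.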
The four treatment-effect procedures discussed are applied to the question of whether company-level pacts have a significant effect on output. In contrast to unconditional difference-in-differences estimators and estimators without matching, conditional difference-in-differences estimators or matching estimators based on the Mahalanobis metric yield positive but insignificant effects. The latter result has to be qualified in a quantile treatment effect analysis: the higher the quantile considered, the stronger the tendency toward significantly positive effects. A simple (sharp) regression discontinuity analysis shows a structural break at a probability of 0.5 that an establishment has concluded a company-level pact. No specific effects can be detected during the Great Recession of 2008/09. Fuzzy regression discontinuity estimates reveal that the output effect of company-level pacts is significantly lower in East Germany than in West Germany. A combined application of the four basic principles for identifying causal effects leads to interesting new insights. Among other things, difference-in-differences estimators are combined with matching procedures. The former are also discussed in conjunction with regression discontinuity, and the latter in conjunction with quantile regressions.
References
Abadie, A., Angrist, J., Imbens, G.: Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91–117 (2002)
Ai, C., Norton, E.C.: Interaction terms in logit and probit models. Econ. Lett. 80, 123–129 (2003)
Angrist, J., Pischke, J.S.: Mostly Harmless Econometrics—an Empiricist’s Companion. Princeton University Press, Princeton (2009)
Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics—Identifying Influential Data and Sources of Collinearity. Wiley, New York (1980)
Cameron, A.C., Miller, D.L.: Robust inference with clustered data. In: Ullah, A.C., Giles, D.E.A. (eds.) Handbook of Empirical Economics and Finance, pp. 1–28 (2010)
Chernozhukov, V., Hong, H., Tamer, E.: Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243–1284 (2007)
CribariNeto, F., da Silva, W.D.: A new heteroskedasticityconsistent covariance matrix estimator for the linear regression model. AStA Adv. Stat. Anal. 95, 129–146 (2011)
CribariNeto, F., Souza, T.C., Vasconcellos, K.L.P.: Inference under heteroskedasticity and leveraged data. Commun. Stat., Theory Methods 36, 1877–1888 (2007)
Firpo, S.: Efficient semiparametric estimation of quantile treatment effects. Econometrica 75, 259–276 (2007)
Frölich, M., Melly, B.: Unconditional Quantile Treatment Under Endogeneity. (2012) mimeo
Goldstein, H.: Multilevel Statistical Models, Kendall’s Library of Statistics, 3rd edn. Arnold, London (2003)
Guo, S., Fraser, M.W.: Propensity Score Analysis. Sage Publications, Thousand Oaks (2010)
Hadi, A.S.: Identifying multiple outliers in multivariate data. J. R. Stat. Soc. B 54, 761–771 (1992)
Hamermesh, D.S.: The craft of labormetrics. Ind. Labor Relat. Rev. 53, 363–380 (2000)
Imbens, G.W., Manski, C.F.: Confidence intervals for partially identified parameters. Econometrica 72, 1845–1857 (2004)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Krämer, W.: The cult of statistical significance—what economists should and should not do to make their data talk. J. Appl. Soc. Sci. Stud. 131, 455–468 (2011)
Leamer, E.E.: Sensitivity analyses would help. Am. Econ. Rev. 75, 308–313 (1985)
MacKinnon, J.G., White, H.: Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties. J. Econom. 29, 305–325 (1985)
Moulton, B.R.: Random group effects and the precision of regression estimates. J. Econom. 32, 385–397 (1986)
Moulton, B.R.: Diagnostic tests for group effects in regression analysis. J. Bus. Econ. Stat. 6, 275–282 (1987)
Moulton, B.R.: An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Rev. Econ. Stat. 72, 334–338 (1990)
Puhani, P.: The treatment, the cross difference, and the interaction term in nonlinear ‘differenceindifferences’ models. Econ. Lett. 115, 85–87 (2012)
Raudenbush, A.S., Bryk, S.W.: Hierarchical Linear Models, 2nd edn. Sage Publications, Thousand Oaks (2002)
Romano, J.P., Shaikh, A.M.: Inference for the identified set in partially identified econometric models. Econometrica 78, 169–211 (2010)
Rosenbaum, P.R., Rubin, D.P.: Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am. Stat. 39, 33–38 (1985)
Stoye, J.: More on confidence intervals for partially identified parameters. Econometrica 77, 1299–1315 (2009)
Wald, H.: The fitting of straight line if both variables are subject to error. Ann. Math. Stat. 11, 284–300 (1940)
White, H.: A heteroskedasticityconsistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980)
Wooldridge, J.M.: Clustersample methods in applied econometrics. Am. Econ. Rev. 93(PaP), 133–138 (2003)
Woutersen, T.: A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. (2009) mimeo
Ziliak, S., McCloskey, D.: The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives. University of Michigan Press, Michigan (2008)
Acknowledgements
I wish to thank an anonymous reviewer for his constructive suggestions and the participants of the Nutzerkonferenz in Nürnberg for helpful comments.
Cite this article
Hübler, O. Estimation of standard errors and treatment effects in empirical economics—methods and applications. J Labour Market Res 47, 43–62 (2014). https://doi.org/10.1007/s1265101301350
DOI: https://doi.org/10.1007/s1265101301350
Keywords
 Standard errors
 Outliers
 Partially identified parameters
 DiD estimators
 Matching
 Quantile regressions
 Regression discontinuity