Measurement error in longitudinal earnings data: evidence from Germany

Schmillen, Achim; Umkehrer, Matthias; von Wachter, Till

doi:10.1186/s12651-024-00366-x

Original Article
Open access
Published: 04 June 2024

Journal for Labour Market Research volume 58, Article number: 8 (2024)

Abstract

We present evidence on the extent of measurement error in German longitudinal earnings data. Qualitatively, we confirm the main result of the international literature: longitudinal earnings data are relatively reliable in a cross section but much less so in first differences. Quantitatively, in the cross section our findings are very similar to those of Bound and Krueger (J Labor Econ 9:1–24, 1991) and Pischke (J Bus Econ Stat 13:305–314, 1995) for the United States while we find even stronger evidence that first-differencing exacerbates measurement error problems. We also show that measurement error in our survey data is not “classical” as it is negatively correlated with administrative earnings and positively autocorrelated over an extended period of time. Additionally, we estimate a model of measurement error stemming from underreporting of transitory earnings shocks in combination with a white-noise component and make a number of methodological contributions. Our results are robust to the use of two different linked survey-administrative data sets and various other sensitivity checks.

1 Introduction

Over the last three decades, administrative data sets have been “transforming the analysis of economic policy” (Friedman 2010, p. 2). Card et al. (2010, p. 1) highlight that, compared to survey data, “[a]dministrative data offer much larger sample sizes and have far fewer problems with attrition, non-response, and measurement error.” In one of the pioneering and most influential examinations of measurement error in survey data, Bound and Krueger (1991) (hereafter: BK) focus on the prevalence and properties of measurement error in longitudinal earnings information. Their analysis compares potentially mismeasured Current Population Survey (CPS) data to administrative Social Security payroll tax records, which BK assume to be free of measurement error. BK find that survey-based earnings data are relatively reliable in a cross section. At the same time, reliability ratios are much lower for specifications that rely on survey-based earnings data in first differences. In addition, measurement errors are not “classical” in the sense of being identically and independently distributed and uncorrelated with administrative earnings. Instead, measurement errors are serially correlated over 2 years and negatively correlated with administrative earnings (or “mean-reverting” in BK’s parlance).

In the first part of this paper, we replicate and extend the exercise by BK with the help of German data that link the administrative “ADIAB” social security records with the “PASS” (“Panel Study Labour Market and Social Security”/“Panel Arbeitsmarkt und soziale Sicherung”) household survey. In order to compare our findings for Germany directly with those already available for the United States, we make the same assumptions as BK. In particular, we assume that administrative ADIAB earnings represent “true earnings” and hereafter designate them as such. In fact, because the underlying administrative data are used to compute social security contributions, the ADIAB earnings information is considered highly reliable. While the data set used by BK only spans two time periods, we are able to exploit cross-sectional and longitudinal variation in earnings over the first four waves of the PASS survey. Our data are also more recent than those of BK. In addition, the provision of internationally comparable evidence on the extent of measurement error in German longitudinal earnings data makes it possible to put the conclusions by BK into an international perspective.

Qualitatively, we confirm many of the main results of BK. In particular, we confirm that in a cross section survey-based earnings data are relatively reliable but that their reliability tends to be much lower when the data are specified in first differences. Quantitatively, in the cross section our findings for Germany are very similar to those of BK for the United States, while we find even stronger evidence that first-differencing exacerbates measurement error problems. As a yardstick for the reliability of survey-based earnings data, BK consider the “reliability ratio”, defined as the ratio of the covariance between mismeasured earnings and true earnings to the variance of mismeasured earnings. A value of 1 indicates that this covariance equals the variance of mismeasured earnings, which implies perfect reliability. Conversely, a value of 0 implies a complete lack of reliability. Allowing for mean-reverting measurement error, BK estimate that in a cross section the reliability ratio is 0.97 to 1.02 for men and 0.93 to 0.96 for women. They also estimate that this ratio falls to between 0.78 and 0.86 when the data are specified in first differences. Also allowing for mean-reverting measurement error, we find that in a cross section the reliability ratio is 0.93 to 0.95 for men and 0.90 to 1.00 for women. In first differences, it falls to 0.11 to 0.47 for men and 0.25 to 0.34 for women.
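To make the verbal definition of the reliability ratio concrete, it can be written out as follows. The notation is an assumption introduced here for illustration and may differ from the paper's own: $y$ denotes survey-based earnings, $x$ true (administrative) earnings, $u = y - x$ the measurement error, and $\sigma_{xu}$ the covariance between $x$ and $u$.

$$
\lambda = \frac{\operatorname{Cov}(y, x)}{\operatorname{Var}(y)} = \frac{\sigma_x^2 + \sigma_{xu}}{\sigma_x^2 + \sigma_u^2 + 2\sigma_{xu}}
$$

Under classical measurement error, $\sigma_{xu} = 0$ and the ratio reduces to $\sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$, which cannot exceed 1. With mean-reverting error, $\sigma_{xu} < 0$, and the ratio exceeds 1 whenever $\sigma_{xu} < -\sigma_u^2$. This is how estimates slightly above 1 can arise.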

Our other noteworthy findings include (a) that measurement error in our survey data is not classical but mean-reverting, with a high degree of autocorrelation and a strong negative correlation with true earnings, and (b) that the mismeasurement of earnings leads to little bias when survey-based earnings are on the left-hand side of a typical Mincer-type earnings regression. Both these findings are again in line with conclusions reached by BK. Going beyond the exercise by BK, we analyze both first- and higher-order autocorrelations of measurement error. Our results strongly suggest that measurement errors in our survey-based earnings data are positively correlated over an extended period of time. This leads us to conjecture that at least a sizable fraction of the autocorrelation in the measurement error is not actually due to a simple autoregressive process but either to a person fixed effect or to a more complex time series process.

In the second part of this paper, we use our data to estimate the dynamic model of measurement error by Pischke (1995). Pischke (1995) works with earnings data from the Panel Study of Income Dynamics Validation Study (PSIDVS), which combines earnings information from the Panel Study of Income Dynamics (PSID) questionnaire with payroll records from a specific, anonymous firm that he assumes to be free of measurement error. He explains the measurement error in these data by an individual fixed effect, the misreporting of transitory earnings shocks and a white-noise component. This simple model fits our data surprisingly well. Again under the assumption that administrative earnings information represents true earnings, we find mean-reverting measurement error and, like Pischke (1995), that individuals underreport the transitory component of earnings. Pischke (1995, p. 309) explains that the underreporting of transitory earnings changes is entirely plausible given that “[o]bviously, it is the changes in permanent earnings that are related to the more important events in people’s lives.” Also in line with Pischke (1995), we find that underreporting of transitory earnings leads to downwardly biased estimates of the variance of earnings growth while the white-noise component induces upward bias. In our case, the upward bias more than offsets the downward bias, implying that earnings growth observed in survey data appears to be more spread out than it actually is. This is in contrast to Pischke (1995), who finds upward and downward biases of similar magnitude. As a further contribution, we document an upward bias in estimates of the variance of transitory earnings and a downward bias in estimates of the variance of permanent earnings due to measurement error in the survey data. Thus, earnings inequality is actually more persistent than suggested by survey data.

Our results are robust to various sensitivity checks and arguably exhibit a high degree of external validity, as demonstrated through the use of a second linked survey-administrative data set that combines the administrative social security records with the “WeLL” (“Continuing Education and Lifelong Learning”/“Weiterbildung und Lebenslanges Lernen”) household survey. While the PASS survey oversamples poorer households, the WeLL survey includes a disproportionately large number of high earners. Reassuringly, our results are qualitatively robust across the two surveys.

We also make three methodological contributions: (a) we extend the simple model of measurement error suggested by BK as well as the dynamic measurement error model by Pischke (1995) from the two-period to the four-period case, (b) we introduce the methodology for taking account of top-coded administrative earnings information developed by Card et al. (2013) to the measurement error literature and (c) we develop a procedure for merging the correct administrative records to the different waves of the PASS and WeLL surveys.^{Footnote 1} These methodological contributions will help researchers to further investigate the prevalence and properties of measurement error in longitudinal earnings data and tap the full potential of German linked survey-administrative data.

Our study is relevant for three distinct strands of literature. First, it is relevant for the literature that links administrative and survey data to investigate the extent of measurement error in longitudinal earnings data. This literature is partly surveyed in Bound et al. (2001) and Meyer and Mittag (2021) and almost exclusively focuses on the United States. Together with BK and Pischke (1995), pioneering studies include those by Duncan and Hill (1985) and Bound et al. (1994) which both rely on the PSIDVS. In accordance with the findings by BK, Duncan and Hill (1985) show that in levels the reliability ratio of earnings in the PSID data exceeds 80 percent while Bound et al. (1994) document mean reversion and a positive autocorrelation in measurement error. Other noteworthy studies that link American administrative and survey data sets to investigate the extent of measurement error in earnings data include those by Mellow and Sider (1983), Rodgers et al. (1993), Bollinger (1998) and Stinson (2002).^{Footnote 2}

Second, our results on measurement error in longitudinal earnings data complement the emerging literature that compares the quality of German administrative and survey-based earnings information in the cross section. Notable contributions to this literature include Oberski et al. (2017), Antoni et al. (2019), Valet et al. (2019), Gauly et al. (2020) and Stüber et al. (2023). One recurrent conclusion of this literature is that average earnings tend to be somewhat larger in administrative records than in surveys, but that in the cross section they differ relatively little between the two types of data sources.

Third and finally, our study is relevant for the literature that investigates whether findings regarding labor market characteristics and impacts of economic policy depend on using either administrative or survey-based earnings data (either in levels or with regard to dynamics). For instance, Gideon et al. (2017), Abowd and McKinney (2017) and Kopczuk et al. (2010) demonstrate that using administrative instead of survey-based earnings data can alter well-established and policy-relevant findings regarding topics as diverse as the extent of race-based wage discrimination, the role of firms in explaining earnings inequality and the degree of earnings mobility over the life cycle. Two recent studies on the effects of the introduction of a nationwide minimum wage in Germany neatly illustrate the advantages of administrative data, but also that these advantages come at a cost. Dustmann et al. (2022) rely on administrative data and convincingly argue that as compared to exploiting survey data this approach improves the measurement of earnings and the precision of estimates. In contrast, Caliendo et al. (2023) follow the more traditional route and employ survey data. While the authors acknowledge that there may be measurement error in survey-based earnings information, they also highlight that some other relevant variables such as working hours are only partially and imprecisely captured in German administrative data sources. Indeed, Dustmann et al. (2022) need to rely on certain assumptions and an imputation procedure to expand their analysis from daily earnings to hourly wages.

The remainder of this paper is structured as follows: the PASS and ADIAB data, our linkage and sampling procedure and summary statistics are presented in the next section. This is followed by a description of our static measurement error model in Sect. 3. In Sect. 4, we characterize the measurement error in the PASS earnings data and the induced bias in survey-based earnings regressions. Section 5 contains the findings of our main empirical analysis using the static model of measurement error, including sensitivity checks. Section 6 outlines our application of the dynamic measurement error model of Pischke (1995), presents relevant estimates and discusses biases in estimates of earnings processes when using error-ridden data. Section 7 concludes.

2 Data, sampling and summary statistics

We base our investigation on the PASS household survey linked with ADIAB administrative social security records prepared and maintained by the Institute for Employment Research (IAB) in Nuremberg, Germany.^{Footnote 3} PASS is a household panel survey with a focus on poverty and receipt of non-contributory, means-tested unemployment benefits (“Arbeitslosengeld II”). The survey covers approximately 10,000 households and is carried out annually. Households receiving unemployment benefits are oversampled to allow for a detailed analysis of the dynamics of benefits receipt. As unemployment benefit recipients are comparatively more likely to transition into and out of lower-paying jobs, average earnings in PASS are lower than in the German labor market as a whole (cf. Fig. 1). Figure 1 captures both the actual PASS earnings distribution for our estimation sample as defined below and a counterfactual PASS earnings distribution that approximates the German labor market using probability weights, i.e., weights equal to the reciprocal of each respondent’s inclusion probability.
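The probability weighting described above can be illustrated with a minimal sketch. The function name, the toy earnings values and the inclusion probabilities are hypothetical and serve only to show how weighting by the reciprocal of the inclusion probability shifts an average when low earners are oversampled.

```python
def weighted_mean(values, inclusion_probs):
    """Mean of `values`, each weighted by the reciprocal of its inclusion probability."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical toy sample: two oversampled low earners and one high earner.
earnings = [1000.0, 1200.0, 3000.0]
inclusion_probs = [0.5, 0.5, 0.1]  # illustrative values, not taken from PASS

unweighted = sum(earnings) / len(earnings)
weighted = weighted_mean(earnings, inclusion_probs)

print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

In this toy sample the weighted mean lies above the unweighted mean, mirroring the statement that average earnings in the unweighted PASS sample are lower than in the labor market as a whole.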

PASS comprises information on, for example, individuals’ socio-demographic characteristics and subjective well-being. In our study, we incorporate the first four waves of the survey, collected in 2006/2007, 2007/2008, 2008/2009 and 2010, respectively. The PASS survey takes great care to collect data that are reliable, robust and comparable across households and across time. Given the declining prevalence of landline phones in Germany, the survey uses a mix of computer-assisted telephone interviews (CATI) and computer-assisted personal interviews (CAPI). Based on respondents’ preferences, interviews are conducted either in German or in one of three other languages (English, Russian and Turkish). Moreover, there are detailed interviewer training, outreach, engagement, follow-up and quality assurance processes. While there is a survey module regarding the overall household situation that is directed at one knowledgeable household member, the information on individual household members’ characteristics used here is directly solicited from these individual members.^{Footnote 4}

The version of the ADIAB data used here contains the universe of all individuals who were employed subject to social security contributions, received unemployment benefits or were registered as job seekers in the Federal Republic of Germany at least once between 1975 and 2010. All individuals with at least one spell of “marginal” employment, i.e., employment not covered by social security, in 1999 or later are also included. For most individuals in the data set, information on the majority of their labor market biography is available. Only spells of employment not covered by social security—like those as civil servants or family workers—and spells of self-employment are not covered. All in all, the ADIAB data cover more than 80 percent of Germany’s total workforce. They encompass detailed longitudinal information on employment status, earnings, socio-demographic and firm characteristics to the exact day. Because Germany’s social security agencies use the underlying administrative data to compute social security contributions and unemployment benefits, the earnings information in the ADIAB data is considered highly reliable.

In the ADIAB data, precise earnings information is not recorded beyond Germany’s “contribution assessment ceiling”, the maximum level of earnings that are subject to contributions to the country’s various social security programs. To address the resulting top-coding of earnings information for 1.6 percent of observations, we rely on the imputation procedure developed by Card et al. (2013). This procedure uses separate Tobit regressions for different calendar years and for East and West Germany with a series of imputation variables that can be calculated when an establishment’s entire workforce and a time series of earnings are observed.^{Footnote 5} Importantly, as reported in Sect. 5.3 all our results are robust to instead excluding observations with top-coded ADIAB earnings or simply keeping these observations with the original earnings information.

For the purpose of our study, we disregard spells of unemployment and inactivity and focus on individuals’ spells in employment only (which include time spent on annual or sick leave as long as the employment spell is not interrupted). Labor market biographies from the PASS and ADIAB data are linked using individuals’ social security numbers. Linkage is only possible for those survey participants who explicitly permitted a match of their survey data to administrative records (depending on the PASS wave, this is the case for 80 to 87 percent of participants). A number of steps are needed to derive comparable information on earnings. PASS refers to gross earnings in the last month before the survey. In the first wave, respondents were asked to report the earnings from their main job; in the other waves, they were asked to report their total earnings. The administrative data, in turn, encompass all jobs of a person and include the sum of gross earnings over a reporting period of up to a year from a given job. The PASS and ADIAB data also differ regarding whether one-time payments are captured in the definition of earnings and regarding the coverage of non-standard forms of employment. These differences are the subject of robustness checks in Sect. 5.3.^{Footnote 6}

The process of linking PASS with ADIAB data and defining the estimation sample through data cleansing and taking account of the top-coding of the ADIAB earnings data is visualized in simplified form in Fig. 2. First, we identify the ADIAB spells covering the month the respective PASS survey waves refer to. Next, for this month we calculate average daily gross earnings in the ADIAB data. In case of multiple earnings records within the month, for wave one of PASS we select the job with the highest average daily earnings as the main job, while for waves two to four we combine earnings from all spells. Finally, we calculate monthly gross earnings by multiplying average daily gross earnings by an average number of days per month, $(7/12 \cdot 31 + 4/12 \cdot 30 + 1/12 \cdot 28.75)$. In practical terms, we are able to successfully link 11,575 spells with information on earnings from both ADIAB and PASS. We label these 11,575 spells our “raw” data.
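The conversion from administrative spells to monthly gross earnings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary keys `start`, `end` and `daily_earnings`, the function name and the toy spells are hypothetical, and a spell is taken to "cover" the survey month if it overlaps that month at all. The day weights are used exactly as stated in the text.

```python
from datetime import date

# Average number of days per month, with the weights exactly as given in the text.
DAYS_PER_MONTH = 7 / 12 * 31 + 4 / 12 * 30 + 1 / 12 * 28.75


def monthly_earnings(spells, month_start, month_end, wave):
    """Monthly gross earnings from the spells overlapping the survey month.

    Each spell is a dict with hypothetical keys 'start', 'end' (dates) and
    'daily_earnings' (average daily gross earnings of that job).
    """
    covering = [s for s in spells
                if s["start"] <= month_end and s["end"] >= month_start]
    if not covering:
        return None  # no administrative record for this month
    daily = [s["daily_earnings"] for s in covering]
    # Wave 1: main job only (highest daily earnings); waves 2 to 4: all jobs combined.
    daily_total = max(daily) if wave == 1 else sum(daily)
    return daily_total * DAYS_PER_MONTH


# Hypothetical person with a main job and a side job in May 2007.
spells = [
    {"start": date(2007, 1, 1), "end": date(2007, 12, 31), "daily_earnings": 60.0},
    {"start": date(2007, 3, 1), "end": date(2007, 6, 30), "daily_earnings": 15.0},
    {"start": date(2008, 1, 1), "end": date(2008, 12, 31), "daily_earnings": 80.0},
]
wave1 = monthly_earnings(spells, date(2007, 5, 1), date(2007, 5, 31), wave=1)
wave2 = monthly_earnings(spells, date(2007, 5, 1), date(2007, 5, 31), wave=2)
```

Treating wave one differently from the later waves mirrors the survey question, which asked for main-job earnings in the first wave and total earnings afterwards.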

As the literature emphasizes the potentially substantial consequences of mismatch between different data sources, we go to great lengths to ensure that the earnings information gathered from the PASS and ADIAB data is in fact comparable. Starting with our raw data, we implement four discrete data cleansing steps. First, we exclude data points where earnings information from PASS is unlikely to be directly comparable to earnings as recorded in the ADIAB data. This is evidently the case for 179 observations whose measurement error exceeds 150 percent in absolute terms. Second, we exclude 851 cases where respondents are only willing to give their earnings in terms of a broad range instead of a precise number. Third, we drop 158 cases where respondents indicate that they are self-employed at least once in addition to or instead of being in dependent employment. Fourth, following BK we exclude some occupations and sectors with atypical pay structures. For this reason, all observations for workers in agriculture and engaged as coachmen, managers, artists, performers, clerks, hairdressers, innkeepers, waiters, cleaners, housekeepers, cab drivers, barkeepers or homeworkers are dropped. The fourth step affects 1404 observations in our raw PASS data. Altogether, data cleansing cuts the estimation sample from 11,575 to 9254 observations.
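The four cleansing steps can be sketched as a single filter. All field names and the toy records below are hypothetical, the occupation set is only an illustrative subset of the list above, and the measurement error is assumed here to be the difference between survey and administrative earnings relative to administrative earnings; the paper may define it differently (for example in logs). The range check runs first only because a respondent who reports a range has no point value to compare.

```python
# Illustrative subset of the excluded occupations named in the text.
EXCLUDED_OCCUPATIONS = {"hairdresser", "waiter", "cab driver"}


def keep_observation(obs):
    """Return True if an observation survives all four cleansing steps."""
    # Step 2: earnings reported only as a broad range, not a precise number.
    if obs["reported_as_range"]:
        return False
    # Step 1: measurement error above 150 percent in absolute terms
    # (assumed definition: relative to administrative earnings).
    rel_error = abs(obs["survey_earnings"] - obs["admin_earnings"]) / obs["admin_earnings"]
    if rel_error > 1.5:
        return False
    # Step 3: self-employed at least once.
    if obs["ever_self_employed"]:
        return False
    # Step 4: occupations and sectors with atypical pay structures.
    if obs["occupation"] in EXCLUDED_OCCUPATIONS:
        return False
    return True


# Hypothetical toy records, one per exclusion reason plus one that is kept.
raw = [
    {"id": 1, "survey_earnings": 2000.0, "admin_earnings": 2100.0,
     "reported_as_range": False, "ever_self_employed": False, "occupation": "mechanic"},
    {"id": 2, "survey_earnings": 6000.0, "admin_earnings": 2000.0,
     "reported_as_range": False, "ever_self_employed": False, "occupation": "mechanic"},
    {"id": 3, "survey_earnings": None, "admin_earnings": 1800.0,
     "reported_as_range": True, "ever_self_employed": False, "occupation": "mechanic"},
    {"id": 4, "survey_earnings": 1500.0, "admin_earnings": 1500.0,
     "reported_as_range": False, "ever_self_employed": True, "occupation": "mechanic"},
    {"id": 5, "survey_earnings": 1200.0, "admin_earnings": 1250.0,
     "reported_as_range": False, "ever_self_employed": False, "occupation": "waiter"},
]
sample = [obs for obs in raw if keep_observation(obs)]
kept_ids = [obs["id"] for obs in sample]
```

Because a single observation can meet several exclusion criteria, per-step counts taken against the raw data need not add up to the total number of dropped observations.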

Table 1 contains separate summary statistics for men and women for (a) the raw linked survey-administrative data, (b) the estimation sample and (c) the estimation sample reduced to a strongly balanced panel (which is required for some of our subsequent analyses). The table focuses on the following individual characteristics: education (whether an individual holds a school leaving certificate from an academic high school, also called “Abitur”), citizenship (German passport or not), marital status, geographic location (East or West Germany), monthly gross earnings according to both the survey and the administrative information, age and weeks in employment in the respective year.

Table 1 Summary statistics by gender and sample
