The 2011 Break in the Part-Time Indicator and the Evolution of Wage Inequality in Germany

German social security records include an indicator for part-time or full-time work. In 2011, the reporting procedure was changed, suggesting that a fraction of workers recorded as working full-time before the change were in fact part-time workers. This study develops a correction based on estimating the probability of being a part-time worker before and after the break. Using the correction, the paper confirms that the rise in wage inequality among full-time workers in West Germany until 2010 is not a spurious consequence of the misreporting of working time.


Introduction
The Sample of Integrated Labor Market Biographies (SIAB) for the time period 1975 to 2014, along with its earlier versions and larger versions of the same data, are widely used datasets for empirical analyses of the German labor market (e.g. Dustmann et al., 2009; Card et al., 2013; Möller, 2016; Antonczyk et al., 2018; Biewen et al., 2018). 1 The employment data in the SIAB comprise spells of employment subject to social security taxation, recording in particular the length of employment, the daily gross wage, and an indicator for part-time employment. Beyond this indicator, there is no information about hours of work. The reporting procedure for the part-time indicator changed in 2011, with dramatic consequences for the share of reported part-time workers. This paper develops a correction procedure for this break and investigates the robustness of previous findings on the evolution of wage inequality in Germany.
Relying on the part-time indicator, the literature on long-term trends in wage inequality in Germany using SIAB data focuses on the subsample of full-time employees because of the lack of information on hours of work (see the studies cited in footnote 1). This assumes that differences in hours of work among full-time employees are negligible for the analysis of long-term trends in wage inequality, so that daily wages (earnings) provide a good approximation of the price of labor. Most studies do not analyze wages for part-time employees, among whom wage differences are likely to mostly reflect differences in hours of work.
In 2011, there was a change in the reporting procedure employers had to apply for social security records (Ganzer et al., 2017; Ludsteck and Thomsen, 2016; Möller, 2016). Before this change, the part-time indicator was integrated in a question which involved various characteristics of the job. After the change, there was a separate question regarding working-time status (Bertat et al., 2013). 2 The change both forced employers to reassess and update their reporting routines and made the need for employers to report part-time employment more salient. Thus, the reported part-time information after the change is likely to be more reliable.
The change in the reporting procedure was not implemented at the start of 2011 for all employers. There was a grace period until the end of 2011. While some employers started to use the new procedure early in 2011, others kept using the old procedure until the end of 2011. During this transition, the quality of the reported data severely deteriorated, as indicated by a large number of missing values for a number of variables, including the part-time indicator. The large increase in missing values is likely to be an effect of the reporting changes. While the share of missing values in the part-time indicator is below 1 percent in other years, it lies above 30 percent in the raw data in 2011. All this suggests that the new reporting procedure only started to operate fully in 2012. In this year, the number of missing values in the raw data returned to normal levels.
1 The SIAB dataset involves a 2% sample of the Integrated Employment Biographies (IEB) of the Institute for Employment Research (IAB). The SIAB version up to 2014, denoted as SIAB7514, is described by Ganzer et al. (2017). The predecessors for earlier time periods up to 2010 were used in Dustmann et al. (2009), Antonczyk et al. (2018), and Biewen et al. (2018), among others, for the analysis of wage inequality. Card et al. (2013) use the full population of social security records of all workers in their study on the importance of worker and firm heterogeneity for the analysis of wage inequality. The SIAB7514 was used by Möller (2016).
2 The change involves the recording of the activity in the job (Tätigkeitsschlüssel), which is reported to the social security administration by the employer for each employee. It was changed from a five-digit number to a nine-digit number (Bundesagentur für Arbeit, 2019). In addition to the part-time status, it contains information on occupation, educational background, the contract period and whether employment is temporary (Bertat et al., 2013).
For the preparation of the most recent version of the SIAB, from 1975 to 2014 (henceforth SIAB7514), researchers at the IAB implemented an imputation for the part-time indicator in 2011 (Ganzer et al., 2017; Ludsteck and Thomsen, 2016) in order to both account for the missing data and update the full data for 2011 to the new reporting procedure. Figure 1 shows that this imputation is successful insofar as the part-time share in 2011 fits a smooth backward extrapolation of the trend for the years after 2011.
The jump in the level of the part-time share before and after the change in the reporting scheme is striking. We cannot think of a plausible reason, such as a policy change, an economic shock, or a mistake in the data collection from 2011 onward, which could explain the sizeable increase in the part-time share.
Similar in spirit to the imputation procedures applied by Ludsteck and Thomsen (2016) and Möller (2016), we argue that the higher part-time share since 2011 seems correct and therefore the lower part-time share before 2010 is likely to be the result of misreporting true part-time employment as full-time employment. Further, there was an upward bias in reported full-time work because employers tend to reuse the record on previous employment spells by the same employees (Ludsteck and Thomsen, 2016) and workers are more likely to switch from full-time work to part-time work within the same job. 3 The need to update the working time status in such a case seems more salient for employers under the new reporting procedure introduced in 2011. Möller (2016) pointed out that before 2011 full-time spells with low daily wages in the raw data are disproportionately likely to be in fact part-time spells, and therefore the raw SIAB data are likely to overstate the level of wage inequality among full-timers until 2010. 4 Further, this bears the risk that the increase in wage inequality until 2010 among reported full-timers (as discussed in the literature) may also have overstated the true increase. This is an important issue because the evidence in Möller (2016) based on the SIAB7514 suggests a trend reversal in wage inequality among full-timers in West Germany from 2011 onward, i.e. at the time of the break in the part-time indicator, such that inequality increased until 2010 and then stopped growing.
Figures 2 and 3 indeed show remarkable increases in different percentiles of the full-time log-earnings distribution for women and men in 2011. As expected, the increase is much stronger for women, which can be explained by the larger increase in the part-time share shown in figure 1. For both genders, the increase is also stronger at the bottom of the distribution. In the male distribution, one needs to consider a quite low percentile such as the 2.5th percentile to see a large kink. This is not surprising given the low share of part-time employees among men (below ten percent). Still, the effect of the change in the reporting procedure is also visible for men, and our evidence suggests correcting employment spells up to the 25th percentile of the male full-time wage distribution. For women, a discontinuous increase from 2010 to 2011 can even be detected in the upper half of the distribution.
Möller (2016) corrects full-time employment before 2011 using a simple imputation correction, which shares some similarities with our approach. 5 He first estimates a non-linear trend for total part-time employment for the time period until 2010 and uses this estimate to predict part-time employment in 2011. As expected, this provides evidence for underreporting of part-time employment before 2011. He then fits a logit model for the incidence of part-time employment for the sample until 2010, using age, industry, wage, and region, among others, as predictors. In the pre-2011 sample, he then reclassifies those reported full-time spells with the highest predicted part-time probability as part-time. This correction is continued until the break in the time trend in part-time employment, as calculated in the first step, disappears. The underlying assumption for this correction is that the relative amount of underreporting of part-time employment was basically constant in the pre-2011 period.
The goal of our paper is to develop a correction based on estimating the probability of being reported as a part-time worker before and after the break in 2011. Our paper extends Möller (2016) in three dimensions. First, we use the year 2012 as the benchmark year, assuming that part-time is reported correctly in that year, and we also correct wages in 2011. Second, we use an inverse probability weighting approach to reweight reported full-time spells instead of a binary prediction as to whether a spell is full-time or part-time. Using a discrete prediction entails the danger that the correction is too strong in the bottom part of the distribution and not strong enough further up the distribution. Third, we use graphical evidence on the evolution of the wage percentiles among full-time employees to determine the position in the wage distribution below which a correction of wages is necessary.
Our approach involves estimating the probability for a part-time spell being reported among all employees (full-timers and part-timers) as a function of employee and job characteristics both for the year 2012 and the years before 2012. The regression is estimated based on those observations with wages below the upper bound, above which graphical evidence suggests that there is no need for correction. The rank difference in the wage distribution between the upper bound and the individual's wage is used as key covariate. Based on the regression estimates, the full-time employment data before 2012 is then reweighted using inverse probability reweighting based on the estimated propensity scores.
This way, reported full-time employment spells before 2011 are corrected. We identify and downweight observations which are likely to be misreported as full-time, which results in a continuous upward correction of low wage percentiles among full-timers. This correction is smooth and we also correct the data in 2011 because our graphical evidence on wage trends suggests that the data in 2011 suffer from misreporting of low-wage spells as involving full-time employment. Using our correction, the paper confirms that the rise in wage inequality among full-time workers in West Germany until 2010 is not a spurious consequence of the misreporting of working time.
The remainder of this paper is organized as follows. Section 2 describes the data. Our correction approach is developed in section 3. Section 4 revisits the analysis of wage inequality based on the corrected data. Section 5 concludes.

Data Description and Sample Restrictions
In its latest version, the SIAB dataset involves a two-percent sample of the Integrated Employment Biographies (IEB) of the Institute for Employment Research (IAB) from 1975 to 2014 (for East Germany from 1992 onward). Next to data on benefit recipients, it contains information on employment spells for employees who are subject to social security contributions and for the marginally employed (from 1998 onward). Civil servants and the self-employed are not included (Ganzer et al., 2017).
The data include the exact duration of employment spells on a daily level, characteristics such as industry, occupation, educational background, and part-time status, and the daily gross wage. The gross wage is right-censored at around the 96th and 88th wage percentile for full-timers among women and men, respectively. In our empirical analysis, we analyze wage percentiles which are unaffected by the censoring. Throughout the paper, all wages are given in real terms. Wages are deflated by the annual consumer price index of the Federal Statistical Office (Statistisches Bundesamt, 2018). The base year for the inflation adjustment is 2014.
We apply the following sample restrictions throughout the rest of our paper. We use only observations from the ten former West German states (without Berlin) and only employees aged 25 to 55. Additionally, we exclude employment spells for apprentices and for the marginally employed. These restrictions broadly ensure that our results on wage inequality can be related to the previous literature for West Germany which used similar age restrictions (Card et al., 2013; Dustmann et al., 2009; Antonczyk et al., 2018; Biewen et al., 2018).
Further, we always weight the employment spells by their duration in days. Employment spells are at most one year long and are always completely included in one calendar year. The maximum length of a spell is thus from January 1st to December 31st of a given year. To obtain the weight, we divide the length of each spell in days by the number of days in that year, 365 or 366.
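The duration weight described above can be illustrated with a minimal Python sketch. The function name `spell_duration_weight` is hypothetical, and the sketch assumes that both the first and the last day of a spell count towards its length, so that a spell from January 1st to December 31st receives a weight of one.

```python
import calendar
from datetime import date


def spell_duration_weight(start: date, end: date) -> float:
    """Relative spell length: days covered divided by days in the calendar year.

    Assumption (not stated in the text): start and end day are both counted.
    """
    if start.year != end.year or end < start:
        raise ValueError("a spell must lie within a single calendar year")
    days_in_year = 366 if calendar.isleap(start.year) else 365
    length_in_days = (end - start).days + 1  # inclusive of both endpoints
    return length_in_days / days_in_year


# A full-year spell in a leap year and a one-month spell in a regular year.
full_year = spell_duration_weight(date(2012, 1, 1), date(2012, 12, 31))
january = spell_duration_weight(date(2011, 1, 1), date(2011, 1, 31))
```

A full-year spell thus contributes one unit of weight regardless of whether the year has 365 or 366 days.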

Correction Approach for Part-time Indicator
Our correction approach for the 2011 break in the part-time indicator is based on inverse probability reweighting for full-time employment spells before 2012. This is based on regressions for the probability to be reported working part-time. Based on the observation that the full-time share is generally higher in the years before 2011, we assume that there are observations which have a low probability given their characteristics to be reported as part-time in their observation year, but conditional on their characteristics their probability to be a part-time spell would be higher in 2012. Our correction approach builds on the assumption that such employment spells may be misreported in the raw data as being full-time and therefore we downweight them to reflect this possibility. In short, we estimate a correction weighting factor between zero and one for such spells. We then use these weights to reweight those spells we deem to be potentially misreported. Because of the strong gender differences in the part-time share, the correction is applied separately for women and men.

Upper bound for correction suggested by evolution of wage percentiles
As a starting point, we argue that the observed evolution of the wage distribution before and after the structural break in 2011 suggests that there is an upper bound for wages below which a correction of the full-time status is necessary. Further, the correction below the upper bound should be larger the lower the rank in the wage distribution, i.e. the larger the rank difference between the upper bound and the individual's wage. Figures 2 and 3 show the evolution of various uncorrected wage percentiles for women and men, respectively. Figure 3 for men shows no discontinuous upward jump for the 25th wage percentile between 2010 and 2012; a modest increase is visible at the 10th percentile, and the jumps grow in absolute terms at the 5th and the 2.5th percentile. Based on the graphical evidence, we take the 25th percentile of full-time wages as the upper bound for correcting the wage data for men.
For women, the part-time share is much higher resulting in a stronger need for correction. Figure 2 for women shows strong increases in 2010 even for the median. At the 80th percentile, the 2010-to-2012 increase becomes rather small. Based on this evidence, we assume that wage observations above the 80th percentile of the full-time distribution for women are correctly reported and our correction applies to wages lying below this upper bound.
In the second step, we determine the rank of the upper bounds of the full-time wage distribution in the gender-specific wage distribution in 2012 for total employment, involving both part-timers and full-timers. 6 The data reveal that the 80th and 25th percentile of the full-time distribution for women and men in 2012 correspond to the 88th and the 29th percentiles, respectively, of the total wage distribution in the same year.
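The mapping from a percentile of the full-time distribution to a rank in the total distribution can be sketched as follows. This is a minimal illustration on made-up data, not the authors' code; the function names and the simple step-function definition of the weighted quantile are assumptions.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    threshold = q * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


def weighted_ecdf(values, weights, x):
    """Weighted share of observations with a value of at most x."""
    below = sum(w for v, w in zip(values, weights) if v <= x)
    return below / sum(weights)


def bound_rank_in_total(wages, weights, is_full_time, q_full_time):
    """Rank in the total wage distribution of a full-time wage percentile."""
    ft_wages = [w for w, f in zip(wages, is_full_time) if f]
    ft_weights = [k for k, f in zip(weights, is_full_time) if f]
    cutoff = weighted_quantile(ft_wages, ft_weights, q_full_time)
    return weighted_ecdf(wages, weights, cutoff)


# Toy data: five part-time and five full-time spells with equal weights.
wages = [20, 30, 40, 45, 55, 50, 60, 70, 80, 100]
is_full_time = [False] * 5 + [True] * 5
weights = [1.0] * 10
rank = bound_rank_in_total(wages, weights, is_full_time, 0.5)
```

Because part-time wages are concentrated at the bottom, a given full-time percentile corresponds to a higher rank in total employment, which is the pattern behind the 80th-to-88th and 25th-to-29th mapping reported in the text.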
As upper bounds for the corrections for the years 2000 to 2011, we use the 88th and 29th percentiles, respectively, in the gender specific wage distribution of total employment, assuming that there is no need for correcting full-time wages above. Using total employment, there is no risk of confusing full-

Rank differences as drivers of correction
The amount of reweighting is allowed to depend on the year-specific rank difference $\theta_{tsi}$ between the wage $wage_{tsi}$ and the year-specific upper bound calculated above. $\theta_{tsi}$ is calculated for the year- and gender-specific wage distribution for total employment. Formally, we define $\theta_{tsi} = 0.88 - F_t^{f}(wage_{tsi})$ for women and $\theta_{tsi} = 0.29 - F_t^{m}(wage_{tsi})$ for men. $\theta_{tsi}$ is zero at the upper bound and increases when moving down the wage distribution.
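A minimal sketch of the rank difference, assuming that the distribution function is estimated by the weighted empirical distribution function of wages in total employment for one year and gender. The function name `rank_differences` and the treatment of ties (tied wages share the same rank) are assumptions made for illustration.

```python
def rank_differences(wages, weights, upper_bound_rank):
    """theta = upper_bound_rank - F(wage), with F the weighted empirical CDF.

    Spells above the upper bound obtain a negative theta; in the approach
    described in the text they are not corrected.
    """
    total = sum(weights)
    order = sorted(range(len(wages)), key=lambda i: wages[i])
    theta = [0.0] * len(wages)
    cumulative = 0.0
    start = 0
    while start < len(order):
        # Collect all spells sharing the same wage so that ties get one rank.
        end = start
        while end < len(order) and wages[order[end]] == wages[order[start]]:
            cumulative += weights[order[end]]
            end += 1
        for index in order[start:end]:
            theta[index] = upper_bound_rank - cumulative / total
        start = end
    return theta


# Toy example with an upper bound at the 75th percentile.
theta = rank_differences([10, 20, 30, 40], [1.0, 1.0, 1.0, 1.0], 0.75)
```

In the toy example the spell at the upper bound has a rank difference of zero, and the rank difference grows towards the bottom of the distribution.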

Propensity score of being reported full-time in raw data
To estimate the full-time probability, we run a probit regression for reported part-time employment among all observations in full-time and part-time employment below the upper bound, separately by gender and year. The spells above the upper bound are not used for the regression and the full-time spells among these will later receive a weight of one. The probit regression specifies the probability of reporting part-time, $\Pr(pt_{tsi} = 1 \mid \theta_{tsi}, x_{tsi})$, as a function of the wage position $\theta_{tsi}$ and a vector of characteristics $x_{tsi}$, where the link function $\Phi(.)$ is the distribution function of the standard normal. The controls contained in $x_{tsi}$ are a second-order polynomial in age, dummies for low, medium, and high educational attainment, ten dummies for different job categories, 13 dummies for different industries, and ten dummies for the West German states. The reference category for educational categories, job categories, sectors, and states includes observations with missing values.
The probit regressions yield the predicted probabilities for a spell to be reported as a part-time spell for the years 2000 to 2012. Since the part-time share increases substantially after 2011, we expect for the vast majority of observations in years $t \leq 2011$ that $\Pr(pt_{2012si} = 1 \mid \theta_{tsi}, x_{tsi}) > \Pr(pt_{tsi} = 1 \mid \theta_{tsi}, x_{tsi})$, i.e. for given spell characteristics $\theta_{tsi}$, $x_{tsi}$ the predicted part-time probability based on the 2012 regression exceeds the predicted part-time probability based on the regression for the earlier year $t$. This expectation is confirmed in the data. Tables 1 and 2 show the coefficient estimates for the probit regressions for selected years. The results confirm that the part-time probabilities increase in the rank difference $\theta_{tsi}$. For women and men, its effect is highly significant and it tends to be larger in size for the former.
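To make the estimation step concrete, the following sketch fits a probit of reported part-time status on the rank difference alone. It is a simplified illustration, not the specification used in the paper: the controls are omitted, the coefficient names `a` and `b` are assumptions, the data are simulated by the hypothetical helper `simulate_spells`, and the fit uses plain Fisher scoring from the Python standard library rather than a statistics package.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf is Phi, pdf is the standard normal density


def fit_probit(theta, part_time, weights=None, iterations=25):
    """Fit Pr(part_time = 1 | theta) = Phi(a + b * theta) by Fisher scoring."""
    if weights is None:
        weights = [1.0] * len(theta)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for t, y, w in zip(theta, part_time, weights):
            z = a + b * t
            p = min(max(STD_NORMAL.cdf(z), 1e-10), 1.0 - 1e-10)
            d = STD_NORMAL.pdf(z)
            score = w * d * (y - p) / (p * (1.0 - p))  # gradient contribution
            info = w * d * d / (p * (1.0 - p))         # expected information
            g_a += score
            g_b += score * t
            i_aa += info
            i_ab += info * t
            i_bb += info * t * t
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    return a, b


def predicted_part_time_probability(a, b, theta):
    """Predicted probability of being reported part-time at rank difference theta."""
    return STD_NORMAL.cdf(a + b * theta)


def simulate_spells(n, a, b, seed=0):
    """Hypothetical data generator: theta uniform on [0, 0.88], probit outcome."""
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 0.88) for _ in range(n)]
    part_time = [1 if a + b * t + rng.gauss(0.0, 1.0) > 0 else 0 for t in theta]
    return theta, part_time
```

Fitting this model once on a year $t \leq 2011$ and once on 2012, and evaluating both fits at the characteristics of the year-$t$ spells, would give the two predicted probabilities that the text compares.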

Weights
In the final step, we calculate the full-time weight $ftweight_{tsi}$, i.e. the reweighting factor, as the ratio of the predicted probabilities of being reported to work full-time in the year 2012 and the year $t$:
(2) $ftweight_{tsi} = \min\left\{ \frac{\Pr(pt_{2012si} = 0 \mid \theta_{tsi}, x_{tsi})}{\Pr(pt_{tsi} = 0 \mid \theta_{tsi}, x_{tsi})},\ 1 \right\} = \min\left\{ \frac{1 - \Pr(pt_{2012si} = 1 \mid \theta_{tsi}, x_{tsi})}{1 - \Pr(pt_{tsi} = 1 \mid \theta_{tsi}, x_{tsi})},\ 1 \right\}.$
We censor the weight at one, i.e. the weight is set to one if the ratio exceeds one.
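Equation (2) translates directly into a short function. The sketch takes the two predicted part-time probabilities as inputs; the function name `full_time_weight` is hypothetical.

```python
def full_time_weight(p_part_time_2012: float, p_part_time_t: float) -> float:
    """Ratio of predicted full-time probabilities (2012 over year t), censored at one."""
    if not 0.0 <= p_part_time_2012 <= 1.0:
        raise ValueError("p_part_time_2012 must be a probability")
    if not 0.0 <= p_part_time_t < 1.0:
        raise ValueError("p_part_time_t must lie in [0, 1)")
    ratio = (1.0 - p_part_time_2012) / (1.0 - p_part_time_t)
    return min(ratio, 1.0)


# Part-time probability of 0.2 in year t but 0.4 under the 2012 regression:
# the reported full-time spell is downweighted.
downweighted = full_time_weight(0.4, 0.2)
# Lower part-time probability in 2012 than in year t: the weight is censored at one.
censored = full_time_weight(0.1, 0.3)
```

A spell whose predicted part-time probability is higher under the 2012 regression than under the year-$t$ regression receives a weight below one; all other spells keep a weight of one.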
The goal is to downweight observations in year $t$ which, given their characteristics, have a lower full-time probability in 2012 than the probability of being reported full-time in year $t$. Indeed, the weights prove to be less than one for more than 91 percent of spells, which means that $\Pr(pt_{2012si} = 1 \mid \theta_{tsi}, x_{tsi}) > \Pr(pt_{tsi} = 1 \mid \theta_{tsi}, x_{tsi})$ and $ftweight_{tsi} < 1$. For the remaining spells, the ratio of the probabilities is larger than or equal to one. We cap the ratio at one because there is no reason to increase the actual sample weight of a spell reported as full-time in the raw data, so such spells are not reweighted. This corresponds to our assumption that full-time employment is overstated before the 2011 break in the part-time indicator.
The weight decreases for a growing rank difference, implying that spells with lower wages are more likely to be misreported. For both genders, the estimates for the weights cover the part of the distribution below the initially defined upper bound, i.e. every spell below the 88th and 29th percentile for women and men, respectively, in the total employment sample.
For our subsequent analysis of wage inequality, we weight the spells by the product of the full-time weight in equation (2) and the relative length of the spell. The resulting spell weight becomes
(3) $weight_{tsi} = ftweight_{tsi} \cdot \frac{length_{tsi}}{ndays_t},$
where $length_{tsi}$ denotes the length of the spell in days and $ndays_t$ the total number of days in year $t$, 365 or 366, respectively.
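The effect of the combined spell weight on wage percentiles can be illustrated with a small sketch on made-up data. The function names, the step-function definition of the weighted percentile, and the numbers are assumptions for illustration only.

```python
def spell_weight(ft_weight: float, length_days: int, ndays: int) -> float:
    """Product of the full-time weight and the relative length of the spell."""
    return ft_weight * length_days / ndays


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    threshold = q * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


# Eight full-year spells reported as full-time; the three lowest wages are
# assumed to carry full-time weights below one.
daily_wages = [30, 40, 50, 60, 70, 80, 90, 100]
ft_weights = [0.2, 0.5, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0]

raw = [spell_weight(1.0, 365, 365) for _ in daily_wages]
corrected = [spell_weight(f, 365, 365) for f in ft_weights]

p20_raw = weighted_percentile(daily_wages, raw, 0.20)
p20_corrected = weighted_percentile(daily_wages, corrected, 0.20)
p80_raw = weighted_percentile(daily_wages, raw, 0.80)
p80_corrected = weighted_percentile(daily_wages, corrected, 0.80)
```

In this toy example the 20th percentile rises from 40 to 50 after the correction while the 80th percentile stays at 90, so the correction lifts low percentiles more than high ones.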

Trends in Wage Inequality Before and After the Correction
We now investigate whether the paramount evidence reported in the literature for rising wage inequality until 2010 among full-timers is robust to the misreporting of low-wage part-time employment in the raw SIAB data as full-time employment during that time period. Doing so, we revisit the evidence reported in Möller (2016) showing that correcting the data before 2011 does not qualitatively change the finding of a strong rise in wage inequality among full-timers until 2010. 7
Downweighting the full-time employment spells up to 2011 mainly affects wages in the lower part of the distribution but also changes higher wage percentiles. This is because a reduction in the weighted shares of workers with low wages mechanically increases all percentiles up the wage distribution, an effect which extends even to the percentiles above the upper bounds used for the correction. However, this increase in all percentiles is not uniform across the wage distribution; in fact, the increase becomes smaller further up the distribution. Thus, the correction reduces the level of wage inequality and it may possibly affect the estimated trend of wage inequality.
[...] lower tail of the wage distribution and, holding the wage level constant, the correction is stronger for women than for men. The correction for men at the median and the 25th percentile is small but still visible and it becomes sizeable at the 5th percentile. For women, the correction is sizeable even at the median and it grows further moving down the wage distribution. To give some numbers for 2010, the upward correction for women is 0.12 log-points (three percent of the real log wage) at the 25th percentile and 0.18 log-points (five percent) at the 5th percentile. For men, it is 0.01 log-points [...]
As the final part of our empirical analysis, we investigate the age dimension of wage inequality. Antonczyk et al. (2018) find that an important aspect of the rise of wage inequality among men was that wage differences between older and younger workers increased strongly until 2004 and that real wages fell strongly for younger workers, especially in the lower part of the wage distribution. Against this backdrop, we contrast workers aged 25 to 34 and those aged 35 to 55. Specifically, we investigate whether wage trends for younger workers at the bottom of the wage distribution continued to fall strongly after 2004 and whether there was a reversal after 2010, while checking whether key results change after applying our correction. Figures 10 and 11 show the wage trends by age groups and gender, both based on the raw data and the corrected data.
8 However, the implied trends of wage percentiles for the corrected data are not smooth in 2011. This would deserve further investigation based on the raw social security records, which we do not have access to. These non-smooth trends in 2011 may reflect the uncertainty involved with the change in the reporting procedure because the SIAB7514 data already involve an imputation of the reported part-time status in 2011 for observations with missing data, see Ludsteck and Thomsen (2016). Further note that it is difficult to pin down the trend in 2011 in light of the likely trend break in the evolution of wage inequality during the years 2010/2011, based on the results reported in Möller (2016) and in this paper.
Our findings show that cumulative wage growth at all percentiles is lower for younger workers than for older workers and that wage inequality within age groups grows strongly over time (because wage growth at lower percentiles is lower than at higher percentiles). The effects of the correction are similar to what has been discussed above for the overall wage distribution. The correction reduces the wage growth after 2010, especially in the lower tail of the wage distribution and for women. The key findings are that there was a very strong fall of real wages for young workers until 2010, especially at lower percentiles, with the 20th percentile falling by about 10 log points for women and 17 log points for men.
After 2010, there is a modest recovery of wages for both men and women except for the 20th percentiles for men in both age groups and the 20th percentile for older women. Incidentally, wages at the 20th percentile for young women grow in parallel to the other percentiles in that group. Our findings confirm the strong decline of real wages for young workers at low percentiles (as stressed by Antonczyk et al. (2018)) for men during a longer time period (until 2010). Further, there is little indication of a recovery after 2010. Clearly, these findings would deserve more scrutiny, which is, however, beyond the scope of this paper.

Conclusions
The Sample of Integrated Labor Market Biographies (SIAB) is based on German social security records which include an indicator for part-time or full-time work. These data are widely used to analyze trends in wage inequality among full-time workers. The reporting procedure for the part-time indicator changed in 2011, with dramatic consequences for the share of reported part-time workers. This paper develops a refined correction procedure for this break and investigates the robustness of previous findings on the evolution of wage inequality in Germany. We argue that the full adjustment to the new reporting procedure was completed only in 2012 and therefore we also apply our correction approach to the data for 2011.
Our correction approach involves estimating the probability of being reported to work part-time as a function of the rank difference in the wage distribution among all employees (full-timers and part-timers) for all years 2000 to 2012. The full-time employment data before 2012 are then reweighted using inverse probability reweighting based on the estimated propensity scores. This approach detects and downweights observations which are likely to be misreported as full-time, which results in a continuous upward correction of low wage percentiles among full-timers. We plan to make the correction procedure available to all users of the SIAB data.
Using our correction, the paper confirms that the rise in wage inequality among full-time workers in West Germany until 2010 is not a spurious consequence of the misreporting of working time. Furthermore, based on our corrected data, we find that the fall in real wages among full-timers during the 2000s was strongest among young workers and there is in fact a trend reversal after 2010, as already observed by Möller (2016). While the raw wage data show strong wage growth for women after 2010, the correction shows that most of this growth is spurious. In fact, based on the corrected data, wage trends between 2010 and 2014 have contributed little to reversing the strong increase in wage inequality until 2010, a finding which holds in particular for low-wage earners among men. On a methodological note, our findings show the importance of correcting for the break in the part-time indicator when analyzing wage trends.
Future research should determine whether further key results in the literature on trends in wage inequality for West Germany (see e.g. Card et al., 2013; Biewen et al., 2018) are robust when applied to the corrected data. In addition, it will be of great interest to extend the analysis beyond the year 2014. From a methodological perspective, this will allow estimating longer-term trends for data [...]

[Figures: differences p80-p50, p50-p20, and p80-p20 of the log-wage distribution, raw and corrected]
Notes: Differences between the 80th and 50th, 50th and 20th and 80th and 20th percentile of the log-wage distribution for full-time employees, inflation adjusted, only former West-Germany (without Berlin), only employees aged 25 to 55, without marginally employed and apprentices, weighted by the length of employment spells.

[Tables: probit regression estimates; only the age2 row is recoverable]
age2: -0.001 (-3.9), -0.001 (-5.9), -0.000 (-2.5), -0.000 (-3.2), -0.000 (-3.1)
Notes: low education: intermediate high school degree after ten years, medium education: high school degree after at least twelve years or a vocational degree, high education: college degree. The regression also includes ten dummies for different occupation categories, 13 dummies for different industries, and ten dummies for the West German states. The reference category involves observations with missing values for education, occupation, industries, and states. The employment spells are weighted by their length.