On the measurement of tasks: does expert data get it right?

Storm, Eduard

doi:10.1186/s12651-023-00332-z

Original Article
Open access
Published: 13 February 2023

Eduard Storm ORCID: orcid.org/0000-0003-0781-5811¹

Journal for Labour Market Research volume 57, Article number: 6 (2023)

Abstract

Using German survey and expert data on job tasks, this paper explores the presence of omitted-variable bias suspected in conventional task data derived from expert assessment. I show that expert task data, which is expressed at the occupation-level, introduces omitted-variable bias in task returns on the order of 26–34%. Motivated by a theoretical framework, I argue this bias results from expert data ignoring individual heterogeneity rather than fundamental differences in the assessment of tasks between experts and workers. My findings have important implications for the interpretation of conventional task models as occupational task returns are overestimated. Moreover, a rigorous comparison of the statistical performance of various models offers guidance for future research regarding choice of task data and construction of task measures.

1 Introduction

A growing body of research has adopted the “task-approach” to labor markets (Autor 2013) that models the assignment of worker-specific skills to job tasks. This framework allows a more nuanced evaluation of the role of skills in the production function as workers’ skills are derived from comparative advantages in tasks. Most studies employing task data use information at the occupation-level, which is often based on external assessment by labor market experts. While widely used, this expert data may introduce measurement error attributed to (i) aggregated task data and (ii) misperception of experts on the importance of job tasks. The primary interest in the present paper is in the unit of observation, as expert data disregards heterogeneity within occupations.

Indeed, using survey data on job activities of US workers at the workplace, Autor and Handel (2013) contrast variation in tasks at the individual- and occupation-level and find worker-level information on tasks to be informative about wage differences not only between occupations, but also within them. They also point out that individual job tasks differ within education and demographic groups. Cassidy (2017) and Rohrbach-Schmidt (2019) provide similar evidence in the German context, Storm (2022b) shows differences in task specialization within occupations between natives and foreigners, and de la Rica et al. (2020) document similar patterns in a cross-country setting using PIAAC data, suggesting within-occupation heterogeneity in tasks is not country-specific. Related evidence on dispersion of tasks within occupations can be found in Spitz-Oener (2006), Atalay et al. (2018), Atalay et al. (2020), Deming and Noray (2019), Modestino et al. (2019), and Stinebrickner et al. (2019). These papers highlight rich heterogeneity in tasks that is masked in conventional occupation-level data and rising dispersion of tasks within occupations over time.

Importantly, the existing empirical literature echoes the well-known difference in the unit of interest between survey and expert data. While the former emphasizes tasks performed at the workplace, the latter describes occupational characteristics (Autor 2013, Dengler et al. 2014). By focusing on the occupational dimension, expert data implicitly assumes workers within an occupation perform a common set of tasks. These conventional task models therefore ignore individual heterogeneity, giving rise to omitted-variable bias in estimated task returns.

While previous contributions on the heterogeneity of job tasks are convincing and important, none of these studies explicitly measures the bias in task returns embedded in conventional task data. This information is important for practitioners, however, who often use task data on the grounds of the theory of comparative advantage in tasks. In this paper I fill this gap by rigorously comparing the statistical properties of task models based on survey and expert data, respectively. This comparison allows me to test and quantify the presence of omitted-variable bias in task returns based on expert data. To this end, I make four contributions to the existing literature.

First, I find worker-level information on tasks is predictive of wage differences in all specifications and thus in line with prior research. Relative to performing manual tasks, I find a 1 pp. increase in abstract task intensity raises wages by 36–53%. Employing a sizable cross-section of more than 27,000 workers in Germany from 2012-18 with self-reported information on job tasks represents an improvement over the existing literature that either uses much smaller samples (Autor and Handel 2013, Rohrbach-Schmidt 2019) or older data (Cassidy 2017). Idiosyncratic differences in tasks are especially pronounced in models conditional on occupational fixed effects (FE), providing direct evidence on task specialization within occupations.

Second, I conduct formal tests of various task models. In this analysis, I compare the statistical performance of wage regressions comprising survey data and, respectively, expert data, provided by Dengler et al. (2014), henceforth DMP.^{Footnote 1} Overall, baseline results suggest only minor statistical differences between survey- and expert task data. While goodness-of-fit measures and information criteria favor models based on worker-level variation, expert data has more unique explanatory power. The broad statistical similarity likewise holds true for a comparison of occupation-level expert data with occupation-level task measures derived from survey data. Hence, assessment by labor market experts on the importance of job tasks does not appear to be fundamentally different from worker assessment.

Further robustness checks reverse some of the perceived benefits of expert data in baseline specifications, however. Instead, a majority of robustness tests support statistical superiority of individual-level task measures from survey data, especially with respect to its unique explanatory power. The preferred model uses survey data and conditions worker-level tasks with occupational FE. This specification explains about 20% of the wage variation not accounted for in conventional (Mincerian) wage regressions.

Third, I show the omitted-variable bias in task returns estimated with expert data ranges from 26 to 34%, depending on specification. In the baseline model, this bias is nearly 30% and most sensitive to assumptions in the construction of task measures. I conceptualize this omitted-variable bias in a theoretical framework in which wages are determined by an individual- and occupation-level task dimension. This model accounts for individual heterogeneity by highlighting the importance of task specialization within occupations. Since the best-performing specification does combine worker-level information on tasks with occupational FE, I view this theory supported by the data.

These findings have important implications for the interpretation of conventional task models. Economists often conceptualize the association between job tasks and wages with a Roy model in which comparative advantage governs occupational choices (Boehm et al. 2021; Cavaglia and Etheridge 2020; Cortes 2016; Yamaguchi 2012). Subsequently, workers receive occupation-level task returns in their chosen occupation. My findings suggest, however, these task returns are substantially inflated due to confounding with underlying individual heterogeneity. In this context, survey data has the advantage that it allows the researcher to aggregate individual responses at the occupation-level. Therefore, the researcher can account for task variation at the individual and occupation-level, thereby mitigating omitted variable bias with respect to task information.

Fourth, I present methodological guidance for practitioners seeking to work with task data. The robustness checks in this paper identify assumptions underlying the definition of tasks and occupations as key drivers of differences in statistical performance between survey and expert data. Researchers should therefore pay close attention to classification of tasks and occupations. Specifically, the bias in occupation-level task returns estimated with expert data is higher if (i) occupations are defined broadly (e.g., 2-digit level) and (ii) tasks are defined narrowly (e.g., five task groups). Moreover, specifications that use occupation-level task measures derived from aggregated survey responses display the worst statistical performance. The statistical discrepancies are overall negligible, but, compared to other specifications, these task measures lead to substantially larger point estimates for task returns. This finding warrants caution in the practice of linking aggregated task measures from survey data to other data sources.

2 Conceptual background on tasks and wages

In this section, I discuss the role of tasks in the process of wage determination and highlight potential origins of bias in task data. In general, the task approach allows the researcher to study skills based on observations on job tasks. As workers have different levels of skill, they will be differentially compensated depending on their ability to perform tasks on the job. Variation in observed tasks thus allows the researcher to draw conclusions about underlying skill differences.

To illustrate this idea, I follow Autor and Handel (2013) and let worker i be employed in occupation o in which she receives a wage w in return for performing J tasks. Subsequently, she combines these tasks to produce output according to^{Footnote 2}

$$\begin{aligned} Y_{io} = exp \bigg ( \alpha _{o} + \sum _{J} \lambda _{jo} T_{ij} + \mu _{i} \bigg ) \end{aligned}$$

(1)

where $T_{ij}$ denotes task j performed by i and $\lambda _{jo} \ge 0$ represents returns earned for performing task j in o, i.e., task returns are occupation-specific. The parameters $\alpha _{o}$ and $\mu _{i}$ reflect, respectively, an occupation-specific constant and worker-specific error term. Assuming she is being paid her marginal product, I write her log wage as

$$\begin{aligned} ln\ w_{i} = \alpha _{o} + \sum _{J} \lambda _{jo} T_{ij} + \mu _{i} \end{aligned}$$

(2)

This wage equation is identical to Autor and Handel (2013), implying i’s wage is determined by her individual job activities $T_{ij}$. Next, to conceptualize quality differences in labor, I expand on the idea that employers hire workers with similar, but not identical, skills. To this end, I replace the generic constant $\alpha _{o}$ with occupation-level activities $T_{jo}$:

$$\begin{aligned} ln\ w_{i} = \sum _{J} \beta _{jo} T_{jo} + \sum _{J} \lambda _{jo} T_{ij} + \mu _{i} \end{aligned}$$

(3)

where $T_{jo}$ measures occupational skill requirements based on occupation-specific tasks and is described by the average task content among N workers employed in occupation o, i.e., $T_{jo}= \frac{1}{N} \sum _{i} T_{ij}$. These tasks are compensated with $\beta _{jo}$, which may differ from individual-level returns $\lambda _{jo}$. Eq. (3) formalizes that worker-level and occupation-level information each explain unique parts of the wage variation, as in Autor (2013), as illustrated by the two distinct parameters $\beta _{jo}$ and $\lambda _{jo}$ in the model above.
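The occupation-level measure $T_{jo}$ defined above is a within-occupation average of worker-level task intensities. A minimal sketch of that aggregation for a single task j follows; the function name, the input layout, and the occupation labels are hypothetical and chosen for illustration only, not taken from the data used in this paper.

```python
from collections import defaultdict


def occupation_task_means(workers):
    """Average worker-level task intensity T_ij within each occupation.

    `workers` is an iterable of (occupation, t_ij) pairs for one task j.
    Returns a dict mapping occupation -> T_jo (the within-occupation mean).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for occupation, t_ij in workers:
        totals[occupation] += t_ij
        counts[occupation] += 1
    return {occ: totals[occ] / counts[occ] for occ in totals}


# Hypothetical toy data: two occupations, two workers each.
sample = [("nurse", 0.2), ("nurse", 0.4), ("engineer", 0.9), ("engineer", 0.7)]
means = occupation_task_means(sample)
```

In this toy example both nurses share the same $T_{jo}$ although their individual $T_{ij}$ differ, which is the within-occupation variation that occupation-level data cannot capture.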

Since occupational skill requirements apply to all workers in a given occupation, $T_{jo}$ gives rise to occupational sorting in the spirit of Roy (1951). In these Roy-type models, a job is defined as an occupation and workers choose a job that maximizes their expected earnings. Viewing $T_{jo}$ as a representation of occupation-specific skill requirements therefore illustrates occupational sorting resulting from a set of core tasks needed to produce output.

Yet, this framework may be overly restrictive by assuming workers in occupation o perform the same set of tasks. If this were true, all variation in $T_{ij}$ would be entirely absorbed by occupation-level tasks $T_{jo}$. Otherwise, the implied equivalence will not hold and both task dimensions, $T_{ij}$ and $T_{jo}$, determine i’s wage.

The key departure of this model from a Roy-type framework is thus its degree of task specialization. I think of $T_{jo}$ as capturing occupational heterogeneity in the spirit of Roy (1951), i.e., occupations compensate tasks differentially. In comparison, I interpret $T_{ij}$ as capturing individual heterogeneity, i.e., specialization in a subset of tasks is compensated differentially across individuals. In my model I conceptualize this insight on individual heterogeneity by allowing for individual task specialization within occupations.

Omitted-variable bias & relationship to conventional methods

While $T_{jo}$ is readily available, $T_{ij}$ is usually not available in the data. For this reason, researchers often rely on occupation-level task data that is derived from expert assessment and approximate the relationship between wages and tasks as follows:

$$\begin{aligned} ln\ w_{i} = \sum _{J} \beta _{jo} T_{jo} + \mu _{i} + \epsilon _{i} \end{aligned}$$

(4)

where $\epsilon _{i}$ represents a standard i.i.d. error term. This specification is closely related to Roy-type models by assuming the relationship between wages and tasks is sufficiently described by occupation-level tasks. In this paper, I test whether the assumptions embedded in expert task data lead to biased estimates in $\beta _{jo}$ by confounding occupation-level task returns with individual task specialization as a result of disregarding individual heterogeneity. To this end, I study the potential for omitted-variable bias. In order to fix ideas, assume the relationship between individual-level and occupation-level tasks follows:

$$\begin{aligned} T_{ij} = \delta _{j}T_{jo} + \nu _{i} \end{aligned}$$

(5)

where $\nu _{i}$ is an i.i.d. error term. I interpret $\delta$ as task pass-through. This parameter describes the responsiveness of individual activities to variation in occupation-level tasks. The model I propose allows for task specialization within occupations. At one extreme, a value of $\delta = 1$ implies perfect pass-through, i.e., variation at the occupation-level trickles down to the individual-level one-for-one. In contrast, a value of $\delta = 0$ implies no task specialization within occupations. Hence, $0< \delta < 1$ implies imperfect pass-through from task variation at the occupation- to the individual-level. Plugging Eq. (5) into (3) yields, after some rearranging, the following wage equation:

$$\begin{aligned} ln\ w_{i} = \sum _{J} (\beta _{jo} + \lambda _{jo} \delta _{j}) T_{jo} + (\epsilon _{i} + \lambda _{jo} \nu _{i}) \end{aligned}$$

(6)

This model highlights the classic omitted-variable bias, implying conventional regressions in the spirit of Eq. (4) yield biased estimates of (occupation-level) task returns $\beta _{jo}$ unless (i) $\lambda _{jo} = 0$ or (ii) $\delta _{j} = 0$.
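The rearranging behind Eq. (6) can be spelled out in two lines. This is a sketch under one assumption: the worker-specific residual is treated as a single term, written $\epsilon _{i}$ as in Eq. (6).

$$\begin{aligned} ln\ w_{i}&= \sum _{J} \beta _{jo} T_{jo} + \sum _{J} \lambda _{jo} (\delta _{j} T_{jo} + \nu _{i}) + \epsilon _{i} \\&= \sum _{J} (\beta _{jo} + \lambda _{jo} \delta _{j}) T_{jo} + \bigg ( \epsilon _{i} + \sum _{J} \lambda _{jo} \nu _{i} \bigg ) \end{aligned}$$

A regression of log wages on $T_{jo}$ alone therefore recovers $\beta _{jo} + \lambda _{jo} \delta _{j}$ rather than $\beta _{jo}$. The term $\lambda _{jo} \delta _{j}$ is the omitted-variable bias, which vanishes only if one of its two factors is zero.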

The first assumption (i) is likely not satisfied as workers self-select into occupations based on individual skills (Autor and Handel 2013). The second assumption (ii) captures individual heterogeneity via task pass-through from the occupation- to the individual-level. Pronounced task specialization within occupations implies workers in said occupation perform a different set of tasks, i.e., $\delta _{j} > 0$. In this case, assumption (ii) is likewise violated and occupation-level task returns based on conventional wage regressions, such as Eq. (6), are biased upwards.
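To make the mechanism concrete, the following simulation generates wages from a one-task version of Eq. (3) with task pass-through as in Eq. (5), and then runs the "short" regression of log wages on the occupation-level task alone. It is a minimal sketch: all parameter values, distributions, and function names are illustrative assumptions and are unrelated to the estimates reported in this paper.

```python
import random


def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_short_regression(beta=0.5, lam=0.3, delta=0.8,
                              n_occupations=200, workers_per_occupation=50,
                              seed=0):
    """Simulate wages and regress them on the occupation-level task only."""
    rng = random.Random(seed)
    occ_tasks, log_wages = [], []
    for _ in range(n_occupations):
        t_jo = rng.random()  # occupation-level task intensity (assumed uniform)
        for _ in range(workers_per_occupation):
            t_ij = delta * t_jo + rng.gauss(0.0, 0.1)  # pass-through, Eq. (5)
            mu_i = rng.gauss(0.0, 0.2)                 # worker-specific error
            log_w = beta * t_jo + lam * t_ij + mu_i    # one-task Eq. (3)
            occ_tasks.append(t_jo)
            log_wages.append(log_w)
    # The worker-level task t_ij is deliberately omitted from the regression.
    return ols_slope(occ_tasks, log_wages)


biased = simulate_short_regression()             # close to beta + lam * delta
unbiased = simulate_short_regression(delta=0.0)  # close to beta
```

With these illustrative values the short-regression slope lands near $\beta + \lambda \delta = 0.74$ instead of $\beta = 0.5$, and setting $\delta = 0$ removes the gap, mirroring conditions (i) and (ii).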

3 Data

3.1 Data sources

3.1.1 Survey data

The first data source is a series of German employment surveys, assembled by the Federal Institute for Vocational Education and Training (BIBB) and the Federal Institute for Occupational Safety and Health (BAuA) in 2011/2012 (Hall et al. (2020b), doi:10.7803/501.12.1.1.60) and 2017/2018 (Hall et al. (2020a), doi:10.7803/501.18.1.1.10). While interviews took place between October and March, I will refer to the surveys as the 2012 and 2018 samples, respectively, for reasons of brevity. This data set establishes a repeated labor force cross-section on qualification and working conditions in Germany, covering 20,000 workers in each wave. See Rohrbach-Schmidt and Hall (2018) and Rohrbach-Schmidt and Hall (2013) for data manuals for each of the surveys used in this study.

Three key features make the BIBB/BAuA employment surveys suitable for the present study. First, workers self-report job-related activities. While the primary interest of expert-based data is on the occupational dimension, the unit of interest in survey data is the workplace (Dengler et al. 2014). Having data at the (aggregated) occupation- and (disaggregated) worker-level thus permits an analysis of the presence of omitted-variable bias as described in the conceptual framework above. Second, compared to other surveys with task information at the individual level, the BIBB/BAuA data offers a comparatively sizable sample.^{Footnote 3} Third, each of the employment surveys provides information on income, allowing me to study the effects of individual variation in tasks on wages. Expert-based data by itself, on the other hand, must be combined with other data sources to infer wage implications.

The key dependent variable of this paper is log hourly real wages, which I construct in three steps. First, I use information on monthly labor income stated by each worker individually. Second, I convert this income measure into real monthly income to adjust for purchasing power. To this end, I use data on the German Consumer Price Index (CPI), which is indexed to CPI=100 as of 2015.^{Footnote 4} Third, I calculate the hourly wage rate by dividing real monthly income by individually stated weekly working hours times four (weeks). This way, I account for differences in working hours by, for example, gender and occupations.^{Footnote 5}
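The three steps of the wage construction can be sketched in a few lines. The function and argument names are hypothetical, and deflating by CPI/100 is an assumption about how the price adjustment is implemented; the sketch only mirrors the verbal description above.

```python
import math


def log_hourly_real_wage(monthly_income, cpi, weekly_hours, weeks_per_month=4):
    """Log hourly real wage from self-reported income and hours.

    monthly_income: nominal monthly labor income (step 1)
    cpi:            consumer price index with base year 2015 = 100
    weekly_hours:   individually stated weekly working hours
    """
    # Step 2: deflate nominal income to base-year prices (assumed CPI/100).
    real_monthly_income = monthly_income / (cpi / 100.0)
    # Step 3: divide by monthly hours, approximated as weekly hours times four.
    hourly_wage = real_monthly_income / (weekly_hours * weeks_per_month)
    return math.log(hourly_wage)


# Hypothetical worker: 3,200 per month at base-year prices, 40 hours per week.
example = log_hourly_real_wage(monthly_income=3200.0, cpi=100.0, weekly_hours=40.0)
```

Dividing by individually reported hours, rather than assuming a standard full-time week, is what allows the measure to reflect differences in working time across groups.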

3.1.2 Expert data

The second data source is derived from the BERUFENET Database, a free online portal for occupations provided by the German Federal Employment Agency (BA). This database is a popular research tool for people seeking career guidance and exploring job placements. To be included in the database, occupations must offer legally regulated vocational training or be sufficiently relevant. Entries provide a rich set of occupation-specific information, including common tasks. Overall, the database comprises more than 10,000 narrowly-defined occupations (Matthes et al. 2008). However, only 3,900 of those occupations contain rich occupational information, such as tasks. This database is therefore conceptually similar to the frequently used O*NET data in the US. In comparison, however, O*NET comprises some 800 occupations that are part of the US Standard Occupational Classification (SOC) (Handel 2016).^{Footnote 6} Like its American counterpart O*NET, BERUFENET is not solely based on expert assessment but rather the result of a process that draws on several sources. To this end, experts use descriptions of vocational training, analysis of vacancies, information from job seekers and employers, and input from various economic associations to describe occupations.^{Footnote 7}

At the core of BERUFENET is the requirement matrix, containing 8,000 skills that are assigned to occupations. This requirement matrix is used for career counseling and therefore continuously updated with monthly checks to identify new requirements and redundancies. In contrast, O*NET is not updated as often, reducing its usefulness for the analysis of task changes over time (Autor 2013). On the flip side, O*NET and its predecessor, the Dictionary of Occupational Titles (DOT), have been used by social scientists for decades (Handel 2016), while the requirement matrix from BERUFENET has only been available since 2008, reducing its usefulness for long-term analysis to date.

This requirement matrix is the foundation for the database provided by DMP. The authors assign requirements to tasks following previous literature and implementing basic rules. I provide more details on this correspondence and differences to existing literature in Section 3.2 below. Using the data compiled by DMP, I gather information on the relative importance of occupation-level tasks and use this information as proxy for expert data on tasks.^{Footnote 8} The DMP data is especially useful for research on occupational skill requirements and has been widely used ever since its release, for instance in the context of substitution potentials of the digital transformation (Dengler and Matthes 2018), the “greening of jobs” (Janser 2018), labour market entry (Reinhold and Thomsen 2017), and labour market mismatch (Kracke et al. 2018; Kracke and Rodrigues 2020; Storm 2022a). Relatedly, see Christoph et al. (2020) for a comprehensive overview of relevant occupation-based measures for labour market research, including BERUFENET.

3.1.3 Combined data

The key variables are tasks performed on the job. DMP use information on occupational requirements from 2011-2013 for their classification. To broadly match this time horizon, I use survey data from 2012 and 2018. I average out task information across all years to enhance statistical precision and merge both data sources via occupational identifiers. This approach moreover avoids a key drawback of the BIBB/BAuA data in the task context, as this data was never intended to operationalise tasks and faced changing survey mode and questionnaires over time (Rohrbach-Schmidt and Tiemann 2013). The data I use in the present study is nearly identical in terms of information on tasks, alleviating this type of measurement error. Occupations are measured in terms of the 3-digit definition of the official BA Classification of Occupations, issue 2010 (KldB 2010). This classification scheme has a high degree of compatibility with the International Standard Classification of Occupations 2008 (ISCO-08), thus making it comparable with international classifications.

A few disadvantages of the BIBB/BAuA data remain, however. Notably, the data is not representative of the entire workforce. For instance, only workers with a sufficient command of the German language are asked to participate, favoring the native workforce disproportionately. Due to non-random sorting of native and foreign workers into occupations (Peri and Sparber 2009; Storm 2022b), occupations and their composition are thus not representative. However, this data limitation is unlikely to affect my main analysis. In Table 1, I compare employment shares for 2-digit occupations using BIBB/BAuA data, collected in 2011–12 and 2017–18, and data from the German Socioeconomic Panel (SOEP) from the 2011, 2012, 2017, and 2018 surveys. The SOEP data (Liebig et al. (2021), doi:10.5684/soep.core.v36eu) is a representative, multi-cohort household survey with a large sample size that has been running since 1984 and is widely used in labor market research, especially in the German context.^{Footnote 9} This comparison suggests employment shares in both surveys are broadly similar, alleviating concerns regarding non-random sorting in the BIBB/BAuA data.^{Footnote 10}

Table 1 Workforce composition in German survey data: BIBB/BAuA versus SOEP
