Measuring the effect of gender segregation on the gender gap in time-related underemployment

This paper focuses on the impact that gender segregation in the labour market exerts on the underemployment gender gap for young adult workers in Spain. In order to analyse the relative importance of segregation in this gap, we develop a methodology based on two counterfactual simulations that provides a detailed decomposition of the gap into endowments and coefficients effects as well as the interaction of these effects. To the best of our knowledge, we are the first to perform a decomposition using bivariate probit models with sample selection. Using annual samples of the Spanish Labour Force Survey 2006–2016, the results show that working in female-dominated occupations or industries hinders working as many hours as desired, especially for women. Furthermore, we conclude that the gender gap in underemployment is mainly due to the different distribution of male and female workers across occupations and industries. Additionally, the different impact by gender that working in the same gender-typing jobs exerts on the risk of underemployment contributes to widening the gap.


Introduction
Time-related underemployment, which affects workers who would like to work more hours than are available to them, is a persistent problem in labour markets, and the Spanish one is no exception. Moreover, this problem increased during the Great Recession in many countries (Bell and Blanchflower 2013; Acosta-Ballesteros et al. 2018).
Furthermore, women experience this situation more often than men do (Weststar 2011; Kjeldstad and Nymoen 2012a, b; Vuluku et al. 2013; Acosta-Ballesteros et al. 2018), so the reasons for this gender gap deserve attention. In particular, occupational and industry segregation may be an important factor, as suggested by Barrett and Doiron (2001), since a higher underemployment rate has been linked to female-dominated occupations and industries (Kjeldstad and Nymoen 2012a, 2012b; Kamerāde and Richardson 2018). Moreover, as Spain has persistently experienced higher levels of segregation than other European countries (Iglesias-Fernández et al. 2012), analysing the effect of segregation on the underemployment gender gap in the Spanish labour market emerges as an interesting research issue.
To the best of our knowledge, only Vuluku et al. (2013) have tried to explain the underemployment gender gap, but they did not include any occupational and industry segregation indicators in their study. Thus, to overcome this shortcoming in the literature, the main objective of this article is to carry out an in-depth analysis of the underemployment gender gap. Specifically, we intend to test whether segregation plays an important role in explaining it, as well as to quantify how much is due to men and women working in different industries and occupations, and how much is due to men and women facing different underemployment risks when they work in the same gender-typing jobs.
To do this, first, we quantify the effect of occupational and industry segregation on workers' underemployment risk using a detailed measure of gender segregation. Estimating this impact through bivariate models with selection enables us to handle the potential sample selection bias due to estimating the probability of underemployment just for employed people. Second, as we are not aware of a methodology that allows decomposing a gap using this kind of model, we develop one that is inspired by the Fairlie technique (1999, 2005, and 2017). It is based on two counterfactual simulations that provide a detailed decomposition of the gap into effects due to workers having different characteristics, effects due to these characteristics having different returns, and the interaction of both these effects.
We focus on young workers because this group is especially affected by underemployment. Data from the Spanish Labour Force Survey (LFS) indicate that in 2017 the underemployment rate for workers under 35 was 14.7%, while the figure was 8.5% for workers older than 34. Additionally, by looking at people right at the beginning of their careers we can avoid many of the cumulative advantages/disadvantages that people may have experienced throughout their careers. Thus, focusing on young workers allows us to have a current view of underemployment patterns and of the gender gap in it, avoiding possible gender differences from the past.
This paper is organised as follows. It begins with the conceptual framework on time-related underemployment and puts forward our working hypotheses. The next section describes the methodological approach used. Then, the data and variables used in the econometric model are presented. This is followed by the results, while a discussion of these findings is provided at the end.

Conceptual framework and hypotheses
According to neoclassical theory, individuals can choose their working hours freely from a continuous time distribution; these hours are chosen by maximising a utility function subject to a particular budget constraint. Nevertheless, employers and trade unions' decisions, the degree of labour mobility and economic conditions determine the actual hours offered to employees (Simic 2002). Therefore, workers' preferred and actual hours may not coincide, so some individuals will work either more (overemployed) or less (underemployed) than they want. Thus, time-related underemployment means that some employed people would like to work more hours than available.
The demographic and job factors that determine time-related underemployment have been previously analysed in the literature (Hakim 1997; Weststar 2011; Prause and Dooley 2011; McKee-Ryan and Harvey 2011; Kjeldstad and Nymoen 2012a, 2012b; Wilkins 2006; Acosta-Ballesteros et al. 2018). Particularly, significant differences in underemployment have been found across occupations and industries (Kjeldstad and Nymoen 2012a, 2012b; Valletta et al. 2016). However, very few studies have linked these differences to occupational and industry gender segregation in the labour market. Thus, Kjeldstad and Nymoen (2012a) and Kjeldstad and Nymoen (2012b) find a higher underemployment risk in those occupations and sectors that are traditionally female-dominated, with a stronger effect for men, although they do not include specific variables for segregation in their econometric model. Kamerāde and Richardson (2018) consider segregation measures in their analysis, and they also find a higher likelihood of underemployment in female-dominated occupations; however, this effect is not so clear across industries. Additionally, Dueñas-Fernández et al. (2016), who do not focus specifically on this issue, analyse involuntary part-time work in Spain and find that segregation, especially occupational, is strongly related to part-time work (particularly for women).
There are several reasons that explain higher underemployment risks in female-dominated occupations and industries. In this sense, as Kamerāde and Richardson (2018) point out, women are mainly employed in labour intensive jobs where employers can change the number of hours their employees work to adapt to fluctuations in demand. Therefore, part-time or short-schedule jobs are more likely to be found in female-dominated occupations, which may lead to underemployment. Moreover, female-dominated occupations usually require low qualifications. In addition, female workers tend to cluster in industries that offer comparatively low payment for the same level of qualification, such as in education, health and social work activities (Boll et al. 2016). This pattern also translates into a frequent desire to work more hours. Conversely, male-dominated occupations are typically characterised by better-paid jobs and are usually related to more stable, full-time contracts (Hegewisch et al. 2010). Furthermore, male workers are overrepresented in industries that offer high rewards for the same level of qualification (particularly manufacturing). Therefore, male-dominated occupations and industries often lead to low underemployment rates.
Despite these arguments suggesting the important role that occupational and industry segregation plays in the likelihood of underemployment, accurate estimates of its impact have not been achieved in the aforementioned research. In this article, we overcome this shortcoming using a more suitable estimation strategy. Specifically, we propose and test the following hypothesis:
Hypothesis 1: Working in female-dominated occupations and industries implies a higher probability of time-related underemployment than being employed in male-dominated ones, both for men and women.
As women face a higher risk of underemployment than men do, there is a gender gap regarding this handicap, and occupational and industry segregation may have an important impact on it. This effect may be partially due to the uneven distribution of men and women across different jobs, as suggested by Barrett and Doiron (2001). These authors, as a simple exercise, give women the average male distribution across occupations and industries and conclude that the main reason that explains women being involuntary part-timers more often than men is simply being employed in different industries and occupations. Additionally, differences in the returns that working in female or male-dominated jobs imply should also be considered.
Interestingly, previous research highlights the fact that men may benefit from their minority status in female-dominated jobs in several ways (Simpson 2004). In this sense, as reviewed in Lupton (2006), men progress more quickly than women do to senior positions, avoiding the problem of the "glass ceiling" inherent in vertical segregation. Additionally, men may be channelled into certain specialties in occupations that are regarded as more appropriate to their gender. As a third advantage, men are paid more than women are in female-dominated occupations (Torre 2018). By contrast, women may face negative outcomes in male-dominated jobs (Simpson 1997, 2000). Thus, for example, as Martin and Barnard (2013) find, formal and covert organisational practices, which maintain gender discrimination and bias, are the main challenges that women face. These arguments may also apply regarding underemployment, so female workers may face a higher risk of underemployment than men both in female and male-dominated jobs.
Nevertheless, to the best of our knowledge, only Vuluku et al. (2013) have tried to identify the reasons behind the underemployment gender gap, though they do not include any measure of gender segregation in their analysis and use univariate models, which can lead to biased estimations. We fill this gap in the literature using a new methodology that allows us to propose and test the following hypotheses:

Methodology
As a first step, to analyse the effect of occupational and industry segregation on time-related underemployment, we estimate two bivariate probit selection models (Greene 2012), one for men and another for women. These models enable us to handle the potential sample selection bias due to estimating the probability of underemployment just for employed people, as Acosta-Ballesteros et al. (2018) have already shown.
Let us define $y^*_1$ and $y^*_2$ as the latent variables reflecting the likelihood of being underemployed and employed, respectively. Thus, the model can be specified as follows:
$$y^*_{i1} = x_{i1}\gamma_1 + \varepsilon_{i1}, \qquad y_{i1} = 1 \text{ if } y^*_{i1} > 0 \text{ and } 0 \text{ otherwise} \tag{1}$$
$$y^*_{i2} = x_{i2}\gamma_2 + \varepsilon_{i2}, \qquad y_{i2} = 1 \text{ if } y^*_{i2} > 0 \text{ and } 0 \text{ otherwise} \tag{2}$$
with $(y_{i1}, x_{i1})$ observed only when $y_{i2} = 1$. In these equations, $y_{i1}$ indicates if worker i is underemployed and $y_{i2}$ if the individual is employed; row vector $x_{i1}$ contains the variables explaining underemployment; $x_{i2}$ reflects the variables determining employment. As usual, the independent variables that have a qualitative nature are included in the model as dummy variables or as groups of them. Finally, $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are the error terms, which follow a bivariate normal distribution with mean zero, variance equal to 1 and covariance $\rho$.
To test if working in female-dominated occupations and industries implies a higher probability of underemployment than working in male-dominated ones (Hypothesis 1), we analyse the estimated marginal effects of occupational and industry segregation on the probability of underemployment. Since the model is bivariate with selection, these partial effects (like those regarding the rest of variables) are obtained using the conditional probability of underemployment given employment. In addition, the marginal effects on the probability of employment are computed using the selection equation. 1
To simplify notation, we redefine the variables and coefficients in Eqs. (1) and (2) as follows. The variables considered in both equations for individual i are gathered in $x_i$, which is a row vector including vectors $x_{i1}$ and $x_{i2}$. Additionally, vector $\beta_1$ contains the estimated values for $\gamma_1$ ($\hat{\gamma}_1$) and takes value zero for those variables in $x_{i2}$ which are not included in $x_{i1}$. In a similar way, $\beta_2$ includes components equal to zero for those variables considered in Eq. (1) but not in Eq. (2). Thus, $x_i\beta_1 \equiv x_{i1}\hat{\gamma}_1$ and $x_i\beta_2 \equiv x_{i2}\hat{\gamma}_2$. According to this notation, the estimated probability of being underemployed conditioned to being employed for individual i is:
$$\hat{P}^j(y_{i1} = 1 \mid y_{i2} = 1) = \frac{BVN(x^j_i\beta^j_1,\; x^j_i\beta^j_2,\; \rho^j)}{\Phi(x^j_i\beta^j_2)} \tag{3}$$
where $\Phi$ is the cumulative standard normal distribution function and BVN is the joint cumulative distribution of the bivariate normal. Superscript j refers to men (M) or women (W).
As stated above, our main objective is to identify the most relevant factors explaining the gender gap in underemployment and, more specifically, to test if gender segregation accounts for an important portion of it. To achieve this goal and test Hypotheses 2a and 2b, a detailed decomposition of the gap is required.
The traditional Oaxaca-Blinder two-fold decomposition (Blinder, 1973 and Oaxaca, 1973) of the gap into endowments (portion of the gap due to group differences in observable characteristics) and coefficients effects (the "unexplained" portion of the gap) cannot be applied because our model is not linear. Previous research (Even and Macpherson, 1990; Doiron and Riddell, 1994; Fairlie 1999, 2005, 2017; Yun, 2004, 2008; Powers et al., 2011; and Bazen et al., 2017) has decomposed the gap in probit and logit models, with the Fairlie and Yun techniques being the two most widely applied. However, as we estimate a nonlinear model with two equations, we develop a new procedure to decompose the gap, which extends the Fairlie technique to this kind of model. We have chosen the Fairlie approach as our starting point because it uses a non-linear function to obtain the gap decomposition, while in the Yun procedure, the curvature of the corresponding function is not considered.
According to the Fairlie technique, the contribution of each observable variable to the explained portion of the gap is equal to the change in the average predicted probability from replacing (for instance) the female distribution with the male distribution of that variable (keeping the rest constant). The procedure he proposed matches individuals in the female and male subsamples one-to-one and switches the distributions of variables sequentially from a woman to a man. Nevertheless, the order of switching is potentially important because in non-linear models, the independent contribution of one variable to the gap depends on the value of the other variables, which may imply a path dependence problem. Moreover, the Fairlie methodology does not identify the coefficients effect corresponding to a specific variable, 2 which is required to test Hypothesis 2b.
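The aggregate decomposition in our methodology, which is a direct extension of Fairlie's, is defined by Eqs. (4) to (6), where $E^W$ reflects the endowments effect using as weights women's coefficients, and $C^M$ quantifies the coefficients effect using as weights men's characteristics:
$$\bar{P}^M - \bar{P}^W = E^W + C^M \tag{4}$$
$$E^W = \frac{1}{N^M}\sum_{i=1}^{N^M} P(x_i^M;\beta^W,\rho^W) - \frac{1}{N^W}\sum_{i=1}^{N^W} P(x_i^W;\beta^W,\rho^W) \tag{5}$$
$$C^M = \frac{1}{N^M}\sum_{i=1}^{N^M} P(x_i^M;\beta^M,\rho^M) - \frac{1}{N^M}\sum_{i=1}^{N^M} P(x_i^M;\beta^W,\rho^W) \tag{6}$$
Here $\bar{P}^j$ is the average predicted conditional probability of underemployment for group j, and $P(x;\beta^j,\rho^j) = BVN(x\beta_1^j, x\beta_2^j, \rho^j)/\Phi(x\beta_2^j)$. Summations in (5) and (6) are across the subsample of the employed, as we decompose differences in the average predicted probabilities of being underemployed conditioned to being employed. Thus, $N^W$ and $N^M$ indicate the sample size for employed women and men, respectively.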
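The aggregate decomposition can be sketched in a few lines. The sketch below assumes that the gap is measured as men's minus women's average predicted probability (an assumption in line with the negative gap reported in the results) and, for brevity, uses a single-equation probit as a stand-in for the conditional probability of the bivariate model. All names and the toy data are hypothetical.

```python
from statistics import NormalDist, fmean


def mean_prob(X, params, prob):
    """Average predicted probability over the rows of X for a given parameter set."""
    return fmean([prob(x, params) for x in X])


def aggregate_decomposition(X_w, X_m, params_w, params_m, prob):
    """Two-fold decomposition: gap = E_W + C_M (gap taken as men minus women)."""
    p_ww = mean_prob(X_w, params_w, prob)  # women's characteristics, women's coefficients
    p_mm = mean_prob(X_m, params_m, prob)  # men's characteristics, men's coefficients
    p_mw = mean_prob(X_m, params_w, prob)  # men's characteristics, women's coefficients
    return {"gap": p_mm - p_ww, "E_W": p_mw - p_ww, "C_M": p_mm - p_mw}


def probit_prob(x, beta):
    """Stand-in probability function (univariate probit), used only for illustration."""
    return NormalDist().cdf(sum(xj * bj for xj, bj in zip(x, beta)))


# Toy data: an intercept and one dummy (e.g. working in a female-dominated job).
X_w = [[1, 1], [1, 1], [1, 0]]
X_m = [[1, 0], [1, 0], [1, 1]]
result = aggregate_decomposition(X_w, X_m, [-1.0, 0.3], [-1.2, 0.2], probit_prob)
```

By construction the two components add up exactly to the gap; the approximation errors discussed later arise only in the detailed, variable-by-variable decomposition.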
An alternative decomposition (Eq. 7) with each component evaluated using as weights the other gender coefficients or endowments is also possible. However, we do not define and explain it here because it is symmetric to this one.
To obtain a detailed decomposition, we develop a methodology based on two counterfactual simulations that identify the contribution of each variable to both E W and C M . These simulations can be used together to approximate the total impact of a specific variable on the underemployment gender gap.
The first one provides a detailed decomposition of the endowments effect and has been designed for discrete variables 3 (as most of the variables in the labour market are). It is inspired by Fairlie, who pointed out that a potential solution to the path dependence problem "is to estimate each contribution by switching the variable of interest first" (Fairlie 2005, page 313), as our method does. Specifically, we calculate the contribution of a single variable k as the change in women's average conditional probability of underemployment resulting from switching women from the categories where they are over-represented to those where they are under-represented. This procedure is carried out until women's relative frequencies across the categories of k are equal to men's. The selection of women who are switched is random, so the procedure is repeated 50 times to ensure consistency, and then the results are averaged. 4 As the changes described affect 10% of the observations or less for most variables, the change in the probability of underemployment is due to a relatively small change in the data.
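The switching step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `freq_m` is a hypothetical dictionary with men's relative frequencies for every category of variable k, target counts are rounded with a largest-remainder rule (our choice, not taken from the paper), and `predict` is a hypothetical callable that returns each woman's predicted probability given her categories, with all other characteristics held fixed.

```python
import random
from collections import Counter


def match_distribution(cat_w, freq_m, rng):
    """Randomly move women out of over-represented categories until their
    category counts match men's relative frequencies."""
    n = len(cat_w)
    cats = list(freq_m)
    raw = {c: freq_m[c] * n for c in cats}
    target = {c: int(raw[c]) for c in cats}
    # Distribute the units lost to flooring to the largest fractional remainders.
    short = n - sum(target.values())
    for c in sorted(cats, key=lambda c: raw[c] - target[c], reverse=True)[:short]:
        target[c] += 1
    counts = Counter(cat_w)
    movers = []
    for c in cats:
        surplus = counts[c] - target[c]
        if surplus > 0:
            members = [i for i, v in enumerate(cat_w) if v == c]
            movers.extend(rng.sample(members, surplus))
    rng.shuffle(movers)
    destinations = [c for c in cats for _ in range(max(target[c] - counts[c], 0))]
    switched = list(cat_w)
    for i, c in zip(movers, destinations):
        switched[i] = c
    return switched


def endowment_contribution(cat_w, freq_m, predict, n_rep=50, seed=0):
    """Average change in women's mean predicted probability over n_rep random switches."""
    rng = random.Random(seed)
    base = sum(predict(cat_w)) / len(cat_w)
    changes = []
    for _ in range(n_rep):
        switched = match_distribution(cat_w, freq_m, rng)
        changes.append(sum(predict(switched)) / len(cat_w) - base)
    return sum(changes) / n_rep


# Toy example: 10 women in three categories; men's shares are 30%, 30% and 40%.
cat_w = [0] * 6 + [1] * 3 + [2] * 1
freq_m = {0: 0.3, 1: 0.3, 2: 0.4}
switched = match_distribution(cat_w, freq_m, random.Random(1))
```

Only the women strictly needed to reproduce men's distribution are moved, so all other observations stay exactly as observed, which is the property the text emphasises.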
Specifically, the contribution of a single variable k, denoted as $\theta_E^W(k)$, can be computed as described in Eq. (8):
$$\theta_E^W(k) = \frac{1}{N^W}\sum_{i=1}^{N^W}\left[\hat{P}^W\!\left(x_i^{W\to M}(k)\right) - \hat{P}^W\!\left(x_i^W\right)\right] \tag{8}$$
where $\hat{P}^W(\cdot)$ is the conditional probability of underemployment evaluated with women's estimated parameters, $x_i^{W\to M}(k)$ contains the same information as $x_i^W$ but variable k has been modified as described 5 and k = 1, 2, …, n, where n refers to the number of categorical variables included in $x_i$.
The sum of the individual contributions of all the variables does not exactly equal the endowments effect. Thus, the summing-up property, which the method proposed by Fairlie has, is not satisfied, and we can write:
$$E^W = \sum_{k=1}^{n}\theta_E^W(k) + D_E^W \tag{9}$$
An approximation error ($D_E^W$) emerges because the endowments effect ($E^W$) in the aggregate decomposition is computed by switching all the variables simultaneously. Conversely, in our simulation, we switch only one variable at a time. As the conditional probability of underemployment is not linear, both results are slightly different. Although both expressions, $E^W$ and $\sum_{k=1}^{n}\theta_E^W(k)$, coincide when they are linearised, a disparity emerges from the differences in Taylor expansion remainders (see Appendix 1 in Additional file 1).
To calculate the detailed decomposition of the coefficients effect, we propose a second counterfactual simulation following a similar procedure to that used in the first one. Oaxaca and Ransom (1999) show that this decomposition is destined to suffer from an identification problem, since the detailed coefficients effect attributed to dummy or categorical variables is not invariant to the choice of reference groups. Gardeazábal and Ugidos (2004) and Yun (2005) propose methods to solve this problem. Despite being widely used, these approaches show some limitations. 6 Thus, we use the grand-mean method that Kim (2013) proposes. This method appears to be a good option for analyses regarding labour market outcomes because it accurately estimates the extent to which each variable contributes to the group differences. Additionally, it gives a meaning to the intercept term and to the coefficient component of each dummy variable.
Specifically, we calculate the coefficients effect related to a specific variable k, $\theta_\beta^M(k)$, k = 1, …, n, as the change in men's average conditional probability of underemployment if the parameter of a specific characteristic were that of women. Additionally, it is necessary to include the change corresponding to parameter $\rho$, $\theta_\rho^M$. These effects are described in Eqs. (10) and (11).
4 This number of repetitions was selected after an analysis of sensitivity. We decided to choose 50 because the average difference in the results found with respect to using 200 was around $10^{-5}$ and the standard errors could be computed in a reasonable time.
5 Note that each discrete variable k is included as a set of dummies in $x_i$.
6 According to Fortin et al. (2011) and Kim (2013), these normalizations have several limitations: they may leave the estimation and decomposition without a simple meaningful interpretation; they will likely be sample specific and make comparisons across studies impossible; and they are sensitive to the number of categories and to the grouping method.
7 The estimated coefficients are transformed by subtracting from each of them the grand-mean weighted sum of the coefficients of each vari[…] to the intercepts in order to transform them.
Again, summing the individual contributions of the variables does not exactly equal the coefficients effect, $C^M$. An approximation error ($D_C^M$) emerges for the same reasons already explained (see Appendix 2 in Additional file 1). Thus, we can write 8 :
Since our detailed decomposition of the gap is the sum of both expressions ($E^W$ and $C^M$), the approximation errors imply that the sum of individual contributions of all the variables does not equal the gap. To assess the magnitude of this disparity, in the Results section, we display the approximation errors of our decomposition.
Although our decomposition of the gap is not exact, it provides technical advantages compared to the Fairlie decomposition procedure, as well as being applicable to a bivariate probit model. Its economic interpretation is straightforward, and it avoids the path dependence problem, since it always uses the same starting point, real women (or men) in the sample, and only one characteristic is modified. Moreover, our approach does not require a one-to-one matching of individuals, since we replicate the distribution of each specific variable and the number of women who have a specific characteristic changed is just those strictly necessary, so we keep almost real individuals. Conversely, in the Fairlie decomposition technique, each woman is randomly matched with a man in the sample, and she takes his characteristics sequentially until she becomes that man. As the sequential change of characteristics is made, it is likely that the remaining combination of characteristics will be unrealistic. In addition, our methodology offers a simulation that allows us to approximate a detailed decomposition of the coefficients effect.
Even though we could test our hypotheses using a two-fold decomposition, it is increasingly common in the literature to use a three-fold one (Daymont and Andrisani, 1984), which has the advantage that endowments and coefficients effects are computed from the same starting point, so they are more easily interpreted. This three-fold decomposition when the starting point is women 9 can be easily obtained from Eqs. (4) or (7) and can be expressed as:
$$\text{Gap} = E^W + C^W + (C^M - C^W)$$
The new term, $C^M - C^W = E^M - E^W$, can be interpreted as an interaction component that indicates the portion of the gap that occurs when both endowments and coefficients change simultaneously. Alternatively, it is the portion of the gap that remains after controlling for the endowments and coefficients effects. This interaction component is more difficult to interpret than the first two and is often disregarded. However, we believe, like Etezady et al. (2021), that neglecting it provides a substantially incomplete picture of the total influences of endowments and coefficients on the gap. Thus, our analysis is based on Eq. (14).
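The algebra behind the three-fold decomposition can be checked numerically. In the sketch below, `p_gh` denotes the average predicted probability obtained with the characteristics of group g and the coefficients of group h; the gap is taken as men minus women (an assumption). The values 0.163 and 0.124 are the women's and men's probabilities reported in the results, while the two counterfactual averages are hypothetical numbers chosen only for illustration.

```python
def threefold(p_ww, p_mm, p_mw, p_wm):
    """Three-fold decomposition with women as the starting point.

    p_mw: men's characteristics with women's coefficients.
    p_wm: women's characteristics with men's coefficients.
    """
    e_w = p_mw - p_ww  # endowments effect, women's coefficients as weights
    c_w = p_wm - p_ww  # coefficients effect, women's characteristics as weights
    c_m = p_mm - p_mw  # coefficients effect, men's characteristics as weights
    e_m = p_mm - p_wm  # endowments effect, men's coefficients as weights
    return {
        "gap": p_mm - p_ww,
        "E_W": e_w,
        "C_W": c_w,
        "interaction": c_m - c_w,
        "interaction_alt": e_m - e_w,
    }


# 0.135 and 0.150 are hypothetical counterfactual averages.
parts = threefold(p_ww=0.163, p_mm=0.124, p_mw=0.135, p_wm=0.150)
```

The two expressions for the interaction term coincide, and the three components add up to the gap, whatever counterfactual values are plugged in.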
Some final comments regarding our methodology need to be pointed out. First, to obtain the standard errors for the results of both counterfactual simulations, which are necessary to test if the corresponding changes in the probability of underemployment are statistically significant, Krinsky and Robb's (1986) method has been applied, 10 as Dowd et al. (2014) explain.
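The simulation-based standard errors mentioned above can be sketched as follows. The Krinsky and Robb approach draws parameter vectors from a multivariate normal distribution centred at the estimates with the estimated covariance matrix, recomputes the statistic of interest for each draw, and takes the standard deviation of the results. The function names, the number of draws and the hand-written Cholesky factorisation are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import stdev


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def krinsky_robb_se(beta_hat, cov_hat, statistic, n_draws=1000, seed=0):
    """Standard error of statistic(beta) from draws beta ~ N(beta_hat, cov_hat)."""
    rng = random.Random(seed)
    L = cholesky(cov_hat)
    n = len(beta_hat)
    values = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draw = [beta_hat[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        values.append(statistic(draw))
    return stdev(values)
```

In the setting of the paper, `statistic` would be one of the simulated contributions recomputed with the drawn parameters; for a linear statistic such as the sum of two coefficients, the result should be close to the analytical standard error.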
Second, the survey structure of our data has been taken into account in the methodology. Thus, the bivariate probit selection models have been estimated considering sample weights and cluster-robust standard errors. Additionally, the sample weights have been considered in both counterfactual simulations by replicating each observation according to its weight. 11
Third, our methodology is displayed for bivariate probit models with sample selection, but it can also be easily applied to single-equation models like the probit or logit ones. This fact allows us to carry out some robustness analyses. Thus, we specify univariate probit models to explain underemployment and we obtain the three-fold decomposition of the gap. These results are compared to those obtained using the Fairlie and Yun methodologies for a probit model and to those achieved using the Oaxaca-Blinder technique for a linear model. Table S1 in Additional file 2 shows these results, which are similar to those obtained with our methodology.
9 Equations 13 and 14 can also be proposed for men.
10 In studies based on survey data not only the outcome variable but also the predictors are subject to sampling variation (Jann 2008). It implies that the standard errors may be underestimated, especially those regarding the endowment component. However, the results of the models shown in Additional file 2 seem to indicate that this is not the case, since the standard errors estimated using our methodology are very similar to those obtained using the […]
11 […] This process is required in the first simulation in order to switch the value of a specific variable from those categories where women (men) are over-represented. In the second simulation, weighting each observation according to its raising factor is enough.
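Replicating each observation according to its sample weight, as done for the first simulation, can be illustrated with a short sketch. Rounding the weights to the nearest integer is our assumption; how fractional raising factors are handled is not stated in the text.

```python
def replicate_by_weight(rows, weights):
    """Repeat each observation as many times as its (rounded) sample weight."""
    expanded = []
    for row, weight in zip(rows, weights):
        expanded.extend([row] * round(weight))
    return expanded


# Two observations with raising factors 2.0 and 3.2 become five rows.
expanded = replicate_by_weight(["a", "b"], [2.0, 3.2])
```

Working with the expanded sample lets the random switching operate on individual units, whereas the second simulation only needs weighted averages.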

Data and variables
In this article, we use the definition of time-related underemployment directly provided by the Spanish Statistical Office. Specifically, the criteria applied in the Spanish LFS to classify workers as underemployed (in line with the International Labor Organization Bureau of Statistics recommendations) are: they would like to work more hours, they are available to do so, and they work less than the usual weekly hours of full-timers in their industry. Thus, underemployment is a more accurate indicator of labour underutilization than involuntary part-time employment. It reflects non-desired workdays for all types of workers, capturing the preference of both part-timers and full-timers to have longer workdays.
The data used come from the 2006-2016 annual samples of the Spanish LFS. 12 Therefore, our database is a pool of cross-sectional annual observations, since each individual is included only once in the annual sample. Our sample contains young people aged 16 to 34 who were active. The few individuals with inconsistent answers or who do not provide the necessary information for the analysis have been removed. The final sample includes 70,445 women: 73.6% are employed, and among them, 16.5% are underemployed. The corresponding figures for the male subsample are 80,962, 75.1% and 11.8%, respectively.
The independent variables included in the econometric analysis 13 (displayed in Table 1) reflect the main factors previously found to determine underemployment. In order to classify occupations and industries as gender-dominated or integrated, we follow the relative concept of Anker (1998). Thus, the dividing line between gender-dominated and integrated occupations (or industries) is established in relation to the average percentage of female workers in the labour force as a whole (44% over the period analysed). Specifically, we consider female-dominated occupations or industries to be those having more than 1.25 times the mean percentage female, while male-dominated ones are those having less than 0.75 times the mean percentage female. If the percentage of women is between both limits, the occupation or industry is labelled as gender-integrated. Applying this criterion, we obtain a band similar to the one in Hakim (1998), where gender-integrated occupations are characterised by a proportion of women ten percentage points around the percentage of women in total employment.
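The classification rule can be written as a small function. The 44% mean and the 1.25 and 0.75 multipliers come from the text; the function and argument names are hypothetical.

```python
def classify_gender_type(female_share, mean_female_share=0.44, band=0.25):
    """Label an occupation or industry from its share of female workers.

    With the default values the thresholds are 0.55 (1.25 x 0.44) and
    0.33 (0.75 x 0.44), i.e. roughly eleven percentage points around the mean.
    """
    upper = (1 + band) * mean_female_share
    lower = (1 - band) * mean_female_share
    if female_share > upper:
        return "female-dominated"
    if female_share < lower:
        return "male-dominated"
    return "gender-integrated"


labels = [classify_gender_type(s) for s in (0.70, 0.44, 0.20)]
```

Applying the function to occupations and to industries separately yields the two three-way classifications that are later combined into the nine-category interaction variable.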
Our gender segregation measures have been computed using the three-digit codes from both occupations (according to the National Classification of Occupations, 1994 and 2011) and industries (National Classification of Economic Activities, 1993 and 2009). However, when the number of people working in a certain occupation or industry is less than 50, segregation has been defined according to two-digit or one-digit codes. 14 Additionally, given the methodological change in both classifications, it has been necessary to calculate the value of each segregation variable for two different sub-periods. As both gender segregation measures are correlated, we have solved the collinearity problem by defining an interaction variable with nine categories that integrates both occupational and industry segregation.
As education plays an important role in the risk of underemployment (Acosta-Ballesteros et al. 2018), we define 43 educational categories using the information provided by the LFS on education level and field of study, and according to the National Classification of Education (2000 and 2014). Ten specializations for vocational training and university degrees are distinguished. 15 Moreover, whether workers took longer than usual in completing their studies is also considered.
The remaining explanatory variables include nationality, having children under 16, and some additional regressors reflecting household composition; whether the individual is enrolled in formal studies is also taken into account. We also consider professional status (self-employed or employed in the public or private sector with a fixed-term or permanent contract), the size of the firm, having a recent job (tenure up to 12 months and depending on the worker's age), the unemployment rate by gender in the Autonomous Regions, 16 as well as a dummy variable that takes value one if the observation corresponds to the period after the labour reform of 2012. 17
12 The accuracy of the results in this paper using these data is our sole responsibility.
13 The frequencies of the independent variables are provided in Table S2 in Additional file 2.
14 This fact only occurs in a few occupations (industries) that account for 0.15% (0.89%) of workers in our sample.
15 Before the Bologna Process, the Spanish education system distinguished short-cycle (three years) and long-cycle (more than three years, usually five) university degrees. The new degrees under the European Higher Education Area are included as short-cycle programmes.
16 This is the only continuous variable in the model. As it is already defined by gender, it is not necessary to develop the procedure explained above. Thus, in the first simulation, the change in this characteristic has been carried out by simply attributing each woman the unemployment rate she would face if she were a man (and vice-versa).
17 This reform, among other measures, allows firms to reduce the working hours of their employees more easily than before, and may partially explain the relatively high underemployment rate observed since 2012 in Spain.
Finally, the employment equation includes variables such as educational attainment, taking longer than usual to graduate, age, household composition, being enrolled in formal studies, the area of residence (which reflects the general conditions of local demand for work) and the year when the worker was interviewed.

Marginal effects
In this section, we present the results of the bivariate probit selection models for both subsamples, men and women. The estimations indicate that the rho coefficients are 0.207 (p-value = 0.01964) and 0.361 (p-value = 0.0000), respectively, justifying the use of these models to handle the sample selection bias due to estimating the underemployment probability just for employed people. The marginal effects of the explanatory variables have been computed from the estimated coefficients and are displayed in Table 1. The ones corresponding to the employment equation (Columns 2 and 4) follow the same direction as in previous studies.
Table 1 notes: The year when the survey was carried out is controlled for, but not reported. Under: underemployed. Employ: employed. a Unemployment rate by gender and year in the Autonomous Region of residence. * p < .10. ** p < .05. *** p < .01.
The results obtained for underemployment are shown in Columns 1 and 3. Focusing on the role of gender segregation, 18 we observe that workers in female-dominated occupations are those with the highest probability of underemployment, with a larger impact for women. Specifically, we find that young women working in female-dominated occupations are 4.6 percentage points more likely to be underemployed than those in male-dominated occupations; the corresponding increase in the male subsample is 2.8 points. When segregation is defined in relation to activity sectors, our results indicate that women in female-dominated industries also show the highest likelihood of underemployment (an increase of 4.5 percentage points). Men in female-dominated activities are also more often underemployed than in male-dominated ones. However, men in gender-balanced industries are the most prone to suffer this handicap. Altogether, these results support Hypothesis 1, confirming a higher underemployment risk associated with working in female-dominated jobs than in male-dominated ones.
The marginal effects of the remaining explanatory variables are not discussed due to space constraints. We only want to briefly point out some of the results regarding educational attainment, due to the relevance of this variable in almost any labour market outcome. The figures in Table 1 suggest that having a long-cycle university degree in almost any field reduces underemployment. Overall, business, administration and law, and information and communication technologies (ICTs) seem to be the best specializations at most education levels. Additionally, it is worth noting that education means greater differences in the risk of being underemployed for women than for men. Some additional comments are included in the sensitivity section.

Analysis of the underemployment gender gap
According to the results of the bivariate probit models, women's estimated conditional probability of underemployment is 0.163; the corresponding figure for men is 0.124. Therefore, young female workers in Spain are 1.31 times as likely to be underemployed as their male peers. Thus, the underemployment gender gap (−0.039) is not negligible.
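The two headline figures in this paragraph follow directly from the estimated probabilities. The short check below is only an arithmetic sketch; it assumes, as the negative sign of the reported gap suggests, that the gap is measured as the male probability minus the female one.

```python
# Estimated conditional probabilities of underemployment reported in the text.
p_women = 0.163
p_men = 0.124

# Assumption: gap = men minus women, which matches the reported sign (-0.039).
gap = p_men - p_women

# Relative risk of women with respect to men.
ratio = p_women / p_men

print(f"gap = {gap:.3f}, ratio = {ratio:.2f}")
```

The same gap, expressed in absolute terms, is the 3.9 percentage points quoted in the conclusions.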
As stated above, the main objective of this article is to carry out an in-depth analysis of this gap and to determine the impact of gender segregation on it. To do this, we have carried out a detailed three-fold decomposition of the gap obtained from Eq. (14) and the results are shown in Table 2.
According to the last row in Table 2, the overall endowments effect, E W, is very important and more than accounts for the underemployment gender gap (130.8%), while the coefficients effect, C W, is not significant. Moreover, the interaction term, which is positive (0.0126), reduces the gap (−32.7%).
Regarding the detailed decomposition of the gap, the sum of the individual contributions of all the variables to each component of the gap shows that the error terms from our counterfactual simulations are small and not statistically significant. These results validate our approach.
The results in columns 1-3 allow us to conclude that the different distribution of men and women across jobs is crucial to explain the gender gap, as stated in Hypothesis 2a. Thus, if women were distributed across occupations and industries as men are (maintaining female coefficients), the gap would be reduced to the greatest extent (−0.0453). This result suggests that gender segregation leads to a kind of discrimination against women. Indeed, it cannot be argued that women work more frequently in certain (female-dominated) jobs because they prefer shorter work-schedules. On the contrary, their larger risk of underemployment, which means they would like to work more hours than available more often than their male peers, is mainly due to the kind of jobs they work in.
Focusing on the coefficients effect (Columns 4-6 in Table 2), the contribution of the intercept term of the underemployment equation (Intercept 1) quantifies the extent to which women are, on average, treated differently from men. This contribution can be interpreted as the average extent of discrimination (Kim 2013). Our results indicate that the contribution of Intercept 1 to the gap is small and not significant, so women and men would face the same average risk of underemployment. However, the coefficients effect of gender segregation indicates an important deviation from the mean discriminatory level. Specifically, the gender gap in underemployment would be reduced (−0.0146) if women had men's returns in the same gender-typing jobs (maintaining female characteristics). Therefore, the different impact by gender that working in certain occupations and industries exerts on the risk of underemployment contributes to widening the gender gap, supporting our Hypothesis 2b. In fact, this is the most important contribution to the coefficients effect affecting the underemployment gender gap, reinforcing our previous finding regarding the discriminating effect of segregation against women.
The interaction term captures the effect of changing endowments and coefficients simultaneously. The positive sign of this portion suggests that the risk of underemployment in female-dominated jobs, where women are indeed highly represented, is significantly larger for women than for men. Therefore, this interaction term reflects that, once the returns have been changed from women to men, the additional contribution of distributing women across jobs as men is only −0.0236 (which is 0.0217 smaller than the contribution of segregation to the endowments effect).
Regarding the total impact of each individual variable on the underemployment gender gap, the figures in the last column of Table 2 show that gender segregation is the most important factor explaining it (98.6%). Moreover, gender differences in age also widen the gap, explaining 19% of it. A similar conclusion is obtained for professional status (16.8%). Conversely, educational attainment reduces the underemployment gap (−17%).
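The segregation figures quoted in the last paragraphs can be put together in a few lines. This is an illustrative reading of the text rather than a reproduction of Table 2: it assumes that the interaction contribution of segregation equals the difference between the two endowments-type figures (−0.0236 and −0.0453), and that the total contribution is the sum of the three components.

```python
# Segregation contributions as quoted in the text (men minus women convention).
endowments = -0.0453             # women distributed across jobs as men, female coefficients
coefficients = -0.0146           # women given men's returns, female characteristics
endowments_at_male_returns = -0.0236

# Assumption: the interaction term is what separates the two endowments figures.
interaction = round(endowments_at_male_returns - endowments, 4)

total = endowments + coefficients + interaction
gap = -0.039                     # rounded gap reported in the text

print(f"interaction = {interaction:.4f}")
print(f"total segregation contribution = {total:.4f}")
print(f"share of the rounded gap = {100 * total / gap:.1f}%")
```

With these rounded inputs the share comes out close to, but not exactly at, the 98.6% reported in Table 2; the difference is presumably due to rounding of the published figures.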

Sensitivity analysis
In this subsection, we carry out different sensitivity analyses to check the robustness of our results. Table 3 shows the estimated gender gap in underemployment and the specific contribution of occupational and industry segregation to it, according to Eq. (14).
In Model I, we re-estimate our baseline model using an alternative definition of gender segregation. Specifically, we have re-defined integrated occupations and industries as those where the female percentage is between 0.5 and 1.5 times the average female share of employment in the labour force. The following two models consider gender segregation only in occupations (Model II) or industries (Model III) to analyse their separate effects. Thus, we can check if our results change when no threshold is established to classify occupations and industries as gender-dominated or integrated. Additionally, the baseline model is re-estimated splitting the original sample into two subsamples: workers with tertiary education (Model IV) and those without (Model V). Therefore, taking into account the marginal effects already analysed, we can test our hypotheses for more homogeneous groups of education. Model VI is the same as the benchmark model but includes inactive people together with unemployed ones when estimating the employment equation. Model VII includes the same independent variables as the baseline model but only involuntary part-time workers are considered underemployed, as is often the case in the literature. The last three models include some methodological changes. In Model VIII, we adjust the variable gender segregation last instead of first to check if the order of switching affects the results. As Heckman-type selection models are mostly identified by assumptions about error distributions, in Model IX we use Inverse Probability Weighting (IPW) based on a probit model as an alternative technique to address sample selection problems (see Seaman and White 2013 for a review). 19 Finally, Model X is estimated using the methodology proposed in Gardeazábal and Ugidos (2004) to address the identification problem that arises for categorical variables (instead of Kim's approach).
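The alternative definition of segregation used in Model I can be written as a simple classification rule. The sketch below assumes that jobs below the lower bound are labelled male-dominated and those above the upper bound female-dominated; the function name and the example shares are hypothetical and are not taken from the LFS data.

```python
def classify_by_female_share(female_share, average_female_share,
                             lower=0.5, upper=1.5):
    """Label an occupation or industry by its female share of employment."""
    low = lower * average_female_share
    high = upper * average_female_share
    if female_share < low:
        return "male-dominated"      # assumption: below the band
    if female_share > high:
        return "female-dominated"    # assumption: above the band
    return "integrated"              # within 0.5 to 1.5 times the average


# Hypothetical values, for illustration only.
average = 0.45
labels = {name: classify_by_female_share(share, average)
          for name, share in [("A", 0.10), ("B", 0.50), ("C", 0.80)]}
print(labels)
```

With an average female share of 0.45, the integrated band runs from 0.225 to 0.675, so widening or narrowing the multipliers changes which jobs count as segregated.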
In general terms, the results in the first column of Table 3 are quite similar to those from the main analysis and few differences can be found. In particular, the largest estimated underemployment gender gap is found for workers without tertiary education (Model V) and when only involuntary part-timers are classified as underemployed (Model VII). 20 When we carry out the detailed three-fold decomposition, our main conclusions remain unchanged.
As expected, in Models II and III the three effects are smaller than those obtained in the baseline model. It is noteworthy that the portion of the gap explained by segregation is larger when we consider only occupational segregation. Some differences are also found if only involuntary part-time workers are considered underemployed (Model VII). This decision implies estimating our models using an alternative endogenous variable, so the results obtained are likely to change significantly. However, the different distribution of men and women across occupations and industries is what still drives the gap.
The most interesting results are those obtained for workers with different levels of education (Models IV and V). It is worth noting that the estimated underemployment gap is larger for workers without tertiary education, as might be expected. Gender segregation, however, is the factor that mainly explains the gap regardless of workers' educational level, explaining almost the same percentage of it in both subsamples. Moreover, the different distribution of men and women across jobs widens the gap to a larger extent for less educated workers than for more educated ones. Additionally, only in the sample of workers without tertiary education do the different returns associated with women and men widen the gap. Thus, we can conclude that tertiary education reduces the gap not only because women and men are more similarly distributed across jobs (in comparison with less educated workers), but also because working in the same gender-typing jobs leads to a similar risk of underemployment for men and women.
Regarding the last three models, there are some small differences with the baseline in the three components, especially in the coefficients effect (that seems to increase). These differences translate into a larger proportion of the gap explained by segregation in the three models. However, our main conclusions still hold, so our procedure is robust to the methodological changes they include.
Overall, the estimated underemployment gender gap is mainly explained by the different distribution of male and female workers across occupations and industries in every specification of the model, which clearly supports Hypothesis 2a. Moreover, Hypothesis 2b is also confirmed because, in most cases, the different impact that working in certain gender-typing jobs exerts on the risk of male and female underemployment contributes to widening the gender gap. Hence, the results in this subsection give robustness to our findings.

Conclusions and discussion
This article provides evidence of the crucial impact of occupational and industry segregation on the time-related underemployment gender gap for people aged 16 to 34 using the annual samples of the Spanish LFS from 2006 to 2016. It is worth noting that despite the Spanish labour market being deeply gender segregated, the effect of this feature on the gap has not been addressed before in the literature. Furthermore, to the best of our knowledge, we are the first to perform a decomposition using bivariate probit models with sample selection. To do this, we have developed a methodology based on two counterfactual simulations that provides a three-fold detailed decomposition of the underemployment gender gap into endowments and coefficients effects as well as the interaction of these effects.
Our methodology, inspired by the Fairlie technique (1999, 2005, and 2017), has several advantages. In particular, it allows identifying the coefficients effect corresponding to a specific variable (while the Fairlie technique does not). In fact, this effect can be easily interpreted using Kim's (2013) method to solve the identification problem. Moreover, we keep almost real individuals for two reasons. First, our approach does not require a one-to-one matching of individuals, since we replicate the distribution of each specific variable, and only as many women as strictly necessary have a given characteristic adjusted. Second, the proposed procedure always uses the same starting point, and we estimate each contribution by switching the variable of interest first (as suggested by Fairlie 2005). Thus, only one characteristic is modified at a time. Although we are aware that the order of switching the variables is potentially important, this decision provides a potential solution to the path dependence problem already pointed out by Fairlie.
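The idea of replicating the male distribution of one variable while adjusting as few women as possible can be sketched as follows. This is an illustration under stated assumptions, not the authors' Stata procedure: the function names are hypothetical, the women to be switched are chosen at random, every category observed among women is assumed to appear in the male shares, and the shares are assumed to sum to one.

```python
import random
from collections import Counter


def target_counts(male_shares, n):
    """Female counts implied by the male shares (largest-remainder rounding)."""
    raw = {c: s * n for c, s in male_shares.items()}
    counts = {c: int(v) for c, v in raw.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


def switch_to_male_distribution(women, male_shares, seed=0):
    """Reassign only the surplus women so that counts match the male shares."""
    rng = random.Random(seed)
    target = target_counts(male_shares, len(women))
    current = Counter(women)
    movers = []
    for c, t in target.items():
        surplus = current[c] - t
        if surplus > 0:
            members = [i for i, w in enumerate(women) if w == c]
            movers.extend(rng.sample(members, surplus))  # assumption: random choice
    slots = [c for c, t in target.items() for _ in range(max(t - current[c], 0))]
    rng.shuffle(slots)
    result = list(women)
    for i, c in zip(movers, slots):
        result[i] = c
    return result


# Hypothetical toy sample: eight women and the male shares of each job type.
women = ["female-dominated"] * 6 + ["integrated"] + ["male-dominated"]
male_shares = {"female-dominated": 0.25, "integrated": 0.25, "male-dominated": 0.5}
counterfactual = switch_to_male_distribution(women, male_shares)
changed = sum(a != b for a, b in zip(women, counterfactual))
print(Counter(counterfactual), "women switched:", changed)
```

In this toy case only the four surplus women in female-dominated jobs are moved; everyone else keeps her observed characteristics, which is what keeping "almost real individuals" amounts to.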
The methodology proposed has some limitations. It does not show the summing up property of the Fairlie method; however, the estimated approximation errors are small and not significant. Moreover, using the Krinsky-Robb approach could lead to underestimating the standard errors. Although our results indicate that this is not the case, nonparametric bootstrap could be considered as an alternative.
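For readers unfamiliar with the Krinsky-Robb approach, the sketch below shows the general idea on a deliberately small case: coefficient vectors are drawn from a normal distribution centred on the estimates, a nonlinear function of the coefficients is evaluated at each draw, and the spread of the simulated values serves as its standard error. The two-coefficient probit, the coefficient values and the covariance matrix are hypothetical; the models in the paper are bivariate probits with many regressors.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def dummy_effect(b0, b1):
    """Change in a probit probability when a dummy switches from 0 to 1."""
    cdf = NormalDist().cdf
    return cdf(b0 + b1) - cdf(b0)


def krinsky_robb(b_hat, cov, draws=5000, seed=1):
    """Simulate the effect over draws of (b0, b1) from N(b_hat, cov)."""
    rng = random.Random(seed)
    (v00, v01), (_, v11) = cov
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 ** 2)
    sims = []
    for _ in range(draws):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = b_hat[0] + l00 * z0
        b1 = b_hat[1] + l10 * z0 + l11 * z1
        sims.append(dummy_effect(b0, b1))
    return mean(sims), stdev(sims)


# Hypothetical estimates and covariance matrix, for illustration only.
b_hat = (-1.0, 0.3)
cov = ((0.010, 0.002), (0.002, 0.010))
point = dummy_effect(*b_hat)
sim_mean, sim_se = krinsky_robb(b_hat, cov)
print(f"point estimate = {point:.4f}, simulated s.e. = {sim_se:.4f}")
```

A nonparametric bootstrap would instead re-estimate the model on resampled data, which avoids relying on the normal approximation at the cost of many more estimations.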
The procedure has been tested in several ways and the main conclusions still hold. Firstly, the results obtained applying our procedure to a univariate probit model are very similar to those obtained through other decomposition techniques like Fairlie, Yun (2004) and Oaxaca-Blinder (Oaxaca 1973 and Blinder 1973). Secondly, the main results of our baseline model are robust to changes in the methodology, in the definition of some variables and in the sample used.
Our results demonstrate that working in female-dominated occupations and industries implies a higher probability of time-related underemployment than in male-dominated ones, confirming our first hypothesis. Moreover, we find that the disadvantage in terms of underemployment that working in a female-dominated occupation or industry implies is greater for women than for men.
Furthermore, according to our results, the estimated underemployment gender gap for young workers in Spain is 3.9 percentage points. As underemployment has a negative impact on income, welfare dependency and life satisfaction (Wilkins 2007), the higher underemployment risk faced by women implies negative consequences related to experience, earnings, and possibly promotions (Weststar 2011). Therefore, designing effective policies leading to more gender equality will only be possible if the factors behind the gender gap in underemployment are clearly identified. To the best of our knowledge, the reasons for this difference have been little investigated. As an exception, Barrett and Doiron (2001) affirm that the main reason that explains women being involuntary part-timers more often than men is the fact they are employed in different industries and occupations. Nevertheless, Vuluku et al. (2013), who are the only ones who have decomposed the underemployment gender gap, did not include occupational or industry segregation as an explanatory factor, while we do. This fact could be one of the reasons why they find that only 5.4% of the gap is explained by female-male differences in characteristics and 94.6% is unexplained, whereas our results point in the opposite direction.
The results obtained from our simulations lead us to conclude that the fact that men and women work in different industries and occupations is what widens the underemployment gap to the greatest extent. This effect is even increased due to women facing a different risk of underemployment than men when they work in the same gender-typing jobs. So, Hypotheses 2a and 2b are supported and we can state that the gender gap in underemployment would be largely reduced if men and women were more evenly distributed across occupations and industries. Thus, segregation (especially occupational) is not only a source of gender differences in terms of wages and job quality (Stier and Yaish 2014), but also a key factor explaining the underemployment gender gap. Moreover, as the World Economic Forum (2017) states, a crucial factor for further progress in reducing the overall global gender gap is the closing of occupational gender gaps. Therefore, policy measures should be designed and implemented to fight against segregation in the labour market in order to achieve gender equality in Spain.
In this respect, education is an important factor to be highlighted. Thus, we can consider two different ways in which education can influence underemployment. First, education has a direct impact on underemployment since the marginal effects show that some educational attainments contribute to reducing the risk of underemployment. Moreover, the results from our simulations using the baseline model, as well as those obtained after splitting the sample into workers with tertiary education and without it, allow us to affirm that education reduces the underemployment gender gap. Specifically, our results suggest a higher educational attainment helps women to escape from this disadvantage in the labour market.
Second, education may affect segregation in the marketplace due to the link between educational presorting and occupational and industry segregation (Borghans and Groot 1999; Shauman 2006; Smyth and Steinmetz 2008). The statistical evidence on the strength of the link between segregation in education and in employment is mixed 21 (Bettio et al. 2009). Nevertheless, appropriate remedies that address the barriers women experience to enter male-dominated jobs should include changes in the education and training of women and girls, such as introducing gender aware career counselling/guidance (National Foundation for Australian Women 2017). Particularly, young women should be encouraged to enrol in previously male-dominated education programmes in order to gain access to a wider range of jobs. Additionally, encouraging young men to join female-dominated specialties could translate into more gender-mixed educational profiles. In fact, as Bettio et al. (2009) point out, since women are outperforming men in levels of education attained (up to the first stage of tertiary education), choice of field is the primary channel through which education can influence de-segregation in the labour market in the future.
Especially desirable would be more women specialising in ICTs since our results show this field seems to be linked to a lower risk of experiencing underemployment. This result is in line with the literature attributing ICTs the potential capacity to reduce gender inequalities, since they improve the occupational and professional position of women (Castaño et al. 1999), particularly in the Spanish labour market (Iglesias-Fernández et al. 2010a, 2010b). Specifically, ICTs reduce both the need for manual labour and physical effort in favour of knowledge, teamwork and communication skills (WWW-ITC 2004). This fact, in turn, promotes changes in the sectoral distribution and educational requirement of jobs and the demand for occupations (Iglesias-Fernández et al. 2012), leading to more opportunities for women. However, the ratio of female/male graduates in ICTs is only 0.14 in Spain (World Economic Forum 2017).
Although educational segregation by gender plays a significant role in shaping gender segregation within the labour market, as Smyth and Steinmetz (2008) point out, women and men who choose similar fields do not have exactly the same occupational outcomes. Thus, educational policies should be complemented with other instruments. For instance, reinforcing policies that encourage employers to hire female workers in male intensive occupations and industries could also help reduce both segregation and the gender gap in underemployment. Additionally, the economic conditions of female occupations should be improved to raise both men's and women's interest in female-dominated occupations. In order to achieve an egalitarian distribution of men and women across jobs, eradicating the disincentives to work in female-dominated occupations is necessary (Torre 2018). Finally, any other measures devoted to fighting against gender stereotypes and discrimination would be welcome to reduce inequality in the Spanish labour market.
Abbreviation
LFS: Labour Force Survey.

Funding
Authors declare that they have received no specific funding to conduct this research.

Availability of data and materials
The data that support the findings of this study are available from Instituto Nacional de Estadística (INE). The purchase terms do not allow the authors to share these data.

Code availability
The results have been obtained using STATA.

Declarations
Ethics approval and consent to participate