Identifying couples in administrative data
© The Author(s) 2017
Accepted: 10 January 2017
Published: 15 May 2017
We develop a new method for identifying married couples in administrative data. Using address and name data from the universe of employment records in Germany, we find around 3.3 Mio. pairs of individuals who live at the same location, have a matching last name and are less than 15 years apart in age. We show supporting evidence that around 89 to 94% of these pairs are indeed married couples and provide careful consistency checks. Using information from the German Microcensus, we show that our method identifies about 17% of all married couples in Germany and about 35% of couples where both spouses are in social security covered jobs or unemployed. In ongoing work this couple identifier will be made available to the research community and users of the IAB administrative data. Our method thus opens the door for household level analyses benefiting from the precision and very large number of observations available in administrative data.
Recent years have witnessed a dramatic rise in the use of administrative data in economic research, facilitated by increases in computing power and the availability of new administrative data sources. The main advantages of administrative data have been large sample sizes compared to survey data, often covering the entire universe; the ability to follow the units of observation over time; and the high quality of recorded information. This shift has been particularly forceful in Labor and Public Economics, where the availability of individual level employment and tax records has led to the rise of new research designs such as regression discontinuity, regression kink or bunching designs that rely on very large sample sizes. While administrative data offer many advantages, they also come with limitations, and the scope of available variables is often quite limited compared to household surveys. In particular, administrative employment records are typically on the individual level only and it is often not possible to link individuals to other household members. For this reason, administrative data have played a smaller role in studying traditional questions in labor economics, such as household labor supply, household investment decisions in human capital or within household income differences.
In this project we develop a new method to impute household identifiers in the administrative employment records data in Germany to increase the scope of research questions that can be addressed. Our approach is to identify pairs of individuals who are, with a high probability, married couples using information on addresses, family names and dates of birth. In Germany it is still very common that at the time of marriage one spouse (in the vast majority of cases the wife) adopts the other spouse’s last name, either fully or as part of a double name. If two individuals with matching last names are living together at the same address, they are likely related, though they could also be in a sibling or parent-child relationship. To further narrow it down to married couples we take pairs of a woman and a man with matching last names and an age difference of less than 15 years, which should exclude most parent-child relationships. We present a detailed analysis of the likely extent of errors when applying this method. The new identifiers for married couples will be made available to external researchers and users of the IAB administrative datasets, facilitating a broad range of possible research projects that rely on household/couple identifiers; we return to these applications in the conclusion.
Germany has a long tradition of women taking on their husband’s last name at the time of marriage. The German Civil Code from 1896 unequivocally required that the wife take on the name of her husband. A reform in 1953 allowed the wife to keep her birth name as part of a double (or hyphenated) last name, but she was still required to take on her husband’s name as the family name. The family name law was revised again in 1970, allowing a couple to decide to take on the wife’s name as the family name, but it kept the requirement of a common family name for both spouses. Furthermore, if a couple could not come to an agreement with respect to which name would become the family name, the decision was up to the husband. This only changed with a decision by the German constitutional court in 1991 and a subsequent revision of the family name law in 1994, after which both spouses were allowed to keep their own birth names, while the traditional options of taking on one of the birth names or a hyphenated double name for one of the spouses continued to exist. In practice it appears that the vast majority of women still take on their husband’s name, either fully or at least as part of a double name. While we are not aware of representative surveys or official registry data for Germany that would allow us to calculate the share of couples with matching last names, we found various press reports from city level wedding registries that suggest that even among newly wedded couples around 85 to 90% still have a matching last name. Among couples married for longer (and in particular before 1994), the ratio is likely significantly higher.
We implement the method of identifying likely couples using last names, addresses and age using a cross-section of the administrative data from the Institute for Employment Research (IAB) in Germany spanning the universe of employment and unemployment records for 2008. This data, called Integrated Employment Biographies or IEB, covers all individuals who are employed in employment subject to social security contributions, receive benefits from the unemployment insurance (UI) system, or who are registered as job seekers. This data covers around 80% of employees, in particular excluding public servants and the self-employed. By design we are only able to identify married couples where both spouses are covered in the IEB. While this is certainly not a representative sample and excludes a sizable part of the population of couples we are still able to identify over 3 Mio. couples who are likely married to each other.
The two main concerns with this approach are the potential for false positives and false negatives. False positives may arise because people with matching last names may live at the same address either purely by chance, or because they are related to each other but not married. Using the distribution of same-sex matching name pairs, as well as information on family status for a subset of individuals, we show that likely around 88–94% of our sample of couples are indeed married to each other. Even if both spouses of a married couple are in our data, false negatives may arise because we may not match them to each other: either they do not have matching names, or there are more than 2 matching individuals at a location, making it impossible to tell who is married to whom. False negatives will also arise whenever one or both members of a marriage are not covered in the IEB data, which for example would include all self-employed, public servants or individuals not in the labor force, but also all individuals older than age 65. Using information from the Microcensus, we show that we can identify roughly 17% of the 19 Mio. married couples in Germany. Furthermore, we identify about one third of married couples where both individuals are covered by the IEB data (i. e. working in a social security covered job or unemployed). We compare observable characteristics of our matched couples with the official Microcensus data to show how our sample differs from the general population of married couples. While the representativeness of the matched couples is clearly limited, many research questions do not rely on having a representative sample. The large number of observations and the possibility to observe complete employment histories in the IAB data should make this data a valuable tool for many research projects. We will return to a discussion of how this data can be used in the conclusion.
This paper is related to other research that uses the special features of administrative data to impute information that is not directly available. For example, Jacobson, Lalonde and Sullivan (1993) use the combination of individual and firm identifiers in UI records from Pennsylvania to impute plant closings and mass-layoffs by observing when large numbers of individuals are moving away from firm identifiers and are scattered across many other employers. Hethey-Maier and Schmieder (2013) use a similar approach to identify new plant openings in administrative data, relying on worker flow information to distinguish plant openings from spurious changes in firm identifiers. Goldschmidt and Schmieder (2015) identify outsourcing of labor services in large firms employing an algorithm based on a combination of worker flows, industry and occupation codes.
The next section describes the data used in this project. Sect. 3 describes our method for identifying couples and presents the results based on individuals in 2008. In Sect. 4 we show supportive evidence that our method does in fact largely identify married couples and develop bounds on the fraction of false positives. We then present characteristics of the couples that we identify with our method and compare them to the general population in the German employment data, as well as to other data sources. Sect. 5 concludes.
2 Data sources
In this section, the sources of the data are explained in detail. Sect. 2.1 describes the Integrated Employment Biographies (IEB) data, while the geocoded location data and the individual name data are discussed in Sects. 2.2 and 2.3.
2.1 Integrated employment biographies
The IEB contains records for individuals with at least one of the following statuses:
Employment subject to social security or marginal part-time employment.
Receipt of unemployment insurance benefits in accordance with Social Code Book II or III.
Job search registered with local employment agencies.
Planned or actual participation in employment or training programs.
The IEB includes demographic variables such as nationality, birthdate, gender, and education. Information on employment, benefit receipt and job search includes daily wage, daily benefit rate, occupational and employment status, and economic activity. Additionally, location data such as place of residence or work are provided at different levels of aggregation. There were around 35 Mio. working individuals in Germany in 2008 (own calculations based on Microcensus data), about 80% of whom have at least one record in the IEB. The biggest groups not included in the biographies are self-employed workers and public servants (Beamte).
We also have information on family status (married, living alone, single parent, cohabitating), but only for the subset of individuals who are unemployed and registered as job seekers. We use this information in Sect. 4 for various consistency checks.
2.2 Geocoded data
Our method relies on finding individuals living at the same location. In principle individuals can be matched to other individuals at the same location either by directly comparing addresses, or by first geocoding addresses into latitude/longitude coordinates and then comparing coordinates. Matching addresses directly is complicated by the fact that these can often be written in a variety of ways and need to be carefully cleaned. We instead match individuals on geographic coordinates, where the address processing was done using GIS software, which allows for careful error correction methods. The geocoding was done in a project between the Research Data Centre (FDZ) and the University of Duisburg-Essen for a cross-section of all individuals in the IEB data as of June 30th, 2008. This project used data from the Federal Agency for Cartography and Geodesy, which include 22 Mio. addresses of German buildings and their geographic coordinates. It was possible to successfully geocode 94.6% of the IEB records. Individuals whose addresses could not be geocoded were dropped from the data and are not used in the further analysis.
One of the criteria that we use for determining couples is whether the last names of two people match. We therefore also obtained data on last names covering the universe of individuals who have a record in the IEB as of June 30th, 2008. In order to improve the probability of success in matching, we first clean the names of errors and typos, and ensure consistency in terms of special characters and titles. The names were cleaned with the support of the German Record Linkage Centre (GermanRLC) and their algorithm, which takes into account certain patterns and potential discrepancies. Umlauts were substituted (ä → ae and so forth), as was ß (→ ss). All blank spaces at the front, middle or end of the name were removed. Professional and nobility titles (such as Dr., Prof., Freiherr von) were removed as well, and special characters (e. g. ~ or %) and non-ASCII characters (e. g. © or ™) were deleted.
The only special character that was retained is the hyphen (‑), which is used to indicate double names. While the family name law in the civil code states that a spouse can add their birth name to the family name, it does not specifically mention a hyphen; in practice, however, this appears to be the only option. In fact a court decision from 2013 specifically ruled that a couple was not allowed to combine the birth names of two spouses without a hyphen (Kammergericht Berlin 2013). Furthermore, individuals are not allowed to create last name chains that involve more than one hyphen (for example if at the time of marriage an individual already has a double name from a previous marriage). We thus assume that double names are always separated by a hyphen, and we describe below how we use hyphenated names in our name-matching algorithm. At the end of the cleaning process all letters were converted to upper case.
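The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the GermanRLC algorithm: the function name `clean_last_name` and the short title list are hypothetical, and the sketch assumes that double names use the plain ASCII hyphen.

```python
import re

# Umlaut and sharp-s substitutions as described in the text.
_SUBSTITUTIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
}

# Hypothetical, deliberately short title list; the full list is not given in the text.
_TITLES = ("PROF.", "DR.", "FREIHERR VON")


def clean_last_name(raw: str) -> str:
    """Return an upper-case, ASCII-only last name that keeps only letters and hyphens."""
    name = raw
    for source, target in _SUBSTITUTIONS.items():
        name = name.replace(source, target)
    name = name.upper()
    for title in _TITLES:
        name = name.replace(title, " ")
    # Drop blanks, special characters and remaining non-ASCII characters,
    # keeping the hyphen that marks double names.
    return re.sub(r"[^A-Z-]", "", name)


print(clean_last_name("Dr. Müller-Meier"))
print(clean_last_name("Freiherr von Weiß©"))
```

Upper-casing before removing titles keeps the title list short, at the cost of treating any occurrence of a listed title string as a title.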
Although individuals have a consistent personal identifier, the Einheitliche Statistische Person (ESP), the last name may vary across different data sources. If, after the name cleaning process was completed, discrepancies persisted in the names across data sources, the individual was dropped. The exception was when an individual had a double last name in one source and an overlapping single last name in another (e. g. MUELLER-MEIER in one source and MEIER in another). In this case, the double last name was kept.
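The reconciliation rule for names that differ across sources can be sketched as follows. This is an illustration under the assumption that exactly two sources are compared and that names are already cleaned; the function name `reconcile_names` is hypothetical.

```python
from typing import Optional


def reconcile_names(name_a: str, name_b: str) -> Optional[str]:
    """Return the name to keep, or None if the individual would be dropped."""
    if name_a == name_b:
        return name_a
    parts_a = name_a.split("-")
    parts_b = name_b.split("-")
    # Double name in one source, overlapping single name in the other:
    # keep the double name.
    if len(parts_a) == 2 and len(parts_b) == 1 and name_b in parts_a:
        return name_a
    if len(parts_b) == 2 and len(parts_a) == 1 and name_a in parts_b:
        return name_b
    # Any other discrepancy leads to the individual being dropped.
    return None


print(reconcile_names("MUELLER-MEIER", "MEIER"))
print(reconcile_names("MEIER", "MEYER"))
```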
3 Identifying couples
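Our algorithm classifies two individuals as a likely married couple if they satisfy the following three criteria:
Same home location.
Uniquely matching last name.
One male, one female, with an age difference of less than 15 years.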
We go into more detail on each of these requirements below.
Table 1: Distribution of the number of individuals at the same coordinate. Columns: number of individuals at coordinate; total number of individuals; individuals with matched names; percent matched (%).
Next, we look at the cleaned names of the individuals living within any given location. We require that our identified married couples share a last name. In situations where any of the people in the location has a hyphenated name, we consider two names to be a match if at least one part of the hyphenated name is identical to another name at the location. In locations with multiple people, we additionally require that a maximum of two people have matching names. Otherwise, we have no way to determine which two individuals are likely to be a couple and which may be unrelated, or related in other ways. The following examples help to clarify the procedure.
Table 2: Examples of the name-matching procedure, by number of individuals at coordinate (Examples 2.1, 2.2 and 2.3).
After running this algorithm over the 28 Mio. individuals, we are left with about 5 Mio. pairs (10 Mio. individuals) who share a location and last name. The third and fourth columns of Table 1 show the number and percent of people that were matched through this algorithm, organized by the number of individuals at a location. At coordinates with only 2 individuals, almost 70% had matching names. At coordinates with 3 or more individuals, the match rate is between 20 and 30%.
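The name-matching step can be sketched in Python as follows. This is an illustration rather than our production code: the function names and the input layout (person identifier, coordinate, cleaned last name) are assumptions, and the uniqueness requirement is interpreted as keeping a pair only if each of the two individuals matches nobody else at the coordinate.

```python
from collections import defaultdict
from itertools import combinations


def names_match(name_a, name_b):
    """Two cleaned names match if they share at least one hyphen-separated part."""
    return bool(set(name_a.split("-")) & set(name_b.split("-")))


def unique_pairs(people):
    """people: iterable of (person_id, coordinate, cleaned_last_name) tuples."""
    by_coordinate = defaultdict(list)
    for person_id, coordinate, name in people:
        by_coordinate[coordinate].append((person_id, name))

    pairs = []
    for residents in by_coordinate.values():
        matches = defaultdict(set)
        for (p, name_p), (q, name_q) in combinations(residents, 2):
            if names_match(name_p, name_q):
                matches[p].add(q)
                matches[q].add(p)
        # Keep a pair only if both members match each other and nobody else.
        for p, partners in matches.items():
            if len(partners) == 1:
                (q,) = partners
                if matches.get(q) == {p} and p < q:
                    pairs.append((p, q))
    return sorted(pairs)


people = [
    (1, "A", "MEIER"), (2, "A", "MUELLER-MEIER"), (3, "A", "SCHMIDT"),
    (4, "B", "KOCH"), (5, "B", "KOCH"), (6, "B", "KOCH"),
    (7, "C", "WEBER"), (8, "C", "BAUER"),
]
print(unique_pairs(people))
```

In this toy input, only the two individuals at coordinate A form a pair; the three matching names at coordinate B are all dropped, and coordinate C has no match.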
There are several limitations to this criterion. First, while the majority of married couples in Germany share a last name (or part of a double name), not all women (or men) change their last name upon marriage, and we are certain to miss those couples. Second, in locations where more than two people share a last name, we cannot be certain which two members are married (if any), so we must drop them all, eliminating more potential matches from our sample. Finally, we may be capturing two people with the same last name living at the same coordinate who are related but not married. In addition, particularly in multi-unit residences, there may be two people who are unrelated but have the same last name, and we may erroneously be including them in our sample. Our next criteria, on gender and age, will eliminate some of these falsely matched people from our sample, but not all.
3.3 Gender and age
Table 3: Gender composition of matched potential couples. Column groups: age difference <15; age difference ≥15.
For determining our sample of couples, we require that the difference in age of the matched man and woman be less than 15 years. This should eliminate most mother-son or father-daughter pairs from the set of couples. The remaining pairs, consisting of one man and one woman with matching last names who live in the same location and are less than 15 years apart in age, make up our final sample. Columns 4–5 of Table 3 show the results when we impose our age difference restriction. We retain 80% of our male-female couples, leaving us with a final sample of about 3.3 Mio. couples. This sample should be primarily composed of true couples, although some share will be “false positives”, made up of male-female siblings or family members who are similar in age, or unrelated people with the same name living at the same coordinates.
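The gender and age restriction can be written as a small predicate. The sketch below assumes exact birth dates and gender codes "F" and "M"; how the age difference is measured in the actual implementation (for example, by birth year) is not specified here, so the date arithmetic is an assumption.

```python
from datetime import date


def less_than_years_apart(birth_a: date, birth_b: date, limit: int) -> bool:
    """True if the two birth dates are strictly less than `limit` years apart."""
    older, younger = sorted([birth_a, birth_b])
    try:
        cutoff = older.replace(year=older.year + limit)
    except ValueError:
        # 29 February shifted into a non-leap year: fall back to 28 February.
        cutoff = older.replace(year=older.year + limit, day=28)
    return younger < cutoff


def is_candidate_couple(gender_a, birth_a, gender_b, birth_b, max_gap_years=15):
    """One man and one woman with an age difference below max_gap_years."""
    if {gender_a, gender_b} != {"F", "M"}:
        return False
    return less_than_years_apart(birth_a, birth_b, max_gap_years)


print(is_candidate_couple("M", date(1960, 1, 1), "F", date(1970, 6, 1)))
print(is_candidate_couple("F", date(1950, 1, 1), "M", date(1965, 1, 1)))
```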
4 Consistency checks
Errors in our matching algorithm could occur in two ways. First, we have false positives – two people who are matched to each other by our algorithm, but who are not really a married couple. Second, there are couples that we do not pick up with our matching method, for various reasons. We discuss these two issues, and the steps we take to quantify their magnitude, below.
4.1 False positives
One type of error that could occur is when our algorithm matches two people who are not really married to each other, also known as type 1 error. Pairs in our sample may be wrongly matched if: (1) they are brother and sister, or have some other family relationship, are close in age, and live in the same location; or (2) they are unrelated, but living in a multi-unit residence, such as an apartment building, and happen to have the same last name and are close in age.
We can try to measure the size of this type of error in our final sample of couples in a few ways. First, we can use the distribution of same-sex matches to give us a sense of what share of our sample is wrongly matched if we make the following two assumptions. The first assumption is that opposite-sex family members who are close in age (i. e. brother and sister) are as likely to live together as same-sex family members (two sisters, for example). The second is that it is as likely for two people of the opposite sex who live in the same building to share a last name as it is for two people of the same sex. Using these assumptions, we can look at the number of same-sex matched pairs that fall within our age difference restriction (ages within 15 years of each other), using the numbers provided in Table 3. These pairs are likely either family members living in the same location, or unrelated people with the same last name in the same building. We find that there are 185,313 male/male and female/female pairs that fall within our age restriction. So, it is likely that approximately 185,000 pairs in our sample of matched male-female couples with an age difference under 15 years are also wrongly matched. In fact, since there are some same-sex civil unions where partners share a family name, this arguably overestimates the number of false positives by a small amount. Using this methodology, our accuracy rate is around 94%: the final sample contains 3,281,657 pairs, of which an estimated 185,313 are wrongly matched, leaving 3,281,657 − 185,313 = 3,096,344 correctly matched pairs, so that the accuracy rate is 3,096,344/3,281,657 ≈ 94%. So, according to this method, only about 6% of our sample is wrongly matched, and our sample identifies couples who with a high degree of certainty are indeed married to each other.
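The accuracy calculation based on same-sex pairs can be reproduced in a few lines of Python. The variable names are illustrative; the counts are those quoted in the paragraph above.

```python
# Counts quoted in the text.
final_sample = 3_281_657    # matched male-female pairs, age difference < 15
same_sex_pairs = 185_313    # same-sex pairs with age difference < 15

# Under the two symmetry assumptions, the same-sex count estimates
# the number of wrongly matched male-female pairs.
correctly_matched = final_sample - same_sex_pairs
accuracy = correctly_matched / final_sample

print(correctly_matched)
print(f"accuracy: {accuracy:.1%}, error: {1 - accuracy:.1%}")
```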
Table 4: Family status of individuals in the matched couples sample. Columns: absolute number of individuals; percent among non-missing (%).
Table 5: Family status composition for matched couples. Rows: family status combinations (e. g. single parent-single parent). Column groups: age difference <15 and age difference ≥15, each reported twice.
Using the information in Table 5, we can also estimate the share of matches in our final sample that are likely to be true couples and not wrongly matched people (i. e. our “accuracy rate”) using the subsample of couples with at least one family status listed. If we think that the family status variable is accurate, then the set of “true” couples in our sample should be 578,088: the number of couples who are listed as either both married, or one married and one with missing family status. Even within these there may be individuals who were mistakenly matched. For example, there may be a job-seeking man with the last name MUELLER, whose wife is out of the workforce (and hence is not included in the IEB data), living at the same coordinates as a similarly-aged job-seeking woman with the last name MUELLER whose husband is not in the IEB data either. Our matching algorithm would connect these two jobseekers, who are both listed as being married, even though they are not actually married to each other. If we think that it is as likely for two individuals of the same gender to be wrongly matched in this way as it is for two opposite-gender individuals, then we can use the information on family status for same-sex pairs for our accuracy estimate. Specifically, there are 5173 (637 + 4536) same-sex matched pairs with an age difference of less than 15 years where family status is listed as both married or married-missing. Since we know that these are wrongly matched pairs, we can assume that the same number of opposite-sex pairs was wrongly matched as well. So, the estimated “true” number of couples in the subsample of couples with family status is 572,915 (578,088 matched M‑F pairs with age difference <15 and family status married-married or married-missing, minus 5173 same-sex pairs with age difference <15 and married-married or married-missing status).
Since the subsample of matched couples with family status recorded for at least one member consists of 649,643 couples (3,281,657 − 2,632,014), our estimated accuracy rate is 88.2% (572,915 “true” couples out of 649,643 couples), which corresponds to an error rate of 11.8%.
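The family-status-based accuracy estimate follows the same logic and can be checked with a short Python sketch. The variable names are illustrative, and the interpretation of 2,632,014 as the number of couples without any recorded family status is inferred from the subtraction in the text.

```python
# Counts quoted in the text.
married_mf_pairs = 578_088           # M-F pairs, married-married or married-missing
same_sex_married_pairs = 637 + 4_536 # same-sex pairs with the same family statuses
final_sample = 3_281_657
no_family_status = 2_632_014         # inferred: couples without any family status

true_couples = married_mf_pairs - same_sex_married_pairs
couples_with_status = final_sample - no_family_status
accuracy = true_couples / couples_with_status

print(true_couples, couples_with_status)
print(f"accuracy: {accuracy:.1%}, error: {1 - accuracy:.1%}")
```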
Table 6: Comparing individuals and couples with the Microcensus. Columns: final matched sample (all; 2 people at coordinate; >2 people at coordinate); Microcensus 2008 (all married couples; restricted). Individual-level rows: number of individuals on coordinate; age groups (including ≥35 and <45, ≥45 and <65); labor force status; education (including secondary/intermediate school leaving certificate, upper secondary school leaving); living in East Germany; number of individuals. Couple-level rows: age difference (no age difference; ≥1 and <4; ≥4 and <7; ≥7 and <11; ≥11 and <16); number of couples. Microcensus sample sizes in the Scientific Use File (SUF): n = 226,787 and n = 109,073.
While using the job-seeker data is helpful for estimating the likely fraction of false positives, it should be kept in mind that this subsample is not representative and that family status is not necessarily measured without error. We may therefore be overestimating or underestimating the number of false positives here. Overall, based on the two approaches discussed, we estimate that the fraction of false positives lies somewhere in the range of 6% to 11.8%.
4.2 Missing couples
Given the data we are using and the matching algorithm we have developed, we are likely to have missed many true married couples, either among individuals who are in our dataset (a form of type 2 error) or where at least one spouse is not covered in the IEB. In order to get a sense of what share of couples we can identify in our data, we obtained the Scientific Use File of the Microcensus 2008 (see Boehle 2010) to calculate the number of married couples in 2008 overall and the number of married couples that satisfy the sample restrictions that we have to apply in the IEB data. Overall, there were 19,187,000 married couples in 2008; of those, about 9.2 Mio. were such that both spouses live together, are less than 15 years apart in age, and are covered in the IEB data, i. e. either working in a social security covered job or being unemployed. Since our final sample contains about 3.3 Mio. couples, we capture about one third of the total number of married couples that match our baseline restrictions. Couples can be missing from our sample for the following reasons:
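The two coverage shares implied by these numbers can be verified with a short Python sketch. The variable names are illustrative, and the restricted Microcensus count is the rounded 9.2 Mio. quoted above, so the second share is approximate.

```python
# Counts quoted in the text.
matched_couples = 3_281_657
all_married_couples = 19_187_000   # Microcensus 2008
restricted_couples = 9_200_000     # rounded: both spouses would be covered in the IEB

share_of_all = matched_couples / all_married_couples
share_of_restricted = matched_couples / restricted_couples

print(f"share of all married couples: {share_of_all:.1%}")
print(f"share of couples meeting the restrictions: {share_of_restricted:.1%}")
```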
If the couple does not share a last name (or part of a hyphenated name), then we would not capture them with our algorithm. Until 1991 it was required by German law that married couples share a last name, and even afterwards most change or hyphenate their last name upon marriage. Although we were not able to find official statistics on this topic, according to several newspaper articles the share of new couples who share a last name is around 85 to 90%. Couples where one or both members are non-German are the least likely to share a last name.
Couples where the age difference between the husband and wife is 15 years or more are omitted from our sample in an effort to ensure that we do not mistakenly include parent-child pairs. Although there are certainly married couples with such a large age difference, their number is quite small. For example, in the Microcensus, a representative survey of German households, the share of couples with an age difference of 16 years or more was only 2% in 2008.
Couples not living together on June 30th, 2008 are impossible for us to identify with our data; however, we believe that this situation is likely to be rare.
If the couple lives at a coordinate where more than 2 people share the same last name, we have no way of knowing which two people are part of a couple, and so all are dropped (about 5.2 Mio.).
We drop people who have inconsistent names across data sources, thus potentially omitting more couples from our sample (about 1.8 Mio.).
We can get a sense of how representative our final sample of couples is by comparing their characteristics to those of a truly representative sample of couples, those in the Microcensus. Table 6 compares individual characteristics of people in our final sample of couples (column 3) to couples in the Microcensus in 2008. Column (6) shows all married couples in 2008, while column (7) shows all couples satisfying the restrictions of our algorithm in the IEB. In terms of the age distribution, our men and women tend to be younger than those of all Microcensus couples; this can be explained by the fact that our sample only includes people in the workforce, so older workers who are more likely to be retired are excluded. In addition, anyone married to a retired person will be omitted from our final sample, since their spouse will not be in our original dataset. In the last column, where we apply the same restrictions as in our matching algorithm, the age distribution is much closer to that of our matched couples.
Looking next at the labor force status, we do not have the full range of labor force status options that are available in the Microcensus, since the IAB data only include people in the labor force and omit the self-employed and public servants. The couples in the last column of Table 6 look reasonably similar to our matched couples sample in terms of labor force status, although they are somewhat less likely to be unemployed. This might be because some long-term unemployed who are in the IEB might be identified as out of the labor force in the Microcensus, or because we are somehow more likely to identify unemployed individuals as part of couples in the IEB. Interestingly, when we restrict the matched couples data to a sample with exactly 2 people at a location (column 4), the distribution is much closer to that of the Microcensus.
In the bottom half of Table 6 we can compare the characteristics of couples in the two different data sets. The distribution of age difference within couples of our final sample (column 3) is almost exactly the same as that of the Microcensus when using the same restrictions as in our algorithm (column 7). The couples in our sample are slightly more likely to be both German and less likely to be both non-German than those of the Microcensus; as mentioned earlier, non-Germans are less likely to change their name at marriage than Germans are, and so are more likely to be omitted by our matching algorithm. Overall, although we miss many couples in our data set and may mistakenly include some pairs who are not truly married, the couples that we identify seem roughly similar to the universe of couples in Germany that satisfy the restrictions that are imposed in the matching algorithm.
5 Discussion and conclusion
We present a new method for identifying a very large number of pairs of individuals who are likely married to each other in the German administrative data. While room for type 1 (false positives) and type 2 (false negatives) errors exists, our analysis suggests that about 89 to 94% of the pairs in our final sample are actually married couples. An important caveat is that due to the nature of the IEB, our sample of married couples is not representative of all married couples, but at best representative of couples where both individuals are either working in a job that is covered by social security (that is, not a civil service job or self-employment) or are unemployed and receiving benefits. Our comparison with the Microcensus from our baseline year suggests that our matched couples look reasonably similar to couples in this more restrictive sample frame, but even then we are more likely to pick up married couples who live in smaller buildings, such as single family homes, and thus probably couples who are either living in less densely populated areas or have higher income levels. Finally, since we rely on last names, our sample will miss all couples where the spouses do not share a name, and this decision is likely correlated with other characteristics of the couple.
While the representativeness of this matched couple data is therefore clearly limited, many research questions do not rely on a representative sample. Most natural experiments that have been used by applied researchers only affect a very selected subsample of the population (e. g. typical regression discontinuity or regression kink designs), but obtaining causally interpretable parameters with a high degree of internal validity is still very valuable even if it cannot easily be extrapolated to the general population.
Overall, the method appears accurate enough to open the door for future research projects in labor and public economics that rely on household (couple) identifiers in administrative data. We are working on making these identifiers available to external researchers through the existing IAB research data infrastructure. We can readily imagine a wide range of possible applications. For example, a long literature has studied the added worker effect, that is, whether spouses of displaced workers respond to the job loss by increasing their own labor supply (see for example Lundberg 1985, or Stephens 2002). Most existing work in this literature has relied on panel survey datasets such as the PSID or GSOEP. Using our identifier, it will be possible to study the added worker effect for a much larger sample of workers after a variety of well identified shocks such as plant closings or mass layoffs. Another promising area of research is the study of spillover effects of public programs. For example, Cullen and Gruber (2000) provide fascinating evidence that more generous unemployment insurance benefits reduce the labor supply of spouses married to the benefit recipient. A lot of recent work on UI has been done with the German administrative data (e. g. Schmieder et al. 2012, 2016), exploiting the large number of observations and clean sources of identification such as age discontinuities in potential benefit duration. With the possibility to link married couples, it will be possible to use similar research designs to look at questions as in Cullen and Gruber (2000) and to understand how households as a whole are affected by policies such as UI, active labor market policies or tax policies. Another example where our new identifier could be used is the study of relative incomes within married couples, as for example in Bertrand et al. (2015).
Other areas where important work has been done with the IAB data that could be extended using our couple identifiers include, for example, the labor supply and mobility responses to immigration shocks (Dustmann et al. 2017), or the effects of maternity leave policies on labor supply (Schönberg and Ludsteck 2014).
We believe that providing access to a new way to study household decisions and responses in administrative data will inspire the research community to many new and creative research projects.
While some countries do allow for linking households in their administrative registry data, resulting in exciting and influential work, these countries tend to be relatively small and geographically clustered, such as Austria (Frimmel et al. 2014) or the Scandinavian countries (e. g. Hardoy and Schøne 2014 or Huttunen and Kellokumpu 2016). Expanding the scope of administrative data to other countries will be very valuable for studying household behavior in new contexts.
All-in (2006) reports that in Kempten in 2006 around 14% of newly married couples kept separate names. Janisch (2010) reports that a small survey of marriage registries in several German cities found that around 10 to 20% of couples keep separate names. This also seems to refer to newly married couples, which suggests that the share of couples with separate names among the pool of existing couples is likely much lower.
See Scholz et al. (2012). That paper is based on geocoded data from 2009, but 2008 was also geocoded as part of the same project. We decided to use 2008 as a baseline to allow for more analysis years after the couples are identified which seemed useful for many possible research questions. In the future we hope to expand the procedure to more years.
Statistisches Bundesamt (2012) states that there were about 34,000 same sex civil unions in Germany in 2011. We do not know how common it is for same sex couples to adopt a common family name, nor how likely it is that both partners are employed and covered in our data. Given the small number of same sex civil unions, it appears that our method for identifying male-female marriages would not work as well for identifying same sex civil unions.
Here we assumed that two opposite sex individuals with matching last names who are not married are equally likely to live together as two same sex individuals, averaging over male-male and female-female pairs. A more conservative assumption would be to assume that opposite-sex pairs that are not married are as likely to live together as male-male pairs, i. e. 2*131,550 = 263,100 leading to an accuracy rate of 92%. We thank an anonymous referee for pointing this out.
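The arithmetic behind the conservative figure can be sketched as follows. This is only an illustration under stated assumptions: the denominator of 3.3 million matched pairs is the rounded number from the abstract rather than the exact count used in the paper, the figure 131,550 is read here as the number of matched male-male pairs, and the function name `accuracy_rate` is hypothetical.

```python
def accuracy_rate(matched_pairs: int, implied_false_pairs: int) -> float:
    """Share of matched pairs presumed to be true couples."""
    if matched_pairs <= 0:
        raise ValueError("matched_pairs must be positive")
    return 1 - implied_false_pairs / matched_pairs


# Figure quoted in the footnote; interpreted here as matched male-male pairs.
MALE_MALE_PAIRS = 131_550

# Assumption: rounded count of matched pairs taken from the abstract,
# not the exact denominator used in the paper.
MATCHED_PAIRS = 3_300_000

# Conservative assumption from the footnote: unmarried opposite-sex pairs
# are as likely to live together as male-male pairs.
conservative_false_pairs = 2 * MALE_MALE_PAIRS

rate = accuracy_rate(MATCHED_PAIRS, conservative_false_pairs)
print(f"Implied false pairs: {conservative_false_pairs:,}")
print(f"Conservative accuracy rate: {rate:.0%}")
```

With the rounded denominator the result is consistent with the 92% stated above; the exact value depends on the true number of matched pairs.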
These are typically either people who are unemployed (in particular unemployment insurance recipients are required to register as job seekers) or who expect to be unemployed soon.
We are again being conservative here, assuming that among the same-sex matched couples, none are true couples (same-sex civil unions). As discussed before this is likely a very small group.
This can also be seen from Table 5 if we look at the subsample of our matched couples with the family status variable available. Of the male-female pairs where both are listed as married, only 3% have an age difference of 15 years or more.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- All-in: Immer mehr behalten ihren Geburtsnamen – Zahl der Ehen mit Doppelnamen bleibt seit Jahren gleich (2006). http://www.all-in.de/nachrichten/lokales/Immer-mehr-behalten-ihren-Geburtsnamen;art26090,215128, Accessed September 1, 2014
- Bertrand, M., Kamenica, E., Pan, J.: Gender identity and relative income within households. Q J Econ 130(2), 571–614 (2015)
- Boehle, M., Schimpl-Neimanns, B.: GESIS – Leibniz-Institut für Sozialwissenschaften (Ed.): Mikrozensus Scientific Use File 2008: Dokumentation und Datenaufbereitung. Bonn (GESIS-Technical Reports 2010/13) (2010). http://nbn-resolving.de/urn:nbn:de:0168-ssoar-207237
- Cullen, J.B., Gruber, J.: Does unemployment insurance crowd out spousal labor supply? J Labor Econ 18(3), 546–572 (2000)
- Dustmann, C., Schönberg, U., Stuhler, J.: Labor supply shocks, native wages, and the adjustment of local employment. Q J Econ 132(1), 435–483 (2017)
- Frimmel, W., Halla, M., Winter-Ebmer, R.: Can pro-marriage policies work? An analysis of marginal marriages. Demography 51(4), 1357–1379 (2014)
- Goldschmidt, D., Schmieder, J.F.: The rise of domestic outsourcing and the evolution of the German wage structure. Q J Econ (forthcoming)
- Hardoy, I., Schøne, P.: Displacement and household adaptation: Insured by the spouse or the state? J Popul Econ 27(3), 683–703 (2014)
- Hethey-Maier, T., Schmieder, J.F.: Does the use of worker flows improve the analysis of establishment turnover? Evidence from German administrative data. J Appl Soc Sci Stud – Schmollers Jahrb 133(4), 477–510 (2013)
- Huttunen, K., Kellokumpu, J.: The effect of job displacement on couples’ fertility decisions. J Labor Econ 34(2), 403–442 (2016)
- Jacobson, L.S., LaLonde, R.J., Sullivan, D.G.: Earnings losses of displaced workers. Am Econ Rev 83(4), 685–709 (1993)
- Janisch, W.: Namenswahl nach der Heirat: Bekenntnis zum Mann (2010). http://www.sueddeutsche.de/leben/namenswahl-nach-der-heirat-bekenntnis-zum-mann-1.79245, Accessed August 1, 2016
- Kammergericht Berlin: Eheregistereintragung: Schreibweise von Ehenamen und Begleitnamen (2013). http://www.gerichtsentscheidungen.berlin-brandenburg.de/jportal/?quelle=jlink&docid=KORE209412013&psml=sammlung.psml&max=true&bs=10, Accessed September 1, 2014
- Lundberg, S.: The added worker effect. J Labor Econ 3, 11–37 (1985)
- Schild, C.-J., Antoni, M.: Linking survey data with administrative social security data – the project “Interactions between capabilities in work and private life”. Working paper series, vol. 2014–02. German Record-Linkage Center, Nürnberg, p 11 (2014)
- Schmieder, J.F., von Wachter, T., Bender, S.: The effects of extended unemployment insurance over the business cycle: Evidence from regression discontinuity estimates over 20 years. Q J Econ 127(2), 701–752 (2012)
- Schmieder, J.F., von Wachter, T., Bender, S.: The effect of unemployment benefits and nonemployment durations on wages. Am Econ Rev 106(3), 739–777 (2016)
- Scholz, T., Rauscher, C., Reiher, J., Bachteler, T.: Geocoding of German administrative data. FDZ-Methodenreport, vol. 2012–09. Institute for Employment Research, Nürnberg (2012)
- Schönberg, U., Ludsteck, J.: Expansions in maternity leave coverage and mothers’ labor market outcomes after childbirth. J Labor Econ 32(3), 469–505 (2014)
- Sperling, F.: Familiennamensrecht in Deutschland und Frankreich: eine Untersuchung der Rechtslage sowie namensrechtlicher Konflikte in grenzüberschreitenden Sachverhalten. Mohr Siebeck, Tübingen, p 226 (2012)
- Statistisches Bundesamt: Bevölkerung und Erwerbstätigkeit – Haushalte und Familien Ergebnisse des Mikrozensus, 1st edn. 3. Statistisches Bundesamt, Wiesbaden (2012)
- Stephens Jr, M.: Worker displacement and the added worker effect. J Labor Econ 20(3), 504–537 (2002)