LIFE-M combines vital records with census data to create a large-scale intergenerational and longitudinal database spanning four generations (G0-G3) of American families. Vital records contain women’s birth (“maiden”) names as well as their names at marriage, which allows LIFE-M to link women, as well as men, both longitudinally and across generations. Final sample sizes include over 11 million records from Ohio and 4 million records from North Carolina. The table below shows sample sizes by generation.

Sample Sizes by State, Generation, and Sex

sample size table

Notes: Some individuals (2 to 5 percent) are linked as multiple generations and are, therefore, captured more than once. “Total” shows the unique number of individuals. We code sex as “unknown” when sex is missing in all records.
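To illustrate the distinction the notes draw between per-generation counts and the unique “Total,” here is a minimal sketch in Python. The record layout (a person identifier paired with a generation label) and the identifiers themselves are hypothetical and are not taken from the LIFE-M files.

```python
from collections import Counter

# Hypothetical toy records: (person_id, generation). Person "p1" is linked
# as both G1 and G2, so this person is captured twice.
records = [
    ("p1", "G1"),
    ("p1", "G2"),
    ("p2", "G1"),
    ("p3", "G3"),
]

# Per-generation counts include a person once for each generation they appear in.
per_generation = Counter(generation for _, generation in records)

# The unique total counts each person once, regardless of generation.
unique_total = len({person_id for person_id, _ in records})

# The gap between the two sums is the number of repeat appearances.
repeat_appearances = sum(per_generation.values()) - unique_total
```

In this toy example, summing the per-generation counts overstates the number of people by one, which is why the unique total is reported separately.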


LIFE-M data contain rich information about individuals born in Ohio and North Carolina, including:

  • birth family characteristics (e.g., birth order, sibling sex composition, age differences, twinning, number of siblings);
  • multi-generational family characteristics (e.g., age, race, occupation, education, and birth state or country of parents and grandparents from the censuses);
  • own economic and demographic outcomes (wages, employment, occupation, birth state or country, education from the 1940 census);
  • marriage family characteristics (e.g., age at marriage and the spouse’s characteristics, including all characteristics on this list);
  • own births (number of children, mortality of own infants and children, timing of births, sex composition, and twinning); 
  • geographic location (town or address) at vital events and census enumeration (lifetime mobility); and
  • longevity (date and place of death).


Please see Bailey, Cole, and Massey (2019) for a step-by-step process for creating weights tailored to specific subsamples and purposes. These weights help adjust the samples for under- or over-representation of certain subgroups or characteristics.
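As a rough illustration of how weights can correct for under- or over-representation, the sketch below computes simple cell-based weights in Python: each linked record is weighted by its subgroup's share in a reference population divided by that subgroup's share in the linked sample. This is a generic technique shown under stated assumptions, not the procedure described in the cited paper. The function name `cell_weights`, the field name `sex`, and the toy data are all hypothetical.

```python
from collections import Counter

def cell_weights(population, linked, key):
    """Return one weight per linked record.

    Weight = (cell's share of the population) / (cell's share of the linked sample).
    Assumes every cell present in `linked` also appears in `population`.
    """
    pop_counts = Counter(key(record) for record in population)
    link_counts = Counter(key(record) for record in linked)
    n_pop, n_link = len(population), len(linked)
    return [
        (pop_counts[key(record)] / n_pop) / (link_counts[key(record)] / n_link)
        for record in linked
    ]

# Hypothetical toy data: the reference population is half women, but women
# make up only one third of the linked sample.
population = [{"sex": "F"}] * 50 + [{"sex": "M"}] * 50
linked = [{"sex": "F"}] * 20 + [{"sex": "M"}] * 40

weights = cell_weights(population, linked, key=lambda record: record["sex"])

# Weighted share of women in the linked sample after reweighting.
weighted_female_share = sum(
    w for w, record in zip(weights, linked) if record["sex"] == "F"
) / sum(weights)
```

Here the under-represented group receives weights above one and the over-represented group weights below one, so the weighted linked sample matches the reference population's composition on the chosen characteristic. A real application would typically condition on several characteristics at once.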