These files describe the exact data used for the paper "Impact of restricted marital practices on genetic variation in an endogamous Gujarati group" by TJ Pemberton, F-Y Li, EK Hanson, NU Mehta, S Choi, J Ballantyne, JW Belmont, NA Rosenberg, C Tyler-Smith, PI Patel (American Journal of Physical Anthropology XX(X):XXX-XXX [2012]).

These data combine two sources:
- the data used for the paper "Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India" by NA Rosenberg, S Mahajan, C Gonzalez-Quevedo, MG Blum, L Nino-Rosales, V Ninis, P Das, M Hegde, L Molinari, G Zapata, JL Weber, JW Belmont, PI Patel (PLoS Genetics 2(12):e215 [2006]);
- data on 180 additional Gujaratis.

Five Gujarati individuals analyzed by Rosenberg et al. were not analyzed by Pemberton et al.: 44200090, 445000135, 51400018, 65700024 and 6710001.

One male individual in the data set (65600049) is from the village of Sojitra but is a member of the Brahmin caste. He is therefore not a member of the CGP and is labeled as an "other Gujarati" in the data files.

Version 1.0 of the package of files, created by TJP on May 30, 2012.

---------------------------------------------------------------------

1. pembertonEtAl2012.1200markers.stru

This file includes the exact data used by Pemberton et al. (2012): both microsatellites and indels. The format is that used by the structure program. The first line gives the list of loci. After the first line, each individual is listed on two consecutive lines. The first five columns contain the following information:

(1) Individual code number assigned by Rosenberg et al.
(2) Population code number assigned by Rosenberg et al. (non-Gujarati groups) or by us (Gujarati groups).
(3) Population name (non-Gujarati groups) or village name (Gujaratis).
(4) Population name (all groups).
(5) Sex (M=male, F=female).

The remaining columns contain genotypes (measured in base pairs).
The left-to-right order of the genotypes corresponds to the left-to-right order of the locus names on the first line of the file. Which genotype appears on an individual's first line and which on the second is arbitrary. Missing data are denoted by "-9".

---------------------------------------------------------------------

2. pembertonEtAl2012.729microsats.stru

This file includes the exact data used by Pemberton et al. (2012): microsatellites only. The format is that used by the structure program (see #1 above).

---------------------------------------------------------------------

3. pembertonEtAl2012.471indels.stru

This file includes the exact data used by Pemberton et al. (2012): indels only. Genotypes are coded as 100 for the "short" allele and 200 for the "long" allele. The format is that used by the structure program (see #1 above).

---------------------------------------------------------------------

4. pembertonEtAl2012.23Y-STR.stru

This file includes the exact data used by Pemberton et al. (2012): Y-chromosome STRs only. The format is that used by the structure program (see #1 above).

An individual's first and second lines carry identical genotypes, with one exception. Two individuals (29400039 and 691000157) showed duplications at locus Y-GATA-A7.1. For these two individuals at this locus, the smaller allele is given on the first line and the larger allele on the second line.

---------------------------------------------------------------------

5. pembertonEtAl2012.MT-HVS1.stru

This file includes the exact data used by Pemberton et al. (2012): mitochondrial HVSI variable sites only. The format is that used by the structure program (see #1 above). Genotypes are given as alleles (A, C, G, T).
An individual's first and second lines carry identical genotypes, except at heteroplasmic positions (sites that showed evidence of more than one peak). At these sites, the reference allele is given on the first line and the non-reference allele on the second line.

---------------------------------------------------------------------

6. pembertonEtAl2012.codes.txt

This file contains the code numbers assigned to the populations in the files associated with the Pemberton et al. (2012) paper. The columns contain the following information:

(1) Population code number assigned by Rosenberg et al. (non-Gujarati groups) or by us (Gujarati groups).
(2) Population name (non-Gujarati groups) or village name (Gujaratis).
(3) Population name (all groups).

All Gujaratis not from the CGP are grouped into OTHER in column 2.

---------------------------------------------------------------------
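The two-lines-per-individual layout shared by the .stru files can be read with a short script. The following is a minimal Python sketch and is not part of the distributed package. It rests on these assumptions:
- columns are separated by whitespace;
- no field (for example a village name) contains internal spaces;
- both lines of an individual repeat the five label columns.

The names parse_stru and Individual are hypothetical, and the toy input uses made-up IDs and locus names, not real data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

MISSING = "-9"  # missing-data code given in the file description

@dataclass
class Individual:
    ind_id: str      # (1) individual code number
    pop_code: str    # (2) population code number
    place: str       # (3) population name (non-Gujaratis) or village name (Gujaratis)
    population: str  # (4) population name (all groups)
    sex: str         # (5) M or F
    # locus name -> (allele on first line, allele on second line); None marks missing data
    genotypes: Dict[str, Tuple[Optional[str], Optional[str]]] = field(default_factory=dict)

def parse_stru(lines: Iterable[str]) -> Tuple[List[str], List[Individual]]:
    """Parse structure-format text: a header of locus names, then two rows per individual.

    Hypothetical helper; assumes whitespace-separated columns with no spaces inside fields.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if not rows:
        raise ValueError("empty input")
    loci, body = rows[0], rows[1:]
    if len(body) % 2:
        raise ValueError("expected two rows per individual")
    people = []
    for first, second in zip(body[0::2], body[1::2]):
        # Assumed: both rows repeat the same five label columns.
        if first[:5] != second[:5]:
            raise ValueError(f"mismatched label columns for {first[0]}")
        if len(first) != 5 + len(loci) or len(second) != 5 + len(loci):
            raise ValueError(f"wrong number of columns for {first[0]}")
        ind = Individual(*first[:5])
        # Alleles are kept as strings in row order, so A/C/G/T and 100/200 codes survive.
        for locus, a, b in zip(loci, first[5:], second[5:]):
            ind.genotypes[locus] = (None if a == MISSING else a,
                                    None if b == MISSING else b)
        people.append(ind)
    return loci, people

if __name__ == "__main__":
    # Toy input with made-up IDs and locus names, not real data.
    toy = [
        "locusA locusB",
        "ind1 99 villageA popA M 150 100",
        "ind1 99 villageA popA M 152 -9",
    ]
    loci, people = parse_stru(toy)
    print(people[0].genotypes)  # {'locusA': ('150', '152'), 'locusB': ('100', None)}
```

For a real file, `with open("pembertonEtAl2012.729microsats.stru") as fh: loci, people = parse_stru(fh)` should work under the same assumptions. The parser keeps each line's allele in its original position. This preserves the smaller/larger ordering at Y-GATA-A7.1 and the reference/non-reference ordering at heteroplasmic HVSI sites. It also preserves the apparent allele order elsewhere, which the description says is arbitrary.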