These files describe the exact data used for the paper "Individual identifiability predicts population identifiability in forensic microsatellite markers" by BFB Algee-Hewitt*, MD Edge*, JZ Li, and NA Rosenberg, Current Biology, 2016. These data are a combination of the data used for the paper "Clines, clusters, and the effect of study design on the inference of human population structure" by NA Rosenberg et al., PLoS Genetics, 2005, and new genotypes from a subset of the same people considered in the Rosenberg et al. study.

*Version 1.0 of the package of files - created by MDE, August 4, 2015

---------------------------------------------------------------------

1. HGDPmicrosatsIncludingCODIS.stru

This file includes the exact data used by Algee-Hewitt et al. The format is that used by the structure program. The first line gives the list of loci. The second line indicates whether each locus is a CODIS locus (codis), a tetranucleotide non-CODIS locus (tetra), or another kind of locus (other). After the second line, each person is listed on two consecutive lines. The first five columns include the following information:

(1) Individual code number assigned by HGDP
(2) Population code number assigned by Rosenberg et al. 2002
(3) Population name
(4) Geographic origin (specific)
(5) Geographic region (Africa, Europe, Middle East, Central/South Asia, East Asia, Oceania, or America)

The next columns contain allelic types. The 779 loci with data from Rosenberg et al. come first. The alleles of these 779 loci are measured in base pairs. The final 13 loci are the CODIS loci, all of which are tetranucleotide repeat loci. The CODIS genotypes are measured as follows: if x is the entry in a data column corresponding to a CODIS locus (columns 785-797, counting the five information columns), then the greatest integer less than or equal to x/4 is the number of complete repeats. The remainder when x is divided by 4 is the number of additional base pairs. For example, x=49 corresponds to 12 repeats and 1 additional base pair.
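
The CODIS decoding rule above can be sketched in a few lines of Python. This is only an illustration, not code distributed with the data; the function name decode_codis is hypothetical, and the check for -9 assumes the missing-data code described in this README.

```python
def decode_codis(x):
    """Split a CODIS data entry into (complete repeats, additional base pairs).

    Hypothetical helper, not part of the data package. Returns None for
    the missing-data code -9.
    """
    if x == -9:
        return None
    # Floor of x/4 gives the complete repeats; the remainder gives the
    # additional base pairs.
    repeats, extra_bp = divmod(x, 4)
    return repeats, extra_bp


# The worked example from the text: x=49 is 12 repeats plus 1 base pair.
print(decode_codis(49))
```

The rule applies only to the CODIS columns; entries for the other 779 loci are already allele sizes in base pairs and should not be divided by 4.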

The left-to-right order of the genotypes corresponds to the left-to-right order of the locus names on the first line of the file. The placement of genotypes on the first versus second line for an individual is arbitrary. Missing data are denoted by "-9".

---------------------------------------------------------------------
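
As a rough sketch of how the layout described in section 1 might be read, the Python function below pairs the two lines for each person and maps each locus name to its two alleles. It is not part of the data package. The name parse_stru is hypothetical, and the code assumes that fields are whitespace-delimited, that none of the five information columns contains internal whitespace, and that both lines for a person begin with the same individual code. Check these assumptions against the actual file before relying on it.

```python
MISSING = -9   # missing-data code given in this README
N_INFO = 5     # number of leading information columns per data line


def parse_stru(lines):
    """Parse structure-style lines into (loci, kinds, people).

    Hypothetical helper: `lines` is any iterable of text lines, such as
    an open file object.
    """
    rows = [line.split() for line in lines if line.strip()]
    loci, kinds, body = rows[0], rows[1], rows[2:]
    if len(loci) != len(kinds):
        raise ValueError("locus names and locus types differ in length")
    if len(body) % 2 != 0:
        raise ValueError("expected two lines per person")

    people = []
    # Each person occupies two consecutive lines.
    for first, second in zip(body[0::2], body[1::2]):
        if first[0] != second[0]:
            raise ValueError("paired lines have different individual codes")
        a = [int(v) for v in first[N_INFO:]]
        b = [int(v) for v in second[N_INFO:]]
        if len(a) != len(loci) or len(b) != len(loci):
            raise ValueError("genotype count does not match locus count")
        genotypes = {}
        for locus, x, y in zip(loci, a, b):
            # Replace the missing-data code with None. Which allele sits
            # on the first versus second line is arbitrary.
            genotypes[locus] = (
                None if x == MISSING else x,
                None if y == MISSING else y,
            )
        people.append({"info": first[:N_INFO], "genotypes": genotypes})
    return loci, kinds, people
```

One way to use it would be `with open("HGDPmicrosatsIncludingCODIS.stru") as f: loci, kinds, people = parse_stru(f)`. The kinds list then identifies which loci carry the codis label.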