Rosenberg lab at Stanford University

HGDP-CEPH human genome diversity cell line panel

Data from HGDP-related papers published in
[2017] [2013] [2011] [2009] [2008] [2006] [2005] [2002]

The diversity panel is a large and widely-used collection of DNA samples from people distributed around the world. Several of our papers have utilized genotypes from the diversity panel. Here we provide HGDP-CEPH data exactly as used in these papers.

Note that slightly different versions of our microsatellite and indel data sets are located at the website of the Marshfield Clinic Research Foundation. To compare new results on the diversity panel with those from our previous work, we recommend using the files downloadable from this site rather than those available in Microsoft Excel format from Marshfield.

Further information about the microsatellite markers, such as PCR primers and map positions, is available from Marshfield.


HGDP 2017 SNP-STR data (872 individuals)

(Posted August 31, 2018) HGDP STR data with neighboring SNPs are now available for
Files:

HGDP 2013 exome data (27 individuals)

(Posted July 17, 2013) HGDP exome data are now available for
Files:

HGDP+other 2013 microsatellites (645 autosomal microsatellite loci in 5795 individuals from 267 populations)

(Posted July 17, 2013) HGDP microsatellite data plus data from other major microsatellite datasets (human and chimp) are now available online for
Files:

HGDP+India+Africa 2011 SNP data (2810 single-nucleotide polymorphisms in 1107 individuals from 63 populations)

(Posted July 17, 2013) HGDP+India+Africa SNP data are now available online. These data update the data of Conrad et al. (2006) and Pemberton et al. (2008) described below.
Files:
  • Download SNP data (you will be directed first to a registration page, and we would very much appreciate it if you register)

HGDP 2009 sequence properties of microsatellites (627 autosomal microsatellite loci in 1048 individuals, with repeat numbers and sequence properties)

(Posted January 21, 2010) For 627 HGDP microsatellites, these files provide sequence properties, such as the structure of the repeat motif and the GC content of the flanking region. They also convert the PCR fragment lengths in nucleotides to numbers of repeats, by calibration with the human genome reference sequence.
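The fragment-length-to-repeat-number conversion described above can be sketched roughly as follows. This is a minimal illustration, not the procedure used to produce the files: it assumes a simple linear calibration in which a reference allele of known fragment length and known repeat count anchors the scale, and the function name, parameter names, and numbers are all hypothetical.

```python
def fragment_length_to_repeats(fragment_length, ref_fragment_length,
                               ref_repeat_count, motif_length):
    """Convert a PCR fragment length (in nucleotides) to a repeat count.

    Assumption (hypothetical, for illustration only): the reference
    sequence supplies one calibration point, a fragment of
    ref_fragment_length nucleotides carrying ref_repeat_count repeats,
    and every other allele differs from it only in the number of
    copies of a motif that is motif_length nucleotides long.
    """
    if motif_length <= 0:
        raise ValueError("motif_length must be positive")
    # Length difference from the reference allele, in nucleotides.
    offset = fragment_length - ref_fragment_length
    # Each motif_length nucleotides of difference is one repeat.
    return ref_repeat_count + offset / motif_length


# Hypothetical tetranucleotide locus: the reference fragment is 200 nt
# long and carries 12 repeats.
print(fragment_length_to_repeats(208, 200, 12, 4))  # two repeats longer
print(fragment_length_to_repeats(202, 200, 12, 4))  # half a repeat longer
```

A non-integer result, as in the second call, would indicate an allele whose length does not differ from the reference by a whole number of motif copies; how such alleles are treated in the posted files is not specified here.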

Files:

HGDP+India 2008 SNP data (2810 single-nucleotide polymorphisms in 957 individuals from 55 populations)

(Posted June 27, 2008) HGDP+India SNP data are now available online. These data update the data of Conrad et al. (2006) described below.
Files:
  • Download SNP data (you will be directed first to a registration page, and we would very much appreciate it if you register)

HGDP 2008 high-resolution genome-wide SNP data (525,910 single-nucleotide polymorphisms and 1428 copy-number variable loci in 485 individuals from 29 populations)

(Posted February 26, 2008) HGDP SNP data are now available online for
  • M Jakobsson*, SW Scholz*, P Scheet*, JR Gibbs, JM VanLiere, H-C Fung, ZA Szpiech, JH Degnan, K Wang, R Guerreiro, JM Bras, JC Schymick, DG Hernandez, BJ Traynor, J Simon-Sanchez, M Matarin, A Britton, J van de Leemput, I Rafferty, M Bucan, HM Cann, JA Hardy, NA Rosenberg, AB Singleton (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998-1003. [Abstract] [PDF]
Files:

HGDP 2006 SNP data (2834 single-nucleotide polymorphisms in 927 individuals from 52 populations)

(Posted May 23, 2007) HGDP SNP data are now available online for
Files:
  • Download SNP data (you will be directed first to a registration page, and we would very much appreciate it if you register)

HGDP 2006 relatives

(Posted October 17, 2006) It is recommended that anyone working with the diversity panel read the following paper, which reports a variety of anomalies in the diversity panel individuals and recommends standard subsets for future use.

HGDP 2005 microsatellites and indels (783 autosomal microsatellite loci and 210 insertion/deletion polymorphisms in 1048 individuals from 53 populations)

(Posted November 1, 2005) The following data files, all in plain text format, were reported by two papers appearing nearly simultaneously. The microsatellite markers are drawn from Marshfield screening sets 10, 13, and 52, and the indels are drawn from Marshfield screening set 100. A description of how these data files differ from those on the Marshfield site is in the Ramachandran et al. (2005) and Rosenberg et al. (2005) papers.

In choosing data files for analysis, note that there are slight differences between the data used by Ramachandran et al. (2005) and those used by Rosenberg et al. (2005). Work in our lab uses the Rosenberg et al. (2005) version.

  • S Ramachandran, O Deshpande, CC Roseman, NA Rosenberg, MW Feldman, LL Cavalli-Sforza (2005) Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proceedings of the National Academy of Sciences USA 102: 15942-15947. [Abstract] [PDF] [Supplementary Figure 6] [Supplementary Table 2] [Supplementary text]

  • NA Rosenberg, S Mahajan, S Ramachandran, C Zhao, JK Pritchard, MW Feldman (2005) Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genetics 1: 660-671. [Abstract] [Full-text at journal website] [PDF]
Files:

HGDP 2002 microsatellites (377 autosomal microsatellites in 1056 individuals from 52 populations)

(Posted November 22, 2002) The following data files, all in plain text format, are reported in the following paper. The markers are drawn from Marshfield screening set 10. A description of how these data files differ from those on the Marshfield site is in the online supplement to the paper.

Files:


History

Created with 377 microsatellites, 22 November 2002
Addition of NEXUS file for 377 microsatellites, 28 December 2002
Minor modifications to site, 30 April 2004
Addition of data on 783 microsatellites and 210 indels, 1 November 2005
Addition of standardized subsets of individuals, 17 November 2006
Addition of SNP data from Conrad et al. (2006), 23 May 2007
Addition of genome-wide SNP and copy-number data, 26 February 2008
Addition of SNP data from Pemberton et al. (2008), 27 June 2008
Addition of sequence properties of microsatellites, 21 January 2010
Addition of data from Huang et al. (2011), Pemberton et al. (2013), and Szpiech et al. (2013), 17 July 2013
Site substantially modified to improve readability, 17 July 2013