Rosenberg lab - software

Software

Note: We are pleased that you find programs from the lab to be of interest. It is our usual practice to include informative error messages and detailed user manuals. We welcome your feedback and wish to facilitate your use of the programs. When questions do arise, we would appreciate it if users made their best effort to check that the answer is not in the manual before contacting us. Thanks for your help.

ADZE

ADZE is a program that implements the rarefaction method for analyzing allelic diversity across populations while correcting for sample size differences. Using individual multilocus genotype data on genetic polymorphisms, ADZE computes estimates of allelic richness, private allelic richness, and private allelic richness for combinations of populations. ADZE was used for generating Figure 1a in "Genotype, haplotype, and copy-number variation in worldwide human populations" Nature 451: 998-1003 (2008). A program note describing ADZE was published in Bioinformatics 24: 2498-2504 (2008).
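
As a rough illustration of the rarefaction idea (not ADZE's own code or file format), the sketch below computes the expected number of distinct alleles at one locus in a random subsample of g gene copies drawn without replacement. The function name and the plain list of allele counts are assumptions made for this example.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: observed count of each allele at one locus in one population
                   (hypothetical input layout, not ADZE's input format).
    g: standardized subsample size, at most the total number of copies sampled.
    """
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the total sample size")
    total = comb(n, g)
    # An allele with count n_i is missing from the subsample with probability
    # C(n - n_i, g) / C(n, g), so it is present with one minus that probability.
    return sum(1 - comb(n - n_i, g) / total for n_i in allele_counts)
```

Evaluating two populations at the same g, for example `allelic_richness([3, 1], 2)` and `allelic_richness([10, 6, 4], 2)`, puts their richness estimates on a common sample-size footing, which is the correction the paragraph above describes.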

CLUMPP

CLUMPP is a program that deals with label switching and multimodality problems in population-genetic cluster analyses. CLUMPP permutes the clusters output by independent runs of clustering programs such as structure, so that they match up as closely as possible. The user can choose one of three algorithms for aligning replicates, which trade off speed against closeness to the optimal alignment. A program note describing CLUMPP was published in Bioinformatics 23: 1801-1806 (2007).
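
To make the label-switching problem concrete, here is a minimal sketch that aligns one run to another by trying every permutation of cluster labels. It is not one of CLUMPP's algorithms: the squared-difference score, the function name, and the list-of-rows matrix layout are all assumptions chosen for simplicity, and exhaustive search is only practical for a small number of clusters.

```python
from itertools import permutations


def align_clusters(reference, other):
    """Relabel the clusters of `other` to match `reference` as closely as possible.

    Both arguments are membership matrices given as lists of rows, one row per
    individual and one column per cluster (hypothetical layout for this sketch).
    Returns the best column permutation and the permuted copy of `other`.
    """
    k = len(reference[0])

    def cost(perm):
        # Sum of squared differences after moving column perm[c] of `other`
        # into position c. This score is an illustrative choice.
        return sum(
            (ref_row[c] - row[perm[c]]) ** 2
            for ref_row, row in zip(reference, other)
            for c in range(k)
        )

    best = min(permutations(range(k)), key=cost)
    aligned = [[row[j] for j in best] for row in other]
    return best, aligned
```

For two runs that found the same two clusters under swapped labels, such as `[[0.9, 0.1], [0.2, 0.8]]` and `[[0.1, 0.9], [0.8, 0.2]]`, the returned permutation swaps the columns of the second run so that the two matrices coincide.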

distruct

distruct is a program that can be used to graphically display results produced by the clustering program structure or by other similar programs. The figures produced by distruct display individual membership coefficients in the same form as used in "Genetic structure of human populations" Science 298: 2381-2385 (2002). Various options enable the user to control left-to-right printing order of populations, bottom-to-top printing order of clusters, colors, and other graphical details. A program note describing distruct was published in Molecular Ecology Notes 4: 137-138 (2004).

haploconfig

haploconfig is a program that can be used to implement tests of neutrality based on the frequency distribution of haplotypes in a sample of DNA sequences (the "haplotype configuration") and the number of segregating sites. The neutrality tests can be performed conditional on the standard neutral coalescent model with or without recombination, exponential population growth, or island migration. A description of the method underlying the program can be found in "Statistical tests of the coalescent model based on the haplotype frequency distribution and the number of segregating sites" Genetics 169: 1763-1777 (2005). The haploconfig program can also be used as a coalescent simulator for models with or without recombination.
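
The two summaries that these tests are based on can be extracted from aligned sequences in a few lines. The sketch below is not haploconfig itself and does not perform any test; the function name, the string input, and the choice to report the configuration as a sorted list of haplotype multiplicities are assumptions made for illustration.

```python
from collections import Counter


def summarize_sample(sequences):
    """Return (number of segregating sites, haplotype configuration).

    sequences: equal-length aligned DNA strings (hypothetical input layout).
    The configuration is reported as haplotype multiplicities sorted in
    decreasing order, e.g. [2, 1, 1] for one haplotype seen twice and two
    haplotypes seen once each.
    """
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    # A site is segregating if more than one base is observed at it.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # Identical strings are copies of the same haplotype.
    configuration = sorted(Counter(sequences).values(), reverse=True)
    return segregating, configuration
```

For the sample `["ACGT", "ACGT", "ACGA", "TCGA"]`, the first and last sites vary, so the function reports 2 segregating sites and the configuration `[2, 1, 1]`.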

infocalc

infocalc is a small script for calculating statistics that measure the ancestry information content of genetic markers. A description of these statistics can be found in "Informativeness of genetic markers for inference of ancestry" American Journal of Human Genetics 73: 1402-1422 (2003), with extensions in "Algorithms for selecting informative marker panels for population assignment" Journal of Computational Biology 12: 1183-1201 (2005).
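
As a hedged sketch of what such a statistic can look like, the function below computes the mutual information between population label and allele at one marker, assuming equal prior weight on each population and natural logarithms. This is written in the spirit of an informativeness-for-assignment measure; it is not the infocalc script, and the function name and nested-list input are assumptions for this example.

```python
from math import log


def informativeness_for_assignment(freqs):
    """Mutual information (in nats) between population label and allele.

    freqs[i][j] is the frequency of allele j in population i; each row is
    assumed to sum to 1 and populations are weighted equally.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        # Average frequency of allele j across the k populations.
        p_bar = sum(pop[j] for pop in freqs) / k
        if p_bar > 0:
            total -= p_bar * log(p_bar)
        for pop in freqs:
            # Terms with zero frequency contribute nothing (0 * log 0 = 0).
            if pop[j] > 0:
                total += (pop[j] / k) * log(pop[j])
    return total
```

Under these assumptions, a marker with a different fixed allele in each of two populations attains the maximum value log 2, while a marker with identical allele frequencies in every population scores 0.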

MicroDrop

MicroDrop is a program that estimates allelic dropout rates from nonreplicated microsatellite genotype data. MicroDrop uses the estimated dropout rates to provide imputed data sets that sample missing genotypes (and replace some homozygous genotypes) using an allele frequency model with or without Hardy-Weinberg equilibrium. A description of the method underlying the program can be found in "A maximum likelihood method to correct for allelic dropout in microsatellite data with no replicate genotypes" Genetics 192: 651-669 (2012).

Monophyler

Monophyler computes monophyly probabilities for gene lineages conditional on a species tree. Monophyler uses the coalescent together with an arbitrary species tree to determine the monophyly probabilities. The calculations are detailed in three papers. (1) Using the method of "The probability of monophyly of a sample of gene lineages on a species tree" Proceedings of the National Academy of Sciences USA 113: 8002-8009 (2016), Monophyler calculates the probability of monophyly for a set of lineages or the reciprocal monophyly probability for a pair of sets of lineages. (2) Using "The probability of reciprocal monophyly of gene lineages in three and four species" Theoretical Population Biology 129: 133-147 (2019), it calculates various monophyly probabilities involving three or four lineage sets in three-taxon and four-taxon species trees. (3) Finally, using "The probability of joint monophyly of samples of gene lineages for all species in an arbitrary species tree", it calculates the probability of reciprocal monophyly for an arbitrary number of sets of lineages on an arbitrary species tree.