Rosenberg lab at Stanford University

Theory research in the lab

Theory research involves formulating and solving mathematical problems motivated by consideration of biological scenarios, and interpreting the mathematical results for their contributions to biology. Advances in our theoretical work often focus on mathematical models, involving construction and analysis of new models, derivation of new results about existing models, development of new techniques for analyzing models, and model comparisons. Progress can also come from mathematical analyses of statistical methods, numerical studies and simulations, or introduction of new theoretical principles.


Mathematical properties of population-genetic statistics. Many of the statistics used in population genetics are functions of the allele frequencies at a locus, a discrete set of nonnegative numbers that sum to one. This feature of allele frequencies contributes to surprising phenomena affecting some of the most popular population-genetic statistics, such as homozygosity and heterozygosity, the FST measure of genetic differentiation, and the r² statistic for linkage disequilibrium. For example, the upper and lower bounds on homozygosity vary as a function of the frequency of the most frequent allele at a locus, the upper bound on FST varies with the homozygosity of a locus, and both upper bounds depend on the number of distinct alleles at a locus, all in a way that can be viewed as an epiphenomenon of the mathematical properties of the statistics. To facilitate sensible biological interpretations of observations of these statistics, we have been exploring their mathematical properties. This mathematical work provides explanations for a number of peculiar patterns seen in past applications of the statistics to population-genetic data.
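As a rough illustration of how homozygosity is constrained by the frequency of the most frequent allele, the following Python sketch computes homozygosity and its bounds when the number of alleles is left unrestricted. The bounds are derived here by elementary optimization and are not quoted from the lab's papers. The function names are hypothetical.

```python
from math import ceil


def homozygosity(freqs):
    """Homozygosity H: the sum of squared allele frequencies at a locus."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be nonnegative and sum to one")
    return sum(p * p for p in freqs)


def homozygosity_bounds(m):
    """Bounds on H given the frequency m of the most frequent allele.

    Assumption: the number of distinct alleles is unrestricted.
    Upper bound: place as many alleles as possible at frequency m and
    put the remaining mass in one further allele.
    Lower bound: m**2, approached (but not attained when m < 1) as the
    remaining mass 1 - m is spread over ever more rare alleles.
    """
    if not 0 < m <= 1:
        raise ValueError("m must lie in (0, 1]")
    k = ceil(1 / m)  # smallest number of alleles compatible with m
    upper = 1 - m * (k - 1) * (2 - k * m)
    lower = m * m
    return lower, upper


if __name__ == "__main__":
    low, high = homozygosity_bounds(0.4)
    print(low, high)
    # The configuration (0.4, 0.4, 0.2) attains the upper bound for m = 0.4.
    print(homozygosity([0.4, 0.4, 0.2]))
```

Under these assumptions, a locus whose most frequent allele has frequency 0.4 can have homozygosity only in a narrow interval, which is the kind of constraint that shapes the interpretation of the statistic.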


The strict upper bound on the value of FST at a locus given the frequency of the most frequent allele. See Jakobsson, Edge, and Rosenberg (2013) for details.


Theoretical population genetics of admixture. When mating occurs between members of two or more groups that have long been separated, new populations can form that are admixed. Admixture is widespread in human populations, as a result of complex histories of migration, conquest, enslavement, and ongoing cultural interactions. A popular population-genetic model treats allele frequencies in an admixed population as linear combinations of the allele frequencies in its source populations, weighting each frequency by an admixture coefficient for its corresponding source population. We have examined a number of features of this admixture model in relation to the FST measure of genetic differentiation, statistics for measuring ancestry information content, and neighbor-joining inference of population trees. Further, we have extended beyond the statistical model of admixture to develop a mechanistic model that accounts for varying contributions of different source populations over time. This model enables assessments of the impact of different admixture histories on the pattern of admixture across individuals, and we are using it for analysis of the history and structure of admixture in a variety of admixed human populations.
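The linear-combination model of admixture described above can be sketched in a few lines of Python. This is only a minimal illustration of the statistical model, not of the mechanistic model. The function name and the example frequencies are hypothetical.

```python
def admixed_frequencies(source_freqs, coefficients):
    """Allele frequencies in an admixed population under the linear model.

    source_freqs: one list of allele frequencies per source population,
                  all of the same length (one entry per allele).
    coefficients: admixture coefficients, one per source population,
                  nonnegative and summing to one.
    """
    if len(source_freqs) != len(coefficients):
        raise ValueError("need one admixture coefficient per source population")
    if any(c < 0 for c in coefficients) or abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("admixture coefficients must be nonnegative and sum to one")
    n_alleles = len(source_freqs[0])
    if any(len(f) != n_alleles for f in source_freqs):
        raise ValueError("all source populations must list the same alleles")
    # Each admixed frequency is a weighted average of the source frequencies.
    return [
        sum(c * f[a] for c, f in zip(coefficients, source_freqs))
        for a in range(n_alleles)
    ]


if __name__ == "__main__":
    # Two hypothetical source populations at a biallelic locus.
    sources = [[0.9, 0.1], [0.2, 0.8]]
    print(admixed_frequencies(sources, [0.25, 0.75]))
```

Because the coefficients sum to one, each admixed frequency lies between the smallest and largest of the corresponding source frequencies.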

A neighbor-joining tree illustrating the interior placement of admixed populations in relation to populations from source regions. See Kopelman, Stone, Gascuel, and Rosenberg (2013) for details.


Human migration and spatial expansion. The genomes of living humans carry information about past human migrations. Patterns of genetic diversity and similarity among individuals and populations reflect a complex history of such phenomena as migration, natural selection, and changes in population size. Because population-genetic models of migration and spatial expansion make predictions about extant genetic variation given assumptions about active evolutionary phenomena, they can help us understand the connection between extant genetic variation and past evolutionary processes. We have been developing and studying models of population migration with the aim of understanding the processes that have been active during human evolution, particularly since the advent of anatomically modern humans. Recent interests include assessments of global models of human migration, evaluations of spatial patterns of genetic variation, and approaches for making use of genome-scale data.

A schematic of a serial founder model for human migrations out of Africa. See the work of DeGiorgio, Degnan, and Rosenberg (2011) for details.


Consanguinity, identity by descent, relatedness, and runs of homozygosity. Genomic data enable new approaches for studies of genetic relationships and patterns of individual genomic sharing. For example, human individuals possess long stretches of their genomes in which the genomic copies inherited from their two parents are genetically identical. These runs of homozygosity (ROH) reflect a variety of different processes, such as pairing of identical ancient haplotypes, background levels of relatedness among individuals within a population, and recent parental relatedness. We have been characterizing runs of homozygosity, their differences across human populations, and their connection with such processes as inbreeding, linkage disequilibrium, and the amplification of deleterious variation. We are also devising new approaches for assessing patterns of variation in data sets with a high level of genetic relatedness. Studies of ROH and relatedness contribute to such topics as clinical genomic testing, conservation genetics, and identification of genes for rare recessive diseases.


Combinatorics of evolutionary trees. Evolution within populations gives rise to trees of genetic lineages. When multiple species related by a species tree are considered, gene trees can differ in topology from each other and from the species tree on which they evolve. The joint analysis of gene trees and species trees then gives rise to consideration of a number of characteristic mathematical objects, such as coalescent histories and deep coalescences. Given a gene tree and a species tree, a coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. A deep coalescence is tabulated when a pair of gene lineages fails to coalesce along a branch of the species tree. We have been examining how coalescent histories, deep coalescences, and other combinatorial features of gene trees and species trees generate both problems of mathematical interest and insights into the development and performance of methods for the inference of species trees.
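To make the definition of a coalescent history concrete, the following Python sketch counts coalescent histories by brute force in one simple case: a caterpillar species tree ((((1,2),3),4),...,n) with a gene tree of the same labeled topology. In that case the counts are the Catalan numbers, which the sketch checks. The encoding of a history as a tuple of branch indices and the function names are assumptions made for illustration.

```python
from itertools import product
from math import comb


def count_caterpillar_histories(n):
    """Count coalescent histories for matching caterpillar trees on n leaves.

    Internal node k (k = 1, ..., n - 1) is the ancestor of leaves 1..k+1.
    A history assigns gene-tree node k to the species-tree branch directly
    above species-tree node h[k], subject to two constraints:
      * h[k] >= k: a coalescence cannot occur before its lineages meet;
      * h[k] <= h[k + 1]: a node cannot lie above its parent node.
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    m = n - 1  # number of internal nodes
    choices = [range(k, m + 1) for k in range(1, m + 1)]
    return sum(
        1
        for h in product(*choices)
        if all(h[i] <= h[i + 1] for i in range(m - 1))
    )


def catalan(k):
    """The k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, count_caterpillar_histories(n), catalan(n - 1))
```

The brute-force enumeration grows quickly with n and is meant only to illustrate the object being counted. Other tree shapes, and gene trees that disagree with the species tree, need a more general recursion that is not attempted here.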

  • F Disanto, NA Rosenberg (2016) Asymptotic properties of the number of matching coalescent histories for caterpillar-like families of species trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 13: 913-925. [Abstract] [PDF]

  • F Disanto, NA Rosenberg (2015) Coalescent histories for lodgepole species trees. Journal of Computational Biology 22: 918-929. [Abstract] [PDF]

  • F Disanto, NA Rosenberg (2014) On the number of ranked species trees producing anomalous ranked gene trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11: 1229-1238. [Abstract] [PDF]

  • CV Than, NA Rosenberg (2014) Mean deep coalescence cost under exchangeable probability distributions. Discrete Applied Mathematics 174: 11-26. [Abstract] [PDF]

  • NA Rosenberg (2013) Coalescent histories for caterpillar-like families. IEEE/ACM Transactions on Computational Biology and Bioinformatics 10: 1253-1262. [Abstract] [PDF]

  • NA Rosenberg (2013) Discordance of species trees with their most likely gene trees: a unifying principle. Molecular Biology and Evolution 30: 2709-2713. [Abstract] [Full-text at journal website] [PDF]

  • CV Than, NA Rosenberg (2013) Mathematical properties of the deep coalescence cost. IEEE/ACM Transactions on Computational Biology and Bioinformatics 10: 61-72. [Abstract] [PDF]

  • JH Degnan, NA Rosenberg, T Stadler (2012) A characterization of the set of species trees that produce anomalous ranked gene trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9: 1558-1568. [Abstract] [PDF]

  • JH Degnan, NA Rosenberg, T Stadler (2012) The probability distribution of ranked gene trees on a species tree. Mathematical Biosciences 235: 45-55. [Abstract] [PDF]

  • NA Rosenberg, JH Degnan (2010) Coalescent histories for discordant gene trees and species trees. Theoretical Population Biology 77: 145-151. [Abstract] [PDF]

A gene tree that disagrees with the species tree can have at least as many coalescent histories as a matching gene tree. See Rosenberg & Degnan (2010) for details.


Coalescent theory. The coalescent, a stochastic process that connects genealogical lineages to a common ancestor through a process of "coalescence" of lineage pairs, represents a natural framework for studying the evolutionary history underlying a genetic sample. We have been developing coalescent-based models to investigate a variety of population-genetic phenomena, particularly in a setting in which multiple populations are themselves related through a common ancestral population. Areas of recent interest include the use of the coalescent in genotype imputation for genetic association studies, coalescent theory for the study of human evolution, and the coalescent along the branches of a phylogenetic tree.
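As a minimal sketch of the basic process, the following Python code treats the standard coalescent for a single population of constant size, with time measured in coalescent units. While k lineages remain, the waiting time to the next coalescence is exponentially distributed with rate k(k-1)/2. The code computes the expected time to the most recent common ancestor exactly and compares it with a simulation. The constant-size, single-population setting and the function names are assumptions. The multi-population models described above are not reproduced here.

```python
import random
from fractions import Fraction


def expected_waiting_time(k):
    """Expected time during which exactly k lineages remain: 1 / C(k, 2)."""
    if k < 2:
        raise ValueError("need at least two lineages")
    return Fraction(2, k * (k - 1))


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages."""
    return sum(expected_waiting_time(k) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one time to the most recent common ancestor for n lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        # Exponential waiting time with rate C(k, 2) while k lineages remain.
        t += rng.expovariate(k * (k - 1) / 2)
    return t


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10
    draws = [simulate_tmrca(n, rng) for _ in range(20000)]
    print("exact expectation:", float(expected_tmrca(n)))
    print("simulated mean:   ", sum(draws) / len(draws))
```

The exact expectation simplifies to 2(1 - 1/n), so it approaches 2 coalescent units as the sample size grows. Most of that time is spent waiting for the final two lineages to coalesce.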

  • RS Mehta, D Bryant, NA Rosenberg (2016) The probability of monophyly of a sample of gene lineages on a species tree. Proceedings of the National Academy of Sciences 113: 8002-8009. [Abstract] [PDF] [Supplement] [Software]

  • EM Jewett, NA Rosenberg (2014) Theory and applications of a deterministic approximation to the coalescent model. Theoretical Population Biology 93: 14-29. [Abstract] [PDF]

  • L Huang, EO Buzbas, NA Rosenberg (2013) Genotype imputation in a coalescent model with infinitely-many-sites mutation. Theoretical Population Biology 87: 62-74. [Abstract] [PDF]

  • EM Jewett*, M Zawistowski*, NA Rosenberg, S Zöllner (2012) A coalescent model for genotype imputation. Genetics 191: 1239-1255. [Abstract] [PDF]

  • D Bryant, R Bouckaert, J Felsenstein, NA Rosenberg, A RoyChoudhury (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29: 1917-1932. [Abstract] [PDF] [Supplement]

  • JH Degnan, NA Rosenberg, T Stadler (2012) The probability distribution of ranked gene trees on a species tree. Mathematical Biosciences 235: 45-55. [Abstract] [PDF]

  • M DeGiorgio, JH Degnan, NA Rosenberg (2011) Coalescence-time distributions in a serial founder model of human evolutionary history. Genetics 189: 579-593. [Abstract] [PDF]

  • ZA Szpiech, NA Rosenberg (2011) On the size distribution of private microsatellite alleles. Theoretical Population Biology 80: 100-113. [Abstract] [PDF]

  • NA Rosenberg, M Nordborg (2002) Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms. Nature Reviews Genetics 3: 380-390. [Abstract] [PDF]

Population growth and coalescent waiting times. See Jewett, Zawistowski, Zöllner, and Rosenberg (2012) for details.


Inference of species trees under gene tree discordance. It has long been known that gene trees and species trees need not have the same shape. Surprisingly, we have found that gene tree discordance can be so great that under a standard model of within-species evolution, for any species tree topology with five or more species, there exist branch lengths for which the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This phenomenon of "anomalous gene trees" implies that when combining data on multiple loci, the simple procedure of using the most frequently observed gene tree topology to infer the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We have been exploring the properties of probability models for gene trees conditional on species trees, developing tools for inference of species trees in the setting of gene tree discordance, and analyzing their performance.
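For contrast with the anomalous case, the following Python sketch covers the simplest setting, three species, where the matching gene tree topology is always at least as probable as each discordant one. It uses the standard three-taxon result under the multispecies coalescent with one sampled lineage per species: for species tree ((A,B),C) with internal branch length T in coalescent units, the matching topology has probability 1 - (2/3)e^{-T} and each discordant topology has probability (1/3)e^{-T}. The function name is hypothetical, and the sketch does not reproduce the results for larger trees described above.

```python
from math import exp


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t: length of the internal species-tree branch, in coalescent units.
    Assumes one sampled lineage per species.
    """
    if t < 0:
        raise ValueError("branch length must be nonnegative")
    # Probability of each discordant topology: the A and B lineages fail
    # to coalesce on the internal branch (probability exp(-t)), and then
    # one particular pair among three coalesces first (probability 1/3).
    discordant = exp(-t) / 3
    return {
        "((A,B),C)": 1 - 2 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0, 3.0):
        print(t, three_taxon_gene_tree_probs(t))
```

For three species, the most frequently observed gene tree topology therefore points to the correct species tree whenever T > 0, although many loci may be needed when T is small. Anomalous gene trees arise only with more taxa, where no such guarantee holds.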

  • LH Uricchio, T Warnow, NA Rosenberg (2016) An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees. BMC Bioinformatics 17: 417. [Abstract] [PDF]

  • M DeGiorgio, NA Rosenberg (2016) Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure. Theoretical Population Biology 110: 12-24. [Abstract] [PDF]

  • D Bryant, R Bouckaert, J Felsenstein, NA Rosenberg, A RoyChoudhury (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29: 1917-1932. [Abstract] [PDF] [Supplement]

  • LJ Helmkamp, EM Jewett, NA Rosenberg (2012) Improvements to a class of distance matrix methods for inferring species trees from gene trees. Journal of Computational Biology 19: 632-649. [Abstract] [PDF]

  • EM Jewett, NA Rosenberg (2012) iGLASS: an improvement to the GLASS method for estimating species trees from gene trees. Journal of Computational Biology 19: 293-315. [Abstract] [PDF]

  • CV Than, NA Rosenberg (2011) Consistency properties of species tree inference by minimizing deep coalescences. Journal of Computational Biology 18: 1-15. [Abstract] [PDF]

  • JH Degnan, M DeGiorgio, D Bryant, NA Rosenberg (2009) Properties of consensus methods for inferring species trees from gene trees. Systematic Biology 58: 35-54. [Abstract] [PDF]

  • JH Degnan, NA Rosenberg (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology and Evolution 24: 332-340. [Abstract] [PDF] [Supplement]


An inductive proof that all species tree topologies with five or more taxa have anomalous gene trees. See Degnan & Rosenberg (2006) and Rosenberg (2013) for details.