Theory research involves formulating and solving mathematical problems motivated by biological scenarios, and interpreting the mathematical results for their contributions to biology. Our theoretical advances often center on mathematical models: constructing and analyzing new models, deriving new results about existing models, developing new techniques for analyzing models, and comparing models. Progress can also come from mathematical analyses of statistical methods, numerical studies and simulations, or the introduction of new theoretical principles.

Mathematical properties of population-genetic statistics. Many of the statistics used in population genetics are functions of the allele frequencies at a locus, a discrete set of nonnegative numbers that sum to one. This constraint on allele frequencies produces surprising behavior in some of the most popular population-genetic statistics, such as homozygosity and heterozygosity, the FST measure of genetic differentiation, and the r² statistic for linkage disequilibrium. For example, the upper and lower bounds on homozygosity vary as a function of the frequency of the most frequent allele at a locus, the upper bound on FST varies with the homozygosity of a locus, and both upper bounds depend on the number of distinct alleles at a locus. All of these dependencies can be viewed as epiphenomena of the mathematical properties of the statistics. To facilitate sensible biological interpretations of observed values of these statistics, we have been exploring their mathematical properties. This mathematical work explains a number of peculiar patterns seen in past applications of the statistics to population-genetic data.

The strict upper bound on the value of FST at a locus given the frequency of the most frequent allele. See Jakobsson, Edge, and Rosenberg (2013) for details.
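A minimal Python sketch of two of the constraints described above, offered for illustration rather than as the code behind the cited papers; the function names are hypothetical. `homozygosity_bounds` uses elementary bounds on the homozygosity J (the sum of squared allele frequencies) given the largest frequency m: the maximum comes from giving as many alleles as possible frequency m and the leftover mass to one more allele, and with k alleles the minimum comes from splitting 1 - m evenly among the other k - 1 alleles. `fst` assumes equally weighted subpopulations and the definition FST = (HT - HS)/HT; because HT is at most 1, FST can never exceed 1 - HS, the mean within-subpopulation homozygosity.

```python
import math


def homozygosity(freqs):
    """Homozygosity J: the sum of squared allele frequencies at one locus."""
    return sum(p * p for p in freqs)


def homozygosity_bounds(m, k=None):
    """Lower and upper bounds on J given the most frequent allele's frequency m.

    Upper bound: floor(1/m) alleles at frequency m, the remainder on one more.
    Lower bound: with k alleles (k used only here), split 1 - m evenly over
    the other k - 1 alleles, which requires m >= 1/k. With no limit on k,
    the infimum is m**2 (approached as the remainder is spread ever thinner).
    """
    n_full = math.floor(1.0 / m + 1e-12)  # tolerance guards float rounding
    remainder = max(0.0, 1.0 - n_full * m)
    upper = n_full * m * m + remainder * remainder
    if k is None:
        lower = m * m
    elif m < 1.0 / k:
        raise ValueError("m must be at least 1/k")
    elif k == 1:
        lower = 1.0
    else:
        lower = m * m + (1.0 - m) ** 2 / (k - 1)
    return lower, upper


def fst(subpop_freqs):
    """FST = (HT - HS) / HT for equally weighted subpopulations.

    subpop_freqs[j][i] is the frequency of allele i in subpopulation j.
    """
    n = len(subpop_freqs)
    # HS: mean expected heterozygosity within subpopulations.
    hs = sum(1.0 - homozygosity(p) for p in subpop_freqs) / n
    # HT: expected heterozygosity of the pooled allele frequencies.
    pooled = [sum(column) / n for column in zip(*subpop_freqs)]
    ht = 1.0 - homozygosity(pooled)
    return (ht - hs) / ht
```

For example, the locus with frequencies [0.5, 0.3, 0.2] has J = 0.38, which lies between the bounds 0.375 and 0.5 returned for m = 0.5 and k = 3. Two subpopulations with frequencies [0.5, 0.5, 0, 0] and [0, 0, 0.5, 0.5] share no alleles, yet FST is only 1/3, because their within-subpopulation homozygosity of 0.5 caps it at 0.5.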

Theoretical population genetics of admixture. When members of two or more long-separated groups mate, new admixed populations can form. Admixture is widespread in human populations, as a result of complex histories of migration, conquest, enslavement, and ongoing cultural interactions. A popular population-genetic model treats allele frequencies in an admixed population as linear combinations of the allele frequencies in its source populations, weighting each source frequency by the admixture coefficient of that source population. We have examined a number of features of this admixture model in relation to the FST measure of genetic differentiation, statistics for measuring ancestry information content, and neighbor-joining inference of population trees. Further, we have extended beyond the statistical model of admixture to develop a mechanistic model that accounts for varying contributions of different source populations over time. This model makes it possible to assess how different admixture histories affect the pattern of admixture across individuals, and we are using it to analyze the history and structure of admixture in a variety of admixed human populations.

A neighbor-joining tree illustrating the interior placement of admixed populations in relation to populations from source regions. See Kopelman, Stone, Gascuel, and Rosenberg (2013) for details.
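The linear admixture model described above can be sketched in a few lines of Python. The function name and argument layout are assumptions for illustration: `source_freqs[j]` lists the allele frequencies in source population j, and `admixture_coeffs[j]` is the fraction of ancestry from that source.

```python
def admixed_allele_frequencies(source_freqs, admixture_coeffs):
    """Allele frequencies of an admixed population under the linear model.

    Each admixed frequency is the admixture-weighted sum of the
    corresponding source-population frequencies.
    """
    if any(c < 0 for c in admixture_coeffs):
        raise ValueError("admixture coefficients must be nonnegative")
    if abs(sum(admixture_coeffs) - 1.0) > 1e-9:
        raise ValueError("admixture coefficients must sum to 1")
    n_alleles = len(source_freqs[0])
    return [
        sum(c * freqs[i] for c, freqs in zip(admixture_coeffs, source_freqs))
        for i in range(n_alleles)
    ]


# Two sources, 25% and 75% of ancestry respectively.
print(admixed_allele_frequencies([[0.2, 0.8], [0.6, 0.4]], [0.25, 0.75]))
```

Because each admixed frequency is a weighted average of source frequencies with nonnegative weights summing to one, it always lies between the smallest and largest source frequency for that allele. This gives one intuition, though not a full account, for admixed populations appearing at intermediate, interior positions in population trees such as the neighbor-joining tree in the preceding figure.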

Human migration and spatial expansion. The genomes of living humans carry information about past human migrations. Patterns of genetic diversity and similarity among individuals and populations reflect a complex history of such phenomena as migration, natural selection, and changes in population size. Because population-genetic models of migration and spatial expansion predict features of extant genetic variation under assumptions about the evolutionary processes at work, they can help connect extant genetic variation to past evolutionary processes. We have been developing and studying models of population migration with the aim of understanding the processes that have been active during human evolution, particularly since the advent of anatomically modern humans. Recent interests include assessments of global models of human migration, evaluations of spatial patterns of genetic variation, and approaches for making use of genome-scale data.

A schematic of a serial founder model for human migrations out of Africa. See DeGiorgio, Degnan, and Rosenberg (2011) for details.
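A deliberately simplified sketch of why heterozygosity declines along a chain of founder events, assuming each colony is founded by a diploid Wright-Fisher population of constant size N that drifts for T generations, with no mutation and no migration between colonies. Under these assumptions expected heterozygosity is multiplied by (1 - 1/(2N)) each generation, so each founder event multiplies it by (1 - 1/(2N))^T. The function and parameter names are hypothetical, and the sketch is not meant to reproduce the model in the cited work.

```python
def serial_founder_heterozygosity(h0, n_founder, generations_per_bottleneck, n_colonies):
    """Expected heterozygosity in the source population and each successive colony.

    Assumes pure genetic drift (no mutation, no migration) in a diploid
    Wright-Fisher population of size n_founder for a fixed number of
    generations after each founding event.
    """
    # Per-founder-event loss of expected heterozygosity under drift alone.
    decay = (1.0 - 1.0 / (2.0 * n_founder)) ** generations_per_bottleneck
    values = [h0]
    for _ in range(n_colonies):
        # Each new colony is founded from the previous one and then drifts.
        values.append(values[-1] * decay)
    return values


print(serial_founder_heterozygosity(0.8, n_founder=10, generations_per_bottleneck=1, n_colonies=3))
```

With h0 = 0.8, a founding size of 10, and one generation of drift per founder event, expected heterozygosity falls to 0.76, 0.722, and 0.6859 in the first three colonies, a steady decline with the number of founder events.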

Consanguinity, identity by descent, relatedness, and runs of homozygosity. Genomic data enable new approaches for studies of genetic relationships and patterns of individual genomic sharing. For example, human individuals possess long stretches of their genomes in which the genomic copies inherited from their two parents are genetically identical. These runs of homozygosity (ROH) reflect a variety of different processes, such as pairing of identical ancient haplotypes, background levels of relatedness among individuals within a population, and recent parental relatedness. We have been characterizing runs of homozygosity, their differences across human populations, and their connection with such processes as inbreeding, linkage disequilibrium, and the amplification of deleterious variation. We are also devising new approaches for assessing patterns of variation in data sets with a high level of genetic relatedness. Studies of ROH and relatedness contribute to such topics as clinical genomic testing, conservation genetics, and identification of genes for rare recessive diseases.
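A minimal sketch of the basic idea behind detecting runs of homozygosity, under simplifying assumptions: genotypes at consecutive SNPs are coded as 0, 1, or 2 copies of an alternate allele, homozygous calls are 0 or 2, and a run qualifies if it spans at least `min_length` consecutive homozygous SNPs. The function name is hypothetical, and this is not a description of any particular published ROH caller; practical methods measure runs in physical or genetic distance, scan with windows, and tolerate genotyping errors and missing data.

```python
def find_runs_of_homozygosity(genotypes, min_length):
    """Return (start, end) index pairs, inclusive, of homozygous runs.

    genotypes: sequence of 0/1/2 calls at consecutive SNPs.
    min_length: minimum number of consecutive homozygous SNPs in a run.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run, if any.
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs


print(find_runs_of_homozygosity([0, 0, 2, 1, 2, 2, 2, 0, 1, 0], min_length=3))
```

Here the heterozygous calls at positions 3 and 8 split the sequence, leaving two qualifying runs, at positions 0 to 2 and 4 to 7; the single homozygous call at the end is too short to count.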

Combinatorics of evolutionary trees. Evolution within populations gives rise to trees of genetic lineages. When multiple species related by a species tree are considered, gene trees can differ in topology from each other and from the species tree on which they evolve. The joint analysis of gene trees and species trees then gives rise to consideration of a number of characteristic mathematical objects, such as coalescent histories and deep coalescences. Given a gene tree and a species tree, a coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. A deep coalescence is tabulated when a pair of gene lineages fails to coalesce along a branch of the species tree. We have been examining how coalescent histories, deep coalescences, and other combinatorial features of gene trees and species trees generate both problems of mathematical interest and insights into the development and performance of methods for the inference of species trees.

A gene tree that disagrees with the species tree can have as many coalescent histories as a matching gene tree, or more. See Rosenberg & Degnan (2010) for details.
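The two combinatorial objects defined above can be computed by brute-force recursion on small trees. This Python sketch assumes rooted binary trees written as nested tuples of taxon labels, one sampled lineage per species, and identical labels in the gene tree and species tree; all function names are hypothetical. `count_coalescent_histories` places each gene-tree coalescence on a species-tree branch whose population is ancestral to all taxa below that coalescence, and never above the branch of its parent coalescence. `count_deep_coalescences` uses one common convention, assumed here: for each non-root species-tree branch, count the gene lineages that exit it when every coalescence happens as early as the gene tree permits, and add that count minus one.

```python
def leaf_set(tree):
    """Set of taxon labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree):
    """Leaf sets of every node (leaves included) of a nested-tuple tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out = []
    for child in tree:
        out.extend(clusters(child))
    out.append(leaf_set(tree))
    return out


def count_coalescent_histories(gene_tree, species_tree):
    # Each internal species clade u stands for the branch above node u
    # (for the root, the unbounded branch above the root).
    clades = [c for c in clusters(species_tree) if len(c) > 1]

    def ways(gnode, u):
        # Histories of the gene subtree at gnode, given gnode coalesces on branch u.
        total = 1
        for child in gnode:
            if isinstance(child, str):
                continue  # a single lineage needs no coalescence
            child_leaves = leaf_set(child)
            # Branch v must be ancestral to all of the child's taxa
            # (child_leaves <= v) and at or below the parent's branch (v <= u).
            total *= sum(ways(child, v) for v in clades if child_leaves <= v <= u)
        return total

    root_leaves = leaf_set(gene_tree)
    return sum(ways(gene_tree, u) for u in clades if root_leaves <= u)


def count_deep_coalescences(gene_tree, species_tree):
    gene_clusters = set(clusters(gene_tree))
    root = leaf_set(species_tree)
    total = 0
    for u in set(clusters(species_tree)):
        if u == root:
            continue
        # Maximal gene clusters inside u = lineages exiting the branch above u.
        inside = [g for g in gene_clusters if g <= u]
        maximal = [g for g in inside if not any(g < h for h in inside)]
        total += len(maximal) - 1
    return total
```

For the four-taxon caterpillar species tree (((A,B),C),D), the matching gene tree has 5 coalescent histories and no deep coalescences, while the nonmatching gene tree ((A,B),(C,D)) has 3 histories and 1 deep coalescence. These small examples happen to favor the matching gene tree; the preceding figure concerns cases where that ordering does not hold.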

Coalescent theory. The coalescent, a stochastic process that connects genealogical lineages to a common ancestor through a process of "coalescence" of lineage pairs, provides a natural framework for studying the evolutionary history underlying a genetic sample. We have been developing coalescent-based models to investigate a variety of population-genetic phenomena, particularly in a setting in which multiple populations are themselves related through a common ancestral population. Recent areas of interest include the use of the coalescent in genotype imputation for genetic association studies, coalescent theory for the study of human evolution, and the coalescent along the branches of a phylogenetic tree.

Population growth and coalescent waiting times. See Jewett, Zawistowski, Zöllner, and Rosenberg (2012) for details.
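A minimal sketch of coalescent waiting times, assuming the standard coalescent for a single population of constant size, with time measured in units of 2N generations for a diploid population of size N. While k lineages remain, the waiting time until the next coalescence is exponential with rate k(k - 1)/2, so the expected time to the most recent common ancestor (TMRCA) of n lineages is the sum of 2/(k(k - 1)) for k = 2 to n, which telescopes to 2(1 - 1/n). Population growth, the subject of the figure above, changes these waiting times and is not modeled here; the function names are hypothetical.

```python
import random


def expected_tmrca(n):
    """Expected TMRCA of n lineages under the constant-size standard coalescent.

    With k lineages the next coalescence waits an exponential time with
    rate k(k-1)/2, whose mean is 2/(k(k-1)).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA by summing exponential waiting times from n lineages down to 1."""
    total = 0.0
    for k in range(n, 1, -1):
        total += rng.expovariate(k * (k - 1) / 2.0)
    return total


rng = random.Random(1)
draws = [simulate_tmrca(10, rng) for _ in range(20000)]
print(expected_tmrca(10), sum(draws) / len(draws))
```

Because the final wait with two lineages has mean 1, it accounts for at least half of the expected TMRCA for every sample size, and the expected TMRCA approaches 2 as the sample grows without ever exceeding it.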

Inference of species trees under gene tree discordance. It has long been known that gene trees and species trees need not have the same shape. Surprisingly, we have found that gene tree discordance can be so great that under a standard model of within-species evolution, for any species tree topology with five or more species, there exist branch lengths for which the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This phenomenon of "anomalous gene trees" implies that when data from multiple loci are combined, the simple procedure of inferring the species tree topology as the most frequently observed gene tree topology is, for such branch lengths, guaranteed to produce an incorrect estimate as the number of loci grows. We have been exploring the properties of probability models for gene trees conditional on species trees, developing tools for inference of species trees in the setting of gene tree discordance, and analyzing their performance.

An inductive proof that all species tree topologies with five or more taxa have anomalous gene trees. See Degnan & Rosenberg (2006) and Rosenberg (2013) for details.
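To see why anomalous gene trees require species trees larger than three taxa, consider the three-taxon case, sketched here under the standard multispecies coalescent with one lineage sampled per species. For species tree ((A,B),C) with internal branch length t in coalescent units, the A and B lineages fail to coalesce on the internal branch with probability e^(-t); if they fail, each of the three lineage pairs is equally likely to coalesce first in the ancestral population. The function name and topology strings are illustrative.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t: internal branch length in coalescent units (t >= 0).
    """
    if t < 0:
        raise ValueError("branch length must be nonnegative")
    miss = math.exp(-t)  # probability A and B do not coalesce on the internal branch
    return {
        "((A,B),C)": 1.0 - 2.0 * miss / 3.0,  # coalesce early, or pick (A,B) later
        "((A,C),B)": miss / 3.0,
        "((B,C),A)": miss / 3.0,
    }


print(three_taxon_gene_tree_probs(0.1))
```

Since 1 - 2x/3 > x/3 whenever x = e^(-t) < 1, the matching topology is the most probable one for every t > 0, so no three-taxon species tree has anomalous gene trees. Showing that anomalies arise for larger trees requires general gene tree probability calculations, which this sketch does not attempt.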