Rosenberg lab at Stanford University

Research

The research of the lab is in the general areas of evolutionary biology, population genetics, and phylogenetics. Our work largely consists of mathematical modeling and theory. We also engage in the development and implementation of new computational biology algorithms and statistical approaches, and in the use of biological problems to derive general advances in mathematics, statistics, and computational science.

Read about some of the specific areas of theory under active research in the lab...

Examine a classification of our published articles by subtopic...


Research themes of particular current interest

Major interests of the lab have included mathematical models in population genetics, mathematical analysis of statistics used in population genetics, mathematical phylogenetics, inference of human evolutionary history using genetic markers, and the relationship between human population genetics and the search for disease genes. Methodological approaches span diverse areas of mathematics, statistics, and computational science.

The list below describes themes that represent areas of current emphasis (April 2025). The theory research page describes a number of these topics in more detail. Trainees interested in joining the lab are encouraged to focus their interest on one or more of these areas.



Major research directions

Mathematical models in population genetics [entry point: theory research site]
We are interested in mathematical population genetics and in understanding how the various forces of evolution influence patterns of genetic variation. A focus is often on population-genetic theory for recently diverged populations or species. We are interested in how mathematical theory enables predictions about population-genetic data and how it can therefore aid in the development of statistical methods for analyzing these data. Our theoretical population genetics research considers dynamical and probabilistic models of populations as well as mathematical properties of the statistics used in population-genetic data analysis.

Mathematics of evolutionary trees [entry point: Degnan & Rosenberg (2009) review and theory research site]
Evolutionary descent follows tree-like processes that generate a variety of combinatorial structures of biological and mathematical interest. We are interested in understanding the various discrete structures that emerge in the study of evolutionary trees, and in deriving mathematical and biological knowledge from these structures. A particular interest concerns "gene trees and species trees." For closely related species, the evolutionary history of an individual gene need not reflect the history of species divergences. Partly because of this phenomenon of gene tree discordance, phylogenies reconstructed from different parts of a genome can suggest different relationships among the various species examined. We are developing theory that makes predictions about gene tree discordance, and we also study statistical methods for phylogenetic inference in closely related species.
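As a small illustration of the kind of prediction such theory makes, the sketch below computes the probability of gene tree discordance in the simplest case: three species, one sampled lineage per species, under the standard multispecies coalescent. The function name and the parameter `t` (the internal species-tree branch length in coalescent time units) are chosen here for illustration and do not come from the lab's software.

```python
import math


def discordance_probability(t: float) -> float:
    """Probability that a three-taxon gene tree topology disagrees with
    the species tree, assuming the standard multispecies coalescent.

    t is the length of the internal species-tree branch in coalescent
    units (hypothetical parameter name, used only for this sketch).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # The two lineages fail to coalesce on the internal branch with
    # probability exp(-t); in that case, 2 of the 3 equally likely
    # pairings in the ancestral population give a discordant topology.
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(f"t = {t:.1f}: P(discordant) = {discordance_probability(t):.4f}")
```

Shorter internal branches, as expected for closely related species, give discordance probabilities approaching 2/3.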

Human variation and inference of human evolutionary history from genetic markers [entry point: Rosenberg (2011) review, republished in 2020 with a new foreword]
The genomes of individuals in a species record features of the history of the species. We are interested in understanding the geographic distribution of human genetic variation and in devising and applying statistical methods that use this variation to make inferences about human evolutionary history. We are broadly interested in the properties of statistical methods for analyzing genetic variation and in inferring genetic history, both from human data and from various other organisms.

The relationship of human population genetics to the search for disease-susceptibility genes [entry point: Rosenberg et al. (2010) review; Edge et al. (2013) review; Rosenberg et al. (2019) commentary]
The pattern of variation of a genetic marker in diseased and non-diseased individuals can potentially be used to identify a disease association with the marker. However, the history of the human population can affect the strength of the signal of association between markers and disease, as well as the replicability of observed associations across studies. We seek to understand the role of population-genetic factors in efforts to locate disease-susceptibility genes, and the effects of an understanding of human evolutionary history on such efforts.



Ten recent mini-collections of articles (2025)

  • Mathematical modeling and theory in population biology: what are the unique contributions and roles of mathematical modeling and theory in population biology? Essays, editorials, and commentaries trace the purposes and influences of mathematical modeling.
    [Rosenberg (2020)] [Kim et al. (2021)] [Rosenberg (2021b)] [Rosenberg & Boni (2022)] [Rosenberg (2022)] [Rosenberg et al. (2025)]

  • Consanguinity, runs of homozygosity, and identity by descent: consanguineous unions influence pedigrees and gene genealogies by increasing the probability that offspring receive two identical genetic lineages via two paths from the same ancestor. Mathematical models of consanguinity examine its effect on features of current interest in genomes: runs of homozygosity and long identical-by-descent segments.
    [Severson et al. (2019)] [Severson et al. (2021)] [Cotter et al. (2021)] [Cotter et al. (2022)] [Cotter et al. (2024)]

  • Mathematics of the measurement of variation in population genetics and community ecology: population genetics and community ecology both characterize variation from categorical data, in which individual observations are classified into one of a number of discrete categories (e.g., allelic types or taxonomic groups). Mathematical properties of the statistics used in these measurements guide interpretations of the resulting values.
    [Morrison et al. (2022)] [Morrison & Rosenberg (2023)] [Gress & Rosenberg (2024)] [Morrison et al. (2025)]

  • Coalescent theory, summary statistics, and tree balance: summaries of particular branch lengths, clades, and branch length ratios help to characterize evolutionary trees, toward understanding features of the processes that have produced those trees. Mathematical studies seek to analyze the properties of evolutionary trees under probabilistic models for how evolution takes place.
    [Kim et al. (2020)] [King & Rosenberg (2021)] [Alimpiev & Rosenberg (2022)] [Lappo & Rosenberg (2022)] [Lappo & Rosenberg (2025)]

  • Mixed-membership unsupervised clustering: mixed-membership unsupervised clustering organizes features of genetic variation into clusters that can assist in describing the genetic ancestry of a population. A series of studies examines the measurement of membership variation across individuals and the alignment of replicate cluster analyses, a task made difficult by the fact that unlabeled clusters have no prior meaning.
    [Morrison et al. (2022)] [Liu et al. (2023)] [Liu et al. (2024)]

  • Combinatorics of galled trees: the galled trees are among the simplest classes of phylogenetic networks. Their combinatorial study includes enumerations of unlabeled and labeled time-consistent ("normal") galled trees, and enumerations of the labeled histories compatible with a labeled galled tree.
    [Mathur & Rosenberg (2023)] [Agranat-Tamir et al. (2024a)] [Agranat-Tamir et al. (2024c)]

  • Bijective encodings of rooted unlabeled trees with integers: bijective encodings of trees by integers provide convenient systems of representation, often facilitating analyses of tree balance. Combinatorial studies describe properties of such an encoding, including the minimum and maximum among integers associated with some tree with a fixed number of leaves, and extensions to multifurcating trees.
    [Kim et al. (2020)] [Rosenberg (2021a)] [Maranca & Rosenberg (2024)] [Doboli et al. (2024)] [Devroye et al. (2025)]

  • Labeled histories: the labeled histories are the labeled sequences of coalescences in a tree that unfolds in time. Combinatorial studies examine the labeled histories for a variety of settings: galled trees, multifurcating trees, trees with simultaneity, and analogies to sports tournaments.
    [Mathur & Rosenberg (2023)] [King & Rosenberg (2023)] [Dickey & Rosenberg (2025)]

  • Genetic record-matching for forensic genetics: genetic profiles containing disjoint sets of genetic markers can be identified as belonging to the same individual, with consequent forensic and privacy implications. Computational studies seek to evaluate the extent to which such matches can be made.
    [Edge et al. (2017)] [Kim et al. (2018)] [Kim & Rosenberg (2023)]

  • Admixture and genealogical ancestors: in admixed individuals, genealogical and genetic ancestors from the source populations can be counted in mathematical models. The computations have implications for the study of the history of admixed populations.
    [Kim et al. (2021)] [Mooney et al. (2023)] [Agranat-Tamir et al. (2024b)]
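
As a concrete illustration of the labeled histories described in the list above, the following sketch counts them in the simplest classical setting: strictly bifurcating trees on n labeled leaves, with no simultaneous coalescences. Going backward in time, any pair among the k remaining lineages can coalesce, which gives a product of binomial coefficients. The function name is hypothetical, and the sketch does not cover the galled-tree, multifurcating, or simultaneity settings studied in the cited articles.

```python
from math import comb


def count_labeled_histories(n: int) -> int:
    """Count labeled histories for n labeled leaves, assuming strictly
    bifurcating trees and no simultaneous coalescences.

    Equivalent to n! * (n - 1)! / 2**(n - 1).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = 1
    # With k lineages remaining, any of the C(k, 2) pairs may coalesce next.
    for k in range(2, n + 1):
        total *= comb(k, 2)
    return total


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_labeled_histories(n))
```

For example, three leaves give 3 labeled histories and four leaves give 18.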