Research
The research of the lab is in the general areas of evolutionary biology,
population genetics, and phylogenetics. Our work largely consists of
mathematical modeling and theory. We also engage in the development and
implementation of new computational biology algorithms and statistical
approaches, and in the use of biological problems to derive general
advances in mathematics, statistics, and computational science.
Read about some of the specific
areas of theory under active research in the lab...
Examine a classification of our published
articles by subtopic...
Research themes of particular current interest
Major interests of the lab have included mathematical models in
population genetics, mathematical analysis of statistics used in
population genetics, mathematical phylogenetics, inference of human
evolutionary history using genetic markers, and the relationship between
human population genetics and the search for disease
genes. Methodological approaches span diverse areas of mathematics,
statistics, and computational science.
The list below describes themes that represent areas of current emphasis
(April 2025). The theory research page describes
a number of these topics in more detail. Trainees interested in joining
the lab are encouraged to focus their interest on one or more of these
areas.
Major research directions
Mathematical models in population genetics
[entry point: theory research site]
We are interested in mathematical population genetics and in
understanding how the various forces of evolution influence patterns of
genetic variation. A focus is often on population-genetic theory for
recently diverged populations or species. We are interested in how
mathematical theory enables predictions about population-genetic data and
how it can therefore aid in the development of statistical methods for
analyzing these data. Our theoretical population genetics research
considers dynamical and probabilistic models of populations as well as
mathematical properties of the statistics used in population-genetic data
analysis.
Mathematics of evolutionary trees
[entry point: Degnan &
Rosenberg (2009) review and theory research site]
Evolutionary descent follows tree-like processes that generate a
variety of combinatorial structures of biological and mathematical
interest. We are interested in understanding the various discrete
structures that emerge in the study of evolutionary trees, and in deriving
mathematical and biological knowledge from these structures. A particular
interest concerns "gene trees and species trees." For closely related
species, the evolutionary history of an individual gene need not reflect
the history of species divergences. Partly because of this phenomenon of
gene tree discordance, phylogenies reconstructed from different parts of a
genome can suggest different relationships among the various species
examined. We are developing theory that makes predictions about gene tree
discordance, and we also study statistical methods for phylogenetic
inference in closely related species.
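As a minimal illustration of the kind of prediction such theory makes, the sketch below uses the standard multispecies coalescent result for three species: a gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the length of the internal species-tree branch in coalescent units. The function name and the returned dictionary keys are hypothetical choices made for this example, which is not drawn from the lab's own software.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for a three-species tree.

    t is the internal branch length of the species tree in coalescent
    units (an assumption of this sketch: one sampled lineage per species,
    no gene flow, constant population size along the branch).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Each of the two discordant topologies arises with probability e^(-t)/3.
    discordant_each = math.exp(-t) / 3.0
    # The concordant topology takes the remaining probability.
    concordant = 1.0 - 2.0 * discordant_each
    return {"concordant": concordant, "discordant_each": discordant_each}


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        probs = three_taxon_gene_tree_probs(t)
        print(t, round(probs["concordant"], 4), round(probs["discordant_each"], 4))
```

With a short internal branch, the three topologies are nearly equally likely, so discordance is expected to be common among closely related species. As the branch lengthens, the concordant topology dominates.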
Human variation and inference of human evolutionary history from
genetic markers
[entry point: Rosenberg (2011) review, republished in 2020 with a new
foreword]
The genomes of individuals in a species record features of the history
of the species. We are interested in understanding the geographic
distribution of human genetic variation and in devising and applying
statistical methods that use this variation to make inferences about human
evolutionary history. We are broadly interested in the properties of
statistical methods for analyzing genetic variation and inferring
genetic history, both in humans and in various other organisms.
The relationship of human population genetics to the search for
disease-susceptibility genes
[entry point: Rosenberg et al. (2010) review; Edge et al. (2013) review;
Rosenberg et al. (2019) commentary]
The pattern of variation of a genetic marker in diseased and
non-diseased individuals can potentially be used to identify a disease
association with the marker. However, the history of the human population
can affect the strength of the signal of association between markers and
disease, as well as the replicability of observed associations across
studies. We seek to understand the role of population-genetic factors in
efforts to locate disease-susceptibility genes, and how an understanding
of human evolutionary history can inform such efforts.
Ten recent mini-collections of articles (2025)
- Mathematical modeling and theory in population biology:
what are the unique contributions and roles of mathematical modeling
and theory in population biology? Essays, editorials, and commentaries trace the
purposes and influences of mathematical modeling.
[Rosenberg (2020)]
[Kim et al. (2021)]
[Rosenberg (2021b)]
[Rosenberg & Boni
(2022)] [Rosenberg
(2022)]
[Rosenberg et al. (2025)]
- Consanguinity, runs of homozygosity, and identity by
descent: consanguineous unions influence pedigrees and gene
genealogies by increasing the probability that offspring receive two identical
genetic lineages via two paths from the same ancestor. Mathematical
models of consanguinity examine its effect on features of current
interest in genomes: runs of homozygosity and long
identical-by-descent segments.
[Severson et
al. (2019)] [Severson et
al. (2021)] [Cotter et al.
(2021)] [Cotter et
al. (2022)] [Cotter et
al. (2024)]
- Mathematics of the measurement of variation in population
genetics and community ecology: population genetics and community
ecology both characterize variation from categorical data, in which
individual observations are classified in one of a number of discrete
categories (e.g. allelic types or taxonomic groups). Mathematical
properties of the statistics used in these measurements guide
interpretations of the resulting values.
[Morrison et
al. (2022)]
[Morrison &
Rosenberg (2023)]
[Gress & Rosenberg
(2024)] [Morrison et
al. (2025)]
- Coalescent theory, summary statistics, and tree balance:
summaries of particular branch lengths, clades, and branch length
ratios help to characterize evolutionary trees, toward understanding
features of the processes that have produced those trees. Mathematical
studies seek to analyze the properties of evolutionary trees under
probabilistic models for how evolution takes place.
[Kim et al. (2020)]
[King & Rosenberg
(2021)] [Alimpiev &
Rosenberg (2022)]
[Lappo & Rosenberg
(2022)] [Lappo &
Rosenberg (2025)]
- Mixed-membership unsupervised clustering: mixed-membership
unsupervised clustering organizes features of genetic variation into
clusters that can assist in describing the genetic ancestry of a
population. A series of studies examines the measurement of membership
variation across individuals and the alignment of replicate cluster
analyses, a task made difficult by the fact that the clusters are
unlabeled and have no prior meaning.
[Morrison et al. (2022)]
[Liu et al. (2023)]
[Liu et al. (2024)]
- Combinatorics of galled trees: the galled trees are
among the simplest classes of phylogenetic networks. Their
combinatorial study includes enumerations of unlabeled and labeled
time-consistent ("normal") galled trees, and enumerations of the
labeled histories compatible with a labeled galled tree.
[Mathur & Rosenberg
(2023)]
[Agranat-Tamir et
al. (2024a)]
[Agranat-Tamir et al. (2024c)]
- Bijective encodings of rooted unlabeled trees with integers:
bijective encodings of trees by integers provide convenient systems of
representation, often facilitating analyses of tree
balance. Combinatorial studies describe properties of such an encoding,
including the minimum and maximum integers associated with trees with a
fixed number of leaves, and extensions to multifurcating trees.
[Kim et
al. (2020)] [Rosenberg
(2021a)] [Maranca
& Rosenberg (2024)]
[Doboli et al. (2024)]
[Devroye et al. (2025)]
- Labeled histories: the labeled histories are the labeled
sequences of coalescences in a tree that unfolds in
time. Combinatorial studies examine the labeled histories in a
variety of settings: galled trees, multifurcating trees, trees with
simultaneity, and analogies to sports tournaments.
[Mathur &
Rosenberg (2023)]
[King & Rosenberg
(2023)]
[Dickey & Rosenberg
(2025)]
- Genetic record-matching for forensic genetics: genetic
profiles containing disjoint sets of genetic markers can be identified
as belonging to the same individual, with consequent forensic and
privacy implications. Computational studies seek to evaluate the
extent to which such matches can be made.
[Edge et al. (2017)]
[Kim et al. (2018)]
[Kim & Rosenberg (2023)]
- Admixture and genealogical ancestors: in admixed individuals,
genealogical and genetic ancestors from the source
populations can be counted in mathematical models. The
computations have implications for the study of the history
of admixed populations.
[Kim et al. (2021)]
[Mooney et al. (2023)]
[Agranat-Tamir et al. (2024b)]
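
To make the notion of a labeled history from the list above concrete, the following sketch counts the labeled histories of a strictly bifurcating labeled tree, that is, the number of distinct orders in which its coalescences can occur in time. It also evaluates the classical total n!(n-1)!/2^(n-1) over all labeled trees with n leaves. The nested-tuple representation and the function names are assumptions made for illustration. The sketch covers only the basic binary case without simultaneous events, not the galled, multifurcating, or tournament settings studied in the cited articles.

```python
from math import comb, factorial


def count_histories(tree):
    """Return (internal_nodes, labeled_histories) for a binary tree.

    A leaf is any non-tuple label; an internal node is a 2-tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return 0, 1  # a single leaf has no coalescences and one (empty) history
    if len(tree) != 2:
        raise ValueError("only strictly bifurcating trees are supported")
    left_nodes, left_hist = count_histories(tree[0])
    right_nodes, right_hist = count_histories(tree[1])
    # Coalescences in the two subtrees can be interleaved in any order that
    # preserves the order within each subtree; the root coalescence is last.
    interleavings = comb(left_nodes + right_nodes, left_nodes)
    return left_nodes + right_nodes + 1, interleavings * left_hist * right_hist


def total_labeled_histories(n):
    """Number of labeled histories over all labeled binary trees with n leaves."""
    if n < 2:
        raise ValueError("need at least two leaves")
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    caterpillar = ((("a", "b"), "c"), "d")
    print(count_histories(balanced)[1])     # two orders for the two cherries
    print(count_histories(caterpillar)[1])  # a single forced order
    print(total_labeled_histories(4))
```

For four leaves, the 3 balanced labeled topologies contribute 2 histories each and the 12 caterpillar labeled topologies contribute 1 each, which agrees with the total of 18 from the closed formula.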