A Framework for Modeling Cancer Evolution in Single Cells

"Notes from the Lab" spotlights innovative work addressing problems in cancer research and care from Columbia investigators, post-docs, fellows, and students.

April 28, 2023

The Tavaré Lab

Dr. Khanh Dinh is a postdoctoral research scientist in the Irving Institute for Cancer Dynamics (IICD) and the department of statistics. Dr. Simon Tavaré is the Herbert and Florence Irving Director and a professor of statistics and biological sciences. He is also a member of the Herbert Irving Comprehensive Cancer Center.  

We develop mathematical models and computational algorithms to study DNA-sequencing data and learn about different aspects of cancer evolution. From this data, we can discover the age of tumors, the rates at which different cancer drivers occur, and how those drivers influence tumor evolution.

The Research

This work was recently presented at the 2022 European Conference on Mathematical and Theoretical Biology (ECMTB) and at the Irving Institute for Cancer Dynamics’ Mathematical and Computational Methods in Cancer and Biology Symposium. Previous research on the same topic, “Statistical Inference for the Evolutionary History of Cancer Genomes,” was also published in Statistical Science.

The Cancer Problem We Are Solving

Since the 1970s, numerous studies have applied the theory of evolution to cancer research, using a mathematical framework to study tumor samples and learn how cancers evolve. These studies demonstrate that different cells in a tumor exhibit different genomes, and that evolution has favored the selection of tumor cells that promote tumor expansion, relapse, and metastasis. These mathematical models have been used extensively since the dawn of DNA sequencing (DNA-seq), leading to significant insights into how cancer develops. However, most of these models focus on the role of DNA mutations in driving cancer evolution. 

In the last decade, advances in DNA-seq have brought about an appreciation for the role of other DNA modifications in causing cancer, such as the instantaneous rearrangement of the entire genome or the duplication or deletion of a small region on a chromosome. Though these modifications likely play a role in causing cancer, it is difficult to understand how they affect each other and together contribute to cancer development. We have been working to establish both a modeling framework that can track multiple mechanisms of DNA change and numerical algorithms that use this framework to analyze DNA data.

What We've Uncovered So Far

We constructed a mathematical model that tracks the DNA profiles of single cells over time. These DNA profiles evolve as genomic damage occurs, fueling the selection of different clones. We then developed an efficient algorithm to simulate large cell populations. Finally, we built optimization algorithms that find the model parameters for a given cancer dataset. The parameters these algorithms infer from general DNA-seq data agree with empirical chromosome scores, suggesting that the model is biologically relevant.
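
As a rough illustration of the kind of model described above, and not the lab's actual framework, the toy Python sketch below tracks a copy-number profile for each cell, lets profiles change at random, and resamples the population each generation in proportion to fitness. Every name and value in it (`simulate_clones`, `p_change`, the `selection` scores, the `MAX_COPIES` cap, the fixed population size) is a hypothetical assumption chosen for illustration.

```python
import random
from collections import Counter

MAX_COPIES = 5  # hypothetical cap on copies per chromosome


def fitness(profile, selection):
    """Relative fitness of a cell (illustrative assumption).

    Each extra copy of chromosome i multiplies fitness by selection[i];
    each lost copy divides by it. Two copies is the neutral baseline.
    """
    value = 1.0
    for copies, s in zip(profile, selection):
        value *= s ** (copies - 2)
    return value


def mutate(profile, p_change, rng):
    """With probability p_change per chromosome, gain or lose one copy."""
    new_profile = []
    for copies in profile:
        if rng.random() < p_change:
            copies += rng.choice((-1, 1))
        # Keep counts within [1, MAX_COPIES] so fitness stays positive.
        new_profile.append(min(max(copies, 1), MAX_COPIES))
    return tuple(new_profile)


def simulate_clones(n_cells=200, n_generations=50, p_change=0.05,
                    selection=(1.2, 1.0, 0.8), seed=0):
    """Simulate a fixed-size population and return clone sizes by profile."""
    rng = random.Random(seed)
    # Every cell starts with two copies of each chromosome.
    population = [(2,) * len(selection)] * n_cells
    for _ in range(n_generations):
        # Selection: parents are drawn in proportion to their fitness.
        weights = [fitness(p, selection) for p in population]
        parents = rng.choices(population, weights=weights, k=n_cells)
        # Genomic change: each daughter may gain or lose chromosome copies.
        population = [mutate(p, p_change, rng) for p in parents]
    return Counter(population)


if __name__ == "__main__":
    # Print the three largest clones and their cell counts.
    for profile, count in simulate_clones().most_common(3):
        print(profile, count)
```

The function returns a `Counter` mapping each copy-number profile to the number of cells carrying it, so the largest clones can be read off with `most_common`. Fitting such a model to data would mean searching for the `p_change` and `selection` values whose simulated clone sizes best match an observed dataset; that step is not shown here.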

When applied to general DNA-seq data, the model maps the DNA change patterns across the cancer genome to quantitative scores for individual chromosomes. These scores have shown remarkable agreement with the different gene types located on these chromosomes, supporting the model’s biological relevance.

Why Is This Discovery Important?

It is difficult to predict treatment outcomes, the likelihood of cancer relapse, and metastasis if we do not have a clear picture of how cancer forms in the first place. Many aspects of DNA changes and selection forces together contribute to tumor formation.  

To effectively treat cancer, it is essential to understand how a tumor forms and adapts to a changing environment. This is made possible by recent technological breakthroughs, which have allowed us to appreciate the diversity that exists between cancer cells. However, to develop a systematic approach to analyzing how a tumor grows, we need a mathematical framework that considers all DNA modifications simultaneously. We aim to provide the research community with a modeling framework and algorithms to take apart these puzzle pieces.

Next Steps

We are currently developing algorithms for single-cell DNA-seq datasets, which can find DNA modification rates and selection rates that are difficult to untangle from bulk DNA-seq data. The algorithms will be used to analyze single-cell DNA-seq data from our collaborators at Memorial Sloan Kettering Cancer Center and Columbia University. Eventually, the algorithm suite will be made publicly available to the research community.