Detecting higher-order structural changes in 3D genome organization with multi-task matrix factorization

Abstract

Three-dimensional (3D) genome organization, which determines how DNA is packaged inside the nucleus, has emerged as a key regulatory mechanism of cellular processes. High-throughput chromosome conformation capture (Hi-C) technologies have enabled the study of 3D genome organization by experimentally measuring interactions among genomic regions in 3D space. Analysis of Hi-C data has revealed higher-order organizational units at multiple resolutions: chromosome territories, compartments, and topologically associating domains (TADs). Changes or disruptions to such structures have been associated with disease, developmental, and evolutionary processes. Therefore, a key problem is to systematically detect higher-order structural changes across Hi-C datasets from multiple conditions. Existing methods to detect changes in 3D genome organization either do not model higher-order structural units, are specialized to one type of unit (e.g., TADs), or only compare pairs of Hi-C datasets. We address these limitations with Tree-structured Graph-regularized Integrated Factorization (TGIF), a new multi-task Non-negative Matrix Factorization (NMF) approach. TGIF treats each Hi-C matrix as a task and uses tree-structured relationships among the datasets so that closely related tasks have similar lower-dimensional factors. The factors can be further constrained with task-specific graph regularization and are used to extract clusters of genomic regions whose interaction profiles change across tasks. We applied TGIF to simulated data and to real Hi-C data from cancer cell lines and from mouse neural development. TGIF effectively recovers ground-truth clusters in simulated data even with a large amount of noise and sparsity.
When applied to genome-wide Hi-C matrices from karyotypically normal hematopoietic stem and progenitor cells (HSPC) and two chronic myelogenous leukemia (CML) cell lines (K562 and KBM7), TGIF detects the Philadelphia translocation, a large reciprocal translocation between chr9 and chr22 used in the diagnosis of CML. In per-chromosome Hi-C matrices from three cell states during mouse neural development (embryonic stem cells, neural progenitors, and cortical neurons), TGIF identifies compartmental switches as well as local TAD shifts that accompany changes in the expression of nearby genes. Taken together, TGIF provides a powerful multi-task framework to study the dynamics and context-specificity of 3D genome organization.
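
For readers unfamiliar with the building block the abstract refers to, the following is a minimal sketch of plain, single-matrix NMF applied to a toy contact-like matrix, with regions clustered by their dominant factor. It is not the TGIF algorithm: the tree-structured coupling between tasks and the graph regularization are omitted, the Lee-Seung multiplicative updates are a standard choice assumed here for illustration, and all function names and the toy data are hypothetical.

```python
import numpy as np


def toy_contact_matrix(n_per_block=20, seed=0):
    """Symmetric, non-negative toy matrix with two blocks of regions (illustrative data only)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_block)
    # Strong signal within a block, weak signal between blocks.
    X = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
    noise = rng.random(X.shape) * 0.05
    X = X + (noise + noise.T) / 2  # keep the matrix symmetric
    return X, labels


def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ U @ V.T, using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((m, k)) + eps
    errors = [np.linalg.norm(X - U @ V.T)]  # error before any update
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        errors.append(np.linalg.norm(X - U @ V.T))
    return U, V, errors


def assign_clusters(U):
    """Assign each region (row) to the factor with the largest column-scaled loading."""
    scaled = U / (U.max(axis=0, keepdims=True) + 1e-12)
    return scaled.argmax(axis=1)


if __name__ == "__main__":
    X, true_labels = toy_contact_matrix()
    U, V, errors = nmf_multiplicative(X, k=2)
    clusters = assign_clusters(U)
    print("initial error:", errors[0], "final error:", errors[-1])
    print("cluster assignments:", clusters)
```

In a multi-task setting such as the one described above, one such factorization would be fitted per Hi-C matrix, with an additional penalty tying the factors of related tasks together; that coupling is deliberately left out of this sketch.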

Date
Nov 16, 2020—Nov 19, 2020
Location
Virtual/online
