Discovering structural units of chromosomal organization with matrix factorization and graph regularization

Abstract

Three-dimensional organization of the genome is emerging as an important determinant of cell-type-specific expression and is implicated in many diseases, including cancer (Bouwman & de Laat 2015). Hi-C is a high-throughput chromosome conformation capture (3C) assay that can be used to study the three-dimensional organization of chromosomes (Lieberman-Aiden et al. 2009). Analysis of Hi-C data can reconstruct the building blocks that give rise to or result from the organizational principles of the genome: topologically associating domains (TADs), transcriptionally active compartments, chromatin loops, and chromosomal territories (Gibcus & Dekker 2013). Recent studies comparing TAD-finding methods (Forcato et al. 2017, Dali & Blanchette 2017) found that the methods vary significantly in their replicability and stability across sequencing depth, sparsity, and resolution of the input data, suggesting the need for more robust methods.

Here we present GRiNCH, an approach based on Non-Negative Matrix Factorization (NMF) to identify organizational units of chromosomes from Hi-C data. NMF is a powerful dimensionality-reduction technique that can recover low-dimensional representations of images, texts, and biological data (Lee & Seung 2000). GRiNCH extends the NMF framework with a graph regularization term (Cai et al. 2011) that encourages nearby genomic regions in a similar chromatin state or with a similar insulator binding pattern to converge to a similar low-dimensional state. Our results show that GRiNCH can recover clusters with TAD-like properties whose boundaries show a significant association with the presence of CTCF binding. Compared to existing TAD-finding methods, GRiNCH clusters are more stable on sparse and low-depth Hi-C datasets. Finally, through a matrix completion process, GRiNCH can impute missing interaction counts and offer a smoothed Hi-C matrix comparable in quality to that produced by the smoothing process employed by methods like HiCRep (Yang et al. 2017). Taken together, GRiNCH offers a promising approach to mining biologically meaningful structural domains of the genome.
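
To make the idea of graph-regularized NMF concrete, the following is a minimal sketch, not the GRiNCH implementation. It assumes the multiplicative update rules of Cai et al. (2011), a hypothetical neighborhood graph that links bins within a fixed distance along the chromosome, and a simple argmax over factor rows as a stand-in for the clustering step. The function names, the `radius` and `lam` parameters, and the toy block-diagonal matrix are all illustrative assumptions.

```python
import numpy as np


def neighbor_graph(n, radius):
    """Assumed graph: bins i and j are linked if 0 < |i - j| <= radius."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist > 0) & (dist <= radius)).astype(float)


def graph_regularized_nmf(X, k, W, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Approximate X by U @ V.T with U, V >= 0, penalizing lam * Tr(V.T L V).

    L = D - W is the graph Laplacian of W. The updates are the
    multiplicative rules from Cai et al. (2011).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((m, k)) + eps
    D = np.diag(W.sum(axis=1))  # degree matrix of the graph
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


# Toy symmetric "contact matrix": three dense diagonal blocks plus weak noise.
rng = np.random.default_rng(1)
noise = rng.random((30, 30)) * 0.1
X = (noise + noise.T) / 2
for start in (0, 10, 20):
    X[start:start + 10, start:start + 10] += 1.0

W = neighbor_graph(30, radius=2)
U, V = graph_regularized_nmf(X, k=3, W=W, lam=1.0)

labels = V.argmax(axis=1)  # simplistic cluster call: dominant factor per bin
X_smooth = U @ V.T         # low-rank reconstruction, usable as a smoothed matrix
```

In this sketch the product `U @ V.T` plays the role of the smoothed, completed matrix, and the rows of `V` give the low-dimensional state of each genomic bin. Larger values of `lam` pull the states of neighboring bins closer together.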

Date
May 21, 2019
Location
Madison, WI, United States
