A graph-regularized non-negative matrix factorization method to discover organizational units of chromosomes

Abstract

Three-dimensional organization of the genome is emerging as an important determinant of cell-type-specific expression and is implicated in many diseases, including cancer. Hi-C is a high-throughput chromosome conformation capture (3C) assay used to study the three-dimensional organization of the genome. Analysis of Hi-C data has shown that the genome is organized into higher-order units such as compartments and topologically associating domains (TADs). We present an approach based on non-negative matrix factorization (NMF), a method commonly used for clustering and dimensionality reduction, to infer clusters of genomic regions from Hi-C data. To preserve the spatial dependency of Hi-C data (i.e., closer regions interact more with each other), we regularize NMF with the nearest-neighbor graph of each genomic locus. Our results show that NMF and graph-regularized NMF are both important for discovering clusters that exhibit a significant association with CTCF binding at the cluster boundaries, and that these clusters are robust to simulated sparsity and lower sequencing depth.
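
As a rough illustration of the idea, the sketch below runs graph-regularized NMF on a toy symmetric contact matrix. It is not the implementation used in this work: it assumes the widely used formulation with a squared reconstruction loss plus a graph Laplacian penalty, solved by multiplicative updates, and it assumes the neighbor graph simply links each bin to the `k` adjacent bins along the linear genome. The function names (`neighbor_graph`, `gnmf`), the parameters (`k`, `lam`, `rank`), and the toy data are all hypothetical.

```python
import numpy as np

def neighbor_graph(n, k=2):
    """Adjacency matrix linking each bin to the k bins on either side.

    Assumption: neighbors are defined by position along the linear genome.
    """
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist > 0) & (dist <= k)).astype(float)

def gnmf(X, rank, W, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Approximate X by U @ V.T with U, V >= 0 and a graph penalty on V.

    Objective (assumed): ||X - U V^T||_F^2 + lam * Tr(V^T (D - W) V),
    where D is the diagonal degree matrix of W.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

# Toy symmetric "contact matrix" with two dense diagonal blocks (made-up data).
rng = np.random.default_rng(1)
n = 40
X = rng.random((n, n)) * 0.1
X[:20, :20] += 1.0
X[20:, 20:] += 1.0
X = (X + X.T) / 2

W = neighbor_graph(n, k=2)
U, V = gnmf(X, rank=2, W=W, lam=1.0)

# One simple heuristic: assign each bin to its largest factor.
labels = V.argmax(axis=1)
rel_err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(labels)
print(round(float(rel_err), 3))
```

Setting `lam=0` reduces the updates to plain NMF, which gives a simple way to compare the two variants on the same matrix. Assigning clusters with `argmax` is only one heuristic; real Hi-C matrices would also need normalization and a principled choice of `rank`.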

Date
Jul 7, 2018
Location
Chicago, United States
