Discovering structural units of chromosomal organization with matrix factorization and graph regularization

Abstract

The three-dimensional (3D) organization of the genome is an important layer of regulation in developmental, disease, and evolutionary processes. Hi-C is a high-throughput chromosome conformation capture (3C) assay used to study the 3D genome by measuring pairwise interactions of genomic loci. Analysis of Hi-C data has shown that the genome is organized into higher-order structural units such as compartments and topologically associating domains (TADs). Recent comparisons of TAD-finding methods found them to be unstable across different resolutions and sparsity levels of Hi-C data, suggesting the need for more robust methods. We present GRiNCH, a graph-regularized Non-negative Matrix Factorization (NMF) approach to identifying organizational units of chromosomes from Hi-C data. GRiNCH uses graph regularization to encourage neighboring genomic regions to have similar low-dimensional representations. GRiNCH can recover TAD-like clusters that are significantly enriched in architectural protein binding at their boundaries and are more robust to sparse and low-depth Hi-C datasets than existing methods. Finally, GRiNCH can use the low-dimensional NMF factors to impute missing interaction counts and offer a smoothed Hi-C matrix. Taken together, GRiNCH offers a promising approach to identifying biologically meaningful structural domains of the genome.
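
To make the idea of graph-regularized NMF concrete, here is a minimal sketch in Python. It is not the GRiNCH implementation: it assumes the standard multiplicative updates for graph-regularized NMF (minimizing ||X - U V^T||^2 + lam * Tr(V^T L V), with L = D - W), a simple band graph linking bins within a fixed radius along the chromosome, and cluster assignment by the largest factor per bin. The function names, the parameters `lam` and `radius`, and the toy block matrix are all hypothetical choices for illustration.

```python
import numpy as np

def neighbor_graph(n, radius=2):
    """Band adjacency matrix: bins i and j are neighbors if 0 < |i - j| <= radius.
    (Assumed graph construction, chosen for illustration.)"""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist > 0) & (dist <= radius)).astype(float)

def graph_regularized_nmf(X, k, lam=1.0, radius=2, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (n x n) as U @ V.T with a graph penalty on V.
    Uses generic graph-regularized NMF multiplicative updates (an assumption,
    not necessarily the update scheme used by GRiNCH)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = neighbor_graph(n, radius)      # neighborhood graph over genomic bins
    D = np.diag(W.sum(axis=1))         # degree matrix of the graph
    U = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

def toy_contact_matrix(n=40, block=10, seed=0):
    """Synthetic symmetric matrix with dense diagonal blocks (stand-in for Hi-C)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, n)) * 0.1
    for start in range(0, n, block):
        X[start:start + block, start:start + block] += 1.0
    return (X + X.T) / 2

X = toy_contact_matrix()
U, V = graph_regularized_nmf(X, k=4)
labels = V.argmax(axis=1)   # one cluster label per genomic bin
X_smooth = U @ V.T          # low-rank reconstruction, usable as a smoothed matrix
```

In this sketch, `labels` plays the role of the TAD-like cluster assignment and `X_smooth` the role of the smoothed or imputed matrix. Larger values of `lam` pull neighboring bins toward the same factor, which is the behavior the graph regularization is meant to encourage.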

Date
Jul 22, 2019
Location
Basel, Switzerland
