.. _CELL_TYPE_DIVERSITY:

Cell type diversity
===================

This section provides data related to the different subtypes present in the heart: snATAC cluster peaks, specific loci, feature clusters, HOMER motifs, GREAT ontology, co-accessible loci, and RNA-Seq specific genes.

snATAC-Seq Cluster peaks
------------------------

Below is the table of the peaks obtained for each cluster using the MACS2 software with the following settings: ``--nomodel --keep-dup all -q 0.01 --shift 37 --extsize 73``

.. list-table:: macs2 cluster specific peaks
   :widths: 10 5 10 50

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte.narrowPeak.bed `_
     - 2.8M
     - 03/25/20
     - macs2 narrowPeak file
   * - `all.merged.annotated.bed `_
     - 8.4M
     - 03/25/20
     - merged macs2 narrowPeak file for all clusters
   * - `all.merged.bed `_
     - 6.6M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Atrial_cardiomyocyte.narrowPeak.bed `_
     - 12M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Endothelial.narrowPeak.bed `_
     - 6.0M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Fibroblast.narrowPeak.bed `_
     - 12M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Lymphocyte.narrowPeak.bed `_
     - 1.1M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Macrophage.narrowPeak.bed `_
     - 6.7M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Merged.Consensus.narrowPeak.gz `_
     - 7.6M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Nervous.narrowPeak.bed `_
     - 2.0M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Smooth_muscle.narrowPeak.bed `_
     - 6.8M
     - 03/25/20
     - macs2 narrowPeak file
   * - `Ventricular_cardiomyocyte.narrowPeak.bed `_
     - 15M
     - 03/25/20
     - macs2 narrowPeak file

snATAC-Seq Cluster specific features
------------------------------------

Below are lists of statistically significant cluster-specific features (FDR < 0.01) derived from the 287K set of merged peaks from all clusters. They were detected using edgeR analysis followed by K-means clustering. The FDR, test statistic, etc. for each of these elements can be found below in the “EdgeR analysis statistics” subsection.

.. list-table:: feature bed file
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte.specific.elements.bed `_
     - 8.6K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Atrial_cardiomyocyte.specific.elements.bed `_
     - 75K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Endothelial.specific.elements.bed `_
     - 44K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Fibroblast.specific.elements.bed `_
     - 94K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Lymphocyte.specific.elements.bed `_
     - 3.7K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Macrophage.specific.elements.bed `_
     - 78K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Nervous_cell.specific.elements.bed `_
     - 13K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Smooth_muscle.specific.elements.bed `_
     - 58K
     - 03/24/20
     - Kmeans and edgeR features
   * - `Ventricular_cardiomyocyte..specific.elements.bed `_
     - 84K
     - 03/24/20
     - Kmeans and edgeR features

snATAC EdgeR analysis statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The edgeR result files contain the edgeR statistics for snATAC cluster significant peaks. The protocol was: a) extraction of subtype-specific features identified by a K-means analysis, and b) testing these features for each subtype using a one-vs-rest strategy with edgeR. The covariates used were donorID, sex, and read depth. These files contain a count table for cells from each donor inside and outside the subtype. The last columns report the edgeR p-values, FDR, and log fold change (logFC). These statistics are reported for all the features tested, including the non-significant ones. The subtype-specific features have logFC < 0 and FDR < 0.01.

.. code-block:: shell

   head Adipocyte.1e-2.sex.edgeR.tsv | cut -f1,2,3,7,8,30,31,32,32,34,35,36,37

   ###### coordinates #######  ### count table ####  ################################# statistics #################################
                                                     #### logFC #####                                          ####### FDR ########
   "chr7 114084392 114086804"  456  22  0  130  ...  3.02723597551046  124.710942232424  5.88756270976376e-29  1.14413006138839e-24
   "chr4 99642626 99643349"    402  0   0  35   ...  3.13377334565615  97.5446708424732  5.26548775020566e-23  5.11621117248733e-19
   "chr1 226972955 226975414"  511  10  0  38   ...  2.72457802893285  92.5225725063236  6.65612519270069e-22  4.31161602899175e-18
   "chr14 81831044 81831953"   179  4   0  33   ...  3.22672031578897  90.6376381541516  1.72546700135439e-21  8.38275005932995e-18
   "chr16 57829523 57830435"   166  5   0  27   ...  3.10956679858585  87.7589533880059  7.39374830753615e-21  2.873654217207e-17
   "chr20 29444936 29445950"   121  15  0  82   ...  3.18135932367438  86.9928215326865  1.08915781214527e-20  3.52760062723649e-17

.. list-table:: edgeR statistics files
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte `_
     - 3.3M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Atrial Cardiomyocyte `_
     - 11M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Endothelial `_
     - 3.0M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Fibroblast `_
     - 4.6M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Lymphocyte `_
     - 5.9M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Macrophage `_
     - 4.5M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Nervous `_
     - 2.9M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Smooth Muscle `_
     - 2.9M
     - 03/25/20
     - edgeR significant features statistics table
   * - `Ventricular Cardiomyocyte `_
     - 11M
     - 03/25/20
     - edgeR significant features statistics table

HOMER Motifs for snATAC cluster-specific features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These files contain the knownMotifs enrichment results of the HOMER analysis conducted on the set of cluster-specific features from each cell type cluster. The output files report the enrichment of known transcription factor binding motifs within open chromatin determined to be specific to each of the 9 snATAC-seq cell types.

HOMER was run with the following command:

.. code-block:: shell

   findMotifsGenome.pl {cluster-specific-features} hg38 {output directory} -size 200 -mask -p 1

Example of a HOMER result file:

.. code-block:: shell

   Motif Name  Consensus  P-value  Log P-value  q-value (Benjamini)  # of Target Sequences with Motif(of 258)  % of Target Sequences with Motif  # of Background Sequences with Motif(of 47047)  % of Background Sequences with Motif
   CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer  ATTGCGCAAC  1e-40  -9.305e+01  0.0000  89.0  34.50%  2987.5  6.35%
   HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer  RTTATGYAAB  1e-22  -5.071e+01  0.0000  68.0  26.36%  3160.3  6.72%
   PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer  VNAGGKCAAAGGTCA  1e-21  -4.937e+01  0.0000  80.0  31.01%  4443.0  9.44%
   PPARE(NR),DR1/3T3L1-Pparg-ChIP-Seq(GSE13511)/Homer  TGACCTTTGCCCCA  1e-20  -4.788e+01  0.0000  75.0  29.07%  4027.1  8.56%
   RXR(NR),DR1/3T3L1-RXR-ChIP-Seq(GSE13511)/Homer  TAGGGCAAAGGTCA  1e-18  -4.282e+01  0.0000  79.0  30.62%  4838.4  10.28%
   NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer  VTTACGTAAYNNNNN  1e-17  -3.977e+01  0.0000  50.0  19.38%  2147.2  4.56%

.. list-table:: Homer motif file
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte.knownResults.txt `_
     - 46K
     - 03/24/20
     - Significant motif files
   * - `Atrial_cardiomyocyte.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files
   * - `Endothelial.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files
   * - `Fibroblast.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files
   * - `Lymphocyte.knownResults.txt `_
     - 46K
     - 03/24/20
     - Significant motif files
   * - `Macrophage.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files
   * - `Nervous.knownResults.txt `_
     - 46K
     - 03/24/20
     - Significant motif files
   * - `Smooth_muscle.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files
   * - `Ventricular_cardiomyocyte.knownResults.txt `_
     - 47K
     - 03/24/20
     - Significant motif files

GREAT ontology analysis for snATAC cluster-specific features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below are the outputs of the Genomic Regions Enrichment of Annotations Tool (GREAT; http://great.stanford.edu/public/html/) run with default settings on each of the 9 sets of cluster-specific features described above.

.. list-table:: GREAT ontology analysis
   :widths: 10 5 10 50

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte.clusterspecific.great.output.tsv `_
     - 1.2M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Atrial_cardiomyocyte.clusterspecific.great.output.tsv `_
     - 2.6M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Endothelial.clusterspecific.great.output.tsv `_
     - 2.3M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Fibroblast.clusterspecific.great.output.tsv `_
     - 4.3M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Lymphocyte.clusterspecific.great.output.tsv `_
     - 978K
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Macrophage.clusterspecific.great.output.tsv `_
     - 3.5M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Nervous.clusterspecific.great.output.tsv `_
     - 1.3M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Smooth_muscle.clusterspecific.great.output.tsv `_
     - 2.5M
     - 03/24/20
     - GREAT cluster-specific ontology file
   * - `Ventricular_cardiomyocyte.clusterspecific.great.output.tsv `_
     - 2.8M
     - 03/24/20
     - GREAT cluster-specific ontology file

Cicero co-accessibility sites
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below are the outputs of the Cicero analysis, performed for each chromosome using a random subset of 15000 cells and a genomic window of 250 kb. The results are stored in the bedpe format: the first 6 columns give the coordinates of the two co-accessible sites and the last column is the Cicero score (between 0 and 1).

.. code-block:: shell

   chr1  817100  817600  chr1  817812  818312  0.0473509619252082
   chr1  817100  817600  chr1  827285  827785  0.0296988833234847
   chr1  817100  817600  chr1  905193  905693  0.0150213679162837
   chr1  817100  817600  chr1  924640  925140  0.0124425158998563

.. list-table:: Cicero co-accessible sites
   :widths: 10 5 10 50

   * - File
     - Size
     - Last modified
     - Description
   * - `cicero.linkages.snATAC.nocutoff.bedpe `_
     - 155M
     - 03/24/20
     - All cicero links in bedpe format
   * - `cicero.linkages.snATAC.015cutoff.bedpe `_
     - 20M
     - 04/10/20
     - cicero links with score > 0.15 in bedpe format
   * - `cicero.linkages.snATAC.020cutoff.bedpe `_
     - 12M
     - 04/10/20
     - cicero links with score > 0.20 in bedpe format

ChromVAR enriched Motifs per cell type
--------------------------------------

This file contains motif scores at single-cell resolution, computed with the chromVAR library. We used the center of the 287K peaks extended by ±250 base pairs and a custom set of 870 `non redundant motifs `_ as input. To identify the differentially enriched motifs per cell type, we used the following strategy. For each cell type and each motif, we computed the rank-sum test between the chromVAR Z-score distributions of the cells inside and outside the cell type. These tests were performed on a random sample of 40000 cells. Then, for each cell type, we used 1e-8 as the p-value cutoff. In addition, we applied a Bonferroni correction to account for multiple testing, which is equivalent to selecting motifs with p-value < 1e-11.

.. list-table:: chromVAR motif scores
   :widths: 10 5 10 50

   * - File
     - Size
     - Last modified
     - Description
   * - `chromVAR_ranked_motifsRawName_meta.tsv `_
     - 225K
     - 03/1/20
     - chromVAR motif cell type score
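
The per-cell-type motif selection described above can be illustrated with a short sketch. This is not the code used to produce the file: it is a minimal, standard-library-only example that assumes a two-sided rank-sum test with a normal approximation and no tie correction (the text does not state which variant was used). The function names ``rank_sum_pvalue`` and ``enriched_motifs``, the motif names ``MOTIF_A`` and ``MOTIF_B``, and the toy Z-scores are all hypothetical. The random subsampling to 40000 cells is omitted.

.. code-block:: python

   import math
   import random


   def rank_sum_pvalue(inside, outside):
       """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
       n1, n2 = len(inside), len(outside)
       # Pool both groups, remembering which group each value came from.
       pooled = sorted([(v, 0) for v in inside] + [(v, 1) for v in outside])
       rank_sum_inside = 0.0
       i = 0
       while i < len(pooled):
           j = i
           while j < len(pooled) and pooled[j][0] == pooled[i][0]:
               j += 1
           # Tied values occupy ranks i+1 .. j and share their average rank.
           avg_rank = (i + 1 + j) / 2.0
           n_inside = sum(1 for k in range(i, j) if pooled[k][1] == 0)
           rank_sum_inside += avg_rank * n_inside
           i = j
       expected = n1 * (n1 + n2 + 1) / 2.0
       sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
       z = (rank_sum_inside - expected) / sd
       return math.erfc(abs(z) / math.sqrt(2.0))


   def enriched_motifs(zscores, labels, cell_type, alpha=1e-8):
       """Return {motif: p-value} for motifs passing the Bonferroni-corrected cutoff.

       zscores: dict mapping motif name -> list of per-cell Z-scores.
       labels:  list of cell type labels, one per cell, in the same order.
       """
       # Bonferroni: divide the cutoff by the number of motifs tested
       # (with 870 motifs, 1e-8 / 870 is roughly 1e-11).
       threshold = alpha / len(zscores)
       hits = {}
       for motif, scores in zscores.items():
           inside = [s for s, l in zip(scores, labels) if l == cell_type]
           outside = [s for s, l in zip(scores, labels) if l != cell_type]
           p = rank_sum_pvalue(inside, outside)
           if p < threshold:
               hits[motif] = p
       return hits


   # Toy data (hypothetical): MOTIF_A is shifted upward in cell type "A",
   # MOTIF_B is pure noise in every cell.
   random.seed(0)
   labels = ["A"] * 200 + ["B"] * 200
   zscores = {
       "MOTIF_A": [random.gauss(3.0 if l == "A" else 0.0, 1.0) for l in labels],
       "MOTIF_B": [random.gauss(0.0, 1.0) for _ in labels],
   }
   hits = enriched_motifs(zscores, labels, "A")
   print(sorted(hits))

A two-sided test flags motifs whose Z-scores are either higher or lower inside the cell type; to keep only enriched motifs, one could additionally require that the rank sum of the inside group exceeds its expected value.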