Cell type diversity

This section provides data on the different cell subtypes present in the heart, namely snATAC-seq cluster peaks, cluster-specific loci and feature clusters, HOMER motifs, GREAT ontology enrichment, co-accessible loci, and RNA-seq cell-type-specific genes.

snATAC-Seq Cluster peaks

Below is the table of peaks called for each cluster with MACS2, using the following settings: --nomodel --keep-dup all -q 0.01 --shift 37 --extsize 73

macs2 cluster-specific peaks

File                                      Size   Last modified   Description
Adipocyte.narrowPeak.bed                  2.8M   03/25/20        macs2 narrowPeak file
all.merged.annotated.bed                  8.4M   03/25/20        merged macs2 narrowPeak file for all clusters
all.merged.bed                            6.6M   03/25/20        macs2 narrowPeak file
Atrial_cardiomyocyte.narrowPeak.bed       12M    03/25/20        macs2 narrowPeak file
Endothelial.narrowPeak.bed                6.0M   03/25/20        macs2 narrowPeak file
Fibroblast.narrowPeak.bed                 12M    03/25/20        macs2 narrowPeak file
Lymphocyte.narrowPeak.bed                 1.1M   03/25/20        macs2 narrowPeak file
Macrophage.narrowPeak.bed                 6.7M   03/25/20        macs2 narrowPeak file
Merged.Consensus.narrowPeak.gz            7.6M   03/25/20        macs2 narrowPeak file
Nervous.narrowPeak.bed                    2.0M   03/25/20        macs2 narrowPeak file
Smooth_muscle.narrowPeak.bed              6.8M   03/25/20        macs2 narrowPeak file
Ventricular_cardiomyocyte.narrowPeak.bed  15M    03/25/20        macs2 narrowPeak file
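The per-cluster files follow the standard 10-column narrowPeak layout that MACS2 writes (column 10 is the summit offset from the peak start). A minimal Python sketch for reading them, assuming tab-separated files with exactly those columns; the merged and annotated files may use a different column layout, and the two demo lines are made up for illustration. The helper names are hypothetical.

```python
import gzip
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class NarrowPeak:
    chrom: str
    start: int            # 0-based start (BED convention)
    end: int              # end coordinate, exclusive
    name: str
    score: int
    strand: str
    signal_value: float   # column 7 (MACS2: fold enrichment at the summit)
    neg_log10_p: float    # column 8
    neg_log10_q: float    # column 9
    summit_offset: int    # column 10: summit position relative to start

    @property
    def summit(self) -> int:
        """Absolute coordinate of the peak summit."""
        return self.start + self.summit_offset


def open_text(path: str) -> TextIO:
    """Open plain or gzip-compressed text (e.g. a .narrowPeak.gz file)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_narrowpeak(handle: TextIO) -> Iterator[NarrowPeak]:
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and browser/track headers
        f = line.rstrip("\n").split("\t")
        yield NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                         float(f[6]), float(f[7]), float(f[8]), int(f[9]))


# Demo on two made-up lines; with a real file use read_narrowpeak(open_text(path)).
demo = io.StringIO(
    "chr1\t1000\t1500\tpeak_1\t85\t.\t4.2\t10.1\t8.5\t230\n"
    "chr1\t5000\t5400\tpeak_2\t30\t.\t2.1\t4.0\t3.0\t150\n"
)
peaks = list(read_narrowpeak(demo))
strong = [p for p in peaks if p.neg_log10_q >= 5]  # q-value <= 1e-5
print([p.summit for p in peaks], [p.name for p in strong])
```

Since all peaks were called with -q 0.01, every line should have a -log10 q-value of at least 2; stricter filters like the one in the demo can be applied on top.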

snATAC-Seq Cluster specific features

Below are lists of statistically significant cluster-specific features (FDR < 0.01), derived from the set of 287K peaks merged across all clusters. They were detected by combining edgeR analysis and K-means clustering. The FDR, test statistic, and other statistics for each of these elements can be found in the “snATAC EdgeR analysis statistics” subsection below.

feature bed file

File                                              Size   Last modified   Description
Adipocyte.specific.elements.bed                   8.6K   03/24/20        Kmeans and edgeR features
Atrial_cardiomyocyte.specific.elements.bed        75K    03/24/20        Kmeans and edgeR features
Endothelial.specific.elements.bed                 44K    03/24/20        Kmeans and edgeR features
Fibroblast.specific.elements.bed                  94K    03/24/20        Kmeans and edgeR features
Lymphocyte.specific.elements.bed                  3.7K   03/24/20        Kmeans and edgeR features
Macrophage.specific.elements.bed                  78K    03/24/20        Kmeans and edgeR features
Nervous_cell.specific.elements.bed                13K    03/24/20        Kmeans and edgeR features
Smooth_muscle.specific.elements.bed               58K    03/24/20        Kmeans and edgeR features
Ventricular_cardiomyocyte..specific.elements.bed  84K    03/24/20        Kmeans and edgeR features
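A common use of these files is checking which cell types have a specific element overlapping a locus of interest. A minimal Python sketch, assuming the files are BED with chrom/start/end in the first three columns and that the cell type is the file-name prefix before the first dot; the function names are hypothetical. It uses a plain linear scan, which is adequate for files of this size; for genome-wide queries, bedtools intersect or an interval tree is the usual choice.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def load_bed(path: Path) -> List[Interval]:
    """Read the first three columns of a BED file."""
    intervals = []
    with path.open() as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def load_specific_elements(directory: str) -> Dict[str, List[Interval]]:
    """Map cell type (file-name prefix) to its cluster-specific elements."""
    return {p.name.split(".")[0]: load_bed(p)
            for p in Path(directory).glob("*.specific.elements.bed")}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def cell_types_for_locus(locus: Interval,
                         elements: Dict[str, List[Interval]]) -> List[str]:
    """Cell types with at least one specific element overlapping `locus`."""
    return sorted(ct for ct, ivs in elements.items()
                  if any(overlaps(locus, iv) for iv in ivs))


# Demo with made-up intervals instead of the real files.
toy = {"Adipocyte": [("chr7", 100, 200)],
       "Fibroblast": [("chr7", 150, 300), ("chr1", 0, 50)]}
hits_inside = cell_types_for_locus(("chr7", 180, 190), toy)
hits_edge = cell_types_for_locus(("chr7", 200, 250), toy)
print(hits_inside, hits_edge)
```

Because BED ends are exclusive, a query starting exactly at an element's end (as in `hits_edge`) does not count as an overlap.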

snATAC EdgeR analysis statistics

The edgeR results files contain the edgeR statistics for the snATAC cluster-specific peaks. The protocol was: a) extraction of subtype-specific features identified by a K-means analysis, and b) testing of these features for each subtype with edgeR, using a one-vs-rest strategy. The covariates were donor ID, sex, and read depth. Each file contains a count table with, for each donor, the counts from cells inside and outside the subtype. The last columns report the edgeR statistics, including the log fold change (logFC), p-value, and FDR. These statistics are reported for all tested features, including non-significant ones. The subtype-specific features have logFC < 0 and FDR < 0.01.

head Adipocyte.1e-2.sex.edgeR.tsv | cut -f1,2,3,7,8,30,31,32,33,34,35,36,37
######## coordinates ##############       ########## count table ###########    ###################################### statistics #########################################
                                                                                                        #### logFC ####                                   ##### FDR #####
"chr7   114084392       114086804"      456     22      0       130     ...     3.02723597551046        124.710942232424        5.88756270976376e-29    1.14413006138839e-24
"chr4   99642626        99643349"       402     0       0       35      ...     3.13377334565615        97.5446708424732        5.26548775020566e-23    5.11621117248733e-19
"chr1   226972955       226975414"      511     10      0       38      ...     2.72457802893285        92.5225725063236        6.65612519270069e-22    4.31161602899175e-18
"chr14  81831044        81831953"       179     4       0       33      ...     3.22672031578897        90.6376381541516        1.72546700135439e-21    8.38275005932995e-18
"chr16  57829523        57830435"       166     5       0       27      ...     3.10956679858585        87.7589533880059        7.39374830753615e-21    2.873654217207e-17
"chr20  29444936        29445950"       121     15      0       82      ...     3.18135932367438        86.9928215326865        1.08915781214527e-20    3.52760062723649e-17

edgeR statistics files

File                       Size   Last modified   Description
Adipocyte                  3.3M   03/25/20        edgeR significant features statistics table
Atrial Cardiomyocyte       11M    03/25/20        edgeR significant features statistics table
Endothelial                3.0M   03/25/20        edgeR significant features statistics table
Fibroblast                 4.6M   03/25/20        edgeR significant features statistics table
Lymphocyte                 5.9M   03/25/20        edgeR significant features statistics table
Macrophage                 4.5M   03/25/20        edgeR significant features statistics table
Nervous                    2.9M   03/25/20        edgeR significant features statistics table
Smooth Muscle              2.9M   03/25/20        edgeR significant features statistics table
Ventricular Cardiomyocyte  11M    03/25/20        edgeR significant features statistics table
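To re-derive the subtype-specific set from one of these tables, a minimal Python sketch follows. It assumes, from the example output above, that each line is tab-separated with the (quoted) feature coordinates first and logFC, a test statistic, the p-value and the FDR as the last four fields; the exact column names are not documented here, so the parser works by position, and all function names are hypothetical. The filter applies the criterion stated above (logFC < 0 and FDR < 0.01); the sign convention depends on how the contrast was set up, so it is worth checking against the file.

```python
import csv
from typing import Iterable, List, NamedTuple


class EdgeRRow(NamedTuple):
    feature: str   # first field, e.g. the quoted coordinate string
    logfc: float
    pvalue: float
    fdr: float


def parse_edger_fields(fields: List[str]) -> EdgeRRow:
    # Assumed layout, based on the example above: the last four fields are
    # logFC, a test statistic, the p-value and the FDR.
    logfc, _stat, pvalue, fdr = (float(x) for x in fields[-4:])
    return EdgeRRow(fields[0], logfc, pvalue, fdr)


def read_edger(path: str) -> List[EdgeRRow]:
    rows = []
    with open(path, newline="") as fh:
        for fields in csv.reader(fh, delimiter="\t"):
            try:
                rows.append(parse_edger_fields(fields))
            except ValueError:
                continue  # header or malformed line
    return rows


def subtype_specific(rows: Iterable[EdgeRRow],
                     fdr_max: float = 0.01) -> List[EdgeRRow]:
    """Apply the criterion stated in the text: logFC < 0 and FDR < fdr_max."""
    return [r for r in rows if r.logfc < 0 and r.fdr < fdr_max]


# Demo on made-up rows (feature, two counts, logFC, statistic, p-value, FDR).
demo = [
    parse_edger_fields(["chrA 100 600", "12", "0", "-2.5", "30.0", "1e-08", "1e-05"]),
    parse_edger_fields(["chrA 900 1400", "40", "3", "3.0", "90.0", "1e-21", "1e-18"]),
    parse_edger_fields(["chrB 50 300", "5", "4", "-1.0", "0.5", "0.4", "0.8"]),
]
print([r.feature for r in subtype_specific(demo)])
```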

HOMER Motifs for snATAC cluster-specific features

These files contain the known-motif enrichment results (knownResults.txt) of the HOMER analysis run on the cluster-specific features of each cell-type cluster. They report the enrichment of known transcription factor binding motifs within the open chromatin found to be specific to each of the 9 snATAC-seq cell types. HOMER was run with the following command:

findMotifsGenome.pl {cluster-specific-features} hg38 {output directory} -size 200 -mask -p 1

Example of a HOMER result file:

Motif Name      Consensus       P-value Log P-value     q-value (Benjamini)     # of Target Sequences with Motif(of 258) % of Target Sequences with Motif        # of Background Sequences with Motif(of 47047)  % of Background Sequences with Motif
CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer       ATTGCGCAAC      1e-40   -9.305e+01      0.0000  89.0     34.50%  2987.5  6.35%
HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer RTTATGYAAB      1e-22   -5.071e+01      0.0000  68.0    26.36%   3160.3  6.72%
PPARa(NR),DR1/Liver-Ppara-ChIP-Seq(GSE47954)/Homer      VNAGGKCAAAGGTCA 1e-21   -4.937e+01      0.0000  80.0     31.01%  4443.0  9.44%
PPARE(NR),DR1/3T3L1-Pparg-ChIP-Seq(GSE13511)/Homer      TGACCTTTGCCCCA  1e-20   -4.788e+01      0.0000  75.0     29.07%  4027.1  8.56%
RXR(NR),DR1/3T3L1-RXR-ChIP-Seq(GSE13511)/Homer  TAGGGCAAAGGTCA  1e-18   -4.282e+01      0.0000  79.0    30.62%   4838.4  10.28%
NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer  VTTACGTAAYNNNNN 1e-17   -3.977e+01      0.0000  50.0    19.38%   2147.2  4.56%

HOMER motif files

File                                        Size   Last modified   Description
Adipocyte.knownResults.txt                  46K    03/24/20        Significant motif files
Atrial_cardiomyocyte.knownResults.txt       47K    03/24/20        Significant motif files
Endothelial.knownResults.txt                47K    03/24/20        Significant motif files
Fibroblast.knownResults.txt                 47K    03/24/20        Significant motif files
Lymphocyte.knownResults.txt                 46K    03/24/20        Significant motif files
Macrophage.knownResults.txt                 47K    03/24/20        Significant motif files
Nervous.knownResults.txt                    46K    03/24/20        Significant motif files
Smooth_muscle.knownResults.txt              47K    03/24/20        Significant motif files
Ventricular_cardiomyocyte.knownResults.txt  47K    03/24/20        Significant motif files
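A minimal Python sketch for ranking motifs in a knownResults.txt file, assuming the tab-separated nine-column layout shown in the example above. The "Log P-value" column is a natural logarithm (-93.05 corresponds to a p-value of about 1e-40), and the percentage columns carry a trailing "%". The fold-enrichment ratio and the 0.05 q-value cutoff are illustrative choices, not part of the HOMER run described here; the demo reuses the first two rows of the example.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class KnownMotif(NamedTuple):
    name: str
    consensus: str
    log_p: float           # "Log P-value" column (natural log)
    q_value: float         # Benjamini q-value
    pct_target: float      # % of target sequences with the motif
    pct_background: float  # % of background sequences with the motif

    @property
    def fold_enrichment(self) -> float:
        """Ratio of target to background motif frequency."""
        if self.pct_background == 0:
            return float("inf")
        return self.pct_target / self.pct_background


def read_known_results(handle: TextIO) -> List[KnownMotif]:
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    motifs = []
    for f in reader:
        if len(f) < 9:
            continue
        motifs.append(KnownMotif(f[0], f[1], float(f[3]), float(f[4]),
                                 float(f[6].rstrip("%")), float(f[8].rstrip("%"))))
    return motifs


# Demo on the header and first two rows of the example above, re-joined with tabs.
header = ["Motif Name", "Consensus", "P-value", "Log P-value", "q-value (Benjamini)",
          "# of Target Sequences with Motif(of 258)", "% of Target Sequences with Motif",
          "# of Background Sequences with Motif(of 47047)",
          "% of Background Sequences with Motif"]
rows = [
    ["CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer", "ATTGCGCAAC", "1e-40",
     "-9.305e+01", "0.0000", "89.0", "34.50%", "2987.5", "6.35%"],
    ["HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer", "RTTATGYAAB", "1e-22",
     "-5.071e+01", "0.0000", "68.0", "26.36%", "3160.3", "6.72%"],
]
text = "\n".join("\t".join(r) for r in [header] + rows) + "\n"
motifs = read_known_results(io.StringIO(text))
significant = [m for m in motifs if m.q_value < 0.05]
for m in sorted(significant, key=lambda m: m.fold_enrichment, reverse=True):
    print(f"{m.name}\t{m.fold_enrichment:.2f}")
```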

GREAT ontology analysis for snATAC cluster-specific features

Below are the outputs of the Genomic Regions Enrichment of Annotations Tool (GREAT; http://great.stanford.edu/public/html/), run with default settings on each of the 9 sets of cluster-specific features described above.

GREAT ontology analysis

File                                                        Size   Last modified   Description
Adipocyte.clusterspecific.great.output.tsv                  1.2M   03/24/20        GREAT cluster-specific ontology enrichment file
Atrial_cardiomyocyte.clusterspecific.great.output.tsv       2.6M   03/24/20        GREAT cluster-specific ontology enrichment file
Endothelial.clusterspecific.great.output.tsv                2.3M   03/24/20        GREAT cluster-specific ontology enrichment file
Fibroblast.clusterspecific.great.output.tsv                 4.3M   03/24/20        GREAT cluster-specific ontology enrichment file
Lymphocyte.clusterspecific.great.output.tsv                 978K   03/24/20        GREAT cluster-specific ontology enrichment file
Macrophage.clusterspecific.great.output.tsv                 3.5M   03/24/20        GREAT cluster-specific ontology enrichment file
Nervous.clusterspecific.great.output.tsv                    1.3M   03/24/20        GREAT cluster-specific ontology enrichment file
Smooth_muscle.clusterspecific.great.output.tsv              2.5M   03/24/20        GREAT cluster-specific ontology enrichment file
Ventricular_cardiomyocyte.clusterspecific.great.output.tsv  2.8M   03/24/20        GREAT cluster-specific ontology enrichment file
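GREAT's region-based statistic is a binomial test over genomic regions: for each ontology term, the chance that an input region falls in the regulatory domains of the term's genes is taken to be the fraction of the genome those domains cover. The Python sketch below computes that test from counts only; it does not reproduce GREAT's assignment of regulatory domains to genes, the function names are hypothetical, and the demo numbers are made up.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    top = max(logs)
    return min(1.0, exp(top) * sum(exp(x - top) for x in logs))


def great_binomial(n_regions: int, n_hits: int, genome_fraction: float):
    """Region-based binomial test in the style of GREAT.

    n_regions: number of input regions (e.g. one cell type's specific elements)
    n_hits: input regions inside the regulatory domains of the term's genes
    genome_fraction: fraction of the genome covered by those domains
    """
    fold = (n_hits / n_regions) / genome_fraction
    return fold, binomial_upper_tail(n_hits, n_regions, genome_fraction)


# Hypothetical numbers: 500 regions, 40 hits, term domains cover 3% of the genome.
fold, p = great_binomial(500, 40, 0.03)
print(f"fold enrichment = {fold:.2f}, binomial P = {p:.3g}")
```

Summing the tail in log space matters here: with thousands of input regions, the binomial coefficients exceed the range of a float, so a direct `comb(n, i) * p**i` sum would overflow.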

Cicero co-accessibility sites

Below are the outputs of the Cicero analysis, run separately for each chromosome using a random subset of 15,000 cells and a genomic window of 250 kb. The results are stored in BEDPE format: the first six columns give the coordinates of the two co-accessible sites, and the last column is the Cicero co-accessibility score (between 0 and 1).

chr1 817100  817600  chr1    817812  818312  0.0473509619252082
chr1 817100  817600  chr1    827285  827785  0.0296988833234847
chr1 817100  817600  chr1    905193  905693  0.0150213679162837
chr1 817100  817600  chr1    924640  925140  0.0124425158998563

Cicero co-accessible sites

File                                    Size   Last modified   Description
cicero.linkages.snATAC.nocutoff.bedpe   155M   03/24/20        All cicero links in bedpe format
cicero.linkages.snATAC.015cutoff.bedpe  20M    04/10/20        cicero links with score > 0.15 in bedpe format
cicero.linkages.snATAC.020cutoff.bedpe  12M    04/10/20        cicero links with score > 0.20 in bedpe format
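A minimal Python sketch for working with these link files: reading the seven whitespace-separated columns shown in the example, applying a score cutoff, and listing the sites linked to a region of interest. It assumes the cutoff files were produced with a strict "score > cutoff" comparison, as their descriptions suggest; the function names are hypothetical, and the demo uses the four example lines above.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, TextIO, Tuple


class Link(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    score: float  # Cicero co-accessibility score (last column)


def read_links(handle: TextIO) -> Iterator[Link]:
    for line in handle:
        f = line.split()
        if len(f) < 7:
            continue  # skip blank or malformed lines
        yield Link(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]),
                   float(f[6]))


def links_above(links: Iterable[Link], cutoff: float) -> List[Link]:
    """Keep links whose score is strictly greater than `cutoff`."""
    return [link for link in links if link.score > cutoff]


def partners(links: Iterable[Link], chrom: str, start: int,
             end: int) -> List[Tuple[str, int, int, float]]:
    """Sites linked to any element overlapping the half-open region [start, end)."""
    out = []
    for link in links:
        if link.chrom1 == chrom and link.start1 < end and start < link.end1:
            out.append((link.chrom2, link.start2, link.end2, link.score))
        if link.chrom2 == chrom and link.start2 < end and start < link.end2:
            out.append((link.chrom1, link.start1, link.end1, link.score))
    return out


# The four example lines shown above.
example = io.StringIO(
    "chr1 817100 817600 chr1 817812 818312 0.0473509619252082\n"
    "chr1 817100 817600 chr1 827285 827785 0.0296988833234847\n"
    "chr1 817100 817600 chr1 905193 905693 0.0150213679162837\n"
    "chr1 817100 817600 chr1 924640 925140 0.0124425158998563\n"
)
links = list(read_links(example))
print(len(links_above(links, 0.02)), len(links_above(links, 0.15)))
print(partners(links, "chr1", 817000, 817200))
```

None of the four example links would survive the 0.15 cutoff, which is consistent with the cutoff files being much smaller than the unfiltered one.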

chromVAR-enriched motifs per cell type

File containing motif scores at single-cell resolution, computed with the chromVAR library. As input we used the centers of the 287K peaks, extended by ±250 bp, and a custom set of 870 non-redundant motifs. To identify the motifs differentially enriched in each cell type, we used the following strategy. For each cell type and each motif, we computed a rank-sum test between the chromVAR Z-score distributions of cells inside and outside the cell type. These tests were performed on a random sample of 40,000 cells. For each cell type we then used a p-value cutoff of 1e-8. In addition, we applied a Bonferroni correction for multiple testing, which was equivalent to selecting motifs with p-value < 1e-11.
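The text does not name the rank-sum implementation, so the following is a standard-library Python sketch of the two-sided Wilcoxon rank-sum test using the normal approximation without tie or continuity correction (the formula behind SciPy's `scipy.stats.ranksums`), together with the Bonferroni arithmetic. It assumes the correction divides the 1e-8 cutoff by the 870 motifs tested per cell type: 1e-8 / 870 ≈ 1.15e-11, in line with the ~1e-11 threshold stated above. The function names and toy Z-scores are hypothetical.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def ranksum_test(inside: Sequence[float],
                 outside: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, normal approximation, no tie correction."""
    n1, n2 = len(inside), len(outside)
    ranks = average_ranks(list(inside) + list(outside))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum - expected) / sd
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Toy Z-scores for cells inside vs outside a cell type (made up).
z, p = ranksum_test([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
threshold = bonferroni_threshold(1e-8, 870)
print(f"z = {z:.3f}, p = {p:.3g}, corrected cutoff = {threshold:.3g}")
```

With 40,000 sampled cells the normal approximation is accurate, and ties are rare for continuous Z-scores, so omitting the tie correction has little practical effect.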

chromVAR motif scores

File                                    Size   Last modified   Description
chromVAR_ranked_motifsRawName_meta.tsv  225K   03/1/20         chromVAR motif cell type score