.. _ATAC:

Chromatin Accessibility Maps
============================

CARE project snATAC-seq data are available for download. This section reports data from the snATAC-seq analysis of 79,515 human cardiac cells, which yielded 287,515 open chromatin peaks, including the peaks called on each cell type cluster and the peaks determined to be statistically significant for each cell type cluster. In addition, `data related to differential analysis between heart chambers <DA_analysis>`_ for the snATAC and snRNA datasets and `Disease genetic data <DISEASE_analysis>`_ are reported in their own sections. Alternatively, all the snATAC data can be downloaded from the `HTTP server <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/>`_.

.. seealso:: Our interactive single-cell browsers are available `here <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/ucsc_browser/>`_ and our interactive genome browser explorer can be accessed `from this address <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/IGV/IGV_chamber_bigwig.html>`_.


Release updates
----------------

* Initial release for the CARE study (03/25/2020)


Metadata
--------
The FASTQ files were aligned to the hg38 reference genome. We used the GENCODE annotation to identify TSSs and promoters. For each cell in our snATAC dataset, we report the number of ATAC fragments, the doublet score derived from Scrublet, the enrichment of fragments at annotated transcription start sites (TSSe), the cluster membership, the UMAP coordinates, the fraction of mitochondrial reads and the fraction of duplicated reads.


.. list-table:: Data files
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `Merged_metadata.tsv <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/metadata/Merged_metadata.tsv>`_
     - 13M
     - 03/24/20
     - Metadata file for each snATAC cell. Columns are: cell barcode, cluster membership (FB = 1, vCM = 2, aCM = 3, EC = 4, SM = 5, MAC = 6, LC = 7, AD = 8, SWC = 9), TSS enrichment, number of fragments, fraction of mitochondrial reads, doublet score, fraction of duplicated reads.
   * - `all.cluster <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/clusters/all.cluster>`_
     - 3.1M
     - 03/25/20
     - .tsv file reporting cell ID -> cluster ID
   * - `all.group <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/clusters/all.group>`_
     - 4.0M
     - 03/25/20
     - .tsv file reporting cell ID -> batch ID
   * - `Homo_sapiens.GRCh38.99.TSS.2K.bed <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/metadata/Homo_sapiens.GRCh38.99.TSS.2K.bed>`_
     - 1.9M
     - 03/24/20
     - hg38 gene promoters used
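
The cluster IDs listed in the description of ``Merged_metadata.tsv`` can be turned into cell-type labels with a few lines of Python. The sketch below is only an illustration: it assumes that the first column holds the cell barcode and the second the integer cluster ID, as listed in the table, and it skips any row whose second field is not an integer (for example a header line, whose presence we have not verified here). The function name and the barcodes in the example are hypothetical.

.. code-block:: python

   import csv
   import io
   from collections import Counter

   # Cluster ID -> cell-type abbreviation, copied from the table description.
   CLUSTER_LABELS = {
       1: "FB", 2: "vCM", 3: "aCM", 4: "EC", 5: "SM",
       6: "MAC", 7: "LC", 8: "AD", 9: "SWC",
   }


   def count_cells_per_cluster(handle):
       """Count cells per cell type from a tab-separated metadata stream.

       Assumption: column 0 is the barcode and column 1 the cluster ID.
       Rows whose second field is not an integer (e.g. a header) are skipped.
       """
       counts = Counter()
       for row in csv.reader(handle, delimiter="\t"):
           if len(row) < 2 or not row[1].strip().isdigit():
               continue
           cluster_id = int(row[1])
           counts[CLUSTER_LABELS.get(cluster_id, "unknown")] += 1
       return counts


   # Tiny in-memory example; these barcodes are made up for illustration.
   example = io.StringIO("barcode\tcluster\ncellA\t1\ncellB\t2\ncellC\t2\n")
   print(count_cells_per_cluster(example))

For the downloaded file, the same function can be called on ``open("Merged_metadata.tsv", newline="")`` instead of the in-memory example.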

Matrices
--------

Below are download links for the snATAC-seq dataset sparse matrices (~80K cells by 287K features). We provide these matrices in coordinate (COO) format, both as a text file and as a Python COO sparse matrix (npz). The COO format uses integer indexes for both rows (cells) and columns (features), so we also provide the cell and peak index files needed to map these indexes back to barcodes and peaks.

.. note::

   These matrices can be easily loaded into Python or R. See the tutorial page.

The first lines of the .coo matrix are shown below. The first column is the barcode ID, the second column is the peak ID and the third column is the value:

.. code-block:: shell

   0       189944  1
   0       217284  1
   0       267230  1
   0       160334  1
   0       284529  1
   0       151795  1


.. list-table:: Matrices table
   :widths: 10 5 10 50

   * - File
     - Size
     - Last modified
     - Description
   * - `all.coo.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/matrices/all.coo.gz>`_
     - 478M
     - 03/25/20
     - ~80K cells x 287K peaks in COO format, with binary values (only 1)
   * - `all.index <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/matrices/all.index>`_
     - 3.3M
     - 03/25/20
     - barcode ID index to use for the matrices
   * - `all.merged.ygi <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/matrices/all.merged.ygi>`_
     - 6.6M
     - 03/25/20
     - peak ID index to use for the matrices
   * - `all.npz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/matrices/all.npz>`_
     - 386M
     - 03/25/20
     - Same matrix stored as a Python COO sparse matrix (npz)
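
As an illustration of how the COO triplets relate to the two index files, here is a minimal Python sketch using only the standard library. It rests on assumptions that are not confirmed on this page: that ``all.index`` and ``all.merged.ygi`` hold one entry per line, and that the zero-based line position is the integer ID used in the matrix. The function names and the toy barcodes and peaks are hypothetical.

.. code-block:: python

   import io
   from collections import defaultdict


   def read_index(handle):
       """Return the non-empty lines of an index file as a list.

       Assumption: the position of a line (starting at 0) is the integer ID
       used in the COO matrix.
       """
       return [line.rstrip("\n") for line in handle if line.strip()]


   def read_coo(handle):
       """Parse 'cell peak value' triplets into {cell ID: set of peak IDs}."""
       peaks_by_cell = defaultdict(set)
       for line in handle:
           fields = line.split()
           if len(fields) != 3:
               continue  # skip blank or malformed lines
           cell, peak, value = (int(field) for field in fields)
           if value:
               peaks_by_cell[cell].add(peak)
       return peaks_by_cell


   # Toy inputs standing in for all.index, all.merged.ygi and all.coo.
   barcodes = read_index(io.StringIO("cellA\ncellB\n"))
   peaks = read_index(io.StringIO("chr1\t100\t200\nchr1\t300\t400\nchr2\t50\t150\n"))
   matrix = read_coo(io.StringIO("0\t2\t1\n0\t0\t1\n1\t1\t1\n"))

   for cell, peak_ids in sorted(matrix.items()):
       print(barcodes[cell], [peaks[p] for p in sorted(peak_ids)])

The gzipped matrix can be streamed the same way with ``gzip.open("all.coo.gz", "rt")``. Holding every cell as a Python set is memory-hungry at this scale, so for real analyses the triplets are better passed to a sparse matrix type such as ``scipy.sparse.coo_matrix``.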


Bed files
---------

These bed files link each ATAC-seq fragment's coordinates to the barcode ID of the cell it came from. For example:

.. code-block:: shell

   chr1    1021426 1021472 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      +
   chr1    1021448 1021493 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      -
   chr1    1070611 1070657 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      +
   chr1    1070674 1070719 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      -
   chr1    1375196 1375242 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      +
   chr1    1375218 1375263 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    60      -
   chr1    1424022 1424068 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    40      +
   chr1    1424167 1424212 CARE181125_3B+AACGAGAGCTATTTGCCCAGCT    40      -


.. list-table:: Bed files
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `all.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/all.bed.gz>`_
     - 5.5G
     - 03/25/20
     - Bed file for all the cells
   * - `Adipocyte.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Adipocyte.bed.gz>`_
     - 48M
     - 03/24/20
     - Cluster bed file
   * - `Atrial_cardiomyocyte.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Atrial_cardiomyocyte.bed.gz>`_
     - 765M
     - 03/24/20
     - Cluster bed file
   * - `Endothelial.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Endothelial.bed.gz>`_
     - 352M
     - 03/24/20
     - Cluster bed file
   * - `Fibroblast.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Fibroblast.bed.gz>`_
     - 1.3G
     - 03/24/20
     - Cluster bed file
   * - `Lymphocyte.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Lymphocyte.bed.gz>`_
     - 40M
     - 03/24/20
     - Cluster bed file
   * - `Macrophage.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Macrophage.bed.gz>`_
     - 317M
     - 03/24/20
     - Cluster bed file
   * - `Nervous.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Nervous.bed.gz>`_
     - 36M
     - 03/24/20
     - Cluster bed file
   * - `Smooth_muscle.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Smooth_muscle.bed.gz>`_
     - 338M
     - 03/24/20
     - Cluster bed file
   * - `Ventricular_cardiomyocyte.bed.gz <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bedfiles/Ventricular_cardiomyocyte.bed.gz>`_
     - 2.0G
     - 03/24/20
     - Cluster bed file
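
The bed layout shown above (chromosome, start, end, barcode ID, score, strand) makes per-cell summaries straightforward. The following Python sketch counts fragments per barcode. It is only an illustration: the function names are hypothetical, and splitting the barcode ID on ``+`` into a prefix and a barcode sequence is an assumption based on the example lines, not a documented convention.

.. code-block:: python

   import io
   from collections import Counter


   def count_fragments_per_barcode(handle):
       """Count bed lines per barcode ID (4th whitespace-separated column)."""
       counts = Counter()
       for line in handle:
           fields = line.split()
           if len(fields) < 4:
               continue  # skip blank or truncated lines
           counts[fields[3]] += 1
       return counts


   def split_barcode(barcode_id):
       """Split 'PREFIX+SEQUENCE' into (prefix, sequence).

       Assumption: '+' separates a dataset label from the barcode sequence.
       """
       prefix, _, sequence = barcode_id.partition("+")
       return prefix, sequence


   # Two lines copied from the example above.
   example = io.StringIO(
       "chr1\t1021426\t1021472\tCARE181125_3B+AACGAGAGCTATTTGCCCAGCT\t60\t+\n"
       "chr1\t1021448\t1021493\tCARE181125_3B+AACGAGAGCTATTTGCCCAGCT\t60\t-\n"
   )
   for barcode_id, n in count_fragments_per_barcode(example).items():
       print(split_barcode(barcode_id), n)

The compressed files can be streamed without unpacking them first, for instance with ``gzip.open("Adipocyte.bed.gz", "rt")``.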


Bigwig tracks
-------------

These files are bigWig tracks generated from the cluster bed files, using a window of 1 bp and RPM (reads per million) normalization. They can be loaded into a genome browser such as IGV or the UCSC Genome Browser to visualize the pileup of open chromatin signal in individual cardiac cell types.
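
For readers unfamiliar with RPM normalization, the short Python sketch below shows the conventional definition: each raw count is multiplied by one million and divided by the total number of reads in the library. The exact procedure used to build these tracks is not described here, so the function name and the choice of denominator are assumptions made for illustration.

.. code-block:: python

   def rpm_scale(coverage, total_reads):
       """Scale raw per-window counts to reads per million (RPM).

       Assumption: total_reads is the number of reads in the cluster.
       """
       if total_reads <= 0:
           raise ValueError("total_reads must be positive")
       factor = 1_000_000 / total_reads
       return [count * factor for count in coverage]


   # With 2 million reads, each raw count is halved.
   print(rpm_scale([0, 2, 5], 2_000_000))  # [0.0, 1.0, 2.5]

Scaling by library size in this way is what makes tracks from clusters with very different cell numbers comparable on the same axis.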

.. list-table:: Bigwig
   :widths: 10 5 10 40

   * - File
     - Size
     - Last modified
     - Description
   * - `Adipocyte.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Adipocyte.bw>`_
     - 54M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Aggregate.snATAC.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Aggregate.snATAC.bw>`_
     - 2.4G
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Atrial_cardiomyocyte.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Atrial_cardiomyocyte.bw>`_
     - 656M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Endothelial.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Endothelial.bw>`_
     - 355M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Fibroblast.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Fibroblast.bw>`_
     - 968M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Lymphocyte.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Lymphocyte.bw>`_
     - 45M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Macrophage.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Macrophage.bw>`_
     - 333M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Nervous.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Nervous.bw>`_
     - 41M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Smooth_muscle.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Smooth_muscle.bw>`_
     - 332M
     - 03/24/20
     - bigWig formatted track for cluster.
   * - `Ventricular_cardiomyocyte.bw <http://ns104190.ip-147-135-44.us/data_CARE_portal/snATAC/bigwigs/Ventricular_cardiomyocyte.bw>`_
     - 1.3G
     - 03/24/20
     - bigWig formatted track for cluster.