ATAC-seq Analysis Pipeline

A comprehensive pipeline for processing and analyzing ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data. This pipeline automates the entire workflow from raw FastQ files to peak calling, including quality control and visualization steps.

Features

Automated processing of paired-end ATAC-seq data
Comprehensive quality control metrics
Adapter trimming and read preprocessing
Alignment to reference genome
Peak calling with MACS2
Generate genome browser visualization files
TSS enrichment analysis
Insert size distribution analysis
Library complexity estimation

Prerequisites

Python 2.7+
FastQC (v0.11.2+)
Bowtie2 (v2.2.4+)
MACS2 (v2.1.1+)
Samtools (v1.3+)
Picard Tools (v1.92+)
BedTools (v2.25.0+)
R (v3.2.2+)
IGVtools (v2.3.3+)
Preseq (v1.0.2+)
NGSplot (v2.47+)

Installation

Clone the repository:

git clone https://github.com/rohitrrj/ATACseq_Pipeline.git
cd ATACseq_Pipeline

Ensure all required modules are available:

module load python fastqc/0.11.2 bowtie/2.2.4 MACS2/2.1.1 java/latest
module load samtools/1.3 picard-tools/1.92 preseq/1.0.2 bedtools/2.25.0
module load r/3.2.2 igvtools/2.3.3 ngsplot/2.47

Configure your project:

cp conf.txt.example conf.txt
# Edit conf.txt with your project-specific paths

Usage

Prepare your sample file (one sample per line):

echo "sample1" > samples.txt
echo "sample2" >> samples.txt

Structure your input data:

/path/to/data/
├── sample1_R1.fastq
├── sample1_R2.fastq
├── sample2_R1.fastq
└── sample2_R2.fastq

Run the pipeline:

./ATAC_seq_preprocess.sh

Pipeline Steps

Quality Control (FastQC)
- Raw read quality assessment
- Adapter content analysis
Read Preprocessing
- Adapter trimming
- Quality filtering
Alignment (Bowtie2)
- Mapping to reference genome
- Remove unmapped/low-quality reads
Post-alignment Processing
- Remove duplicates
- Remove mitochondrial reads
- Filter for chromosomes 1-22
Peak Calling (MACS2)
- Identify accessible regions
- Generate signal tracks

Output Structure

Analysis/
├── trim/                  # Trimmed FastQ files
├── fastqc/                # Quality control reports
├── bowtie/                # Alignment files
├── QC/                    # QC metrics and plots
└── Peaks/                 # MACS2 peak calls

Quality Control

The pipeline generates several QC metrics:

TSS enrichment scores
Insert size distributions
Library complexity estimates
Alignment statistics
Peak quality metrics

Configuration

Edit conf.txt to specify:

Data directory
Project directory
Scripts directory
Sample file location

Example configuration:

myDATADIR="/path/to/data"
myPROJDIR="/path/to/project"
mySCRIPTSDIR="/path/to/scripts"
mySampleFile="samples.txt"

Troubleshooting

Common Issues

Module Load Errors
- Ensure all required modules are available on your system
- Check module versions match requirements
Memory Issues
- Adjust -l h_vmem in script header
- Split processing into smaller batches
Missing Reference Files
- Verify reference genome paths
- Check file permissions

Error Messages

Error: cannot find input file - Check file paths in conf.txt
Segmentation fault - Increase memory allocation
Command not found - Ensure all modules are loaded

License

This project is licensed under the MIT License - see the LICENSE file for details.

Applications

This pipeline has been used in the following publications:

"Epigenetic signature of PD-1+ TCF1+ CD8 T cells that act as resource cells during chronic viral infection and respond to PD-1 blockade"
- Proc Natl Acad Sci U S A. 2022 Feb 22;119(8):e2117314119
- DOI: 10.1073/pnas.1903520116
- PMID: 31227606
- Used for analyzing T cell chromatin accessibility patterns
"PD-1 combination therapy with IL-2 modifies CD8+ T cell exhaustion program"
- Nature. 2022 Oct;610(7933):737-743
- DOI: 10.1038/s41586-022-05257-0
- PMID: 36171288
- Applied for chromatin accessibility analysis in T cell exhaustion studies

Code availability: ⭐ rohitrrj/ATACseq_Pipeline - Comprehensive ATAC-seq data analysis pipeline for chromatin accessibility studies

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
ATAC_scripts		ATAC_scripts
ATAC_seq_preprocess.sh		ATAC_seq_preprocess.sh
README.md		README.md
ReadMe.txt		ReadMe.txt
Run1.txt		Run1.txt
conf.txt		conf.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ATAC-seq Analysis Pipeline

Table of Contents

Features

Prerequisites

Installation

Usage

Pipeline Steps

Output Structure

Quality Control

Configuration

Troubleshooting

Common Issues

Error Messages

License

Applications

Contributing

About

Releases

Packages

Languages

rohitrrj/ATACseq_Pipeline

Folders and files

Latest commit

History

Repository files navigation

ATAC-seq Analysis Pipeline

Table of Contents

Features

Prerequisites

Installation

Usage

Pipeline Steps

Output Structure

Quality Control

Configuration

Troubleshooting

Common Issues

Error Messages

License

Applications

Contributing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages