RNA-seq Analysis Pipeline

A pipeline for processing paired-end RNA sequencing data with the STAR aligner and generating gene expression counts. It automates the workflow from raw FastQ files to a gene count matrix.

Features

Automated processing of paired-end RNA-seq data
Quality control using FastQC
Efficient read alignment using STAR
Gene-level quantification using featureCounts
Support for multiple samples
Parallel processing capabilities
Comprehensive logging and error reporting

Prerequisites

Python
FastQC (v0.11.2+)
STAR (v2.5.3a+)
Subread/featureCounts (v1.6.0+)
Samtools (v1.3+)
R (v3.2.2+)
Reference genome and annotation files
Sufficient computational resources (recommended: 32GB+ RAM)

Installation

Clone the repository:

git clone https://github.com/rohitrrj/RNAseq_Pipeline.git
cd RNAseq_Pipeline

On a cluster that uses environment modules, load the required tools:

module load python fastqc/0.11.2 STAR/2.5.3a subread/1.6.0
module load samtools/1.3 r/3.2.2
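A failed `module load` usually surfaces later as `STAR: command not found`. The following is a minimal bash sketch that checks each executable is on PATH before a run. The helper name `check_tools` is hypothetical, and it assumes the standard executable names (`fastqc`, `STAR`, `featureCounts`, `samtools`, `Rscript`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every requested executable that is not on PATH.
# Returns 0 only if all tools were found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool (is the module loaded?)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_tools python fastqc STAR featureCounts samtools Rscript || exit 1
```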

Configure your project:

# Edit the conf.txt included in the repository with your project-specific paths

Usage

Prepare your input data:

Place paired-end FastQ files in the directory set as myDATADIR in conf.txt
Naming convention: sample_R1_001.fastq.gz and sample_R2_001.fastq.gz
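Given that naming convention, sample names can be recovered by stripping the `_R1_001.fastq.gz` suffix and checking that the matching R2 file exists. This is a sketch of that idea, not the script's own discovery logic; the function name `list_samples` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample prefix of every *_R1_001.fastq.gz
# in a directory that has a matching *_R2_001.fastq.gz mate.
list_samples() {
  local dir="$1" r1 r2 sample
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                      # glob matched no files
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    sample="$(basename "${r1%_R1_001.fastq.gz}")"
    if [ -e "$r2" ]; then
      echo "$sample"
    else
      echo "warning: no R2 mate for $sample" >&2   # unpaired R1 is skipped
    fi
  done
}

# Example: for s in $(list_samples "$myDATADIR"); do echo "$s"; done
```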

Set up configuration (conf.txt):

myDATADIR="/path/to/fastq/files"
myGenomeDIR="/path/to/reference/genome"
myGenomeGTF="/path/to/annotation.gtf"
N_CPUS=8  # Number of CPU cores to use

Run the pipeline (the script file in the repository is named RNA_seq_pineline_STAR.sh):

./RNA_seq_pineline_STAR.sh

Pipeline Steps

Quality Control (FastQC)
- Raw read quality assessment
- Adapter content analysis
- Quality metrics visualization
Read Alignment (STAR)
- Genome loading
- Splice-aware alignment
- BAM file generation
Expression Quantification (featureCounts)
- Gene-level count generation
- Multi-threaded processing
- Comprehensive counting statistics
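The three stages run in order for each sample, and a failure in one should stop the later ones. The sketch below shows one way to wrap each stage so that it is timestamped in a log and its exit status is reported. The helper `run_step` and the default log name `pipeline.log` are assumptions for illustration, not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one stage, log START/DONE/FAIL with timestamps,
# and propagate the stage's exit status to the caller.
run_step() {
  local name="$1"; shift
  local log="${LOG_FILE:-pipeline.log}"   # assumed log location
  echo "[$(date '+%F %T')] START $name: $*" >> "$log"
  if "$@"; then
    echo "[$(date '+%F %T')] DONE  $name" >> "$log"
  else
    local rc=$?                           # exit status of the failed stage
    echo "[$(date '+%F %T')] FAIL  $name (exit $rc)" >> "$log"
    return "$rc"
  fi
}

# Example, per sample:
#   run_step fastqc fastqc -t "$N_CPUS" -o fastQC_output "$r1" "$r2" || exit 1
```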

Output Structure

project_directory/
├── fastQC_output/           # Quality control reports
│   ├── *_fastqc.html
│   └── *_fastqc.zip
├── star_output/            # STAR alignment results
│   ├── *Aligned.out.bam
│   ├── *Log.final.out
│   └── *SJ.out.tab
└── featureCount_output/    # Gene count matrices
    ├── *.count.txt
    └── *.count.txt.summary
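The layout above can be created up front so that each tool has somewhere to write. A minimal sketch, assuming the three directory names shown in the tree; `make_output_dirs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the three output directories (safe to re-run).
make_output_dirs() {
  local project="$1" d
  for d in fastQC_output star_output featureCount_output; do
    mkdir -p "$project/$d" || return 1
  done
}

# Example: make_output_dirs "$PWD"
```

If STAR is given `--outFileNamePrefix star_output/<sample>_`, it writes `star_output/<sample>_Aligned.out.bam`, `_Log.final.out` and `_SJ.out.tab`, which matches the `*Aligned.out.bam` pattern in the tree.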

Configuration

Edit conf.txt to specify:

# Required paths
myDATADIR="/path/to/data"              # FastQ files location
myGenomeDIR="/path/to/genome"          # Reference genome directory
myGenomeGTF="/path/to/annotation.gtf"  # Gene annotation file
STAR_HG19_GENOME="/path/to/star/index" # STAR genome index
N_CPUS=8                              # Number of CPU cores

# Optional parameters
h_vmem="10G"                          # Memory per core
h_rt="24:00:00"                       # Maximum runtime
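Because conf.txt consists of plain shell assignments, it can be sourced and checked before any job starts. The sketch below assumes the variable names listed above; the helper `load_config` and the fallback values for the optional settings (copied from the example) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source conf.txt and fail early on missing settings.
load_config() {
  local conf="$1" var
  if [ ! -r "$conf" ]; then
    echo "cannot read config: $conf" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  . "$conf"
  # Every required variable must be non-empty (uses bash indirect expansion).
  for var in myDATADIR myGenomeDIR myGenomeGTF STAR_HG19_GENOME N_CPUS; do
    if [ -z "${!var:-}" ]; then
      echo "config error: $var is not set" >&2
      return 1
    fi
  done
  if ! [[ "$N_CPUS" =~ ^[1-9][0-9]*$ ]]; then
    echo "config error: N_CPUS must be a positive integer, got '$N_CPUS'" >&2
    return 1
  fi
  # Assumed defaults for the optional scheduler settings.
  h_vmem="${h_vmem:-10G}"
  h_rt="${h_rt:-24:00:00}"
}

# Example: load_config conf.txt || exit 1
```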

STAR Alignment Parameters

Key alignment parameters used:

--outSAMstrandField intronMotif              # Add XS strand attribute to spliced alignments
--outFilterIntronMotifs RemoveNoncanonical   # Discard alignments with non-canonical junctions
--outSAMtype BAM Unsorted                    # Write unsorted BAM (Aligned.out.bam)
--outReadsUnmapped Fastx                     # Write unmapped reads to Unmapped.out.mate1/2
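For context, here is a sketch of a complete per-sample STAR call built around those parameters. The remaining options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand zcat`, `--outFileNamePrefix`) are standard STAR options, but their use here, the `star_output/<sample>_` prefix, and the helper `run_star` with its `DRY_RUN` switch are assumptions, not confirmed from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align one paired-end sample with STAR.
# Set DRY_RUN=1 to print the command instead of running it.
run_star() {
  local sample="$1" r1="$2" r2="$3"
  local cmd=(STAR
    --runThreadN "${N_CPUS:-8}"            # assumed default of 8 threads
    --genomeDir "$STAR_HG19_GENOME"
    --readFilesIn "$r1" "$r2"
    --readFilesCommand zcat                # inputs are .fastq.gz
    --outFileNamePrefix "star_output/${sample}_"
    --outSAMstrandField intronMotif
    --outFilterIntronMotifs RemoveNoncanonical
    --outSAMtype BAM Unsorted
    --outReadsUnmapped Fastx)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```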

featureCounts Parameters

Gene quantification settings:

-t exon           # Feature type (GTF column 3) to count
-g gene_id        # GTF attribute used to group exons into genes
-T $N_CPUS        # Number of threads
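A sketch of the matching featureCounts call for one sample, using only the three documented options plus the annotation (`-a`) and output (`-o`) arguments. The BAM path and output name are assumptions chosen to match the output tree; featureCounts writes the `.summary` file next to `-o` automatically. The paired-end fragment-counting option `-p` is not among the documented settings, so it is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads per gene for one sample.
# Set DRY_RUN=1 to print the command instead of running it.
run_featurecounts() {
  local sample="$1"
  local bam="star_output/${sample}_Aligned.out.bam"   # assumed STAR prefix
  local cmd=(featureCounts
    -T "${N_CPUS:-8}"
    -t exon
    -g gene_id
    -a "$myGenomeGTF"
    -o "featureCount_output/${sample}.count.txt"
    "$bam")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```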

Troubleshooting

Common Issues

Memory Issues
- Increase h_vmem in script header
- Reduce number of parallel processes
- Consider using smaller chunks of data
STAR Alignment Errors
- Verify genome index
- Check disk space
- Validate input FastQ format
featureCounts Problems
- Verify GTF file format
- Check BAM file integrity
- Ensure sufficient file permissions
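For the "Verify GTF file format" step, a quick sanity check is that every data line has nine tab-separated columns and that exon lines carry the `gene_id` attribute featureCounts groups on. This is a rough sketch, not a full GTF validator; `check_gtf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag GTF lines that featureCounts (-t exon -g gene_id)
# is likely to choke on. Problems are printed to stderr; returns 1 if any.
check_gtf() {
  local gtf="$1"
  if [ ! -r "$gtf" ]; then
    echo "cannot read GTF: $gtf" >&2
    return 1
  fi
  awk -F'\t' '
    /^#/ || NF == 0 { next }                       # skip comments and blank lines
    NF != 9 { printf "line %d: expected 9 fields, found %d\n", NR, NF; bad = 1; next }
    $3 == "exon" && $9 !~ /gene_id/ { printf "line %d: exon without gene_id\n", NR; bad = 1 }
    END { exit bad }
  ' "$gtf" >&2
}
```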

Error Messages

STAR: command not found - Module not loaded correctly
ERROR: can't open GTF file - Check file path and permissions
ERROR: no input files specified - Verify FastQ file naming

Performance Optimization

Resource Allocation
- Adjust N_CPUS based on system
- Balance memory per core
- Monitor disk I/O
File Management
- Use SSD for temporary files
- Clean up intermediate files
- Implement staged processing
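When balancing memory per core, it helps to compute the total request. On Grid Engine-style schedulers h_vmem is usually applied per slot, so the job asks for roughly h_vmem × N_CPUS; with the example values (10G, 8 cores) that is 80G, above the 32GB recommended for STAR. The sketch assumes h_vmem is written as a whole number of gigabytes ending in `G`; `total_mem_gb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total memory (GB) requested when h_vmem is per slot.
# Accepts values like "10G" or "4g"; other units are rejected.
total_mem_gb() {
  local per_core="${1%[Gg]}" cpus="$2"
  case "$per_core" in ''|*[!0-9]*) echo "unsupported h_vmem: $1" >&2; return 1 ;; esac
  case "$cpus" in ''|*[!0-9]*) echo "invalid N_CPUS: $2" >&2; return 1 ;; esac
  echo $(( per_core * cpus ))
}

# Example:
#   [ "$(total_mem_gb "$h_vmem" "$N_CPUS")" -ge 32 ] || echo "warning: under 32GB for STAR" >&2
```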

License

This project is licensed under the MIT License - see the LICENSE file for details.

Applications

This pipeline has been used in the following publications:

"PD-1 combination therapy with IL-2 modifies CD8+ T cell exhaustion program"
- Nature. 2022 Oct;610(7933):737-743
- DOI: 10.1038/s41586-022-05257-0
- PMID: 36171288
- PMCID: PMC9793890
- Used for transcriptome analysis of exhausted T cells
"Aging-associated HELIOS deficiency in naive CD4+ T cells alters chromatin remodeling and promotes effector cell responses"
- Nat Immunol. 2023 Jan;24(1):96-109
- DOI: 10.1038/s41590-022-01369-x
- PMID: 36510022
- PMCID: PMC10118794
- Used for analyzing bone marrow T cell progenitor transcriptome

Code availability: rohitrrj/RNAseq_Pipeline (https://github.com/rohitrrj/RNAseq_Pipeline), a high-throughput RNA sequencing analysis pipeline

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

Acknowledgments

STAR aligner development team
Subread/featureCounts developers
Supporting institutions and funding
