TCR-seq Analysis Pipeline

A pipeline for analyzing T cell receptor (TCR) repertoire sequencing data, configured for human TCR-β chain analysis. It automates the workflow from raw paired-end FastQ files, through barcode-based demultiplexing, to clonotype identification using MiXCR.

Features

Automated processing of paired-end TCR-seq data
Barcode-based sample demultiplexing
TCR alignment using MiXCR
Clonotype assembly and export
Support for multiple samples
Parallel processing capabilities
Comprehensive logging

Prerequisites

Java 8+
Python 2.7+
MiXCR 2.0+
FASTX-Toolkit 0.0.14+
Reference databases:
- IMGT library (v201711-1 or later)
- Human TCR references

Installation

Clone the repository:

git clone https://github.com/rohitrrj/TCRseq_Pipeline.git
cd TCRseq_Pipeline

Ensure all required modules are available:

module load java/8u66
module load python fastx_toolkit/0.0.14

Configure your project:

cp conf.txt.example conf.txt
# Edit conf.txt with your project-specific paths

Usage

Prepare your sample barcode file:

# barcode.txt format
BARCODE1    sample1
BARCODE2    sample2

Set up configuration (conf.txt):

myRawDATADIR="/path/to/raw/fastq/files"
myDATADIR="/path/to/processed/data"
myPROJDIR="/path/to/project"
myTCRScriptDIR="/path/to/scripts"
mySampleFile="barcode.txt"

Run the pipeline:

./MixR_pipeline_human.sh

Pipeline Steps

1. Sample Preparation
   - Merge paired-end reads
   - Remove random sequences
   - Split samples by barcodes

2. Read Processing
   - Separate reads into R1/R2
   - Quality filtering
   - Adapter trimming

3. TCR Analysis (MiXCR)
   - Alignment to reference sequences
   - Clonotype assembly
   - Clone export and quantification

Output Structure

project_directory/
├── Analysis/
│   ├── align/              # MiXCR alignment files
│   │   └── sample_name/
│   │       ├── alignments.vdjca
│   │       └── alignmentReport.log
│   ├── assemble/          # Assembled clonotypes
│   │   └── sample_name/
│   │       ├── clones.clns
│   │       └── assembleReport.log
│   └── export/            # Final results
│       └── sample_name/
│           └── clones.txt
└── split_reads/           # Demultiplexed samples
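
After a run, it can be useful to confirm that each sample produced the files listed in the tree above. The following is a minimal sketch, not part of the pipeline: the function name check_outputs is hypothetical, and it assumes that sample_name in the tree is the sample name from the barcode file and that an empty file should count as a failure.

```shell
# Hypothetical helper (not shipped with the pipeline): report any expected
# per-sample output file that is missing or empty.
# Arguments: project directory, sample name.
check_outputs() {
    proj_dir="$1"
    sample="$2"
    status=0
    for f in \
        "Analysis/align/$sample/alignments.vdjca" \
        "Analysis/align/$sample/alignmentReport.log" \
        "Analysis/assemble/$sample/clones.clns" \
        "Analysis/assemble/$sample/assembleReport.log" \
        "Analysis/export/$sample/clones.txt"
    do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$proj_dir/$f" ]; then
            echo "missing or empty: $proj_dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_outputs "$myPROJDIR" sample1 prints one line per missing file and returns a non-zero status if any are absent.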

MiXCR Parameters

Alignment

--species hsa              # Human species
--chains TRB              # TCR beta chain
--library imgt.201711-1.s # IMGT library version
-OvParameters.geneFeatureToAlign=VRegion   # Align against the V region (MiXCR overrides use the -O prefix)

Assembly

# MiXCR default assembly parameters are used (no overrides)

Export

--chains TRB              # Export TCR beta chain results
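
To see how these parameters fit together, the sketch below prints the three MiXCR commands for one sample instead of running them. It is an illustration under stated assumptions, not the exact invocation used by MixR_pipeline_human.sh: the function name print_mixcr_commands is hypothetical, the file locations are taken from the Output Structure section, the --report option is assumed to be how the report logs are written, and paths containing spaces are not handled.

```shell
# Hypothetical dry-run helper: print (do not execute) the align, assemble
# and exportClones commands for one sample.
# Arguments: sample name, R1 FastQ, R2 FastQ, optional output root
# (defaults to "Analysis").
print_mixcr_commands() {
    sample="$1"
    r1="$2"
    r2="$3"
    out="${4:-Analysis}"

    # Step 1: align reads to the human TRB reference
    echo "mixcr align --species hsa --chains TRB --library imgt.201711-1.s" \
         "-OvParameters.geneFeatureToAlign=VRegion" \
         "--report $out/align/$sample/alignmentReport.log" \
         "$r1 $r2 $out/align/$sample/alignments.vdjca"

    # Step 2: assemble clonotypes with default parameters
    echo "mixcr assemble" \
         "--report $out/assemble/$sample/assembleReport.log" \
         "$out/align/$sample/alignments.vdjca $out/assemble/$sample/clones.clns"

    # Step 3: export TRB clones to a tab-delimited table
    echo "mixcr exportClones --chains TRB" \
         "$out/assemble/$sample/clones.clns $out/export/$sample/clones.txt"
}
```

Printing the commands first makes it easy to review them, or to compare them against the pipeline script, before committing compute time.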

Configuration

Edit conf.txt to specify:

# Required paths
myRawDATADIR="/path/to/raw/data"     # Raw FastQ files
myDATADIR="/path/to/processed/data"   # Processed data
myPROJDIR="/path/to/project"          # Project directory
myTCRScriptDIR="/path/to/scripts"     # Analysis scripts
mySampleFile="barcode.txt"            # Sample barcodes

# Resource allocation
h_vmem="10G"                         # Memory per core
N_CPUS=6                            # Number of CPU cores
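
Because conf.txt consists of plain shell variable assignments, a short check can catch a missing setting before the pipeline starts. This is a sketch only: load_conf is a hypothetical name, and it assumes the file is safe to source into the current shell, which may or may not match how the pipeline script reads it.

```shell
# Hypothetical helper: source a conf.txt-style file and verify that the
# required path settings are present and non-empty.
# Note: variables are set in the calling shell, not in a subshell.
load_conf() {
    conf_file="$1"
    if [ ! -r "$conf_file" ]; then
        echo "cannot read $conf_file" >&2
        return 1
    fi
    # Give a path containing a slash (e.g. ./conf.txt); a bare name may be
    # looked up on PATH by some shells.
    . "$conf_file"

    missing=0
    for var in myRawDATADIR myDATADIR myPROJDIR myTCRScriptDIR mySampleFile; do
        # Indirect lookup of the variable named in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing required setting: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be load_conf ./conf.txt, followed by the pipeline run only if the function returns zero.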

Input Data Requirements

FastQ Files

Paired-end reads
Naming convention: *R1_001.fastq.gz, *R2_001.fastq.gz
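
The naming convention above implies that every R1 file has an R2 mate with the same prefix. The following sketch lists the pairs found in a directory and flags any R1 file without a mate. The function name list_fastq_pairs is hypothetical and the check is not part of the pipeline itself.

```shell
# Hypothetical helper: for each *R1_001.fastq.gz file in a directory, print
# the R1 path and its R2 mate separated by a tab, or warn on stderr when
# the mate is missing. Returns non-zero if any mate is missing.
list_fastq_pairs() {
    raw_dir="$1"
    status=0
    for r1 in "$raw_dir"/*R1_001.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        # Swap the R1 suffix for the R2 suffix to get the expected mate
        r2="${r1%R1_001.fastq.gz}R2_001.fastq.gz"
        if [ -e "$r2" ]; then
            printf '%s\t%s\n' "$r1" "$r2"
        else
            echo "no R2 mate for $r1" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, list_fastq_pairs "$myRawDATADIR" prints one line per complete pair.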

Barcode File Format

AAGGTTCC    patient1
CCTTAAGG    patient2
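
A malformed barcode file is easier to fix before the run than after a failed split. The sketch below checks the format shown above. It assumes two whitespace-separated columns (barcode, then sample name), barcodes written in upper-case A, C, G, T or N, and that lines starting with # are comments. The function name validate_barcodes is hypothetical, and whether the pipeline itself tolerates comment lines is not confirmed here.

```shell
# Hypothetical helper: validate a barcode file. Prints one message per
# problem line and returns non-zero if any problem was found.
validate_barcodes() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        NF != 2 {
            printf "line %d: expected 2 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $1 !~ /^[ACGTN]+$/ {
            printf "line %d: unexpected characters in barcode %s\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, validate_barcodes barcode.txt is silent and returns zero when every line is well formed.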

Troubleshooting

Common Issues

Memory Issues
- Increase h_vmem in script header
- Process fewer samples in parallel
- Check Java heap settings

MiXCR Errors
- Verify IMGT library installation
- Check input FastQ format
- Validate species parameter

Barcode Splitting Issues
- Verify barcode format
- Check for contamination
- Adjust mismatch tolerance
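
When choosing a mismatch tolerance, it helps to know how far apart the barcodes are: the tolerance should stay well below the smallest number of differing positions between any two barcodes, otherwise a read could match more than one sample. The sketch below counts position-wise mismatches between a barcode and the start of a sequence. The function name count_mismatches is hypothetical, and the assumption that barcodes sit at the start of the read is illustrative rather than taken from the pipeline.

```shell
# Hypothetical helper: count positions at which the first length(BARCODE)
# characters of SEQ differ from BARCODE (a Hamming distance on the prefix).
# Arguments: barcode, sequence.
count_mismatches() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        n = 0
        for (i = 1; i <= length(b); i++)
            if (substr(b, i, 1) != substr(s, i, 1))
                n++
        print n
    }'
}
```

For the two example barcodes in this README, count_mismatches AAGGTTCC CCTTAAGG reports 8, meaning they differ at every position, so a tolerance of one or two mismatches would not make them ambiguous.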

Error Messages

MiXCR: command not found - Check Java/MiXCR installation
Unable to split samples - Verify barcode file format
Alignment failed - Check input file quality
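
Errors such as "command not found" can be caught before the run with a short preflight check. This is a minimal sketch: check_tools is a hypothetical name, and the executable names in the usage note (java, mixcr, fastx_barcode_splitter.pl) are assumptions about what the pipeline calls, so adjust them to match your installation.

```shell
# Hypothetical preflight check: report which of the named executables are
# not on PATH. Returns non-zero if any are missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, run check_tools java mixcr fastx_barcode_splitter.pl after the module load commands from the Installation section.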

Performance Optimization

Resource Management
- Adjust CPU allocation
- Optimize memory usage
- Monitor disk I/O

Processing Tips
- Split large batches
- Clean intermediate files
- Use SSD for temporary storage

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this pipeline in your research, please cite:

Jadhav RR, Im SJ, Dixit PY, Tso Fan Yiu, Cao L, Sy MD, Lauer GM, Bernard NF, Wood C, Wilson P, Li C, Goronzy JJ. Loss of T cell progenitor reprogramming potential in aging bone marrow niches. JCI Insight. 2020 Apr 9;5(7):e134356. doi: 10.1172/jci.insight.134356. PMID: 32191644; PMCID: PMC7101137.

You can also cite this repository:

Jadhav R. (2025). TCRseq_Pipeline: A comprehensive pipeline for TCR repertoire analysis.
GitHub repository: https://github.com/rohitrrj/TCRseq_Pipeline

Related Tools

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

Acknowledgments

MiXCR development team
IMGT database maintainers
Supporting institutions and funding
