Update README.md
describe workflow in words and how charged vs uncharged reads are classified
lkwhite authored Jan 9, 2025
1 parent 8e8d103 commit e8cedda
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -28,7 +28,11 @@ See [README.md in the config directory](https://github.com/rnabioco/aa-tRNA-seq-

## Workflow overview

The workflow for the aa-tRNA-seq pipeline is illustrated below. This Directed Acyclic Graph (DAG) provides an overview of the pipeline structure for a single sample.
Given a directory of pod5 files, this pipeline merges all files from a sample into a single pod5 file, rebasecalls it to generate an unmapped BAM with move table information (for downstream use by Remora), converts the BAM to FASTQ, and uses BWA MEM to align the reads to a reference containing tRNA and adapter sequences. The resulting data (pod5 signal data and aligned reads) are then passed to a model trained with Remora, which classifies reads as charged or uncharged in the rule `cca_classify`. The final steps of the pipeline compute several outputs that may be useful for analysis and visualization: normalized counts for charged and uncharged tRNA (`get_cca_trna_cpm`), basecalling error values (`bcerror`), alignment statistics (`align_stats`), and information on raw nanopore signal from Remora (`remora_signal_stats`).
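
As a rough illustration of the normalization step, the sketch below computes counts per million (CPM) from per-tRNA read counts. It is a minimal example, not the code in `get_cca_trna_cpm`: the function name, the input layout, the example counts, and the choice of denominator (all classified reads in the sample) are assumptions made for illustration.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    `counts` maps a (tRNA name, charging status) pair to a read count.
    Assumption: the denominator is the total over all entries in `counts`;
    the pipeline itself may normalize differently.
    """
    total = sum(counts.values())
    if total == 0:
        # No reads at all: return zeros rather than dividing by zero.
        return {key: 0.0 for key in counts}
    return {key: value * 1_000_000 / total for key, value in counts.items()}


# Hypothetical counts, for illustration only.
example_counts = {
    ("tRNA-Ala", "charged"): 300,
    ("tRNA-Ala", "uncharged"): 100,
    ("tRNA-Gly", "charged"): 600,
}
example_cpm = counts_per_million(example_counts)
```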

A few notes about Remora classification of charged vs. uncharged tRNA reads. First, this step retains only full-length tRNA reads (with an allowance for signal loss at the 5′ end of reads in nanopore direct RNA sequencing). Second, due to the iterative nature of sequencing method development, the present approach does not rely on differences in the adapter sequences attached to charged vs. uncharged tRNA molecules (though these sequences are retained as separate entries in the alignment reference and downstream files). While we anticipate being able to leverage this information in the future, the current pipeline relies exclusively on signal data over a 6-nt modification k-mer (CCAGGC), which spans the universal CCA 3′ end of the tRNA and the first three nucleotides of the 3′ adapter, to distinguish charged from uncharged reads.
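
To make the position of the 6-nt k-mer concrete, the sketch below locates CCAGGC in a reference entry built by joining a tRNA sequence to a 3′ adapter. This is an illustrative helper under stated assumptions, not part of the pipeline: the function name and both example sequences are hypothetical, and only the k-mer itself comes from the description above.

```python
KMER = "CCAGGC"  # CCA 3' end of the tRNA + first three adapter nucleotides


def junction_kmer_span(trna_seq, adapter_seq, kmer=KMER):
    """Return the 0-based, half-open (start, end) span of `kmer` in the
    concatenated tRNA + adapter reference, centered on the junction.

    Raises ValueError if the junction does not carry the expected k-mer.
    """
    half = len(kmer) // 2
    reference = trna_seq + adapter_seq
    start = len(trna_seq) - half  # last three tRNA nucleotides
    end = start + len(kmer)       # first three adapter nucleotides
    if start < 0 or reference[start:end] != kmer:
        raise ValueError(f"expected {kmer} at the tRNA/adapter junction")
    return start, end


# Hypothetical sequences, for illustration only.
example_trna = "GGGTTTAAACCCACCA"   # ends in CCA
example_adapter = "GGCTTCTTCTT"     # starts with GGC
example_span = junction_kmer_span(example_trna, example_adapter)
```

A read would then contribute to classification only if its alignment covers this span, which is consistent with the full-length requirement described above.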

The Directed Acyclic Graph (DAG) below provides an overview of the pipeline structure for a single sample.

![Workflow DAG](https://github.com/rnabioco/aa-tRNA-seq-pipeline/blob/main/workflow/workflow_dag.png)
