Skip to content

Commit

Permalink
script to extract fusion-supporting reads
Browse files Browse the repository at this point in the history
  • Loading branch information
suhrig committed Apr 4, 2021
1 parent 50bef0e commit baa88b8
Show file tree
Hide file tree
Showing 4 changed files with 78 additions and 2 deletions.
8 changes: 6 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,12 +75,16 @@ Please refer to the [user manual](http://arriba.readthedocs.io/en/latest/) for i
- [Multiple transcript variants](https://arriba.readthedocs.io/en/latest/interpretation-of-results/#multiple-transcript-variants)
- [Cohort analysis](https://arriba.readthedocs.io/en/latest/interpretation-of-results/#cohort-analysis)

8. [Current limitations](https://arriba.readthedocs.io/en/latest/current-limitations/)
8. [Utility scripts](https://arriba.readthedocs.io/en/latest/utility-scripts/)

- [Extracting fusion-supporting alignments](https://arriba.readthedocs.io/en/latest/utility-scripts/#extracting-fusion-supporting-alignments)

9. [Current limitations](https://arriba.readthedocs.io/en/latest/current-limitations/)

- [Intragenic deletions](https://arriba.readthedocs.io/en/latest/current-limitations/#intragenic-deletions)
- [Memory consumption](https://arriba.readthedocs.io/en/latest/current-limitations/#memory-consumption)

9. [Internal algorithm](https://arriba.readthedocs.io/en/latest/internal-algorithm/)
10. [Internal algorithm](https://arriba.readthedocs.io/en/latest/internal-algorithm/)

- [Read-level filters](https://arriba.readthedocs.io/en/latest/internal-algorithm/#read-level-filters)
- [Event-level filters](https://arriba.readthedocs.io/en/latest/internal-algorithm/#event-level-filters)
Expand Down
15 changes: 15 additions & 0 deletions documentation/utility-scripts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
The folder `scripts` contains small utility scripts for common tasks related to fusion detection.

Extracting fusion-supporting alignments
---------------------------------------

**Usage:**

```
extract_fusion-supporting_alignments.sh fusions.tsv Aligned.sortedByCoord.out.bam output_prefix
```

**Description:**

This script takes fusion predictions from Arriba (`fusions.tsv`) and extracts the fusion-supporting alignments listed in the column `read_identifiers` from the given input BAM file (`Aligned.sortedByCoord.out.bam`). The input BAM file must be sorted and indexed. For each fusion, a separate BAM file is created containing only the fusion-supporting alignments. The created BAM files are named after the given output prefix and the rank of the fusion in Arriba's output file.

1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,5 +14,6 @@ pages:
- 'visualization.md'
- 'command-line-options.md'
- 'interpretation-of-results.md'
- 'utility-scripts.md'
- 'current-limitations.md'
- 'internal-algorithm.md'
56 changes: 56 additions & 0 deletions scripts/extract_fusion-supporting_alignments.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
#!/bin/bash

# parse command-line arguments
if [ $# -ne 3 ]; then
echo Usage: $(basename "$0") fusions.tsv Aligned.sortedByCoord.out.bam output_prefix
echo
echo "Description: This script takes fusion predictions from Arriba (fusions.tsv) and extracts the fusion-supporting alignments listed in the column 'read_identifiers' from the given input BAM file (Aligned.sortedByCoord.out.bam). The input BAM file must be sorted and indexed. For each fusion, a separate BAM file is created containing only the fusion-supporting alignments. The created BAM files are named after the given output prefix and the rank of the fusion in Arriba's output file."
exit 1
fi 1>&2
FUSIONS_FILE="$1"
ALIGNMENTS="$2"
OUTPUT_PREFIX="$3"

# tell bash to abort on error
set -e -u -o pipefail

# make sure required software is installed
if ! [[ $(samtools --version-only 2> /dev/null) =~ ^1\. ]]; then
echo "samtools >= 1.0 must be installed" 1>&2
exit 1
fi

ID=0
SEARCH_WINDOW=1000000 # search this many bases up- and downstream of breakpoint for alignments
tail -n+2 "$FUSIONS_FILE" | while read LINE; do
ID=$((ID+1))
echo "Extracting alignments of fusion $ID"

# extract columns from Arriba output line
BREAKPOINT1=$(cut -f5 <<<"$LINE")
BREAKPOINT2=$(cut -f6 <<<"$LINE")
CHROMOSOME1="${BREAKPOINT1%:*}"
CHROMOSOME2="${BREAKPOINT2%:*}"
POSITION1="${BREAKPOINT1##*:}"
POSITION2="${BREAKPOINT2##*:}"
if [ "$POSITION1" -lt $SEARCH_WINDOW ]; then POSITION1=$SEARCH_WINDOW; fi
if [ "$POSITION2" -lt $SEARCH_WINDOW ]; then POSITION2=$SEARCH_WINDOW; fi

# make a new BAM file containing only the alignments of the given fusion
(
samtools view -H "$ALIGNMENTS"
(
cut -f30 <<<"$LINE" | tr ',' '\n' # extract read identifiers from Arriba output
samtools view "$ALIGNMENTS" $CHROMOSOME1:$((POSITION1-SEARCH_WINDOW))-$((POSITION1+SEARCH_WINDOW))
samtools view "$ALIGNMENTS" $CHROMOSOME2:$((POSITION2-SEARCH_WINDOW))-$((POSITION2+SEARCH_WINDOW))
) |
awk -F '\t' '
!duplicate_line[$0]++ && NF>1 && $1 in reads{print} # extract alignments
NF==1{reads[$0]=1} # remember names of relevant reads
'
) |
samtools view -Sbu - |
samtools sort - | tee "${OUTPUT_PREFIX}_$ID.bam" |
samtools index - "${OUTPUT_PREFIX}_$ID.bam.bai"
done

0 comments on commit baa88b8

Please sign in to comment.