=========================================================================
Tangram 0.3.1 Release Distribution Documentation 2014-02-09
Author: Jiantao Wu ([email protected])
        Wan-Ping Lee ([email protected])
        Marth Lab [1], Boston College Biology Department
=========================================================================
Introduction
=========================
Tangram is a C/C++ command-line toolbox for structural variation (SV)
detection. It combines read-pair and split-read algorithms and is fast
and memory-efficient. Powered by the Bamtools API [3], Tangram can call
SV events on multiple BAM files (a population) simultaneously to increase
sensitivity on low-coverage datasets. Currently it reports mobile element
insertions (MEI); additional SV event types will be introduced soon. For
SNP calling and short INDEL calling, please see another toolbox from our
lab: FreeBayes [4].
Obtaining and Compiling
=========================
> git clone git://github.com/jiantao/Tangram.git
> cd Tangram/src
> make
Compiling Tangram requires:
1. g++ 4.2.0 and above
2. zlib
3. pthread lib
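The requirements above can be checked before running make. The following
is only a sketch: the helper names version_ge and check_build_prereqs are
hypothetical and not part of Tangram, and the check assumes that g++
reports its version through -dumpversion and that linking a tiny test
program with -lz -lpthread is a fair proxy for "zlib and pthread are
installed".

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Tangram) that check the build
# prerequisites listed above: g++ >= 4.2.0, zlib and the pthread library.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Only the first three numeric fields are compared; missing fields count as 0.
version_ge() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, "."); split($2, b, ".")
        for (i = 1; i <= 3; i++) {
            x = a[i] + 0; y = b[i] + 0
            if (x > y) exit 0
            if (x < y) exit 1
        }
        exit 0
    }'
}

check_build_prereqs() {
    if ! command -v g++ >/dev/null 2>&1; then
        echo "g++ not found" >&2
        return 1
    fi
    gxx_version=$(g++ -dumpversion)
    if ! version_ge "$gxx_version" "4.2.0"; then
        echo "g++ $gxx_version is older than 4.2.0" >&2
        return 1
    fi
    # Compile and link a trivial program that includes zlib.h and links
    # against zlib and pthread.
    tmpdir=$(mktemp -d) || return 1
    printf '#include <zlib.h>\nint main() { return zlibVersion() == 0; }\n' \
        > "$tmpdir/t.cpp"
    if ! g++ "$tmpdir/t.cpp" -o "$tmpdir/t" -lz -lpthread 2>/dev/null; then
        rm -rf "$tmpdir"
        echo "could not compile and link against zlib and pthread" >&2
        return 1
    fi
    rm -rf "$tmpdir"
    echo "build prerequisites look OK"
}
```

After sourcing the file, running "check_build_prereqs && make" from the src
directory would stop early with a message if one of the requirements is
missing.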
Detection pipeline
=========================
Currently, Tangram contains seven sub-programs:
0. tangram_bam : If the input BAM files were not generated by MOSAIK [2],
   tangram_bam adds the ZA tags that are required by the following steps.
1. tangram_scan : Scan through each BAM file and calculate the fragment
   length distribution for each library in that file. It outputs the
   fragment length distribution files for each input BAM file.
2. tangram_merge : If more than one BAM file needs to be scanned, this
   program combines all the fragment length distribution files. It outputs
   the merged fragment length distribution file that enables detection on
   multiple BAM files simultaneously. This step is optional if only one
   (pooled) BAM file is used.
3. tangram_index : Index the normal and special (MEI sequences) reference
   files. It outputs the indexed reference file. This step is required for
   the split-read algorithm.
4. tangram_detect : Detect and genotype the SV events from the
   MOSAIK-aligned BAM files. It outputs the unfiltered VCF files.
5. tangram_filter : Filter the raw VCF file generated by the detector.
   NOTE: this program requires windowBed (from bedtools) [5], Unix sort
   and grep to be in the default path.
6. tangram_view_scan_file : View or change the contents of the
   lib_table.dat and hist.dat files (in binary format) that are generated
   by tangram_scan. This script can be used as a sanity check of the input
   BAM files, for example to spot missing MEI reference names or abnormal
   read groups.
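Because tangram_filter expects windowBed, sort and grep to be in the
default path, it can help to confirm that before filtering. The helper
below is a hypothetical sketch (missing_tools is not a Tangram command);
it only uses the POSIX "command -v" lookup and prints the name of every
tool it cannot find.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tangram): print each tool from the
# argument list that is not found in PATH. Returns non-zero if any tool
# is missing.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "missing_tools windowBed sort grep" prints nothing and
succeeds when all three tools named in the NOTE for tangram_filter are
available.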
The overall detection pipeline for Tangram looks like the following:

    tangram_bam
    (BAM Input)
                \
                 \
    tangram_scan  \
    (BAM Input)    \
                    -----> tangram_detect --> tangram_filter --> VCF file(s)
                   /       (BAM input)
    tangram_index /
    (Ref Fasta)

For the detailed usage of each program, please run "$PROGRAM -help"
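As a convenience, the help text of all seven programs can be printed in
one pass. This is a sketch under assumptions: the program names are
copied from the list above, but the directory that holds the built
programs (bin_dir below) and the exact file names on disk are not stated
in this document, so adjust them to your build. The function names
tangram_programs and show_all_help are hypothetical.

```shell
#!/bin/sh
# Names of the seven sub-programs, as listed in this README.
tangram_programs() {
    printf '%s\n' tangram_bam tangram_scan tangram_merge tangram_index \
        tangram_detect tangram_filter tangram_view_scan_file
}

# Run "$PROGRAM -help" for every program found in bin_dir.
# bin_dir is an assumption: pass the directory where your build placed
# the programs (defaults to the current directory).
show_all_help() {
    bin_dir=${1:-.}
    for prog in $(tangram_programs); do
        if [ -x "$bin_dir/$prog" ]; then
            echo "== $prog =="
            "$bin_dir/$prog" -help
        else
            echo "== $prog: not found in $bin_dir ==" >&2
        fi
    done
}
```

Programs that are not found are reported on standard error and skipped,
so the loop still shows help for whatever was built.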
Bug Report
=========================
Please report bugs through the issue tracker on GitHub or by sending the
authors an email.
References
=========================
[1] http://bioinformatics.bc.edu/marthlab/Main_Page
[2] https://github.com/wanpinglee/MOSAIK
[3] https://github.com/pezmaster31/bamtools
[4] https://github.com/ekg/freebayes
[5] http://code.google.com/p/bedtools