diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 1bcbfaf..6a21b94 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-22T01:47:55","documenter_version":"1.2.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2023-12-22T02:00:53","documenter_version":"1.2.1"}} \ No newline at end of file diff --git a/dev/api/consensus/index.html b/dev/api/consensus/index.html index 3c98bf6..b7ec3d0 100644 --- a/dev/api/consensus/index.html +++ b/dev/api/consensus/index.html @@ -3,9 +3,9 @@ reference::NucleotideSeq, variants::Union{AbstractString,AbstractPath}; frequency::Float64=0.5, -)

Get the consensus Haplotypefrom variants applied to reference.

Arguments

Keywords

source
HapLink.consensus_recordMethod
consensus_record(
+)

Get the consensus Haplotypefrom variants applied to reference.

Arguments

  • reference::NucleotideSeq: Sequence of the reference genome that variants were called from
  • variants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from

Keywords

  • frequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.
source
HapLink.consensus_recordMethod
consensus_record(
     reference::Union{AbstractString,AbstractPath},
     variants::Union{AbstractString,AbstractPath};
     frequency::Float64=0.5,
     prefix::Union{AbstractString,Nothing}=nothing,
-)

Get the consensus FASTA.Recordfrom variants applied to the first sequence in reference.

Arguments

  • reference::Union{AbstractString,AbstractPath}: Path to the reference genome that variants were called from. Only the first sequence will be used.
  • variants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from

Keywords

  • frequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.
  • prefix::Union{AbstractString,Nothing}=nothing: Name to give to the output record. By default, the name of the output record will be the same as the name of the input record with _CONSENSUS appended. If prefix is supplied, then the name of the output record will be $(prefix)_CONSENSUS.
source
+)

Get the consensus FASTA.Recordfrom variants applied to the first sequence in reference.

Arguments

Keywords

source diff --git a/dev/api/haplotype/index.html b/dev/api/haplotype/index.html index ead0ab3..892d9fa 100644 --- a/dev/api/haplotype/index.html +++ b/dev/api/haplotype/index.html @@ -1,2 +1,2 @@ -Haplotype Extensions · HapLink.jl
+Haplotype Extensions · HapLink.jl
diff --git a/dev/api/haplotypecall/index.html b/dev/api/haplotypecall/index.html index 678b723..b5901c5 100644 --- a/dev/api/haplotypecall/index.html +++ b/dev/api/haplotypecall/index.html @@ -1,10 +1,10 @@ -HaplotypeCall · HapLink.jl

HaplotypeCall

HapLink.HaplotypeCallType
HaplotypeCall{S,T}

A Haplotype that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall, but for Haplotypes.

Fields

  • depth::UInt64: How many sequencing reads contain all the Variations in haplotype
  • linkage::Float64: The unweighted linkage disequilibrium coefficient of this call
  • frequency::Float64: How often this haplotype occurs compared to the entire read pool
  • significance::Float64: The $χ^2$ statistical significance ($p$-value) of this call
  • haplotype::Haplotype{S,T}: The set of Variations that are inherited together in this call
source
HapLink.HaplotypeCallMethod
HaplotypeCall(
+HaplotypeCall · HapLink.jl

HaplotypeCall

HapLink.HaplotypeCallType
HaplotypeCall{S,T}

A Haplotype that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall, but for Haplotypes.

Fields

  • depth::UInt64: How many sequencing reads contain all the Variations in haplotype
  • linkage::Float64: The unweighted linkage disequilibrium coefficient of this call
  • frequency::Float64: How often this haplotype occurs compared to the entire read pool
  • significance::Float64: The $χ^2$ statistical significance ($p$-value) of this call
  • haplotype::Haplotype{S,T}: The set of Variations that are inherited together in this call
source
HapLink.HaplotypeCallMethod
HaplotypeCall(
     haplotype::Haplotype{S,T}, reads::AbstractArray{Haplotype{S,T}}
 ) where {S<:BioSequence,T<:BioSymbol}
 HaplotypeCall(
     haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}
-) where {S<:BioSequence,T<:BioSymbol}

Calls haplotypes by calculating the linkage disequilibrium of haplotype within reads

source
HapLink.ishaplotypeMethod
function ishaplotype(
+) where {S<:BioSequence,T<:BioSymbol}

Calls haplotypes by calculating the linkage disequilibrium of haplotype within reads

source
HapLink.ishaplotypeMethod
function ishaplotype(
     haplotype::Union{AbstractArray{Variation{S,T}}, Haplotype{S,T}},
     reads::AbstractArray{Haplotype{S,T}};
     frequency::Union{Float64,Nothing}=nothing,
@@ -15,14 +15,14 @@
         }
         {
             \left(\prod_i^N {P_{ref,i} \left(1 - P_{ref,i}\right)}\right)^\frac{2}{N}
-        }\]

where

  • $N$ is the number of Variations in haplotype (i.e., length(haplotype))
  • $n$ is the number of total reads sampled (i.e. length(reads))

The significance is then calculated from the cumulative $χ^2$ distribution function.

source
HapLink.linkageMethod
linkage(counts::AbstractArray{<:Integer})

Calculates the linkage disequilibrium of a haplotype given its $N$-dimensional contingency matrix, counts.

counts is an $N$-dimensional array where the $N$th dimension represents the $N$th variant call locus within a haplotype. findoccurrences produces such an array.

Extended help

linkage(::AbstractArray{<:Integer}) calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).

\[D(1..N) = E\left( \prod_k^N i_k - P_k \right)\]

where

  • $N$ is the number of variant loci
  • $D(1..N)$ is the linkage disequilibrium of all $N$ variant loci
  • $E$ is an operator returning the arithmetic mean of its argument over every read
  • $i_k$ is a function that returns $1$ if the $k$-th locus of the given read contains the reference allele, and returns $0$ otherwise.
  • $P_k$ is the probability of any read containing the reference allele in the $k$-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the $k$-th locus
source
HapLink.linkageMethod
linkage(hc::HaplotypeCall)

Gets the unweighted linkage disequilibrium coefficient of hc

source
HapLink.occurrence_matrixMethod
occurrence_matrix(
+        }\]

where

  • $N$ is the number of Variations in haplotype (i.e., length(haplotype))
  • $n$ is the number of total reads sampled (i.e. length(reads))

The significance is then calculated from the cumulative $χ^2$ distribution function.

source
HapLink.linkageMethod
linkage(hc::HaplotypeCall)

Gets the unweighted linkage disequilibrium coefficient of hc

source
HapLink.linkageMethod
linkage(counts::AbstractArray{<:Integer, N}) where N

Calculates the linkage disequilibrium of a haplotype given its $N$-dimensional contingency matrix, counts.

counts is an $N$-dimensional array where the $N$th dimension represents the $N$th variant call locus within a haplotype. findoccurrences produces such an array.

Extended help

linkage(::AbstractArray{<:Integer, N}) calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).

\[D(1..N) = E\left( \prod_k^N i_k - P_k \right)\]

where

  • $N$ is the number of variant loci
  • $D(1..N)$ is the linkage disequilibrium of all $N$ variant loci
  • $E$ is an operator returning the arithmetic mean of its argument over every read
  • $i_k$ is a function that returns $1$ if the $k$-th locus of the given read contains the reference allele, and returns $0$ otherwise.
  • $P_k$ is the probability of any read containing the reference allele in the $k$-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the $k$-th locus
source
HapLink.occurrence_matrixMethod
occurrence_matrix(
     haplotype::AbstractArray{Variation{S,T}},
     reads::AbstractArray{Haplotype{S,T}},
 ) where {S<:BioSequence,T<:BioSymbol}
 occurrence_matrix(
     haplotype::Haplotype{S,T},
     reads::AbstractArray{S,T}
-) where {S<:BioSequence,T<:BioSymbol}

Determine how many times the variants in haplotype appear in reads as an $N$- dimensional matrix.

Arguments

  • haplotype::AbstractArray{Variation}: A Vector of Variations or a Haplotype to search for as a haplotype
  • reads::AbstractArray{Haplotype}: The reads to search for haplotype in

Returns

  • 2x2x... Array{Int, length(haplotype)}: An $N$-dimensional matrix where $N$ is the number of variant positions in readmatches. The $1$ index position in the $n$th dimension represents the number of times the $n$th variant position was found to have the reference base called, while the $2$ index position represents the number of times the $n$th variant position was found to have the alternate base called. E.g. first(occurrence_matrix(reads)) gives the number of times the all-reference base haplotype was found in reads, while occurrence_matrix(reads)[end] gives the number of times the all-alternate base haplotype was found.
source
HapLink.sumslicedFunction
sumsliced(A::AbstractArray, dim::Int, pos::Int=1)

Sum all elements that are that can be referenced by pos in the dim dimension of A.

Example

julia> A = reshape(1:8, 2, 2, 2)
+) where {S<:BioSequence,T<:BioSymbol}

Determine how many times the variants in haplotype appear in reads as an $N$- dimensional matrix.

Arguments

  • haplotype::AbstractArray{Variation}: A Vector of Variations or a Haplotype to search for as a haplotype
  • reads::AbstractArray{Haplotype}: The reads to search for haplotype in

Returns

  • 2x2x... Array{Int, length(haplotype)}: An $N$-dimensional matrix where $N$ is the number of variant positions in readmatches. The $1$ index position in the $n$th dimension represents the number of times the $n$th variant position was found to have the reference base called, while the $2$ index position represents the number of times the $n$th variant position was found to have the alternate base called. E.g. first(occurrence_matrix(reads)) gives the number of times the all-reference base haplotype was found in reads, while occurrence_matrix(reads)[end] gives the number of times the all-alternate base haplotype was found.
source
HapLink.sumslicedFunction
sumsliced(A::AbstractArray, dim::Int, pos::Int=1)

Sum all elements that are that can be referenced by pos in the dim dimension of A.

Example

julia> A = reshape(1:8, 2, 2, 2)
 2×2×2 reshape(::UnitRange{Int64}, 2, 2, 2) with eltype Int64:
 [:, :, 1] =
  1  3
@@ -36,4 +36,4 @@
 14
 
 julia> sumsliced(A, 2, 2)
-22

Heavily inspired by Holy, Tim "Multidimensional algorithms and iteration" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes

source
+22

Heavily inspired by Holy, Tim "Multidimensional algorithms and iteration" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes

source
diff --git a/dev/api/private/index.html b/dev/api/private/index.html index 47d603e..ddd915c 100644 --- a/dev/api/private/index.html +++ b/dev/api/private/index.html @@ -1,16 +1,16 @@ -Private API Reference · HapLink.jl

Private API Reference

FASTX.FASTA.RecordMethod
FASTX.FASTA.Record(ps::Pseudoread; prefix::AbstractString="", is_consensus::Bool=false)

Specialized constructor for outputting Pseudoreads to FASTA.Records that can be written to files.

Arguments

  • ps::Pseudoread: The modified sequence and read window to be converted into the sequence of the new record

Keywords

  • prefix::AbstractString="": A string to start the sequence identifier with, usually based on the reference sequence of ps
  • is_consensus::Bool=false: Normally, the new identifier of the record is a combination of the prefix and the SHA1 hash of the alternate sequence. Set is_consensus to true to instead use the word "CONSENSUS" in place of the hash
source
SequenceVariation.HaplotypeMethod
SequenceVariation.Haplotype(
+Private API Reference · HapLink.jl

Private API Reference

FASTX.FASTA.RecordMethod
FASTX.FASTA.Record(ps::Pseudoread; prefix::AbstractString="", is_consensus::Bool=false)

Specialized constructor for outputting Pseudoreads to FASTA.Records that can be written to files.

Arguments

  • ps::Pseudoread: The modified sequence and read window to be converted into the sequence of the new record

Keywords

  • prefix::AbstractString="": A string to start the sequence identifier with, usually based on the reference sequence of ps
  • is_consensus::Bool=false: Normally, the new identifier of the record is a combination of the prefix and the SHA1 hash of the alternate sequence. Set is_consensus to true to instead use the word "CONSENSUS" in place of the hash
source
SequenceVariation.HaplotypeMethod
SequenceVariation.Haplotype(
     query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq
-) -> SequenceVariation.Haplotype

Specialized constructor that allows converting the alignment in a XAM.Record into a Haplotype.

source
HapLink._cigar_betweenMethod
_cigar_between(x::Variation{S,T}, y::Variation{S,T}) where {S,T}

Returns a CIGAR operation for the (assumed) matching bases between x and y.

See also _cigar

source
HapLink._indexed_readerMethod
_indexed_reader(bam::Union{AbstractPath,AbstractString}, int::Interval)

Provides an iterator over records in bam that overlap with int. Note that if no index file is available for bam, then the return value will fall back to an iterator of all records in bam.

source
HapLink._lendiffMethod
_lendiff(ps::Pseudoread)

Gets the difference between the number of bases within the read window of ps on the reference sequence from the number of bases within the read window on the mutated sequence, taking into account insertions and deletions.

source
HapLink._phrederrorMethod
_phrederror(quality::Number)

Converts a PHRED33-scaled error number into the expected fractional error of basecall

source
HapLink._pos_to_edgeMethod
_pos_to_edge(pos::Number)

Converts pos from a number between 0 (beginning) and 1 (end) to a number ranging from 0 (beginning) to 1 (middle) to 0 (end), i.e. convert a relative position to a relative distance from the edge.

source
HapLink._posinMethod
_posin(ps::Pseudoread, v::Variation)

Determines if the positions that make up v are wholly contained by ps

source
HapLink._push_filter!Function
_push_filter!(
+) -> SequenceVariation.Haplotype

Specialized constructor that allows converting the alignment in a XAM.Record into a Haplotype.

source
HapLink._cigar_betweenMethod
_cigar_between(x::Variation{S,T}, y::Variation{S,T}) where {S,T}

Returns a CIGAR operation for the (assumed) matching bases between x and y.

See also _cigar

source
HapLink._indexed_readerMethod
_indexed_reader(bam::Union{AbstractPath,AbstractString}, int::Interval)

Provides an iterator over records in bam that overlap with int. Note that if no index file is available for bam, then the return value will fall back to an iterator of all records in bam.

source
HapLink._lendiffMethod
_lendiff(ps::Pseudoread)

Gets the difference between the number of bases within the read window of ps on the reference sequence from the number of bases within the read window on the mutated sequence, taking into account insertions and deletions.

source
HapLink._phrederrorMethod
_phrederror(quality::Number)

Converts a PHRED33-scaled error number into the expected fractional error of basecall

source
HapLink._pos_to_edgeMethod
_pos_to_edge(pos::Number)

Converts pos from a number between 0 (beginning) and 1 (end) to a number ranging from 0 (beginning) to 1 (middle) to 0 (end), i.e. convert a relative position to a relative distance from the edge.

source
HapLink._posinMethod
_posin(ps::Pseudoread, v::Variation)

Determines if the positions that make up v are wholly contained by ps

source
HapLink._push_filter!Function
_push_filter!(
     vc::VariationCall,
     label::Char,
     value::Union{Nothing,Number},
     filter::Function=(var, val) -> true,
-)

Adds a FILTER entry to vc of the form "$label$value" if filter returns true.

Arguments

  • vc::VariationCall: The VariationCall to annotate
  • label::Char: The first character of the filter text to add
  • value::Union{Nothing,Number}: The value to compare vc against, and to append to the filter text. If set to nothing, _push_filter! will return without evaluating or adding any filters, which may be useful for processing multiple inputs
  • filter::Function=(var, val) -> true: A function handle to determine if a filter should be applied or not. Note that this function must return true if and only if the filter should be added to vc. The function will be passed vc and value. Defaults to always applying filters regardless of the values of vc and value.
source
HapLink.magnitudeMethod
magnitude(x::UnitRange)

Gets the difference between the last and first elements of x

Example

julia> using HapLink: magnitude
+)

Adds a FILTER entry to vc of the form "$label$value" if filter returns true.

Arguments

  • vc::VariationCall: The VariationCall to annotate
  • label::Char: The first character of the filter text to add
  • value::Union{Nothing,Number}: The value to compare vc against, and to append to the filter text. If set to nothing, _push_filter! will return without evaluating or adding any filters, which may be useful for processing multiple inputs
  • filter::Function=(var, val) -> true: A function handle to determine if a filter should be applied or not. Note that this function must return true if and only if the filter should be added to vc. The function will be passed vc and value. Defaults to always applying filters regardless of the values of vc and value.
source
HapLink.magnitudeMethod
magnitude(x::UnitRange)

Gets the difference between the last and first elements of x

Example

julia> using HapLink: magnitude
 
 julia> magnitude(0:10)
 10
-
source
HapLink.overlapMethod
overlap(x::UnitRange{S}, y::UnitRange{S}) where {S}

Finds the inclusive overlap interval of x and y

Example

julia> using HapLink: overlap
+
source
HapLink.overlapMethod
overlap(x::UnitRange{S}, y::UnitRange{S}) where {S}

Finds the inclusive overlap interval of x and y

Example

julia> using HapLink: overlap
 
 julia> overlap(1:5, 3:10)
 3:5
@@ -19,7 +19,7 @@
 6:5
 
 julia> overlap(1:10, 2:9)
-2:9
source
HapLink.findsetFunction
findset(lst::AbstractArray{T}, comparator::Function) where {T}

Finds every possible set of items in lst where comparator returns true for the set.

Example

function divisible_by_same_number(x, i)
+2:9
source
HapLink.findsetFunction
findset(lst::AbstractArray{T}, comparator::Function) where {T}

Finds every possible set of items in lst where comparator returns true for the set.

Example

function divisible_by_same_number(x, i)
     for j in 2:i
         if all(y -> y % j == 0, x)
             return true
@@ -36,4 +36,4 @@
  [5, 10]
  [1]
  [7]
- [11]
source
+ [11]
source
diff --git a/dev/api/psuedoread/index.html b/dev/api/psuedoread/index.html index bdf5af6..f617248 100644 --- a/dev/api/psuedoread/index.html +++ b/dev/api/psuedoread/index.html @@ -1,6 +1,6 @@ Pseudoread · HapLink.jl

PseudoRead

HapLink.PseudoreadType
Pseudoread(startpos::Integer, endpos::Integer, read::Haplotype)
-Pseudoread(query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq)

A stand-in struct for a sequencing read (real or imaginary) represented by its position and differences relative to a reference sequence.

source
HapLink.contradictsMethod
contradicts(var::Variation{S,T}, ref::Haplotype{S,T}) where {S,T}

Returns true if var contains a modification incompatible with any of the Variations that make up ref.

Example

julia> using BioSequences, SequenceVariation
+Pseudoread(query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq)

A stand-in struct for a sequencing read (real or imaginary) represented by its position and differences relative to a reference sequence.

source
HapLink.contradictsMethod
contradicts(var::Variation{S,T}, ref::Haplotype{S,T}) where {S,T}

Returns true if var contains a modification incompatible with any of the Variations that make up ref.

Example

julia> using BioSequences, SequenceVariation
 
 julia> ref = Haplotype(dna"GATTACA", [Variation(dna"GATTACA", "A2T"), Variation(dna"GATTACA", "Δ5-6")]);
 
@@ -20,7 +20,7 @@
 false
 
 julia> contradicts(Variation(dna"GATTACA", "4G"), ref)
-false
source
HapLink.haplotypeMethod
haplotype(ps::Pseudoread)

Gets the differences between ps and its reference sequence as a Haplotype

source
HapLink.haplotypeMethod
haplotype(ps::Pseudoread)

Gets the differences between ps and its reference sequence as a Haplotype

source
HapLink.overlap_inrangeMethod
overlap_inrange(
     x::Pseudoread, y::Pseudoread, overlap_min::Integer, overlap_max::Integer
 )

Returns true if the overlap between x and y is within overlap_min and overlap_max, returns false otherwise

Example

julia> using BioSequences, SequenceVariation
 
@@ -35,11 +35,11 @@
 false
 
 julia> overlap_inrange(ps1, ps2, 1, 25)
-false
source
HapLink.pseudoreadsMethod
pseudoreads(sam::Union{AbstractString,AbstractPath}, consensus::NucleotideSeq)

Create a set of Pseudoreads from the alignments present in sam when aligned against consensus.

source
HapLink.pseudoreadsMethod
pseudoreads(sam::Union{AbstractString,AbstractPath}, consensus::NucleotideSeq)

Create a set of Pseudoreads from the alignments present in sam when aligned against consensus.

source
HapLink.simulateMethod
simulate(
     pool::AbstractArray{Pseudoread},
     refseq::BioSequence,
     subcon::AbstractArray{Variation{S,T}};
     reverse_order::Union{Bool,Nothing}=nothing,
     overlap_min::Int64=0,
     overlap_max::Int64=500,
-) where {S,T} -> Union{Haplotype{S,T},Nothing}

Creates a new set of Pseudoreads based on the pool of real reads spliced spliced together using maximum likelihood simulation.

Arguments

  • pool::AbstractArray{Pseudoread}: A set of (presumably) real sequencing reads that will serve as pieces of the new simulated reads.
  • refseq::BioSequence: The reference against which all of the sequences in pool and subcon were aligned
  • subcon::AbstractArray{Variation{S,T}}: The subconsensus Variations which are known to be both present in pool and real (i.e., not due to sequencing errors)

Options

  • reverse_order::Union{Bool,Nothing}=nothing: Whether the splicing should start at the beginning or end of the reference sequence. If left blank, will randomly decide.
  • overlap_min::Int64=0: The amount that reads from pool are required to overlap in order to be spliced together. Can be negative, indicating that reads must be spaced apart by this amount, instead. -overlap_max::Int64=500: The amount that reads from pool can overlap before being discarded. Like overlap_min, can be negative.

Returns

A Haplotype{S,T} representing the simulated read, or nothing if the simulation encountered a dead end.

Extended help

The simulation procedure works by picking a read from pool that contains the first (if reverse_order is false, the last otherwise) variant from subcon. The procedure examines every position in subcon in ascending (or descending, if reverse_order == true) order until the chosen read no longer carries that position. The simulation will then, at random, pick a read from pool that contains the next position from subcon and that also has matching variations at every position in subcon as the previous read, as well as meets the overlap requirements. These two reads may differ in sites other than those in subcon: it is assumed that appropriate variant calling has already been performed and that any variation in those sites is due to sequencing error. The procedure repeats for each new read from pool until the simulation has simulated every position of subcon, or until it reaches a dead-end and cannot find a read that matches every requirement.

source
HapLink.variations_matchMethod
variations_match(reference::Pseudoread, query::Union{Pseudoread,Haplotype})

Returns true if query could be a read from reference.

Additional Variations found within query that are not present in reference do not invalid a positive match result so long as none of those Variations contradicts reference. These Variations can either

  1. Extend beyond the length of reference, or
  2. Exist as "sequencing errors" within the interval of reference

On the other hand, every Variation within the overlapping interval of reference and query that is present in reference must also be found in query. In other words, query must have all of reference (within the overlap), but reference does not need all of query.

This arrangment allows for the creation of bodies of matching Pseudoreads, as done via simulate.

source
+) where {S,T} -> Union{Haplotype{S,T},Nothing}

Creates a new set of Pseudoreads based on the pool of real reads spliced spliced together using maximum likelihood simulation.

Arguments

Options

Returns

A Haplotype{S,T} representing the simulated read, or nothing if the simulation encountered a dead end.

Extended help

The simulation procedure works by picking a read from pool that contains the first (if reverse_order is false, the last otherwise) variant from subcon. The procedure examines every position in subcon in ascending (or descending, if reverse_order == true) order until the chosen read no longer carries that position. The simulation will then, at random, pick a read from pool that contains the next position from subcon and that also has matching variations at every position in subcon as the previous read, as well as meets the overlap requirements. These two reads may differ in sites other than those in subcon: it is assumed that appropriate variant calling has already been performed and that any variation in those sites is due to sequencing error. The procedure repeats for each new read from pool until the simulation has simulated every position of subcon, or until it reaches a dead-end and cannot find a read that matches every requirement.

source
HapLink.variations_matchMethod
variations_match(reference::Pseudoread, query::Union{Pseudoread,Haplotype})

Returns true if query could be a read from reference.

Additional Variations found within query that are not present in reference do not invalid a positive match result so long as none of those Variations contradicts reference. These Variations can either

  1. Extend beyond the length of reference, or
  2. Exist as "sequencing errors" within the interval of reference

On the other hand, every Variation within the overlapping interval of reference and query that is present in reference must also be found in query. In other words, query must have all of reference (within the overlap), but reference does not need all of query.

This arrangment allows for the creation of bodies of matching Pseudoreads, as done via simulate.

source
diff --git a/dev/api/variation/index.html b/dev/api/variation/index.html index 2919faf..d150c48 100644 --- a/dev/api/variation/index.html +++ b/dev/api/variation/index.html @@ -1,9 +1,9 @@ -Variation Extensions · HapLink.jl

Variation Methods

These methods extend the functionality of SequenceVariation.Variation for calculation of data related to Variations created from NGS read alignments.

HapLink.qualityMethod
quality(v::Variation, r::Union{SAM.Record,BAM.Record}) -> Float64

Get the phred-scalled basecall quality of v within the sequencing read of r.

source
HapLink.relativeposMethod
relativepos(v::Variation, r::Union{SAM.Record,BAM.Record})

Calculates the fractional position of v within the sequence of r. If v is out-of-bounds of r, then will return 0 for positions before r and 1 for positions after r.

source
HapLink.seqposMethod
seqpos(v::Variation, a::Union{Alignment,AlignedSequence,PairwiseAlignment})

Get the position of v in the sequence of a.

Example

using BioAlignments, BioSequences, SequenceVariation
+Variation Extensions · HapLink.jl

Variation Methods

These methods extend the functionality of SequenceVariation.Variation for calculation of data related to Variations created from NGS read alignments.

HapLink.qualityMethod
quality(v::Variation, r::Union{SAM.Record,BAM.Record}) -> Float64

Get the phred-scalled basecall quality of v within the sequencing read of r.

source
HapLink.relativeposMethod
relativepos(v::Variation, r::Union{SAM.Record,BAM.Record})

Calculates the fractional position of v within the sequence of r. If v is out-of-bounds of r, then will return 0 for positions before r and 1 for positions after r.

source
HapLink.seqposMethod
seqpos(v::Variation, a::Union{Alignment,AlignedSequence,PairwiseAlignment})

Get the position of v in the sequence of a.

Example

using BioAlignments, BioSequences, SequenceVariation
 v = Variation(dna"AAAAA", "A3T")
 a = Alignment("2=1X2=", 1, 1)
 seqpos(v, a)
 
 # output
 
-3
source
HapLink.subconsensus_variationsMethod
subconsensus_variations(vcf::Union{AbstractPath,AbstractString}, consensus::Haplotype)

Get a Vector{Variation} with passing variant calls from vcf that do not appear in consensus

source
HapLink.variationMethod
variation(r::VCF.Record, refseq::NucleotideSeq)

Construct a Variation from r applying to refseq. There is no validation that r's actually describes a mutation in refseq.

source
+3
source
HapLink.subconsensus_variationsMethod
subconsensus_variations(vcf::Union{AbstractPath,AbstractString}, consensus::Haplotype)

Get a Vector{Variation} with passing variant calls from vcf that do not appear in consensus

source
HapLink.variationMethod
variation(r::VCF.Record, refseq::NucleotideSeq)

Construct a Variation from r applying to refseq. There is no validation that r's actually describes a mutation in refseq.

source
diff --git a/dev/api/variationcall/index.html b/dev/api/variationcall/index.html index 2d0f197..4f682f1 100644 --- a/dev/api/variationcall/index.html +++ b/dev/api/variationcall/index.html @@ -1,5 +1,5 @@ -VariationCall · HapLink.jl

VariationCall

HapLink.VariationCallType
VariationCall

Represents a Variation that is supported by aligned reads with sufficient metadata to support or refute its validity. It is designed to be converted into a line in Variant Call Format, or a VCF.Record.

Fields

  • variation::Variation: The Variation of this call
  • quality::Union{Nothing,Number}: Phred quality of the basecall for variation
  • filter::Vector{String}: Indicator if variation has passed filters and is actually a variant call, or a list of criteria that have failed it
  • depth::Union{Nothing,UInt}: The number of reads that cover leftposition(variation)
  • strandbias::Union{Nothing,Float64}: The fraction of times variation appears on a positive strand
  • altdepth::Union{Nothing,UInt}: The number of types variation occurs
  • readpos::Union{Nothing,UInt}: The averagerelative position of variation in each read
  • pvalue::Union{Nothing,Float64}: The Fisher's Exact Test $p$-value of this call
source
HapLink.call_variantMethod
call_variant(
+VariationCall · HapLink.jl

VariationCall

HapLink.VariationCallType
VariationCall

Represents a Variation that is supported by aligned reads with sufficient metadata to support or refute its validity. It is designed to be converted into a line in Variant Call Format, or a VCF.Record.

Fields

  • variation::Variation: The Variation of this call
  • quality::Union{Nothing,Number}: Phred quality of the basecall for variation
  • filter::Vector{String}: Indicator if variation has passed filters and is actually a variant call, or a list of criteria that have failed it
  • depth::Union{Nothing,UInt}: The number of reads that cover leftposition(variation)
  • strandbias::Union{Nothing,Float64}: The fraction of times variation appears on a positive strand
  • altdepth::Union{Nothing,UInt}: The number of types variation occurs
  • readpos::Union{Nothing,UInt}: The averagerelative position of variation in each read
  • pvalue::Union{Nothing,Float64}: The Fisher's Exact Test $p$-value of this call
source
HapLink.call_variantMethod
call_variant(
     pileup::VariationPileup,
     α::Float64;
     D::Union{Nothing,Int}=nothing,
@@ -7,4 +7,4 @@
     X::Union{Nothing,Float64}=nothing,
     F::Union{Nothing,Float64}=nothing,
     S::Union{Nothing,Float64}=nothing,
-) -> VariationCall

Calls variant from a pileup.

Arguments

  • pileup::VariationPileup: The pileup to call variants from
  • α::Float64: Fisher's Exact Test significance ($α$) level to call variants

Keywords

Note

Leave any keyword undefined to skip filtering based on that field

  • D::Union{Nothing,Int}=nothing: Minimum total read depth for a variant to be called
  • Q::Union{Nothing,Float64}=nothing: Minimum average Phred quality for a variant to be called
  • X::Union{Nothing,Float64}=nothing: Maximum average distance variant can be from read edge to be called
  • F::Union{Nothing,Float64}=nothing: Minimum alternate frequency for a variant to be called
  • S::Union{Nothing,Float64}=nothing: Maximum proportion of the number of times variant can appear on one strand versus the other
source
HapLink.filtersMethod
filters(vc::VariationCall) -> Vector{String}

Gets all filters that have been applied to vc. Note that an empty FILTER entry is not permitted under the VCF spec, and an empty array should not automatically be considered to have PASSed all filters.

source
HapLink.p_valueMethod
p_value(vc::VariationCall) -> Union{Nothing,Float64}

Gets the $p$-value of the observed statistic of vc. Returns nothing if unknown.

See also variation_test

source
HapLink.variation_testMethod
variation_test(depth::Int, altdepth::Int, quality::Float64)

Conducts a Fisher's Exact Test to deterimine the likelihood of a variant with total depth and variation depth altdepth occuring, given an average basecall quality. Returns the $p$-value of the test.

source
HapLink.vcfMethod
vcf(vc::VariationCall, refname::AbstractString) -> VCF.Record

Converts vc into a VCF.Record. refname is required and used as the CHROM field in the record.

source
+) -> VariationCall

Calls variant from a pileup.

Arguments

  • pileup::VariationPileup: The pileup to call variants from
  • α::Float64: Fisher's Exact Test significance ($α$) level to call variants

Keywords

Note

Leave any keyword undefined to skip filtering based on that field

  • D::Union{Nothing,Int}=nothing: Minimum total read depth for a variant to be called
  • Q::Union{Nothing,Float64}=nothing: Minimum average Phred quality for a variant to be called
  • X::Union{Nothing,Float64}=nothing: Maximum average distance variant can be from read edge to be called
  • F::Union{Nothing,Float64}=nothing: Minimum alternate frequency for a variant to be called
  • S::Union{Nothing,Float64}=nothing: Maximum proportion of the number of times variant can appear on one strand versus the other
source
HapLink.filtersMethod
filters(vc::VariationCall) -> Vector{String}

Gets all filters that have been applied to vc. Note that an empty FILTER entry is not permitted under the VCF spec, and an empty array should not automatically be considered to have PASSed all filters.

source
HapLink.p_valueMethod
p_value(vc::VariationCall) -> Union{Nothing,Float64}

Gets the $p$-value of the observed statistic of vc. Returns nothing if unknown.

See also variation_test

source
HapLink.variation_testMethod
variation_test(depth::Int, altdepth::Int, quality::Float64)

Conducts a Fisher's Exact Test to deterimine the likelihood of a variant with total depth and variation depth altdepth occuring, given an average basecall quality. Returns the $p$-value of the test.

source
HapLink.vcfMethod
vcf(vc::VariationCall, refname::AbstractString) -> VCF.Record

Converts vc into a VCF.Record. refname is required and used as the CHROM field in the record.

source
diff --git a/dev/api/variationinfo/index.html b/dev/api/variationinfo/index.html index 38dbb04..7b8f758 100644 --- a/dev/api/variationinfo/index.html +++ b/dev/api/variationinfo/index.html @@ -1,8 +1,8 @@ -VariationInfo · HapLink.jl

VariationInfo

HapLink.VariationInfoType
VariationInfo{S<:BioSequence,T<:BioSymbol}

Represents statistics associated with a SequenceVariation.Variation within an aligned sequencing read.

Fields

  • variation::Variation{S,T}: The Variation this object represents
  • readpos::Float64: The location where variation occurs within a sequencing read
  • quality::Float64: The phred-scaled basecall quality of variation
  • Strand::GenomicFeatures.Strand: Which strand variation appears on
source
HapLink.qualityMethod
quality(vi::VariationInfo) -> Float64

Gets the phred-scaled basecall quality of variation(vi) within a sequencing read

source
HapLink.readposMethod
readpos(vi::VariationInfo) -> Float64

Gets the position of variation(vi) within a sequencing read

source
HapLink.strandMethod
strand(vi::VariationInfo) -> GenomicFeatures.Strand

Gets the strand that variation(vi) appears on

source
HapLink.variationinfosMethod
variationinfos(
+VariationInfo · HapLink.jl

VariationInfo

HapLink.VariationInfoType
VariationInfo{S<:BioSequence,T<:BioSymbol}

Represents statistics associated with a SequenceVariation.Variation within an aligned sequencing read.

Fields

  • variation::Variation{S,T}: The Variation this object represents
  • readpos::Float64: The location where variation occurs within a sequencing read
  • quality::Float64: The phred-scaled basecall quality of variation
  • Strand::GenomicFeatures.Strand: Which strand variation appears on
source
HapLink.qualityMethod
quality(vi::VariationInfo) -> Float64

Gets the phred-scaled basecall quality of variation(vi) within a sequencing read

source
HapLink.readposMethod
readpos(vi::VariationInfo) -> Float64

Gets the position of variation(vi) within a sequencing read

source
HapLink.strandMethod
strand(vi::VariationInfo) -> GenomicFeatures.Strand

Gets the strand that variation(vi) appears on

source
HapLink.variationinfosMethod
variationinfos(
     query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq
 ) -> Vector{VariationInfo}
 variationinfos(
     query::Union{AbstractString,AbstractPath},
     reference::Union{AbstractString,AbstractPath}
-) -> Vector{VariationInfo}

Calls Variations based on the alignments in query against reference, and returns every variation call found within query as a Vector{VariationInfo}

source
+) -> Vector{VariationInfo}

Calls Variations based on the alignments in query against reference, and returns every variation call found within query as a Vector{VariationInfo}

source
diff --git a/dev/api/variationpileup/index.html b/dev/api/variationpileup/index.html index fdfe9d3..2e06bef 100644 --- a/dev/api/variationpileup/index.html +++ b/dev/api/variationpileup/index.html @@ -1,2 +1,2 @@ -VariationPileup · HapLink.jl

VariationPileup

HapLink.VariationPileupType
VariationPileup

Summarizes the basecalls that support a Variation within a set of alignments.

Fields

  • variation::Variation: The Variation of this pileup entry
  • depth::UInt: The number of times the position of variation appears in this set of alignments
  • readpos::Vector{Float64}: The relative positions of variation within each read
  • quality::Vector{Float64}: The phred quality of variation within each read
  • strand::Vector{Strand}: Which strand each variation is found on
source
HapLink.depthMethod
depth(vp::VariationPileup)

Gets the number of times the position of vp.variation appears total (variant and wild-type)

source
HapLink.frequencyMethod
frequency(vp::VariationPileup)

Gets the alternate allele frequency of vp.variation

source
HapLink.pileupMethod
pileup(sam::AbstractPath, ref::AbstractPath) -> Vector{VariationPileup}

Generates a pileup of Variations based on the alignments in sam aligned against ref.

source
HapLink.strand_biasMethod
strand_bias(vp::VariationPileup)

Gets the frequency of positive strands that variation appears on relative to all variation reads

source
+VariationPileup · HapLink.jl

VariationPileup

HapLink.VariationPileupType
VariationPileup

Summarizes the basecalls that support a Variation within a set of alignments.

Fields

  • variation::Variation: The Variation of this pileup entry
  • depth::UInt: The number of times the position of variation appears in this set of alignments
  • readpos::Vector{Float64}: The relative positions of variation within each read
  • quality::Vector{Float64}: The phred quality of variation within each read
  • strand::Vector{Strand}: Which strand each variation is found on
source
HapLink.depthMethod
depth(vp::VariationPileup)

Gets the number of times the position of vp.variation appears total (variant and wild-type)

source
HapLink.frequencyMethod
frequency(vp::VariationPileup)

Gets the alternate allele frequency of vp.variation

source
HapLink.pileupMethod
pileup(sam::AbstractPath, ref::AbstractPath) -> Vector{VariationPileup}

Generates a pileup of Variations based on the alignments in sam aligned against ref.

source
HapLink.strand_biasMethod
strand_bias(vp::VariationPileup)

Gets the frequency of positive strands that variation appears on relative to all variation reads

source
diff --git a/dev/cli/consensus/index.html b/dev/cli/consensus/index.html index 2492eb0..db4051c 100644 --- a/dev/cli/consensus/index.html +++ b/dev/cli/consensus/index.html @@ -1,2 +1,2 @@ -haplink consensus · HapLink.jl

haplink consensus

HapLink.consensusFunction
haplink consensus [options] reference variants

Convert variant calls to consensus sequence

Introduction

Generates a consensus sequence based on a reference genome and previously called variants. Will only consider variants with a "PASS" filter to be able to contribute to the consensus. Outputs results in FASTA format.

Arguments

  • reference: The path to the reference genome. Must be in FASTA format.
  • variants: The path to the variant calls. Must be in VCF v4 format.

Options

  • --outfile=<path>: The file to write the consensus sequence to. If left blank, the consensus sequence is written to standard output.
  • --frequency=<float>: The minimum frequency at which a variant must appear to be considered part of the consensus. Note that HapLink does not support ambigous base calling (e.g. N, R, Y, etc.) at low frequencies unlike many variant callers.
  • --prefix=<string>: Name of the new sequence. Defaults to using the FASTA identifier of the reference sequence.
source
+haplink consensus · HapLink.jl

haplink consensus

HapLink.consensusFunction
haplink consensus [options] reference variants

Convert variant calls to consensus sequence

Introduction

Generates a consensus sequence based on a reference genome and previously called variants. Will only consider variants with a "PASS" filter to be able to contribute to the consensus. Outputs results in FASTA format.

Arguments

  • reference: The path to the reference genome. Must be in FASTA format.
  • variants: The path to the variant calls. Must be in VCF v4 format.

Options

  • --outfile=<path>: The file to write the consensus sequence to. If left blank, the consensus sequence is written to standard output.
  • --frequency=<float>: The minimum frequency at which a variant must appear to be considered part of the consensus. Note that HapLink does not support ambigous base calling (e.g. N, R, Y, etc.) at low frequencies unlike many variant callers.
  • --prefix=<string>: Name of the new sequence. Defaults to using the FASTA identifier of the reference sequence.
source
diff --git a/dev/cli/haplotypes/index.html b/dev/cli/haplotypes/index.html index 0649fbc..d032bc1 100644 --- a/dev/cli/haplotypes/index.html +++ b/dev/cli/haplotypes/index.html @@ -1,2 +1,2 @@ -haplink haplotypes · HapLink.jl

haplink haplotypes

HapLink.haplotypesFunction
haplink haplotypes [options] reference variants bam

Call haplotypes

Introduction

Calls haplotypes based on the linkage disequilibrium between subconsensus variant sites on long reads. Variant sites are chosen based on having a "PASS" filter in the variants file, and linkage is calculated based on the reads present in the bam file. Note this means that haplotypes can be called on a different set of sequences than variants were (e.g. variant calling using high accuracy short-read chemistry like Illumina and haplotype calling using low accuracy long-read chemistry like Oxford Nanopore). There are no guarantees that the variants file and bam file match, so use this feature with caution!

Arguments

  • reference: path to the reference genome to call haplotypes against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • variants: path to the variants file that will define variant sites to call haplotypes from. Must be in VCF (not BCF) v4 format. haplink variants generates a compatible file, although output from other tools can also be used.
  • bam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.

Flags

  • --simulated-reads: Use maximum likelihood simulation of long reads based on overlapping short reads

Options

  • --outfile=<path>: The file to write haplotype calls to. If left blank, haplotype calls are written to standard output.
  • --consensus-frequency=<float>: The minimum frequency at which a variant must appear to be considered part of the consensus.
  • --significance=<float>: The alpha value for statistical significance of haplotype calls.
  • --depth=<int>: Minimum number of times a variant combination must be observed within the set of reads to be called a haplotype
  • --frequency=<float>: The minimum proportion of reads that a variant combination must be observed within compared to all reads covering its position for that haplotype to be called
  • --overlap-min=<int>: The minimum number of bases that must overlap for two short reads to be combined into one simulated read. Can be negative to indicate a minimum distance between reads. Only applies when --simulated-reads is set.
  • --overlap-max=<int>: The maximum number of bases that may overlap for two short reads to be combined into one simulated read. Can be negative to indicate a cap on how far two reads must be apart from one another. Must be greater than --overlap-min. Only applies when --simulated-reads is set.
  • --iterations=<int>: The number of simulated reads to create before calling haplotypes. Only applies when --simulated-reads is set.
  • --seed=<int>: The random seed used for picking short reads to create simulated reads when using maximum likelihood methods. Leaving unset will use the default Julia RNG and seed. See Julia's documentation on randomness for implementation details. Only applies when --simulated-reads is set.
source
+haplink haplotypes · HapLink.jl

haplink haplotypes

HapLink.haplotypesFunction
haplink haplotypes [options] reference variants bam

Call haplotypes

Introduction

Calls haplotypes based on the linkage disequilibrium between subconsensus variant sites on long reads. Variant sites are chosen based on having a "PASS" filter in the variants file, and linkage is calculated based on the reads present in the bam file. Note this means that haplotypes can be called on a different set of sequences than variants were (e.g. variant calling using high accuracy short-read chemistry like Illumina and haplotype calling using low accuracy long-read chemistry like Oxford Nanopore). There are no guarantees that the variants file and bam file match, so use this feature with caution!

Arguments

  • reference: path to the reference genome to call haplotypes against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • variants: path to the variants file that will define variant sites to call haplotypes from. Must be in VCF (not BCF) v4 format. haplink variants generates a compatible file, although output from other tools can also be used.
  • bam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.

Flags

  • --simulated-reads: Use maximum likelihood simulation of long reads based on overlapping short reads

Options

  • --outfile=<path>: The file to write haplotype calls to. If left blank, haplotype calls are written to standard output.
  • --consensus-frequency=<float>: The minimum frequency at which a variant must appear to be considered part of the consensus.
  • --significance=<float>: The alpha value for statistical significance of haplotype calls.
  • --depth=<int>: Minimum number of times a variant combination must be observed within the set of reads to be called a haplotype
  • --frequency=<float>: The minimum proportion of reads that a variant combination must be observed within compared to all reads covering its position for that haplotype to be called
  • --overlap-min=<int>: The minimum number of bases that must overlap for two short reads to be combined into one simulated read. Can be negative to indicate a minimum distance between reads. Only applies when --simulated-reads is set.
  • --overlap-max=<int>: The maximum number of bases that may overlap for two short reads to be combined into one simulated read. Can be negative to indicate a cap on how far two reads must be apart from one another. Must be greater than --overlap-min. Only applies when --simulated-reads is set.
  • --iterations=<int>: The number of simulated reads to create before calling haplotypes. Only applies when --simulated-reads is set.
  • --seed=<int>: The random seed used for picking short reads to create simulated reads when using maximum likelihood methods. Leaving unset will use the default Julia RNG and seed. See Julia's documentation on randomness for implementation details. Only applies when --simulated-reads is set.
source
diff --git a/dev/cli/sequences/index.html b/dev/cli/sequences/index.html index 3ee0af5..8adc3b8 100644 --- a/dev/cli/sequences/index.html +++ b/dev/cli/sequences/index.html @@ -1,2 +1,2 @@ -haplink sequences · HapLink.jl

haplink sequences

HapLink.sequencesFunction
haplink sequences [options] reference haplotypes

Convert haplotype calls into haplotype sequences

Introduction

Takes a YAML file output from haplink haplotypes and converts each called haplotype into its sequence and outputs those sequences in FASTA format. Useful for downstream processing by other tools, but loses all of the metadata and statistical details of each haplotype.

Arguments

  • reference: path to the reference genome to mutate haplotypes from. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single- segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • haplotypes: path to the haplotype calls file that will be used to mutate the reference sequence to produce haplotype sequences. Must be an output from haplink haplotypes.

Options

  • --outfile=<path>: The file to write the haplotype sequences to. If left blank, the sequences are written to standard output.
  • --prefix=<string>: Name of the new sequences. Defaults to using the FASTA identifier of the reference sequence.
source
+haplink sequences · HapLink.jl

haplink sequences

HapLink.sequencesFunction
haplink sequences [options] reference haplotypes

Convert haplotype calls into haplotype sequences

Introduction

Takes a YAML file output from haplink haplotypes and converts each called haplotype into its sequence and outputs those sequences in FASTA format. Useful for downstream processing by other tools, but loses all of the metadata and statistical details of each haplotype.

Arguments

  • reference: path to the reference genome to mutate haplotypes from. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single- segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • haplotypes: path to the haplotype calls file that will be used to mutate the reference sequence to produce haplotype sequences. Must be an output from haplink haplotypes.

Options

  • --outfile=<path>: The file to write the haplotype sequences to. If left blank, the sequences are written to standard output.
  • --prefix=<string>: Name of the new sequences. Defaults to using the FASTA identifier of the reference sequence.
source
diff --git a/dev/cli/variants/index.html b/dev/cli/variants/index.html index c568096..85f3eee 100644 --- a/dev/cli/variants/index.html +++ b/dev/cli/variants/index.html @@ -1,2 +1,2 @@ -haplink variants · HapLink.jl

haplink variants

HapLink.variantsFunction
haplink variants [options] reference bam

Call variants

Introduction

Decides which variations found within an alignment are real, and which are due to random chance. HapLink uses Fisher's Exact Test to determine the statistical significance of sequence variations, and optionally allows for other thresholds to reduce random noise in the variant calling. Outputs a Variant Call Format (VCF) file compliant with VCF v4.

Arguments

  • reference: path to the reference genome to call variants against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • bam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.

Options

  • --outfile=<path>: The file to write variant calls to. If left blank, variant calls are written to standard output.
  • --significance=<float>: The alpha value for statistical significance of variant calls.
  • --depth=<int>: Minimum number of times the variation must be observed within the alignment to be called a variant
  • --quality=<float>: The minimum average basecall quality score for a variation to be called a variant
  • --frequency=<float>: The minimum proportion of reads that the variation must be observed within compared to all reads covering its position for that variation to be called a variant
  • --position=<float>: The distance (as a percentage) from the edge of reads that a variation must be observed at to be called a variant
  • --strandedness=<float>: The maximum proportion of times that a variation can be observed on one strand versus the other to be called a variant. This metric is totally useless on single-stranded sequencing protocols like Oxford Nanopore, but can be useful for combining data between stranded protocols like most Illumina and Pacific Bio.
source
+haplink variants · HapLink.jl

haplink variants

HapLink.variantsFunction
haplink variants [options] reference bam

Call variants

Introduction

Decides which variations found within an alignment are real, and which are due to random chance. HapLink uses Fisher's Exact Test to determine the statistical significance of sequence variations, and optionally allows for other thresholds to reduce random noise in the variant calling. Outputs a Variant Call Format (VCF) file compliant with VCF v4.

Arguments

  • reference: path to the reference genome to call variants against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.
  • bam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.

Options

  • --outfile=<path>: The file to write variant calls to. If left blank, variant calls are written to standard output.
  • --significance=<float>: The alpha value for statistical significance of variant calls.
  • --depth=<int>: Minimum number of times the variation must be observed within the alignment to be called a variant
  • --quality=<float>: The minimum average basecall quality score for a variation to be called a variant
  • --frequency=<float>: The minimum proportion of reads that the variation must be observed within compared to all reads covering its position for that variation to be called a variant
  • --position=<float>: The distance (as a percentage) from the edge of reads that a variation must be observed at to be called a variant
  • --strandedness=<float>: The maximum proportion of times that a variation can be observed on one strand versus the other to be called a variant. This metric is totally useless on single-stranded sequencing protocols like Oxford Nanopore, but can be useful for combining data between stranded protocols like most Illumina and Pacific Bio.
source
diff --git a/dev/index.html b/dev/index.html index b84a821..37c5181 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,5 +1,5 @@ -Home · HapLink.jl

HapLink

Welcome

Howdy! 🤠 And welcome to HapLink! 👋 HapLink is a command-line suite of tools to enable the exploration of viral quasispecies within a single metagenomic sample. Every piece eventually builds up to our viral haplotype caller, which uses linkage disequilibrium on long sequencing reads (💡 think Oxford Nanopore or PacBio HiFi) to identify genetic mutations that are conserved within a single virus particle.

This manual will cover the different ways of using HapLink, starting with a few tutorials before diving into the details of our reference section.

Contents

Getting started

Ready to dive in? 🤿 Here's a 30,000-foot view

curl -fsSL https://install.julialang.org | sh -s -- --yes
+Home · HapLink.jl

HapLink

Welcome

Howdy! 🤠 And welcome to HapLink! 👋 HapLink is a command-line suite of tools to enable the exploration of viral quasispecies within a single metagenomic sample. Every piece eventually builds up to our viral haplotype caller, which uses linkage disequilibrium on long sequencing reads (💡 think Oxford Nanopore or PacBio HiFi) to identify genetic mutations that are conserved within a single virus particle.

This manual will cover the different ways of using HapLink, starting with a few tutorials before diving into the details of our reference section.

Contents

Getting started

Ready to dive in? 🤿 Here's a 30,000-foot view

curl -fsSL https://install.julialang.org | sh -s -- --yes
 source ~/.bashrc
 julia \
   --startup-file=no \
@@ -38,4 +38,4 @@
 haplink sequences \
   reference.fasta \
   sample.yml \
-  | tee sample.fasta
+ | tee sample.fasta
diff --git a/dev/search_index.js b/dev/search_index.js index 1eaff75..a9d599a 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"api/haplotypecall/#HaplotypeCall","page":"HaplotypeCall","title":"HaplotypeCall","text":"","category":"section"},{"location":"api/haplotypecall/","page":"HaplotypeCall","title":"HaplotypeCall","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/haplotypecall/","page":"HaplotypeCall","title":"HaplotypeCall","text":"Modules = [HapLink]\nPages = [\"haplotypecalling.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/haplotypecall/#HapLink.HaplotypeCall","page":"HaplotypeCall","title":"HapLink.HaplotypeCall","text":"HaplotypeCall{S,T}\n\nA Haplotype that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall, but for Haplotypes.\n\nFields\n\ndepth::UInt64: How many sequencing reads contain all the Variations in haplotype\nlinkage::Float64: The unweighted linkage disequilibrium coefficient of this call\nfrequency::Float64: How often this haplotype occurs compared to the entire read pool\nsignificance::Float64: The χ^2 statistical significance (p-value) of this call\nhaplotype::Haplotype{S,T}: The set of Variations that are inherited together in this call\n\n\n\n\n\n","category":"type"},{"location":"api/haplotypecall/#HapLink.HaplotypeCall-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Haplotype{S, T}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.HaplotypeCall","text":"HaplotypeCall(\n haplotype::Haplotype{S,T}, reads::AbstractArray{Haplotype{S,T}}\n) where {S<:BioSequence,T<:BioSymbol}\nHaplotypeCall(\n haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}\n) where {S<:BioSequence,T<:BioSymbol}\n\nCalls haplotypes by calculating the linkage disequilibrium of haplotype within reads\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.depth-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.depth","text":"depth(hc::HaplotypeCall)\n\nGets the number of times hc was found\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.frequency-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.frequency","text":"frequency(hc::HaplotypeCall)\n\nGets the relative number of times hc was found\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.haplotype-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.haplotype","text":"haplotype(hc::HaplotypeCall)\n\nGets the underlying Haplotype of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.ishaplotype-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{SequenceVariation.Variation{S, T}}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.ishaplotype","text":"function ishaplotype(\n haplotype::Union{AbstractArray{Variation{S,T}}, Haplotype{S,T}},\n reads::AbstractArray{Haplotype{S,T}};\n frequency::Union{Float64,Nothing}=nothing,\n significance::Union{Float64,Nothing}=nothing,\n depth::Union{Int,Nothing}=nothing,\n) where {S<:BioSequence,T<:BioSymbol}\n\nDetermines if a call of haplotype is supported by the sequences in reads based upon the provided keyword criteria.\n\nArguments\n\nhaplotype::Union{AbstractArray{Variation}, Haplotype}: A Vector of Variations or a Haplotype to search for as a haplotype\nreads::AbstractArray{Haplotype}: The reads to search for haplotype in\n\nKeywords\n\nfrequency::Union{Float64,Nothing}=nothing: The minimum number of times the entire haplotype must appear within reads compared to the number of reads to return true\nsignificance::Union{Float64,Nothing}=nothing: The χ^2 significance level (α) of linkage disequilibrium that a haplotype must surpass to return true\ndepth::Union{Int,Nothing}=nothing: The minimum number of times the entire haplotype must appear within reads to return true\n\nExtended help\n\nLinkage disequilibrium (Δ) is calculated by\n\nΔ = P_reference - prod_i P_refi\n\nwhere\n\nP_reference is the probability of finding a read that contains only reference (consensus) bases\nP_refi is the probability of finding a read that contains the reference (consensus) base for variant i within a haplotype\n\nand the χ^2 statistic is calculated by\n\nχ^2 = frac\n Δ^2 n\n \n \n left(prod_i^N P_refi left(1 - P_refiright)right)^frac2N\n \n\nwhere\n\nN is the number of Variations in haplotype (i.e., length(haplotype))\nn is the number of total reads sampled (i.e. length(reads))\n\nThe significance is then calculated from the cumulative χ^2 distribution function.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.linkage-Tuple{AbstractArray{<:Integer}}","page":"HaplotypeCall","title":"HapLink.linkage","text":"linkage(counts::AbstractArray{<:Integer})\n\nCalculates the linkage disequilibrium of a haplotype given its N-dimensional contingency matrix, counts.\n\ncounts is an N-dimensional array where the Nth dimension represents the Nth variant call locus within a haplotype. findoccurrences produces such an array.\n\nExtended help\n\nlinkage(::AbstractArray{<:Integer}) calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).\n\nD(1N) = Eleft( prod_k^N i_k - P_k right)\n\nwhere\n\nN is the number of variant loci\nD(1N) is the linkage disequilibrium of all N variant loci\nE is an operator returning the arithmetic mean of its argument over every read\ni_k is a function that returns 1 if the k-th locus of the given read contains the reference allele, and returns 0 otherwise.\nP_k is the probability of any read containing the reference allele in the k-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the k-th locus\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.linkage-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.linkage","text":"linkage(hc::HaplotypeCall)\n\nGets the unweighted linkage disequilibrium coefficient of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.occurrence_matrix-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{SequenceVariation.Variation{S, T}}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.occurrence_matrix","text":"occurrence_matrix(\n haplotype::AbstractArray{Variation{S,T}},\n reads::AbstractArray{Haplotype{S,T}},\n) where {S<:BioSequence,T<:BioSymbol}\noccurrence_matrix(\n haplotype::Haplotype{S,T},\n reads::AbstractArray{S,T}\n) where {S<:BioSequence,T<:BioSymbol}\n\nDetermine how many times the variants in haplotype appear in reads as an N- dimensional matrix.\n\nArguments\n\nhaplotype::AbstractArray{Variation}: A Vector of Variations or a Haplotype to search for as a haplotype\nreads::AbstractArray{Haplotype}: The reads to search for haplotype in\n\nReturns\n\n2x2x... Array{Int, length(haplotype)}: An N-dimensional matrix where N is the number of variant positions in readmatches. The 1 index position in the nth dimension represents the number of times the nth variant position was found to have the reference base called, while the 2 index position represents the number of times the nth variant position was found to have the alternate base called. E.g. first(occurrence_matrix(reads)) gives the number of times the all-reference base haplotype was found in reads, while occurrence_matrix(reads)[end] gives the number of times the all-alternate base haplotype was found.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.significance-Tuple{AbstractArray{<:Integer}}","page":"HaplotypeCall","title":"HapLink.significance","text":"significance(counts::AbstractArray{<:Integer})\n\nCalculates the χ^2 significance of a haplotype given its N-dimensional contingency matrix, counts\n\nSee the documentation on linkage(::AbstractArray{<:Integer}) or occurrence_matrix for details on how counts should be constructed.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.significance-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.significance","text":"significance(hc::HaplotypeCall)\n\nGets the χ^2 statistical significance (p-value) of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.sumsliced","page":"HaplotypeCall","title":"HapLink.sumsliced","text":"sumsliced(A::AbstractArray, dim::Int, pos::Int=1)\n\nSum all elements that are that can be referenced by pos in the dim dimension of A.\n\nExample\n\njulia> A = reshape(1:8, 2, 2, 2)\n2×2×2 reshape(::UnitRange{Int64}, 2, 2, 2) with eltype Int64:\n[:, :, 1] =\n 1 3\n 2 4\n\n[:, :, 2] =\n 5 7\n 6 8\n\njulia> sumsliced(A, 2)\n14\n\njulia> sumsliced(A, 2, 2)\n22\n\nHeavily inspired by Holy, Tim \"Multidimensional algorithms and iteration\" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes\n\n\n\n\n\n","category":"function"},{"location":"tutorial/2-examples/#cli-tutorial","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"At this point, we'll play with the example sequences included gratis 💰 with HapLink. No, they don't represent anything ☁️, and they aren't particularly interesting 🥱, but they do run fast 🏇, so we can get a handle on how the interface and workflow operate.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Pages = [\"2-examples.md\"]","category":"page"},{"location":"tutorial/2-examples/#Getting-the-goods","page":"Kicking the tires (Fake sequences)","title":"Getting the goods","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Let's get the example files from the code repository. In your terminal, run","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"wget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/reference.fasta\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam.bai","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nreference.fasta\nsample.bam\nsample.bam.bai","category":"page"},{"location":"tutorial/2-examples/#Spot-the-difference","page":"Kicking the tires (Fake sequences)","title":"Spot the difference","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"In order for HapLink to call haplotypes, it needs to know which sequence differences are due to sequencing errors, and which are due to genetic mutation. This process is known as variant calling, and HapLink comes bundled with a variant caller for just this type of occasion, which requires the reference genome and the alignment BAM file. Since we have both of those, let's run variant calling now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink variants reference.fasta sample.bam","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nNone","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"HapLink by default outputs to standard output, so the variant calls were printed on your screen instead of saved 😡. That's okay, though 😌. It's often good to visually check your variant calls, and it this case we absolutely needed to. Notice that none of the variants got a PASS filter. In fact, all of them were weeded out by too high of thresholds for depth (remember we only have 10 sequences) and significance. Let's readjust (and save our results this time).","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n variants \\\n reference.fasta \\\n sample.bam \\\n --depth 3 \\\n --significance 0.1 \\\n | tee sample.vcf","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.vcf","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"These settings seemed to work out well. Let's stick with them and move on.","category":"page"},{"location":"tutorial/2-examples/#The-general-lay-of-the-land","page":"Kicking the tires (Fake sequences)","title":"The general lay of the land","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"At this point, we're going to take a break from haplotype calling and convert those variant calls into a useful summary: the consensus sequence. HapLink can call the consensus sequence based solely off variant calls. Let's see that now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink consensus reference.fasta sample.vcf | tee sample.consensus.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.consensus.fasta","category":"page"},{"location":"tutorial/2-examples/#The-star-attraction","page":"Kicking the tires (Fake sequences)","title":"The star attraction","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"And now it's time for haplotype calling. Before you get your hopes up, there are no true haplotypes in this file. If 10 reads could yield subconsenus mysteries, then bioinformatics would be a super easy job. Alas, we live in the real world, and we'll have to stretch mathematical constructs to get anything out of these reads.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --consensus-frequency 0.75 \\\n | tee sample.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"You can see that HapLink found only one haplotype in this alignment, but (spoilers!) this isn't really a haplotype. This is just the consensus sequence, formatted in HapLink's haplotype scheme. The first haplotype in any output file is always the consensus sequence.","category":"page"},{"location":"tutorial/2-examples/#Haplotypes-in-the-Matrix","page":"Kicking the tires (Fake sequences)","title":"Haplotypes in the Matrix","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"If you have reads that don't span the entire genome (like we have here), you can use HapLink's maximum likelihood simulator to \"create\" full-length reads by splicing together reads and look for haplotypes on them. Even though we know there aren't any haplotypes in this sample, let's get out the simulator and give it a try.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --consensus-frequency 0.75 \\\n --simulated-reads \\\n --iterations 100 \\\n --overlap-min 0 \\\n --overlap-max 100 \\\n | tee sample.ml.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.ml.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Still nothing, huh? Like I said, no haplotypes here, and simulation can't change that. Note that simulating full-length reads used a lot more computational power, so you should try to stick with full-length reads when you can!","category":"page"},{"location":"tutorial/2-examples/#But,-what-does-it-mean?","page":"Kicking the tires (Fake sequences)","title":"But, what does it mean?","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"HapLink's haplotype YAML files contain everything needed to recreate the haplotype computation, but they can't really be used by any other programs. That's why there's the sequences command, so haplotype sequences can be saved into FASTA format for use by other tools. Let's try this now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink sequences reference.fasta sample.yml > sample.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"The output only contains the consensus sequence. Since we knew that there were no haplotypes in this sample we could have used haplink consensus, instead to get the same result.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"You are now ready to move on to the next tutorial!","category":"page"},{"location":"api/psuedoread/#PseudoRead","page":"Pseudoread","title":"PseudoRead","text":"","category":"section"},{"location":"api/psuedoread/","page":"Pseudoread","title":"Pseudoread","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/psuedoread/","page":"Pseudoread","title":"Pseudoread","text":"Modules = [HapLink]\nPages = [\"pseudoread.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/psuedoread/#HapLink.Pseudoread","page":"Pseudoread","title":"HapLink.Pseudoread","text":"Pseudoread(startpos::Integer, endpos::Integer, read::Haplotype)\nPseudoread(query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq)\n\nA stand-in struct for a sequencing read (real or imaginary) represented by its position and differences relative to a reference sequence.\n\n\n\n\n\n","category":"type"},{"location":"api/psuedoread/#HapLink.contradicts-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Variation{S, T}, SequenceVariation.Haplotype{S, T}}} where {S, T}","page":"Pseudoread","title":"HapLink.contradicts","text":"contradicts(var::Variation{S,T}, ref::Haplotype{S,T}) where {S,T}\n\nReturns true if var contains a modification incompatible with any of the Variations that make up ref.\n\nExample\n\njulia> using BioSequences, SequenceVariation\n\njulia> ref = Haplotype(dna\"GATTACA\", [Variation(dna\"GATTACA\", \"A2T\"), Variation(dna\"GATTACA\", \"Δ5-6\")]);\n\njulia> contradicts(Variation(dna\"GATTACA\", \"A2C\"), ref)\ntrue\n\njulia> contradicts(Variation(dna\"GATTACA\", \"T4A\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"Δ2-2\"), ref)\ntrue\n\njulia> contradicts(Variation(dna\"GATTACA\", \"Δ4-4\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"2G\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"4G\"), ref)\nfalse\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.haplotype-Tuple{Pseudoread}","page":"Pseudoread","title":"HapLink.haplotype","text":"haplotype(ps::Pseudoread)\n\nGets the differences between ps and its reference sequence as a Haplotype\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.overlap_inrange-Tuple{Pseudoread, Pseudoread, Integer, Integer}","page":"Pseudoread","title":"HapLink.overlap_inrange","text":"overlap_inrange(\n x::Pseudoread, y::Pseudoread, overlap_min::Integer, overlap_max::Integer\n)\n\nReturns true if the overlap between x and y is within overlap_min and overlap_max, returns false otherwise\n\nExample\n\njulia> using BioSequences, SequenceVariation\n\njulia> ps1 = Pseudoread(1, 100, Haplotype(dna\"A\", Variation{LongDNA{4}, DNA}[]));\n\njulia> ps2 = Pseudoread(50, 150, Haplotype(dna\"A\", Variation{LongDNA{4}, DNA}[]));\n\njulia> overlap_inrange(ps1, ps2, 1, 500)\ntrue\n\njulia> overlap_inrange(ps1, ps2, 75, 500)\nfalse\n\njulia> overlap_inrange(ps1, ps2, 1, 25)\nfalse\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.overlapping_variations-Tuple{Pseudoread, SequenceVariation.Haplotype}","page":"Pseudoread","title":"HapLink.overlapping_variations","text":"overlapping_variations(ps::Pseudoread, v::Haplotype)\n\nFind all Variations within v that are contained within the range defined by ps\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.pseudoreads-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Pseudoread","title":"HapLink.pseudoreads","text":"pseudoreads(sam::Union{AbstractString,AbstractPath}, consensus::NucleotideSeq)\n\nCreate a set of Pseudoreads from the alignments present in sam when aligned against consensus.\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.simulate-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{Pseudoread}, BioSequences.BioSequence, AbstractArray{SequenceVariation.Variation{S, T}}}} where {S, T}","page":"Pseudoread","title":"HapLink.simulate","text":"simulate(\n pool::AbstractArray{Pseudoread},\n refseq::BioSequence,\n subcon::AbstractArray{Variation{S,T}};\n reverse_order::Union{Bool,Nothing}=nothing,\n overlap_min::Int64=0,\n overlap_max::Int64=500,\n) where {S,T} -> Union{Haplotype{S,T},Nothing}\n\nCreates a new set of Pseudoreads based on the pool of real reads spliced spliced together using maximum likelihood simulation.\n\nArguments\n\npool::AbstractArray{Pseudoread}: A set of (presumably) real sequencing reads that will serve as pieces of the new simulated reads.\nrefseq::BioSequence: The reference against which all of the sequences in pool and subcon were aligned\nsubcon::AbstractArray{Variation{S,T}}: The subconsensus Variations which are known to be both present in pool and real (i.e., not due to sequencing errors)\n\nOptions\n\nreverse_order::Union{Bool,Nothing}=nothing: Whether the splicing should start at the beginning or end of the reference sequence. If left blank, will randomly decide.\noverlap_min::Int64=0: The amount that reads from pool are required to overlap in order to be spliced together. Can be negative, indicating that reads must be spaced apart by this amount, instead. -overlap_max::Int64=500: The amount that reads from pool can overlap before being discarded. Like overlap_min, can be negative.\n\nReturns\n\nA Haplotype{S,T} representing the simulated read, or nothing if the simulation encountered a dead end.\n\nExtended help\n\nThe simulation procedure works by picking a read from pool that contains the first (if reverse_order is false, the last otherwise) variant from subcon. The procedure examines every position in subcon in ascending (or descending, if reverse_order == true) order until the chosen read no longer carries that position. The simulation will then, at random, pick a read from pool that contains the next position from subcon and that also has matching variations at every position in subcon as the previous read, as well as meets the overlap requirements. These two reads may differ in sites other than those in subcon: it is assumed that appropriate variant calling has already been performed and that any variation in those sites is due to sequencing error. The procedure repeats for each new read from pool until the simulation has simulated every position of subcon, or until it reaches a dead-end and cannot find a read that matches every requirement.\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.variations_match-Tuple{Pseudoread, SequenceVariation.Haplotype}","page":"Pseudoread","title":"HapLink.variations_match","text":"variations_match(reference::Pseudoread, query::Union{Pseudoread,Haplotype})\n\nReturns true if query could be a read from reference.\n\nAdditional Variations found within query that are not present in reference do not invalid a positive match result so long as none of those Variations contradicts reference. These Variations can either\n\nExtend beyond the length of reference, or\nExist as \"sequencing errors\" within the interval of reference\n\nOn the other hand, every Variation within the overlapping interval of reference and query that is present in reference must also be found in query. In other words, query must have all of reference (within the overlap), but reference does not need all of query.\n\nThis arrangment allows for the creation of bodies of matching Pseudoreads, as done via simulate.\n\n\n\n\n\n","category":"method"},{"location":"cli/sequences/#haplink-sequences","page":"haplink sequences","title":"haplink sequences","text":"","category":"section"},{"location":"cli/sequences/","page":"haplink sequences","title":"haplink sequences","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/sequences/","page":"haplink sequences","title":"haplink sequences","text":"HapLink.sequences","category":"page"},{"location":"cli/sequences/#HapLink.sequences","page":"haplink sequences","title":"HapLink.sequences","text":"haplink sequences [options] reference haplotypes\n\nConvert haplotype calls into haplotype sequences\n\nIntroduction\n\nTakes a YAML file output from haplink haplotypes and converts each called haplotype into its sequence and outputs those sequences in FASTA format. Useful for downstream processing by other tools, but loses all of the metadata and statistical details of each haplotype.\n\nArguments\n\nreference: path to the reference genome to mutate haplotypes from. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single- segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nhaplotypes: path to the haplotype calls file that will be used to mutate the reference sequence to produce haplotype sequences. Must be an output from haplink haplotypes.\n\nOptions\n\n--outfile=: The file to write the haplotype sequences to. If left blank, the sequences are written to standard output.\n--prefix=: Name of the new sequences. Defaults to using the FASTA identifier of the reference sequence.\n\n\n\n\n\n","category":"function"},{"location":"api/variationcall/#VariationCall","page":"VariationCall","title":"VariationCall","text":"","category":"section"},{"location":"api/variationcall/","page":"VariationCall","title":"VariationCall","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationcall/","page":"VariationCall","title":"VariationCall","text":"Modules = [HapLink]\nPages = [\"variationcall.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationcall/#HapLink.VariationCall","page":"VariationCall","title":"HapLink.VariationCall","text":"VariationCall\n\nRepresents a Variation that is supported by aligned reads with sufficient metadata to support or refute its validity. It is designed to be converted into a line in Variant Call Format, or a VCF.Record.\n\nFields\n\nvariation::Variation: The Variation of this call\nquality::Union{Nothing,Number}: Phred quality of the basecall for variation\nfilter::Vector{String}: Indicator if variation has passed filters and is actually a variant call, or a list of criteria that have failed it\ndepth::Union{Nothing,UInt}: The number of reads that cover leftposition(variation)\nstrandbias::Union{Nothing,Float64}: The fraction of times variation appears on a positive strand\naltdepth::Union{Nothing,UInt}: The number of types variation occurs\nreadpos::Union{Nothing,UInt}: The averagerelative position of variation in each read\npvalue::Union{Nothing,Float64}: The Fisher's Exact Test p-value of this call\n\n\n\n\n\n","category":"type"},{"location":"api/variationcall/#HapLink.altdepth-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.altdepth","text":"altdepth(vc::VariationCall) -> Union{Nothing,UInt}\n\nGets the number of times vc appears. Returns nothing if unknown.\n\nSee also altdepth(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.call_variant-Tuple{VariationPileup, Float64}","page":"VariationCall","title":"HapLink.call_variant","text":"call_variant(\n pileup::VariationPileup,\n α::Float64;\n D::Union{Nothing,Int}=nothing,\n Q::Union{Nothing,Float64}=nothing,\n X::Union{Nothing,Float64}=nothing,\n F::Union{Nothing,Float64}=nothing,\n S::Union{Nothing,Float64}=nothing,\n) -> VariationCall\n\nCalls variant from a pileup.\n\nArguments\n\npileup::VariationPileup: The pileup to call variants from\nα::Float64: Fisher's Exact Test significance (α) level to call variants\n\nKeywords\n\nnote: Note\nLeave any keyword undefined to skip filtering based on that field\n\nD::Union{Nothing,Int}=nothing: Minimum total read depth for a variant to be called\nQ::Union{Nothing,Float64}=nothing: Minimum average Phred quality for a variant to be called\nX::Union{Nothing,Float64}=nothing: Maximum average distance variant can be from read edge to be called\nF::Union{Nothing,Float64}=nothing: Minimum alternate frequency for a variant to be called\nS::Union{Nothing,Float64}=nothing: Maximum proportion of the number of times variant can appear on one strand versus the other\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.depth-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.depth","text":"depth(vc::VariationCall) -> Union{Nothing,UInt}\n\nGets the number of times the position of vc appears total. Returns nothing if unknown.\n\nSee also depth(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.filters-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.filters","text":"filters(vc::VariationCall) -> Vector{String}\n\nGets all filters that have been applied to vc. Note that an empty FILTER entry is not permitted under the VCF spec, and an empty array should not automatically be considered to have PASSed all filters.\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.frequency-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.frequency","text":"frequency(vc::VariationCall) -> Float64\n\nGets the alternate allele frequency of vc\n\nSee also frequency(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.p_value-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.p_value","text":"p_value(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the p-value of the observed statistic of vc. Returns nothing if unknown.\n\nSee also variation_test\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.quality-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.quality","text":"quality(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the average phred quality score of vc, if known. Returns nothing if unknown.\n\nSee also quality(::VariationInfo), quality(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.readpos-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.readpos","text":"readpos(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the average relative position of vc. Returns nothing if unknown.\n\nSee also readpos(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.strand_bias-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.strand_bias","text":"strand_bias(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the fraction of times vc appears on the positive strand. Returns nothing if unknown.\n\nSee also strand(::VariationInfo), strand(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.variation-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.variation","text":"variation(vc::VariationCall) -> SequenceVariation.Variation\n\nGets the Variation of a VariationCall\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.variation_test-Tuple{Int64, Int64, Float64}","page":"VariationCall","title":"HapLink.variation_test","text":"variation_test(depth::Int, altdepth::Int, quality::Float64)\n\nConducts a Fisher's Exact Test to deterimine the likelihood of a variant with total depth and variation depth altdepth occuring, given an average basecall quality. Returns the p-value of the test.\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.vcf-Tuple{VariationCall, AbstractString}","page":"VariationCall","title":"HapLink.vcf","text":"vcf(vc::VariationCall, refname::AbstractString) -> VCF.Record\n\nConverts vc into a VCF.Record. refname is required and used as the CHROM field in the record.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotype/#Haplotype-Methods","page":"Haplotype Extensions","title":"Haplotype Methods","text":"","category":"section"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"These methods extend the functionality of SequenceVariation.Haplotype for better manipulation and calling.","category":"page"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"Modules = [HapLink]\nPages = [\"haplotype.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/haplotype/#HapLink.cigar-Union{Tuple{SequenceVariation.Haplotype{S, T}}, Tuple{T}, Tuple{S}} where {S, T}","page":"Haplotype Extensions","title":"HapLink.cigar","text":"cigar(hap::Haplotype{S,T}) where {S,T}\n\nConstructs a CIGAR string representing the alignment of the sequence of hap to its reference.\n\n\n\n\n\n","category":"method"},{"location":"cli/consensus/#haplink-consensus","page":"haplink consensus","title":"haplink consensus","text":"","category":"section"},{"location":"cli/consensus/","page":"haplink consensus","title":"haplink consensus","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/consensus/","page":"haplink consensus","title":"haplink consensus","text":"HapLink.consensus","category":"page"},{"location":"cli/consensus/#HapLink.consensus","page":"haplink consensus","title":"HapLink.consensus","text":"haplink consensus [options] reference variants\n\nConvert variant calls to consensus sequence\n\nIntroduction\n\nGenerates a consensus sequence based on a reference genome and previously called variants. Will only consider variants with a \"PASS\" filter to be able to contribute to the consensus. Outputs results in FASTA format.\n\nArguments\n\nreference: The path to the reference genome. Must be in FASTA format.\nvariants: The path to the variant calls. Must be in VCF v4 format.\n\nOptions\n\n--outfile=: The file to write the consensus sequence to. If left blank, the consensus sequence is written to standard output.\n--frequency=: The minimum frequency at which a variant must appear to be considered part of the consensus. Note that HapLink does not support ambigous base calling (e.g. N, R, Y, etc.) at low frequencies unlike many variant callers.\n--prefix=: Name of the new sequence. Defaults to using the FASTA identifier of the reference sequence.\n\n\n\n\n\n","category":"function"},{"location":"cli/haplotypes/#haplink-haplotypes","page":"haplink haplotypes","title":"haplink haplotypes","text":"","category":"section"},{"location":"cli/haplotypes/","page":"haplink haplotypes","title":"haplink haplotypes","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/haplotypes/","page":"haplink haplotypes","title":"haplink haplotypes","text":"HapLink.haplotypes","category":"page"},{"location":"cli/haplotypes/#HapLink.haplotypes","page":"haplink haplotypes","title":"HapLink.haplotypes","text":"haplink haplotypes [options] reference variants bam\n\nCall haplotypes\n\nIntroduction\n\nCalls haplotypes based on the linkage disequilibrium between subconsensus variant sites on long reads. Variant sites are chosen based on having a \"PASS\" filter in the variants file, and linkage is calculated based on the reads present in the bam file. Note this means that haplotypes can be called on a different set of sequences than variants were (e.g. variant calling using high accuracy short-read chemistry like Illumina and haplotype calling using low accuracy long-read chemistry like Oxford Nanopore). There are no guarantees that the variants file and bam file match, so use this feature with caution!\n\nArguments\n\nreference: path to the reference genome to call haplotypes against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nvariants: path to the variants file that will define variant sites to call haplotypes from. Must be in VCF (not BCF) v4 format. haplink variants generates a compatible file, although output from other tools can also be used.\nbam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.\n\nFlags\n\n--simulated-reads: Use maximum likelihood simulation of long reads based on overlapping short reads\n\nOptions\n\n--outfile=: The file to write haplotype calls to. If left blank, haplotype calls are written to standard output.\n--consensus-frequency=: The minimum frequency at which a variant must appear to be considered part of the consensus.\n--significance=: The alpha value for statistical significance of haplotype calls.\n--depth=: Minimum number of times a variant combination must be observed within the set of reads to be called a haplotype\n--frequency=: The minimum proportion of reads that a variant combination must be observed within compared to all reads covering its position for that haplotype to be called\n--overlap-min=: The minimum number of bases that must overlap for two short reads to be combined into one simulated read. Can be negative to indicate a minimum distance between reads. Only applies when --simulated-reads is set.\n--overlap-max=: The maximum number of bases that may overlap for two short reads to be combined into one simulated read. Can be negative to indicate a cap on how far two reads must be apart from one another. Must be greater than --overlap-min. Only applies when --simulated-reads is set.\n--iterations=: The number of simulated reads to create before calling haplotypes. Only applies when --simulated-reads is set.\n--seed=: The random seed used for picking short reads to create simulated reads when using maximum likelihood methods. Leaving unset will use the default Julia RNG and seed. See Julia's documentation on randomness for implementation details. Only applies when --simulated-reads is set.\n\n\n\n\n\n","category":"function"},{"location":"api/consensus/#Consensus-Tools","page":"Consensus Tools","title":"Consensus Tools","text":"","category":"section"},{"location":"api/consensus/","page":"Consensus Tools","title":"Consensus Tools","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/consensus/","page":"Consensus Tools","title":"Consensus Tools","text":"Modules = [HapLink]\nPages = [\"consensus.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/consensus/#HapLink.consensus_haplotype-Tuple{BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"Consensus Tools","title":"HapLink.consensus_haplotype","text":"consensus_haplotype(\n reference::NucleotideSeq,\n variants::Union{AbstractString,AbstractPath};\n frequency::Float64=0.5,\n)\n\nGet the consensus Haplotypefrom variants applied to reference.\n\nArguments\n\nreference::NucleotideSeq: Sequence of the reference genome that variants were called from\nvariants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from\n\nKeywords\n\nfrequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.\n\n\n\n\n\n","category":"method"},{"location":"api/consensus/#HapLink.consensus_record-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"Consensus Tools","title":"HapLink.consensus_record","text":"consensus_record(\n reference::Union{AbstractString,AbstractPath},\n variants::Union{AbstractString,AbstractPath};\n frequency::Float64=0.5,\n prefix::Union{AbstractString,Nothing}=nothing,\n)\n\nGet the consensus FASTA.Recordfrom variants applied to the first sequence in reference.\n\nArguments\n\nreference::Union{AbstractString,AbstractPath}: Path to the reference genome that variants were called from. Only the first sequence will be used.\nvariants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from\n\nKeywords\n\nfrequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.\nprefix::Union{AbstractString,Nothing}=nothing: Name to give to the output record. By default, the name of the output record will be the same as the name of the input record with _CONSENSUS appended. If prefix is supplied, then the name of the output record will be $(prefix)_CONSENSUS.\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#VariationInfo","page":"VariationInfo","title":"VariationInfo","text":"","category":"section"},{"location":"api/variationinfo/","page":"VariationInfo","title":"VariationInfo","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationinfo/","page":"VariationInfo","title":"VariationInfo","text":"Modules = [HapLink]\nPages = [\"variationinfo.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationinfo/#HapLink.VariationInfo","page":"VariationInfo","title":"HapLink.VariationInfo","text":"VariationInfo{S<:BioSequence,T<:BioSymbol}\n\nRepresents statistics associated with a SequenceVariation.Variation within an aligned sequencing read.\n\nFields\n\nvariation::Variation{S,T}: The Variation this object represents\nreadpos::Float64: The location where variation occurs within a sequencing read\nquality::Float64: The phred-scaled basecall quality of variation\nStrand::GenomicFeatures.Strand: Which strand variation appears on\n\n\n\n\n\n","category":"type"},{"location":"api/variationinfo/#HapLink.quality-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.quality","text":"quality(vi::VariationInfo) -> Float64\n\nGets the phred-scaled basecall quality of variation(vi) within a sequencing read\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.readpos-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.readpos","text":"readpos(vi::VariationInfo) -> Float64\n\nGets the position of variation(vi) within a sequencing read\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.strand-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.strand","text":"strand(vi::VariationInfo) -> GenomicFeatures.Strand\n\nGets the strand that variation(vi) appears on\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.variation-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.variation","text":"variation(vi::VariationInfo) -> SequenceVariation.Variation\n\nGets the Variation of a VariationInfo\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.variationinfos-Tuple{Union{XAM.BAM.Record, XAM.SAM.Record}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"VariationInfo","title":"HapLink.variationinfos","text":"variationinfos(\n query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq\n) -> Vector{VariationInfo}\nvariationinfos(\n query::Union{AbstractString,AbstractPath},\n reference::Union{AbstractString,AbstractPath}\n) -> Vector{VariationInfo}\n\nCalls Variations based on the alignments in query against reference, and returns every variation call found within query as a Vector{VariationInfo}\n\n\n\n\n\n","category":"method"},{"location":"tutorial/1-install/#install-tutorial","page":"In the beginning (Installation)","title":"In the beginning","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"There are many different ways to install HapLink. Here we walk you through two of the most common. If you're one of the 0.01% who needs a different method, then we trust you can extrapolate from these instructions. Note that all of these tutorials assume you have a Unix-type system (MacOS, BSD, Linux). Windows command-line support is basically non-existant!","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Pages = [\"1-install.md\"]","category":"page"},{"location":"tutorial/1-install/#Bioconda","page":"In the beginning (Installation)","title":"Bioconda","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"We understand: every bioinformatian is addicted to miniconda. 🐍 And we're not here to judge. 👩‍⚖️ It's easy and portable and is bundled on most HPCs. If you already have conda (or mamba), then this route is probably for you.","category":"page"},{"location":"tutorial/1-install/#Install-HapLink-inside-a-conda-environment","page":"In the beginning (Installation)","title":"Install HapLink inside a conda environment","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"We'll make a new environment with the totally original name \"haplink,\" to house the new haplink install and add it directly.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"conda create -n haplink -c bioconda -c conda-forge haplink -y","category":"page"},{"location":"tutorial/1-install/#Activate-the-environment","page":"In the beginning (Installation)","title":"Activate the environment","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Now that we've made the environment, let's use it!","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"conda activate haplink","category":"page"},{"location":"tutorial/1-install/#Test-for-errors","page":"In the beginning (Installation)","title":"Test for errors","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Next, cross your fingers 🤞 and run the following command:","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"haplink --help","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Check for error messages, but otherwise you're done. You can reuse your haplink environment for the next tutorial.","category":"page"},{"location":"tutorial/1-install/#Comonicon","page":"In the beginning (Installation)","title":"Comonicon","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"HapLink is unashamedly a Julia program. If you already have Julia installed, then you can leverage that existing Julia install to install HapLink thanks to the power of Comonicon.jl.","category":"page"},{"location":"tutorial/1-install/#Check-your-Julia-version","page":"In the beginning (Installation)","title":"Check your Julia version","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"HapLink requires Julia v1.6 or later to run. No exceptions, Kemosabe. Run a check now to see if you have a high enough version.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia --version","category":"page"},{"location":"tutorial/1-install/#Add-HapLink-to-a-temporary-environment-and-install","page":"In the beginning (Installation)","title":"Add HapLink to a temporary environment and install","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Using the magic 🪄 of Julia's environments, we can do a \"temp install\" of the HapLink package to a temporary directory environment. Because this is a fresh install, though, it will trigger Comonicon to install the application to a brand-new isolated environment.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink)'","category":"page"},{"location":"tutorial/1-install/#Add-HapLink-to-PATH","page":"In the beginning (Installation)","title":"Add HapLink to PATH","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Comonicon will install HapLink, but chances are it won't be accessible yet. We need to add HapLink to the PATH, first.","category":"page"},{"location":"tutorial/1-install/#Bash","page":"In the beginning (Installation)","title":"Bash","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"echo 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.bashrc\n. ~/.bashrc","category":"page"},{"location":"tutorial/1-install/#Zsh","page":"In the beginning (Installation)","title":"Zsh","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"echo 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.zshrc\n. ~/.zshrc","category":"page"},{"location":"tutorial/1-install/#Testing","page":"In the beginning (Installation)","title":"Testing","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Now, run the haplink command.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"haplink --help","category":"page"},{"location":"tutorial/1-install/#Wait,-the-test-failed!","page":"In the beginning (Installation)","title":"Wait, the test failed!","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Odds are good that you already had an install of HapLink somewhere in your Julia depot. If that happens, then you'll have to trigger the install manually. Just run","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink); using HapLink; HapLink.comonicon_install()'","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"then return to the Add HapLink to PATH step.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Success! We now have a working installation of HapLink. You are now ready to move on to the next tutorial.","category":"page"},{"location":"api/variation/#Variation-Methods","page":"Variation Extensions","title":"Variation Methods","text":"","category":"section"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"These methods extend the functionality of SequenceVariation.Variation for calculation of data related to Variations created from NGS read alignments.","category":"page"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"Modules = [HapLink]\nPages = [\"variation.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variation/#HapLink.quality-Tuple{SequenceVariation.Variation, Union{XAM.BAM.Record, XAM.SAM.Record}}","page":"Variation Extensions","title":"HapLink.quality","text":"quality(v::Variation, r::Union{SAM.Record,BAM.Record}) -> Float64\n\nGet the phred-scalled basecall quality of v within the sequencing read of r.\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.relativepos-Tuple{SequenceVariation.Variation, Union{XAM.BAM.Record, XAM.SAM.Record}}","page":"Variation Extensions","title":"HapLink.relativepos","text":"relativepos(v::Variation, r::Union{SAM.Record,BAM.Record})\n\nCalculates the fractional position of v within the sequence of r. If v is out-of-bounds of r, then will return 0 for positions before r and 1 for positions after r.\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.seqpos-Tuple{SequenceVariation.Variation, Union{BioAlignments.Alignment, BioAlignments.AlignedSequence, BioAlignments.PairwiseAlignment}}","page":"Variation Extensions","title":"HapLink.seqpos","text":"seqpos(v::Variation, a::Union{Alignment,AlignedSequence,PairwiseAlignment})\n\nGet the position of v in the sequence of a.\n\nExample\n\nusing BioAlignments, BioSequences, SequenceVariation\nv = Variation(dna\"AAAAA\", \"A3T\")\na = Alignment(\"2=1X2=\", 1, 1)\nseqpos(v, a)\n\n# output\n\n3\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.subconsensus_variations-Union{Tuple{T}, Tuple{S}, Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, SequenceVariation.Haplotype{S, T}}} where {S, T}","page":"Variation Extensions","title":"HapLink.subconsensus_variations","text":"subconsensus_variations(vcf::Union{AbstractPath,AbstractString}, consensus::Haplotype)\n\nGet a Vector{Variation} with passing variant calls from vcf that do not appear in consensus\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.variation-Tuple{VariantCallFormat.Record, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Variation Extensions","title":"HapLink.variation","text":"variation(r::VCF.Record, refseq::NucleotideSeq)\n\nConstruct a Variation from r applying to refseq. There is no validation that r's actually describes a mutation in refseq.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#Private-API-Reference","page":"Private API Reference","title":"Private API Reference","text":"","category":"section"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"Modules = [HapLink]\nPrivate = true\nPublic = false\nFilter = f -> f ∉ [\n HapLink.variants,\n HapLink.consensus,\n HapLink.haplotypes,\n HapLink.sequences,\n]","category":"page"},{"location":"api/private/#FASTX.FASTA.Record-Tuple{Pseudoread}","page":"Private API Reference","title":"FASTX.FASTA.Record","text":"FASTX.FASTA.Record(ps::Pseudoread; prefix::AbstractString=\"\", is_consensus::Bool=false)\n\nSpecialized constructor for outputting Pseudoreads to FASTA.Records that can be written to files.\n\nArguments\n\nps::Pseudoread: The modified sequence and read window to be converted into the sequence of the new record\n\nKeywords\n\nprefix::AbstractString=\"\": A string to start the sequence identifier with, usually based on the reference sequence of ps\nis_consensus::Bool=false: Normally, the new identifier of the record is a combination of the prefix and the SHA1 hash of the alternate sequence. Set is_consensus to true to instead use the word \"CONSENSUS\" in place of the hash\n\n\n\n\n\n","category":"method"},{"location":"api/private/#SequenceVariation.Haplotype-Tuple{Union{XAM.BAM.Record, XAM.SAM.Record}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Private API Reference","title":"SequenceVariation.Haplotype","text":"SequenceVariation.Haplotype(\n query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq\n) -> SequenceVariation.Haplotype\n\nSpecialized constructor that allows converting the alignment in a XAM.Record into a Haplotype.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._cigar-Union{Tuple{SequenceVariation.Variation{S, T}}, Tuple{T}, Tuple{S}} where {S, T}","page":"Private API Reference","title":"HapLink._cigar","text":"_cigar(var::Variation{S,T}) where {S,T}\n\nReturns a CIGAR operation for var. Only supports insertions and deletions.\n\nSee also _cigar_between\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._cigar_between-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Variation{S, T}, SequenceVariation.Variation{S, T}}} where {S, T}","page":"Private API Reference","title":"HapLink._cigar_between","text":"_cigar_between(x::Variation{S,T}, y::Variation{S,T}) where {S,T}\n\nReturns a CIGAR operation for the (assumed) matching bases between x and y.\n\nSee also _cigar\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._indexed_reader-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, GenomicFeatures.Interval}","page":"Private API Reference","title":"HapLink._indexed_reader","text":"_indexed_reader(bam::Union{AbstractPath,AbstractString}, int::Interval)\n\nProvides an iterator over records in bam that overlap with int. Note that if no index file is available for bam, then the return value will fall back to an iterator of all records in bam.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._lendiff-Tuple{Pseudoread}","page":"Private API Reference","title":"HapLink._lendiff","text":"_lendiff(ps::Pseudoread)\n\nGets the difference between the number of bases within the read window of ps on the reference sequence from the number of bases within the read window on the mutated sequence, taking into account insertions and deletions.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._phrederror-Tuple{Number}","page":"Private API Reference","title":"HapLink._phrederror","text":"_phrederror(quality::Number)\n\nConverts a PHRED33-scaled error number into the expected fractional error of basecall\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._pos_to_edge-Tuple{Number}","page":"Private API Reference","title":"HapLink._pos_to_edge","text":"_pos_to_edge(pos::Number)\n\nConverts pos from a number between 0 (beginning) and 1 (end) to a number ranging from 0 (beginning) to 1 (middle) to 0 (end), i.e. convert a relative position to a relative distance from the edge.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._posin-Tuple{Pseudoread, SequenceVariation.Variation}","page":"Private API Reference","title":"HapLink._posin","text":"_posin(ps::Pseudoread, v::Variation)\n\nDetermines if the positions that make up v are wholly contained by ps\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._push_filter!","page":"Private API Reference","title":"HapLink._push_filter!","text":"_push_filter!(\n vc::VariationCall,\n label::Char,\n value::Union{Nothing,Number},\n filter::Function=(var, val) -> true,\n)\n\nAdds a FILTER entry to vc of the form \"$label$value\" if filter returns true.\n\nArguments\n\nvc::VariationCall: The VariationCall to annotate\nlabel::Char: The first character of the filter text to add\nvalue::Union{Nothing,Number}: The value to compare vc against, and to append to the filter text. If set to nothing, _push_filter! will return without evaluating or adding any filters, which may be useful for processing multiple inputs\nfilter::Function=(var, val) -> true: A function handle to determine if a filter should be applied or not. Note that this function must return true if and only if the filter should be added to vc. The function will be passed vc and value. Defaults to always applying filters regardless of the values of vc and value.\n\n\n\n\n\n","category":"function"},{"location":"api/private/#HapLink.comonicon_install-Tuple{}","page":"Private API Reference","title":"HapLink.comonicon_install","text":"comonicon_install(;kwargs...)\n\nInstall the CLI manually. This will use the default configuration in Comonicon.toml, if it exists. For more detailed reference, please refer to Comonicon documentation.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.comonicon_install_path-Tuple{}","page":"Private API Reference","title":"HapLink.comonicon_install_path","text":"comonicon_install_path(;[yes=false])\n\nInstall the PATH and FPATH to your shell configuration file. You can use comonicon_install_path(;yes=true) to skip interactive prompt. For more detailed reference, please refer to Comonicon documentation.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.magnitude-Tuple{UnitRange}","page":"Private API Reference","title":"HapLink.magnitude","text":"magnitude(x::UnitRange)\n\nGets the difference between the last and first elements of x\n\nExample\n\njulia> using HapLink: magnitude\n\njulia> magnitude(0:10)\n10\n\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.overlap-Union{Tuple{S}, Tuple{UnitRange{S}, UnitRange{S}}} where S","page":"Private API Reference","title":"HapLink.overlap","text":"overlap(x::UnitRange{S}, y::UnitRange{S}) where {S}\n\nFinds the inclusive overlap interval of x and y\n\nExample\n\njulia> using HapLink: overlap\n\njulia> overlap(1:5, 3:10)\n3:5\n\njulia> overlap(2:4, 6:8)\n6:5\n\njulia> overlap(1:10, 2:9)\n2:9\n\n\n\n\n\n","category":"method"},{"location":"api/private/#SequenceVariation.variations-Tuple{AbstractVector{<:SequenceVariation.Haplotype}}","page":"Private API Reference","title":"SequenceVariation.variations","text":"variations(vs::AbstractVector{Haplotype})\n\nExtracts all SequenceVariation.Variations from vs.\n\n\n\n\n\n","category":"method"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"findset","category":"page"},{"location":"api/private/#HapLink.findset","page":"Private API Reference","title":"HapLink.findset","text":"findset(lst::AbstractArray{T}, comparator::Function) where {T}\n\nFinds every possible set of items in lst where comparator returns true for the set.\n\nExample\n\nfunction divisible_by_same_number(x, i)\n for j in 2:i\n if all(y -> y % j == 0, x)\n return true\n end\n end\n return false\nend\nfindset(1:12, x -> divisible_by_same_number(x, 5))\n\n# output\n6-element Vector{Vector{Int64}}:\n [2, 4, 6, 8, 10, 12]\n [6, 3, 9, 12]\n [5, 10]\n [1]\n [7]\n [11]\n\n\n\n\n\n","category":"function"},{"location":"api/variationpileup/#VariationPileup","page":"VariationPileup","title":"VariationPileup","text":"","category":"section"},{"location":"api/variationpileup/","page":"VariationPileup","title":"VariationPileup","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationpileup/","page":"VariationPileup","title":"VariationPileup","text":"Modules = [HapLink]\nPages = [\"variationpileup.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationpileup/#HapLink.VariationPileup","page":"VariationPileup","title":"HapLink.VariationPileup","text":"VariationPileup\n\nSummarizes the basecalls that support a Variation within a set of alignments.\n\nFields\n\nvariation::Variation: The Variation of this pileup entry\ndepth::UInt: The number of times the position of variation appears in this set of alignments\nreadpos::Vector{Float64}: The relative positions of variation within each read\nquality::Vector{Float64}: The phred quality of variation within each read\nstrand::Vector{Strand}: Which strand each variation is found on\n\n\n\n\n\n","category":"type"},{"location":"api/variationpileup/#HapLink.altdepth-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.altdepth","text":"altdepth(vp::VariationPileup)\n\nGets the number of times vp.variation appears\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.depth-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.depth","text":"depth(vp::VariationPileup)\n\nGets the number of times the position of vp.variation appears total (variant and wild-type)\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.frequency-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.frequency","text":"frequency(vp::VariationPileup)\n\nGets the alternate allele frequency of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.pileup-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"VariationPileup","title":"HapLink.pileup","text":"pileup(sam::AbstractPath, ref::AbstractPath) -> Vector{VariationPileup}\n\nGenerates a pileup of Variations based on the alignments in sam aligned against ref.\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.quality-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.quality","text":"quality(vp::VariationPileup)\n\nGets the phred qualities of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.readpos-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.readpos","text":"readpos(vp::VariationPileup)\n\nGets the relative positions of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.strand-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.strand","text":"strand(vp::VariationPileup)\n\nGets the strands of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.strand_bias-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.strand_bias","text":"strand_bias(vp::VariationPileup)\n\nGets the frequency of positive strands that variation appears on relative to all variation reads\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.variation-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.variation","text":"variation(vp::VariationPileup)\n\nGets the Variation of vp\n\n\n\n\n\n","category":"method"},{"location":"tutorial/3-other/#integration-tutorial","page":"Plays well with others (External tools)","title":"Playing well with others","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"HapLink is not a one-man show: it definitely knows how to cooperate with other tools! In this tutorial, we'll let HapLink do the haplotype calling, but use other tools to go from reads to variant calls, and from haplotypes to phylogenies.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Pages = [\"3-other.md\"]","category":"page"},{"location":"tutorial/3-other/#Install-the-other-tools","page":"Plays well with others (External tools)","title":"Install the other tools","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Before we get too far, let's make sure that we actually have all of the tools that we need. Let's create a brand new conda environment.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"conda create \\\n -n haplink-tutorial-3 \\\n -c bioconda \\\n -c conda-forge \\\n wget=1.20 \\\n sra-tools=3.0 \\\n entrez-direct=16.2 \\\n samtools=1.17 \\\n minimap2=2.26 \\\n lofreq=2.1 \\\n haplink \\\n mafft=7.520 \\\n raxml-ng=1.2 \\\n -y\nconda activate haplink-tutorial-3","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nNone","category":"page"},{"location":"tutorial/3-other/#Get-sample-data","page":"Plays well with others (External tools)","title":"Get sample data","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"We'll pull some of the validation data data from NCBI. First, the reference genome from GenBank.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"esearch \\\n -db nucleotide \\\n -query \"NC_036618.1\" \\\n | efetch \\\n -format fasta \\\n > idv4.fasta","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Next, we'll download one of the pools from the validation set from SRA.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"fasterq-dump \"SUB13489216\"","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nidv4.fasta\nIDV-Aug2022-P2.fastq.gz","category":"page"},{"location":"tutorial/3-other/#Align-sequences","page":"Plays well with others (External tools)","title":"Align sequences","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"We have a set of Nanopore reads and a reference genome to go with them. We'll use minimap2 to align the reads to reference. minimap2 requires the -a flag to output in SAM format, and uses the -x flag to tweak the settings for optimal Nanoore alignment. We then run those reads through samtools sort and samtools index to reduce the computational load needed to find reads by our downstream tools, and samtools view -b to convert the SAM file into a compressed BAM file.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"minimap2 \\\n -t $(($(nproc)/2)) \\\n -ax map-pb \\\n idv4.fasta \\\n IDV-Aug2022-P2.fastq.gz \\\n | samtools sort \\\n | samtools view \\\n -@ $(($(nproc)/2)) \\\n -b \\\n -h \\\n > IDV-Aug2022-P2.bam\nsamtools index IDV-Aug2022-P2.bam","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.bam\nIDV-Aug2022-P2.bam.bai","category":"page"},{"location":"tutorial/3-other/#Call-variants","page":"Plays well with others (External tools)","title":"Call variants","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"People are picky about their variant callers. Even viralrecon can't decide which caller to use. If you don't like HapLink's built-in variant caller, that's fine! Here, we'll use LoFreq (another fine variant caller) to decide for us whether nucleotide variations are variants, or just randomness. The only requirement is that the output must be in VCF format (looking at you, iVar) for HapLink to read it.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"lofreq faidx idv4.fasta\nlofreq call-parallel \\\n --pp-threads $(($(nproc)/2)) \\\n --call-indels \\\n --ref idv4.fasta \\\n IDV-Aug2022-P2.bam \\\n --out IDV-Aug2022-P2.vcf","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.vcf","category":"page"},{"location":"tutorial/3-other/#Call-haplotypes","page":"Plays well with others (External tools)","title":"Call haplotypes","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"And we're back... We now have each piece (reference, alignment, variant calls) needed for HapLink to find haplotypes.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"export JULIA_NUM_THREADS=$(($(nproc)/2))\nhaplink haplotypes \\\n idv4.fasta \\\n IDV-Aug2022-P2.vcf \\\n IDV-Aug2022-P2.bam \\\n > IDV-Aug2022-P2.yml\nhaplink sequences \\\n idv4.fasta \\\n IDV-Aug2022-P2.yml \\\n > IDV-Aug2022-P2.fasta","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.yml\nIDV-Aug2022-P2.fasta","category":"page"},{"location":"tutorial/3-other/#Alignment-(again,-but-on-a-bigger-scale)","page":"Plays well with others (External tools)","title":"Alignment (again, but on a bigger scale)","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Now that we have the sequences of every haplotype that HapLink found, we can go farther and see how they compare altogether. We'll use MAFFT to generate a multiple sequence alignment of every haplotype. By comparing everything to everything, we can start to see patterns that may have been masked by comparing to the reference genome.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"cat idv4.fasta IDV-Aug2022-P2.fasta > sequences.fasta\nmafft \\\n --thread $(($(nproc)/2)) \\\n --auto \\\n sequences.fasta \\\n > haplotypes.fas","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nsequences.fasta\nhaplotypes.fas","category":"page"},{"location":"tutorial/3-other/#Phylogeny","page":"Plays well with others (External tools)","title":"Phylogeny","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Although the human brain is reasonably good at picking up pattern differences, we can mathematically determine the similarities between sequences using a phylogeny. We'll use RAxML-NG to solve the phylogeny for us. Note that RAxML-NG produces a lot of output.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"raxml-ng --all \\\n --threads auto{MAX} \\\n --msa haplotypes.fas \\\n --model GTR+G","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nhaplotypes.fas\nhaplotypes.fas.raxml.bestModel\nhaplotypes.fas.raxml.bestTree\nhaplotypes.fas.raxml.bestTreeCollapsed\nhaplotypes.fas.raxml.bootstraps\nhaplotypes.fas.raxml.log\nhaplotypes.fas.raxml.mlTrees\nhaplotypes.fas.raxml.rba\nhaplotypes.fas.raxml.startTree\nhaplotypes.fas.raxml.support","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"You can now import the haplotypes.fas.raxml.bestTree file into a phylogeny viewer to view the tree. A personal favorite around our lab is iTOL, but many others will work.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"(Image: iTOL tree)","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Now you can do some heavy lifting with HapLink and its friends! You can leave the tutorials and check out the CLI reference, for an in-depth examination of everything you just learned, or dive into the next tutorial to learn how to call haplotypes without leaving the Julia REPL.","category":"page"},{"location":"cli/variants/#variants-cli","page":"haplink variants","title":"haplink variants","text":"","category":"section"},{"location":"cli/variants/","page":"haplink variants","title":"haplink variants","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/variants/","page":"haplink variants","title":"haplink variants","text":"HapLink.variants","category":"page"},{"location":"cli/variants/#HapLink.variants","page":"haplink variants","title":"HapLink.variants","text":"haplink variants [options] reference bam\n\nCall variants\n\nIntroduction\n\nDecides which variations found within an alignment are real, and which are due to random chance. HapLink uses Fisher's Exact Test to determine the statistical significance of sequence variations, and optionally allows for other thresholds to reduce random noise in the variant calling. Outputs a Variant Call Format (VCF) file compliant with VCF v4.\n\nArguments\n\nreference: path to the reference genome to call variants against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nbam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.\n\nOptions\n\n--outfile=: The file to write variant calls to. If left blank, variant calls are written to standard output.\n--significance=: The alpha value for statistical significance of variant calls.\n--depth=: Minimum number of times the variation must be observed within the alignment to be called a variant\n--quality=: The minimum average basecall quality score for a variation to be called a variant\n--frequency=: The minimum proportion of reads that the variation must be observed within compared to all reads covering its position for that variation to be called a variant\n--position=: The distance (as a percentage) from the edge of reads that a variation must be observed at to be called a variant\n--strandedness=: The maximum proportion of times that a variation can be observed on one strand versus the other to be called a variant. This metric is totally useless on single-stranded sequencing protocols like Oxford Nanopore, but can be useful for combining data between stranded protocols like most Illumina and Pacific Bio.\n\n\n\n\n\n","category":"function"},{"location":"tutorial/4-repl/#repl-tutorial","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Julia is an ahead-of-time compiled language. Practically, that means that every time you restart Julia, you have to recompile all the code you were running. Using HapLink on the command line involves up to four different commands. Translation: up to four cases where you lose time to recompiling code that was just running. Surely there's a better way, right? Yep, you can stay within a single Julia session by using HapLink's REPL mode.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Pages = [\"4-repl.md\"]","category":"page"},{"location":"tutorial/4-repl/#Installing-HapLink-...-into-a-project","page":"For advanced beginners (REPL mode)","title":"Installing HapLink ... into a project","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Wait, isn't that what the first tutorial was about? Yes and no. Those projects installed HapLink's command line interface to a global environment. We're going to locally install HapLink as a library into a local package environment.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"tip: Tip\nIf you want to learn more about package environments, check out the official Julia environments documentation or Julius Krumbiegel's tutorial.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Press ] to enter Pkg mode, then follow along with these prompts.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(@v1.6) pkg> activate .\n Activating project at `~/haplink-tutorial-4`\n\n(haplink-tutorial-4) pkg> add HapLink","category":"page"},{"location":"tutorial/4-repl/#Finding-the-example-files","page":"For advanced beginners (REPL mode)","title":"Finding the example files","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Instead of downloading the example files from the internet, we can reference them directly from the package.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using HapLink\nconst PACKAGE_DIR = dirname(dirname(pathof(HapLink)))\nconst EXAMPLE_DIR = joinpath(PACKAGE_DIR, \"example\")\nconst EXAMPLE_BAM = joinpath(EXAMPLE_DIR, \"sample.bam\")\nconst EXAMPLE_REFERENCE = joinpath(EXAMPLE_DIR, \"reference.fasta\")","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Perfect!","category":"page"},{"location":"tutorial/4-repl/#Pileup-time","page":"For advanced beginners (REPL mode)","title":"Pileup time","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Variants can't exist as a single unit, they have to be supported by a set of reads. We can view those reads in a pileup, which can easily be made by using the pileup function.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"sample_pileup = pileup(EXAMPLE_BAM, EXAMPLE_REFERENCE)","category":"page"},{"location":"tutorial/4-repl/#But-is-it-real?","page":"For advanced beginners (REPL mode)","title":"But is it real?","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Pileups tell us what the sequences are saying, but they can't tell us if there are any real variants. We can use the call_variant function to filter a pileup into a set of variants.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"all_variant_calls = VariationCall[]\nfor pile in sample_pileup\n push!(all_variant_calls, call_variant(pile, 0.5; D=0x02, Q=10.0, X=0.01, F=0.05))\nend\nall_variant_calls","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We can now remove all variants that did not pass our filters.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"filter!(var -> filters(var) == [\"PASS\"], all_variant_calls)","category":"page"},{"location":"tutorial/4-repl/#Rule-of-the-majority","page":"For advanced beginners (REPL mode)","title":"Rule of the majority","text":"","category":"section"},{"location":"tutorial/4-repl/#Filtering-variants","page":"For advanced beginners (REPL mode)","title":"Filtering variants","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We can sort into consensus and subconsensus variants. Note that for this fake set, consensus variants will be taken at a >75% frequency.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"consensus_variant_calls = filter(var -> frequency(var) > 0.75, all_variant_calls)\nsubconsensus_variant_calls = filter(\n var -> !(var in consensus_variant_calls), all_variant_calls\n)\nconsensus_variations = variation.(consensus_variant_calls)\nsubconsensus_variations = variation.(subconsensus_variant_calls)","category":"page"},{"location":"tutorial/4-repl/#Creating-the-consensus-sequence","page":"For advanced beginners (REPL mode)","title":"Creating the consensus sequence","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We will now need to convert the consensus variations into a consensus sequence. HapLink doesn't have these tools in-and-of itself, so we'll have to reach into its dependency packages.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(haplink-tutorial-4) pkg> add BioAlignments SequenceVariation\n Resolving package versions...\n Updating `~/haplink-tutorial-4/Project.toml`\n [00701ae9] + BioAlignments v3.1.0\n [eef6e190] + SequenceVariation v0.2.2\n No Changes to `~/haplink-tutorial-4/Manifest.toml`","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using BioAlignments, SequenceVariation\nreference_sequence = reference(first(consensus_variations))\nconsensus_haplotype = Haplotype(reference_sequence, consensus_variations)\nconsensus_sequence = reconstruct(consensus_haplotype)\nconsensus_alignment = PairwiseAlignment(\n AlignedSequence(consensus_sequence, Alignment(HapLink.cigar(consensus_haplotype))),\n reference_sequence,\n)","category":"page"},{"location":"tutorial/4-repl/#Minority-report","page":"For advanced beginners (REPL mode)","title":"Minority report","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We are interested in looking at the subconsensus variants, but they need to be reoriented to be put in terms of the consensus sequence.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"map!(\n var -> translate(var, consensus_alignment),\n subconsensus_variations,\n subconsensus_variations,\n)","category":"page"},{"location":"tutorial/4-repl/#Bringing-in-the-reads","page":"For advanced beginners (REPL mode)","title":"Bringing in the reads","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Now that we have a consensus sequence, we can properly import the reads for haplotype calling into HapLink's specialized Pseudoread class. There is a convient pseudoreads function that can directly convert a BAM file for us.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"haplotyping_pseudoreads = pseudoreads(EXAMPLE_BAM, consensus_sequence)","category":"page"},{"location":"tutorial/4-repl/#Cutting-the-chaff","page":"For advanced beginners (REPL mode)","title":"Cutting the chaff","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We only need reads that span every one of our subconsensus variants, so let's get rid of every other read.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(haplink-tutorial-4) pkg> add BioGenerics\n Resolving package versions...\n Updating `~/haplink-tutorial-4/Project.toml`\n [47718e42] + BioGenerics v0.1.2\n No Changes to `~/haplink-tutorial-4/Manifest.toml`","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using BioGenerics\nfilter!(\n var -> leftposition(var) >= leftposition(first(subconsensus_variations)),\n haplotyping_pseudoreads,\n)\nfilter!(\n var -> rightposition(var) >= rightposition(first(subconsensus_variations)),\n haplotyping_pseudoreads,\n)","category":"page"},{"location":"tutorial/4-repl/#Finding-haplotypes...","page":"For advanced beginners (REPL mode)","title":"Finding haplotypes...","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"This one is a bit of a mind-bender, but bear with me.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"haplotyping_haploypes = haplotype.(haplotyping_pseudoreads)\nvalid_haplotypes = findset(\n subconsensus_variations, hap -> ishaplotype(hap, haplotyping_haploypes)\n)","category":"page"},{"location":"tutorial/4-repl/#...and-deciding-if-they're-real","page":"For advanced beginners (REPL mode)","title":"...and deciding if they're real","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Just because a haplotype could exist, doesn't mean it should. You can check the statistics of a haplotype by converting it into a HaplotypeCall.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"map(hap -> HaplotypeCall(hap, haplotyping_haploypes), valid_haplotypes);\nhc = first(ans)\ndepth(hc)\nlinkage(hc)\nsignificance(hc)","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Nope! This one sure isn't!","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"This tutorial only scratches the surface of what's possible with HapLink's REPL mode. Take a look at the full API reference for more details on how to tweak your haplotype calling. Have fun!","category":"page"},{"location":"#HapLink","page":"Home","title":"HapLink","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"HapLink","category":"page"},{"location":"#HapLink.HapLink","page":"Home","title":"HapLink.HapLink","text":"HapLink\n\nViral haplotype calling via linkage disequilibrium\n\n\n\n\n\n","category":"module"},{"location":"#Welcome","page":"Home","title":"Welcome","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Howdy! 🤠 And welcome to HapLink! 👋 HapLink is a command-line suite of tools to enable the exploration of viral quasispecies within a single metagenomic sample. Every piece eventually builds up to our viral haplotype caller, which uses linkage disequilibrium on long sequencing reads (💡 think Oxford Nanopore or PacBio HiFi) to identify genetic mutations that are conserved within a single virus particle.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This manual will cover the different ways of using HapLink, starting with a few tutorials before diving into the details of our reference section.","category":"page"},{"location":"#Contents","page":"Home","title":"Contents","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Ready to dive in? 🤿 Here's a 30,000-foot view","category":"page"},{"location":"","page":"Home","title":"Home","text":"curl -fsSL https://install.julialang.org | sh -s -- --yes\nsource ~/.bashrc\njulia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink)'\necho 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.bashrc\nsource ~/.bashrc\n\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/reference.fasta\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam.bai\n\nhaplink variants \\\n reference.fasta \\\n sample.bam \\\n --significance 0.5 \\\n --depth 1 \\\n --quality 10.0 \\\n --position 0.01 \\\n --frequency 0.05 \\\n | tee sample.vcf\n\nhaplink consensus \\\n reference.fasta \\\n sample.vcf \\\n | tee consensus.fasta\n\nhaplink haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --frequency 0.75 \\\n | tee sample.yml\n\nhaplink sequences \\\n reference.fasta \\\n sample.yml \\\n | tee sample.fasta","category":"page"}] +[{"location":"api/haplotypecall/#HaplotypeCall","page":"HaplotypeCall","title":"HaplotypeCall","text":"","category":"section"},{"location":"api/haplotypecall/","page":"HaplotypeCall","title":"HaplotypeCall","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/haplotypecall/","page":"HaplotypeCall","title":"HaplotypeCall","text":"Modules = [HapLink]\nPages = [\"haplotypecalling.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/haplotypecall/#HapLink.HaplotypeCall","page":"HaplotypeCall","title":"HapLink.HaplotypeCall","text":"HaplotypeCall{S,T}\n\nA Haplotype that has had linkage disequilibrium calculations performed along with metadata to verify or refute its validity. Very similar to a VariationCall, but for Haplotypes.\n\nFields\n\ndepth::UInt64: How many sequencing reads contain all the Variations in haplotype\nlinkage::Float64: The unweighted linkage disequilibrium coefficient of this call\nfrequency::Float64: How often this haplotype occurs compared to the entire read pool\nsignificance::Float64: The χ^2 statistical significance (p-value) of this call\nhaplotype::Haplotype{S,T}: The set of Variations that are inherited together in this call\n\n\n\n\n\n","category":"type"},{"location":"api/haplotypecall/#HapLink.HaplotypeCall-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Haplotype{S, T}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.HaplotypeCall","text":"HaplotypeCall(\n haplotype::Haplotype{S,T}, reads::AbstractArray{Haplotype{S,T}}\n) where {S<:BioSequence,T<:BioSymbol}\nHaplotypeCall(\n haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}\n) where {S<:BioSequence,T<:BioSymbol}\n\nCalls haplotypes by calculating the linkage disequilibrium of haplotype within reads\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.depth-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.depth","text":"depth(hc::HaplotypeCall)\n\nGets the number of times hc was found\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.frequency-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.frequency","text":"frequency(hc::HaplotypeCall)\n\nGets the relative number of times hc was found\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.haplotype-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.haplotype","text":"haplotype(hc::HaplotypeCall)\n\nGets the underlying Haplotype of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.ishaplotype-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{SequenceVariation.Variation{S, T}}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.ishaplotype","text":"function ishaplotype(\n haplotype::Union{AbstractArray{Variation{S,T}}, Haplotype{S,T}},\n reads::AbstractArray{Haplotype{S,T}};\n frequency::Union{Float64,Nothing}=nothing,\n significance::Union{Float64,Nothing}=nothing,\n depth::Union{Int,Nothing}=nothing,\n) where {S<:BioSequence,T<:BioSymbol}\n\nDetermines if a call of haplotype is supported by the sequences in reads based upon the provided keyword criteria.\n\nArguments\n\nhaplotype::Union{AbstractArray{Variation}, Haplotype}: A Vector of Variations or a Haplotype to search for as a haplotype\nreads::AbstractArray{Haplotype}: The reads to search for haplotype in\n\nKeywords\n\nfrequency::Union{Float64,Nothing}=nothing: The minimum number of times the entire haplotype must appear within reads compared to the number of reads to return true\nsignificance::Union{Float64,Nothing}=nothing: The χ^2 significance level (α) of linkage disequilibrium that a haplotype must surpass to return true\ndepth::Union{Int,Nothing}=nothing: The minimum number of times the entire haplotype must appear within reads to return true\n\nExtended help\n\nLinkage disequilibrium (Δ) is calculated by\n\nΔ = P_reference - prod_i P_refi\n\nwhere\n\nP_reference is the probability of finding a read that contains only reference (consensus) bases\nP_refi is the probability of finding a read that contains the reference (consensus) base for variant i within a haplotype\n\nand the χ^2 statistic is calculated by\n\nχ^2 = frac\n Δ^2 n\n \n \n left(prod_i^N P_refi left(1 - P_refiright)right)^frac2N\n \n\nwhere\n\nN is the number of Variations in haplotype (i.e., length(haplotype))\nn is the number of total reads sampled (i.e. length(reads))\n\nThe significance is then calculated from the cumulative χ^2 distribution function.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.linkage-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.linkage","text":"linkage(hc::HaplotypeCall)\n\nGets the unweighted linkage disequilibrium coefficient of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.linkage-Union{Tuple{AbstractArray{<:Integer, N}}, Tuple{N}} where N","page":"HaplotypeCall","title":"HapLink.linkage","text":"linkage(counts::AbstractArray{<:Integer, N}) where N\n\nCalculates the linkage disequilibrium of a haplotype given its N-dimensional contingency matrix, counts.\n\ncounts is an N-dimensional array where the Nth dimension represents the Nth variant call locus within a haplotype. findoccurrences produces such an array.\n\nExtended help\n\nlinkage(::AbstractArray{<:Integer, N}) calculates an unweighted linkage disequilibrium as given by Equation (6) of Slatkin (1972).\n\nD(1N) = Eleft( prod_k^N i_k - P_k right)\n\nwhere\n\nN is the number of variant loci\nD(1N) is the linkage disequilibrium of all N variant loci\nE is an operator returning the arithmetic mean of its argument over every read\ni_k is a function that returns 1 if the k-th locus of the given read contains the reference allele, and returns 0 otherwise.\nP_k is the probability of any read containing the reference allele in the k-th locus, i.e. the frequency at which the reference allele is found within the entire read set at the k-th locus\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.occurrence_matrix-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{SequenceVariation.Variation{S, T}}, AbstractArray{SequenceVariation.Haplotype{S, T}}}} where {S<:BioSequences.BioSequence, T<:BioSymbols.BioSymbol}","page":"HaplotypeCall","title":"HapLink.occurrence_matrix","text":"occurrence_matrix(\n haplotype::AbstractArray{Variation{S,T}},\n reads::AbstractArray{Haplotype{S,T}},\n) where {S<:BioSequence,T<:BioSymbol}\noccurrence_matrix(\n haplotype::Haplotype{S,T},\n reads::AbstractArray{S,T}\n) where {S<:BioSequence,T<:BioSymbol}\n\nDetermine how many times the variants in haplotype appear in reads as an N- dimensional matrix.\n\nArguments\n\nhaplotype::AbstractArray{Variation}: A Vector of Variations or a Haplotype to search for as a haplotype\nreads::AbstractArray{Haplotype}: The reads to search for haplotype in\n\nReturns\n\n2x2x... Array{Int, length(haplotype)}: An N-dimensional matrix where N is the number of variant positions in readmatches. The 1 index position in the nth dimension represents the number of times the nth variant position was found to have the reference base called, while the 2 index position represents the number of times the nth variant position was found to have the alternate base called. E.g. first(occurrence_matrix(reads)) gives the number of times the all-reference base haplotype was found in reads, while occurrence_matrix(reads)[end] gives the number of times the all-alternate base haplotype was found.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.significance-Tuple{AbstractArray{<:Integer}}","page":"HaplotypeCall","title":"HapLink.significance","text":"significance(counts::AbstractArray{<:Integer})\n\nCalculates the χ^2 significance of a haplotype given its N-dimensional contingency matrix, counts\n\nSee the documentation on linkage(::AbstractArray{<:Integer}) or occurrence_matrix for details on how counts should be constructed.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.significance-Tuple{HaplotypeCall}","page":"HaplotypeCall","title":"HapLink.significance","text":"significance(hc::HaplotypeCall)\n\nGets the χ^2 statistical significance (p-value) of hc\n\n\n\n\n\n","category":"method"},{"location":"api/haplotypecall/#HapLink.sumsliced","page":"HaplotypeCall","title":"HapLink.sumsliced","text":"sumsliced(A::AbstractArray, dim::Int, pos::Int=1)\n\nSum all elements that are that can be referenced by pos in the dim dimension of A.\n\nExample\n\njulia> A = reshape(1:8, 2, 2, 2)\n2×2×2 reshape(::UnitRange{Int64}, 2, 2, 2) with eltype Int64:\n[:, :, 1] =\n 1 3\n 2 4\n\n[:, :, 2] =\n 5 7\n 6 8\n\njulia> sumsliced(A, 2)\n14\n\njulia> sumsliced(A, 2, 2)\n22\n\nHeavily inspired by Holy, Tim \"Multidimensional algorithms and iteration\" https://julialang.org/blog/2016/02/iteration/#filtering_along_a_specified_dimension_exploiting_multiple_indexes\n\n\n\n\n\n","category":"function"},{"location":"tutorial/2-examples/#cli-tutorial","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"At this point, we'll play with the example sequences included gratis 💰 with HapLink. No, they don't represent anything ☁️, and they aren't particularly interesting 🥱, but they do run fast 🏇, so we can get a handle on how the interface and workflow operate.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Pages = [\"2-examples.md\"]","category":"page"},{"location":"tutorial/2-examples/#Getting-the-goods","page":"Kicking the tires (Fake sequences)","title":"Getting the goods","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Let's get the example files from the code repository. In your terminal, run","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"wget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/reference.fasta\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam.bai","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nreference.fasta\nsample.bam\nsample.bam.bai","category":"page"},{"location":"tutorial/2-examples/#Spot-the-difference","page":"Kicking the tires (Fake sequences)","title":"Spot the difference","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"In order for HapLink to call haplotypes, it needs to know which sequence differences are due to sequencing errors, and which are due to genetic mutation. This process is known as variant calling, and HapLink comes bundled with a variant caller for just this type of occasion, which requires the reference genome and the alignment BAM file. Since we have both of those, let's run variant calling now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink variants reference.fasta sample.bam","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nNone","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"HapLink by default outputs to standard output, so the variant calls were printed on your screen instead of saved 😡. That's okay, though 😌. It's often good to visually check your variant calls, and it this case we absolutely needed to. Notice that none of the variants got a PASS filter. In fact, all of them were weeded out by too high of thresholds for depth (remember we only have 10 sequences) and significance. Let's readjust (and save our results this time).","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n variants \\\n reference.fasta \\\n sample.bam \\\n --depth 3 \\\n --significance 0.1 \\\n | tee sample.vcf","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.vcf","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"These settings seemed to work out well. Let's stick with them and move on.","category":"page"},{"location":"tutorial/2-examples/#The-general-lay-of-the-land","page":"Kicking the tires (Fake sequences)","title":"The general lay of the land","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"At this point, we're going to take a break from haplotype calling and convert those variant calls into a useful summary: the consensus sequence. HapLink can call the consensus sequence based solely off variant calls. Let's see that now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink consensus reference.fasta sample.vcf | tee sample.consensus.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.consensus.fasta","category":"page"},{"location":"tutorial/2-examples/#The-star-attraction","page":"Kicking the tires (Fake sequences)","title":"The star attraction","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"And now it's time for haplotype calling. Before you get your hopes up, there are no true haplotypes in this file. If 10 reads could yield subconsenus mysteries, then bioinformatics would be a super easy job. Alas, we live in the real world, and we'll have to stretch mathematical constructs to get anything out of these reads.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --consensus-frequency 0.75 \\\n | tee sample.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"You can see that HapLink found only one haplotype in this alignment, but (spoilers!) this isn't really a haplotype. This is just the consensus sequence, formatted in HapLink's haplotype scheme. The first haplotype in any output file is always the consensus sequence.","category":"page"},{"location":"tutorial/2-examples/#Haplotypes-in-the-Matrix","page":"Kicking the tires (Fake sequences)","title":"Haplotypes in the Matrix","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"If you have reads that don't span the entire genome (like we have here), you can use HapLink's maximum likelihood simulator to \"create\" full-length reads by splicing together reads and look for haplotypes on them. Even though we know there aren't any haplotypes in this sample, let's get out the simulator and give it a try.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink \\\n haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --consensus-frequency 0.75 \\\n --simulated-reads \\\n --iterations 100 \\\n --overlap-min 0 \\\n --overlap-max 100 \\\n | tee sample.ml.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.ml.yml","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"Still nothing, huh? Like I said, no haplotypes here, and simulation can't change that. Note that simulating full-length reads used a lot more computational power, so you should try to stick with full-length reads when you can!","category":"page"},{"location":"tutorial/2-examples/#But,-what-does-it-mean?","page":"Kicking the tires (Fake sequences)","title":"But, what does it mean?","text":"","category":"section"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"HapLink's haplotype YAML files contain everything needed to recreate the haplotype computation, but they can't really be used by any other programs. That's why there's the sequences command, so haplotype sequences can be saved into FASTA format for use by other tools. Let's try this now.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"haplink sequences reference.fasta sample.yml > sample.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"info: Output\nsample.fasta","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"The output only contains the consensus sequence. Since we knew that there were no haplotypes in this sample we could have used haplink consensus, instead to get the same result.","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"","category":"page"},{"location":"tutorial/2-examples/","page":"Kicking the tires (Fake sequences)","title":"Kicking the tires (Fake sequences)","text":"You are now ready to move on to the next tutorial!","category":"page"},{"location":"api/psuedoread/#PseudoRead","page":"Pseudoread","title":"PseudoRead","text":"","category":"section"},{"location":"api/psuedoread/","page":"Pseudoread","title":"Pseudoread","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/psuedoread/","page":"Pseudoread","title":"Pseudoread","text":"Modules = [HapLink]\nPages = [\"pseudoread.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/psuedoread/#HapLink.Pseudoread","page":"Pseudoread","title":"HapLink.Pseudoread","text":"Pseudoread(startpos::Integer, endpos::Integer, read::Haplotype)\nPseudoread(query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq)\n\nA stand-in struct for a sequencing read (real or imaginary) represented by its position and differences relative to a reference sequence.\n\n\n\n\n\n","category":"type"},{"location":"api/psuedoread/#HapLink.contradicts-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Variation{S, T}, SequenceVariation.Haplotype{S, T}}} where {S, T}","page":"Pseudoread","title":"HapLink.contradicts","text":"contradicts(var::Variation{S,T}, ref::Haplotype{S,T}) where {S,T}\n\nReturns true if var contains a modification incompatible with any of the Variations that make up ref.\n\nExample\n\njulia> using BioSequences, SequenceVariation\n\njulia> ref = Haplotype(dna\"GATTACA\", [Variation(dna\"GATTACA\", \"A2T\"), Variation(dna\"GATTACA\", \"Δ5-6\")]);\n\njulia> contradicts(Variation(dna\"GATTACA\", \"A2C\"), ref)\ntrue\n\njulia> contradicts(Variation(dna\"GATTACA\", \"T4A\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"Δ2-2\"), ref)\ntrue\n\njulia> contradicts(Variation(dna\"GATTACA\", \"Δ4-4\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"2G\"), ref)\nfalse\n\njulia> contradicts(Variation(dna\"GATTACA\", \"4G\"), ref)\nfalse\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.haplotype-Tuple{Pseudoread}","page":"Pseudoread","title":"HapLink.haplotype","text":"haplotype(ps::Pseudoread)\n\nGets the differences between ps and its reference sequence as a Haplotype\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.overlap_inrange-Tuple{Pseudoread, Pseudoread, Integer, Integer}","page":"Pseudoread","title":"HapLink.overlap_inrange","text":"overlap_inrange(\n x::Pseudoread, y::Pseudoread, overlap_min::Integer, overlap_max::Integer\n)\n\nReturns true if the overlap between x and y is within overlap_min and overlap_max, returns false otherwise\n\nExample\n\njulia> using BioSequences, SequenceVariation\n\njulia> ps1 = Pseudoread(1, 100, Haplotype(dna\"A\", Variation{LongDNA{4}, DNA}[]));\n\njulia> ps2 = Pseudoread(50, 150, Haplotype(dna\"A\", Variation{LongDNA{4}, DNA}[]));\n\njulia> overlap_inrange(ps1, ps2, 1, 500)\ntrue\n\njulia> overlap_inrange(ps1, ps2, 75, 500)\nfalse\n\njulia> overlap_inrange(ps1, ps2, 1, 25)\nfalse\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.overlapping_variations-Tuple{Pseudoread, SequenceVariation.Haplotype}","page":"Pseudoread","title":"HapLink.overlapping_variations","text":"overlapping_variations(ps::Pseudoread, v::Haplotype)\n\nFind all Variations within v that are contained within the range defined by ps\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.pseudoreads-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Pseudoread","title":"HapLink.pseudoreads","text":"pseudoreads(sam::Union{AbstractString,AbstractPath}, consensus::NucleotideSeq)\n\nCreate a set of Pseudoreads from the alignments present in sam when aligned against consensus.\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.simulate-Union{Tuple{T}, Tuple{S}, Tuple{AbstractArray{Pseudoread}, BioSequences.BioSequence, AbstractArray{SequenceVariation.Variation{S, T}}}} where {S, T}","page":"Pseudoread","title":"HapLink.simulate","text":"simulate(\n pool::AbstractArray{Pseudoread},\n refseq::BioSequence,\n subcon::AbstractArray{Variation{S,T}};\n reverse_order::Union{Bool,Nothing}=nothing,\n overlap_min::Int64=0,\n overlap_max::Int64=500,\n) where {S,T} -> Union{Haplotype{S,T},Nothing}\n\nCreates a new set of Pseudoreads based on the pool of real reads spliced spliced together using maximum likelihood simulation.\n\nArguments\n\npool::AbstractArray{Pseudoread}: A set of (presumably) real sequencing reads that will serve as pieces of the new simulated reads.\nrefseq::BioSequence: The reference against which all of the sequences in pool and subcon were aligned\nsubcon::AbstractArray{Variation{S,T}}: The subconsensus Variations which are known to be both present in pool and real (i.e., not due to sequencing errors)\n\nOptions\n\nreverse_order::Union{Bool,Nothing}=nothing: Whether the splicing should start at the beginning or end of the reference sequence. If left blank, will randomly decide.\noverlap_min::Int64=0: The amount that reads from pool are required to overlap in order to be spliced together. Can be negative, indicating that reads must be spaced apart by this amount, instead. -overlap_max::Int64=500: The amount that reads from pool can overlap before being discarded. Like overlap_min, can be negative.\n\nReturns\n\nA Haplotype{S,T} representing the simulated read, or nothing if the simulation encountered a dead end.\n\nExtended help\n\nThe simulation procedure works by picking a read from pool that contains the first (if reverse_order is false, the last otherwise) variant from subcon. The procedure examines every position in subcon in ascending (or descending, if reverse_order == true) order until the chosen read no longer carries that position. The simulation will then, at random, pick a read from pool that contains the next position from subcon and that also has matching variations at every position in subcon as the previous read, as well as meets the overlap requirements. These two reads may differ in sites other than those in subcon: it is assumed that appropriate variant calling has already been performed and that any variation in those sites is due to sequencing error. The procedure repeats for each new read from pool until the simulation has simulated every position of subcon, or until it reaches a dead-end and cannot find a read that matches every requirement.\n\n\n\n\n\n","category":"method"},{"location":"api/psuedoread/#HapLink.variations_match-Tuple{Pseudoread, SequenceVariation.Haplotype}","page":"Pseudoread","title":"HapLink.variations_match","text":"variations_match(reference::Pseudoread, query::Union{Pseudoread,Haplotype})\n\nReturns true if query could be a read from reference.\n\nAdditional Variations found within query that are not present in reference do not invalid a positive match result so long as none of those Variations contradicts reference. These Variations can either\n\nExtend beyond the length of reference, or\nExist as \"sequencing errors\" within the interval of reference\n\nOn the other hand, every Variation within the overlapping interval of reference and query that is present in reference must also be found in query. In other words, query must have all of reference (within the overlap), but reference does not need all of query.\n\nThis arrangment allows for the creation of bodies of matching Pseudoreads, as done via simulate.\n\n\n\n\n\n","category":"method"},{"location":"cli/sequences/#haplink-sequences","page":"haplink sequences","title":"haplink sequences","text":"","category":"section"},{"location":"cli/sequences/","page":"haplink sequences","title":"haplink sequences","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/sequences/","page":"haplink sequences","title":"haplink sequences","text":"HapLink.sequences","category":"page"},{"location":"cli/sequences/#HapLink.sequences","page":"haplink sequences","title":"HapLink.sequences","text":"haplink sequences [options] reference haplotypes\n\nConvert haplotype calls into haplotype sequences\n\nIntroduction\n\nTakes a YAML file output from haplink haplotypes and converts each called haplotype into its sequence and outputs those sequences in FASTA format. Useful for downstream processing by other tools, but loses all of the metadata and statistical details of each haplotype.\n\nArguments\n\nreference: path to the reference genome to mutate haplotypes from. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single- segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nhaplotypes: path to the haplotype calls file that will be used to mutate the reference sequence to produce haplotype sequences. Must be an output from haplink haplotypes.\n\nOptions\n\n--outfile=: The file to write the haplotype sequences to. If left blank, the sequences are written to standard output.\n--prefix=: Name of the new sequences. Defaults to using the FASTA identifier of the reference sequence.\n\n\n\n\n\n","category":"function"},{"location":"api/variationcall/#VariationCall","page":"VariationCall","title":"VariationCall","text":"","category":"section"},{"location":"api/variationcall/","page":"VariationCall","title":"VariationCall","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationcall/","page":"VariationCall","title":"VariationCall","text":"Modules = [HapLink]\nPages = [\"variationcall.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationcall/#HapLink.VariationCall","page":"VariationCall","title":"HapLink.VariationCall","text":"VariationCall\n\nRepresents a Variation that is supported by aligned reads with sufficient metadata to support or refute its validity. It is designed to be converted into a line in Variant Call Format, or a VCF.Record.\n\nFields\n\nvariation::Variation: The Variation of this call\nquality::Union{Nothing,Number}: Phred quality of the basecall for variation\nfilter::Vector{String}: Indicator if variation has passed filters and is actually a variant call, or a list of criteria that have failed it\ndepth::Union{Nothing,UInt}: The number of reads that cover leftposition(variation)\nstrandbias::Union{Nothing,Float64}: The fraction of times variation appears on a positive strand\naltdepth::Union{Nothing,UInt}: The number of types variation occurs\nreadpos::Union{Nothing,UInt}: The averagerelative position of variation in each read\npvalue::Union{Nothing,Float64}: The Fisher's Exact Test p-value of this call\n\n\n\n\n\n","category":"type"},{"location":"api/variationcall/#HapLink.altdepth-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.altdepth","text":"altdepth(vc::VariationCall) -> Union{Nothing,UInt}\n\nGets the number of times vc appears. Returns nothing if unknown.\n\nSee also altdepth(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.call_variant-Tuple{VariationPileup, Float64}","page":"VariationCall","title":"HapLink.call_variant","text":"call_variant(\n pileup::VariationPileup,\n α::Float64;\n D::Union{Nothing,Int}=nothing,\n Q::Union{Nothing,Float64}=nothing,\n X::Union{Nothing,Float64}=nothing,\n F::Union{Nothing,Float64}=nothing,\n S::Union{Nothing,Float64}=nothing,\n) -> VariationCall\n\nCalls variant from a pileup.\n\nArguments\n\npileup::VariationPileup: The pileup to call variants from\nα::Float64: Fisher's Exact Test significance (α) level to call variants\n\nKeywords\n\nnote: Note\nLeave any keyword undefined to skip filtering based on that field\n\nD::Union{Nothing,Int}=nothing: Minimum total read depth for a variant to be called\nQ::Union{Nothing,Float64}=nothing: Minimum average Phred quality for a variant to be called\nX::Union{Nothing,Float64}=nothing: Maximum average distance variant can be from read edge to be called\nF::Union{Nothing,Float64}=nothing: Minimum alternate frequency for a variant to be called\nS::Union{Nothing,Float64}=nothing: Maximum proportion of the number of times variant can appear on one strand versus the other\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.depth-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.depth","text":"depth(vc::VariationCall) -> Union{Nothing,UInt}\n\nGets the number of times the position of vc appears total. Returns nothing if unknown.\n\nSee also depth(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.filters-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.filters","text":"filters(vc::VariationCall) -> Vector{String}\n\nGets all filters that have been applied to vc. Note that an empty FILTER entry is not permitted under the VCF spec, and an empty array should not automatically be considered to have PASSed all filters.\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.frequency-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.frequency","text":"frequency(vc::VariationCall) -> Float64\n\nGets the alternate allele frequency of vc\n\nSee also frequency(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.p_value-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.p_value","text":"p_value(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the p-value of the observed statistic of vc. Returns nothing if unknown.\n\nSee also variation_test\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.quality-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.quality","text":"quality(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the average phred quality score of vc, if known. Returns nothing if unknown.\n\nSee also quality(::VariationInfo), quality(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.readpos-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.readpos","text":"readpos(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the average relative position of vc. Returns nothing if unknown.\n\nSee also readpos(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.strand_bias-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.strand_bias","text":"strand_bias(vc::VariationCall) -> Union{Nothing,Float64}\n\nGets the fraction of times vc appears on the positive strand. Returns nothing if unknown.\n\nSee also strand(::VariationInfo), strand(::VariationPileup)\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.variation-Tuple{VariationCall}","page":"VariationCall","title":"HapLink.variation","text":"variation(vc::VariationCall) -> SequenceVariation.Variation\n\nGets the Variation of a VariationCall\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.variation_test-Tuple{Int64, Int64, Float64}","page":"VariationCall","title":"HapLink.variation_test","text":"variation_test(depth::Int, altdepth::Int, quality::Float64)\n\nConducts a Fisher's Exact Test to deterimine the likelihood of a variant with total depth and variation depth altdepth occuring, given an average basecall quality. Returns the p-value of the test.\n\n\n\n\n\n","category":"method"},{"location":"api/variationcall/#HapLink.vcf-Tuple{VariationCall, AbstractString}","page":"VariationCall","title":"HapLink.vcf","text":"vcf(vc::VariationCall, refname::AbstractString) -> VCF.Record\n\nConverts vc into a VCF.Record. refname is required and used as the CHROM field in the record.\n\n\n\n\n\n","category":"method"},{"location":"api/haplotype/#Haplotype-Methods","page":"Haplotype Extensions","title":"Haplotype Methods","text":"","category":"section"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"These methods extend the functionality of SequenceVariation.Haplotype for better manipulation and calling.","category":"page"},{"location":"api/haplotype/","page":"Haplotype Extensions","title":"Haplotype Extensions","text":"Modules = [HapLink]\nPages = [\"haplotype.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/haplotype/#HapLink.cigar-Union{Tuple{SequenceVariation.Haplotype{S, T}}, Tuple{T}, Tuple{S}} where {S, T}","page":"Haplotype Extensions","title":"HapLink.cigar","text":"cigar(hap::Haplotype{S,T}) where {S,T}\n\nConstructs a CIGAR string representing the alignment of the sequence of hap to its reference.\n\n\n\n\n\n","category":"method"},{"location":"cli/consensus/#haplink-consensus","page":"haplink consensus","title":"haplink consensus","text":"","category":"section"},{"location":"cli/consensus/","page":"haplink consensus","title":"haplink consensus","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/consensus/","page":"haplink consensus","title":"haplink consensus","text":"HapLink.consensus","category":"page"},{"location":"cli/consensus/#HapLink.consensus","page":"haplink consensus","title":"HapLink.consensus","text":"haplink consensus [options] reference variants\n\nConvert variant calls to consensus sequence\n\nIntroduction\n\nGenerates a consensus sequence based on a reference genome and previously called variants. Will only consider variants with a \"PASS\" filter to be able to contribute to the consensus. Outputs results in FASTA format.\n\nArguments\n\nreference: The path to the reference genome. Must be in FASTA format.\nvariants: The path to the variant calls. Must be in VCF v4 format.\n\nOptions\n\n--outfile=: The file to write the consensus sequence to. If left blank, the consensus sequence is written to standard output.\n--frequency=: The minimum frequency at which a variant must appear to be considered part of the consensus. Note that HapLink does not support ambigous base calling (e.g. N, R, Y, etc.) at low frequencies unlike many variant callers.\n--prefix=: Name of the new sequence. Defaults to using the FASTA identifier of the reference sequence.\n\n\n\n\n\n","category":"function"},{"location":"cli/haplotypes/#haplink-haplotypes","page":"haplink haplotypes","title":"haplink haplotypes","text":"","category":"section"},{"location":"cli/haplotypes/","page":"haplink haplotypes","title":"haplink haplotypes","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/haplotypes/","page":"haplink haplotypes","title":"haplink haplotypes","text":"HapLink.haplotypes","category":"page"},{"location":"cli/haplotypes/#HapLink.haplotypes","page":"haplink haplotypes","title":"HapLink.haplotypes","text":"haplink haplotypes [options] reference variants bam\n\nCall haplotypes\n\nIntroduction\n\nCalls haplotypes based on the linkage disequilibrium between subconsensus variant sites on long reads. Variant sites are chosen based on having a \"PASS\" filter in the variants file, and linkage is calculated based on the reads present in the bam file. Note this means that haplotypes can be called on a different set of sequences than variants were (e.g. variant calling using high accuracy short-read chemistry like Illumina and haplotype calling using low accuracy long-read chemistry like Oxford Nanopore). There are no guarantees that the variants file and bam file match, so use this feature with caution!\n\nArguments\n\nreference: path to the reference genome to call haplotypes against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nvariants: path to the variants file that will define variant sites to call haplotypes from. Must be in VCF (not BCF) v4 format. haplink variants generates a compatible file, although output from other tools can also be used.\nbam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.\n\nFlags\n\n--simulated-reads: Use maximum likelihood simulation of long reads based on overlapping short reads\n\nOptions\n\n--outfile=: The file to write haplotype calls to. If left blank, haplotype calls are written to standard output.\n--consensus-frequency=: The minimum frequency at which a variant must appear to be considered part of the consensus.\n--significance=: The alpha value for statistical significance of haplotype calls.\n--depth=: Minimum number of times a variant combination must be observed within the set of reads to be called a haplotype\n--frequency=: The minimum proportion of reads that a variant combination must be observed within compared to all reads covering its position for that haplotype to be called\n--overlap-min=: The minimum number of bases that must overlap for two short reads to be combined into one simulated read. Can be negative to indicate a minimum distance between reads. Only applies when --simulated-reads is set.\n--overlap-max=: The maximum number of bases that may overlap for two short reads to be combined into one simulated read. Can be negative to indicate a cap on how far two reads must be apart from one another. Must be greater than --overlap-min. Only applies when --simulated-reads is set.\n--iterations=: The number of simulated reads to create before calling haplotypes. Only applies when --simulated-reads is set.\n--seed=: The random seed used for picking short reads to create simulated reads when using maximum likelihood methods. Leaving unset will use the default Julia RNG and seed. See Julia's documentation on randomness for implementation details. Only applies when --simulated-reads is set.\n\n\n\n\n\n","category":"function"},{"location":"api/consensus/#Consensus-Tools","page":"Consensus Tools","title":"Consensus Tools","text":"","category":"section"},{"location":"api/consensus/","page":"Consensus Tools","title":"Consensus Tools","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/consensus/","page":"Consensus Tools","title":"Consensus Tools","text":"Modules = [HapLink]\nPages = [\"consensus.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/consensus/#HapLink.consensus_haplotype-Tuple{BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"Consensus Tools","title":"HapLink.consensus_haplotype","text":"consensus_haplotype(\n reference::NucleotideSeq,\n variants::Union{AbstractString,AbstractPath};\n frequency::Float64=0.5,\n)\n\nGet the consensus Haplotypefrom variants applied to reference.\n\nArguments\n\nreference::NucleotideSeq: Sequence of the reference genome that variants were called from\nvariants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from\n\nKeywords\n\nfrequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.\n\n\n\n\n\n","category":"method"},{"location":"api/consensus/#HapLink.consensus_record-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"Consensus Tools","title":"HapLink.consensus_record","text":"consensus_record(\n reference::Union{AbstractString,AbstractPath},\n variants::Union{AbstractString,AbstractPath};\n frequency::Float64=0.5,\n prefix::Union{AbstractString,Nothing}=nothing,\n)\n\nGet the consensus FASTA.Recordfrom variants applied to the first sequence in reference.\n\nArguments\n\nreference::Union{AbstractString,AbstractPath}: Path to the reference genome that variants were called from. Only the first sequence will be used.\nvariants::Union{AbstractString,AbstractPath}: Path to variant call file that mutations will be applied from\n\nKeywords\n\nfrequency::Float64=0.5: Fraction of total reads that must have supported the alternate position in order to be included as part of the consensus. In other words, only VCF records that have a AF (allele/alternate frequency) higher than this will be considered to contribute to the consensus.\nprefix::Union{AbstractString,Nothing}=nothing: Name to give to the output record. By default, the name of the output record will be the same as the name of the input record with _CONSENSUS appended. If prefix is supplied, then the name of the output record will be $(prefix)_CONSENSUS.\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#VariationInfo","page":"VariationInfo","title":"VariationInfo","text":"","category":"section"},{"location":"api/variationinfo/","page":"VariationInfo","title":"VariationInfo","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationinfo/","page":"VariationInfo","title":"VariationInfo","text":"Modules = [HapLink]\nPages = [\"variationinfo.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationinfo/#HapLink.VariationInfo","page":"VariationInfo","title":"HapLink.VariationInfo","text":"VariationInfo{S<:BioSequence,T<:BioSymbol}\n\nRepresents statistics associated with a SequenceVariation.Variation within an aligned sequencing read.\n\nFields\n\nvariation::Variation{S,T}: The Variation this object represents\nreadpos::Float64: The location where variation occurs within a sequencing read\nquality::Float64: The phred-scaled basecall quality of variation\nStrand::GenomicFeatures.Strand: Which strand variation appears on\n\n\n\n\n\n","category":"type"},{"location":"api/variationinfo/#HapLink.quality-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.quality","text":"quality(vi::VariationInfo) -> Float64\n\nGets the phred-scaled basecall quality of variation(vi) within a sequencing read\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.readpos-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.readpos","text":"readpos(vi::VariationInfo) -> Float64\n\nGets the position of variation(vi) within a sequencing read\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.strand-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.strand","text":"strand(vi::VariationInfo) -> GenomicFeatures.Strand\n\nGets the strand that variation(vi) appears on\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.variation-Tuple{VariationInfo}","page":"VariationInfo","title":"HapLink.variation","text":"variation(vi::VariationInfo) -> SequenceVariation.Variation\n\nGets the Variation of a VariationInfo\n\n\n\n\n\n","category":"method"},{"location":"api/variationinfo/#HapLink.variationinfos-Tuple{Union{XAM.BAM.Record, XAM.SAM.Record}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"VariationInfo","title":"HapLink.variationinfos","text":"variationinfos(\n query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq\n) -> Vector{VariationInfo}\nvariationinfos(\n query::Union{AbstractString,AbstractPath},\n reference::Union{AbstractString,AbstractPath}\n) -> Vector{VariationInfo}\n\nCalls Variations based on the alignments in query against reference, and returns every variation call found within query as a Vector{VariationInfo}\n\n\n\n\n\n","category":"method"},{"location":"tutorial/1-install/#install-tutorial","page":"In the beginning (Installation)","title":"In the beginning","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"There are many different ways to install HapLink. Here we walk you through two of the most common. If you're one of the 0.01% who needs a different method, then we trust you can extrapolate from these instructions. Note that all of these tutorials assume you have a Unix-type system (MacOS, BSD, Linux). Windows command-line support is basically non-existant!","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Pages = [\"1-install.md\"]","category":"page"},{"location":"tutorial/1-install/#Bioconda","page":"In the beginning (Installation)","title":"Bioconda","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"We understand: every bioinformatian is addicted to miniconda. 🐍 And we're not here to judge. 👩‍⚖️ It's easy and portable and is bundled on most HPCs. If you already have conda (or mamba), then this route is probably for you.","category":"page"},{"location":"tutorial/1-install/#Install-HapLink-inside-a-conda-environment","page":"In the beginning (Installation)","title":"Install HapLink inside a conda environment","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"We'll make a new environment with the totally original name \"haplink,\" to house the new haplink install and add it directly.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"conda create -n haplink -c bioconda -c conda-forge haplink -y","category":"page"},{"location":"tutorial/1-install/#Activate-the-environment","page":"In the beginning (Installation)","title":"Activate the environment","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Now that we've made the environment, let's use it!","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"conda activate haplink","category":"page"},{"location":"tutorial/1-install/#Test-for-errors","page":"In the beginning (Installation)","title":"Test for errors","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Next, cross your fingers 🤞 and run the following command:","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"haplink --help","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Check for error messages, but otherwise you're done. You can reuse your haplink environment for the next tutorial.","category":"page"},{"location":"tutorial/1-install/#Comonicon","page":"In the beginning (Installation)","title":"Comonicon","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"HapLink is unashamedly a Julia program. If you already have Julia installed, then you can leverage that existing Julia install to install HapLink thanks to the power of Comonicon.jl.","category":"page"},{"location":"tutorial/1-install/#Check-your-Julia-version","page":"In the beginning (Installation)","title":"Check your Julia version","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"HapLink requires Julia v1.6 or later to run. No exceptions, Kemosabe. Run a check now to see if you have a high enough version.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia --version","category":"page"},{"location":"tutorial/1-install/#Add-HapLink-to-a-temporary-environment-and-install","page":"In the beginning (Installation)","title":"Add HapLink to a temporary environment and install","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Using the magic 🪄 of Julia's environments, we can do a \"temp install\" of the HapLink package to a temporary directory environment. Because this is a fresh install, though, it will trigger Comonicon to install the application to a brand-new isolated environment.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink)'","category":"page"},{"location":"tutorial/1-install/#Add-HapLink-to-PATH","page":"In the beginning (Installation)","title":"Add HapLink to PATH","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Comonicon will install HapLink, but chances are it won't be accessible yet. We need to add HapLink to the PATH, first.","category":"page"},{"location":"tutorial/1-install/#Bash","page":"In the beginning (Installation)","title":"Bash","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"echo 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.bashrc\n. ~/.bashrc","category":"page"},{"location":"tutorial/1-install/#Zsh","page":"In the beginning (Installation)","title":"Zsh","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"echo 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.zshrc\n. ~/.zshrc","category":"page"},{"location":"tutorial/1-install/#Testing","page":"In the beginning (Installation)","title":"Testing","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Now, run the haplink command.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"haplink --help","category":"page"},{"location":"tutorial/1-install/#Wait,-the-test-failed!","page":"In the beginning (Installation)","title":"Wait, the test failed!","text":"","category":"section"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Odds are good that you already had an install of HapLink somewhere in your Julia depot. If that happens, then you'll have to trigger the install manually. Just run","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"julia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink); using HapLink; HapLink.comonicon_install()'","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"then return to the Add HapLink to PATH step.","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"","category":"page"},{"location":"tutorial/1-install/","page":"In the beginning (Installation)","title":"In the beginning (Installation)","text":"Success! We now have a working installation of HapLink. You are now ready to move on to the next tutorial.","category":"page"},{"location":"api/variation/#Variation-Methods","page":"Variation Extensions","title":"Variation Methods","text":"","category":"section"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"These methods extend the functionality of SequenceVariation.Variation for calculation of data related to Variations created from NGS read alignments.","category":"page"},{"location":"api/variation/","page":"Variation Extensions","title":"Variation Extensions","text":"Modules = [HapLink]\nPages = [\"variation.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variation/#HapLink.quality-Tuple{SequenceVariation.Variation, Union{XAM.BAM.Record, XAM.SAM.Record}}","page":"Variation Extensions","title":"HapLink.quality","text":"quality(v::Variation, r::Union{SAM.Record,BAM.Record}) -> Float64\n\nGet the phred-scalled basecall quality of v within the sequencing read of r.\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.relativepos-Tuple{SequenceVariation.Variation, Union{XAM.BAM.Record, XAM.SAM.Record}}","page":"Variation Extensions","title":"HapLink.relativepos","text":"relativepos(v::Variation, r::Union{SAM.Record,BAM.Record})\n\nCalculates the fractional position of v within the sequence of r. If v is out-of-bounds of r, then will return 0 for positions before r and 1 for positions after r.\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.seqpos-Tuple{SequenceVariation.Variation, Union{BioAlignments.Alignment, BioAlignments.AlignedSequence, BioAlignments.PairwiseAlignment}}","page":"Variation Extensions","title":"HapLink.seqpos","text":"seqpos(v::Variation, a::Union{Alignment,AlignedSequence,PairwiseAlignment})\n\nGet the position of v in the sequence of a.\n\nExample\n\nusing BioAlignments, BioSequences, SequenceVariation\nv = Variation(dna\"AAAAA\", \"A3T\")\na = Alignment(\"2=1X2=\", 1, 1)\nseqpos(v, a)\n\n# output\n\n3\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.subconsensus_variations-Union{Tuple{T}, Tuple{S}, Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, SequenceVariation.Haplotype{S, T}}} where {S, T}","page":"Variation Extensions","title":"HapLink.subconsensus_variations","text":"subconsensus_variations(vcf::Union{AbstractPath,AbstractString}, consensus::Haplotype)\n\nGet a Vector{Variation} with passing variant calls from vcf that do not appear in consensus\n\n\n\n\n\n","category":"method"},{"location":"api/variation/#HapLink.variation-Tuple{VariantCallFormat.Record, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Variation Extensions","title":"HapLink.variation","text":"variation(r::VCF.Record, refseq::NucleotideSeq)\n\nConstruct a Variation from r applying to refseq. There is no validation that r's actually describes a mutation in refseq.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#Private-API-Reference","page":"Private API Reference","title":"Private API Reference","text":"","category":"section"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"Modules = [HapLink]\nPrivate = true\nPublic = false\nFilter = f -> f ∉ [\n HapLink.variants,\n HapLink.consensus,\n HapLink.haplotypes,\n HapLink.sequences,\n]","category":"page"},{"location":"api/private/#FASTX.FASTA.Record-Tuple{Pseudoread}","page":"Private API Reference","title":"FASTX.FASTA.Record","text":"FASTX.FASTA.Record(ps::Pseudoread; prefix::AbstractString=\"\", is_consensus::Bool=false)\n\nSpecialized constructor for outputting Pseudoreads to FASTA.Records that can be written to files.\n\nArguments\n\nps::Pseudoread: The modified sequence and read window to be converted into the sequence of the new record\n\nKeywords\n\nprefix::AbstractString=\"\": A string to start the sequence identifier with, usually based on the reference sequence of ps\nis_consensus::Bool=false: Normally, the new identifier of the record is a combination of the prefix and the SHA1 hash of the alternate sequence. Set is_consensus to true to instead use the word \"CONSENSUS\" in place of the hash\n\n\n\n\n\n","category":"method"},{"location":"api/private/#SequenceVariation.Haplotype-Tuple{Union{XAM.BAM.Record, XAM.SAM.Record}, BioSequences.BioSequence{<:BioSequences.NucleicAcidAlphabet}}","page":"Private API Reference","title":"SequenceVariation.Haplotype","text":"SequenceVariation.Haplotype(\n query::Union{SAM.Record,BAM.Record}, reference::NucleotideSeq\n) -> SequenceVariation.Haplotype\n\nSpecialized constructor that allows converting the alignment in a XAM.Record into a Haplotype.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._cigar-Union{Tuple{SequenceVariation.Variation{S, T}}, Tuple{T}, Tuple{S}} where {S, T}","page":"Private API Reference","title":"HapLink._cigar","text":"_cigar(var::Variation{S,T}) where {S,T}\n\nReturns a CIGAR operation for var. Only supports insertions and deletions.\n\nSee also _cigar_between\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._cigar_between-Union{Tuple{T}, Tuple{S}, Tuple{SequenceVariation.Variation{S, T}, SequenceVariation.Variation{S, T}}} where {S, T}","page":"Private API Reference","title":"HapLink._cigar_between","text":"_cigar_between(x::Variation{S,T}, y::Variation{S,T}) where {S,T}\n\nReturns a CIGAR operation for the (assumed) matching bases between x and y.\n\nSee also _cigar\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._indexed_reader-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, GenomicFeatures.Interval}","page":"Private API Reference","title":"HapLink._indexed_reader","text":"_indexed_reader(bam::Union{AbstractPath,AbstractString}, int::Interval)\n\nProvides an iterator over records in bam that overlap with int. Note that if no index file is available for bam, then the return value will fall back to an iterator of all records in bam.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._lendiff-Tuple{Pseudoread}","page":"Private API Reference","title":"HapLink._lendiff","text":"_lendiff(ps::Pseudoread)\n\nGets the difference between the number of bases within the read window of ps on the reference sequence from the number of bases within the read window on the mutated sequence, taking into account insertions and deletions.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._phrederror-Tuple{Number}","page":"Private API Reference","title":"HapLink._phrederror","text":"_phrederror(quality::Number)\n\nConverts a PHRED33-scaled error number into the expected fractional error of basecall\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._pos_to_edge-Tuple{Number}","page":"Private API Reference","title":"HapLink._pos_to_edge","text":"_pos_to_edge(pos::Number)\n\nConverts pos from a number between 0 (beginning) and 1 (end) to a number ranging from 0 (beginning) to 1 (middle) to 0 (end), i.e. convert a relative position to a relative distance from the edge.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._posin-Tuple{Pseudoread, SequenceVariation.Variation}","page":"Private API Reference","title":"HapLink._posin","text":"_posin(ps::Pseudoread, v::Variation)\n\nDetermines if the positions that make up v are wholly contained by ps\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink._push_filter!","page":"Private API Reference","title":"HapLink._push_filter!","text":"_push_filter!(\n vc::VariationCall,\n label::Char,\n value::Union{Nothing,Number},\n filter::Function=(var, val) -> true,\n)\n\nAdds a FILTER entry to vc of the form \"$label$value\" if filter returns true.\n\nArguments\n\nvc::VariationCall: The VariationCall to annotate\nlabel::Char: The first character of the filter text to add\nvalue::Union{Nothing,Number}: The value to compare vc against, and to append to the filter text. If set to nothing, _push_filter! will return without evaluating or adding any filters, which may be useful for processing multiple inputs\nfilter::Function=(var, val) -> true: A function handle to determine if a filter should be applied or not. Note that this function must return true if and only if the filter should be added to vc. The function will be passed vc and value. Defaults to always applying filters regardless of the values of vc and value.\n\n\n\n\n\n","category":"function"},{"location":"api/private/#HapLink.comonicon_install-Tuple{}","page":"Private API Reference","title":"HapLink.comonicon_install","text":"comonicon_install(;kwargs...)\n\nInstall the CLI manually. This will use the default configuration in Comonicon.toml, if it exists. For more detailed reference, please refer to Comonicon documentation.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.comonicon_install_path-Tuple{}","page":"Private API Reference","title":"HapLink.comonicon_install_path","text":"comonicon_install_path(;[yes=false])\n\nInstall the PATH and FPATH to your shell configuration file. You can use comonicon_install_path(;yes=true) to skip interactive prompt. For more detailed reference, please refer to Comonicon documentation.\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.magnitude-Tuple{UnitRange}","page":"Private API Reference","title":"HapLink.magnitude","text":"magnitude(x::UnitRange)\n\nGets the difference between the last and first elements of x\n\nExample\n\njulia> using HapLink: magnitude\n\njulia> magnitude(0:10)\n10\n\n\n\n\n\n\n","category":"method"},{"location":"api/private/#HapLink.overlap-Union{Tuple{S}, Tuple{UnitRange{S}, UnitRange{S}}} where S","page":"Private API Reference","title":"HapLink.overlap","text":"overlap(x::UnitRange{S}, y::UnitRange{S}) where {S}\n\nFinds the inclusive overlap interval of x and y\n\nExample\n\njulia> using HapLink: overlap\n\njulia> overlap(1:5, 3:10)\n3:5\n\njulia> overlap(2:4, 6:8)\n6:5\n\njulia> overlap(1:10, 2:9)\n2:9\n\n\n\n\n\n","category":"method"},{"location":"api/private/#SequenceVariation.variations-Tuple{AbstractVector{<:SequenceVariation.Haplotype}}","page":"Private API Reference","title":"SequenceVariation.variations","text":"variations(vs::AbstractVector{Haplotype})\n\nExtracts all SequenceVariation.Variations from vs.\n\n\n\n\n\n","category":"method"},{"location":"api/private/","page":"Private API Reference","title":"Private API Reference","text":"findset","category":"page"},{"location":"api/private/#HapLink.findset","page":"Private API Reference","title":"HapLink.findset","text":"findset(lst::AbstractArray{T}, comparator::Function) where {T}\n\nFinds every possible set of items in lst where comparator returns true for the set.\n\nExample\n\nfunction divisible_by_same_number(x, i)\n for j in 2:i\n if all(y -> y % j == 0, x)\n return true\n end\n end\n return false\nend\nfindset(1:12, x -> divisible_by_same_number(x, 5))\n\n# output\n6-element Vector{Vector{Int64}}:\n [2, 4, 6, 8, 10, 12]\n [6, 3, 9, 12]\n [5, 10]\n [1]\n [7]\n [11]\n\n\n\n\n\n","category":"function"},{"location":"api/variationpileup/#VariationPileup","page":"VariationPileup","title":"VariationPileup","text":"","category":"section"},{"location":"api/variationpileup/","page":"VariationPileup","title":"VariationPileup","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"api/variationpileup/","page":"VariationPileup","title":"VariationPileup","text":"Modules = [HapLink]\nPages = [\"variationpileup.jl\"]\nPublic = true\nPrivate = false","category":"page"},{"location":"api/variationpileup/#HapLink.VariationPileup","page":"VariationPileup","title":"HapLink.VariationPileup","text":"VariationPileup\n\nSummarizes the basecalls that support a Variation within a set of alignments.\n\nFields\n\nvariation::Variation: The Variation of this pileup entry\ndepth::UInt: The number of times the position of variation appears in this set of alignments\nreadpos::Vector{Float64}: The relative positions of variation within each read\nquality::Vector{Float64}: The phred quality of variation within each read\nstrand::Vector{Strand}: Which strand each variation is found on\n\n\n\n\n\n","category":"type"},{"location":"api/variationpileup/#HapLink.altdepth-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.altdepth","text":"altdepth(vp::VariationPileup)\n\nGets the number of times vp.variation appears\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.depth-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.depth","text":"depth(vp::VariationPileup)\n\nGets the number of times the position of vp.variation appears total (variant and wild-type)\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.frequency-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.frequency","text":"frequency(vp::VariationPileup)\n\nGets the alternate allele frequency of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.pileup-Tuple{Union{AbstractString, FilePathsBase.AbstractPath}, Union{AbstractString, FilePathsBase.AbstractPath}}","page":"VariationPileup","title":"HapLink.pileup","text":"pileup(sam::AbstractPath, ref::AbstractPath) -> Vector{VariationPileup}\n\nGenerates a pileup of Variations based on the alignments in sam aligned against ref.\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.quality-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.quality","text":"quality(vp::VariationPileup)\n\nGets the phred qualities of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.readpos-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.readpos","text":"readpos(vp::VariationPileup)\n\nGets the relative positions of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.strand-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.strand","text":"strand(vp::VariationPileup)\n\nGets the strands of vp.variation\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.strand_bias-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.strand_bias","text":"strand_bias(vp::VariationPileup)\n\nGets the frequency of positive strands that variation appears on relative to all variation reads\n\n\n\n\n\n","category":"method"},{"location":"api/variationpileup/#HapLink.variation-Tuple{VariationPileup}","page":"VariationPileup","title":"HapLink.variation","text":"variation(vp::VariationPileup)\n\nGets the Variation of vp\n\n\n\n\n\n","category":"method"},{"location":"tutorial/3-other/#integration-tutorial","page":"Plays well with others (External tools)","title":"Playing well with others","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"HapLink is not a one-man show: it definitely knows how to cooperate with other tools! In this tutorial, we'll let HapLink do the haplotype calling, but use other tools to go from reads to variant calls, and from haplotypes to phylogenies.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Pages = [\"3-other.md\"]","category":"page"},{"location":"tutorial/3-other/#Install-the-other-tools","page":"Plays well with others (External tools)","title":"Install the other tools","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Before we get too far, let's make sure that we actually have all of the tools that we need. Let's create a brand new conda environment.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"conda create \\\n -n haplink-tutorial-3 \\\n -c bioconda \\\n -c conda-forge \\\n wget=1.20 \\\n sra-tools=3.0 \\\n entrez-direct=16.2 \\\n samtools=1.17 \\\n minimap2=2.26 \\\n lofreq=2.1 \\\n haplink \\\n mafft=7.520 \\\n raxml-ng=1.2 \\\n -y\nconda activate haplink-tutorial-3","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nNone","category":"page"},{"location":"tutorial/3-other/#Get-sample-data","page":"Plays well with others (External tools)","title":"Get sample data","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"We'll pull some of the validation data data from NCBI. First, the reference genome from GenBank.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"esearch \\\n -db nucleotide \\\n -query \"NC_036618.1\" \\\n | efetch \\\n -format fasta \\\n > idv4.fasta","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Next, we'll download one of the pools from the validation set from SRA.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"fasterq-dump \"SUB13489216\"","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nidv4.fasta\nIDV-Aug2022-P2.fastq.gz","category":"page"},{"location":"tutorial/3-other/#Align-sequences","page":"Plays well with others (External tools)","title":"Align sequences","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"We have a set of Nanopore reads and a reference genome to go with them. We'll use minimap2 to align the reads to reference. minimap2 requires the -a flag to output in SAM format, and uses the -x flag to tweak the settings for optimal Nanoore alignment. We then run those reads through samtools sort and samtools index to reduce the computational load needed to find reads by our downstream tools, and samtools view -b to convert the SAM file into a compressed BAM file.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"minimap2 \\\n -t $(($(nproc)/2)) \\\n -ax map-pb \\\n idv4.fasta \\\n IDV-Aug2022-P2.fastq.gz \\\n | samtools sort \\\n | samtools view \\\n -@ $(($(nproc)/2)) \\\n -b \\\n -h \\\n > IDV-Aug2022-P2.bam\nsamtools index IDV-Aug2022-P2.bam","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.bam\nIDV-Aug2022-P2.bam.bai","category":"page"},{"location":"tutorial/3-other/#Call-variants","page":"Plays well with others (External tools)","title":"Call variants","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"People are picky about their variant callers. Even viralrecon can't decide which caller to use. If you don't like HapLink's built-in variant caller, that's fine! Here, we'll use LoFreq (another fine variant caller) to decide for us whether nucleotide variations are variants, or just randomness. The only requirement is that the output must be in VCF format (looking at you, iVar) for HapLink to read it.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"lofreq faidx idv4.fasta\nlofreq call-parallel \\\n --pp-threads $(($(nproc)/2)) \\\n --call-indels \\\n --ref idv4.fasta \\\n IDV-Aug2022-P2.bam \\\n --out IDV-Aug2022-P2.vcf","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.vcf","category":"page"},{"location":"tutorial/3-other/#Call-haplotypes","page":"Plays well with others (External tools)","title":"Call haplotypes","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"And we're back... We now have each piece (reference, alignment, variant calls) needed for HapLink to find haplotypes.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"export JULIA_NUM_THREADS=$(($(nproc)/2))\nhaplink haplotypes \\\n idv4.fasta \\\n IDV-Aug2022-P2.vcf \\\n IDV-Aug2022-P2.bam \\\n > IDV-Aug2022-P2.yml\nhaplink sequences \\\n idv4.fasta \\\n IDV-Aug2022-P2.yml \\\n > IDV-Aug2022-P2.fasta","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nIDV-Aug2022-P2.yml\nIDV-Aug2022-P2.fasta","category":"page"},{"location":"tutorial/3-other/#Alignment-(again,-but-on-a-bigger-scale)","page":"Plays well with others (External tools)","title":"Alignment (again, but on a bigger scale)","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Now that we have the sequences of every haplotype that HapLink found, we can go farther and see how they compare altogether. We'll use MAFFT to generate a multiple sequence alignment of every haplotype. By comparing everything to everything, we can start to see patterns that may have been masked by comparing to the reference genome.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"cat idv4.fasta IDV-Aug2022-P2.fasta > sequences.fasta\nmafft \\\n --thread $(($(nproc)/2)) \\\n --auto \\\n sequences.fasta \\\n > haplotypes.fas","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nsequences.fasta\nhaplotypes.fas","category":"page"},{"location":"tutorial/3-other/#Phylogeny","page":"Plays well with others (External tools)","title":"Phylogeny","text":"","category":"section"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Although the human brain is reasonably good at picking up pattern differences, we can mathematically determine the similarities between sequences using a phylogeny. We'll use RAxML-NG to solve the phylogeny for us. Note that RAxML-NG produces a lot of output.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"raxml-ng --all \\\n --threads auto{MAX} \\\n --msa haplotypes.fas \\\n --model GTR+G","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"info: Output\nhaplotypes.fas\nhaplotypes.fas.raxml.bestModel\nhaplotypes.fas.raxml.bestTree\nhaplotypes.fas.raxml.bestTreeCollapsed\nhaplotypes.fas.raxml.bootstraps\nhaplotypes.fas.raxml.log\nhaplotypes.fas.raxml.mlTrees\nhaplotypes.fas.raxml.rba\nhaplotypes.fas.raxml.startTree\nhaplotypes.fas.raxml.support","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"You can now import the haplotypes.fas.raxml.bestTree file into a phylogeny viewer to view the tree. A personal favorite around our lab is iTOL, but many others will work.","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"(Image: iTOL tree)","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"","category":"page"},{"location":"tutorial/3-other/","page":"Plays well with others (External tools)","title":"Plays well with others (External tools)","text":"Now you can do some heavy lifting with HapLink and its friends! You can leave the tutorials and check out the CLI reference, for an in-depth examination of everything you just learned, or dive into the next tutorial to learn how to call haplotypes without leaving the Julia REPL.","category":"page"},{"location":"cli/variants/#variants-cli","page":"haplink variants","title":"haplink variants","text":"","category":"section"},{"location":"cli/variants/","page":"haplink variants","title":"haplink variants","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"cli/variants/","page":"haplink variants","title":"haplink variants","text":"HapLink.variants","category":"page"},{"location":"cli/variants/#HapLink.variants","page":"haplink variants","title":"HapLink.variants","text":"haplink variants [options] reference bam\n\nCall variants\n\nIntroduction\n\nDecides which variations found within an alignment are real, and which are due to random chance. HapLink uses Fisher's Exact Test to determine the statistical significance of sequence variations, and optionally allows for other thresholds to reduce random noise in the variant calling. Outputs a Variant Call Format (VCF) file compliant with VCF v4.\n\nArguments\n\nreference: path to the reference genome to call variants against in fasta format. Must not be gzipped, but does not need to be indexed (have a sidecar fai file). HapLink only supports single-segment reference genomes: if reference includes more than one sequence, all but the first will be ignored.\nbam: alignment file to call variants from. Can be in SAM or BAM format, and does not need to be sorted or indexed, but variant calling speed will increase significantly if using a sorted and indexed (has a sidebar bai file) BAM file.\n\nOptions\n\n--outfile=: The file to write variant calls to. If left blank, variant calls are written to standard output.\n--significance=: The alpha value for statistical significance of variant calls.\n--depth=: Minimum number of times the variation must be observed within the alignment to be called a variant\n--quality=: The minimum average basecall quality score for a variation to be called a variant\n--frequency=: The minimum proportion of reads that the variation must be observed within compared to all reads covering its position for that variation to be called a variant\n--position=: The distance (as a percentage) from the edge of reads that a variation must be observed at to be called a variant\n--strandedness=: The maximum proportion of times that a variation can be observed on one strand versus the other to be called a variant. This metric is totally useless on single-stranded sequencing protocols like Oxford Nanopore, but can be useful for combining data between stranded protocols like most Illumina and Pacific Bio.\n\n\n\n\n\n","category":"function"},{"location":"tutorial/4-repl/#repl-tutorial","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Julia is an ahead-of-time compiled language. Practically, that means that every time you restart Julia, you have to recompile all the code you were running. Using HapLink on the command line involves up to four different commands. Translation: up to four cases where you lose time to recompiling code that was just running. Surely there's a better way, right? Yep, you can stay within a single Julia session by using HapLink's REPL mode.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Pages = [\"4-repl.md\"]","category":"page"},{"location":"tutorial/4-repl/#Installing-HapLink-...-into-a-project","page":"For advanced beginners (REPL mode)","title":"Installing HapLink ... into a project","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Wait, isn't that what the first tutorial was about? Yes and no. Those projects installed HapLink's command line interface to a global environment. We're going to locally install HapLink as a library into a local package environment.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"tip: Tip\nIf you want to learn more about package environments, check out the official Julia environments documentation or Julius Krumbiegel's tutorial.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Press ] to enter Pkg mode, then follow along with these prompts.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(@v1.6) pkg> activate .\n Activating project at `~/haplink-tutorial-4`\n\n(haplink-tutorial-4) pkg> add HapLink","category":"page"},{"location":"tutorial/4-repl/#Finding-the-example-files","page":"For advanced beginners (REPL mode)","title":"Finding the example files","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Instead of downloading the example files from the internet, we can reference them directly from the package.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using HapLink\nconst PACKAGE_DIR = dirname(dirname(pathof(HapLink)))\nconst EXAMPLE_DIR = joinpath(PACKAGE_DIR, \"example\")\nconst EXAMPLE_BAM = joinpath(EXAMPLE_DIR, \"sample.bam\")\nconst EXAMPLE_REFERENCE = joinpath(EXAMPLE_DIR, \"reference.fasta\")","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Perfect!","category":"page"},{"location":"tutorial/4-repl/#Pileup-time","page":"For advanced beginners (REPL mode)","title":"Pileup time","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Variants can't exist as a single unit, they have to be supported by a set of reads. We can view those reads in a pileup, which can easily be made by using the pileup function.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"sample_pileup = pileup(EXAMPLE_BAM, EXAMPLE_REFERENCE)","category":"page"},{"location":"tutorial/4-repl/#But-is-it-real?","page":"For advanced beginners (REPL mode)","title":"But is it real?","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Pileups tell us what the sequences are saying, but they can't tell us if there are any real variants. We can use the call_variant function to filter a pileup into a set of variants.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"all_variant_calls = VariationCall[]\nfor pile in sample_pileup\n push!(all_variant_calls, call_variant(pile, 0.5; D=0x02, Q=10.0, X=0.01, F=0.05))\nend\nall_variant_calls","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We can now remove all variants that did not pass our filters.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"filter!(var -> filters(var) == [\"PASS\"], all_variant_calls)","category":"page"},{"location":"tutorial/4-repl/#Rule-of-the-majority","page":"For advanced beginners (REPL mode)","title":"Rule of the majority","text":"","category":"section"},{"location":"tutorial/4-repl/#Filtering-variants","page":"For advanced beginners (REPL mode)","title":"Filtering variants","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We can sort into consensus and subconsensus variants. Note that for this fake set, consensus variants will be taken at a >75% frequency.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"consensus_variant_calls = filter(var -> frequency(var) > 0.75, all_variant_calls)\nsubconsensus_variant_calls = filter(\n var -> !(var in consensus_variant_calls), all_variant_calls\n)\nconsensus_variations = variation.(consensus_variant_calls)\nsubconsensus_variations = variation.(subconsensus_variant_calls)","category":"page"},{"location":"tutorial/4-repl/#Creating-the-consensus-sequence","page":"For advanced beginners (REPL mode)","title":"Creating the consensus sequence","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We will now need to convert the consensus variations into a consensus sequence. HapLink doesn't have these tools in-and-of itself, so we'll have to reach into its dependency packages.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(haplink-tutorial-4) pkg> add BioAlignments SequenceVariation\n Resolving package versions...\n Updating `~/haplink-tutorial-4/Project.toml`\n [00701ae9] + BioAlignments v3.1.0\n [eef6e190] + SequenceVariation v0.2.2\n No Changes to `~/haplink-tutorial-4/Manifest.toml`","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using BioAlignments, SequenceVariation\nreference_sequence = reference(first(consensus_variations))\nconsensus_haplotype = Haplotype(reference_sequence, consensus_variations)\nconsensus_sequence = reconstruct(consensus_haplotype)\nconsensus_alignment = PairwiseAlignment(\n AlignedSequence(consensus_sequence, Alignment(HapLink.cigar(consensus_haplotype))),\n reference_sequence,\n)","category":"page"},{"location":"tutorial/4-repl/#Minority-report","page":"For advanced beginners (REPL mode)","title":"Minority report","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We are interested in looking at the subconsensus variants, but they need to be reoriented to be put in terms of the consensus sequence.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"map!(\n var -> translate(var, consensus_alignment),\n subconsensus_variations,\n subconsensus_variations,\n)","category":"page"},{"location":"tutorial/4-repl/#Bringing-in-the-reads","page":"For advanced beginners (REPL mode)","title":"Bringing in the reads","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Now that we have a consensus sequence, we can properly import the reads for haplotype calling into HapLink's specialized Pseudoread class. There is a convient pseudoreads function that can directly convert a BAM file for us.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"haplotyping_pseudoreads = pseudoreads(EXAMPLE_BAM, consensus_sequence)","category":"page"},{"location":"tutorial/4-repl/#Cutting-the-chaff","page":"For advanced beginners (REPL mode)","title":"Cutting the chaff","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"We only need reads that span every one of our subconsensus variants, so let's get rid of every other read.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"(haplink-tutorial-4) pkg> add BioGenerics\n Resolving package versions...\n Updating `~/haplink-tutorial-4/Project.toml`\n [47718e42] + BioGenerics v0.1.2\n No Changes to `~/haplink-tutorial-4/Manifest.toml`","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"using BioGenerics\nfilter!(\n var -> leftposition(var) >= leftposition(first(subconsensus_variations)),\n haplotyping_pseudoreads,\n)\nfilter!(\n var -> rightposition(var) >= rightposition(first(subconsensus_variations)),\n haplotyping_pseudoreads,\n)","category":"page"},{"location":"tutorial/4-repl/#Finding-haplotypes...","page":"For advanced beginners (REPL mode)","title":"Finding haplotypes...","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"This one is a bit of a mind-bender, but bear with me.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"haplotyping_haploypes = haplotype.(haplotyping_pseudoreads)\nvalid_haplotypes = findset(\n subconsensus_variations, hap -> ishaplotype(hap, haplotyping_haploypes)\n)","category":"page"},{"location":"tutorial/4-repl/#...and-deciding-if-they're-real","page":"For advanced beginners (REPL mode)","title":"...and deciding if they're real","text":"","category":"section"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Just because a haplotype could exist, doesn't mean it should. You can check the statistics of a haplotype by converting it into a HaplotypeCall.","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"map(hap -> HaplotypeCall(hap, haplotyping_haploypes), valid_haplotypes);\nhc = first(ans)\ndepth(hc)\nlinkage(hc)\nsignificance(hc)","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"Nope! This one sure isn't!","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"","category":"page"},{"location":"tutorial/4-repl/","page":"For advanced beginners (REPL mode)","title":"For advanced beginners (REPL mode)","text":"This tutorial only scratches the surface of what's possible with HapLink's REPL mode. Take a look at the full API reference for more details on how to tweak your haplotype calling. Have fun!","category":"page"},{"location":"#HapLink","page":"Home","title":"HapLink","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = HapLink\nDocTestSetup = quote\n using HapLink\nend","category":"page"},{"location":"","page":"Home","title":"Home","text":"HapLink","category":"page"},{"location":"#HapLink.HapLink","page":"Home","title":"HapLink.HapLink","text":"HapLink\n\nViral haplotype calling via linkage disequilibrium\n\n\n\n\n\n","category":"module"},{"location":"#Welcome","page":"Home","title":"Welcome","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Howdy! 🤠 And welcome to HapLink! 👋 HapLink is a command-line suite of tools to enable the exploration of viral quasispecies within a single metagenomic sample. Every piece eventually builds up to our viral haplotype caller, which uses linkage disequilibrium on long sequencing reads (💡 think Oxford Nanopore or PacBio HiFi) to identify genetic mutations that are conserved within a single virus particle.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This manual will cover the different ways of using HapLink, starting with a few tutorials before diving into the details of our reference section.","category":"page"},{"location":"#Contents","page":"Home","title":"Contents","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Ready to dive in? 🤿 Here's a 30,000-foot view","category":"page"},{"location":"","page":"Home","title":"Home","text":"curl -fsSL https://install.julialang.org | sh -s -- --yes\nsource ~/.bashrc\njulia \\\n --startup-file=no \\\n --history-file=no \\\n --quiet \\\n -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink)'\necho 'export PATH=$HOME/.julia/bin:$PATH' >> $HOME/.bashrc\nsource ~/.bashrc\n\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/reference.fasta\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam\nwget https://github.com/ksumngs/HapLink.jl/raw/v1.0.0-rc1/example/sample.bam.bai\n\nhaplink variants \\\n reference.fasta \\\n sample.bam \\\n --significance 0.5 \\\n --depth 1 \\\n --quality 10.0 \\\n --position 0.01 \\\n --frequency 0.05 \\\n | tee sample.vcf\n\nhaplink consensus \\\n reference.fasta \\\n sample.vcf \\\n | tee consensus.fasta\n\nhaplink haplotypes \\\n reference.fasta \\\n sample.vcf \\\n sample.bam \\\n --frequency 0.75 \\\n | tee sample.yml\n\nhaplink sequences \\\n reference.fasta \\\n sample.yml \\\n | tee sample.fasta","category":"page"}] } diff --git a/dev/tutorial/1-install/index.html b/dev/tutorial/1-install/index.html index cc0ff2e..e6c3f99 100644 --- a/dev/tutorial/1-install/index.html +++ b/dev/tutorial/1-install/index.html @@ -9,4 +9,4 @@ --startup-file=no \ --history-file=no \ --quiet \ - -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink); using HapLink; HapLink.comonicon_install()'

then return to the Add HapLink to PATH step.

Success! We now have a working installation of HapLink. You are now ready to move on to the next tutorial.

+ -e 'using Pkg; Pkg.activate(;temp=true); Pkg.add(HapLink); using HapLink; HapLink.comonicon_install()'

then return to the Add HapLink to PATH step.

Success! We now have a working installation of HapLink. You are now ready to move on to the next tutorial.

diff --git a/dev/tutorial/2-examples/index.html b/dev/tutorial/2-examples/index.html index e2a467b..7b354df 100644 --- a/dev/tutorial/2-examples/index.html +++ b/dev/tutorial/2-examples/index.html @@ -23,4 +23,4 @@ --iterations 100 \ --overlap-min 0 \ --overlap-max 100 \ - | tee sample.ml.yml
Output
  • sample.ml.yml

Still nothing, huh? Like I said, no haplotypes here, and simulation can't change that. Note that simulating full-length reads used a lot more computational power, so you should try to stick with full-length reads when you can!

But, what does it mean?

HapLink's haplotype YAML files contain everything needed to recreate the haplotype computation, but they can't really be used by any other programs. That's why there's the sequences command, so haplotype sequences can be saved into FASTA format for use by other tools. Let's try this now.

haplink sequences reference.fasta sample.yml > sample.fasta
Output
  • sample.fasta

The output only contains the consensus sequence. Since we knew that there were no haplotypes in this sample we could have used haplink consensus, instead to get the same result.

You are now ready to move on to the next tutorial!

+ | tee sample.ml.yml
Output
  • sample.ml.yml

Still nothing, huh? Like I said, no haplotypes here, and simulation can't change that. Note that simulating full-length reads used a lot more computational power, so you should try to stick with full-length reads when you can!

But, what does it mean?

HapLink's haplotype YAML files contain everything needed to recreate the haplotype computation, but they can't really be used by any other programs. That's why there's the sequences command, so haplotype sequences can be saved into FASTA format for use by other tools. Let's try this now.

haplink sequences reference.fasta sample.yml > sample.fasta
Output
  • sample.fasta

The output only contains the consensus sequence. Since we knew that there were no haplotypes in this sample we could have used haplink consensus, instead to get the same result.

You are now ready to move on to the next tutorial!

diff --git a/dev/tutorial/3-other/index.html b/dev/tutorial/3-other/index.html index fb56d0c..2441b8b 100644 --- a/dev/tutorial/3-other/index.html +++ b/dev/tutorial/3-other/index.html @@ -52,4 +52,4 @@ > haplotypes.fas
Output
  • sequences.fasta
  • haplotypes.fas

Phylogeny

Although the human brain is reasonably good at picking up pattern differences, we can mathematically determine the similarities between sequences using a phylogeny. We'll use RAxML-NG to solve the phylogeny for us. Note that RAxML-NG produces a lot of output.

raxml-ng --all \
   --threads auto{MAX} \
   --msa haplotypes.fas \
-  --model GTR+G
Output
  • haplotypes.fas
  • haplotypes.fas.raxml.bestModel
  • haplotypes.fas.raxml.bestTree
  • haplotypes.fas.raxml.bestTreeCollapsed
  • haplotypes.fas.raxml.bootstraps
  • haplotypes.fas.raxml.log
  • haplotypes.fas.raxml.mlTrees
  • haplotypes.fas.raxml.rba
  • haplotypes.fas.raxml.startTree
  • haplotypes.fas.raxml.support

You can now import the haplotypes.fas.raxml.bestTree file into a phylogeny viewer to view the tree. A personal favorite around our lab is iTOL, but many others will work.

iTOL tree

Now you can do some heavy lifting with HapLink and its friends! You can leave the tutorials and check out the CLI reference, for an in-depth examination of everything you just learned, or dive into the next tutorial to learn how to call haplotypes without leaving the Julia REPL.

+ --model GTR+G
Output
  • haplotypes.fas
  • haplotypes.fas.raxml.bestModel
  • haplotypes.fas.raxml.bestTree
  • haplotypes.fas.raxml.bestTreeCollapsed
  • haplotypes.fas.raxml.bootstraps
  • haplotypes.fas.raxml.log
  • haplotypes.fas.raxml.mlTrees
  • haplotypes.fas.raxml.rba
  • haplotypes.fas.raxml.startTree
  • haplotypes.fas.raxml.support

You can now import the haplotypes.fas.raxml.bestTree file into a phylogeny viewer to view the tree. A personal favorite around our lab is iTOL, but many others will work.

iTOL tree

Now you can do some heavy lifting with HapLink and its friends! You can leave the tutorials and check out the CLI reference, for an in-depth examination of everything you just learned, or dive into the next tutorial to learn how to call haplotypes without leaving the Julia REPL.

diff --git a/dev/tutorial/4-repl/index.html b/dev/tutorial/4-repl/index.html index 2ade99d..43ef745 100644 --- a/dev/tutorial/4-repl/index.html +++ b/dev/tutorial/4-repl/index.html @@ -211,4 +211,4 @@ T29A A34T T48A - C64G)
julia> depth(hc)0x0000000000000000
julia> linkage(hc)0.0
julia> significance(hc)NaN

Nope! This one sure isn't!

This tutorial only scratches the surface of what's possible with HapLink's REPL mode. Take a look at the full API reference for more details on how to tweak your haplotype calling. Have fun!

+ C64G)
julia> depth(hc)0x0000000000000000
julia> linkage(hc)0.0
julia> significance(hc)NaN

Nope! This one sure isn't!

This tutorial only scratches the surface of what's possible with HapLink's REPL mode. Take a look at the full API reference for more details on how to tweak your haplotype calling. Have fun!