Reference Flow Experiments

The reference flow method uses population genomic data to improve alignment accuracy and reduce reference bias while remaining computationally efficient. Reads are first aligned to a major-allele reference genome built from the 1000 Genomes Project GRCh38 call set. Reads that are unmapped or ambiguous, as determined by a mapping quality threshold, are assigned to a "deferred" group. The deferred reads are then re-aligned to a set of population genomes based on the "superpopulation" labels in the 1000 Genomes Project. Finally, all reads are merged into a unified SAM output that uses the GRCh38 coordinate system.
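The deferral step described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the function name split_by_mapq, the default threshold of 10, and the example records are all hypothetical, chosen only to show how reads might be split on the SAM FLAG and MAPQ fields.

```python
from typing import Iterable, List, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapq(
    sam_lines: Iterable[str], mapq_threshold: int = 10
) -> Tuple[List[str], List[str]]:
    """Split SAM alignment records into (committed, deferred) lists.

    A read is deferred when it is unmapped or when its MAPQ is below
    mapq_threshold. The threshold value is an assumption for illustration.
    """
    committed: List[str] = []
    deferred: List[str] = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        if flag & UNMAPPED_FLAG or mapq < mapq_threshold:
            deferred.append(line)
        else:
            committed.append(line)
    return committed, deferred


# Hypothetical records: one confident, one ambiguous, one unmapped.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
committed, deferred = split_by_mapq(example)
print(len(committed), len(deferred))  # prints "1 2"
```

In this sketch only the deferred list would go on to the second-pass alignment against the population genomes, so the cost of the extra alignments scales with the number of deferred reads rather than with the full read set.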

Snakemake

The reference flow pipeline is built on Snakemake for efficient and scalable computing. The GRCh37 and GRCh38 pipelines are in the grch37 and grch38 directories, respectively. Users may modify the *.yaml files to match their environment and then run snakemake -np, a dry run that prints the commands without executing them, to check that the configuration is set correctly.

Finally, run

snakemake -j 32

to start the pipeline. Option -j sets the maximum number of CPU cores that Snakemake may use in parallel. In this example, up to 32 cores are used.
