Normalize snarls #2336

Open
wants to merge 83 commits into
base: master
f8b2560
Merge pull request #1 from vgteam/master
Robin-Rounthwaite Jan 28, 2019
b619a0f
partially implemented command-line interface
Robin-Rounthwaite Feb 22, 2019
39ed50a
added clean_all_snarls fxn to src/algorithms/0_demo_final_0 files.
Robin-Rounthwaite Mar 5, 2019
ed0f2b6
Merge branch 'master' of https://github.com/Robin-Rounthwaite/vg
Robin-Rounthwaite Mar 5, 2019
d12ad02
updating src/subcommand/mod_main.cpp
Robin-Rounthwaite Mar 5, 2019
3dca93e
update msa_converter to have seqan format
Robin-Rounthwaite Mar 5, 2019
e8207f7
Changes to be committed:
Robin-Rounthwaite Mar 6, 2019
344bcf6
fixed bug in msa converter
Robin-Rounthwaite Mar 7, 2019
8271815
Merge branch 'master' of https://github.com/Robin-Rounthwaite/vg
Robin-Rounthwaite Mar 7, 2019
77365c1
Changes to be committed:
Robin-Rounthwaite Apr 3, 2019
938ad38
testing gbwt helper
Robin-Rounthwaite Apr 3, 2019
9ce6fc8
Merge remote-tracking branch 'upstream/master'
Robin-Rounthwaite Apr 3, 2019
6a96a1d
updating haplotypes_to_strings
Robin-Rounthwaite Apr 8, 2019
1bafef6
update haplotype_to_strings
Robin-Rounthwaite Apr 8, 2019
4917a3b
new haplotype path oriented approach to aligning graphs, with old edg…
Robin-Rounthwaite Jun 13, 2019
745597f
Merge remote-tracking branch 'upstream/master' into normalize_snarls
Robin-Rounthwaite Jun 13, 2019
14f52d5
Fixing makefile merge issue
Robin-Rounthwaite Jun 13, 2019
f4506bd
Makefile edits
Robin-Rounthwaite Jun 13, 2019
435438d
Added normalize snarl argument `vg normalize`.
Robin-Rounthwaite Jul 1, 2019
488df51
jemalloc compilation issue fixed.
Robin-Rounthwaite Jul 1, 2019
4477f94
Merge remote-tracking branch 'upstream/master' into normalize_snarls to
Robin-Rounthwaite Jul 1, 2019
3cddb6c
reverted mod_main.cpp (hackily) and continued to edit 0_draft_haploty…
Robin-Rounthwaite Jul 1, 2019
5dfb8f3
embedded paths now move to new snarls successfully.
Robin-Rounthwaite Jul 5, 2019
ecc47c4
Merge branch 'normalize_snarls' of https://github.com/Robin-Rounthwai…
Robin-Rounthwaite Jul 5, 2019
866275c
Merge remote-tracking branch 'upstream/master' into normalize_snarls
Jul 9, 2019
d92fa39
shell script update
Jul 9, 2019
b0b8fa9
syncing updates
Jul 9, 2019
0dfca26
update shell
Robin-Rounthwaite Jul 9, 2019
82d6e53
added normalize arguments -g and -s
Jul 9, 2019
a3931b6
added arguments to normalize main, further updates
Jul 10, 2019
5f3f955
checking code runs on local machine
Robin-Rounthwaite Jul 10, 2019
1f74351
shell update
Jul 10, 2019
b2549bd
fixing issues bash, also debugging arguments in normalize main
Robin-Rounthwaite Jul 10, 2019
422e42d
bash commands for normalize smaller graph works
Jul 10, 2019
3a86297
made a subset of chromosome 10 for debugging
Jul 10, 2019
3492758
local machine run subsetted chr10 commands added
Robin-Rounthwaite Jul 11, 2019
a0b4bf0
bash debug find command
Jul 11, 2019
e2bfeb1
bash run on full chromosome on local machine
Jul 11, 2019
6ef96ad
bash update
Robin-Rounthwaite Jul 12, 2019
9f39f44
normalize snarls now runs on full hg38 thousand genomes chr10 graph. …
Robin-Rounthwaite Aug 7, 2019
537ef1e
extract_gbwt_haplotypes now catches incorrect gbwt connections betwee…
Robin-Rounthwaite Aug 9, 2019
4186794
removed 200 thread limit (to see how long it takes to normalize all s…
Robin-Rounthwaite Aug 9, 2019
56445fa
debugged timing code
Robin-Rounthwaite Aug 9, 2019
badf5f7
added shebang
Robin-Rounthwaite Aug 9, 2019
808051f
Merge remote-tracking branch 'upstream/master' into normalize_snarls
Robin-Rounthwaite Aug 9, 2019
923d46b
Merge remote-tracking branch 'upstream/master' into normalize_snarls
Robin-Rounthwaite Aug 12, 2019
aa71663
object-orientified the snarl normalizer, also renamed cpp/hpp files
Robin-Rounthwaite Sep 27, 2019
941f8b4
regular update of vg
Robin-Rounthwaite Sep 27, 2019
5790282
misc. updates while testing data for normalize. Fixed a bug with vpkg…
Robin-Rounthwaite Nov 4, 2019
36ae967
normalize update
Robin-Rounthwaite Nov 4, 2019
ce8478d
Merge remote-tracking branch 'upstream/master' into normalize_snarls
Robin-Rounthwaite Nov 4, 2019
f13be40
made a simple vpkg wrap function for old vg files.
Robin-Rounthwaite Nov 5, 2019
92bf363
old snarl normalizer updates
Robin-Rounthwaite Nov 2, 2020
6504c0b
update from origin/master
Robin-Rounthwaite Nov 3, 2020
1c2a1dd
resolving merge conflicts
Robin-Rounthwaite Nov 5, 2020
1b9c340
resolve merge conflict
Robin-Rounthwaite Nov 10, 2020
b3650bb
cleaned some comments
Robin-Rounthwaite Nov 24, 2020
01d2892
made SnarlSequences class for gbwt sequences.
Robin-Rounthwaite Nov 24, 2020
825bea6
exhaustive sequence finding moved to SnarlSequenceFinder
Robin-Rounthwaite Nov 25, 2020
3d27382
moved find_embedded_paths to snarl_sequence_finder
Robin-Rounthwaite Nov 25, 2020
ab48056
deleted now-redundant fxns from SnarlNormalizer
Robin-Rounthwaite Nov 25, 2020
f4f197b
cleaned normalize_snarls hpp
Robin-Rounthwaite Nov 25, 2020
bf626c7
Merge branch 'clean_code'
Robin-Rounthwaite Nov 26, 2020
19c0706
resolved merge conflict
Robin-Rounthwaite Dec 1, 2020
6602922
duplicate haplotypes removed before alignment
Robin-Rounthwaite Dec 4, 2020
5e39863
extraction of chosen subgraph, single snarl normalization.
Robin-Rounthwaite Dec 17, 2020
12ca172
added compatibility for snarls that are recorded 'backwards' in the g…
Robin-Rounthwaite Jan 8, 2021
a7c67be
added debug prints, prep for fixing path moving between snarls
Robin-Rounthwaite Jan 28, 2021
b89f75b
right-to-left directed snarls are now supported.
Robin-Rounthwaite Feb 26, 2021
d1468ec
right-to-left directed snarls are now supported in normalize_snarls.
Robin-Rounthwaite Feb 26, 2021
18625c2
subdivided part of integrate_snarl into new overwrite_node_id fxn.
Robin-Rounthwaite Feb 26, 2021
adbeb25
added a few unit tests directly integrated with code; only prints if …
Robin-Rounthwaite Mar 12, 2021
157dc0b
commented out debug code
Robin-Rounthwaite Mar 20, 2021
b47a8b8
more informative debug info
Robin-Rounthwaite Mar 21, 2021
10921bb
Merge remote-tracking branch 'upstream/master'
Robin-Rounthwaite Mar 21, 2021
f6ebd49
update to most recent vg
Robin-Rounthwaite Mar 24, 2021
4629d2a
minor edits
Robin-Rounthwaite Mar 25, 2021
7031c16
Merge remote-tracking branch 'upstream/master'
Robin-Rounthwaite Mar 25, 2021
c47c4e0
fixed include error with seqan dependencies.
Robin-Rounthwaite Mar 26, 2021
9435915
current draft of normalize snarls without debugging code
Robin-Rounthwaite Mar 30, 2021
c05948d
fixed bug with forcing end-character alignments
Robin-Rounthwaite Apr 2, 2021
37a079f
removed debug prints
Robin-Rounthwaite Apr 2, 2021
ab10f34
fixed typo in comments
Robin-Rounthwaite Apr 6, 2021
3 changes: 3 additions & 0 deletions .gitmodules
@@ -128,3 +128,6 @@
[submodule "deps/atomic_queue"]
path = deps/atomic_queue
url = https://github.com/max0x7ba/atomic_queue.git
[submodule "deps/seqan"]
path = deps/seqan
url = https://github.com/seqan/seqan.git
1 change: 1 addition & 0 deletions deps/seqan
Submodule seqan added at f5f658
142 changes: 142 additions & 0 deletions robin_bash/debug_normalize_snarl.sh
@@ -0,0 +1,142 @@
#!/bin/bash

export VG_FULL_TRACEBACK=1
set -e

echo compiling!
. ./source_me.sh && make -j 8
echo running!

###After Normalization
dir=/home/robin/paten_lab/vg/test/robin_tests/chr21
# base=hgsvc_construct.chr21.robin_made
base=hgsvc_construct.chr21.robin_made.normalized
haps=HGSVC.haps.vcf.gz
threads=16
# meta=200bp.60000num
reads=hgsvc_construct.chr21.robin_made.normalized.read_sim.200bp.60000num.txt
# reads=hgsvc_construct.chr21.robin_made.normalized.read_sim.400bp.5000num.txt

vg map -t "$threads" -d "$dir/$base" -x "$dir/$base.xg" -T "$dir/$reads" >"$dir/$base.alignment.gam"
# vg view -a $dir/$base.alignment.gam -j >$dir/$base.alignment.$meta.json

echo finished mapping.

# Note: jq allows arbitrary queries on gam files (vg view is easier but heavier); vg gamcompare with an empty file lets you look at the full gam.

###Before Normalization (originally used in sh file on courtyard - see reconstruct_jmonlong_chr21):
# in_dir=/public/groups/cgl/graph-genomes/jmonlong/hgsvc/haps
# ref=hg38.fa
# in_dir=/home/robin/paten_lab/vg/test/robin_tests/chr21
# vars=HGSVC.haps.vcf.gz
# base=hgsvc_construct.chr21.robin_made
# base_out=hgsvc_construct.chr21.robin_made.test_out
# chrom=chr21
# threads=8

#make graph
#vg construct -r $in_dir/$ref -v $in_dir/$vars -R $chrom -C -m 32 -a -f > $base.vg

#index graph. Note: currently doesn't make .gcsa for some reason.
# vg index -t $threads -x $in_dir/$base_out.xg -G $in_dir/$base_out.gbwt -v $in_dir/$vars -g $in_dir/$base_out.gcsa $in_dir/$base.vg
# vg index -G hgsvc_construct.chr21.robin_made.normalized.gbwt -g hgsvc_construct.chr21.robin_made.normalized.gcsa -v HGSVC.haps.vcf.gz hgsvc_construct.chr21.robin_made.normalized.vg

#make snarls.
# vg snarls -v $in_dir/$vars $base.vg > $base.snarls.pb

# chunk graph?
# vg chunk -x hgsvc_construct.chr21.robin_made.normalized.xg -G hgsvc_construct.chr21.robin_made.normalized.gbwt -r 0:16608 >chunk_normalized_0_to_16608.vg



###Before and During Normalization
##running normalize_snarls on a full chromosome - local machine.
# TEST_DIR=test/robin_tests/full_chr10
# FILE_NAME=hgsvc_chr10_construct
# FILE_NAME_OUT=junk
# FILE_NAME_OUT=chr10_no_gbwt_handles_at_25128
# FILE_NAME_OUT=hgsvc_chr10_construct_normalized_no_max_size

# TEST_DIR=test/robin_tests/chr21
# FILE_NAME=hgsvc_construct.chr21.robin_made.normalized
# FILE_NAME_OUT=hgsvc_construct.chr21.robin_made.normalized
# FILE_NAME_OUT=hgsvc_construct.chr21.robin_made.normalized.subgraph.301929.exp_context

## running full chr21:
# vg normalize -e -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls.pb $TEST_DIR/$FILE_NAME.hg >$TEST_DIR/$FILE_NAME_OUT.hg

## running subset of chr21
# vg find -x $TEST_DIR/$FILE_NAME.xg -n 301929 -c 100 >$FILE_NAME_OUT.vg
# ./bin/vg view -dpn $FILE_NAME_OUT.vg| \
# dot -Tsvg -o $FILE_NAME_OUT.svg
# chromium-browser $FILE_NAME_OUT.svg


# vg view -dpn $FILE_NAME_OUT.vg| \
# dot -Tsvg -o $FILE_NAME_OUT.svg
# chromium-browser $FILE_NAME_OUT.svg

## for extracting a prenormalized subgraph for looking at chr10

# TEST_DIR=test/robin_tests/chr21
# FILE_NAME=hgsvc_construct.chr21.robin_made
# FILE_NAME_OUT=hgsvc_construct.chr21.robin_made.subgraph.301929.exp_context

# vg find -x $TEST_DIR/$FILE_NAME.xg -n 301929 -c 100 >$FILE_NAME_OUT.vg
# ./bin/vg view -dpn $FILE_NAME_OUT.vg| \
# dot -Tsvg -o $FILE_NAME_OUT.svg
# chromium-browser $FILE_NAME_OUT.svg


##running full chr10
# echo "running normalize (w/ evaluation)"
# valgrind --leak-check=full vg normalize -e -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls $TEST_DIR/$FILE_NAME.hg >$TEST_DIR/$FILE_NAME_OUT.hg
# vg normalize -e -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls.pb $TEST_DIR/$FILE_NAME.hg >$TEST_DIR/$FILE_NAME_OUT.hg


# ##running full chr10 with no max size.
# echo "running normalize (w/ evaluation)"
# vg normalize -e -m 0 -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls $TEST_DIR/$FILE_NAME.hg >$TEST_DIR/$FILE_NAME_OUT.hg


## for printing out the normalized subsnarl:
# vg normalize -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls $TEST_DIR/$FILE_NAME.hg >graph_out.vg
# ./bin/vg view -dpn graph_out.vg| \
# dot -Tsvg -o graph_out.svg
# chromium-browser graph_out.svg

## for extracting a prenormalized subgraph for looking at chr10
# vg find -x $TEST_DIR/$FILE_NAME.xg -n 25128 -c 25 >$FILE_NAME_OUT.vg
# ./bin/vg view -dpn $FILE_NAME_OUT.vg| \
# (debug note: disambiguating snarl #85 source: 23053 sink: 23075)
# dot -Tsvg -o $FILE_NAME_OUT.svg
# chromium-browser $FILE_NAME_OUT.svg

## looking at an old example
# TEST_DIR=test/robin_tests/robin_haplotypes/complex
# FILE_NAME=chr10_subgraph_0_new
# FILE_NAME_OUT=chr10_subgraph_0_new_normalized_200_max_thread_size
# # vg index -G $TEST_DIR/$FILE_NAME.gbwt -v $TEST_DIR/../../HGSVC.haps.chr10.vcf.gz $TEST_DIR/$FILE_NAME.vg
# # vg convert -v $TEST_DIR/$FILE_NAME.vg -A >$TEST_DIR/$FILE_NAME.hg
# vg normalize -e -g $TEST_DIR/$FILE_NAME.gbwt -s $TEST_DIR/$FILE_NAME.snarls $TEST_DIR/$FILE_NAME.hg >$TEST_DIR/$FILE_NAME_OUT.hg
# vg convert -a $TEST_DIR/$FILE_NAME_OUT.hg -V >$TEST_DIR/$FILE_NAME_OUT.vg
# vg mod -g 609548 -x 65 $TEST_DIR/$FILE_NAME_OUT.vg | vg view -dpn - | dot -Tsvg -o $TEST_DIR_OUT/$FILE_NAME_OUT.svg
# chromium-browser $TEST_DIR_OUT/$FILE_NAME_OUT.svg




### After Normalization:
## for making a snarls file:
# vg convert -a $TEST_DIR/$FILE_NAME_OUT.hg -V >$TEST_DIR/$FILE_NAME_OUT.vg
# echo "hg converted to vg"
# vg snarls $TEST_DIR/$FILE_NAME_OUT.vg >$TEST_DIR/$FILE_NAME_OUT.snarls
# echo ".snarls made"

## for evaluating normalized graph:
# echo "getting vg stats:"
# vg stats -z -l $TEST_DIR/$FILE_NAME_OUT.vg

## creating a new gbwt graph from the outgraph:
# vg index -G $TEST_DIR/$FILE_NAME_OUT.gbwt -v $TEST_DIR/../HGSVC.haps.chr10.vcf.gz $TEST_DIR/$FILE_NAME_OUT.vg
# echo "gbwt made"
