Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent quadratic untangle cut point explosions #474

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from
Draft

Conversation

ekg
Copy link
Contributor

@ekg ekg commented Feb 6, 2023

That's the premise of this PR. It seems that untangling isn't behaving as expected in deep loops, and this branch will have hacks required to figure out why. It may be due to a quadratic expansion of the number of possible cut points.

The current patch makes a change that does affect the untangle cut points. We need to validate that the output makes sense.

@ekg
Copy link
Contributor Author

ekg commented Feb 9, 2023

Are we ready to merge? I guess we should check larger scale tests.

@AndreaGuarracino
Copy link
Member

In terms of performance, it is good. I tried it on multiple chrXY graphs of primates. The slowest run took ~6 hours.

octopus11
[odgi::untangle] warning: no step index specified. Building one with a sample rate of 8. This may take additional time. A step index can be provided via -a, --step-index. A step index can be built using odgi stepindex.
[odgi::algorithms::stepindex] Collecting Steps Progress: 100.00% @ 3.97e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::stepindex] Building Progress: 100.00% @ 2.89e+00/s elapsed: 00:00:00:04 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries with 1 targets
[odgi::algorithms::untangle] establishing initial cuts for 14 paths
[odgi::algorithms::untangle] set target nodes 100.00% @ 5.29e-01/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 1.42e-03/s elapsed: 00:00:11:44 remain: 00:00:00:00
[odgi::algorithms::untangle] mark nodes every 50000 bp 100.00% @ 3.17e+07/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] add new cut points 100.00% @ 4.08e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::untangle] building target segment index
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 6.20e-04/s elapsed: 00:00:26:53 remain: 00:00:00:00
[odgi::algorithms::untangle] prepare segment cuts 100.00% @ 4.38e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] make the mapping 100.00% @ 3.73e+07/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries 100.00% @ 7.26e-04/s elapsed: 00:05:21:32 remain: 00:00:00:00
	Command being timed: "/home/guarracino/tools/odgi/bin/odgi-547e4d76f340dc0da34bae1522308e9b97efd0c9 untangle -i /lizardfs/guarracino/tree_of_life_alignment/graphs/primates14.chrXY.p70.s5000/primates14.chrXY.fa.gz.174e029.417fcdf.a903a91.smooth.final.og -r mGorGor1#chrY -e 50000 -m 1000 -j 0 -n 28 -t 14 -P"
	User time (seconds): 75629.28
	System time (seconds): 140.30
	Percent of CPU this job got: 349%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 6:01:10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 25976056
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 42828892
	Voluntary context switches: 41562533
	Involuntary context switches: 82776
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@ekg
Copy link
Contributor Author

ekg commented Feb 9, 2023 via email

@AndreaGuarracino
Copy link
Member

There is still a problem with the reverse orientation. This is from the injection tutorial, but without the odgi flip step:

image
master on the left, this branch on the right

The tracks in reverse are split into more arrows. Applying odgi flip, the output becomes identical.

@AndreaGuarracino
Copy link
Member

Actually, there are also a few problems in forward.

@AndreaGuarracino
Copy link
Member

A bit better, but still more arrows than the master (on the left):

image

@AndreaGuarracino
Copy link
Member

And also different best targets (different colors).

@AndreaGuarracino
Copy link
Member

Almost 20 hours. Not 10 minutes as before the last fixing commit, but neither days of waiting as with the master.

octopus03
[odgi::untangle] warning: no step index specified. Building one with a sample rate of 8. This may take additional time. A step index can be provided via -a, --step-index. A step index can be built using odgi stepindex.
[odgi::algorithms::stepindex] Collecting Steps Progress: 100.00% @ 4.59e+00/s elapsed: 00:00:00:03 remain: 00:00:00:00
[odgi::algorithms::stepindex] Building Progress: 100.00% @ 3.41e+00/s elapsed: 00:00:00:04 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries with 1 targets
[odgi::algorithms::untangle] establishing initial cuts for 14 paths
[odgi::algorithms::untangle] set target nodes 100.00% @ 4.79e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 2.75e-05/s elapsed: 00:10:06:48 remain: 00:00:00:00
[odgi::algorithms::untangle] mark nodes every 50000 bp 100.00% @ 2.58e+07/s elapsed: 00:00:00:01 remain: 00:00:00:00
[odgi::algorithms::untangle] add new cut points 100.00% @ 6.22e+00/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] building target segment index
[odgi::algorithms::untangle] untangle and merge cuts 100.00% @ 1.02e-03/s elapsed: 00:00:16:16 remain: 00:00:00:00
[odgi::algorithms::untangle] prepare segment cuts 100.00% @ 4.79e-01/s elapsed: 00:00:00:02 remain: 00:00:00:00
[odgi::algorithms::untangle] make the mapping 100.00% @ 2.44e+07/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::algorithms::untangle] untangling 14 queries 100.00% @ 4.42e-04/s elapsed: 00:08:48:01 remain: 00:00:00:00
	Command being timed: "/home/guarracino/tools/odgi/bin/odgi-2c78159a1b4bf122493075e436ea9c53033f430f untangle -i /lizardfs/guarracino/tree_of_life_alignment/graphs/primates14.chrXY.p90.s20000/primates14.chrXY.fa.gz.e4728fd.417fcdf.8abe853.smooth.final.og -r chm13v2#chrX -e 50000 -m 1000 -j 0 -n 28 -t 14 -P"
	User time (seconds): 349928.09
	System time (seconds): 282.15
	Percent of CPU this job got: 506%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 19:11:58
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 22090900
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 328758411
	Voluntary context switches: 82374953
	Involuntary context switches: 544237
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@ekg
Copy link
Contributor Author

ekg commented Feb 15, 2023

Yup this is still not done.

@ekg
Copy link
Contributor Author

ekg commented Feb 15, 2023

Screenshot from 2023-02-15 16-57-38

This is looking more correct. I'm not seeing the same problems. Looking to see if there are further wins to be made.

@AndreaGuarracino AndreaGuarracino marked this pull request as draft March 23, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants