Prevent duplicate matches #5

Open
hepcat72 opened this issue Mar 24, 2017 · 0 comments

hepcat72 commented Mar 24, 2017

There's a long-standing problem with duplicates in the added feature columns: the same feature can get added as an annotation to a data row multiple times. Here's an example:

Output:

>cat JS11_bamCoverage.bedgraph.nocov.simple
#featureProximity.pl Version 3.8
# Created: 11/2/2011
# Last modified: Wed Mar 15 16:13:59 2017
#Wed Mar 15 16:14:46 2017
#/usr/bin/perl /Users/rleach/pub/seqtools/featureProximity.pl -i deletions_unique.txt -f JS11_bamCoverage.bedgraph.nocov -r 0 -c 1 -b 2 -e 3 -a 1 -j 2 -k 3 -w 1
#datcol(1)	datcol(2)	datcol(3)	Distance	ftcol(1)
1	33574	33575		
1	36126	36141		
1	580578	580581		
1	2078797	2078798	0	1,1
1	2622551	2622908		
1	2652306	2652316	0	1
1	2653569	2653597		
1	2653683	2653691	0	1,1
1	2743286	2743305	0	1,1
1	2744893	2744896		
1	2745174	2745199	0	1
1	2850668	2850689		
1	2852896	2852903	0	1
1	3062741	3062792	0	1
1	3071674	3071683	0	1
1	3127342	3127343		
1	3178221	3178235	0	1,1
1	3217467	3217492	0	1,1
1	3223021	3223022		
2	26955	26956		
2	136633	136669	0	2
2	1021481	1021482		
2	1021487	1021488	0	2,2
2	1022198	1022199		
2	1334133	1334134	0	2,2
2	1776702	1777433		

Note the "1,1" and "2,2" feature annotations. Those are duplicate matches separated by commas. In each case there is really only 1 feature that overlaps the data row's coordinates, as the inputs below show:

Inputs:

>cat JS11_bamCoverage.bedgraph.nocov
1	37318	37335	0
1	1895314	1895318	0
1	2078797	2078798	0
1	2410087	2410091	0
1	2652314	2652315	0
1	2653071	2653169	0
1	2653683	2653690	0
1	2743286	2743301	0
1	2745176	2745197	0
1	2851251	2851256	0
1	2852897	2852901	0
1	3060846	3060899	0
1	3061005	3061065	0
1	3061165	3061179	0
1	3062772	3062781	0
1	3066730	3066852	0
1	3071681	3071682	0
1	3072186	3072347	0
1	3110774	3110988	0
1	3112783	3112785	0
1	3117996	3118031	0
1	3121421	3121471	0
1	3178226	3178235	0
1	3178327	3178337	0
1	3217490	3217492	0
2	134711	134715	0
2	135204	135328	0
2	136639	136661	0
2	1021487	1021488	0
2	1334133	1334134	0
>cat deletions_unique.txt
1	33574	33575
1	36126	36141
1	580578	580581
1	2078797	2078798
1	2622551	2622908
1	2652306	2652316
1	2653569	2653597
1	2653683	2653691
1	2743286	2743305
1	2744893	2744896
1	2745174	2745199
1	2850668	2850689
1	2852896	2852903
1	3062741	3062792
1	3071674	3071683
1	3127342	3127343
1	3178221	3178235
1	3217467	3217492
1	3223021	3223022
2	26955	26956
2	136633	136669
2	1021481	1021482
2	1021487	1021488
2	1022198	1022199
2	1334133	1334134
2	1776702	1777433
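
The claim that each affected row has only one overlapping feature can be checked directly against the two input files. Below is a minimal Python sketch (not code from featureProximity.pl, which is Perl). It assumes closed-interval overlap, which may not match the script's own coordinate handling, and the names `parse` and `summarize` are made up for illustration. The annotated rows and the feature coordinates are copied from the output and inputs above, with the bedgraph coverage column dropped.

```python
# Annotated rows from the output above: chrom, start, stop, reported ftcol(1).
ANNOTATED = """\
1 2078797 2078798 1,1
1 2652306 2652316 1
1 2653683 2653691 1,1
1 2743286 2743305 1,1
1 2745174 2745199 1
1 2852896 2852903 1
1 3062741 3062792 1
1 3071674 3071683 1
1 3178221 3178235 1,1
1 3217467 3217492 1,1
2 136633 136669 2
2 1021487 1021488 2,2
2 1334133 1334134 2,2
"""

# Features from JS11_bamCoverage.bedgraph.nocov (coverage column omitted).
FEATURES = """\
1 37318 37335
1 1895314 1895318
1 2078797 2078798
1 2410087 2410091
1 2652314 2652315
1 2653071 2653169
1 2653683 2653690
1 2743286 2743301
1 2745176 2745197
1 2851251 2851256
1 2852897 2852901
1 3060846 3060899
1 3061005 3061065
1 3061165 3061179
1 3062772 3062781
1 3066730 3066852
1 3071681 3071682
1 3072186 3072347
1 3110774 3110988
1 3112783 3112785
1 3117996 3118031
1 3121421 3121471
1 3178226 3178235
1 3178327 3178337
1 3217490 3217492
2 134711 134715
2 135204 135328
2 136639 136661
2 1021487 1021488
2 1334133 1334134
"""


def parse(text):
    """Split whitespace-delimited lines into (chrom, start, stop, *rest) tuples."""
    rows = []
    for line in text.splitlines():
        chrom, start, stop, *rest = line.split()
        rows.append((chrom, int(start), int(stop), *rest))
    return rows


def summarize():
    """Per annotated row: overlap count, reported count, shared-boundary flag."""
    features = parse(FEATURES)
    report = []
    for chrom, start, stop, ftcol in parse(ANNOTATED):
        # Assumption: closed intervals overlap when neither ends before the other starts.
        hits = [f for f in features
                if f[0] == chrom and f[1] <= stop and f[2] >= start]
        shares_edge = any(f[1] == start or f[2] == stop for f in hits)
        report.append((chrom, start, stop, len(hits),
                       len(ftcol.split(",")), shares_edge))
    return report


for row in summarize():
    print(*row, sep="\t")
```

Under that overlap assumption, every annotated row has exactly one overlapping feature. In this example, the rows reported twice are also exactly the ones whose feature shares a start or stop coordinate with the data row. That is only a pattern in this one data set, not a diagnosis, but it may help narrow down which search passes both report the same feature.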

I suspect that this issue has to do with separate loops that search for features in different orientations, or that search in different directions relative to the data coordinates. I recall having looked into this in the past, and I remember that it was a difficult bug to resolve.
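
If the duplicates do come from separate search passes that each report the same feature, one possible guard is to key every hit by the feature's identity and skip repeats when the passes are combined. The following is a minimal Python sketch of that idea, not the script's actual Perl code. The function `merge_hits`, the tuple layout, and the use of the feature file's line index as the key are all assumptions made for illustration.

```python
def merge_hits(*passes):
    """Combine hits from several search passes, keeping each feature once.

    Each hit is a hypothetical tuple (feature_index, chrom, start, stop),
    where feature_index is the feature's line number in the feature file.
    Keying on the index rather than the coordinates keeps two distinct
    feature lines that happen to share coordinates from being collapsed.
    """
    seen = set()
    merged = []
    for hits in passes:
        for hit in hits:
            feature_index = hit[0]
            if feature_index not in seen:
                seen.add(feature_index)
                merged.append(hit)  # first-seen order is preserved
    return merged


# Two hypothetical passes that both find the feature on line 3 of the feature file.
first_pass = [(3, "1", 2078797, 2078798)]
second_pass = [(3, "1", 2078797, 2078798)]

merged = merge_hits(first_pass, second_pass)
annotation = ",".join(hit[1] for hit in merged)
print(annotation)
```

With the guard in place, the annotation for this row would be a single "1" instead of "1,1". Whether the fix belongs at the point where passes are combined, or inside the loops themselves, depends on how the script collects its matches.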
