Count based on narrowPeak score #34

Closed
donaldcampbelljr opened this issue Oct 14, 2024 · 5 comments

@donaldcampbelljr
Member

Currently, we count overlaps in uniwig, but what if we could weight the counts by the score associated with each peak in a narrowPeak file?
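
One way to read this proposal, as a minimal sketch rather than the uniwig implementation: instead of adding 1 for every peak that covers a position, add that peak's score. The function name `count_at` and the toy peaks below are hypothetical and only illustrate the difference between the two modes.

```python
from typing import List, Tuple

# (start, end, score) with half-open coordinates, as in BED/narrowPeak.
Peak = Tuple[int, int, int]


def count_at(peaks: List[Peak], pos: int, weighted: bool = False) -> int:
    """Count peaks covering `pos`; with weighted=True, sum their scores instead."""
    total = 0
    for start, end, score in peaks:
        if start <= pos < end:
            total += score if weighted else 1
    return total


# Hypothetical toy peaks, not taken from real data.
peaks = [(100, 200, 29), (150, 250, 73)]
print(count_at(peaks, 175))                 # 2   (two peaks cover position 175)
print(count_at(peaks, 175, weighted=True))  # 102 (29 + 73)
```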

@donaldcampbelljr
Member Author

@nleroy917 Per our discussion, it appears that this was not quite working as desired. Could you post some examples for later troubleshooting?

@nleroy917
Member

Yes, I've isolated this as best I could in a GitHub repo here: https://github.com/nleroy917/uniwig-test/blob/master/README.md

@donaldcampbelljr
Member Author

donaldcampbelljr commented Dec 16, 2024

Some additional information:

I made a smaller narrowPeak, extracting some of the "problem areas".

smaller narrowPeak

chr1	121519059	121519235	cluster-0_peak_1297	29	.	3.62554	5.46648	2.92334	125
chr1	121519548	121519715	cluster-0_peak_1298	73	.	5.25078	10.2188	7.35808	34
chr1	125064547	125064750	cluster-0_peak_1299	24	.	3.19496	4.89208	2.41557	25
chr1	125081058	125081252	cluster-0_peak_1300	44	.	4.29138	7.17037	4.48801	66
chr1	125168367	125168693	cluster-0_peak_1301	41	.	3.94715	6.80498	4.14707	103
chr1	125180089	125180402	cluster-0_peak_1302	1510	.	20.0832	156.344	151.034	122
chr1	143194647	143195722	cluster-0_peak_1303	378	.	10.3177	41.7861	37.8509	889
chr1	143202076	143202522	cluster-0_peak_1304	86	.	5.80194	11.6223	8.69297	113
chr1	143203567	143203874	cluster-0_peak_1305	61	.	4.93165	8.94378	6.15061	24
chr1	143214640	143214833	cluster-0_peak_1306	293	.	8.84296	33.0706	29.3898	87
chr1	143220064	143220238	cluster-0_peak_1307	29	.	3.18705	5.48859	2.94339	24
chr1	143222023	143222224	cluster-0_peak_1308	39	.	3.59587	6.59635	3.9533	183
chr1	143222403	143222730	cluster-0_peak_1309	45	.	3.78323	7.2509	4.56303	24
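
For anyone reproducing this, a small sketch of reading one of these lines. narrowPeak is a BED6+4 format (chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak), and the score being discussed is the fifth column. The `NarrowPeak` type and `parse_narrowpeak_line` helper are hypothetical names, not part of gtars.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int           # column 5: the value proposed as a weight
    strand: str
    signal_value: float
    p_value: float
    q_value: float
    peak: int            # summit offset relative to start


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse one tab-separated narrowPeak line into typed fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, got {len(f)}")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


# The highest-scoring record from the file above.
line = "chr1\t125180089\t125180402\tcluster-0_peak_1302\t1510\t.\t20.0832\t156.344\t151.034\t122"
rec = parse_narrowpeak_line(line)
print(rec.score, rec.end - rec.start)  # 1510 313
```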

Looking at a few workflows:

narrowPeak -> bw (all gtars)
narrowPeak -> bedGraph -> bg to bw (bigtools)
narrowPeak -> bedGraph -> bedGraphToBigWig (kent utils)
narrowPeak -> wig -> wigToBigWig (kent utils)

image

The wig and bedGraph files produced by gtars appear to be correct, and the kent tools convert them to bw files that look right when viewed in IGV.

example output bedGraph (via gtars)

chr1	121519054	121519066	102
chr1	121519066	121519544	44
chr1	121519544	121519555	68
chr1	121519555	125064543	0
chr1	125064543	125064554	44
chr1	125064554	125081054	20
chr1	125081054	125081065	61
chr1	125081065	125168363	17
chr1	125168363	125168374	1527
chr1	125168374	125180085	1486
chr1	125180085	125180096	1864
chr1	125180096	143194643	354
chr1	143194643	143194654	440
chr1	143194654	143202072	62
chr1	143202072	143202083	123
chr1	143202083	143203563	37
chr1	143203563	143203574	330
chr1	143203574	143214636	269
chr1	143214636	143214647	298
chr1	143214647	143220060	5
chr1	143220060	143220071	44
chr1	143220071	143222019	15
chr1	143222019	143222030	60
chr1	143222030	143222399	21
chr1	143222399	143222410	22
chr1	143222410	248956422	0
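
A quick way to sanity-check a bedGraph like this before handing it to a converter, as a sketch only: confirm that every interval is non-empty and that intervals on the same chromosome are sorted and do not overlap. The helper name `check_bedgraph` is hypothetical, and the check is deliberately minimal (it does not verify chromosome grouping or chrom.sizes bounds).

```python
from typing import Iterable, List


def check_bedgraph(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in bedGraph lines."""
    problems: List[str] = []
    prev_chrom, prev_end = None, 0
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start_s, end_s, _value = line.split()[:4]
        start, end = int(start_s), int(end_s)
        if start >= end:
            problems.append(f"line {n}: empty or inverted interval")
        if chrom == prev_chrom and start < prev_end:
            problems.append(f"line {n}: overlaps or precedes previous interval")
        prev_chrom, prev_end = chrom, end
    return problems


# First four lines of the gtars output above.
sample = [
    "chr1\t121519054\t121519066\t102",
    "chr1\t121519066\t121519544\t44",
    "chr1\t121519544\t121519555\t68",
    "chr1\t121519555\t125064543\t0",
]
print(check_bedgraph(sample))  # []
```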

command used (for gtars):

./gtars uniwig -f smaller_sample.narrowPeak -t narrowpeak -y wig -m 5 -s 1 -l /output/wig/ -c /hg38.chrom.sizes -p 2 --score

Next steps:

Use bigtools directly and check whether an incorrectly set parameter is causing this discrepancy.

@donaldcampbelljr
Member Author

Interestingly, this conversion does seem to work for our recently added bam to bw workflow, which also uses bigtools behind the scenes and streams bedGraph to it line by line. I therefore converted the bedGraphs manually, using stdin as input, toggling the uncompressed option, and setting or not setting the zoom flags, and found:

image

Example command:

cat _core.bedGraph | ./target/release/bedgraphtobigwig -u - /hg38.chrom.sizes /bigtools_core_nozoom_uncompressed_stdin.bw 
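
The same stdin-based invocation could be driven from a script, sketched below in Python. The argument order and the `-u` and `-` arguments simply mirror the example command; the helper names `build_command` and `stream_bedgraph` are hypothetical, and this is not how gtars calls bigtools internally.

```python
import subprocess
from typing import Iterable, List


def build_command(binary: str, chrom_sizes: str, out_bw: str) -> List[str]:
    """Mirror the example command: -u, then "-" so bedGraph is read from stdin."""
    return [binary, "-u", "-", chrom_sizes, out_bw]


def stream_bedgraph(lines: Iterable[str], cmd: List[str]) -> int:
    """Write bedGraph lines to the child's stdin one at a time; return its exit code."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True)
    assert proc.stdin is not None
    for line in lines:
        proc.stdin.write(line if line.endswith("\n") else line + "\n")
    proc.stdin.close()
    return proc.wait()


cmd = build_command(
    "./target/release/bedgraphtobigwig",
    "/hg38.chrom.sizes",
    "/bigtools_core_nozoom_uncompressed_stdin.bw",
)
print(cmd)
```

Calling `stream_bedgraph(open("_core.bedGraph"), cmd)` would then play the role of the `cat ... |` pipe, assuming the binary and paths exist on the machine running it.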

So streaming bedGraph input seems to be working (and leaving the zoom to default).

A previous bigtools issue also discusses zoom levels and single pass vs multipass mode: jackh726/bigtools#63

This led me to try single pass mode with a bedGraph file as input (not stdin). That also produces a bw file that looks as expected in IGV, whereas adding a zoom level appears to cause issues. Default multipass mode with zoom=1 also works, but all other zoom settings (including the default, z=10) show the discrepancy:
image
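
As background on why zoom settings can change what a viewer displays: a bigWig zoom level stores per-bin summaries (covered base count, min, max, sum) rather than the raw intervals. The toy model below is only an illustration of that idea with hypothetical names and data; it is not bigtools's zoom construction and does not identify the cause of the discrepancy seen here.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int, float]  # (start, end, value), half-open


class ZoomSummary(NamedTuple):
    valid_count: int   # number of bases in the bin covered by data
    min_val: float
    max_val: float
    sum_data: float    # sum of value * covered bases

    @property
    def mean(self) -> float:
        return self.sum_data / self.valid_count if self.valid_count else 0.0


def summarize(intervals: List[Interval], bin_start: int, bin_end: int) -> ZoomSummary:
    """Summarize bedGraph-style intervals over one zoom bin [bin_start, bin_end)."""
    valid, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for start, end, value in intervals:
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap <= 0:
            continue
        valid += overlap
        total += value * overlap
        lo, hi = min(lo, value), max(hi, value)
    return ZoomSummary(valid, lo, hi, total)


# Hypothetical data: a narrow, tall window followed by a long, low stretch.
toy = [(0, 10, 100.0), (10, 1000, 1.0)]
s = summarize(toy, 0, 1000)
print(s.max_val, s.mean)  # the bin mean is far below the bin max
```

In this toy case the bin's maximum is 100 while its mean is about 2, so a display based on zoom-level means would look very different from the base-resolution signal.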

Recommendations

  • Option a) Force zoom to 1 for these conversions (and continue to use multipass mode).
  • Option b) Use single pass mode and remove zoom from the parameters.
  • Future: stream these as we do for bam inputs (this will help performance), though that will require further refactoring.

@donaldcampbelljr
Member Author

This is solved, and we will track the streaming work in #59.
