Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataTrack - undesired behavior when using transformation and windowing #90

Open
jayoung opened this issue Apr 15, 2024 · 2 comments
Open

Comments

@jayoung
Copy link

jayoung commented Apr 15, 2024

hi there,

I'm loving Gviz - thanks for this package!

This isn't exactly a bug report - more a request to consider handling transformation/windowing a bit differently.

In DataTrack I can see that the data gets transformed BEFORE it gets windowed. If we use a transformation of log10, just to plot on a log scale, this leads to undesired outcomes (because log10(a) + log10(b) != log10(a+b)). I think what most people would want is simply to have the same plot on a log scale, rather than a different-looking plot.

I figured out a way to get what I want (using aggregation=function(x) {log10(sum(x))}) but maybe it would also make sense to change the underlying DataTrack code to transform AFTER aggregating over windows. Kind of depends on what typical use case for transformation is, but I would think this will be a very common one.

Here's some example code to demonstrate is below.

What do you think?

thanks very much,

Janet Young
Fred Hutch Cancer Center

library(Gviz)

## set up data
# middle position has two overlapping regions, but all positions sum to the same count. 
# Looks fine on a linear scale but not on a log scale
test_sites <- GRanges(seqnames="chr1",
                      ranges=IRanges(start=c(10,20,20,30), width=1),
                      counts=c(100,50,50,100)) 
test_region <- GRanges(seqnames="chr1",
                       ranges=IRanges(start=1, end=40)) 

## axis track
myTrack1 <- GenomeAxisTrack(test_region)

## raw counts track
myTrack2a <- DataTrack(
    test_sites, 
    name="counts",
    region=test_region,
    type = c("h"),
    window=width(test_region), 
    aggregation="sum",
    ylim=c(0,150)
)

## use transformation option to log10 scale - the plot shows that the transformation is occurring BEFORE the window aggregation.  This isn't just like plotting the data on a log scale
myTrack2b <- DataTrack(
    test_sites, 
    name="log10 counts v1 ",
    region=test_region,
    type = c("h"),
    window=width(test_region), 
    aggregation="sum",
    transformation=function(x) {log10(x)},
    ylim=c(0,3.5)
)

## SOLUTION - do the log10 scale transformation during window aggregation instead, so we can control the order of aggregation and transformation:
myTrack2c <- DataTrack(
    test_sites, 
    name="log10 counts v2",
    region=test_region,
    type = c("h"),
![gviz_transformation_question](https://github.com/ivanek/Gviz/assets/2569397/0c6c3f78-fb98-48b9-bc3e-500c02fe8899)
[](url)
    window=width(test_region), 
    aggregation=function(x) {log10(sum(x))},
    ylim=c(0,3.5)
)

myTrackList <- list(myTrack2c,myTrack2b,myTrack2a,myTrack1)

png(filename="bug_testing/gviz_transformation_question.png")
plotTracks(myTrackList)
dev.off()
@jayoung
Copy link
Author

jayoung commented Apr 15, 2024

gviz_transformation_question

@jayoung
Copy link
Author

jayoung commented Apr 15, 2024

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Gviz_1.46.1          GenomicRanges_1.54.1 GenomeInfoDb_1.38.8  IRanges_2.36.0      
[5] S4Vectors_0.40.2     BiocGenerics_0.48.1 

loaded via a namespace (and not attached):
  [1] DBI_1.2.2                   bitops_1.0-7                deldir_2.0-4               
  [4] gridExtra_2.3               biomaRt_2.58.2              rlang_1.1.3                
  [7] magrittr_2.0.3              biovizBase_1.50.0           matrixStats_1.3.0          
 [10] compiler_4.3.2              RSQLite_2.3.6               GenomicFeatures_1.54.4     
 [13] png_0.1-8                   vctrs_0.6.5                 ProtGenerics_1.34.0        
 [16] stringr_1.5.1               pkgconfig_2.0.3             crayon_1.5.2               
 [19] fastmap_1.1.1               backports_1.4.1             dbplyr_2.5.0               
 [22] XVector_0.42.0              utf8_1.2.4                  Rsamtools_2.18.0           
 [25] rmarkdown_2.26              bit_4.0.5                   xfun_0.43                  
 [28] zlibbioc_1.48.2             cachem_1.0.8                progress_1.2.3             
 [31] blob_1.2.4                  DelayedArray_0.28.0         BiocParallel_1.36.0        
 [34] jpeg_0.1-10                 parallel_4.3.2              prettyunits_1.2.0          
 [37] cluster_2.1.6               VariantAnnotation_1.48.1    R6_2.5.1                   
 [40] stringi_1.8.3               RColorBrewer_1.1-3          rtracklayer_1.62.0         
 [43] pkgload_1.3.4               rpart_4.1.23                Rcpp_1.0.12                
 [46] SummarizedExperiment_1.32.0 knitr_1.45                  base64enc_0.1-3            
 [49] Matrix_1.6-5                nnet_7.3-19                 tidyselect_1.2.1           
 [52] rstudioapi_0.16.0           dichromat_2.0-0.1           abind_1.4-5                
 [55] yaml_2.3.8                  codetools_0.2-20            curl_5.2.1                 
 [58] lattice_0.22-6              tibble_3.2.1                Biobase_2.62.0             
 [61] KEGGREST_1.42.0             evaluate_0.23               foreign_0.8-86             
 [64] BiocFileCache_2.10.2        xml2_1.3.6                  Biostrings_2.70.3          
 [67] pillar_1.9.0                filelock_1.0.3              MatrixGenerics_1.14.0      
 [70] checkmate_2.3.1             generics_0.1.3              RCurl_1.98-1.14            
 [73] ensembldb_2.26.0            hms_1.1.3                   ggplot2_3.5.0              
 [76] munsell_0.5.1               scales_1.3.0                glue_1.7.0                 
 [79] lazyeval_0.2.2              Hmisc_5.1-2                 tools_4.3.2                
 [82] interp_1.1-6                BiocIO_1.12.0               data.table_1.15.4          
 [85] BSgenome_1.70.2             GenomicAlignments_1.38.2    XML_3.99-0.16.1            
 [88] latticeExtra_0.6-30         AnnotationDbi_1.64.1        colorspace_2.1-0           
 [91] GenomeInfoDbData_1.2.11     htmlTable_2.4.2             restfulr_0.0.15            
 [94] Formula_1.2-5               cli_3.6.2                   rappdirs_0.3.3             
 [97] fansi_1.0.6                 S4Arrays_1.2.1              dplyr_1.1.4                
[100] AnnotationFilter_1.26.0     gtable_0.3.4                digest_0.6.35              
[103] SparseArray_1.2.4           rjson_0.2.21                htmlwidgets_1.6.4          
[106] memoise_2.0.1               htmltools_0.5.8.1           lifecycle_1.0.4            
[109] httr_1.4.7                  bit64_4.0.5 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant