RcisTarget::addSignificantGenes error #27

Open
joel-tuberosa opened this issue Aug 16, 2022 · 5 comments

@joel-tuberosa

Hello,

I would like to perform an enrichment analysis with the following data:

target_genes - a vector of gene names corresponding to the tested set

motif_rankings - the loaded database mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather downloaded from here

motifAnnotations_mgi - annotation data loaded from the package with data(motifAnnotations_mgi)

I am running the following commands:

motifs_AUC <- calcAUC(target_genes, motif_rankings)
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,  motifAnnot=motifAnnotations_mgi)
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable, 
                                                   geneSets=target_genes,
                                                   rankings=motif_rankings, 
                                                   nCores=1,
                                                   method="aprox")

And I got this error message from the last command:

Error in data.frame(row.names = motifNames, rankings[, geneSet]) : 
  duplicate row.names: 16388, 13294, 17330, 17112, 4188, 16530, 16844, 18101, 17737, 18186, 16084, 18886, 11338, 12655, 16219, 18026, 15061, 16371, 14701, 17214, 18246, 16884, 14225, 6681, 18323, 17761, 17628, 16022, 17015, 18869, 15726, 16565, 16104, 14604, 16384, 15421, 16625, 16326, 15902, 17124, 18335, 18696, 9916, 15847, 14092, 17177, 15993, 17593, 16026, 18152, 14512, 16552, 16644, 19879, 18012, 17748, 18443, 16515, 17100, 17378, 17796, 19198, 18076, 16489, 18470, 14162, 17199, 18253, 16231, 17396, 18081, 17258, 15458, 17295, 15894, 17249, 18312, 17144, 13580, 8484, 16764, 15581, 12946, 19774, 15787, 18527, 18199, 18438, 17575, 17425, 16641, 11742, 18372, 17682, 16088, 17187, 15967, 18070, 17644, 14814, 14675, 17816, 18090, 14718, 17172, 14284, 18289, 18512, 16494, 17723, 15823, 18852, 14540, 17799, 15400, 11594, 17008, 18074, 16253, 17293, 18373, 14628, 13187, 18236, 14654, 17097, 16927, 15662, 11932, 17926, 18632, 18596, 17650, 17991, 17725, 16096, 16249, 10919, 17093, 1

Do you have an idea how to fix this?

Thank you in advance.

Joël

@ZYT-ZhangYunTao19941116

on 17 Aug

I encountered the same problem. After reviewing the source code, I found that the cause was the file "motif_rankings-mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather", which appears to contain many duplicate motif names. The solution is to use the old file named "mm9-tss-centered-10kb-10species.mc9nr.feather" from https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/

@davidsanin

@joel-tuberosa, did you get anywhere with this other than changing to an old annotation file? I am having the exact same error.

@jdenavascues

jdenavascues commented Apr 25, 2023

I think I know what the problem is: the new and old versions of the databases store the column holding the motif names in different positions. In the old databases it is the first column (column name 'features'), while in the new ones it is the last (column name 'motifs').

Unfortunately, the code for 03_addSignificantGenes.R assumes that the first column contains the motif names (my comments):

.getSignificantGenes <- function(geneSet,
                                 rankings,
                                 signifRankingNames=NULL,
                                 method="iCisTarget",
                                 maxRank=5000,
                                 plotCurve=FALSE,
                                 genesFormat=c("geneList", "incidMatrix"),
                                 nCores=1,
                                 digits=3,
                                 nMean=50)
{...
  # the motifRankings S4 object becomes a dataframe
  rankings <- getRanking(rankings)
  # the 'indices' are obtained from the FIRST column!!!
  indexCol <- colnames(rankings)[1]
  ...
  # this will give you now a series of ranking values, as character... and not necessarily unique
  motifNames <- as.character(unlist(rankings[,indexCol]))
  # now you get repeated row.names as you have a list of numbers instead of unique motif names:
  gSetRanks <- data.frame(row.names=motifNames, rankings[,geneSet])
  # and this is where the error originates
  ...
}
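
To illustrate the failure mode described in the comments above, here is a minimal base-R sketch. It does not use RcisTarget: toy_rankings and its gene and motif names are made up for illustration, assuming a new-layout table where the gene columns come first and 'motifs' comes last.

```r
# Toy stand-in (hypothetical data) for a new-layout rankings table:
# gene columns first, 'motifs' column last.
toy_rankings <- data.frame(
  geneA  = c(3L, 1L, 3L),
  geneB  = c(2L, 2L, 1L),
  motifs = c("motif_1", "motif_2", "motif_3"),
  stringsAsFactors = FALSE
)

# Taking the FIRST column as the index picks up ranking values, not motif names.
indexCol <- colnames(toy_rankings)[1]
motifNames <- as.character(unlist(toy_rankings[, indexCol]))

# Ranking values can repeat, so data.frame() rejects them as row names.
res <- tryCatch(
  data.frame(row.names = motifNames, toy_rankings[, c("geneA", "geneB")]),
  error = function(e) conditionMessage(e)
)
print(res)
```

With the 'motifs' column in first position, the same two lines would instead produce unique motif names and the data.frame() call would succeed.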

I think this was intended to be handled earlier, within importRankings, which does:

indexCol <- intersect(allColumns, c('motifs', 'tracks', 'features'))#  [1]
if(verbose) message("Using the column '", indexCol, "' as feature index for the ranking database.")

So in principle the lookup is independent of position. However, I think indexCol is not passed on to cisTarget, and the comment makes clear that the motif names are expected to be in the first column of the dataframe.

However, importRankings does not print the message I would expect. I have been using the Drosophila motifRankings, both "new" and "old".
When I import the old one, I get the expected message:

> motifRankings_old <- importRankings("resources/motifdbs/old/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
Using the column 'features' as feature index for the ranking database.

But with the new, I get:

> motifRankings_new <- importRankings(".../.../dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")
Using the column '128up' as feature index for the ranking database.

'128up' is the name of the first Drosophila gene by alphanumeric ordering... but this cannot be the result of intersect(allColumns, c('motifs', 'tracks', 'features'))... I must be missing something ¯\_(ツ)_/¯

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:

motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)
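
For anyone who prefers not to depend on dplyr, the same reordering can be sketched in base R. This is only an illustration on a toy table: move_index_first is a hypothetical helper (not part of RcisTarget), and it assumes the rankings are held in a plain data.frame or tibble with exactly one of the columns 'motifs', 'tracks' or 'features'.

```r
# Hypothetical helper: move the feature-index column to the first position.
move_index_first <- function(tbl, candidates = c("motifs", "tracks", "features")) {
  indexCol <- intersect(colnames(tbl), candidates)
  if (length(indexCol) != 1) {
    stop("Expected exactly one index column, found: ", length(indexCol))
  }
  # Index column first, all remaining columns in their original order.
  tbl[, c(indexCol, setdiff(colnames(tbl), indexCol)), drop = FALSE]
}

# Toy table (made-up data) in the new layout, with 'motifs' last.
toy <- data.frame(
  geneA  = c(3L, 1L),
  geneB  = c(2L, 2L),
  motifs = c("motif_1", "motif_2"),
  stringsAsFactors = FALSE
)

reordered <- move_index_first(toy)
print(colnames(reordered))
```

Applied to a real database, this would presumably be called on the rankings slot in the same way as the relocate() line above, but I have only checked it on the toy table.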

Hope this helps.

@davidsanin

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:

motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)

This does it! Thanks for the advice!
