RcisTarget::addSignificantGenes error #27

joel-tuberosa · 2022-08-16T16:01:35Z

Hello,

I would like to perform an enrichment analysis with the following data:

target_genes - a vector of gene names corresponding to the tested set

motif_rankings - the loaded database mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather downloaded from here

motifAnnotations_mgi - annotation data loaded from the package with data(motifAnnotations_mgi)

I am running the following commands:

motifs_AUC <- calcAUC(target_genes, motif_rankings)
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,  motifAnnot=motifAnnotations_mgi)
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable, 
                                                   geneSets=target_genes,
                                                   rankings=motif_rankings, 
                                                   nCores=1,
                                                   method="aprox")

And I got this error message from the last command:

Error in data.frame(row.names = motifNames, rankings[, geneSet]) : 
  duplicate row.names: 16388, 13294, 17330, 17112, 4188, 16530, 16844, 18101, 17737, 18186, 16084, 18886, 11338, 12655, 16219, 18026, 15061, 16371, 14701, 17214, 18246, 16884, 14225, 6681, 18323, 17761, 17628, 16022, 17015, 18869, 15726, 16565, 16104, 14604, 16384, 15421, 16625, 16326, 15902, 17124, 18335, 18696, 9916, 15847, 14092, 17177, 15993, 17593, 16026, 18152, 14512, 16552, 16644, 19879, 18012, 17748, 18443, 16515, 17100, 17378, 17796, 19198, 18076, 16489, 18470, 14162, 17199, 18253, 16231, 17396, 18081, 17258, 15458, 17295, 15894, 17249, 18312, 17144, 13580, 8484, 16764, 15581, 12946, 19774, 15787, 18527, 18199, 18438, 17575, 17425, 16641, 11742, 18372, 17682, 16088, 17187, 15967, 18070, 17644, 14814, 14675, 17816, 18090, 14718, 17172, 14284, 18289, 18512, 16494, 17723, 15823, 18852, 14540, 17799, 15400, 11594, 17008, 18074, 16253, 17293, 18373, 14628, 13187, 18236, 14654, 17097, 16927, 15662, 11932, 17926, 18632, 18596, 17650, 17991, 17725, 16096, 16249, 10919, 17093, 1

Do you have an idea how to fix this?

Thank you in advance.

Joël

ZYT-ZhangYunTao19941116 · 2022-10-15T11:22:17Z

I encountered the same problem. After reviewing the source code, I found that the cause was the motif_rankings file "mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather", which appears to contain many duplicate motif names. The solution is to use the old file named "mm9-tss-centered-10kb-10species.mc9nr.feather" from https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/

davidsanin · 2023-04-19T21:50:12Z

@joel-tuberosa, did you get anywhere with this other than changing to an old annotation file? I am having the exact same error.

ZYT-ZhangYunTao19941116 · 2023-04-20T00:36:31Z

I just switched to the old file, and then everything worked.

jdenavascues · 2023-04-25T14:23:52Z

I think I know what the problem is: the new and old versions of the databases store the column with the motif names in different positions. In the old databases it is the first column (column name 'features'), while in the new ones it is the last (column name 'motifs').

Unfortunately, the code for 03_addSignificantGenes.R assumes that the first column contains the motif names (my comments):

.getSignificantGenes <- function(geneSet,
                                 rankings,
                                 signifRankingNames=NULL,
                                 method="iCisTarget",
                                 maxRank=5000,
                                 plotCurve=FALSE,
                                 genesFormat=c("geneList", "incidMatrix"),
                                 nCores=1,
                                 digits=3,
                                 nMean=50)
{...
  # the motifRankings S4 object becomes a dataframe
  rankings <- getRanking(rankings)
  # the 'indices' are obtained from the FIRST column!!!
  indexCol <- colnames(rankings)[1]
  ...
  # this will give you now a series of ranking values, as character... and not necessarily unique
  motifNames <- as.character(unlist(rankings[,indexCol]))
  # now you get repeated row.names as you have a list of numbers instead of unique motif names:
  gSetRanks <- data.frame(row.names=motifNames, rankings[,geneSet])
  # and this is where the error originates
  ...
}
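
The mechanism described in the comments above can be reproduced on a toy data frame. This is only a sketch: the column names and values are invented, and a plain data.frame stands in for whatever getRanking() actually returns.

```r
# Toy stand-in for the ranking table (invented values, new-style layout:
# gene columns first, motif names in the last column).
rankings <- data.frame(
  geneA  = c(3L, 1L, 3L),
  geneB  = c(2L, 2L, 1L),
  motifs = c("motif_1", "motif_2", "motif_3"),
  stringsAsFactors = FALSE
)

# As in the excerpt: whatever sits in the first column is used as the index.
indexCol   <- colnames(rankings)[1]   # picks "geneA", not "motifs"
motifNames <- as.character(unlist(rankings[, indexCol]))

# Rank values repeat, so they cannot be used as row names.
result <- tryCatch(
  data.frame(row.names = motifNames, rankings[, "geneB"]),
  error = function(e) conditionMessage(e)
)
print(result)
```

The captured message should be a "duplicate row.names" error of the same kind as the one reported at the top of this issue.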

I think this was meant to be handled earlier, within importRankings, which does:

indexCol <- intersect(allColumns, c('motifs', 'tracks', 'features'))#  [1]
if(verbose) message("Using the column '", indexCol, "' as feature index for the ranking database.")

So in principle the lookup is independent of position, but I think indexCol is not passed on to cisTarget, and the comment makes it clear that the motif names are expected to be in the first column of the data frame.
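
For illustration, a name-based lookup like the intersect() call above would behave as follows on toy tables in both layouts. The helper findIndexCol is hypothetical (it is not part of RcisTarget), and the data frames are invented.

```r
# Hypothetical helper: find the index column by name, wherever it sits.
findIndexCol <- function(df, candidates = c("motifs", "tracks", "features")) {
  hit <- intersect(colnames(df), candidates)
  if (length(hit) != 1L) {
    stop("Expected exactly one index column, found ", length(hit))
  }
  hit
}

# Old-style layout: index column first, named 'features'.
oldStyle <- data.frame(features = c("m1", "m2"), geneA = 1:2, geneB = 2:1)
# New-style layout: index column last, named 'motifs'.
newStyle <- data.frame(geneA = 1:2, geneB = 2:1, motifs = c("m1", "m2"))

print(findIndexCol(oldStyle))
print(findIndexCol(newStyle))
```

A lookup by name finds the right column in both cases, whereas colnames(df)[1] would return a gene column for the new-style table.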

However, I do not get the expected message when I run importRankings. I have been using the Drosophila motifRankings, both "new" and "old".
When I import them I get, with the old, the expected message:

> motifRankings_old <- importRankings("resources/motifdbs/old/dm6-5kb-upstream-full-tx-11species.mc8nr.feather")
Using the column 'features' as feature index for the ranking database.

But with the new, I get:

> motifRankings_new <- importRankings(".../.../dm6-5kb-upstream-full-tx-11species.mc8nr.genes_vs_motifs.rankings.feather")
Using the column '128up' as feature index for the ranking database.

'128up' is the name of the first Drosophila gene by alphanumeric ordering... but this cannot be the result of intersect(allColumns, c('motifs', 'tracks', 'features'))... I must be missing something ¯\_(ツ)_/¯

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:

motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)
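
For anyone who prefers not to depend on dplyr, the same reordering can be sketched in base R, with an added check that the motif names are unique. The helper moveIndexFirst is hypothetical and the data frame is a toy example; the real table would be the one held in motifRankings_new@rankings, as in the line above.

```r
# Hypothetical helper (not part of RcisTarget): move the motif-name column
# to the first position and check that it can serve as row names.
moveIndexFirst <- function(df, indexCol = "motifs") {
  stopifnot(indexCol %in% colnames(df))
  if (anyDuplicated(df[[indexCol]]) > 0) {
    stop("Column '", indexCol, "' contains duplicated names")
  }
  df[, c(indexCol, setdiff(colnames(df), indexCol)), drop = FALSE]
}

# Toy new-style table: gene columns first, motif names last.
newStyle <- data.frame(
  geneA  = c(3L, 1L, 3L),
  geneB  = c(2L, 2L, 1L),
  motifs = c("motif_1", "motif_2", "motif_3"),
  stringsAsFactors = FALSE
)

fixed <- moveIndexFirst(newStyle)
print(colnames(fixed))
```

After the reordering, code that reads the index from the first column picks up the motif names rather than a column of ranks.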

Hope this helps.

davidsanin · 2023-04-25T15:29:16Z

Anyway, the solution is to place the last column of the new database at the beginning before running cisTarget:
motifRankings_new@rankings <- dplyr::relocate(motifRankings_new@rankings, motifs)

This does it! Thanks for the advice!
