ValueError: need at least one array to concatenate #7

mibo1996 · 2021-04-13T23:51:31Z

Hi,

I am trying to use infercnvpy on my single-cell dataset, but when I run:
cnv.tl.infercnv(
adata,
reference_key="leiden_1.0",
reference_cat='12',
)

I get the error message: ValueError: need at least one array to concatenate.

When I run:
adata.var.loc[:, ["gene_ids", "chromosome", "start", "end"]].head()
I can see that these columns were successfully added to adata.var

When I run:
adata.obs.head()
I can see that there is a column, 'leiden_1.0' with clusters numbered '0'-'16'.

I'm not sure what this error message means or what I can do differently to fix it.

Thank you

suegrimes · 2021-04-24T02:02:15Z

I am also getting the same error, running:
infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_cat=['0','1'], window_size=200)

ValueError: need at least one array to concatenate

seu_epi
AnnData object with n_obs × n_vars = 5136 × 16784
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mito', 'nCount_SCT', 'nFeature_SCT', 'S.Score', 'G2M.Score', 'Phase', 'CC.Difference', 'SCT_snn_res.0.8', 'seurat_clusters', 'tree.ident', 'condition', 'patientID', 'pANN', 'DF', 'SCT_snn_res.1', 'SCT_snn_res.0.6'
var: 'sct.detection_rate', 'sct.gmean', 'sct.variance', 'sct.residual_mean', 'sct.residual_variance', 'sct.variable', 'chromosome', 'start', 'end', 'gene_id', 'gene_name'
uns: 'neighbors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances'

seu_epi.X
array([[-0.12743472, -0.12882212, -0.06847287, ..., -0.03828714,
-0.01745232, -0.03993058],
[-0.10143117, -0.10215754, -0.0520843 , ..., -0.0263906 ,
-0.00942162, -0.02850649],
[-0.1182593 , -0.11941153, -0.06270597, ..., -0.03406237,
-0.01457542, -0.03586897],
...,
[-0.10151398, -0.10224243, -0.05213675, ..., -0.02642813,
-0.00944661, -0.02854246],
[-0.06104027, -0.06079169, -0.02616703, ..., -0.00829963,
0.00237056, -0.01121682],
[-0.07975152, -0.07994411, -0.03827017, ..., -0.01663157,
-0.00299579, -0.01916628]])

seu_epi.obs.loc[:, ['seurat_clusters','condition']].head()
seurat_clusters condition
AACTGGTCATTCACTT-1 17 tumor_appendix
ACGCCGATCTGAAAGA-1 3 tumor_appendix
ACTGATGAGCACCGTC-1 3 tumor_appendix
AGCATACTCTACTCAT-1 10 tumor_appendix
CAACCTCCACAACTGT-1 7 tumor_appendix

grst · 2021-04-25T06:53:23Z

Hi,

thanks for reporting, I'll look into it.
Could you please provide the full stacktrace of the error (together with the error message, the line numbers in the code where the error occurs should show)? That would be extremely helpful for debugging!

suegrimes · 2021-04-25T16:12:20Z

Hi Gregor - thanks for looking into this! Stacktrace below:

/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/anndata/c                                            ompat/__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['dis                                                     tances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name']
WARNING: GTF file misses annotation for 13 genes in adata.
WARNING: Skipped 13 genes because of duplicate identifiers in GTF file.
... storing 'orig.ident' as categorical
... storing 'Phase' as categorical
... storing 'seurat_clusters' as categorical
... storing 'condition' as categorical
... storing 'patientID' as categorical
... storing 'DF' as categorical
... storing 'chromosome' as categorical
... storing 'gene_id' as categorical
... storing 'gene_name' as categorical
WARNING: Skipped 26 genes because they don't have a genomic position annotated.
/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/numpy/lib                                                     /arraysetops.py:583: FutureWarning: elementwise comparison failed; returning sca                                                     lar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Traceback (most recent call last):
  File "run_infercnv.py", line 18, in <module>
    infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_c                                                     at=[0,1], window_size=200)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/i                                                     nfercnvpy/tl/_infercnv.py", line 164, in infercnv
    reference = _get_reference(tmp_adata, reference_key, reference_cat, referenc                                                     e)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/i                                                     nfercnvpy/tl/_infercnv.py", line 305, in _get_reference
    raise ValueError(
ValueError: The following reference categories were not found in adata.obs[refer                                                     ence_key]: [0 1]
(/venvs/anaconda/envs/groupenvs/infercnv-py) sgrimes@ikura:C03_210423_inferCNV_p                                                     yTest$ python run_infercnv.py
/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/anndata/c                                                     ompat/__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['dis                                                     tances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name']
WARNING: GTF file misses annotation for 13 genes in adata.
WARNING: Skipped 13 genes because of duplicate identifiers in GTF file.
... storing 'orig.ident' as categorical
... storing 'Phase' as categorical
... storing 'seurat_clusters' as categorical
... storing 'condition' as categorical
... storing 'patientID' as categorical
... storing 'DF' as categorical
... storing 'chromosome' as categorical
... storing 'gene_id' as categorical
... storing 'gene_name' as categorical
WARNING: Skipped 26 genes because they don't have a genomic position annotated.
  0%|                                                                                                          | 0/2 [00:11<?, ?it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 353, in _infercnv_chunk
    chr_pos, x_smoothed = _running_mean_by_chromosome(
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 273, in _running_mean_by_chromosome
    return chr_start_pos, np.hstack(running_means)
  File "<__array_function__ internals>", line 5, in hstack
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_infercnv.py", line 18, in <module>
    infercnvpy.tl.infercnv(seu_epi, reference_key='seurat_clusters', reference_cat=['0','1'], window_size=200)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/infercnvpy/tl/_infercnv.py", line 169, in infercnv
    *process_map(
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/venvs/anaconda/envs/groupenvs/infercnv-py/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
ValueError: need at least one array to concatenate

suegrimes · 2021-04-25T16:35:25Z

In case it factors into the troubleshooting, my AnnData object started out as a Seurat 3.x object, updated to Seurat 4.0, then converted to hd5, then to AnnData.

grst · 2021-04-26T11:12:28Z

Can you please post the result of

adata.obs["chromosome"].unique()

right before running tl.infercnv?

suegrimes · 2021-04-26T14:34:48Z

I assume you mean adata.var["chromosome"].unique()? Results below. I see that there is a nan entry, probably due to the duplicate genes being removed. Is that the issue?

seu_epi.var["chromosome"].unique()
array(['1', nan, '2', '3', '4', '5', '6', '7', 'X', '8', '9', '11', '10',
'12', '13', '14', '15', '16', '17', '18', '20', '19', 'Y', '22',
'21', 'MT', 'GL000194.1', 'GL000195.1', 'GL000219.1', 'KI270734.1',
'GL000218.1', 'KI270726.1', 'KI270711.1', 'KI270721.1'],
dtype=object)

grst · 2021-04-26T17:33:01Z

Thanks! The problem is that I expected all chromosomes (as opposed to scaffolds) to start with chr.

infercnvpy/infercnvpy/tl/_infercnv.py

Lines 255 to 257 in c5bd39b

    
           chromosomes = _natural_sort( 
        
               [x for x in var["chromosome"].unique() if x.startswith("chr") and x != "chrM"] 
        
           )

What GTF file are you using? I only tried with GENCODE I think.

suegrimes · 2021-04-26T17:38:34Z

Ah, glad it's an easy explanation! I'm using the gtf file provided for cellranger by 10X genomics. I think it is ensembl. The first few lines below:

#!genome-build GRCh38.p12

#!genome-version GRCh38

#!genome-date 2013-12

#!genome-build-accession NCBI:GCA_000001405.27

#!genebuild-last-updated 2018-01

1 havana gene 29554 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"

1 havana transcript 29554 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"

1 havana exon 29554 30039 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001947070"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana exon 30564 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "2"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001922571"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana exon 30976 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "3"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001827679"; exon_version "1"; tag "basic"; transcript_support_level "5"

1 havana transcript 30267 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"

1 havana exon 30267 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001841699"; exon_version "1"; tag "basic"; transcript_support_level "5"

grst · 2021-04-26T17:43:11Z

Ok, I'll look into improving the genomic_position_from_gtf function at some point that it supports other GTF files as well.
Let's keep the issue open until then.

In the meanwhile, you could try a GENCODE file, or just modify the chromosome column in adata.var such that it has the chr prefix.

suegrimes · 2021-04-26T17:51:27Z

ok got it. I will try one of those two things. Thanks for being so responsive looking into this!

stefanpeidli · 2021-05-17T14:08:29Z

In the meantime a small and dirty trick:
adata.var['chromosome'] = ['chr'+str(i) for i in adata.var['chromosome']]

Then it worked for me.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ValueError: need at least one array to concatenate #7

ValueError: need at least one array to concatenate #7

mibo1996 commented Apr 13, 2021

suegrimes commented Apr 24, 2021

grst commented Apr 25, 2021

suegrimes commented Apr 25, 2021 •

edited by grst

Loading

suegrimes commented Apr 25, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

stefanpeidli commented May 17, 2021

ValueError: need at least one array to concatenate #7

ValueError: need at least one array to concatenate #7

Comments

mibo1996 commented Apr 13, 2021

suegrimes commented Apr 24, 2021

grst commented Apr 25, 2021

suegrimes commented Apr 25, 2021 • edited by grst Loading

suegrimes commented Apr 25, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

grst commented Apr 26, 2021

suegrimes commented Apr 26, 2021

stefanpeidli commented May 17, 2021

suegrimes commented Apr 25, 2021 •

edited by grst

Loading