Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

oligomyeggo · 2022-03-17T18:10:12Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of scanpy.
(optional) I have confirmed this bug exists on the master branch of scanpy.

Hello. I noticed an issue when trying to run tl.rank_genes_groups, which was the result of adata.uns['log1p'] being blank (i.e., an empty dictionary, {}). I have gone through my workflow and confirmed that adata.uns['log1p'] is the expected {'base': None} after each preprocessing step, so I don't think the issue is with any of preprocessing code. However, when I save my adata object to a .h5ad file using the .write() function and then read my .h5ad file using the sc.read() function, when I check adata.uns['log1p'] it is an empty dictionary - so maybe the issue is either in the writing or reading function? I am able to manually set adata.uns['log1p'] to {'base': None} after reading the file, and can then run downstream functions like tl.rank_genes_groups without issue. I have not had this problem previously when reading .h5ad files (into either the same Jupyter notebook or into a new Jupyter notebook). Since I can manually set adata.uns['log1p'] to {'base': None}, I don't think this issue is pressing. It's just a little strange to me. Thank you for any help/advice!

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Minimal code sample (that we can copy&paste without having any data)

This can all be run in one Jupyter notebook and should produce the issue (unless it's something exclusively on my end; I've been able to reproduce the error with my own data and one of the scanpy built-in test datasets). Sorry the code chunks are broken up/a little long; I am using the scran normalization approach outlined in the single cell tutorial.

adata = sc.datasets.pbmc3k()
sc.pp.filter_genes(adata, min_cells = 1)

# scran normalization
adata_pp = adata.copy()
sc.pp.normalize_per_cell(adata_pp, counts_per_cell_after = 1e6)
sc.pp.log1p(adata_pp)
sc.pp.pca(adata_pp, n_comps = 15)
sc.pp.neighbors(adata_pp)
sc.tl.leiden(adata_pp, key_added = 'groups', resolution = 0.5)
input_groups = adata_pp.obs['groups']
data_mat = adata.X.T

%%R -i data_mat -i input_groups -o size_factors
size_factors = sizeFactors(computeSumFactors(SingleCellExperiment(list(counts = data_mat)), 
                                             clusters = input_groups, 
                                             min.mean = 0.1))

del adata_pp
adata.obs['size_factors'] = size_factors
adata.layers['counts'] = adata.X.copy()
adata.X /= adata.obs['size_factors'].values[:,None]
sc.pp.log1p(adata)
adata.X = sp.sparse.csr_matrix(adata.X)
adata.raw = adata

sc.pp.highly_variable_genes(adata, flavor = 'cell_ranger', n_top_genes = 2000)
sc.pp.pca(adata, n_comps = 50, use_highly_variable = True, svd_solver = 'arpack')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
adata.uns['log1p'] # this produces: {'base': None}
adata.write('test.h5ad')

adata = sc.read('test.h5ad')
adata.uns['log1p'] # the now produces: {}
sc.tl.leiden(adata, key_added='leiden_r1.0')
sc.tl.rank_genes_groups(adata, groupby='leiden_r1.0', key_added='rank_genes_all', method='wilcoxon') # this causes an error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [13], in <cell line: 2>()
      1 # Calculate marker genes
----> 2 sc.tl.rank_genes_groups(adata, groupby='leiden_r1.0', key_added='rank_genes_all', method='wilcoxon', use_raw=False)

File /usr/local/lib/python3.8/dist-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    580 adata.uns[key_added] = {}
    581 adata.uns[key_added]['params'] = dict(
    582     groupby=groupby,
    583     reference=reference,
   (...)
    587     corr_method=corr_method,
    588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
    592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
    593     logg.warning(
    594         "It seems you use rank_genes_groups on the raw count data. "
    595         "Please logarithmize your data before calling rank_genes_groups."
    596     )

File /usr/local/lib/python3.8/dist-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
     82 def __init__(
     83     self,
     84     adata,
   (...)
     90     comp_pts=False,
     91 ):
---> 93     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
     94         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
     95     else:

KeyError: 'base'

Versions

anndata 0.8.0rc1
scanpy 1.8.2
sinfo 0.3.4

PIL 9.0.1
anndata2ri 1.0.6
annoy NA
asttokens NA
backcall 0.2.0
backports NA
beta_ufunc NA
binom_ufunc NA
cffi 1.15.0
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
dunamai 1.11.0
entrypoints 0.4
executing 0.8.3
get_version 3.5.4
h5py 3.6.0
hypergeom_ufunc NA
igraph 0.9.9
ipykernel 6.9.1
ipython_genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.1
jinja2 3.0.3
joblib 1.1.0
jupyter_server 1.13.5
kiwisolver 1.4.0
leidenalg 0.8.9
llvmlite 0.38.0
markupsafe 2.1.1
matplotlib 3.5.1
matplotlib_inline NA
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
numba 0.55.1
numexpr 2.8.1
numpy 1.21.0
packaging 21.3
pandas 1.4.1
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.28
ptyprocess 0.7.0
pure_eval 0.2.2
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pynndescent 0.5.6
pyparsing 3.0.7
pytz 2021.3
pytz_deprecation_shim NA
rpy2 3.4.2
scipy 1.8.0
scrublet NA
seaborn 0.11.2
sitecustomize NA
six 1.14.0
skimage 0.19.2
sklearn 1.0.2
stack_data 0.2.0
statsmodels 0.13.2
tables 3.7.0
texttable 1.6.4
threadpoolctl 3.1.0
tornado 6.1
tqdm 4.63.0
traitlets 5.1.1
typing_extensions NA
tzlocal NA
umap 0.5.2
wcwidth 0.2.5
zmq 22.3.0

IPython 8.1.1
jupyter_client 7.1.2
jupyter_core 4.9.2
jupyterlab 3.3.1
notebook 6.4.8

Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
Linux-5.10.76-linuxkit-x86_64-with-glibc2.29
8 logical CPU cores, x86_64

Session information updated at 2022-03-17 17:22

The text was updated successfully, but these errors were encountered:

ghost · 2022-03-25T03:14:49Z

Have the same, changing anndata to 0.7.8 version helps;

bclopesrs · 2022-05-26T19:56:47Z

Could someone please tell me how I can save a h5ad file from a Jupyter Notebook that I've made some modifications with another name and in the same path as before?

Gbighe · 2022-06-25T14:27:04Z

Hi,
I found the answer to this question in another Q&A, please refer to
(https://bytemeta.vip/repo/scverse/scanpy/issues/2239)

maximilianh · 2022-10-12T13:33:58Z

Got this on scanpy 1.9.1, did what #2239 suggested, as a temporary workaround

adata.uns['log1p']["base"] = None

sbwiecko · 2022-12-16T15:39:51Z

It looks like the adata.uns['log1p']["base"] = None is lost at the next reading.

Using an h5ad data set saved in result_file, the following happens:

adata = sc.read_h5ad(results_file)
adata.uns['log1p']['base'] is None
#---------------------------------------------------------------------------
#KeyError                                  Traceback (most recent call last)
#/home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py in line 2
#      [153](file:///home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py?line=152) adata = sc.read(results_file)
#----> [154](file:///home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py?line=153) adata.uns['log1p']['base'] is None

#
#KeyError: 'base'

adata.uns['log1p']["base"] = None # see https://github.com/scverse/scanpy/issues/2239
adata.uns['log1p']['base'] is None
# now returns True

adata.write(results_file)
adata.uns['log1p']['base'] is None
# still returns True

adata = sc.read_h5ad(results_file) # same with adata = sc.read(results_file)
adata.uns['log1p']['base'] is None
#
#KeyError: 'base'

This was with sc.__version__ 1.9.1

hl-xue · 2023-01-24T11:35:42Z

Hello, for information: I encountered the same problem with scanpy=1.9.1 either with anndata=0.8.0 or 0.7.8 or 0.7.6.

Phil-Momoka · 2023-07-14T05:33:53Z

I found it useful by calling
scanpy.pp.log1p(adata)
again before the function that returns the keyerror:base

For example, in the PBMC3K tutorial, calling this function again before step 43: Comparing to a single cluster.
sc.pp.log1p(adata)
sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')
sc.pl.rank_genes_groups(adata, groups=['0'], n_genes=20)

I am new to scanpy so I do not know what exactly is going on here so all I can say is "for some reason" the code works this way

flying-sheep · 2023-07-18T11:48:45Z

The log1p→base thing was fixed in #2546

Zethson added the Bug 🐛 label Apr 6, 2022

auesro mentioned this issue Apr 20, 2022

KeyError: 'base' when running tl.rank_genes_groups #2239

Closed

3 tasks

adkinsrs mentioned this issue May 18, 2022

Group labeling headers show up before click on clustering step, single-cell wb IGS/gEAR#307

Closed

LuckyMD mentioned this issue May 19, 2022

Key Error "base" in section "marker genes & annotation" theislab/single-cell-tutorial#97

Closed

DriesSchaumont mentioned this issue Jul 15, 2022

Fix checking of log1p transformation base value when that value is None. #2294

Closed

flying-sheep closed this as completed Jul 18, 2023

flying-sheep mentioned this issue Jul 18, 2023

Fix getting log1p base #2546

Merged

flying-sheep mentioned this issue Jul 27, 2023

read harmonypy data can not find deg #2440

Closed

3 tasks

fasterius mentioned this issue Apr 4, 2024

Use spatialdata nf-core/spatialvi#67

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

oligomyeggo commented Mar 17, 2022

anndata 0.8.0rc1
scanpy 1.8.2
sinfo 0.3.4

IPython 8.1.1
jupyter_client 7.1.2
jupyter_core 4.9.2
jupyterlab 3.3.1
notebook 6.4.8

Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
Linux-5.10.76-linuxkit-x86_64-with-glibc2.29
8 logical CPU cores, x86_64

ghost commented Mar 25, 2022

bclopesrs commented May 26, 2022

Gbighe commented Jun 25, 2022

maximilianh commented Oct 12, 2022

sbwiecko commented Dec 16, 2022

hl-xue commented Jan 24, 2023

Phil-Momoka commented Jul 14, 2023

flying-sheep commented Jul 18, 2023

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

Comments

oligomyeggo commented Mar 17, 2022

Minimal code sample (that we can copy&paste without having any data)

Versions

anndata 0.8.0rc1 scanpy 1.8.2 sinfo 0.3.4

IPython 8.1.1 jupyter_client 7.1.2 jupyter_core 4.9.2 jupyterlab 3.3.1 notebook 6.4.8

Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] Linux-5.10.76-linuxkit-x86_64-with-glibc2.29 8 logical CPU cores, x86_64

ghost commented Mar 25, 2022

bclopesrs commented May 26, 2022

Gbighe commented Jun 25, 2022

maximilianh commented Oct 12, 2022

sbwiecko commented Dec 16, 2022

hl-xue commented Jan 24, 2023

Phil-Momoka commented Jul 14, 2023

flying-sheep commented Jul 18, 2023

anndata 0.8.0rc1
scanpy 1.8.2
sinfo 0.3.4

IPython 8.1.1
jupyter_client 7.1.2
jupyter_core 4.9.2
jupyterlab 3.3.1
notebook 6.4.8

Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
Linux-5.10.76-linuxkit-x86_64-with-glibc2.29
8 logical CPU cores, x86_64