Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

Closed
2 of 3 tasks
oligomyeggo opened this issue Mar 17, 2022 · 8 comments · Fixed by #2546
Closed
2 of 3 tasks

Saving/reading .h5ad file results in altered adata.uns['log1p'] #2181

oligomyeggo opened this issue Mar 17, 2022 · 8 comments · Fixed by #2546
Labels

Comments

@oligomyeggo
Copy link

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the master branch of scanpy.

Hello. I noticed an issue when trying to run tl.rank_genes_groups, which was the result of adata.uns['log1p'] being blank (i.e., an empty dictionary, {}). I have gone through my workflow and confirmed that adata.uns['log1p'] is the expected {'base': None} after each preprocessing step, so I don't think the issue is with any of preprocessing code. However, when I save my adata object to a .h5ad file using the .write() function and then read my .h5ad file using the sc.read() function, when I check adata.uns['log1p'] it is an empty dictionary - so maybe the issue is either in the writing or reading function? I am able to manually set adata.uns['log1p'] to {'base': None} after reading the file, and can then run downstream functions like tl.rank_genes_groups without issue. I have not had this problem previously when reading .h5ad files (into either the same Jupyter notebook or into a new Jupyter notebook). Since I can manually set adata.uns['log1p'] to {'base': None}, I don't think this issue is pressing. It's just a little strange to me. Thank you for any help/advice!

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Minimal code sample (that we can copy&paste without having any data)

This can all be run in one Jupyter notebook and should produce the issue (unless it's something exclusively on my end; I've been able to reproduce the error with my own data and one of the scanpy built-in test datasets). Sorry the code chunks are broken up/a little long; I am using the scran normalization approach outlined in the single cell tutorial.

adata = sc.datasets.pbmc3k()
sc.pp.filter_genes(adata, min_cells = 1)

# scran normalization
adata_pp = adata.copy()
sc.pp.normalize_per_cell(adata_pp, counts_per_cell_after = 1e6)
sc.pp.log1p(adata_pp)
sc.pp.pca(adata_pp, n_comps = 15)
sc.pp.neighbors(adata_pp)
sc.tl.leiden(adata_pp, key_added = 'groups', resolution = 0.5)
input_groups = adata_pp.obs['groups']
data_mat = adata.X.T
%%R -i data_mat -i input_groups -o size_factors
size_factors = sizeFactors(computeSumFactors(SingleCellExperiment(list(counts = data_mat)), 
                                             clusters = input_groups, 
                                             min.mean = 0.1))
del adata_pp
adata.obs['size_factors'] = size_factors
adata.layers['counts'] = adata.X.copy()
adata.X /= adata.obs['size_factors'].values[:,None]
sc.pp.log1p(adata)
adata.X = sp.sparse.csr_matrix(adata.X)
adata.raw = adata

sc.pp.highly_variable_genes(adata, flavor = 'cell_ranger', n_top_genes = 2000)
sc.pp.pca(adata, n_comps = 50, use_highly_variable = True, svd_solver = 'arpack')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
adata.uns['log1p'] # this produces: {'base': None}
adata.write('test.h5ad')

adata = sc.read('test.h5ad')
adata.uns['log1p'] # the now produces: {}
sc.tl.leiden(adata, key_added='leiden_r1.0')
sc.tl.rank_genes_groups(adata, groupby='leiden_r1.0', key_added='rank_genes_all', method='wilcoxon') # this causes an error
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [13], in <cell line: 2>()
      1 # Calculate marker genes
----> 2 sc.tl.rank_genes_groups(adata, groupby='leiden_r1.0', key_added='rank_genes_all', method='wilcoxon', use_raw=False)

File /usr/local/lib/python3.8/dist-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    580 adata.uns[key_added] = {}
    581 adata.uns[key_added]['params'] = dict(
    582     groupby=groupby,
    583     reference=reference,
   (...)
    587     corr_method=corr_method,
    588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
    592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
    593     logg.warning(
    594         "It seems you use rank_genes_groups on the raw count data. "
    595         "Please logarithmize your data before calling rank_genes_groups."
    596     )

File /usr/local/lib/python3.8/dist-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
     82 def __init__(
     83     self,
     84     adata,
   (...)
     90     comp_pts=False,
     91 ):
---> 93     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
     94         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
     95     else:

KeyError: 'base'

Versions


anndata 0.8.0rc1
scanpy 1.8.2
sinfo 0.3.4

PIL 9.0.1
anndata2ri 1.0.6
annoy NA
asttokens NA
backcall 0.2.0
backports NA
beta_ufunc NA
binom_ufunc NA
cffi 1.15.0
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
dunamai 1.11.0
entrypoints 0.4
executing 0.8.3
get_version 3.5.4
h5py 3.6.0
hypergeom_ufunc NA
igraph 0.9.9
ipykernel 6.9.1
ipython_genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.1
jinja2 3.0.3
joblib 1.1.0
jupyter_server 1.13.5
kiwisolver 1.4.0
leidenalg 0.8.9
llvmlite 0.38.0
markupsafe 2.1.1
matplotlib 3.5.1
matplotlib_inline NA
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
numba 0.55.1
numexpr 2.8.1
numpy 1.21.0
packaging 21.3
pandas 1.4.1
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.28
ptyprocess 0.7.0
pure_eval 0.2.2
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pynndescent 0.5.6
pyparsing 3.0.7
pytz 2021.3
pytz_deprecation_shim NA
rpy2 3.4.2
scipy 1.8.0
scrublet NA
seaborn 0.11.2
sitecustomize NA
six 1.14.0
skimage 0.19.2
sklearn 1.0.2
stack_data 0.2.0
statsmodels 0.13.2
tables 3.7.0
texttable 1.6.4
threadpoolctl 3.1.0
tornado 6.1
tqdm 4.63.0
traitlets 5.1.1
typing_extensions NA
tzlocal NA
umap 0.5.2
wcwidth 0.2.5
zmq 22.3.0

IPython 8.1.1
jupyter_client 7.1.2
jupyter_core 4.9.2
jupyterlab 3.3.1
notebook 6.4.8

Python 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
Linux-5.10.76-linuxkit-x86_64-with-glibc2.29
8 logical CPU cores, x86_64

Session information updated at 2022-03-17 17:22

@ghost
Copy link

ghost commented Mar 25, 2022

Have the same, changing anndata to 0.7.8 version helps;

@bclopesrs
Copy link

Could someone please tell me how I can save a h5ad file from a Jupyter Notebook that I've made some modifications with another name and in the same path as before?

@Gbighe
Copy link

Gbighe commented Jun 25, 2022

Hi,
I found the answer to this question in another Q&A, please refer to
(https://bytemeta.vip/repo/scverse/scanpy/issues/2239)

@maximilianh
Copy link
Contributor

Got this on scanpy 1.9.1, did what #2239 suggested, as a temporary workaround

adata.uns['log1p']["base"] = None

@sbwiecko
Copy link

It looks like the adata.uns['log1p']["base"] = None is lost at the next reading.

Using an h5ad data set saved in result_file, the following happens:

adata = sc.read_h5ad(results_file)
adata.uns['log1p']['base'] is None
#---------------------------------------------------------------------------
#KeyError                                  Traceback (most recent call last)
#/home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py in line 2
#      [153](file:///home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py?line=152) adata = sc.read(results_file)
#----> [154](file:///home/sebastien/workshop_scRNA-Seq_UCLA/scanpy_tut3.py?line=153) adata.uns['log1p']['base'] is None

#
#KeyError: 'base'

adata.uns['log1p']["base"] = None # see https://github.com/scverse/scanpy/issues/2239
adata.uns['log1p']['base'] is None
# now returns True

adata.write(results_file)
adata.uns['log1p']['base'] is None
# still returns True

adata = sc.read_h5ad(results_file) # same with adata = sc.read(results_file)
adata.uns['log1p']['base'] is None
#
#KeyError: 'base'

This was with sc.__version__ 1.9.1

@hl-xue
Copy link

hl-xue commented Jan 24, 2023

Hello, for information: I encountered the same problem with scanpy=1.9.1 either with anndata=0.8.0 or 0.7.8 or 0.7.6.

@Phil-Momoka
Copy link

I found it useful by calling
scanpy.pp.log1p(adata)
again before the function that returns the keyerror:base

For example, in the PBMC3K tutorial, calling this function again before step 43: Comparing to a single cluster.
sc.pp.log1p(adata)
sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')
sc.pl.rank_genes_groups(adata, groups=['0'], n_genes=20)

I am new to scanpy so I do not know what exactly is going on here so all I can say is "for some reason" the code works this way

@flying-sheep
Copy link
Member

The log1p→base thing was fixed in #2546

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
9 participants