
adds codespell (#155)
* adds codespell

* remove notebooks

* remove notebook

* update to ignore ipynb pics

* fix typos

* adds release note
Intron7 authored Mar 27, 2024
1 parent d8ed247 commit a187f4a
Showing 26 changed files with 90 additions and 57 deletions.
22 changes: 22 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,22 @@
---
name: Codespell

on:
push:
branches: [main]
pull_request:
branches: [main]

permissions:
contents: read

jobs:
codespell:
name: Check for spelling errors
runs-on: ubuntu-latest

steps:
- name: Checkout
uses: actions/checkout@v4
- name: Codespell
uses: codespell-project/actions-codespell@v2
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -25,3 +25,9 @@ repos:
- id: no-commit-to-branch
args: [--branch=main]
- id: detect-private-key
- repo: https://github.com/codespell-project/codespell
rev: v2.2.6
hooks:
- id: codespell
additional_dependencies:
- tomli
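
With the CI workflow and the hook above in place, the same check can be reproduced locally. A minimal sketch, assuming `codespell` and `pre-commit` are installed in the environment (the commands are standard CLI usage, not part of this diff):

```python
# Minimal sketch: reproduce the spell check locally. Assumes codespell
# and pre-commit are installed (e.g. pip install codespell pre-commit tomli).
import subprocess

# codespell reads [tool.codespell] from pyproject.toml when tomli is
# available (or on Python >= 3.11) -- hence the additional_dependencies.
subprocess.run(["codespell", "."], check=False)

# Equivalently, run the hook defined above through pre-commit.
subprocess.run(["pre-commit", "run", "codespell", "--all-files"], check=False)
```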
2 changes: 1 addition & 1 deletion docs/Installation.md
@@ -24,7 +24,7 @@ It is important to ensure that the CUDA environment is set up correctly so that
To view a full guide how to set up a fully functioned single cell GPU accelerated conda environment visit [GPU_SingleCell_Setup](https://github.com/Intron7/GPU_SingleCell_Setup)


# GPU-Memory and System Requierments
# GPU-Memory and System Requirements

*rapids-singlecell* relays for most computation on the GPU. A GPU with sufficient VRAM is therefore required to handle large datasets.
With a RTX 3090 it's possible to analyze 200000 cells without any issues. With an A100 80GB it is even possible to analyze more than 1000000. For even larger datasets, {mod}`~rmm` is required to oversubscribe GPU memory into host memory, similar to SWAP memory. However, using `managed_memory` can result in a performance penalty, but this is still preferable to CPU runtimes.
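
The prose above only names `managed_memory`; as an illustration, a setup along these lines oversubscribes GPU memory before any data is loaded. This is a hedged sketch based on RMM's documented API, not part of this diff:

```python
# Hedged sketch: enable RMM managed memory so GPU allocations can spill
# into host memory (unified memory), as described above.
import cupy as cp
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=True,   # oversubscribe VRAM, similar to SWAP memory
    pool_allocator=False,  # a pool can speed things up at the cost of VRAM
)
cp.cuda.set_allocator(rmm_cupy_allocator)  # route CuPy allocations through RMM
```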
2 changes: 1 addition & 1 deletion docs/api/decoupler_gpu.md
@@ -1,6 +1,6 @@
# decoupler-GPU: `dcg`

{mod}`decoupler` contains different statistical methods to extract biological activities. {mod}`rapids_singlecell.dcg` acclerates some of these methods.
{mod}`decoupler` contains different statistical methods to extract biological activities. {mod}`rapids_singlecell.dcg` accelerates some of these methods.

```{eval-rst}
.. module:: rapids_singlecell.dcg
2 changes: 1 addition & 1 deletion docs/api/scanpy_gpu.md
@@ -1,6 +1,6 @@
# scanpy-GPU

These functions offer accelerated near drop-in replacements for common tools porvided by [`scanpy`](https://scanpy.readthedocs.io/en/stable/api/index.html).
These functions offer accelerated near drop-in replacements for common tools provided by [`scanpy`](https://scanpy.readthedocs.io/en/stable/api/index.html).

## Preprocessing `pp`
Filtering of highly-variable genes, batch-effect correction, per-cell normalization.
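
As an illustration of the near drop-in pattern this page describes, a hedged sketch (the example dataset and exact calls are assumptions, not taken from this diff; `anndata_to_GPU`/`anndata_to_CPU` are referenced in the release notes below):

```python
# Hedged sketch: move the AnnData to GPU memory, then use rsc.pp where
# scanpy's sc.pp would otherwise be called.
import scanpy as sc
import rapids_singlecell as rsc

adata = sc.datasets.pbmc3k()                   # small public example dataset
rsc.get.anndata_to_GPU(adata)                  # move .X into GPU memory
rsc.pp.normalize_total(adata, target_sum=1e4)  # per-cell normalization
rsc.pp.log1p(adata)
rsc.pp.highly_variable_genes(adata, n_top_genes=2000)
rsc.get.anndata_to_CPU(adata)                  # move results back when done
```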
2 changes: 1 addition & 1 deletion docs/api/squidpy_gpu.md
@@ -1,6 +1,6 @@
# squidpy-GPU: `gr`

{mod}`squidpy.gr` is a tool for the analysis of spatial molecular data. {mod}`rapids_singlecell.gr` acclerates some of these functions.
{mod}`squidpy.gr` is a tool for the analysis of spatial molecular data. {mod}`rapids_singlecell.gr` accelerates some of these functions.

```{eval-rst}
.. module:: rapids_singlecell.gr
8 changes: 4 additions & 4 deletions docs/notebooks/demo_gpu-PR.ipynb
@@ -14,7 +14,7 @@
"id": "comic-moses",
"metadata": {},
"source": [
"To run this notebook please make sure you have a working rapids enviroment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of 500000 brain cells from [Nvidia](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_cpu_analysis.ipynb)."
"To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of 500000 brain cells from [Nvidia](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_cpu_analysis.ipynb)."
]
},
{
@@ -490,7 +490,7 @@
"id": "arctic-upgrade",
"metadata": {},
"source": [
"Now we safe this verion of the AnnData as adata.raw."
"Now we safe this version of the AnnData as adata.raw."
]
},
{
@@ -777,7 +777,7 @@
"tags": []
},
"source": [
"## Clustering and Visulization"
"## Clustering and Visualization"
]
},
{
@@ -795,7 +795,7 @@
"source": [
"Next we compute the neighborhood graph using rsc.\n",
"\n",
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the excat graph. Both methods are valid, but you might see differences."
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the exact graph. Both methods are valid, but you might see differences."
]
},
{
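
The notebooks repeat this exact-versus-approximate point, so one hedged sketch of the step in question suffices (function names follow the scanpy-mirroring API; the parameter values are assumptions):

```python
# Hedged sketch: the GPU version computes an exact kNN graph, while
# scanpy's CPU default uses an approximate search, so downstream results
# (e.g. clusterings) may differ slightly.
import rapids_singlecell as rsc

rsc.pp.neighbors(adata, n_neighbors=15, n_pcs=50)  # assumes PCA was run first
rsc.tl.leiden(adata, resolution=1.0)               # cluster on the exact graph
```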
12 changes: 6 additions & 6 deletions docs/notebooks/demo_gpu-seuratv3-brain-1M.ipynb
@@ -14,7 +14,7 @@
"id": "fda0ac25-cdbc-451f-84a9-d56a65fec2c0",
"metadata": {},
"source": [
"To run this notebook please make sure you have a working enviroment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of 1000000 brain cells from [Nvidia](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_cpu_analysis.ipynb)."
"To run this notebook please make sure you have a working environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of 1000000 brain cells from [Nvidia](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_cpu_analysis.ipynb)."
]
},
{
@@ -640,7 +640,7 @@
"id": "96c3d84b-a950-4a75-a303-dbbedafe4b40",
"metadata": {},
"source": [
"Now we safe this verion of the AnnData as adata.raw."
"Now we safe this version of the AnnData as adata.raw."
]
},
{
@@ -717,7 +717,7 @@
"id": "0f8f3372-ac66-4704-bfa7-8b0ec685eec7",
"metadata": {},
"source": [
"Next we regess out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
"Next we regress out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
]
},
{
@@ -897,7 +897,7 @@
"id": "first-reggae",
"metadata": {},
"source": [
"## Clustering and Visulization"
"Visualization## Clustering and Visualization"
]
},
{
@@ -915,7 +915,7 @@
"source": [
"Next we compute the neighborhood graph using rsc.\n",
"\n",
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the excat graph. Both methods are valid, but you might see differences."
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the exact graph. Both methods are valid, but you might see differences."
]
},
{
@@ -1230,7 +1230,7 @@
"id": "informational-dealer",
"metadata": {},
"source": [
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other fuctions. "
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other functions. "
]
},
{
14 changes: 7 additions & 7 deletions docs/notebooks/demo_gpu-seuratv3.ipynb
@@ -20,7 +20,7 @@
"id": "comic-moses",
"metadata": {},
"source": [
"To run this notebook please make sure you have a working rapids enviroment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
"To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
]
},
{
@@ -634,7 +634,7 @@
"id": "arctic-upgrade",
"metadata": {},
"source": [
"Now we safe this verion of the AnnData as adata.raw."
"Now we safe this version of the AnnData as adata.raw."
]
},
{
@@ -713,7 +713,7 @@
"id": "seventh-liquid",
"metadata": {},
"source": [
"Next we regess out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
"Next we regress out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
]
},
{
@@ -944,7 +944,7 @@
"id": "first-reggae",
"metadata": {},
"source": [
"## Clustering and Visulization"
"## Clustering and Visualization"
]
},
{
@@ -962,7 +962,7 @@
"source": [
"Next we compute the neighborhood graph using rsc.\n",
"\n",
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the excat graph. Both methods are valid, but you might see differences."
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the exact graph. Both methods are valid, but you might see differences."
]
},
{
@@ -1127,7 +1127,7 @@
"id": "ed1a5b70-54e0-4a22-83a8-e1903c5c7205",
"metadata": {},
"source": [
"We also caluclate the embedding density in the UMAP using cuML"
"We also calculate the embedding density in the UMAP using cuML"
]
},
{
@@ -1584,7 +1584,7 @@
"id": "informational-dealer",
"metadata": {},
"source": [
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other fuctions. "
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other functions. "
]
},
{
12 changes: 6 additions & 6 deletions docs/notebooks/demo_gpu.ipynb
@@ -14,7 +14,7 @@
"id": "comic-moses",
"metadata": {},
"source": [
"To run this notebook please make sure you have a working rapids enviroment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
"To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
]
},
{
@@ -520,7 +520,7 @@
"id": "arctic-upgrade",
"metadata": {},
"source": [
"Now we safe this verion of the AnnData as adata.raw. "
"Now we safe this version of the AnnData as adata.raw. "
]
},
{
@@ -576,7 +576,7 @@
"id": "seventh-liquid",
"metadata": {},
"source": [
"Next we regess out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
"Next we regress out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in `.obs` for this."
]
},
{
@@ -695,7 +695,7 @@
"id": "first-reggae",
"metadata": {},
"source": [
"## Clustering and Visulization"
"## Clustering and Visualization"
]
},
{
@@ -778,7 +778,7 @@
"source": [
"Next we compute the neighborhood graph using rsc.\n",
"\n",
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the excat graph. Both methods are valid, but you might see differences."
"Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the exact graph. Both methods are valid, but you might see differences."
]
},
{
@@ -1343,7 +1343,7 @@
"id": "informational-dealer",
"metadata": {},
"source": [
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other fuctions. "
"After this you can use `X_diffmap` for `sc.pp.neighbors` and other functions. "
]
},
{
2 changes: 1 addition & 1 deletion docs/notebooks/ligrec_benchmark.ipynb
@@ -14,7 +14,7 @@
"id": "comic-moses",
"metadata": {},
"source": [
"To run this notebook please make sure you have a working rapids enviroment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
"To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90000 cells from [Quin et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0)."
]
},
{
6 changes: 3 additions & 3 deletions docs/release-notes/0.10.0.md
@@ -7,15 +7,15 @@
* switch `utils` functions to `get` {pr}`100` {smaller}`S Dicks`
* added `get.aggregated` to create condensed `anndata` objects {pr}`100` {smaller}`S Dicks`
* added `pp.scrublet` and `pp.scrublet_simulate_doublets` {pr}`129` {smaller}`S Dicks`
* adds the option to return a copyed `AnnData` for `get.anndata_to_CPU` & `get.anndata_to_GPU` {pr}`134` {smaller}`S Dicks`
* adds the option to return a copied `AnnData` for `get.anndata_to_CPU` & `get.anndata_to_GPU` {pr}`134` {smaller}`S Dicks`
* adds `mask` argument to `pp.scale` and `pp.pca` {pr}`135` {smaller}`S Dicks`
* adds the option to run `pp.scale` on sparse matrixes `zero_center = False` without densification {pr}`135` {smaller}`S Dicks`
* updated `ruff` and now requiers paramaters by name/keyword in all public APIs {pr}`140` {smaller}`S Dicks`
* updated `ruff` and now requires parameters by name/keyword in all public APIs {pr}`140` {smaller}`S Dicks`
* adds the option to run `pp.harmony` with `np.float32` {pr}`145` {smaller}`S Dicks`

```{rubric} Bug fixes
```
* Fixes an issue where `pp.normalize` and `pp.log1p` now use `copy` and `inplace` corretly {pr}`129` {smaller}`S Dicks`
* Fixes an issue where `pp.normalize` and `pp.log1p` now use `copy` and `inplace` correctly {pr}`129` {smaller}`S Dicks`
* changes the graph constructor for `tl.leiden` and `tl.louvain` {pr}`143` {smaller}`S Dicks`
* Added a test to handle zero features, that caused issues in the sparse `pp.pca` {pr}`144` {smaller}`S Dicks`
* Added a test to check if sparse matrices are in `canonical format`. For now this only affects `pp.highly_variable_genes`, `pp.scale` and `pp.normalize_pearson_residuals`. {pr}`146` {smaller}`S Dicks`
1 change: 1 addition & 0 deletions docs/release-notes/0.10.1.md
@@ -10,3 +10,4 @@
```{rubric} Misc
```
* Updates CI to work with `uv` {pr}`149` {smaller}`S Dicks`
* Adds `Codespell` {pr}`155` {smaller}`S Dicks`
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -108,3 +108,7 @@ source = "vcs"

[tool.hatch.build.targets.wheel]
packages = ['src/rapids_singlecell']

[tool.codespell]
skip = '*.ipynb,*.csv'
ignore-words-list = "nd"
4 changes: 2 additions & 2 deletions src/rapids_singlecell/decoupler_gpu/_method_mlm.py
@@ -17,7 +17,7 @@ def fit_mlm(X, y, inv, df):
coef, sse, _, _ = cp.linalg.lstsq(X, y, rcond=-1)
if len(sse) == 0:
raise ValueError(
"""Couldn\'t fit a multivariate linear model. This can happen because there are more sources
"""Couldn't fit a multivariate linear model. This can happen because there are more sources
(covariates) than unique targets (samples), or because the network\'s matrix rank is smaller than the number of
sources."""
)
@@ -95,7 +95,7 @@ def run_mlm(
weight
Column name in net with weights.
batch_size
Size of the samples to use for each batch. Increasing this will consume more memmory but it will run faster.
Size of the samples to use for each batch. Increasing this will consume more memory but it will run faster.
min_n
Minimum of targets per source. If less, sources are removed.
verbose
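
To make the `batch_size` trade-off concrete, a hedged call sketch (the `adata`/`net` inputs and the decoupler-style column conventions are assumptions, not part of this diff):

```python
# Hedged sketch of the batch_size trade-off: larger batches use more GPU
# memory but run faster, per the docstring above.
import rapids_singlecell as rsc

rsc.dcg.run_mlm(
    mat=adata,          # expression data (assumed AnnData or matrix)
    net=net,            # decoupler-style network with source/target/weight
    batch_size=10_000,  # raise if VRAM allows, lower on smaller GPUs
    min_n=5,            # drop sources with fewer than 5 targets
    verbose=True,
)
```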
4 changes: 2 additions & 2 deletions src/rapids_singlecell/decoupler_gpu/_method_wsum.py
@@ -15,7 +15,7 @@ def run_perm(mat, net, idxs, times, seed):
net = cp.array(net)
estimate = mat.dot(net)
cp.random.seed(seed)
# Init null distirbution
# Init null distribution
null_dst = cp.zeros((mat.shape[0], net.shape[1], times), dtype=np.float32)
pvals = cp.zeros((mat.shape[0], net.shape[1]), dtype=np.float32)

@@ -125,7 +125,7 @@ def run_wsum(
times
How many random permutations to do.
batch_size
Size of the batches to use. Increasing this will consume more memmory but it will run faster.
Size of the batches to use. Increasing this will consume more memory but it will run faster.
min_n
Minimum of targets per source. If less, sources are removed.
seed
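
For the permutation scheme above, a hedged sketch of the knobs the docstring documents (`times`, `batch_size`, `min_n`, `seed`); the `rsc.dcg` export path and the `adata`/`net` inputs are assumptions:

```python
# Hedged sketch: wsum scores plus an empirical p-value from a null
# distribution built out of `times` random permutations (see run_perm).
import rapids_singlecell as rsc

rsc.dcg.run_wsum(
    mat=adata,
    net=net,
    times=1000,         # permutations used to build the null distribution
    batch_size=10_000,  # memory/speed trade-off, as documented above
    min_n=5,
    seed=42,            # fixes the permutation RNG for reproducibility
)
```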
10 changes: 5 additions & 5 deletions src/rapids_singlecell/preprocessing/_hvg.py
@@ -39,9 +39,9 @@
Annotate highly variable genes.
Expects logarithmized data, except when `flavor='seurat_v3','pearson_residuals','poisson_gene_selection'`, in which count data is expected.
Reimplentation of scanpy's function.
Reimplementation of scanpy's function.
Depending on flavor, this reproduces the R-implementations of Seurat, Cell Ranger, Seurat v3 and Pearson Residuals.
Flavor `poisson_gene_selection` is an implementation of scvi, which is based on M3Drop. It requiers gpu accelerated pytorch to be installed.
Flavor `poisson_gene_selection` is an implementation of scvi, which is based on M3Drop. It requires gpu accelerated pytorch to be installed.
For these dispersion-based methods, the normalized dispersion is obtained by scaling
with the mean and standard deviation of the dispersions for genes falling into a given
@@ -98,7 +98,7 @@
Returns
-------
upates `adata.var` with the following fields:
updates `adata.var` with the following fields:
`highly_variable` : bool
boolean indicator of highly-variable genes
@@ -716,7 +716,7 @@ def _poisson_gene_selection(
This is based on M3Drop: https://github.com/tallulandrews/M3Drop
The method accounts for library size internally, a raw count matrix should be provided.
Instead of Z-test, enrichment of zeros is quantified by posterior
probabilites from a binomial model, computed through sampling.
probabilities from a binomial model, computed through sampling.
Parameters
----------
@@ -731,7 +731,7 @@
of enrichment of zeros for each gene.
batch_key
key in adata.obs that contains batch info. If None, do not use batch info.
Defatult: ``None``.
Default: ``None``.
minibatch_size
Size of temporary matrix for incremental calculation. Larger is faster but
requires more RAM or GPU memory. (The default should be fine unless
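
A hedged usage sketch of the flavor dispatch this docstring describes (the dataset and the batch column name are assumptions):

```python
# Hedged sketch: flavor="seurat_v3" expects raw counts, while the
# dispersion-based default expects logarithmized data, per the docstring.
import rapids_singlecell as rsc

rsc.pp.highly_variable_genes(
    adata,
    n_top_genes=2000,
    flavor="seurat_v3",  # count-data flavor
    batch_key="sample",  # assumed .obs column; omit to ignore batches
)
print(adata.var["highly_variable"].sum())  # boolean field added per "Returns"
```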
2 changes: 1 addition & 1 deletion src/rapids_singlecell/preprocessing/_regress_out.py
@@ -43,7 +43,7 @@ def regress_out(
batchsize
Number of genes that should be processed together. \
If `'all'` all genes will be processed together if `.n_obs` <100000. \
If `None` each gene will be analysed seperatly. \
If `None` each gene will be analysed separately. \
Will be ignored if cuML version < 22.12
verbose
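
Finally, a hedged sketch of the batching behavior this docstring describes (the `.obs` column names and the `keys` argument follow scanpy's mirrored API and are assumptions):

```python
# Hedged sketch: regress out continuous covariates stored in .obs.
# batchsize=None analyses each gene separately; batchsize="all" groups
# all genes when n_obs < 100000, per the docstring above.
import rapids_singlecell as rsc

rsc.pp.regress_out(
    adata,
    keys=["total_counts", "pct_counts_mt"],  # assumed numerical .obs columns
    batchsize=100,                           # genes processed together per batch
)
```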