dynast count error: forward_strand = df_counts['GX'].map(lambda gx: gene_infos[gx]['strand']) == '-' KeyError: '-' #17

Open
mingjianPeng opened this issue Apr 14, 2023 · 0 comments


I encountered some problems while running dynast. I would appreciate it if the author could help me with them. Thank you very much.

1. My library was generated using the C4 kit from BGI Genomics, and dynast --list shows no option for this technology. Is there a way to analyze C4 data directly?

2. I tried using the smartseq technology option to analyze only the R2_fastq file, but encountered problems during the counting process. Below is my code:

align:
STAR --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI AS NM nM MD GX GN RG \
    --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --soloType SmartSeq --soloUMIdedup Exact \
    --soloStrand Unstranded --genomeDir ${dynast}/star --runThreadN 20 \
    --outFileNamePrefix ${dynast_align} --outBAMsortingBinsN 12490 \
    --readFilesIn cDNAlib_2.fq --outSAMattrRGline ID:C4

Note: because my folder can contain at most 10,000 files, I removed the "--outBAMsortingBinsN" option.

count:
dynast count -t 40 -g $gtf --no-splicing -o $my_path/dynast_count/ --barcode-tag RG \
    --conversion TC ${dynast_align}/Aligned.sortedByCoord.out.bam --tmp ${my_path}/tmp --strand unstranded --verbose --keep-tmp

error:
/home/mingjian/.local/lib/python3.8/site-packages/anndata-0.9.0rc1-py3.8.egg/anndata/experimental/pytorch/_annloader.py:18: UserWarning: Сould not load pytorch.
warnings.warn("Сould not load pytorch.")
[2023-04-14 14:04:10,117] DEBUG [main] Printing verbose output
[2023-04-14 14:04:10,117] DEBUG [main] Input args: Namespace(bam='/home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/dynast_align/Aligned.sortedByCoord.out.bam', barcode_tag='RG', barcodes=None, command='count', control=False, conversion=['TC'], dedup_mode=None, exon_overlap='strict', g='/home/mingjian/workbase/01_zebrafish_embryo/zebrafish_ensenmble_ref/Danio_rerio.GRCz11.108.gtf', gene_names=False, gene_tag='GX', keep_tmp=True, list=False, nasc=False, no_splicing=True, o='/home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/dynast_count/', overwrite=False, quality=27, snp_csv=None, snp_min_coverage=1, snp_threshold=None, strand='unstranded', t=40, tmp='/home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/tmp', umi_tag=None, verbose=True)
[2023-04-14 14:04:10,117] DEBUG [main] Creating /home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/tmp directory
[2023-04-14 14:04:10,117] WARNING [main] --barcodes not provided. All cell barcodes will be processed.
[2023-04-14 14:04:12,629] WARNING [count] BAM contains secondary alignments, which will be ignored. Only primary alignments are considered.
[2023-04-14 14:04:12,916] WARNING [count] Skipped BAM parsing because files already exist. Use --overwrite to re-parse the BAM.
[2023-04-14 14:04:13,788] INFO [count] Counting conversions to /home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/dynast_count/counts_TC.csv
[2023-04-14 14:04:13,788] DEBUG [count] Loading index /home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/dynast_count/conversions.idx
[2023-04-14 14:04:14,784] DEBUG [count] Splitting indices into 40 parts
[2023-04-14 14:04:15,414] DEBUG [count] Spawning 40 processes
counting: 100%|########################################################| 9.64M/9.64M [00:29<00:00, 332kit/s]
[2023-04-14 14:04:48,072] DEBUG [count] Combining intermediate parts to /home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/tmp/tmpcw0f0gh9
[2023-04-14 14:04:49,207] DEBUG [count] Loading combined counts from /home/mingjian/workbase/03_zebra5.5hpf/02_C4_5.5hpf_20230412/dynast_result/tmp/tmpcw0f0gh9
[2023-04-14 14:04:56,822] ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/main.py", line 1052, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=args.tmp)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/main.py", line 829, in parse_count
    count(
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/count.py", line 306, in count
    counts_path = preprocessing.count_conversions(
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/preprocessing/conversion.py", line 541, in count_conversions
    df_counts = complement_counts(read_counts(combined_path), gene_infos)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/preprocessing/conversion.py", line 116, in complement_counts
    forward_strand = df_counts['GX'].map(lambda gx: gene_infos[gx]['strand']) == '-'
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/series.py", line 4539, in map
    new_values = self._map_values(arg, na_action=na_action)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/base.py", line 890, in _map_values
    new_values = map_f(values, mapper)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/base.py", line 873, in <lambda>
    map_f = lambda values, f: values.map(f)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 1533, in map
    new_categories = self.categories.map(mapper)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 6361, in map
    new_values = self._map_values(mapper, na_action=na_action)
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/pandas/core/base.py", line 890, in _map_values
    new_values = map_f(values, mapper)
  File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
  File "/home/mingjian/miniconda3/envs/dynast/lib/python3.8/site-packages/dynast/preprocessing/conversion.py", line 116, in <lambda>
    forward_strand = df_counts['GX'].map(lambda gx: gene_infos[gx]['strand']) == '-'
KeyError: '-'
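
The failing line indexes gene_infos with every value in the GX column, so the traceback suggests that GX contains the literal value '-', which has no entry in gene_infos. A minimal plain-Python sketch of that failure follows, assuming (not confirmed here) that the BAM carries '-' as the GX tag for reads not assigned to any gene. The gene IDs, the sample values, and the helper names strands_strict, strands_tolerant, and missing_genes are hypothetical and are not part of dynast.

```python
# Hypothetical stand-ins for dynast's data: the names mirror the traceback,
# the values are made up for illustration.
gene_infos = {
    "ENSDARG00000000001": {"strand": "+"},
    "ENSDARG00000000002": {"strand": "-"},
}
# Assumed GX values, including the "-" placeholder for a read with no gene.
gx_values = ["ENSDARG00000000001", "-", "ENSDARG00000000002"]


def strands_strict(gx_values, gene_infos):
    """Mimic the failing line: index gene_infos directly with each GX value."""
    return [gene_infos[gx]["strand"] for gx in gx_values]


def strands_tolerant(gx_values, gene_infos):
    """Look up each GX value, returning None when the gene is not annotated."""
    return [gene_infos.get(gx, {}).get("strand") for gx in gx_values]


def missing_genes(gx_values, gene_infos):
    """List the GX values that have no entry in gene_infos."""
    return sorted(set(gx_values) - set(gene_infos))


try:
    strands_strict(gx_values, gene_infos)
except KeyError as err:
    print("KeyError:", err)  # the same kind of error as in the traceback

print(missing_genes(gx_values, gene_infos))
print(strands_tolerant(gx_values, gene_infos))
```

If the real GX column turns out to contain '-' (or gene IDs absent from the GTF), dropping those rows before the strand lookup would be one way to avoid the error. Whether that is the right fix for dynast itself is for the maintainers to decide.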
