Question about creating a variant catalog #183

Open
wjzzq opened this issue Nov 29, 2023 · 2 comments
wjzzq commented Nov 29, 2023

Dear Egor,

I want to use ExpansionHunter to identify tandem repeat variants in a plant genome. I used TRF results to generate the variant catalog below:

[
  {
    "LocusId": "Cor-Chr1_273_295",
    "LocusStructure": "(AT)",
    "ReferenceRegion": "Cor-Chr1:273-295",
    "VariantId": "Cor-Chr1_273_295",
    "VariantType": "Repeat"
  },
  {
    "LocusId": "Cor-Chr1_3195_3215",
    "LocusStructure": "(AG)",
    "ReferenceRegion": "Cor-Chr1:3195-3215",
    "VariantId": "Cor-Chr1_3195_3215",
    "VariantType": "Repeat"
  }
]

When I run ExpansionHunter, I get the following error. Is the format of the variant catalog I generated wrong?

ExpansionHunter --reads ZD31.sorted.bam \
  --reference Cbp_pan.fasta \
  --variant-catalog ../Expansionhunter/Cbp_STR.json \
  --output-prefix ../Expansionhunter
2023-11-29T01:30:34,[Starting ExpansionHunter v5.0.0]
2023-11-29T01:30:34,[Analyzing sample ZD31.sorted]
2023-11-29T01:30:34,[Initializing reference Cbp_pan.fasta]
2023-11-29T01:30:34,[Loading variant catalog from disk ../Expansionhunter/Cbp_STR.json]
2023-11-29T01:30:35,[Unexpected range format: Cor-Chr1:273-295]

Best wishes!

Zhiqin

@andreasssh

I'm not Egor, but I bet the problem is that the chromosome name contains a "-" which is used to split the string to get the range. So, don't use "-" and ":" in chromosome names and try again. Secondly, for the locus structure you might want to add * or + after parentheses, e.g.: (AT)*

If helpful, I also have a script for converting TRF output file (DAT) to an EH catalogue file, available here: https://gitlab.com/andreassh/trf2strcat
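
For readers who want to patch an existing catalog rather than regenerate it, the two suggestions above can be sketched in Python. This is a minimal sketch under stated assumptions, not ExpansionHunter's own parsing logic: the helper names (`sanitize_contig`, `fix_region`, `fix_entry`) are hypothetical, replacing "-" and ":" with "_" is just one possible renaming scheme, and it only handles entries whose "ReferenceRegion" is a single string. The renamed contigs presumably also have to match the names in the reference FASTA and the BAM header, so those would need the same renaming.

```python
import json
import re


def sanitize_contig(name: str) -> str:
    """Replace ':' and '-' in a contig name with '_' (an assumed naming scheme)."""
    return re.sub(r"[:\-]", "_", name)


def fix_region(region: str) -> str:
    """Rewrite 'contig:start-end', sanitizing only the contig part."""
    # The greedy '.+' lets the LAST ':' and the LAST '-' delimit the coordinates,
    # so a contig such as 'Cor-Chr1' is kept whole before being renamed.
    match = re.fullmatch(r"(.+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"cannot parse region: {region!r}")
    contig, start, end = match.groups()
    return f"{sanitize_contig(contig)}:{start}-{end}"


def fix_entry(entry: dict) -> dict:
    """Return a copy of a catalog entry with a safe region and a quantified motif."""
    fixed = dict(entry)
    fixed["ReferenceRegion"] = fix_region(entry["ReferenceRegion"])
    structure = entry["LocusStructure"]
    # Append '*' only when the structure ends in a bare ')' with no quantifier.
    if structure.endswith(")"):
        structure += "*"
    fixed["LocusStructure"] = structure
    return fixed


if __name__ == "__main__":
    catalog = [
        {
            "LocusId": "Cor-Chr1_273_295",
            "LocusStructure": "(AT)",
            "ReferenceRegion": "Cor-Chr1:273-295",
            "VariantId": "Cor-Chr1_273_295",
            "VariantType": "Repeat",
        }
    ]
    print(json.dumps([fix_entry(e) for e in catalog], indent=2))
```

The sketch leaves "LocusId" and "VariantId" untouched, since the discussion above only concerns the region string and the locus structure.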

wjzzq commented Dec 6, 2023

Thank you very much! Based on your suggestion, I successfully ran ExpansionHunter.
