Construct a generation-level pan-genome #4295

Open
jwli-code opened this issue May 16, 2024 · 5 comments

Comments

@jwli-code

jwli-code commented May 16, 2024

In the future, I want to map resequencing data from different species onto my graph genome. The resulting population VCF, once it is well classified, would then be used to run vg mpmap.
I wonder if this line of thinking makes sense.

Thanks.

@jeizenga
Contributor

If the subgenomes contain chromosomes that are homologous to each other, it would probably make more sense to build graphs for those chromosomes all together. Otherwise, you can create a lot of mapping ambiguity among sequences that are shared between the homologs.

@jwli-code
Author

I would like to ask about merging the VCF files produced by minigraph and then filtering for SNP variants (excluding variants larger than 50 bp). After that, I plan to genotype the population data with PanGenie and use the population VCF for the graph transcriptome workflow. I'm not sure if this is the correct approach. How effective is PanGenie at handling small variants such as SNPs? Are there any recommended short-read genotypers for SNPs?

@jeizenga
Contributor

@glennhickey might know better, but I believe PanGenie does not genotype any small variants. In humans, our collaborators have achieved very good SNP accuracy by projecting graph alignments to a reference with vg surject and then using DeepVariant. I'm unsure if the DeepVariant pipelines can handle non-diploid genomes though. We also have experimental features to call SNPs on non-reference sequences.
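
As a rough illustration of the surjection step mentioned above, here is a minimal Python sketch that only assembles a `vg surject` command line. The file names and the `build_surject_cmd` helper are hypothetical; `-x` (graph), `-b` (BAM output) and `-t` (threads) are the options I would expect to use, but check `vg surject --help` for your vg version. The DeepVariant step is not shown.

```python
from typing import List, Optional


def build_surject_cmd(graph: str, gam: str, threads: Optional[int] = None) -> List[str]:
    """Assemble argv for projecting graph alignments (GAM) onto reference paths.

    Hypothetical helper: it only builds the command, it does not run vg.
    vg surject writes the BAM to stdout, so the caller must redirect it.
    """
    cmd = ["vg", "surject", "-x", graph, "-b"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    cmd.append(gam)
    return cmd


if __name__ == "__main__":
    # Placeholder file names, not taken from the thread.
    print(" ".join(build_surject_cmd("graph.xg", "aln.gam", threads=8)))
```

To actually run it, one option is `subprocess.run(cmd, stdout=open("aln.bam", "wb"), check=True)`, after which the BAM can be sorted, indexed, and passed to a reference-based caller.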

@glennhickey
Contributor

PanGenie is pretty good for SNPs, but DeepVariant does much better on GIAB benchmarks. I don't think either works on non-diploid genomes, though.

For PanGenie, I think you're better off filtering SVs after genotyping instead of before.
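
A minimal sketch of what filtering by size after genotyping could look like, assuming an uncompressed, tab-delimited VCF and using the 50 bp cutoff mentioned earlier in the thread. The `keep_small` helper and its "longer allele" size definition are illustrative assumptions, not part of PanGenie or vg; in practice a tool such as bcftools is probably the better choice.

```python
from typing import Iterable, Iterator

MAX_LEN = 50  # cutoff taken from the thread; adjust as needed


def allele_size(ref: str, alt: str) -> int:
    """Length of the longer allele, a crude proxy for variant size."""
    return max(len(ref), len(alt))


def keep_small(vcf_lines: Iterable[str], max_len: int = MAX_LEN) -> Iterator[str]:
    """Yield header lines and records whose alleles are all <= max_len bp."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # pass headers through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        # Symbolic alleles such as <DEL> carry no sequence; treat them as large.
        if any(a.startswith("<") for a in alts):
            continue
        if all(allele_size(ref, a) <= max_len for a in alts):
            yield line


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t.\t.\t.\n",              # SNP, kept
        "chr1\t200\t.\tA\tA" + "T" * 60 + "\t.\t.\t.\n",  # 60 bp insertion, dropped
        "chr1\t300\t.\tA\t<DEL>\t.\t.\t.\n",          # symbolic SV, dropped
    ]
    for rec in keep_small(demo):
        print(rec, end="")
```

Running the filter on the genotyped VCF rather than on the input graph VCF keeps the SVs available to the genotyper while still producing a small-variant-only call set.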

Please avoid using #4113 for the time being.

@jwli-code
Author

jwli-code commented May 20, 2024

@glennhickey Thanks. If a polyploid plant is an allopolyploid formed by the hybridization of two species, would it also face the same issues? The chromosome numbers of the two species are inconsistent, and only a small portion of their genetic material is homologous.
