About metagenomic data #8
Comments
Hi Jin, In its current state, it is expecting to find combined.bed, hap1.bed, and hap2.bed. However, this is clearly a situation where it could generate combined information without any haplotype-specific outputs. I'll make a note to see if we can create an option to disable this requirement (and the corresponding phased outputs). It may be a while as I'm currently on FTO. In the interim, if you want to try to progress with the existing MethBat version, you should be able to just mock the files. I think if you create an empty file with the corresponding filenames (one for hap1, one for hap2), it will recognize it as an empty dataset and continue past the above error. Matt
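The workaround described above can be sketched in a few lines of Python. This is only a minimal sketch, not part of MethBat: the helper name `make_empty_haplotype_files`, the `prefix` argument, and the `<prefix>.hap1.bed` / `<prefix>.hap2.bed` naming pattern are assumptions for illustration, so match them to whatever filenames the error message reports as missing.

```python
from pathlib import Path


def make_empty_haplotype_files(prefix, suffixes=("hap1.bed", "hap2.bed")):
    """Create empty placeholder files for the haplotype-specific outputs.

    The `<prefix>.<suffix>` naming pattern is an assumption; adjust it to
    the filenames your MethBat run reports as missing.
    """
    created = []
    for suffix in suffixes:
        path = Path(f"{prefix}.{suffix}")
        # Never overwrite a file that already exists and may hold real data.
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `make_empty_haplotype_files("sample")` would create `sample.hap1.bed` and `sample.hap2.bed` in the working directory if they do not already exist. Whether MethBat then continues past the error is what Matt expects, not something this sketch verifies.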
Hi Matt, Thank you for your detailed response.
The region file is generated by
This file looks like this:
Do you have any suggestions? Also, I have a slight concern about the generated region result. I found that every entry in the summary label column is Unmethylated. Does this suggest something abnormal in my sequencing data, or a possible mistake in data processing? Many thanks. Jin
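One quick way to confirm the observation above is to tally the values in the label column. The following is a minimal sketch that assumes the region file is tab-separated with a header row and that the column is named `summary_label`; both the layout and the column name are assumptions, since the file contents are not shown in this thread, and `count_labels` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_labels(tsv_path, column="summary_label"):
    """Count the values in one column of a tab-separated file with a header row.

    The default column name is an assumption; pass the real header name.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[column] for row in reader)
```

If the returned counter has a single key, every region received the same label.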
Jin, To be clear, the output of
In human data, we usually see a variety of states, not just unmethylated. I'm not sure what to expect in metagenomics, but I assume at least some would be methylated. @dportik Do you have any expectations around what we would see if we ran a methylation segmentation algorithm on metagenomic data? Matt
I spoke with Dan and will relay the key parts of that conversation. The short version is we don't have data on the accuracy of methylation in metagenomics, so we don't really have a clear set of expectations to convey here. We have not performed any validations on metagenomic datasets, so you're definitely in an experimental area just with the baseline 5mC calls. pb-CpG-tools (and MethBat as a consequence) were also designed with diploid organisms in mind, so there are likely some assumptions that will not match up correctly to a metagenomics context. Sorry this probably isn't a satisfactory answer, but let me know if you have any follow-up questions! Matt
Dear Matt, Thank you for your response! Your explanation is very helpful. I would like to further clarify the difference between using metagenomic data and diploid data for methylation analysis. Is this difference related to the process of obtaining the methylation profile, which typically requires negative control data? I understand that in many cases, whole genome amplification (WGA) is used to generate unmethylated DNA as the control data. Because real control data is not always available, many tools use an in silico control for prediction. Considering this, even though I have HiFi data with kinetic information and MM/ML tags indicating the 5-methylcytosine (5mC) location and score (generated by the SMRT Link tool) in the BAM file, would I still need additional control data to accurately determine the methylation locations? Or do I need to redo the analysis? (I think Jasmine is used inside SMRT Link for 5mC methylation detection.) I also came across a tool called ipdSummary, which is part of the KineticTools suite provided by PacBio. This tool supports the detection of 6mA and 4mC modifications. However, according to an issue I found, ipdSummary may not correctly detect 5mC because it uses metagenomic data for training (from this issue). Based on this information, my understanding is that the pipeline consisting of Jasmine, pb-CpG-tools, and MethBat is primarily designed for human genome analysis, which predominantly involves 5mC methylation. In contrast, metagenomic data often exhibits a higher prevalence of 6mA and 4mC modifications. This difference in the predominant methylation types could potentially explain why I obtained all unmethylated labels when using MethBat segmentation on my metagenomic data. Please let me know if my understanding is correct or if there are any additional factors I should consider. Many, many thanks! Jin
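As background on the ML tag mentioned above: in the SAM optional-tag specification, ML holds one 8-bit integer per modification call, and an integer N stands for the probability range N/256 to (N+1)/256. The sketch below decodes the text form of the tag (as it would appear in a SAM record) into midpoint probabilities. The function name `decode_ml_tag` is hypothetical, and the sketch says nothing about how Jasmine or pb-CpG-tools interpret these values.

```python
def decode_ml_tag(field):
    """Decode a text-form ML tag such as 'ML:B:C,10,200' into probabilities.

    Each integer N (0-255) covers the range N/256 to (N+1)/256; the midpoint
    of that range is returned as a point estimate.
    """
    parts = field.split(":", 2)
    if len(parts) != 3 or parts[0] != "ML" or parts[1] != "B":
        raise ValueError(f"not an ML:B tag: {field!r}")
    subtype, *values = parts[2].split(",")
    if subtype != "C":
        raise ValueError(f"expected unsigned 8-bit array (C), got {subtype!r}")
    return [(int(v) + 0.5) / 256 for v in values]
```

The MM tag, which records where the calls fall along the read, needs separate parsing and is not handled here.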
Jin,
My understanding is that this is how the models for methylation were trained for human datasets.
My apologies, but this isn't really something I can speak to. Thus far, I have only used our methylation tooling for human datasets. Given that your scope of questions is growing, I would recommend contacting your PacBio representative to get more information on how methylation might work with metagenomic datasets. What I can say for sure is that MethBat is not designed to work with non-diploid organisms.
I'm unfamiliar with this tooling, so I cannot speak to its performance. You would likely get more information by opening an issue there or through your PacBio representative.
I can't speak to the limits of Jasmine, but pb-CpG-tools and MethBat are both intrinsically tied to 5mC and diploid analysis. Anything outside that scope might work, but we are not building the tooling with it in mind. Sorry there wasn't much I could really answer there; a lot of your questions are beyond the scope of MethBat's functionality at this time. Matt
Matt, Thank you so much for your kind support. It really helps :) Jin
Hi,
Thank you for implementing this great tool.
I am trying to use this tool to get a methylation profile for my metagenomic data, but I got the error:
pb-CpG-tools does not generate hap1 and hap2 files because this is metagenomic data. Can this tool still be used to get a methylation profile?
Best,
Jin