storing num reads in metagenomes in samples.db #8

meren · 2017-09-11T16:24:03Z

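Currently, we can get the number of metagenomic reads the mapping software took into consideration by parsing the logs the following way:

```shell
for log in *bowtie.log
do
    # sample name is the second dash-separated field of the log file name
    sample=$(echo "$log" | awk 'BEGIN{FS="-"}{print $2}')
    # Bowtie2's summary starts with "N reads; of these:"; keep N
    num_reads=$(grep 'reads; of these:' "$log" | awk '{print $1}')
    printf "%s\t%s\n" "$sample" "$num_reads"
done
```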

It would be very useful to have this number stored in the samples.db of the resulting merged profile.

For this, we need two things:

1. Generating this information for Bowtie (and later for other mapping software) as a TAB-delimited file :)
2. Adding a new parameter to anvi-profile (i.e., --num-reads-in-source-mg) so each profile stores this information. anvi-merge can then take this into consideration and update the samples.db (which, after the latest changes, already keeps track of mapped reads).
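The first item could look roughly like the following. This is a hedged sketch, not an existing anvi'o or workflow script: the function name `collect_num_reads`, the `<prefix>-<sample>-bowtie.log` file naming, and the `sample`/`num_reads` column headers are all assumptions. It reuses the parsing logic from the loop above, but writes a header plus one TAB-separated row per sample, and skips (with a warning) any log that lacks Bowtie2's `N reads; of these:` summary line.

```shell
#!/usr/bin/env bash
# Sketch only: file naming and output layout are assumptions.
# Builds a TAB-delimited table of how many reads Bowtie2 considered per
# sample, from logs named like <prefix>-<sample>-bowtie.log whose summary
# contains the line "N reads; of these:".

# Usage (hypothetical helper): collect_num_reads OUTPUT_FILE LOG_FILE...
collect_num_reads() {
    local output="$1"
    shift

    # header row of the TAB-delimited file
    printf "sample\tnum_reads\n" > "$output"

    local log sample num_reads
    for log in "$@"
    do
        # sample name = second dash-separated field of the file name
        sample=$(basename "$log" | awk 'BEGIN{FS="-"}{print $2}')
        # first field of the first summary line = number of input reads
        num_reads=$(awk '/reads; of these:/{print $1; exit}' "$log")
        if [ -z "$num_reads" ]; then
            echo "WARNING: no read count found in $log, skipping" >&2
            continue
        fi
        printf "%s\t%s\n" "$sample" "$num_reads" >> "$output"
    done
}
```

It could then be run as `collect_num_reads num_reads.txt *bowtie.log` in the directory that holds the logs. Taking the log files as arguments (rather than globbing inside the function) keeps it usable from a workflow rule that lists its inputs explicitly.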

This will require some thinking and organization, but nothing that @ozcan, @ShaiberAlon, and I can't figure out :)


ShaiberAlon · 2017-10-08T02:23:17Z

@meren, it looks to me like this should be an issue in anvio and not in MerenLab-workflows.

For now, we can generate the table from the logs the way you described (the only thing is, I'm not sure whether snakemake allows the log of one rule to be used as input for another rule). Obviously, using logs as input for a rule is ugly. Since we run QC on all metagenomes, and since I generate the table of stats for the metagenomes, we could use that output to get the number of reads in each metagenome. I'm confused by your wording, "number of metagenomic reads the mapping software took into consideration": what is the difference between that number and the total number of reads in the metagenome?

meren · 2017-10-09T07:58:04Z

We could learn the numbers from the QC step, but we do not necessarily QC every metagenome. Therefore, the most reliable source is the mapping software, i.e., the number of reads the mapping software considered.

Does this make sense?
