Calculate median number of times each k-mer in a sequence occurs across a set

CobiontID/fastk-medians

 
 

fastk-medians

A set of utilities to calculate the median number of times the k-mers in a sequence of interest occur across the whole read set. For high-quality sequencing reads, this median provides a rough approximation of coverage.

The code is based on the Profex module of FastK, and is intended to be used in conjunction with FastK. Though it is possible to obtain medians by processing the standard Profex output, performing the calculations directly via the C-library interface is more efficient.
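As an illustration of the underlying calculation, here is a minimal sketch of taking the median of one read's k-mer counts. It is not the code from this repository: the function name `median_count`, the use of 16-bit counts, and the choice to average the two middle values for an even number of k-mers are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort over 16-bit counts (ascending order). */
static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a;
    uint16_t y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: return the median of n k-mer counts.
 * Returns -1.0 if n is 0 or if memory cannot be allocated.
 * The input array is left unmodified; a sorted copy is used instead.
 * Assumption: for even n, the two middle values are averaged.
 */
double median_count(const uint16_t *counts, size_t n)
{
    if (n == 0)
        return -1.0;

    uint16_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1.0;

    memcpy(tmp, counts, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_u16);

    double med = (n % 2 == 1)
        ? (double)tmp[n / 2]
        : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;

    free(tmp);
    return med;
}
```

The median is less sensitive than the mean to a few k-mers with unusually high counts (for example, from repeats) or unusually low counts (for example, from sequencing errors). That is one reason it can serve as a rough per-read coverage estimate.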

ProfMedian

This program is based on FastK's Profex (commit 8b5b988), but returns the median count for each read in a list, instead of a full list of k-mer counts. It requires a FastK profile file as input.

Usage:

Generate a profile for a read set using FastK:

./FastK -k<k-mer length> -p -v <file with sequences>

Then get the median counts:

./ProfMedian <profile: .prof> <read id: list of integers >= 1>

ProfMedianAll

This tool performs the same calculation as ProfMedian, but iterates through the full read set in the profile.

Usage: ./ProfMedianAll <profile: .prof>

Installation

Download the files from this repository, navigate to the src directory, and run make. This will compile the programs described above, as well as the standard FastK binaries (commit f7365de).

Additional info on FastK

For the original FastK README, visit https://github.com/thegenemyers/FASTK

Licensing

This repository incorporates code from multiple pieces of software, for which the terms can be found under:

  • FastK: src/LICENSE
  • Htslib: src/HTSLIB/LICENSE
  • Libdeflate: src/LIBDEFLATE/COPYING
