Working with very large datasets (millions of unique amplicons)

Operating systems often buffer a program's output before writing it to a file. A buffer is flushed when it fills up, so the more slowly it fills, the less often it is flushed. The stats file receives little data per cluster, while the swarm file receives much more. Consequently, while swarm is running, the stats file might lag behind the swarm file by several clusters.
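The lag can be reproduced outside swarm with a minimal Python sketch. It is only an illustration of output buffering in general, not swarm's own code: the file names, the 4096-byte buffer size, and the line contents are assumptions chosen for the demonstration.

```python
import os
import tempfile

BUFFER_SIZE = 4096  # illustrative buffer size in bytes (an assumption)

workdir = tempfile.mkdtemp()
stats_path = os.path.join(workdir, "stats.txt")    # hypothetical small output
swarms_path = os.path.join(workdir, "swarms.txt")  # hypothetical large output

stats = open(stats_path, "wb", buffering=BUFFER_SIZE)
swarms = open(swarms_path, "wb", buffering=BUFFER_SIZE)

# One "cluster": a short stats line and a much longer swarm line.
stats_line = b"10\t1\n"
swarms_line = b"amplicon " * 2000 + b"\n"
stats.write(stats_line)
swarms.write(swarms_line)

# Sizes on disk while both files are still open.
stats_on_disk = os.path.getsize(stats_path)    # short line is still in the buffer
swarms_on_disk = os.path.getsize(swarms_path)  # long line exceeded the buffer
print("stats on disk:", stats_on_disk)
print("swarms on disk:", swarms_on_disk)

# An explicit flush pushes the pending stats line to the file.
stats.flush()
stats_after_flush = os.path.getsize(stats_path)
print("stats after flush:", stats_after_flush)

stats.close()
swarms.close()
```

Until the flush, the short line sits in memory while the long line has already reached the disk, which is the same pattern as a stats file trailing the swarm file.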

If you use swarm with the option -o output.swarms to name the output file for swarms, your OS might buffer the entire swarm file and write it out only at the end of the clustering process. If you don't want your results to be buffered, use a redirection (>) instead:

swarm input.fasta > output.swarms
