This tool clusters and filters out artificially replicated sequences in 454 data. It returns a FASTA file of unique sequences and a list of the sequences in each cluster. The tool is described in: Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. ISME J. 2009 Jul 9.
Note:
Sequences that CD-HIT clusters together and that begin
with the same base pairs are identified as replicates and grouped into one cluster.
If many of your sequences are expected to look similar and start at the same position
(e.g. 454 tag data), this is not the right tool for your data.
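The replicate rule in the note above can be illustrated with a minimal sketch. It is not the distributed scripts: it assumes the similarity clusters have already been produced (for example by CD-HIT) and are supplied as lists of read IDs. The function names, the toy reads, and the prefix length of 3 bases are all hypothetical, chosen only for illustration; the source does not state how many starting bases the tool compares.

```python
from collections import defaultdict


def find_replicates(clusters, sequences, prefix_len=3):
    """Split each similarity cluster into groups of reads sharing the same start.

    clusters   -- list of lists of read IDs (assumed output of a prior clustering step)
    sequences  -- dict mapping read ID to its sequence string
    prefix_len -- number of leading bases compared (hypothetical value)
    """
    groups = []
    for members in clusters:
        by_prefix = defaultdict(list)
        for read_id in members:
            prefix = sequences[read_id][:prefix_len].upper()
            by_prefix[prefix].append(read_id)
        # Each prefix group inside a cluster is treated as one set of replicates.
        groups.extend(by_prefix.values())
    return groups


def unique_ids(groups):
    """Keep the first read of each replicate group as its representative."""
    return [group[0] for group in groups]


# Toy data: read1 and read2 are similar and share a start, read3 is similar
# but starts differently, read4 sits in its own cluster.
sequences = {
    "read1": "ACGTTGCA",
    "read2": "ACGTTGCA",
    "read3": "TTGTTGCA",
    "read4": "GGCCATAT",
}
clusters = [["read1", "read2", "read3"], ["read4"]]

groups = find_replicates(clusters, sequences)
print(groups)
print(unique_ids(groups))
```

In this sketch a read is only called a replicate when both conditions hold: it is in the same similarity cluster and it shares the starting bases. This is also why data such as 454 tag reads, where many genuine sequences start at the same position, would be over-collapsed.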
Availability:
These scripts are all open source and distributed under the GNU GPL.
They can also be run at the command line without the web interface.
The scripts are available for download here.
If you would like to be added to the mailing list for updates to the scripts, please contact the authors.
Comments/Questions:
If you have any comments or questions about these programs, please contact the authors.