(synopsis "Widely used tool for clustering and comparing protein or nucleotide sequences")
(description "CD-HIT is very fast and can handle extremely large databases.  It significantly reduces the computational and manual effort required by many sequence analysis tasks, and helps in understanding the structure of a dataset and correcting bias within it.")