Starcode:[3] a fast sequence clustering algorithm based on exact all-pairs search.[4]
OrthoFinder:[5] a fast, scalable and accurate method for clustering proteins into gene families (orthogroups)[6][7]
Linclust:[8] the first algorithm whose runtime scales linearly with input set size; very fast; part of the MMseqs2[9] software suite for fast, sensitive searching and clustering of large sequence sets
TribeMCL: a method for clustering proteins into related groups[10]
BAG: a graph theoretic sequence clustering algorithm[11]
JESAM:[12] an open-source, parallel, scalable DNA alignment engine with an optional clustering component
UICluster:[13] parallel clustering of EST (gene) sequences
BLASTClust: single-linkage clustering with BLAST[14]
Clusterer:[15] an extensible Java application for sequence grouping and cluster analysis
PATDB: a program for rapidly identifying perfect substrings
nrdb:[16] a program for merging trivially redundant (identical) sequences
CluSTr:[17] a single-linkage protein sequence clustering database built from Smith-Waterman sequence similarities; covers over 7 million sequences, including UniProt and IPI
ICAtools:[18] an early DNA clustering package with many algorithms useful for artifact discovery or EST clustering
skipredundant: EMBOSS tool[19] to remove redundant sequences from a set
CLUSS: an algorithm[20] to identify groups of structurally, functionally or evolutionarily related hard-to-align protein sequences; CLUSS webserver[21]
CLUSS2: an algorithm[22] for clustering families of hard-to-align protein sequences with multiple biological functions; CLUSS2 webserver[21]
UniRef: A non-redundant UniProt sequence database[25]
Uniclust: UniProtKB sequences clustered at 90%, 50% and 30% pairwise sequence identity.[26]
Virus Orthologous Clusters:[27] A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
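Several entries above are described as single-linkage clustering (BLASTClust, CluSTr), and nrdb as merging identical sequences. The following is a minimal Python sketch of both ideas, not the implementation of any listed tool. It assumes a toy ungapped identity score (matching positions divided by the shorter length) in place of the BLAST or Smith-Waterman similarities those tools actually use; the function names, example sequences and threshold are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def collapse_identical(seqs: dict[str, str]) -> dict[str, list[str]]:
    """nrdb-style step (illustrative): group IDs that share an identical sequence."""
    groups: dict[str, list[str]] = {}
    for seq_id, seq in seqs.items():
        groups.setdefault(seq, []).append(seq_id)
    return groups


def identity(a: str, b: str) -> float:
    """Toy ungapped identity: matching positions over the shorter length.
    Assumption: a stand-in for a real alignment score such as BLAST or Smith-Waterman."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n


def single_linkage(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Join any two sequences whose identity is >= threshold, then report the
    connected components (single linkage) using a union-find structure."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Exhaustive all-pairs comparison: quadratic in the number of sequences.
    for a, b in combinations(ids, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    clusters: dict[str, list[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sort for a deterministic result.
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG", "s4": "ACGAACGA"}
    print(single_linkage(example, 0.8))
```

With a threshold of 0.8, s1 and s4 land in the same cluster even though their own identity is only 0.75, because each is linked to s2; this chaining is characteristic of single linkage. Because every pair is compared, the sketch scales quadratically with the number of sequences, which is the cost that linear-time methods such as Linclust aim to avoid.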
Kelil A, Wang S, Brzezinski R (2008). "CLUSS2: an alignment-independent algorithm for clustering protein families with multiple biological functions". International Journal of Computational Biology and Drug Design. 1 (2): 122–40. doi:10.1504/ijcbdd.2008.020190. PMID 20058485.