List of protein subcellular localization prediction tools
This list of protein subcellular localization prediction tools includes software, databases, and web services used for protein subcellular localization prediction.
BAR 3.0 is a server for annotating protein sequences based on a comparative large-scale analysis of the entire UniProt. Given a sequence, BAR 3.0 annotates, where possible, its function (Gene Ontology), structure (Protein Data Bank) and protein domains (Pfam). If the sequence falls into a cluster with one or more structural templates, BAR 3.0 also provides an alignment to the template(s) based on the cluster HMM (HMM profile), from which a 3D model can be computed directly. Cluster HMMs are available for download. (bio.tools entry)
BASys (Bacterial Annotation System) is a tool for automated annotation of bacterial genomic (chromosomal and plasmid) sequences including gene/protein names, GO functions, COG functions, possible paralogues and orthologues, molecular weights, isoelectric points, operon structures, subcellular localization, signal peptides, transmembrane regions, secondary structures, 3-D structures, reactions, and pathways. (bio.tools entry)
The beta-barrel Outer Membrane protein Predictor (BOMP) takes one or more fasta-formatted polypeptide sequences from Gram-negative bacteria as input and predicts whether or not they are beta-barrel integral outer membrane proteins. (bio.tools entry)
Bayesian PRediction Of Membrane Protein Topology (BPROMPT) uses a Bayesian Belief Network to combine the results of other membrane protein prediction methods for a protein sequence. (bio.tools entry)
BUSCA (Bologna Unified Subcellular Component Annotator) is a web server for predicting protein subcellular localization. BUSCA integrates different tools for predicting localization-related protein features, as well as tools for discriminating the subcellular localization of both globular and membrane proteins. (bio.tools entry)
CoBaltDB is a platform that provides easy access to the results of multiple localization tools and supports the prediction of prokaryotic protein localization.
ComiR is a web tool for combinatorial microRNA (miRNA) target prediction. Given a messenger RNA (mRNA) from the human, mouse, fly or worm genome, ComiR predicts whether it is targeted by a set of miRNAs. (bio.tools entry)
DAS (Dense Alignment Surface) is based on low-stringency dot-plots of the query sequence against a set of library sequences (non-homologous membrane proteins), using a previously derived special scoring matrix. The method provides a high-precision hydrophobicity profile for the query, from which the locations of potential transmembrane (TM) segments can be obtained. The novelty of the DAS-TMfilter algorithm is a second prediction cycle that predicts TM segments in the sequences of the TM library. (bio.tools entry)
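The general idea of reading candidate transmembrane segments off a hydrophobicity profile can be illustrated with a simple sliding-window sketch. This is not the DAS algorithm: it swaps in the classic Kyte-Doolittle hydropathy scale instead of DAS's dot-plot scoring, and the window of 19 residues and threshold of 1.6 are assumptions (values commonly quoted for that scale). The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# NOTE: a stand-in scale for illustration; DAS derives its profile differently.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers residues i..i+window-1.

    Raises KeyError on nonstandard residues (e.g. X), which a real tool would handle.
    """
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged 0-based, half-open (start, end) spans of windows above threshold.

    The window and threshold defaults are assumptions, not DAS parameters.
    """
    segments = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            end = start + window
            # Merge with the previous span when the windows overlap or touch.
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments
```

For example, a toy sequence with a 20-residue leucine stretch flanked by charged residues, `"MKK" + "L" * 20 + "DDEEKK"`, yields a single candidate span, while a fully charged sequence yields none. Real predictors refine such raw spans considerably.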
A deep learning architecture for predicting eukaryotic subcellular localization, with a web server that predicts 10 locations for any number of sequences, which can be uploaded as a FASTA file or pasted in directly. (bio.tools entry)
A web server that predicts targets for miRNAs and provides functional information on the predicted miRNA:target gene interactions from various online biological resources. Updates enable the association of miRNAs with diseases through bibliographic analysis and connection to the UCSC Genome Browser, and add more sophisticated workflows. (bio.tools entry)
DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14,000 protein or drug target sequences are linked to these drug entries. (bio.tools entry)
A comprehensive guide to information on E. coli; home of Echobase, a database of E. coli genes characterized since the completion of the genome sequence. (bio.tools entry)
A suite of open-source, web-based tools for visualizing large-scale data sets from the model organism Arabidopsis thaliana; it can be applied to any model organism. Its modules include a sequence conservation explorer (with homology relationships and single nucleotide polymorphism data), a protein structure model explorer, a molecular interaction network explorer, a gene product subcellular localization explorer and a gene expression pattern explorer. (bio.tools entry)
ESLpred is a tool for predicting subcellular localization of proteins using support vector machines. The predictions are based on dipeptide and amino acid composition, and physico-chemical properties. (bio.tools entry)
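Composition-based predictors such as ESLpred turn each sequence into a fixed-length numeric vector before training a classifier. A minimal sketch of amino acid (20 features) and dipeptide (400 features) composition follows, assuming that only the 20 standard residues are counted and that frequencies are normalized by the number of counted k-mers. The names are hypothetical, this is not ESLpred's own code, and the SVM training step is not shown. Calling the same function with `k=3` gives the 8,000 tripeptide composition features used by tripeptide-based predictors.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(seq, k=1):
    """Normalized frequency of every possible k-mer, in a fixed (lexicographic) order.

    k-mers containing nonstandard residues (e.g. X, B, U) are skipped; this is an
    assumption, and real tools may handle them differently.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def feature_vector(seq):
    """Concatenate amino acid (20) and dipeptide (400) composition: 420 features."""
    return kmer_composition(seq, 1) + kmer_composition(seq, 2)
```

Because every sequence maps to the same 420 positions regardless of its length, the vectors can be fed directly to a standard classifier such as an SVM.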
A comprehensive and fully curated database of Herb Ingredients' Targets (HIT), covering herbal ingredients with carefully curated protein target information. The molecular target information covers proteins that are directly or indirectly activated or inhibited, protein binders, and enzymes whose substrates or products are these compounds. Genes that are up- or down-regulated under treatment with individual ingredients are also included. In addition, the experimental conditions, observed bioactivity and various references are provided for reference. The database can be queried via keyword search or similarity search. Crosslinks have been made to TTD, DrugBank, KEGG, PDB, UniProt, Pfam, NCBI, TCM-ID and other databases. (bio.tools entry)
Predicts the subcellular localization of human proteins from various types of residue composition, using support vector machines (SVMs). (bio.tools entry)
idTarget is a web server for identifying biomolecular targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach. idTarget screens against protein structures in PDB. (bio.tools entry)
lncRNAdb database contains a comprehensive list of long noncoding RNAs (lncRNAs) that have been shown to have, or to be associated with, biological functions in eukaryotes, as well as messenger RNAs that have regulatory roles. Each entry contains referenced information about the RNA, including sequences, structural information, genomic context, expression, subcellular localization, conservation, functional evidence and other relevant information. lncRNAdb can be searched by querying published RNA names and aliases, sequences, species and associated protein-coding genes, as well as terms contained in the annotations, such as the tissues in which the transcripts are expressed and associated diseases. In addition, lncRNAdb is linked to the UCSC Genome Browser for visualization and Noncoding RNA Expression Database (NRED) for expression information from a variety of sources. (bio.tools entry)
LOC3D is a database of predicted subcellular localization for eukaryotic proteins of known three-dimensional (3D) structure and includes tools to predict the subcellular localization for submitted protein sequences. (bio.tools entry)
LocDB is a manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens (HS, human) and Arabidopsis thaliana (AT, thale cress). Each database entry contains the experimentally derived localization in Gene Ontology (GO) terminology, the experimental annotation of localization, localization predictions by state-of-the-art methods and, where available, the type of experimental information. LocDB is searchable by keyword, protein name and subcellular compartment, as well as by identifiers from UniProt, Ensembl and TAIR resources. (bio.tools entry)
LOCtarget is a tool for predicting, and a database of pre-computed predictions for, the subcellular localization of eukaryotic and prokaryotic proteins. Several methods are used to make the predictions, including text analysis of SWISS-PROT keywords, nuclear localization signals, and neural networks. (bio.tools entry)
LOCtree predicts localization by mimicking the cellular sorting mechanism with a hierarchical implementation of support vector machines. It is a comprehensive predictor that also incorporates predictions based on PROSITE/Pfam signatures and Swiss-Prot keywords.
A framework to predict localization in all three domains of life, for both globular and membrane proteins (3 classes for archaea, 6 for bacteria and 18 for eukaryota). The resulting method, LocTree2, works well even for protein fragments. It uses a hierarchical system of support vector machines that imitates the cascading mechanism of cellular sorting. The method reaches sustained high performance (eukaryota: Q18=65%, bacteria: Q6=84%). LocTree2 also accurately distinguishes membrane from non-membrane proteins. In its authors' tests on new data, it compared favorably with top methods. (bio.tools entry)
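Scores such as Q18 and Q6 are overall accuracies over n localization classes. Assuming the conventional definition (the percentage of all test proteins assigned to their correct class), with c_i the number of proteins of class i predicted correctly and N the total number of test proteins:

```latex
Q_n = 100 \times \frac{1}{N} \sum_{i=1}^{n} c_i
```

With hypothetical counts, if 650 of 1,000 eukaryotic test proteins were assigned to the correct one of 18 classes, then Q_18 = 100 × 650/1000 = 65%.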
MetaLocGramN is a meta-predictor of subcellular localization (SCL) for proteins of Gram-negative bacteria. It is a gateway to a number of primary prediction methods of various types (signal peptide, beta-barrel, transmembrane helix and subcellular localization predictors). In its authors' benchmark, MetaLocGramN outperformed other SCL prediction methods, reaching an average Matthews correlation coefficient of 0.806, a 12% improvement in predictive capability over PSORTb3. MetaLocGramN can be accessed via SOAP.
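The Matthews correlation coefficient (MCC) summarizes a binary confusion matrix in one number between -1 and 1. A minimal sketch follows. How MetaLocGramN averages the MCC is an assumption here: the sketch computes a one-vs-rest MCC per localization class and takes the unweighted mean. The function names are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; returning 0.0 is a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_mcc(true_labels, predicted_labels):
    """Unweighted mean of one-vs-rest MCCs over the classes in the true labels."""
    scores = []
    for c in sorted(set(true_labels)):
        tp = tn = fp = fn = 0
        for t, p in zip(true_labels, predicted_labels):
            if t == c and p == c:
                tp += 1
            elif t != c and p != c:
                tn += 1
            elif p == c:  # predicted c, but the true class is different
                fp += 1
            else:         # true class is c, but predicted something else
                fn += 1
        scores.append(mcc(tp, tn, fp, fn))
    return sum(scores) / len(scores)
```

Unlike plain accuracy, the MCC stays informative when localization classes are strongly imbalanced, which is typical of bacterial proteomes.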
MirZ is a web server for the evaluation and analysis of miRNAs. It integrates two miRNA resources: the smiRNAdb miRNA expression atlas and the ElMMo miRNA target prediction algorithm. (bio.tools entry)
A web server trained specifically to predict proteins destined for the mitochondria, particularly in yeast and animals. (bio.tools entry)
A web server that predicts the subcellular localization of mycobacterial proteins based on optimal tripeptide compositions. (bio.tools entry)
ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 85.3% to 91.4% across species. (bio.tools entry)
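An n-gram-based Bayesian classifier can be sketched as multinomial naive Bayes over overlapping residue n-grams. The sketch below is a generic illustration, not ngLOC itself: the n-gram length, the additive (Laplace) smoothing, the class name `NGramNaiveBayes` and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Overlapping n-grams (substrings of length n) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Multinomial naive Bayes over residue n-grams (hypothetical class)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (assumed; not ngLOC's setting)
        self.alpha = alpha  # additive smoothing pseudo-count

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)       # used for class priors
        self.ngram_counts = defaultdict(Counter)  # per-class n-gram counts
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq.upper(), self.n)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.ngram_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, seq, label):
        """Unnormalized log P(label | seq): log prior plus smoothed log likelihoods."""
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for gram in ngrams(seq.upper(), self.n):
            score += math.log((self.ngram_counts[label][gram] + self.alpha) / denom)
        return score

    def predict(self, seq):
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(seq, c))


# Toy usage with made-up sequences and labels:
model = NGramNaiveBayes(n=2).fit(
    ["KKKRKKKR", "RKKKRKKK", "LLLLVVLL", "LLVVLLLL"],
    ["nucleus", "nucleus", "membrane", "membrane"],
)
```

Working in log space avoids numerical underflow when multiplying many small n-gram probabilities for long proteins.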
Software designed to perform organelle-based colocalization analysis of 2D, 3D and 4D multi-fluorophore microscopy images of cells. (bio.tools entry)
PA-SUB (Proteome Analyst Specialized Subcellular Localization Server) can be used to predict the subcellular localization of proteins using established machine learning techniques. (bio.tools entry)
PharmMapper is a web server that identifies potential drug targets from its PharmTargetDB for a given input molecule. Potential targets are identified from a prediction of the spatial arrangement of features essential for a given molecule to interact with a target. (bio.tools entry)
PRED-TMBB is a tool that takes a protein sequence from a Gram-negative bacterium as input and predicts its transmembrane strands and the probability that it is an outer membrane beta-barrel protein. The user can choose among three different decoding methods. (bio.tools entry)
The PREP (Predictive RNA Editors for Plants) suite predicts sites of RNA editing based on the principle that editing in plant organelles increases the conservation of proteins across species. Predictors for mitochondrial genes, chloroplast genes, and alignments input by the user are included. (bio.tools entry)
ProLoc-GO is an efficient sequence-based method for predicting protein subcellular localization that mines informative Gene Ontology terms. (bio.tools entry)
A classifier based on an evolutionary support vector machine (ESVM) that automatically selects features from a large set of physicochemical composition (PCC) features, designed to predict protein subnuclear localization accurately. (bio.tools entry)
Protegen is a web-based database and analysis system that curates, stores and analyzes protective antigens. Protegen includes basic antigen information and experimental evidence curated from peer-reviewed articles. It also includes detailed gene/protein information (e.g. DNA and protein sequences, and COG classification). Different antigen features, such as protein weight and pI, and subcellular localizations of bacterial proteins are precomputed. (bio.tools entry)
Proteome Analyst is a high-throughput tool for predicting properties of each protein in a proteome. The user provides a proteome in FASTA format, and the system employs PSI-BLAST, PSIPRED and Modeller to predict protein function and subcellular localization. Proteome Analyst uses machine-learned classifiers to predict properties such as GO molecular function. User-supplied training data can also be used to create custom classifiers. (bio.tools entry)
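Most tools in this list accept protein sequences in FASTA format: a header line starting with `>`, followed by one or more sequence lines. A minimal reader is sketched below, assuming plain (uncompressed) text input. The function name `read_fasta` is hypothetical; established libraries such as Biopython provide more complete parsers.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from any iterable of text lines.

    Works with an open file object or a list of strings. Blank lines are
    ignored, and sequence lines before the first header are discarded.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the leading ">"
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)
```

For a proteome file (the filename here is hypothetical), `with open("proteome.fasta") as fh: records = list(read_fasta(fh))` collects all entries. Joining wrapped lines matters because many FASTA files split each sequence into lines of 60 or 80 characters.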
PSORTb (for “bacterial” PSORT) is a high-precision localization prediction method for bacterial proteins. According to its developers, PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. A newer PSORTb version offers improved recall, higher proteome-scale prediction coverage, and new, refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. (bio.tools entry)
PSORTdb (part of the PSORT family) is a database of protein subcellular localizations for bacteria and archaea that contains both information determined through laboratory experimentation (ePSORTdb dataset) and computational predictions (cPSORTdb dataset). (bio.tools entry)
psRobot is a web-based tool for plant small RNA meta-analysis. For stem-loop small RNA prediction, psRobot aligns user-uploaded sequences to the selected genome, extracts their predicted precursors, and predicts whether the precursors can fold into a stem-loop secondary structure. psRobot also performs small RNA target prediction, predicting the possible targets of user-provided small RNA sequences in the selected transcript library. (bio.tools entry)
pTARGET predicts the subcellular localization of eukaryotic proteins based on the occurrence patterns of location-specific protein functional domains and the amino acid compositional differences in proteins from nine distinct subcellular locations. (bio.tools entry)
RegPhos is a database for exploration of the phosphorylation network associated with an input of genes/proteins. Subcellular localization information is also included. (bio.tools entry)
RepTar is a database of miRNA target predictions, based on the RepTar algorithm that is independent of evolutionary conservation considerations and is not limited to seed pairing sites. (bio.tools entry)
RNApredator is a web server for the prediction of bacterial sRNA targets. The user can choose from a large selection of genomes. Accessibility of the target to the sRNA is considered. (bio.tools entry)
A cell structure-driven approach to classifier construction for the image-based prediction of protein subcellular location, which makes use of prior biological structural information. (bio.tools entry)
A semi-supervised protocol that can use unlabeled cancer protein data in model construction through an iterative and incremental training strategy. It can improve the accuracy and sensitivity of detecting differences in subcellular location. (bio.tools entry)
A computational system for predicting the subchloroplast location of a protein from its primary sequence. It assigns a protein located in the chloroplast to one of four parts: the envelope (consisting of the outer and inner membranes), the thylakoid lumen, the stroma or the thylakoid membrane. (bio.tools entry)
The SuperPred web server compares the structural fingerprint of an input molecule to a database of drugs linked to their drug targets and affected pathways. Because a biological effect can be predicted well when the structural similarity is sufficient, the web server allows predictions of the medical indication area of novel compounds and helps find new leads for known targets. Such information can be useful in drug classification and target prediction. (bio.tools entry)
A web resource for analyzing drug-target interactions. It integrates drug-related information on medical indications, adverse drug effects, drug metabolism and pathways with Gene Ontology (GO) terms for the target proteins. (bio.tools entry)
SwissTargetPrediction is a web server for predicting the targets of bioactive small molecules. Using a combination of 2D and 3D similarity measures, it compares the query molecule to a library of 280,000 compounds active on more than 2000 targets in 5 different organisms. (bio.tools entry)
The Toxin and Toxin-Target Database (T3DB) is a unique bioinformatics resource that compiles comprehensive information about common or ubiquitous toxins and their toxin-targets. Each T3DB record (ToxCard) contains over 80 data fields providing detailed information on chemical properties and descriptors, toxicity values, protein and gene sequences (for both targets and toxins), molecular and cellular interaction data, toxicological data, mechanistic information and references. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, government documents, textbooks and scientific journals. A key focus of the T3DB is on providing "depth" over "breadth" with detailed descriptions, mechanisms of action, and information on toxins and toxin-targets. Potential applications of the T3DB include clinical metabolomics, toxin target prediction, toxicity prediction and toxicology education. (bio.tools entry)
Transcription activator-like (TAL) Effector-Nucleotide Targeter 2.0 (TALE-NT) is a suite of web-based tools that allows for custom design of TAL effector repeat arrays for desired targets and prediction of TAL effector binding sites. (bio.tools entry)
Target Fishing Dock (TarFisDock) is a web server that docks small molecules with protein structures in the Potential Drug Target Database (PDTD) in an effort to discover new drug targets. (bio.tools entry)
Tropical Disease Research (TDR) Database: Designed and developed to facilitate the rapid identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases. The database integrates pathogen specific genomic information with functional data for genes collected from various sources, including literature curation. Information can be browsed and queried. (bio.tools entry)
The TMpred program makes a prediction of membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins (bio.tools entry)
Therapeutic Target Database (TTD) has been developed to provide information about therapeutic targets and corresponding drugs. TTD includes information about successful, clinical trial and research targets, approved, clinical trial and experimental drugs linked to their primary targets, new ways to access data by drug mode of action, recursive search of related targets or drugs, similarity target and drug searching, customized and whole data download, and standardized target ID. (bio.tools entry)
The University of Minnesota Pathway Prediction System (UM-PPS) is a web tool that recognizes functional groups in organic compounds that are potential targets of microbial catabolic reactions and predicts transformations of these groups based on biotransformation rules. Multi-level predictions are made. (bio.tools entry)
YLoc is a web server for the prediction of subcellular localization. Predictions are explained, and the biological properties used for each prediction are highlighted. In addition, a confidence estimate rates the reliability of individual predictions. (bio.tools entry)
Zinc Finger Tools provides several tools for selecting zinc finger protein target sites and for designing the proteins that will target them. (bio.tools entry)
Tantoso E, Li KB (August 2008). "AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices". Amino Acids. 35 (2): 345–53. doi:10.1007/s00726-007-0616-y. PMID 18163182. S2CID 712299.
Saravanan V, Lakshmi PT (December 2013). "APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein". Acta Biotheoretica. 61 (4): 481–97. doi:10.1007/s10441-013-9197-1. PMID 23982307. S2CID 23858443.
Chou KC, Shen HB (2008). "Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms". Nature Protocols. 3 (2): 153–62. doi:10.1038/nprot.2007.494. PMID 18274516. S2CID 226104.
Tusnády GE, Simon I (October 1998). "Principles governing amino acid composition of integral membrane proteins: application to topology prediction". Journal of Molecular Biology. 283 (2): 489–506. doi:10.1006/jmbi.1998.2107. ISSN 0022-2836. PMID 9769220.
Chou KC, Wu ZC, Xiao X (February 2012). "iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites". Molecular BioSystems. 8 (2): 629–41. doi:10.1039/c1mb05420a. PMID 22134333.
Goldberg T, Hecht M, Hamp T, Karl T, Yachdav G, Ahmed N, Altermann U, Angerer P, Ansorge S, Balasz K, Bernhofer M, Betz A, Cizmadija L, Do KT, Gerke J, Greil R, Joerdens V, Hastreiter M, Hembach K, Herzog M, Kalemanov M, Kluge M, Meier A, Nasir H, Neumaier U, Prade V, Reeb J, Sorokoumov A, Troshani I, Vorberg S, Waldraff S, Zierer J, Nielsen H, Rost B (July 2014). "LocTree3 prediction of localization". Nucleic Acids Research. 42 (Web Server issue): W350–5. doi:10.1093/nar/gku396. PMC 4086075. PMID 24848019.
Panwar B, Raghava GP (May 2012). "Predicting sub-cellular localization of tRNA synthetases from their primary structures". Amino Acids. 42 (5): 1703–13. doi:10.1007/s00726-011-0872-8. PMID 21400228. S2CID 2996097.
Magnus M, Pawlowski M, Bujnicki JM (December 2012). "MetaLocGramN: A meta-predictor of protein subcellular localization for Gram-negative bacteria". Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 1824 (12): 1425–33. doi:10.1016/j.bbapap.2012.05.018. PMID 22705560.
Zhu PP, Li WC, Zhong ZJ, Deng EZ, Ding H, Chen W, Lin H (February 2015). "Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition". Molecular BioSystems. 11 (2): 558–63. doi:10.1039/c4mb00645c. PMID 25437899. S2CID 8130819.
Huang WL, Tung CW, Huang HL, Hwang SF, Ho SY (2007). "ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features". Bio Systems. 90 (2): 573–81. doi:10.1016/j.biosystems.2007.01.001. PMID 17291684.
Du P, Cao S, Li Y (November 2009). "SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm". Journal of Theoretical Biology. 261 (2): 330–5. Bibcode:2009JThBi.261..330D. doi:10.1016/j.jtbi.2009.08.004. PMID 19679138.
Emanuelsson O, Nielsen H, Brunak S, von Heijne G (July 2000). "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence". Journal of Molecular Biology. 300 (4): 1005–16. doi:10.1006/jmbi.2000.3903. PMID 10891285.
Lin H, Chen W, Yuan LF, Li ZQ, Ding H (June 2013). "Using over-represented tetrapeptides to predict protein submitochondria locations". Acta Biotheoretica. 61 (2): 259–68. doi:10.1007/s10441-013-9181-9. PMID 23475502. S2CID 30809970.
Gromiha MM, Ahmad S, Suwa M (April 2004). "Neural network-based prediction of transmembrane beta-strand segments in outer membrane proteins". Journal of Computational Chemistry. 25 (5): 762–7. doi:10.1002/jcc.10386. PMID 14978719. S2CID 3486330.
Krogh A, Larsson B, von Heijne G, Sonnhammer EL (January 2001). "Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes". Journal of Molecular Biology. 305 (3): 567–580. doi:10.1006/jmbi.2000.4315. ISSN 0022-2836. PMID 11152613.
Dreier B, Segal DJ, Barbas CF (November 2000). "Insights into the molecular recognition of the 5'-GNN-3' family of DNA sequences by zinc finger domains". Journal of Molecular Biology. 303 (4): 489–502. doi:10.1006/jmbi.2000.4133. PMID 11054286. S2CID 11263372.