Databases

The Human Metabolome Database (HMDB) is a freely available database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education. The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. HMDB contains over 7900 metabolite entries including both water-soluble and lipid-soluble metabolites as well as metabolites that would be regarded as either abundant (> 1 uM) or relatively rare (< 1 nM). Additionally, approximately 7200 protein (and DNA) sequences are linked to these metabolite entries.

References:

1. Wishart DS, Tzur D, Knox C, et al., HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007 Jan;35(Database issue):D521-6.

2. Wishart DS, Knox C, Guo AC, et al., HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009 37(Database issue):D603-610.

3. Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, et al., HMDB 3.0 — The Human Metabolome Database in 2013. Nucleic Acids Res. 2013. Jan 1;41(D1):D801-7.

4. Wiki: http://en.wikipedia.org/wiki/HMDB


The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs. Additionally, more than 2,500 non-redundant protein (i.e. drug target) sequences are linked to these FDA-approved drug entries. Each DrugCard entry contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.

References:

1. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS. "DrugBank 4.0: shedding new light on drug metabolism". Nucleic Acids Res. 2014 Jan 1;42(1):D1091-7.

2. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS. "DrugBank 3.0: a comprehensive resource for 'omics' research on drugs". Nucleic Acids Res. 2011 Jan;39(Database issue):D1035-41.

3. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. "DrugBank: a knowledgebase for drugs, drug actions and drug targets". Nucleic Acids Res 2008 Jan;36(Database issue):D901-6.

4. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. "DrugBank: a comprehensive resource for in silico drug discovery and exploration". Nucleic Acids Res. 2006 Jan 1;34(Database issue):D668-72.

5. Wiki: http://en.wikipedia.org/wiki/Drugbank


The Protein Property Prediction and Testing Database (PPT-DB) is a collection of protein property databases for over 20 different protein properties including secondary structure, trans-membrane helices and beta barrels, accessible surface area, signal peptides, and more.  

References:

1. David S. Wishart, David Arndt, Mark Berjanskii, An Chi Guo, Yi Shi, Savita Shrivastava, Jianjun Zhou, You Zhou and Guohui Lin: "PPT-DB: the protein property prediction and testing database". Nucleic Acids Research 2008 36(Database issue):D222-D229


FooDB is a database on food constituents, chemistry and biology that has been under development since 2009. It currently has data on 28,500 food compounds and food associations. It is being jointly developed with Dr. Augustin Scalbert (IARC, Lyon). When completed in late 2012 or early 2013 it will be the most comprehensive resource on food composition in the world. It will provide information on both macronutrients and micronutrients, including many of the constituents that give foods their flavor, color, taste, texture and aroma. The link provided here gives some sample pages from the database.


The Toxin and Toxin Target Database (T3DB) is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information. The database currently houses over 2900 toxins described by over 34 200 synonyms, including pollutants, pesticides, drugs, and food toxins, which are linked to over 1300 corresponding toxin target records. Altogether there are over 33 800 toxin-toxin target associations.

References:

1. Lim E, Pon A, Djoumbou Y, Knox C, Shrivastava S, Guo AC, Neveu V, Wishart DS. T3DB: a comprehensively annotated database of common toxins and their targets. Nucleic Acids Res. 2010 Jan 38(Database issue):D781-6.

2. Wiki: http://en.wikipedia.org/wiki/T3DB


The Small Molecule Pathway Database (SMPDB) is an interactive, visual database containing nearly 450 small molecule pathways found in humans. These include standard metabolic pathways (90), disease pathways (116), drug pathways (223) and metabolic signaling pathways (13). More than 70% of the pathways in SMPDB are found in no other pathway database (not even KEGG or HumanCyc). SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology.

References:

1. Wishart DS, Frolkis A, Knox C, et al., SMPDB: The Small Molecule Pathway Database. Nucleic Acids Res. 2010 Jan;38(Database issue):D480-7.

2. Jewison T, Su Y, Disfany FM, et al., SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database. Nucleic Acids Res. 2013 Submitted.

3. Wiki: http://en.wikipedia.org/wiki/SMPDB


The CSF Metabolome database is a freely available electronic database containing detailed information about 468 small molecule metabolites found in human CSF along with 1650 concentration values. The data tables may be sorted and searched by concentration values and ranges. The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.

References:

1. Wishart DS, Lewis MJ, Morrissey JA et al. The human cerebrospinal fluid metabolome. J Chromatogr B Analyt Technol Biomed Life Sci. 2008 Aug 15;871(2):164-73.


The Serum Metabolome database is a freely available electronic database containing detailed information about 4651 small molecule metabolites found in human serum along with 10895 concentration values. The data tables may be sorted and searched by concentration values and ranges. The information includes literature and experimentally derived chemical data, clinical data and molecular/biochemistry data.

References:

1. Psychogios N, Hau DD, Peng J, Guo AC, Mandal R, Bouatra S, Sinelnikov I, Krishnamurthy R, Eisner R, Gautam B, Young N, Xia J, Knox C, Dong E, Huang P, Hollander Z, Pedersen TL, Smith SR, Bamforth F, Greiner R, McManus B, Newman JW, Goodfriend T, Wishart DS. "The human serum metabolome". PLoS One. 2011 Feb 16;6(2):e16957.


The CyberCell Database (CCDB) is a comprehensive, web-accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E. coli (strain K12).

References:

1. Shan Sundararaj, Anchi Guo, Bahram Habibi-Nazhad, Melania Rouani, Paul Stothard, Michael Ellison, and David S. Wishart "The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli" Nucleic Acids Res. 2004 January 1; 32 (Database issue): D293-D295


The Yeast Metabolome Database (YMDB) is a manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae (also known as Baker’s yeast and Brewer’s yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases. YMDB contains metabolites arising from normal S. cerevisiae metabolism under defined laboratory conditions as well as metabolites generated by S. cerevisiae when used in baking and in the production of wines, beers and spirits. YMDB currently contains 2010 small molecules with 857 associated enzymes and 138 associated transporters.

References:

1. Jewison T, Neveu V, Lee J, Knox C, Liu P, Mandal R, Murthy RK, Sinelnikov I, Guo AC, Wilson M, Djoumbou Y and Wishart DS. "YMDB: The Yeast Metabolome Database". Nucleic Acids Res. 2012 Jan;40(Database issue):D815-20

2. Wiki: http://en.wikipedia.org/wiki/YMDB


The Bovine Metabolome Database (BMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in beef and dairy cattle. The information includes literature and experimentally derived information on bovine meat, bovine serum, bovine milk, bovine urine and bovine ruminal fluid.


The E. coli Metabolome Database (ECMDB) is a freely available electronic database containing detailed information about the >1620 metabolites found in E. coli (strain K12, MG1655). The information includes literature and experimentally derived chemical data, spectral data and molecular/biochemistry data.

References:

1. ECMDB: The E. coli Metabolome Database. Guo AC, Jewison T, Wilson M, Liu Y, Knox C, Djoumbou Y, Lo P, Mandal R, Krishnamurthy R, Wishart DS. Nucleic Acids Res. 2012 Oct 29.

2. Wiki: https://en.wikipedia.org/wiki/E._Coli_Metabolome_Database


MarkerDB will be a freely available resource that attempts to consolidate information on all known clinical biomarkers into a single source. Multiple types of markers are covered including metabolite based, genetic based, protein based and cell based markers.


BacMap is an interactive visual database containing all publicly available bacterial genomes. A fully labeled and zoomable genome map is provided for each genome. Sequence and text queries can be used to identify genes of interest, or maps can be navigated using a simple interface. BacMap is designed to serve as an intuitive and convenient tool for identifying orthologues and paralogues, studying operon conservation, and determining gene function.

References:

1. Joseph Cruz, Yifeng Liu, [...], and David S. Wishart. BacMap: an up-to-date electronic atlas of annotated bacterial genomes. Nucleic Acids Res Jan, 2012; 40(D1):D599-D604

2. Stothard P, Van Domselaar G, Shrivastava S, Guo A, O'Neill B, Cruz J, Ellison M, Wishart DS (2005) BacMap: an interactive picture atlas of annotated bacterial genomes. Nucleic Acids Res 33:D317-D320

3. Wiki: http://en.wikipedia.org/wiki/BacMap


The Re-referenced Protein Chemical shift Database (RefDB) is a database of carefully corrected or re-referenced chemical shifts, derived from the BioMagResBank (BMRB). The process involves predicting protein 1H, 13C and 15N chemical shifts using X-ray or NMR coordinate data via SHIFTX and then comparing those predictions to the observed shifts reported in the BMRB (via SHIFTCOR). RefDB provides a standard chemical shift resource for NMR spectroscopists wishing to derive or compute chemical shift trends in peptides and proteins.

References:

1. Haiyan Zhang, Stephen Neal and David Wishart (2003) "RefDB: A database of uniformly referenced protein chemical shifts" Journal of Biomolecular NMR, 25: 173-195

2. Wiki: https://en.wikipedia.org/wiki/RefDB_(chemistry)


Web Servers

SuperPose is a protein superposition server. SuperPose calculates protein superpositions using a modified quaternion approach. From a superposition of two or more structures, SuperPose generates sequence alignments, structure alignments, PDB coordinates, RMSD statistics, Difference Distance Plots, and interactive images of the superimposed structures. The SuperPose web server supports the submission of either PDB-formatted files or PDB accession numbers.
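Two of the outputs listed above, RMSD statistics and difference distance plots, can be illustrated with a short sketch. This is not SuperPose's code and it does not perform the quaternion-based fit itself; it assumes the two structures have already been superimposed and reduced to equal-length lists of matched (x, y, z) atom coordinates. The function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def distance_matrix(coords):
    """All pairwise intra-structure distances."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def difference_distance_matrix(coords_a, coords_b):
    """Absolute difference between the two intra-structure distance matrices.

    This quantity does not depend on how the structures are superimposed,
    because it only compares distances measured within each structure.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    n = len(da)
    return [[abs(da[i][j] - db[i][j]) for j in range(n)] for i in range(n)]
```

Large entries in the difference distance matrix point to pairs of atoms whose separation changes between the two structures, which is what a difference distance plot visualizes.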

References:

1. Rajarshi Maiti, Gary H. Van Domselaar, Haiyan Zhang, and David S. Wishart "SuperPose: a simple server for sophisticated structural superposition" Nucleic Acids Res. 2004 July 1; 32 (Web Server issue): W590-W594.


VADAR (Volume, Area, Dihedral Angle Reporter) is a compilation of more than 15 different algorithms and programs for analyzing and assessing peptide and protein structures from their PDB coordinate data.

References:

1. Leigh Willard, Anuj Ranjan, Haiyan Zhang, Hassan Monzavi, Robert F. Boyko, Brian D. Sykes, and David S. Wishart "VADAR: a web server for quantitative evaluation of protein structure quality" Nucleic Acids Res. 2003 July 1; 31 (13): 3316-3319

2. Wiki: https://en.wikipedia.org/wiki/VADAR


MetaboAnalyst is a comprehensive, Web-based tool designed for processing, analyzing, and interpreting metabolomic data. It handles most of the common metabolomic data types including compound concentration lists, spectral bin lists, peak lists, and raw MS spectra.

References:

1. Xia J, Psychogios N, Young N, Wishart DS. "MetaboAnalyst: a web server for metabolomic data analysis and interpretation". Nucleic Acids Res. 2009 Jul;37(Web Server issue):W652-60. doi: 10.1093/nar/gkp356. Epub 2009 May 8.

2. Wiki: http://en.wikipedia.org/wiki/MetaboAnalyst


MetATT is an easy-to-use, web-based tool designed for time-series and two-factor metabolomics data analysis. MetATT offers a number of complementary approaches including 3D interactive principal component analysis, two-way heatmap visualization, two-way ANOVA, ANOVA-simultaneous component analysis and multivariate empirical Bayes time-series analysis.

References:

1. Xia J, Sinelnikov IV, Wishart DS. MetATT: a web-based metabolomics tool for analyzing time-series and two-factor datasets. Bioinformatics. 2011 Sep 1;27(17):2455-6. doi: 10.1093/bioinformatics/btr392. Epub 2011 Jun 27.


MetPA (Metabolomics Pathway Analysis) is a free and easy-to-use web application designed to perform pathway analysis and visualization of quantitative metabolomic data.

References:

1. Jianguo Xia and David S. Wishart. MetPA: a web-based metabolomics tool for pathway analysis and visualization. Bioinformatics (2010) 26 (18): 2342-2344. 

2. Wiki: http://en.wikipedia.org/wiki/MetPA


MSEA is a web-based tool to help identify and interpret patterns of metabolite concentration changes in a biologically meaningful context for human and mammalian metabolomic studies.

References:

1. Xia J, Wishart DS. MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W71-7. doi: 10.1093/nar/gkq329. Epub 2010 May 10.

2. Wiki: https://en.wikipedia.org/wiki/Metabolite_Set_Enrichment_Analysis


MetaboMiner is a tool which can be used to automatically or semi-automatically identify metabolites in complex biofluids from 2D NMR spectra. MetaboMiner is able to handle both 1H-1H total correlation spectroscopy (TOCSY) and 1H-13C heteronuclear single quantum correlation (HSQC) data. It identifies compounds by comparing 2D spectral patterns in the NMR spectrum of the biofluid mixture with specially constructed libraries containing reference spectra of approximately 500 pure compounds.

References:

1. Jianguo Xia, Trent C Bjorndahl, Peter Tang and David S Wishart. "MetaboMiner – semi-automated identification of metabolites from 2D NMR spectra of complex biofluids". BMC Bioinformatics 2008, 9:507 doi:10.1186/1471-2105-9-507


PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is 'Given X, find all Y's' where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites.

References:

1. Cheng D, Knox C, Young N, Stothard P, Damaraju S, Wishart DS. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W399-405. doi: 10.1093/nar/gkn296. Epub 2008 May 16.


Receiver Operating Characteristic (ROC) curves are generally considered the method of choice for evaluating the performance of potential biomarkers. ROCCET is a freely available web-based tool designed to assist clinicians and bench biologists in performing common ROC based analyses on their metabolomic data using both classical univariate and more recently developed multivariate approaches.
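As an illustration of the classical univariate case, the sketch below builds an empirical ROC curve for a single candidate biomarker and computes the area under it with the trapezoidal rule. It is a minimal example and not ROCCET's implementation; the function names and the convention that higher scores indicate the positive class are assumptions made here.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per distinct threshold.

    scores: measured values of one candidate biomarker.
    labels: 1 for cases, 0 for controls (higher score assumed to indicate a case).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 0.5 corresponds to a marker that performs no better than chance, and 1.0 to a marker that separates the two groups perfectly.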

References:

1. Jianguo Xia, David I Broadhurst, Michael Wilson, and David S Wishart (2012) Translational Biomarker Discovery in Clinical Metabolomics: An Introductory Tutorial. Metabolomics, 11/2012.



Proteus is a high-performing integrated web server and stand-alone application that combines three high-performing de novo structure prediction methods (PSIPRED, JNET and TRANSSEC [a locally developed predictor]), a jury-of-experts consensus tool and a robust PDB-based structure alignment process to generate all of its secondary structure predictions. For water-soluble proteins, Proteus is able to achieve a very high level of accuracy (Q3=88%, SOV=90%). In the rare situation (20-30%) where a query protein shows no similarity whatsoever to any known structure, Proteus is still able to achieve a Q3 score of 79%. Proteus is not restricted to generating accurate secondary structures for water-soluble proteins, as it appears to perform well for integral membrane proteins (both helix-containing proteins and beta-sheet containing porins) that have remote homologues or a portion of a homologue in the PDB.
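The Q3 score quoted above is the percentage of residues assigned the correct one of three states (helix, strand, coil). The sketch below computes Q3 and also shows a simple per-residue majority vote across several predictors. The majority vote is only a stand-in for the idea of combining predictors; it is an assumption made for illustration and is not the jury-of-experts method used by Proteus. SOV is not computed here.

```python
from collections import Counter


def q3(predicted, observed):
    """Percentage of residues whose 3-state label (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_consensus(predictions):
    """Per-residue majority vote over several equal-length prediction strings.

    When states tie, the call from the earliest predictor among the tied
    states is kept (an arbitrary rule chosen for this sketch).
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        consensus.append(next(s for s in column if counts[s] == best))
    return "".join(consensus)
```

For example, combining the hypothetical predictions "HHHEC", "HHCEC" and "CHHEE" gives "HHHEC", which scores a Q3 of 80% against an observed string of "HHHCC".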

References:

1. Scott Montgomerie, Shan Sundararaj, Warren J Gallin, David S Wishart. Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinformatics. June 2006, 7:301 


PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline.

References:

1. Scott Montgomerie, Shan Sundararaj, Warren J Gallin, David S Wishart. Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinformatics. June 2006, 7:301


BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal and plasmid) sequences.

References:

1. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W455-9.

2. Wiki: http://en.wikipedia.org/wiki/BASys


ResProx (Resolution-by-proxy or Res(p)) is a web server that predicts the atomic resolution of NMR protein structures using only PDB coordinate data as input. More specifically, ResProx uses machine learning techniques to accurately estimate (with a correlation coefficient of 0.92 between observed and calculated) the atomic resolution of a protein structure from 25 measurable features that can be derived from its atomic coordinates. Because atomic resolution is a simple and near-universal measure of structure quality (i.e. < 2.0 Å is good, > 4.0 Å is bad), ResProx offers X-ray crystallographers and NMR spectroscopists the opportunity to easily assess the accuracy and quality of their 3D protein structures. It also allows them to assess whether their refinement methods have made their structures better (or worse) than what the experimental data suggests. Furthermore, since coordinate data is common to both X-ray and NMR, ResProx should allow structural biologists to use a single, easily understood number to compare the structures determined by NMR with those determined by X-ray crystallography.

References:

1. Mark Berjanskii, Jianjun Zhou, Yongjie Liang, Guohui Lin and David S. Wishart "Resolution-by-Proxy: A Simple Measure for Assessing and Comparing the Overall Quality of NMR Protein Structures", J Biomol NMR. 2012 Jul;53(3):167-80

2. Wiki: http://en.wikipedia.org/wiki/ResProx


MovieMaker is a web server that allows short (~10 sec), downloadable movies to be generated of protein dynamics. It accepts PDB files or PDB accession numbers as input and automatically outputs colorful animations covering a wide range of protein motions and other dynamic processes. Users have the option of animating 1) simple rotation; 2) morphing between two end conformers; 3) short-scale, picosecond vibrations; 4) ligand docking; 5) protein oligomerization; 6) mid-scale nanosecond (ensemble) motions; and 7) protein folding/unfolding. Note: MovieMaker is not a molecular dynamics server and does not perform MD calculations.

References:

1. Maiti R, Van Domselaar GH, Wishart DS. MovieMaker: a web server for rapid rendering of protein motions and interactions. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W358-62.


CS23D (Chemical Shift to 3D structure) 2.0 is a web server for rapidly generating accurate 3D protein structures using only assigned NMR chemical shifts as input. Unlike conventional NMR methods, which require NOE and/or J-coupling data, CS23D2.0 uses only chemical shift information to generate a 3D structure of the protein of interest. CS23D2.0 accepts chemical shift files in either SHIFTY or BMRB formats and produces a set of PDB coordinates for the protein in about 10-15 minutes. CS23D2.0 uses a combination of maximal subfragment assembly, chemical shift threading, shift-based torsion angle prediction and chemical shift refinement to generate and refine the protein coordinates. Tests indicate that CS23D2.0 converges (i.e. finds a solution) for about 90% of protein queries. The performance is dependent on the completeness of the chemical shift assignments and the similarity of the query protein to known 3D folds.

References:

1. Wishart DS, Arndt D, Berjanskii M, Tang P, Zhou J, Lin G. CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W496-502.

2. Wiki: http://en.wikipedia.org/wiki/CS23D


SHIFTX2 predicts both the backbone and side chain 1H, 13C and 15N chemical shifts for proteins using their structural (PDB) coordinates as input. SHIFTX2 combines ensemble machine learning methods with sequence alignment-based methods to calculate protein chemical shifts for backbone and side chain atoms.

Shiftcor compares, identifies, corrects and re-references 1H, 13C and 15N backbone chemical shifts of peptides and proteins by comparing the observed chemical shifts with the predicted chemical shifts derived from the 3D structure (PDB coordinates) of the protein(s) of interest.
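The idea of re-referencing by comparing observed with predicted shifts can be sketched as follows. This is an illustration only and not Shiftcor's documented procedure: the use of the median difference as the offset estimate, the tolerance value and the function names are all assumptions made here.

```python
import statistics


def estimate_offset(observed, predicted):
    """Estimate a systematic referencing offset (ppm) for one nucleus type.

    The median of the observed-minus-predicted differences is used so that a
    few badly predicted or mis-assigned shifts do not dominate the estimate.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("shift lists must be non-empty and equal in length")
    return statistics.median(o - p for o, p in zip(observed, predicted))


def rereference(observed, predicted, tolerance=0.5):
    """Return (corrected shifts, applied offset).

    Offsets smaller than the tolerance (a hypothetical threshold in ppm) are
    treated as prediction noise and no correction is applied.
    """
    offset = estimate_offset(observed, predicted)
    if abs(offset) < tolerance:
        return list(observed), 0.0
    return [o - offset for o in observed], offset
```

A consistent shift of every residue by roughly the same amount suggests a referencing problem, whereas isolated large differences are more likely to indicate assignment errors and would need separate handling.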

References:

1. Haiyan Zhang, Stephen Neal and David Wishart (2003) "RefDB: A database of uniformly referenced protein chemical shifts" Journal of Biomolecular NMR, 25: 173-195

2. Wiki: http://en.wikipedia.org/wiki/SHIFTCOR


RCI (Random Coil Index) webserver predicts protein flexibility by calculating the Random Coil Index from backbone chemical shifts (Cα, CO, Cβ, N, Hα, NH) and estimating values of model-free order parameters as well as per-residue RMSF of NMR and MD ensembles from the Random Coil Index.

References:

1. Mark V. Berjanskii, David S. Wishart (2005) A Simple Method To Predict Protein Flexibility Using Secondary Chemical Shifts. Journal of the American Chemical Society, 127 (43), 14970-14971

2. Wiki: https://en.wikipedia.org/wiki/Random_Coil_Index


PREDITOR is a program for PREDIcting φ, ψ, χ1, and ω TORsion angles in proteins from 13C, 15N and 1H chemical shifts and sequential homology. The 30° accuracy of PREDITOR in predicting φ and ψ is close to 90%. The average χ1 accuracy is 84% while the ω accuracy is 99.98% for trans peptide bond identification and 93% for cis peptide bond identification. Overall, the program is 35X faster and its predictions are approximately 20% better than existing methods.
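The 30-degree accuracy quoted above is read here as the percentage of predicted angles that fall within 30 degrees of the observed value; that interpretation is an assumption, as are the function names. The sketch below shows how such a figure can be computed while handling the wrap-around at ±180 degrees.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def accuracy_within(predicted, observed, tolerance=30.0):
    """Percentage of predicted angles within `tolerance` degrees of the observed ones."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("angle lists must be non-empty and equal in length")
    hits = sum(angular_difference(p, o) <= tolerance
               for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)
```

The wrap-around matters for angles near the ±180 degree boundary: a prediction of -170 degrees against an observed 175 degrees differs by 15 degrees, not 345.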

References:

1. Berjanskii MV, Neal S, Wishart DS. PREDITOR: a web server for predicting protein torsion angle restraints. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W63-9.

2. Wiki: http://en.wikipedia.org/wiki/PREDITOR


PHAST (PHAge Search Tool) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank formatted data and rapidly performs a number of database comparisons as well as phage “cornerstone” feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It is also able to process and annotate both raw DNA sequence data and GenBank files, provide richly annotated tables on prophage features and prophage “quality” and distinguish between intact and incomplete prophage. PHAST also generates downloadable, high quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. Furthermore, tests indicate that PHAST is as accurate as or slightly more accurate than all available phage finding tools, with sensitivity of 85.4% and positive predictive value of 94.2%.
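For readers unfamiliar with the two performance measures quoted above, the sketch below shows how sensitivity and positive predictive value are computed from counts of correctly found, missed and falsely reported prophages. The function names are hypothetical, and any counts passed in are the user's own; none are taken from the PHAST evaluation.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of known prophages that the tool finds: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no known prophages to evaluate against")
    return true_positives / total


def positive_predictive_value(true_positives, false_positives):
    """Fraction of reported prophages that are real: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("the tool reported no prophages")
    return true_positives / total
```

Sensitivity penalizes missed prophages, while positive predictive value penalizes false reports, so the two are usually quoted together.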

References:

1. You Zhou, Yongjie Liang, Karlene Lynch, Jonathan J. Dennis, David S. Wishart “PHAST: A Fast Phage Search Tool” Nucl. Acids Res. (2011) 39(suppl 2): W347-W352 [doi:10.1093/nar/gkr485][PMID:21672955]


Proteome Analyst (PA) is an online tool used by researchers to rapidly analyze proteins. Using machine learning technologies, PA can accurately predict the subcellular localization and high level function of user submitted proteins.

Users can also analyze proteins using common bioinformatics tools such as: BLAST, HMMer, PROSITE, and PSIPRED.

References:

1. Duane Szafron, Paul Lu, Russell Greiner, David S. Wishart, Brett Poulin, Roman Eisner, Zhiyong Lu, John Anvik, Cam Macdonell, Alona Fyshe and David Meeuwis. Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations. Nucleic Acids Research, 2004, Vol. 32, Web Server issue W365–W371. DOI: 10.1093/nar/gkh485


SHIFTX is a web server which can predict 1H, 13C and 15N chemical shifts for your favorite protein using only its PDB file as input. ShiftX uses a unique semi-empirical approach to calculate protein chemical shifts. Tests conducted on 47 different proteins indicate that the program is able to achieve correlation coefficients between observed and calculated shifts of 0.911 (HA), 0.980 (CA), 0.996 (CB), 0.863 (CO), 0.909 (N), 0.741 (HN) and 0.907 (side H) with an RMS error of 0.23, 0.98, 1.10, 1.16, 2.43, 0.49, 0.30 ppm respectively.
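The two statistics quoted above, the correlation coefficient and the RMS error between observed and calculated shifts, can be computed for any one atom type as in the sketch below. It assumes the correlation coefficient is the usual Pearson coefficient; the function names are hypothetical and this is not part of SHIFTX itself.

```python
import math


def pearson_r(observed, calculated):
    """Pearson correlation coefficient between observed and calculated shifts."""
    n = len(observed)
    if n != len(calculated) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_o * var_c)


def rms_error(observed, calculated):
    """Root-mean-square error in the same units as the input (ppm for shifts)."""
    if len(observed) != len(calculated) or not observed:
        raise ValueError("lists must be non-empty and equal in length")
    squared = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    return math.sqrt(squared / len(observed))
```

The two measures answer different questions: a high correlation shows that calculated shifts track the observed trend, while the RMS error reports the typical size of the disagreement in ppm.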

References:

1. Stephen Neal, Alex M. Nip, Haiyan Zhang, David S. Wishart (2003) "Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts" Journal of Biomolecular NMR, 26:215-240.

2. Wiki: https://en.wikipedia.org/wiki/ShiftX


GeNMR (GEnerate NMR structure) is a web server for generating 3D protein structures using NOE-derived distance restraints and NMR chemical shifts. The web server produces an ensemble of PDB coordinates within a period ranging from 20 minutes to 4 hours, depending on protein size, server load, quality and type of experimental information, and selected protocol options.

References:

1. Mark Berjanskii, Peter Tang, Jack Liang, Joseph A. Cruz, Jianjun Zhou, You Zhou, Edward Bassett, Cam MacDonell, Paul Lu, Guohui Lin and David S. Wishart. GeNMR: a web server for rapid NMR-based protein structure determination. Nucleic Acids Research 2009 37(Web Server issue):W670-W677.

2. Wiki: http://en.wikipedia.org/wiki/GeNMR


PROSESS (Protein Structure Evaluation Suite & Server) is a web server designed to evaluate and validate protein structures solved by either X-ray crystallography or NMR spectroscopy. PROSESS integrates a variety of previously developed, well-known and thoroughly tested methods to evaluate both global and residue-specific: 1) covalent and geometric quality; 2) non-bonded/packing quality; 3) torsion angle quality; 4) chemical shift quality and 5) NOE quality. In particular, PROSESS uses VADAR for coordinate, packing, H-bond, secondary structure and geometric analysis, GeNMR for calculating folding, threading and solvent energetics, ShiftX for calculating chemical shift correlations, RCI for correlating structure mobility to chemical shift and Preditor for calculating torsion angle-chemical shifts agreement. PROSESS also incorporates several other programs including MolProbity to assess atomic clashes and His/Asn flips, XPLOR-NIH to identify and quantify NOE restraint violations and NAMD to assess structure energetics. PROSESS produces detailed tables, explanations, structural images and graphs that summarize the results and compare them to values observed in high-quality or high-resolution protein structures. Using a simplified red-amber-green coloring scheme PROSESS also alerts users about both general and residue-specific structural problems. PROSESS is intended to serve as a tool that can be used by structure biologists as well as database curators to assess and validate newly determined protein structures.

References:

1. Berjanskii M, Liang Y, Zhou J, Tang P, Stothard P, Zhou Y, Cruz J, Macdonell C, Lin G, Lu P, Wishart DS. "PROSESS: a protein structure evaluation suite and server". Nucleic Acids Res. Webserver Edition; 2010.

2. Wiki: http://en.wikipedia.org/wiki/PROSESS


CFM-ID provides a method for accurately and efficiently identifying metabolites in spectra generated by electrospray tandem mass spectrometry (ESI-MS/MS). The program uses Competitive Fragmentation Modeling to produce a probabilistic generative model for the MS/MS fragmentation process and machine learning techniques to adapt the model parameters from data.

References:

1. Allen F, Greiner R, and Wishart D. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics. June 2014.

2. Allen F, Pon A, Wilson M, Greiner R, and Wishart D. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra. Nucleic Acids Res. June 2014.


Bayesil is a web system that automatically identifies and quantifies metabolites from 1D 1H NMR spectra of complex mixtures, including biofluids such as ultra-filtered plasma, serum or cerebrospinal fluid. The NMR spectra must be collected in a standardized fashion (see How To Collect NMR Spectra for Bayesil) for Bayesil to perform optimally. Bayesil first performs all spectral processing steps, including Fourier transformation, phasing, solvent filtering, chemical shift referencing, baseline correction and reference line shape convolution automatically. It then deconvolutes the resulting NMR spectrum using a reference spectral library, which here contains the signatures of more than 60 metabolites (see here for a list). This deconvolution process determines both the identity and quantity of the compounds in the biofluid mixture. Extensive testing shows that Bayesil meets or exceeds the performance of highly trained human experts.

References:

Paper in progress ...


Standalone Programs/Applications

SimCell is a dynamic cellular automata (DCA) cell simulator used to simulate cellular and biochemical processes. Through the SimCell interface, the user may create small molecules, membranes, membrane proteins, protein/RNA molecules, DNA molecules and genes. These cellular components can then interact amongst themselves to create fascinating new processes.

References:

1. David S. Wishart, Robert Yang, David Arndt, Joseph Cruz and Peter Tang. Dynamic cellular automata: a simple but powerful approach to cellular simulation. In Silico Biology.


PANAV is a Java based structure-independent chemical shift validation and re-referencing tool. It is based on using residue-specific and secondary structure-specific chemical shift distributions calculated over small (3-6 residue) fragments to identify mis-assigned resonances. The method is also able to identify and re-reference mis-referenced chemical shift assignments. Comparisons against existing re-referencing or mis-assignment detection programs show that the method is as good or superior to existing approaches.

A standalone version is also available. Download here.

Version 2 of the standalone version is now available. Download here.

References:

1. Wiki: https://en.wikipedia.org/wiki/PANAV


Legacy Applications (Unmaintained)

PepMake generates a PDB coordinate file for polypeptide backbones using only the sequence and backbone dihedral angles as input.


GelScape is a web-based gel viewing and annotation system.

References:

1. http://www.gelscape.ualberta.ca/htm/browser.html


With Shifty you can predict 1H, 13C, and 15N chemical shifts for your favourite protein using only its amino acid sequence as input. The technique uses dynamic programming to detect sequence homologies between your query and the sequences of hundreds of previously assigned proteins in the BioMagResBank.
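Dynamic programming for sequence comparison can be illustrated with a minimal global alignment score. This is a generic textbook-style sketch and not Shifty's algorithm; the match, mismatch and gap values and the function name are assumptions chosen for illustration.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of two sequences using dynamic programming.

    Only the previous row of the scoring table is kept, so memory use grows
    with the length of seq_b rather than with the product of both lengths.
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, residue_a in enumerate(seq_a, start=1):
        current = [i * gap]
        for j, residue_b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if residue_a == residue_b else mismatch)
            gap_in_b = previous[j] + gap      # residue_a aligned to a gap
            gap_in_a = current[j - 1] + gap   # residue_b aligned to a gap
            current.append(max(diagonal, gap_in_b, gap_in_a))
        previous = current
    return previous[-1]
```

A query can then be scored against each sequence in a collection and the highest-scoring entries taken as candidate homologues. A practical tool would normally use a residue substitution matrix instead of the flat match and mismatch values assumed here.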

References:

1. http://shifty.wishartlab.com/


SHIFTOR is a program for predicting φ, ψ, χ1, and ω torsion angles in proteins from 13C, 15N and 1H chemical shifts and sequential homology. For a test set of 31 proteins, the 30° accuracy of SHIFTOR in predicting φ and ψ is close to 90%. The average χ1 accuracy is 81% while the ω accuracy is 99.98% for trans peptide bond identification and 93% for cis peptide bond identification. Overall, the program is 100X faster and its predictions are approximately 30% better than existing methods.

References:

1. Neal S, Berjanskii M, Zhang H, Wishart DS. Accurate prediction of protein torsion angles using chemical shifts and sequence homology. Magn Reson Chem. 2006 Jul;44 Spec No:S158-67


Thrifty uses threading and related methods to predict protein 3D structures from chemical shifts.


Homodeller is a web server that rapidly generates 3D protein structures (PDB coordinates) from their corresponding protein sequence.


JVIEW is a user-friendly program for calculating coupling constants from TOCSY, NOESY and HMQC traces.